pynenc_mongo.util.chunked_data

Utilities for compressing and splitting large strings into chunks.

Provides functions to compress data with zlib and split it into chunks that fit within MongoDB’s BSON document size limit. Chunks are simple byte sequences with an index for ordered reassembly.

Key components:

  • compress / decompress: zlib-based string compression

  • split_into_chunks / reassemble_chunks: size-based splitting and reassembly

  • exceeds_bson_threshold: check if data needs chunking

Module Contents

Functions

compress

Compress a string using zlib.

decompress

Decompress zlib-compressed bytes back to a string.

split_into_chunks

Split bytes into ordered chunks of at most chunk_size bytes.

reassemble_chunks

Reassemble ordered chunks into the original bytes.

exceeds_bson_threshold

Check if a string or dict of strings exceeds the size threshold for a single BSON document.

Data

API

pynenc_mongo.util.chunked_data.logger

‘getLogger(…)’

pynenc_mongo.util.chunked_data.compress(data: str) → bytes

Compress a string using zlib.

Uses compression level 6 for a balanced speed/ratio tradeoff. Level 6 is what zlib's default setting currently maps to; it is typically much faster than the maximum level (9), with only a small difference in output size.

Parameters:

data – UTF-8 string to compress

Returns:

Compressed bytes
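As an illustration of the behaviour described above, here is a minimal sketch using only the standard library. The name `compress_sketch` is hypothetical, and the UTF-8 encoding step is an assumption based on the parameter description; this is not the module's actual source.

```python
import zlib


def compress_sketch(data: str) -> bytes:
    """Hypothetical stand-in: encode as UTF-8 (assumed), then zlib-compress at level 6."""
    return zlib.compress(data.encode("utf-8"), level=6)


# Repetitive text compresses well, so the output is much smaller than the input.
payload = "task-result " * 1000
compressed = compress_sketch(payload)
print(len(payload.encode("utf-8")), len(compressed))
```

The compressed bytes can then be passed to the chunking helpers, which work on bytes rather than strings.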

pynenc_mongo.util.chunked_data.decompress(data: bytes) → str

Decompress zlib-compressed bytes back to a string.

Parameters:

data – Compressed bytes

Returns:

Decompressed UTF-8 string
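The inverse step can be sketched the same way. `decompress_sketch` is a hypothetical name, and the UTF-8 decode is an assumption that mirrors the documented return value; the example checks that non-ASCII text survives a round trip.

```python
import zlib


def decompress_sketch(data: bytes) -> str:
    """Hypothetical stand-in: zlib-decompress, then decode the bytes as UTF-8 (assumed)."""
    return zlib.decompress(data).decode("utf-8")


# Round trip with non-ASCII characters to confirm the encoding is preserved.
original = "résultat ✓ " * 50
blob = zlib.compress(original.encode("utf-8"), level=6)
restored = decompress_sketch(blob)
print(restored == original)
```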

pynenc_mongo.util.chunked_data.split_into_chunks(data: bytes, chunk_size: int) → list[bytes]

Split bytes into ordered chunks of at most chunk_size bytes.

Parameters:
  • data – The bytes to split

  • chunk_size – Maximum size per chunk in bytes

Returns:

List of byte chunks in order
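Size-based splitting of this kind is commonly done with slicing. The sketch below uses the hypothetical name `split_into_chunks_sketch`; the check for a non-positive `chunk_size` is an assumption added for safety and may not match what the real function does.

```python
def split_into_chunks_sketch(data: bytes, chunk_size: int) -> list[bytes]:
    """Hypothetical stand-in: slice data into consecutive pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        # Assumed guard; the documented function does not state how it handles this case.
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


chunks = split_into_chunks_sketch(b"abcdefghij", 4)
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

Only the last chunk can be shorter than `chunk_size`, and empty input yields an empty list.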

pynenc_mongo.util.chunked_data.reassemble_chunks(chunks: list[bytes]) → bytes

Reassemble ordered chunks into the original bytes.

Parameters:

chunks – List of byte chunks in order

Returns:

Reassembled bytes
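Because the function expects chunks already in order, a caller reading them back from storage would likely sort by the stored index first. The following sketch assumes a hypothetical `(index, chunk)` record layout and a hypothetical `reassemble_chunks_sketch` helper; neither is taken from the module.

```python
def reassemble_chunks_sketch(chunks: list[bytes]) -> bytes:
    """Hypothetical stand-in: concatenate chunks that are already in order."""
    return b"".join(chunks)


# Hypothetical stored records as (index, chunk) pairs, fetched in arbitrary order.
stored = [(2, b"ij"), (0, b"abcd"), (1, b"efgh")]

# Sort by index before joining, since reassembly itself does not reorder.
ordered = [chunk for _, chunk in sorted(stored, key=lambda record: record[0])]
restored = reassemble_chunks_sketch(ordered)
print(restored)  # b'abcdefghij'
```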

pynenc_mongo.util.chunked_data.exceeds_bson_threshold(data: dict[str, str] | str, threshold: int) → bool

Check if a string or dict of strings exceeds the size threshold for a single BSON document.

Parameters:
  • data – The string or dict of strings to check

  • threshold – Size threshold in bytes

Returns:

True if the encoded data exceeds the threshold
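How the size is measured is not spelled out here, so the sketch below makes an explicit assumption: the size is the UTF-8 byte length of the string, or the summed UTF-8 byte lengths of all keys and values for a dict. The name `exceeds_threshold_sketch` is hypothetical, and the real function may count bytes differently.

```python
from typing import Union


def exceeds_threshold_sketch(data: Union[dict[str, str], str], threshold: int) -> bool:
    """Hypothetical stand-in: compare the UTF-8 encoded size (assumed measure) to threshold."""
    if isinstance(data, str):
        size = len(data.encode("utf-8"))
    else:
        # Assumption: keys and values both count towards the size.
        size = sum(
            len(key.encode("utf-8")) + len(value.encode("utf-8"))
            for key, value in data.items()
        )
    return size > threshold


# Ten "é" characters occupy 20 bytes in UTF-8, so byte size differs from character count.
print(exceeds_threshold_sketch("é" * 10, 15))    # True
print(exceeds_threshold_sketch({"a": "bc"}, 3))  # False: exactly 3 bytes does not exceed 3
```

Measuring encoded bytes rather than characters matters because document size limits apply to the stored bytes, and multi-byte characters would otherwise be undercounted.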