Module geode

Chunk-based file storage implementation, intended as a building block for a DHT or a similar system.

The API supports file and directory insertion and retrieval. There is intentionally no removal support: files should be removed externally, after which running garbage_collect() is all that is needed to clean things up.

The hash of a file is the BLAKE3 hash of its chunk hashes, taken in chunk order. The hash of a directory is the BLAKE3 hash of its chunk hashes, in order, together with the ordered list of (file path, file size) pairs. All hashes (file, directory, chunk) are 32 bytes long and are encoded in base58 whenever a string representation is needed.
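
The file-hash rule can be sketched as follows. This is a minimal illustration, not geode's code: to stay within the standard library, the hash function is passed in as a parameter, where real code would use BLAKE3 (for example blake3::hash from the blake3 crate). MAX_CHUNK_SIZE uses the documented 256 KiB, and the directory hash is left out because the exact encoding of the (file path, file size) pairs is not specified here.

```rust
/// 256 KiB, the value documented for MAX_CHUNK_SIZE.
const MAX_CHUNK_SIZE: usize = 256 * 1024;

/// Sketch of the file-hash construction: hash every chunk, then hash the
/// chunk hashes in order. `hash` stands in for BLAKE3 (assumption: injected
/// so this example compiles with the standard library alone).
fn file_hash<F: Fn(&[u8]) -> [u8; 32]>(data: &[u8], hash: F) -> [u8; 32] {
    // Concatenate the 32-byte chunk hashes in chunk order.
    let mut chunk_hashes = Vec::new();
    for chunk in data.chunks(MAX_CHUNK_SIZE) {
        chunk_hashes.extend_from_slice(&hash(chunk));
    }
    // For BLAKE3, hashing this concatenation gives the same result as
    // feeding each chunk hash into one incremental hasher, in order.
    hash(&chunk_hashes)
}
```

With the blake3 crate available, the call would look like file_hash(&data, |b: &[u8]| *blake3::hash(b).as_bytes()).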

The filesystem hierarchy contains a files/ directory, which stores metadata about individual full files, and a directories/ directory, which stores metadata about all files in a directory (including all subdirectories). The name of each file in files/ or directories/ is the hash of the file or directory, as defined above. A file in files/ contains the ordered list of chunks making up the full file. A file in directories/ contains the ordered list of chunks making up all the files in the directory, together with the relative path and size of each of those files.

To compute the chunks, the full file is split into MAX_CHUNK_SIZE-sized slices; only the last chunk may be smaller.
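
A minimal sketch of the splitting rule, using the documented 256 KiB MAX_CHUNK_SIZE. The helper name split_into_chunks is hypothetical, and holding the whole file in memory is a simplification for illustration.

```rust
/// 256 KiB, the value documented for MAX_CHUNK_SIZE.
const MAX_CHUNK_SIZE: usize = 256 * 1024;

/// Hypothetical helper: split file contents into chunks. Every chunk is
/// exactly MAX_CHUNK_SIZE bytes except possibly the last one.
fn split_into_chunks(data: &[u8]) -> Vec<&[u8]> {
    // slice::chunks already yields a shorter final slice when the length
    // is not a multiple of the chunk size.
    data.chunks(MAX_CHUNK_SIZE).collect()
}
```

For example, a 600,000-byte file yields two full 262,144-byte chunks followed by a final 75,712-byte chunk. In this sketch an empty file yields no chunks; how geode treats empty files is not stated here.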

It might look like the following:

/files/B9fFKaEYphw2oH5PDbeL1TTAcSzL6ax84p8SjBKzuYzX
/files/8nA3ndjFFee3n5wMPLZampLpGaMJi3od4MSyaXPDoF91
/files/...
/directories/FXDduPcEohVzsSxtNVSFU64qtYxEVEHBMkF4k5cBvt3B
/directories/AHjU1LizfGqsGnF8VSa9kphSQ5pqS4YjmPqme5RZajsj
/directories/...
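
A sketch of how such a path could be derived from a 32-byte hash. It assumes the Bitcoin base58 alphabet (the default alphabet of the bs58 crate) and the files/directories names shown in the example; base58_encode and metadata_path are hypothetical helpers, and the actual values of FILES_PATH and DIRS_PATH are not given here.

```rust
use std::path::{Path, PathBuf};

/// Bitcoin base58 alphabet (assumed encoding for hashes).
const ALPHABET: &[u8] = b"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Encode bytes as base58 by big-number base conversion.
fn base58_encode(input: &[u8]) -> String {
    // Each leading zero byte is written as the first alphabet character, '1'.
    let zeros = input.iter().take_while(|&&b| b == 0).count();
    // Base-58 digits, least significant first.
    let mut digits: Vec<u32> = Vec::new();
    for &byte in input {
        // Multiply the current number by 256 and add `byte`.
        let mut carry = byte as u32;
        for d in digits.iter_mut() {
            carry += *d << 8;
            *d = carry % 58;
            carry /= 58;
        }
        while carry > 0 {
            digits.push(carry % 58);
            carry /= 58;
        }
    }
    let mut out = "1".repeat(zeros);
    out.extend(digits.iter().rev().map(|&d| ALPHABET[d as usize] as char));
    out
}

/// Hypothetical helper: location of the metadata for a file or directory hash,
/// assuming the "files" and "directories" names from the example layout.
fn metadata_path(base: &Path, hash: &[u8; 32], is_dir: bool) -> PathBuf {
    let dir = if is_dir { "directories" } else { "files" };
    base.join(dir).join(base58_encode(hash))
}
```

A 32-byte hash encodes to at most 44 base58 characters, which matches the length of the names in the example.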

A file metadata file (a file in files/) contains the ordered list of chunk hashes, for example:

2bQPxSR8Frz7S7JW3DRAzEtkrHfLXB1CN65V7az77pUp
CvjvN6MfWQYK54DgKNR7MPgFSZqsCgpWKF2p8ot66CCP
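
A sketch of reading such a file back, assuming one base58 chunk hash per line exactly as in the example. parse_file_metadata is a hypothetical helper, and the alphabet check assumes the Bitcoin base58 alphabet.

```rust
/// Bitcoin base58 alphabet, assumed for the hash encoding.
const BASE58_ALPHABET: &str = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Hypothetical helper: return the ordered chunk hashes stored in a
/// file metadata file, one hash per non-empty line.
fn parse_file_metadata(contents: &str) -> Result<Vec<&str>, String> {
    contents
        .lines()
        .map(str::trim)
        .filter(|line| !line.is_empty())
        .map(|line| {
            // Reject anything that cannot be a base58 string.
            if line.chars().all(|c| BASE58_ALPHABET.contains(c)) {
                Ok(line)
            } else {
                Err(format!("not a base58 chunk hash: {:?}", line))
            }
        })
        .collect()
}
```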

A directory metadata file (a file in directories/) contains, in addition to the chunk hashes, the path and size of each file in the directory. For example:

8Kb55jeqJsq7WTBN93gvBzh2zmXAXVPh111VqD3Hi42V
GLiBqpLPTbpJhSMYfzi3s7WivrTViov7ShX7uso6fG5s
picture.jpg 312948
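
Assuming the layout of the example (one entry per line, a bare hash for each chunk, and a space-separated path and size for each file), the contents could be parsed as below. parse_dir_metadata and DirMetadata are hypothetical names; splitting on the last space is a choice made here so that paths containing spaces still parse.

```rust
/// Hypothetical parsed form of a directory metadata file.
#[derive(Debug, PartialEq)]
struct DirMetadata<'a> {
    /// Ordered chunk hashes (base58 strings).
    chunk_hashes: Vec<&'a str>,
    /// Ordered (relative path, size in bytes) entries.
    files: Vec<(&'a str, u64)>,
}

fn parse_dir_metadata<'a>(contents: &'a str) -> Result<DirMetadata<'a>, String> {
    let mut meta = DirMetadata { chunk_hashes: Vec::new(), files: Vec::new() };
    for line in contents.lines().map(str::trim).filter(|l| !l.is_empty()) {
        match line.rsplit_once(' ') {
            // No space: a bare chunk hash.
            None => meta.chunk_hashes.push(line),
            // "path size": the size is everything after the last space.
            Some((path, size)) => {
                let size = size
                    .parse::<u64>()
                    .map_err(|e| format!("invalid size in {:?}: {}", line, e))?;
                meta.files.push((path, size));
            }
        }
    }
    Ok(meta)
}
```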

A directory chunk can contain data from several files if they fit within MAX_CHUNK_SIZE: chunks are computed as if all the files were concatenated into a single large file, which minimizes the number of chunks.
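
To illustrate the concatenation rule, the sketch below computes which byte range of which file ends up in each chunk, given only the ordered file sizes. directory_chunks and Span are hypothetical names, and MAX_CHUNK_SIZE uses the documented 256 KiB.

```rust
/// 256 KiB, the value documented for MAX_CHUNK_SIZE.
const MAX_CHUNK_SIZE: u64 = 256 * 1024;

/// A piece of one file that falls inside a chunk.
#[derive(Debug, PartialEq)]
struct Span {
    file: usize, // index into the ordered file list
    offset: u64, // start offset within that file
    len: u64,    // number of bytes taken from that file
}

/// Chunk the virtual concatenation of files with the given sizes.
fn directory_chunks(sizes: &[u64]) -> Vec<Vec<Span>> {
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    let mut room = MAX_CHUNK_SIZE; // bytes still free in the current chunk
    for (file, &size) in sizes.iter().enumerate() {
        let mut offset = 0;
        while offset < size {
            let len = (size - offset).min(room);
            current.push(Span { file, offset, len });
            offset += len;
            room -= len;
            if room == 0 {
                // Chunk is full: emit it and start a new one.
                chunks.push(std::mem::take(&mut current));
                room = MAX_CHUNK_SIZE;
            }
        }
    }
    // The final chunk may be smaller than MAX_CHUNK_SIZE.
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

For files of 100,000, 200,000, and 50,000 bytes (350,000 in total), this yields two chunks: the first holds all of file 0 and the first 162,144 bytes of file 1, and the second holds the remaining 37,856 bytes of file 1 and all of file 2.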

Geode does not copy the full file, and it does not store individual chunks. It also does not keep track of the full file's path.

Modules

chunked_storage 🔒
file_sequence 🔒
util 🔒

Structs

Chunk
ChunkedStorage
ChunkedStorage is a representation of a file or directory we're trying to retrieve from Geode.
FileSequence
FileSequence is an object that implements AsyncRead, AsyncSeek, and AsyncWrite for an ordered list of (file path, file size).
Geode
Chunk-based file storage interface.
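
FileSequence itself is asynchronous and works on real files. As a simplified, synchronous sketch of the idea behind it, the code below maps a position in the virtual concatenation of files to a file index and offset, and reads across file boundaries from in-memory buffers. locate and read_at are hypothetical helpers, not part of geode's API.

```rust
/// Map a position in the virtual concatenation of files to
/// (file index, offset inside that file); `None` past the end.
fn locate(sizes: &[u64], mut pos: u64) -> Option<(usize, u64)> {
    for (i, &size) in sizes.iter().enumerate() {
        if pos < size {
            return Some((i, pos));
        }
        // Skip over this file (empty files are skipped automatically).
        pos -= size;
    }
    None
}

/// Read from the concatenation of in-memory "files" starting at `pos`,
/// crossing file boundaries as needed. Returns the number of bytes read.
fn read_at(files: &[Vec<u8>], mut pos: u64, buf: &mut [u8]) -> usize {
    let sizes: Vec<u64> = files.iter().map(|f| f.len() as u64).collect();
    let mut filled = 0;
    while filled < buf.len() {
        let (i, off) = match locate(&sizes, pos) {
            Some(found) => found,
            None => break, // end of the sequence
        };
        let src = &files[i][off as usize..];
        let n = src.len().min(buf.len() - filled);
        buf[filled..filled + n].copy_from_slice(&src[..n]);
        filled += n;
        pos += n as u64;
    }
    filled
}
```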

Constants

DIRS_PATH 🔒
Path prefix where directory metadata is stored
FILES_PATH 🔒
Path prefix where file metadata is stored
MAX_CHUNK_SIZE
Maximum size of a chunk (256 KiB)

Functions

hash_to_string
read_until_filled
smol::fs::File::read does not guarantee that the buffer will be filled, even if the buffer is smaller than the file. As a workaround, this function keeps reading from the stream until the buffer is full or the end of the stream is reached.
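
A synchronous sketch of the same loop over std::io::Read. The real function works on smol's asynchronous files, so its exact signature may differ; retrying on Interrupted errors is a choice made here.

```rust
use std::io::{self, Read};

/// Read until `buf` is full or the stream reaches EOF.
/// Returns the number of bytes actually read.
fn read_until_filled<R: Read>(stream: &mut R, buf: &mut [u8]) -> io::Result<usize> {
    let mut total = 0;
    while total < buf.len() {
        match stream.read(&mut buf[total..]) {
            Ok(0) => break,          // end of stream
            Ok(n) => total += n,     // short read: keep going
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(total)
}
```

Unlike std::io::Read::read_exact, which fails with an UnexpectedEof error when the stream ends before the buffer is full, this returns the number of bytes read, so a short final chunk can be detected.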