Chunk-based file storage implementation. This is a building block for a DHT or something similar.
The API supports file/directory insertion and retrieval. There is
intentionally no remove support. File removal should be handled
externally, and then it is only required to run garbage_collect() to
clean things up.
The hash of a file is the BLAKE3 hash of its chunk hashes, in the correct order. The hash of a directory is the BLAKE3 hash of its chunk hashes, in the correct order, and the ordered list of (file path, file size) pairs. All hashes (file, directory, chunk) are 32 bytes long and are encoded in base58 whenever necessary.
The filesystem hierarchy consists of a files directory, which stores
metadata about each full file, and a directories directory, which stores
metadata about all files in a directory (all subdirectories included).
The filename of a file in files or directories is the hash of the
file/directory as defined above.
Inside a file in files is the ordered list of the chunks making up the
full file.
Inside a file in directories is the ordered list of the chunks making up
all the files, and the (relative) file path and size of every file in the
directory.
To get the chunks, split the full file into MAX_CHUNK_SIZE sized
slices. Only the last chunk can be smaller than that.
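The splitting rule above can be sketched as follows. This is a minimal illustration, not geode's implementation: it assumes the whole file is already in memory, and split_into_chunks is a hypothetical helper name. The constant mirrors the documented MAX_CHUNK_SIZE of 256 KiB.

```rust
/// Maximum chunk size: 256 KiB, as documented for MAX_CHUNK_SIZE.
const MAX_CHUNK_SIZE: usize = 262_144;

/// Hypothetical helper: split an in-memory file into chunk-sized slices.
/// Every slice is exactly MAX_CHUNK_SIZE bytes long, except possibly the
/// last one, which holds whatever remains.
fn split_into_chunks(data: &[u8]) -> Vec<&[u8]> {
    data.chunks(MAX_CHUNK_SIZE).collect()
}
```

A file of 2 * MAX_CHUNK_SIZE + 10 bytes yields three chunks, the last of which is 10 bytes long. An empty file yields no chunks.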
The filesystem hierarchy might look like the following:
/files/B9fFKaEYphw2oH5PDbeL1TTAcSzL6ax84p8SjBKzuYzX
/files/8nA3ndjFFee3n5wMPLZampLpGaMJi3od4MSyaXPDoF91
/files/...
/directories/FXDduPcEohVzsSxtNVSFU64qtYxEVEHBMkF4k5cBvt3B
/directories/AHjU1LizfGqsGnF8VSa9kphSQ5pqS4YjmPqme5RZajsj
/directories/...
Inside a file metadata (file in files) is the ordered list of chunk
hashes, for example:
2bQPxSR8Frz7S7JW3DRAzEtkrHfLXB1CN65V7az77pUp
CvjvN6MfWQYK54DgKNR7MPgFSZqsCgpWKF2p8ot66CCP
Inside a directory metadata (file in directories) is, in addition to
chunk hashes, the path and size of each file in the directory. For example:
8Kb55jeqJsq7WTBN93gvBzh2zmXAXVPh111VqD3Hi42V
GLiBqpLPTbpJhSMYfzi3s7WivrTViov7ShX7uso6fG5s
picture.jpg 312948
Chunks of a directory can include multiple files, if multiple files fit
into MAX_CHUNK_SIZE. The chunks are computed as if all the files were
concatenated into a single big file, to minimize the number of chunks.
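To make the concatenation rule concrete, here is a small sketch that works only from the ordered list of file sizes. The function names directory_chunk_count and files_in_chunk are hypothetical and not part of geode. The sketch assumes chunk boundaries fall at multiples of MAX_CHUNK_SIZE in the concatenated byte stream, as the paragraph above describes.

```rust
/// Maximum chunk size: 256 KiB, as documented for MAX_CHUNK_SIZE.
const MAX_CHUNK_SIZE: usize = 262_144;

/// Hypothetical helper: number of chunks for a directory whose files have
/// the given sizes, treating the files as one concatenated stream.
fn directory_chunk_count(file_sizes: &[usize]) -> usize {
    let total: usize = file_sizes.iter().sum();
    // Ceiling division: a partial trailing chunk still counts as a chunk.
    (total + MAX_CHUNK_SIZE - 1) / MAX_CHUNK_SIZE
}

/// Hypothetical helper: indices of the files that contribute at least one
/// byte to chunk `chunk_index` of the concatenated stream.
fn files_in_chunk(file_sizes: &[usize], chunk_index: usize) -> Vec<usize> {
    let chunk_start = chunk_index * MAX_CHUNK_SIZE;
    let chunk_end = chunk_start + MAX_CHUNK_SIZE;
    let mut offset = 0;
    let mut hits = Vec::new();
    for (i, &size) in file_sizes.iter().enumerate() {
        let file_end = offset + size;
        // Half-open ranges [offset, file_end) and [chunk_start, chunk_end)
        // overlap when each one starts before the other ends.
        if size > 0 && offset < chunk_end && file_end > chunk_start {
            hits.push(i);
        }
        offset = file_end;
    }
    hits
}
```

With sizes [100, 200, 300], all three files share a single chunk. With sizes [312948, 100, 200000], there are two chunks: the first contains only bytes of the first file, and the second contains the tail of the first file plus the other two files.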
The full file is not copied, and individual chunks are not stored by geode. Additionally, it does not keep track of the full file's path.
Modules
- chunked_storage 🔒
- file_sequence 🔒
- util 🔒
Structs
- Chunk
- ChunkedStorage
  ChunkedStorage is a representation of a file or directory we're trying to retrieve from Geode.
- FileSequence
  FileSequence is an object that implements AsyncRead, AsyncSeek, and AsyncWrite for an ordered list of (file path, file size).
- Geode
  Chunk-based file storage interface.
Constants
- DIRS_PATH 🔒
  Path prefix where directory metadata is stored
- FILES_PATH 🔒
  Path prefix where file metadata is stored
- MAX_CHUNK_SIZE
  Defined maximum size of a stored chunk (256 KiB)
Functions
- hash_to_string
- read_until_filled
  smol::fs::File::read does not guarantee that the buffer will be filled, even if the buffer is smaller than the file. This is a workaround. It reads the stream until the buffer is full or until the end of the stream is reached.
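The read-until-filled pattern described for read_until_filled can be sketched with the standard library's blocking Read trait. This is only an illustration of the loop. The real function operates on an async stream, and its exact signature and return type are not shown on this page, so the ones below are assumptions.

```rust
use std::io::{self, Read};

/// Synchronous sketch (assumed signature): keep reading until `buffer` is
/// full or the stream ends. Returns the number of bytes actually read.
fn read_until_filled<R: Read>(stream: &mut R, buffer: &mut [u8]) -> io::Result<usize> {
    let mut total = 0;
    while total < buffer.len() {
        // A single read may return fewer bytes than requested, so continue
        // from where the previous read stopped.
        match stream.read(&mut buffer[total..]) {
            Ok(0) => break, // end of stream
            Ok(n) => total += n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(total)
}
```

The return value is smaller than buffer.len() only when the stream ended first, which is what allows a caller to recognize the final, shorter chunk.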