LevelDB Project

by: burt rosenberg
at: university of miami
date: nov 2021
NAME
    LevelDB filesystem using FUSE
    
SYNOPSIS	
    leveldbd [-v] _keystore_dir_ _tablestore_dir_
    
DESCRIPTION
    Uses FUSE to attach a key-value filesystem implemented using the 
    ideas of the LevelDB datastore that runs Google's BigTable. 
    
OPTIONS
    -v verbose output
        
HISTORY
    New for csc421-201, november 2019

BUGS

US Patent US4905110 A

Goals

LevelDB is a very durable method for storing data. It is resistant to corruption caused by concurrency issues. The LevelDB system can be used in a distributed architecture, with multiple threads, because the LevelDB files are immutable. Immutable files cannot give conflicting data, because the data never changes.

Updates replace old immutable file with new ones in a manner that is atomic, consistent, efficient, and concurrent. The datastore will never be unavailable, and its files can be copied or backed up at any point, as they never change and are never wrong.

Immutable Key-Value Stores

The traditional file system is a data store that sometimes does not scale well. It provides the service of data persistence. However, scalable datastores need to be distributed so that many processes can access the data at higher access rates; they need to be tolerant to failure, and to this end, the data is replicated. Distributed access, replicated data, and fault tolerence are easier accommodated with key-value stores.

Databases that change in time are difficult to make consistent. A database that never changes is of limited application. A compromise is a database that many ways is immutable, and otherwise grows by appending new facts. No fact is over wrong, but it could be out-dated. The collection of all facts at any moment is a subset of all facts at any future time.

A data store based on key-value pairs has advantages in implementing such a database. A key-value store is a collection pairs, a key with a value. The functions on the store are,

The key-value pairs are stored in files that are either completely immutable, once created they never change, or are append only. Any file is therefore complete with respect to its own consistency, and the append only files are consistent with respect to the common items they share.

These files are ordered, and a key is appended to the front, whether it be an insert, update or delete. A retrieval works backards through the key-value pairs found in the append only file, then through the time-ordered sequence sequence of the immutable files, looking for the first key match.

Deletes are handled by tombstone key-value pairs, that halt the search. A tombstone is a key-value value pair, with key matching the delete, and value which proclaims its own non-existence.

LevelDB

LevelDB is particular implementation of the sort of key-value store just described. It solves many practical issues with implementing the desired data stores by organizing into three levels of files,

LevelDB tables
         +-----------------------------------------------------------+ 
level 2  | k,v | k,v | k,v |           ...                     | k,v |
         +-----------------------------------------------------------+
         
           ⇑ search level 2 last
           
                +----------------------------+    
level 1         | k,v | k,v |   ...    | k,v |
                +----------------------------+
                    
           ⇑ search level 1 newest to oldest
           
                           +----------------------------+    
level 1                    | k,v | k,v |   ...    | k,v |
                           +----------------------------+ 

            ⇐ search level 0 newest first, open table first ⇐   

                  (closed)                     (open) 
         +------------------------+     +------------------
level 0  | k,v | k,v | .... | k,v |     | k,v | k,v | ....    ⇐   | k,v |
         +------------------------+     +------------------`   new pairs append to
                                                               open table



        +----------------+------------------------------------------------+
        | KEY (16 bytes) |           VALUE (48 bytes)                     |
        +----------------+------------------------------------------------+

The LevelDB filesystem

The key–value store is given the patina of a file system by running your FUSE enabled program, leveldbd, on the mount point store.

Under the directory store, these files do not exist. The store directory is a way of inserting a key-value pair into actually true files, in the kv-table directory.

The kv-table directory can contain a number of level 0 and level 1 tables, and a single level 2 table, each as separate files. The level 0 tables are append only until a certain size, when they become immutable. The other tables are always immutable.

The files all share a common format. They are a sequence of fixed size (64 bytes) records, with a key (first 16 bytes) then a value (48 bytes). The keys and values are generally null-terminated strings: the key and value are placed into their respective fields, with null ('\0') padding on the right. The null byte is not allowed in the key or value itself.

Filesystem Layout
proj5
    |
    +----  all the .c, .h files, etc
    |
    +----  store  (fuse mounts here, these are simulated files)
    |      |
    |      +----- key-a (contains "hello leveldb!")
    |      |
    |      +----- ... other key names
    |
    +----  kv-table (the backing store, where the data really is)
           |
           +----- l0.dat  
           |
           +----- ... other .dat files

Project: Level 0

This project implements just the first level 0 file. You have template code that does the FUSE part and also opens up the file kv-table/l0.dat, which will be the unique level 0 table for this part of the project.

Your leveldbd program,

Truncate keys longer than 15 characters to 15 characters, and values longer than 47 characters to 47 characters.

The Fuse demon has been set to debug, and the leveldb-sub.c file has printf's. Read the debug and printf output to see what other Fuse functions are called, to determine what implementations are needed to support the behavior.

Working with Memory Mapped I/O

To make file manipulation (cumbersome) look like array access (can anything be better?) you can make use of memory mapped I/O.

The leveldb file is in fact an array of fixed sized structs, with two fields: a key field and a value field. Memory map the file so it appears in memory as such.

Refer to the class/example/mmap directory (in github) for how this is done.

Frequently in class I have stressed that a C string is not just an array of bytes. It is an array of bytes that additionally,

  1. Must have a null character at the end,
  2. Cannot have a null character not at the end,
  3. Has to be copied carefully so not to over-run the destination,
  4. Needs an extra byte of storage for the terminating null byte.
Sure, C strings are convenient, but you must obey the above rules.

Note that the value strings defined for this project are C string only if the value is of length greater than zero. Otherwise, it is more like a two byte character array, char[2].

And remember, hexdump -C kv-table/l0.dat is your friend.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 10 nov 2019
update: 13 nov 2021