LevelDB Project: Final Version

by: burt rosenberg
at: university of miami
date: nov 2021
 Manpage 
NAME
    LevelDB filesystem using FUSE
    
SYNOPSIS	
    leveldbd [-v] _keystore_dir_ _tablestore_dir_
    leveldb-merge  [-v] _closed_level0_ _level2_ _merged_level2_
    
DESCRIPTION
    Uses FUSE to attach a key-value filesystem implemented using the 
    ideas of the LevelDB datastore that runs Google's BigTable. 
    
    leveldbd is the LevelDB deamon that attaches to the keystore 
    directory and arranges the key value pairs into leveldb tables 
    in the tablestore directory.
    
    leveldb-merge, along with an archiver-reaper process, 
    complete the leveldb system by promoting and merging leveldb 
    tables. This version takes on the specific task of promoting the 
    closed level zero table and merging it with the level two table 
    given in the arguments producing a new level 2 table.
    
    
OPTIONS
    -v verbose output
        
HISTORY
    A separate merge program is new for csc421-221, november 2021

BUGS

US Patent US4905110 A

Goals

General Discussion

LevelDB is particular implementation of a key-value store. We are using FUSE to reuse the filesystem as our interface to the LevelDB store, although it could be an API accessed by a socket, or other connector software technology. The actual data is stored in files in a traditional filesystem, however the kinds of file, their access and use, is conducive to a more robust and distributable data store.

There are three types of files, called level 0, 1 and 2. With the exception of one level 0 file, the open level 0 file, all files are atomic and immutable. They either exist or not, and if the exist, they never change. The open level 0 file can change, but it accepts only appending new key-value pairs to the table. Any two versions of the open level 0 file agree on any key value pair the have in common, and one of the two contains on those pairs that are in common.

At no point is there an uncertainty about what key-value pairs are in the store or their value. New tables are created from old tables, without the old tables needing to be taken off line, since they are immutable. The new tables are redundant with the information in the old tables that they replace. Switch over to the new table is seemless as both the new and old agree entirely on the key-value pairs stored, it is just that the new table summarizes and organizes the old tables.

The files might also be eternal, creating a persistent version of the database at past moments in time. Or they might be reaped, deleting that history. A separate program and policy can repeat or archive old tables, according to the requirements of the particular application.

Project: Mini-levelDB

Your project maintains most of the important principles of LevelDB, but does not implement it fully. Your project will only keep one level 2 table, and up to two level 0 tables, one open the other closed. You will most likely experience level 1 tables in your coding, as your merge and reorganize the tables.

Subproject: Leveldb-merge

Given a level 0 and and level 2, merge the keys into a level table. In some manner you must,

  1. Sort the level 0 table, careful not to exchange the order of pairs with equal valued keys.
  2. When multiple pairs exist with the same key, discard all but the most recent pair. Do retain tombstones.
  3. Do a merge (see merge-sort) of the resulting table with the level 0 table. In cases of equal keys in both tables, discard the pair in the level 2 table (it is older). Discard tombstones.
Remember not to modify the input tables. They might still be in use by the leveldbd program. Your algorithm might combine the sort and glean step, or the glean and merge step.

Subproject: Level 2 table integration

Modify your leveldbd to search first in the open level 0 table (done) and if a key is not found, then binary search in level 2 table (new code to write).

Combine your programs in a simple implementation that,

  1. The leveldbd daemon is launched with empty level 2 and level 0 tables.
  2. The daemon program is stopped, and leveldb-merge runs on the tables.
  3. The leveldbd program is restarted on the result of the merge as the level 2 table and an empty level 0 table.

Implementation Notes:

As in the previous project, you can assume a standard name for the level 2 file,

// in leveldb.h
#define LEVEL0_NAME "l0.dat"
#define LEVEL2_NAME "l2.dat"

The makefile is a good place to deal with filenames. If the output of merge is not l2.dat, the rename can be part of the make-target.

 Extra Credit 

The essential idea that is missing from the required implementation is how the system can be made non-stop.

If one were to implement at least one closed level 0 table, then the merge of the closed level 0 and the current level 2, creating a new level 2, can be done with the leveldbd program is running.

To implement the closing of a level 0, on the write that triggers the action, the leveldbd program copies the in memory level 0 file to an empty level 0 file; then zeros it. It continues with the write and all further reads go first to the open level 0, then to the closed level 0, and then to the current level 2.

The leveldbd program then signals a running merge deamon that it has work to do. See man signal.

A return signal can then inform the leveldbd program when the merged level 2 table is available.

Summary

  1. Run leveldbd until it closes the open level 0 tabled, and it signals leveldb-merge to begin work
  2. Leveldbd continues, all three tables, the open and closed level 0's and the current level 2. It is only modifying the open level 0 table, so the merge program is not affected.
  3. Leveldb-merge does its work then signals leveldbd that the new level 2 table is available.
  4. Leveldbd receives the signal and swiches out the old tables for the new. This can be synchronized with the running leveldbd but checking during a read if there is a switch over pending.
  5. If there is a switch over pending, before the read, leveldbd closes the closed level 0 and current level 2, opens the new level 2, and updates its data structure for the new search path.
  6. Writes are unaffected by the switch, as writes only manipulate the open level 0.
A creative aspect of this is how to name the level files, unless one does more than a SIGINT to prod each process into action (see man signal).

Since in Unix, the remove operation on a file only removes its name from the directory, but does not rescind an open for any program holding a open handle to that file, it is possible to reuse the level 2 filename. The leveldbd on signal simply closes and reopens the level 2 file to get the new version.

Implementing level 1 files is beyond the scope of this extra credit.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 18 nov 2021
update: 5 dec 2021