Skeleton Key Project

by: burt rosenberg
at: university of miami
date: dec 2022

US Patent US4905110 A

Goals

The goals are:

This project is about 40 to 80 lines of code.

A Key Value Store

We shall use are FAT file system to create a key-value store. Such stores are the basis for many modern persistent stores, and can go by the fashionable name of No-SQL, because they deal well with the limitations presented by the CAP theorem, which states,
☄ Consistency, 
☀ Availability, 
☂ Partition-Resistance 

- you get at most two out of the three

A key-value store is one of the best ways to help with this problem by storing data is a simple pair of strings, which cannot be modified although it can be supplanted by a new pair of the same key. We shall implement this as follows: the filename is the key, the file contents is the value. The value cannot be edited, although the pair can be completely removed, or a new value associated to the key. As a bonus, we complete our Fat Skeleton to a Skeleton Key that is a pretty good approximation to an actual conventional file system, but without many of the hassles.

Our rules are that both the key and the values are text data, the key has a length limitation (and maybe follows other conventions left to the application), but the value does not have a length limitation (up to the storage available).

The template code for this project is also found on github, csc421:Project 5 (see Project 4 as well).

The MAN Page

NAME
    skeleton-key
    
SYNOPSIS
    fat-skeleton  [-v]
    
DESCRIPTION
    A Key-Value store built on top of an vastly simplified FAT file system, and
    entirely in memory. It is an interactive program with the commands described:
    
COMMANDS

    qu
       to quit. this is an administrative command
       
    dd
       to print out the contents of the data area
       this is mainly a diagnostic command
       
    pc _keyname_
       to print out the cluster chain storing the value for key _keyname_
       this is mainly a diagnostic command
    
    ls 
       to list the keys names, data lengths, and directory index

    cr _len_ _keyname_
       to create a key with an empty value of _len_ bytes
         
    rm _keyname_ 
       to remove a key-value pair with key _keyname_

    wr _keyname_ _value_ 
       to create (or replace if exists) the key with name _keyname_ and value _value_

    rd _keyname_
       to print the key-value pair with key _keyname_, or be silent if it does not exit
    
OPTIONS
    -v verbose output. Can be used multiple times.
        
HISTORY
    Created for the 231 edition of CSC421 (2022-2023 academic year).
    
BUGS
    Values cannot have spaces. Create has a legacy definition that does not
    fit the key-value model.

Project Details

To received credit for the project, some parts of the your code must be carried out in detailed concert with these requirements.

Your project 4 is the basis for this project. There is just a little bit more to write. For what concerns this project, a cluster is the same thing as a block, just the term a filesystem person would use, while a driver person would say block.

  1. You should have used the defines CLUSTER_SIZE, FAT_N and DIR_N, as I have changed their values to make the file system much smaller, for the purpose of testing.
  2. A new static global array is defined, called data, which contains the data bytes. The conversion between location in data and the cluster number is done by multiplying the cluster number by the cluster size then adding any offset in the cluster. Be careful that the offset is between 0 and one less than CLUSTER_SIZE.
  3. The key-value pairs are all text. For key we know what this means: as it has to fit into the directory filename field. For the value it means that the last character has to be a null byte. An empty key, which can be created by the cr action, will have the empty string as the value.
  4. The functions chaining_write and chaining_read are the heart of this project. The four inputs are the buffer, its size, the first cluster number, and the file length. The create action will already be called, so a write can assume a sufficiently long block chain.
  5. For write, it should be that the buffer size and the file length are equal, and are one more than the strint length, to account for the null byte. For read, the buffer size should be at least as long as the file length, and file length number of characters should be copied to the buffer.
  6. As an implementation hint, which is also seen in the template, refactor your code to split cr_action into cr_action and cr_action_aux, and rm_action into rm_action and rm_action_aux. The aux versions differ from the non-aux in how they get their parameters. The non-aux versions get the command as a vector of strings, the aux versions get the parameters directly.
  7. A write on an existing key removes to old key first. The wr_action will call rm_action_aux.
  8. Now that it is assumed that the write action is for a new key, cr_action_aux is called to create the directory entry and the cluster chain with enough space for the value string (including null byte).
  9. Having created the directory entry and cluster chain the chaining_write function is called to copy into the data area.
  10. The rd_action calls chaining_read to copy from the data area into a buffer. This buffer can be malloc'ed and free'ed by rd_action. Do not forget to free, else you will have a memory leak, and might loose points as well as memory.
  11. The filesize of a key-value pair will be one greater than the number of characters in the key, as the filesize is the actual number of bytes, including the null character.

Suggestion for a Reduced Implementation

If you can't solve a problem, then there is an easier problem you can solve: find it.
☀ George Polya, Mathematical Discovery on Understanding, Learning, and Teaching Problem Solving, Volume I

In this Makefile, I have included samples, rather than tests. It appears people make passing the tests their goal too early, before their code solves even simpler problems. I will release the tests on Monday. For the moment, consider the samples, and even those I'd rather you started with even simpler samples of your own devising.

In fact, when faced with this program, if you find the algorithms to difficult, here is a Reduced Implementation that you code, commit, and perhaps make a stepping stone towards a more complete implementation:

  Code a reduced implementation in which every file is exactly one block.
The rules of this are,
  1. Assume the value give in a write command will fit into one block, including counting the null byte.
  2. If you receive a larger value, silently truncate so its strlen is one less than the BLOCK_SIZE.
  3. The create action from the command line is truncated to 1 byte requiring just one block. But do put a null in the 0-th byte of that one block, so that it is a valid key-value pair with the empty string as the value.
  4. Because of this restriction, there are no cluster chains. In the end, the FAT table has zeros' for free blocks and -1's for allocated blocks.
This will not receive full credit, so after this is done, consider a future commit of a more complete solution, however,
    Never underestimate the value of running code.
It seems obvious, but people often see code that works on only simple cases as less valuable than code that does not work on any cases — perhaps seeing the code fail on more complicated cases redeems it for it failing on simpler cases.
Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 29 nov 2022
update: 29 nov 2022