Skeleton Key Project
by: burt rosenberg
at: university of miami
date: dec 2022
US Patent US4905110 A
Goals
The goals are:
-
To perfect our knowledge of file systems and the sort of data structure techniques
used to build them.
-
To think some more about code correctness: efficient building, careful testing,
and structured debugging.
This project is about 40 to 80 lines of code.
A Key Value Store
We shall use are FAT file system to create a key-value store. Such stores
are the basis for many modern persistent stores, and can go by the fashionable name
of No-SQL, because they deal well with the limitations presented by the CAP theorem,
which states,
A key-value store is one of the best ways to help with this problem by storing data
is a simple pair of strings, which cannot be modified although it can be supplanted
by a new pair of the same key.
We shall implement this as follows: the filename is the key, the file
contents is the value. The value cannot be edited, although the pair can be
completely removed, or a new value associated to the key.
As a bonus, we complete our Fat Skeleton to a Skeleton Key that is a pretty good
approximation to an actual conventional file system, but without many of the hassles.
Our rules are that both the key and the values are text data, the key has a length
limitation (and maybe follows other conventions left to the application), but the
value does not have a length limitation (up to the storage available).
The template code for this project is also found on github, csc421:Project 5
(see Project 4 as well).
The MAN Page
NAME
skeleton-key
SYNOPSIS
fat-skeleton [-v]
DESCRIPTION
A Key-Value store built on top of an vastly simplified FAT file system, and
entirely in memory. It is an interactive program with the commands described:
COMMANDS
qu
to quit. this is an administrative command
dd
to print out the contents of the data area
this is mainly a diagnostic command
pc _keyname_
to print out the cluster chain storing the value for key _keyname_
this is mainly a diagnostic command
ls
to list the keys names, data lengths, and directory index
cr _len_ _keyname_
to create a key with an empty value of _len_ bytes
rm _keyname_
to remove a key-value pair with key _keyname_
wr _keyname_ _value_
to create (or replace if exists) the key with name _keyname_ and value _value_
rd _keyname_
to print the key-value pair with key _keyname_, or be silent if it does not exit
OPTIONS
-v verbose output. Can be used multiple times.
HISTORY
Created for the 231 edition of CSC421 (2022-2023 academic year).
BUGS
Values cannot have spaces. Create has a legacy definition that does not
fit the key-value model.
Project Details
To received credit for the project, some parts of the your code must be carried out
in detailed concert with these requirements.
Your project 4 is the basis for this project. There is just a little bit more to write.
For what concerns this project, a cluster is the same thing as a block, just
the term a filesystem person would use, while a driver person would say block.
-
You should have used the defines CLUSTER_SIZE, FAT_N and DIR_N, as
I have changed their values to make the file system much smaller, for the purpose
of testing.
-
A new static global array is defined, called data, which contains the data bytes.
The conversion between location in data and the cluster number is done by multiplying
the cluster number by the cluster size then adding any offset in the cluster.
Be careful that the offset is between 0 and one less than CLUSTER_SIZE.
-
The key-value pairs are all text. For key we know what this means: as it has to
fit into the directory filename field. For the value it means that the last character
has to be a null byte. An empty key, which can be created by the cr action, will
have the empty string as the value.
-
The functions chaining_write and chaining_read are the heart of
this project. The four inputs are the buffer, its size, the first cluster number,
and the file length. The create action will already be called, so a write
can assume a sufficiently long block chain.
-
For write, it should be that the buffer size and the file length are equal, and are
one more than the strint length, to account for the null byte. For read, the buffer
size should be at least as long as the file length, and file length number of
characters should be copied to the buffer.
-
As an implementation hint, which is also seen in the template, refactor your
code to split cr_action into cr_action and cr_action_aux, and
rm_action into rm_action and rm_action_aux. The aux versions
differ from the non-aux in how they get their parameters. The non-aux versions
get the command as a vector of strings, the aux versions get the parameters
directly.
-
A write on an existing key removes to old key first. The wr_action will call
rm_action_aux.
-
Now that it is assumed that the write action is for a new key, cr_action_aux
is called to create the directory entry and the cluster chain with enough space
for the value string (including null byte).
-
Having created the directory entry and cluster chain the
chaining_write function is called to copy into the data area.
-
The rd_action calls chaining_read to copy from the data area
into a buffer. This buffer can be malloc'ed and free'ed by rd_action.
Do not forget to free, else you will have a memory leak, and might loose
points as well as memory.
-
The filesize of a key-value pair will be one greater than the number of characters
in the key, as the filesize is the actual number of bytes, including the null character.
Suggestion for a Reduced Implementation
If you can't solve a problem, then there is an easier problem you can solve: find it.
☀ George Polya, Mathematical Discovery on Understanding, Learning,
and Teaching Problem Solving, Volume I
In this Makefile, I have included samples, rather than tests. It appears people
make passing the tests their goal too early, before their code solves even simpler
problems. I will release the tests on Monday.
For the moment, consider the samples, and even those I'd rather you started with
even simpler samples of your own devising.
In fact, when faced with this program, if you find the algorithms to difficult,
here is a Reduced Implementation that you code, commit, and perhaps
make a stepping stone towards a more complete implementation:
Code a reduced implementation in which every file is exactly one block.
The rules of this are,
-
Assume the value give in a write command will fit into one block, including
counting the null byte.
-
If you receive a larger value, silently truncate so its strlen is one
less than the BLOCK_SIZE.
-
The create action from the command line is truncated to 1 byte requiring
just one block. But do put a null in the 0-th byte of that one block, so
that it is a valid key-value pair with the empty string as the value.
-
Because of this restriction, there are no cluster chains. In the end, the FAT
table has zeros' for free blocks and -1's for allocated blocks.
This will not receive full credit, so after this is done, consider a future commit
of a more complete solution, however,
Never underestimate the value of running code.
It seems obvious, but people often see code that works on only simple cases
as less valuable than code that does not work on any cases — perhaps
seeing the code fail on more complicated cases redeems it for it failing on
simpler cases.