Fat Skeleton Project
by: burt rosenberg
at: university of miami
date: nov 2022
US Patent US4905110 A
Goals
File systems are an example of data persistence. Computer operating
systems make the file system one of their core provisions to the user.
Modern operating systems allow for a variety of file systems to be in use
at the same time.
A very popular file system is FAT. It was Microsoft's original file system,
even before windows, and continues to this day in many devices. All file systems
must keep track of disk space associated with a file. Disk space is allocated
in units of blocks or clusters, so that filesystem must keep for each file
an ordered sequence of clusters. The file system also needs to be able
to identify a file by name, or by the file name in a hierarchy of directory
names.
The FAT file system did these tasks using two important data structures,
- The FAT Table, standing for File Allocation Table.
- The DIRENT, the directory entry structure.
We are implementing a very simplified FAT that is just the skeletal
data structure of FAT, hence the name. A skeleton is not fat.
The FAT Table
The FAT table kept track of the clusters associated with a file as an
array representation of the sequence of clusters as a linked list. If
a file needed three clusters, say 17, 9 and 11 in that order, then location
17 of the fat will contain a 9, location 9 will contain an 11, and location 11 will
contain a special value marking the end of the sequence.
The location 17 is the head of this cluster chain, and would
be entered into the DIRENT associated with the file.
If an entry in the FAT table is 0, that means the cluster was unused.
See CLR 2nd ed.
section 10.3 Implementing pointers and objects for more about an
array implementation of a linked list.
The Directory and DIRENTS
DIRENTS are structures that contain the name, starting cluster, and exact
byte length of the file. The collection of clusters has space to
contain at least the requested byte length, but may contain more. The unused bytes in the last
cluster are called slack space. In FAT a directory is a
sequence of DIRENTS that themselves form a file. We shall implement
an even simpler filesystem where the DIRENT sequence is a fixed array of
struct's, and we will not have directories inside directories.
We will write code for DirEnt's in a DirEnt table, and cluster numbers in a
FAT table, but we will not program anything to actually read and write data
to the data clusters that the FAT table is linking together.
The MAN Page
NAME
fat-skeleton
SYNOPSIS
fat-skeleton [-v]
DESCRIPTION
An interactive program to simulate the operation of the File Allocation
Table of a very much simplified FAT file system.
OPTIONS
-v verbose output. Can be used multiple times.
HISTORY
Created for the 231 edition of CSC421 (2022-2023 academic year).
BUGS
Project Details
To received credit for the project, some parts of the your code must be carried out
in detailed concert with these requirements.
The directory structure will be an array called root_dir of structures
called struct DirEnt. The structure has three fields,
- name: This field will either have the empty string or the file name.
- The field is fixed at length FAT_NAME_LEN hence the longest allowed
file name is one less than this (to account for the null byte).
- Longer names can be silently truncated.
- If name is the empty string, the values of all other bytes
in the DirEnt are irrelevant.
- When allocating a DirEnt for a new file, take the lowest indexed free
DirEnt, else your test runs might not past the Basic Test.
- len: The length in bytes of the file.
- The length value is provided
on file creation and does not change.
- The number of clusters is the maximum of 1 and the smallest number of clusters
necessary to store the file.
- starting_cluster: The index in the FAT of the first cluster
in the chain of clusters supplying storage for the file.
An initialized FAT table
+---+
0 | 0 |
+---+
1 | 0 |
+---+
2 | 0 |
+---+
3 | 0 |
+---+
4 | 0 |
+---+
5 | 0 |
+---+
6 | 0 |
+---+
7 | 0 |
+---+
After creating 3 files requiring 2 blocks
(length between 129 and 256 bytes)
+---+
0 | 1 |
+---+
1 |-1 |
+---+
2 | 3 |
+---+
3 |-1 |
+---+
4 | 5 |
+---+
5 |-1 |
+---+
6 | 0 |
+---+
7 | 0 |
+---+
After deleted the file starting at cluser 2
+---+
0 | 1 |
+---+
1 |-1 |
+---+
2 | 0 |
+---+
3 | 0 |
+---+
4 | 5 |
+---+
5 |-1 |
+---+
6 | 0 |
+---+
7 | 0 |
+---+
After creating a file requiring 3 clusters
(length between 257 and 384)
+---+
0 | 1 |
+---+
1 |-1 |
+---+
2 | 3 |
+---+
3 | 6 |
+---+
4 | 5 |
+---+
5 |-1 |
+---+
6 |-1 |
+---+
7 | 0 |
+---+
The FAT Table is an array called fat_table. The i-th entry in the FAT
refers to the i-th block (or cluster) on the disk. Values in the FAT create sequences
of clusters, one sequence for each file.
The technique is the standard method of representing a linked list in an array.
To create a link pointing from cluster i to cluster j,
fat_table[i] = j
Two special values are 0, meaning the cluster is free, and -1, meaning the chain ends here.
In standard FAT, 0 cannot viably mean cluster 0, because the lowest numbered clusters on a
disk are reserved for the disk label. Hence no confusion results. We do no reserve 0,
but make the rule that cluster 0, if on a chain, can only be the first cluster in the chain.
To maintain this convention and so that you match the Basic Test exactly,
when creating cluster chains, you must take the clusters of lowest index first.
FAT does not do this, as it would wear the disk unevenly (why?). But that's not a problem we have
for our simulation.
The code uses a function call-out table to jump to the code implementing
these operations.
The functions have the signature,
int (*action_function)(int actc,char * actv[])
and are kept in the call-out table paired with the string that triggers the function
and the number of parameters the function should have. This parameters actc
and actv mimic the parameters to a C main fucntion to pass
the parameters to the action functions as an array of strings.
The action functions
- ls_action() function to implement that ls operation to list the root directory.
- There is no other directory besides the root directory.
- The listing must follow precisely the format provided to pass the basic tests.
- The listing gives the directory index, which somewhat models the -i unix option,
to show the inode number.
- The listing given the number of bytes in the file, as was presented when creating the file.
- DirEnt's without filenames are not listed.
- cr_action() function to implement the cr operation
to create a file.
- The byte length as an string is given in actv[1] and the file name is given
as a string in actv[2].
- Use the first available DirEnt rule to find the DirEnt to occupy.
- If there is no more room in the root_direction, return ERR_DIRFULL.
- Use strncpy copy the name into the DirEnt, to make sure you do not write beyond the number of
allowed bytes.
- Allow strncpy to truncate if needed without returning an error.
- Create a cluster chain of at least 1 block, but of otherwise minimal number
of blocks sufficient for the length, and put the starting block number of
the chain in the DirEnt
- If there is not enough clusters to create the file, return ERR_FATFULL.
- It is an error to create a file that already exists. Return ERR_CREATE.
- rm_action() to implement that rm operation to remove a file, given the file name.
- The file name is given as the argument in argv[1].
- If the file is not found in the root_directory array, return a ERR_NOEXIST value.
- To delete make the the empty string, and the cluster
chain is walked to make all the clusters free (marked with a 0 in the FAT).
- pc_action() to implement the pc operation to print the cluster chain. This is not a typical file operation
in a file system.
- The file name is given as the argument in argv[1].
- The format must be precisely as specified to pass the Basic Test.
- qu_action() to implement the qu operation quit the program.
- It simply returns ERR_ABORT for the main procedure
to interpret as an exit/return.
Algorithms
The template suggest several subtasks to accomplish.
-
Given a name, search the directory for the file with that name, and returned
the integer index where found; or -1 if not found.
-
Given a integer number of bytes, calculate how many clusters is needed, and
write into the FAT to
create a cluster chain of that many bytes. Return the index of the head of the
chain.
-
Given the head of a cluster chain, walk the chain making all clusters in
the chain free.
How a Skeleton is not FAT
A skeleton is not fat because it is just the bones. Here are other ways our
file system is not FAT,
- Filenames in FAT are of a particular 8+3 format, from a restricted character set.
- A Dirent is free in FAT if the first character in the filename is an ASCII blank (0x20).
- The FAT Dirent also has some date stamps and a bit vector for file attributes.
- Although the original FAT had a fixed root directory with exactly 99 dirent slots,
and no subdirectories, all subsequent FATs had variable size directories with subdirectories.
- Our directory array is in a separate memory area than where the data would be. FAT
keeps the directories as data in the data area.
- FAT probably does not use the lowest index first placement of clusters.
- FAT actually stores data in the clusters!