CSC220 Project Lab
Due: 10:59AM, Tuesday, November 2
The goal of this lab is to write an application that reads
a text file, records for each word that appears in the file
the line numbers in which they appear, and then given a set
of words provides the list of all line numbers in which all
of the words in the set appear.
Here is a sample program, LabHash.jar.
We use a Class StringAndNumbers for storing a string
and a list of numbers in increasing order. The template for
this class is given in
StringAndNumbersInt.java. The methods required are as
follows:
-
String getString().
This is a method for obtaining the string part
of the object.
-
List < Integer > getNumbers().
This is a method for obtaining the number list part
of the object. Note that it returns an object from
class that implements List < Integer >. The list
must provide the numbers in increasing order.
---------------------------------------
Hint:
To implement this method efficiently you might want to
use a class that implements List < Integer > for
storing the numbers; i.e., either
Linked List < Integer > or
ArrayList < Integer >.
Then by simply returning the list, you will be able to
implment this method.
---------------------------------------
-
int size().
This is a method for obtaining the number of entries
in the number list.
-
void add(int a).
This is a method for adding a number at the end of the list.
-
void insert(int a).
This is a method for inserting a number that is not necessarily
largen that the largest number currently on the list.
Hint:
If you take the above advice and use one of the two classes
that implement List, then you can use the add method of List,
which takes two inputs. The first input is the position of
insertion and the other is the object to be inserted.
The question is how to figure out the location of insertion.
We use a class HashtableChain.java
that implements the hash map interface
KWHashMap.java.
in keeping information about the word that has been encountered before.
As discussed in the textbook and in class, KWHaspMap prescribes the way
to record associations between an object in a generic class K and
an object in a generic class V. The implementation code of
HashtableChain uses a hash table with chaining for this purpose, but
two methods are missing.
-
V remove(Object key).
This is for removing association of
an object key, which is generally supposed to be a K object.
-
public void rehash().
This is for doubling the size of table
and storing the associations from the current table to the new
table.
You must complete the coding of this class.
Using the HashtableChain class and the StringAndNumbers class we build
a class for storing collection of StringAndNumbers objects as follows.
An interface StorageInt.java presents
a template for such a class. You must write a class Storage.java that
implements this interface. The interface has the following public methods:
-
List < String > getWList().
This is for obtaining a list of all the strings that are stored
as the String parts of the StringAndNumbers objects that are
recorded.
-
int size().
This is for obtaining the number of StringAndNumbers objects
recorded.
-
int add(String w, int number).
This method is for adding a string and a number.
-
List< Integer > getPositions(String w).
This is for obtaining the list of numbers in increasing order
that correspond to the word w.
-
List< Integer > getPositions(String[] words).
This is for obtaining the list of numbers in increasing order
that correspond to the words.
-
List< Integer >
listIntersection(List< Iterable< Integer >> list).
This is supposed to be a main part of the above getPositions method.
This takes a list of Iterble < Integer > objects,
each of which naturally represents a list of numbers in increasing
order, returns the intersection of the lists represented by the
Iterable objects.
---------------------------------------
Hint:
Note that the List interface has an iterator method.
What you must do is to scan the elements of the lists
concurrently by remembering for each list the number that
has been returned by the next method of the accompanying iterator.
You look at the numbers that have been returned from these
iterators and decided what to do depending on whether all the
numbers are equals and if not how they are different from each
other.
---------------------------------------
To store StringAndNumbers objects, we for example can do the following:
-
We associate each StringAndNumbers object stored with a unique index
starting from 0. That is, the first STringAndNumbers object stored is
given the index of 0, the next 1, the next 2, and so and so forth.
-
Using a KWHashMap< String, Integer > we record the above associations.
-
Also, the the StringAndNumbers objects are maintained on a List with
respect to their associated indices.
-
Then the problem of retrieving a number list corresponding to a given
word w boils down to the problem of obtaining the index corresponding
to w, obtaining the StringAndNumbers object at that index, and then
retrieving the number list of the StringAndNumbers object.
---------------------------------------
Hint:
The aforementioned architecture allows you to substitute each word
in the input text with a unique ID, which may become useful for possible
future extensions. Note that the methods required in the interface can
be implemented in a different manner. For example:
-
Use only a list of StringAndNumbers objects, where the existence of a word
is checked by scanning the StringAndNumbers object list.
This implementation offers indexing but searching is expensive.
-
Use only a hash table, where the word is the key and the StringAndNumbers
object is the value.
This implementation offers fast searching but does not offer indexing
for future extension.
-
Use only a hash table, where the word is the key and an object that combines
a StringAndNumbers object and an index.
This implementation offers fast searching and indexing.
The last two alternatives are fine but the list only implementation is
not permissible since a focus of this lab is on hashing.
---------------------------------------
One can write a code for storing words and
their locations appearing in a given file and then obtaining the
line numbers in which all the words in a given set of words appear.
The code is given in StorageTest.java.
Two sample text files are provided:
Both files are obtained through
The Project Gutenberg.