# CS210 Lab: Hash Table

## Lab Exercise:

Click the little computer above for a detailed description.
For this excercise you will be asked to implement a password checking system to authenticate a users password.

## 1. Definition of a Hash Table

Before we get into the definition of Hash Tables, it is good to introduce WHY to use Hash tables.

Hash tables are good for doing a quick search on things.

For instance if we have an array full of data (say 100 items). If we knew the position that a specific item is stored in an array, then we could quickly access it. For instance, we just happen to know that the item we want it is at position 3, we can apply myitem=myarray[3];

With this, we don't have to search through each element in the array, we just access position 3.

The question is, how do we know that position 3 stores the data that we are interested in?

This is where hashing comes in handy. Given some key, we can apply a hash function to it to find an index or position that we want to access.

### 1.1 What is the hash function?

There are many different hash functions. Some hash functions will take an integer key and turn it into an index. A common one is the division method.

Let's learn through an example:

### 1.2 Division method (one hash method for integers)

Let's say you had the following numbers or keys that you wanted to map into an array of 10 elements:

123456
123467
123450

To apply the division method, you could divide the number by 10 (or the maximum number of elements in the array) and use the remainder (the modulo) as an index. The following would result:

123456 % 10 = 6 (the remainder is 6 when dividing by 10)
123467 % 10 = 7 (the remainder is 7)
123450 % 10 = 0 (the remainder is 0)

These numbers would be inserted into the array at positions 6, 7, and 0 respectively. It might look something like this:

The important thing with the division method is that the keys are integers.

### 1.3 What happens when the keys aren't integers?

You have to apply another hash function to turn them into integers. Effectively, you get two hash functions in one:

1. function to get an integer
2. function to apply a hash method from above to get an index to an array

What do we mean that the keys aren't integers? Well, let's say that the keys are people's names. Such as:

Sarah Jones
Tony Balognie
Tom Katz

The goal is to type in one of these names and get an index to an array in order to access that information. How do we do this?
The hash function has to do two things:

1. Convert the names into integers
For instance, we have a function which turns a string into an integer. The results will be as follows:
Sarah Jones → 1038
Tony Balognie → 1259
Tom Katz → 746
2. Apply a hash method to get an index
We can now apply the division method to get an index for an array of 10 elements
Sarah Jones → 1038 % 10 → 8
Tony Balognie → 1259 % 10 → 9
Tom Katz → 746 % 10 → 6

### 1.4 What would that look like in the array?

The array is known as a hash table. It stores the key (used to find the index) along with associated values. In the above example, we might have a hash table that looked something like this:

Again, the idea is that we will insert items into the hash table using the key and applying the hash function(s) to get the index.

A problem occurs when two keys yield the same index. For Instance, say we wanted to include:
John Smith --> 948 % 10 --> 8

We have a collision because Sarah Jones is already stored at array index 8.

We need a method to resolve this. The resolution comes in how you create your hash table. There two major approaches given in the book:

1. Linear Probing
2. Chaining
The approach used in this lab is referred to as chaining.

The details are left as class material, but recognize that in chaining the hash table is basically an array of linked lists. These linked lists are called buckets. All the data in the same bucket, have colliding index values.

Consider a diagram of the above example. Remember, there was a collision with Sarah Jones and John Smith. Notice that John Smith is "chained" or "linked" after Sarah Jones; John Smith and Sarah Jones are in the same bucket.

### 1.5 Applications of a Hash Table

Hash tables are good in situations where you have enormous amounts of data from which you would like to quickly search and retrieve information. A few typical hash table implementations would be in the following situations:

• For driver's license record's. With a hash table, you could quickly get information about the driver (ie. name, address, age) given the licence number.

• For compiler symbol tables. The compiler uses a symbol table to keep track of the user-defined symbols in a C++ program. This allows the compiler to quickly look up attributes associated with symbols (for example, variable names)

• For telephone book databases. You could make use of a hash table implementatation to quickly look up John Smith's telephone number.

• For electronic library catalogs. Hash Table implementations allow for a fast find among the millions of materials stored in the library.

• For implementing passwords for systems with multiple users. Hash Tables allow for a fast retrieval of the password which corresponds to a given username.

## 2 STL Hash Table Based Containers

As of C++11, the STL provides four hash table based containers:

1. unordered_set: The elements themselves are used as hash keys. Elements must be unique. Elements cannot be changed while in the hash table.
2. unordered_multiset: Like unordered_set but duplicate elements (keys) are allowed.
3. unordered_map: The elements are made up of a pair of types: a key_type and a mapped_type. Keys are stored separately from mapped values. Key values must be unique. The mapped value may be changed while in the hash table, while the key may not.
4. unordered_multimap: Like unordered_map but duplicate keys are allowed.

Of these, the easiest to work wih is the unordered_map. Elements in the map are stored as a pair: the key and its corresponding value. Storage and retrieval of the elements requires that the key have defined methods for hashing and checking for equality. So long as the key is a simple built-in type like char, int, float or string, C++ and the STL provide this functionality. For more complex keys, you would need to define your own hashing function object and provide an overloaded operator==.

Before you can use unordered_maps you need to understand two new concepts: the STL pair type, and function objects.

### 2.1 STL Pair

This is a simple utility type that allows STL functions to return or accept two items as if they were one. You will see them used to indicate whether an operation was successful, or to pass and return a key and value as one.

Here is an sample program that declares a pair and accesses its two members: first and second.

```// Using a pair. Notice that the pair has two members named first and second.
// The pair is templated, and you may specialize each member to separate or to
// the same types.
//
// Written by: Alex Clarke Nov. 7 2014

#include <iostream>
#include <string>
#include <utility> //pair lives in this header.
//You do not normally include it...
//it is included for you by containers that require it.

using namespace std;

template <class A, class B>
void printMyPair(pair<A, B>& p)
{
cout << p.first << " " << p.second << endl;
}

int main()
{
//declare, set and print a pair
pair<string, int> a;
a.first = "Pair number: ";
a.second = 1;
printMyPair(a);

//alternate way to set a pair
a = make_pair("New pair number: ", 2);
printMyPair(a);

//initialize the pair at construction time
pair<double, double> b(2.2, 3.3);
printMyPair(b);

return 0;
}
```

### 2.2 Function Objects

The easiest way to define your own hash function for an unordered_map, is to make a class whose instances can be used as functions. To do this, you simply add an operator() to a class that accepts whatever argument you need. For our hash table exercise, we will need to hash a string. Although STL provides a superior hashing function for strings, it is good to know how to write your own. Here is a simple example of such a function.

```// String hashing with a function object.
//
// Although STL provides a hashing function object for strings, it is nice to know
// how to define and use your own hashing function object should the need arise.
//
// This demo compares STL's string hashing function to our own simple one.
//
// Written by: Alex Clarke Nov. 7 2014

#include <iostream>
#include <string>
#include <functional> //STL hash lives in this header.
//You do not normally include it...
//it is included for you by containers that require it.

using namespace std;

// Our special string hashing class
class stringHash
{
public:
//this operator allows us to use stringHash objects as functions to hash strings.
size_t operator()(const string& hashit) const
{
size_t val = 0;

for (size_t i = 0; i < hashit.length(); i++)
val += hashit[i];
return val;
}
};

int main()
{
// This example shows how to instance and use a function object on its own.
stringHash myHash;
hash<string> stlHash;
cout << "Hashes for the sentence \"What number will this produce?\"\n\n";
cout << "stringHash:\t" << myHash("What number will this produce?") << endl;
cout << "hash<string>:\t" << stlHash("What number will this produce?") << "\n\n\n";

// This example shows how to use a function object to specialize a template.
// This is similar to what you need to do with unordered_map, but the hash function
// will be called for you by various member functions.
pair<string, stringHash> mine;
pair<string, hash<string> > STLs;
cout << "Hashes for the word \"mary\"\n\n";
mine.first = STLs.first = "mary";
cout << "stringHash:\t" << mine.second(mine.first) << endl;
cout << "hash<string>:\t" << STLs.second(STLs.first) << "\n\n\n";

return 0;
}```

### 2.3 Using STL unordered_map

In order to use STL unordered map, you must

`#include <unordered_map>`

The number of methods for creating and working with an unordered map are overwhelming at first. Here is an abbreviated list of some typical functions associated with the STL unordered map. Note that size_type type is an unsigned integral value.

 create a map unordered_map name; To create an unordered map you must provide at least a key type, and the mapped value type, like so: `unordered_map mymap;` unordered_map name; You may also want to provide a custom hashing function, especially if your key type is not supported. For example, to use the stringHash class discussed above, you would write: `unordered_map mymap;` unordered_map name(size_type n); By default, an unordered map will resize itself (known as rehashing) when its size is equal to the number of buckets. Rehashing is expensive. If you know in advance how many items your map will likely hold, you can specify an appropriate minimum number n of buckets. The implementation may choose to use any number larger than n. `unordered_map mymap( 64 ); // mymap will have at least 64 buckets` empty bool empty() const noexcept; Returns a true value if the number of elements is zero, false otherwise. ```if (!mymap.empty()) { ... } ``` create / update an element mapped_type& operator[](const key_type &key); Used to insert new mapped values, or to retrieve and modify existing ones. You must be careful when using it, since if the key was not in the table, it will be added with a default value. An insert function exists, but this is much easier to use. Note: Should not be used if you only want to get the mapped value of a key. ```mymap["quarter"] = 0.25; mymap["dime"] = 0.1; mymap["nickel"] = 0.5; //whoops - wrong value mymap["nickle"] = 0.05; //double whoops - wrong spelling mymap["nickel"] = 0.05; //partly fixed``` erase size_type erase (const key_type &key); Searches the hash table for the data item with the key key. If the data item is found, it removes the data item and returns 1. Otherwise, it returns 0. ```if (mymap.erase("nickle") == 1) { //"nickel" was found and removed } else { //"nickel" was not found }``` retrieve mapped_type& at (const key_type &key); Searches the hash table for the data item with the key key. If the data item is found, it returns a reference to the mapped value. Otherwise, it throws an out_of_range exception.. ```try { float value = mymap.at("nickel"); cout << "A nickle is worth " << 100*value << " cents." << endl; } catch (out_of_range) { cerr << "Could not find the last key you searched for" << endl; }``` iterator find (const key_type &key); Searches the hash table for the data item with the key key. If the data item is found, it returns a iterator to the stored pair. Otherwise, it returns an iterator to end(). ```unordered_set::iterator findit; findit = find("nickel"); if (findit != mymap.end()) { cout << "A " << findit->first << " is worth " << 100*findit->second << " cents." << endl; } else { cerr << "Could not find the last key you searched for" << endl; }``` begin iterator begin() noexcept; Returns an iterator that references the first element in the unordered map. This could be any element, since the unordered map container does not guarantee which element comes first. end iterator end() noexcept; Returns an iterator that references a special element just past the last valid element in the table. size size_type size() const noexcept; Returns the number of elements (pairs) currently stored in the map. The size_type type is an unsigned integral value. ```// Print the number of elements in the list. cout << mymap.size() << endl; ``` bucket count size_type bucket_count() const noexcept; Returns the number of buckets in the map. If this number changes, you know the map has been rehashed. The size_type type is an unsigned integral value. ```// Print the number of buckets in the list. cout << mymap.size() << endl; ``` bucket number size_type bucket(const key_type& key) const; Returns the bucket number corresponding to the given key. ``` cout << mymap.bucket(myset.begin()->first) << endl; ``` print contents / show structure This is not a member function, but rather a demonstration of using begin() and end() to print all members of the hash table. The order of elements between begin() and end() is not guaranteed. For this reason, I have asked for the bucket number of each element so we can better see the structure of the hash table. ```for (unordered_set::iterator i = myset.begin(); i != myset.end(); i++) { cout << "Key: " << i->first << " Value: " << i->second << " Bucket number: " << myset.bucket(i->first) << endl; }```

## 3. Application: Looking up Passwords

The following section outlines an algorithm for authenticating a user's password. Later, in the lab exercise, you will be given the skeleton code and asked to add lines to make it work.

There are two major steps to this program:
1. The program will load username/userinfo sets from the file password.dat and insert them into the hash table until the end of file is reached on password.dat. The password.dat file will look something like this with one username and its associated information per line:
```                         first   last
jill		tumbling!down	Jill	Hill
mary		contrary		Mary	Stuart
bopeep	sheep!lost		Bo		Peep
... etc.
```

2. The program will then present a login prompt, read one username, present a password prompt, and after looking up the username's password in the hash table, will print either "Authentication succeeded." and some user information, or "Authentication failed.". The output might look something like this:
```Login: mary
Authentication succeeded. Welcome Mary Stuart.

Authentication failed.

Authentication failed.
```
Step 2 will be repeated until the end of the input data (EOF) is reached on the console input stream (cin). The EOF character on the PC's is the CTRL Z character.

### Let's fill in some of the details:

Here's a closer look at the algorithm used in the string hashing function we saw earlier. A simple way to convert from a string to an integer is to add the ASCII value of each character together. For instance, mary's conversion from string to integer might look something like this:
```109('m') + 97('a') + 114('r') + 121('y')=441
```
The code will look like this:
```    int hash(const string str) const
{
int val = 0;

for (int i=0; i<str.length(); i++)
val += str[i];
return val;
}
```
We've converted the string to an integer, but we still need to convert the integer to an index. For an array of 16 elements we can divide by 16 and use the remainder as an index. This is what most implementations of unordered map do. Combining these two hash functions, we will get some code that looks like this:
```   int index = hash ( searchKey ) % 16;
```
Therefore mary's index will be:
``` 441 % 16 = 9.
```

Notice that if there were a student named ramy, this hash function would generate exactly the same result as for mary because both names have the same letters. This is only one of the bad behaviours of this particular string hasher.

## 3. Lab Exercise

• Get the files:
```Click here

To get the zipped files for this exercise.
Note: the code for this exercise requires a C++11 capable compiler.
```
• Extract all of the files to the WORKAREA. Open the WORKAREA and double click on STL_Hash_Map.sln. This will open up the project. There are two files used in this program:
• login.cpp -- the main program. An unordered map is already set up for you. For keys it uses a string corresponding to usernames. For mapped values it uses the UserInfo struct.
• password.dat -- contains all the users and corresponding passwords and other user information.

1. Loads user information into the unordered map
3. Uses a custom hashing function like the one discussed in Section 2.2

Steps include:

### 3.0 Explore

• Try to run this program. You should find that it will prompt you for "Login:" and "Password:" (type in random words at these prompts). You will notice that it continuously cycles around asking you for this information.
To stop the program from running, at the "Login:" prompt, type CTRL and z (simultaneously) and then the Enter key .

• Hint: notice that the name of the unordered map is passwords and that you want to create a new element with the username as the key and tempInfo as the mapped value.

2. Add in code to print out the contents of the hash table including:
• the number of entries in the hash table
• the number of buckets in the hash table
• each username and its bucket number
• each user's information.
• Hints:
• most of what you need is at the end of the table in Section 2.3
• use setw(n) to set column widths for data where n is: 10, 4, 15, 10, 10 for each column
• use cout << left; to change the default column alignment from right to left.
• Build and Run this program. If all is working well, you should get some output that looks like this:
```Number of entries in table: 11
Number of buckets: 64

jack      26 broken.crown   Jack      Hill

starbuck  26 not@ghost      Kara      Thrace

jill      20 tumbling!down  Jill      Hill

mary      54 contrary       Mary      Stuart

gbaltar   40 angel          Gaius     Baltar

bopeep    40 sheep!lost     Bo        Peep

jhyneman  61 bigboom        Jamie     Hyneman

lroslin   48 live4today     Laura     Roslin

This shows the hash table that has resulted from inserting data from the password.dat file (mentioned in Section 3).

• "Authentication succeeded. Welcome firstname lastname."
• Or "Authentication failed".
• Hint: Compare the input password to the password within the tempPass struct (which has been retrieved).

• Build and Run your program. You should get results like the following:
• ```Login: mary
Authentication succeeded. Welcome Mary Stuart.

Authentication failed.

Authentication failed.
```

### 3.3 Change the Hasher

1. Add the string hashing class from Section 2.2 to your program

2. make the passwords unordered map use it instead of the default hasher.
• You will know it works by examining the bucket numbers from your printed hash table.

Now it's time to explore the behaviour of the hash table a little

• Microsoft's hash table implementation has a minimum size of 8, which is too small for our data so it gets rehashed to 64. Try starting with a bigger hashtable:
`unordered_map<  Answer Redacted  > passwords(16);        `
• What happens to the final size of the hash table? Why?
• Notice that mary is in the bucket we predicted in Section 3.

• Edit the password.dat file. This file has been added to the project (under "Resource Files" in the Solution Explorer).