CS210 Lab: Hash Table

Highlights of This Lab:

Definition of a Hash Table
STL Hash Table Based Containers
Application: Looking up Passwords

Lab Exercise:

Click the little computer above for a detailed description.
For this excercise you will be asked to implement a password checking system to authenticate a users password.

1. Definition of a Hash Table

Before we get into the definition of Hash Tables, it is good to introduce WHY to use Hash tables.

Hash tables are good for doing a quick search on things.

For instance if we have an array full of data (say 100 items). If we knew the position that a specific item is stored in an array, then we could quickly access it. For instance, we just happen to know that the item we want it is at position 3, we can apply myitem=myarray[3];

With this, we don't have to search through each element in the array, we just access position 3.

The question is, how do we know that position 3 stores the data that we are interested in?

This is where hashing comes in handy. Given some key, we can apply a hash function to it to find an index or position that we want to access.

1.1 What is the hash function?

There are many different hash functions. Some hash functions will take an integer key and turn it into an index. A common one is the division method.

Let's learn through an example:

1.2 Division method (one hash method for integers)

Let's say you had the following numbers or keys that you wanted to map into an array of 10 elements:

123456
123467
123450

To apply the division method, you could divide the number by 10 (or the maximum number of elements in the array) and use the remainder (the modulo) as an index. The following would result:

123456 % 10 = 6 (the remainder is 6 when dividing by 10)
123467 % 10 = 7 (the remainder is 7)
123450 % 10 = 0 (the remainder is 0)

These numbers would be inserted into the array at positions 6, 7, and 0 respectively. It might look something like this:

The important thing with the division method is that the keys are integers.

1.3 What happens when the keys aren't integers?

You have to apply another hash function to turn them into integers. Effectively, you get two hash functions in one:

function to get an integer
function to apply a hash method from above to get an index to an array

What do we mean that the keys aren't integers? Well, let's say that the keys are people's names. Such as:

Sarah Jones
Tony Balognie
Tom Katz

The goal is to type in one of these names and get an index to an array in order to access that information. How do we do this?
The hash function has to do two things:

Convert the names into integers
Apply a hash method to get an index

1.4 What would that look like in the array?

The array is known as a hash table. It stores the key (used to find the index) along with associated values. In the above example, we might have a hash table that looked something like this:

Again, the idea is that we will insert items into the hash table using the key and applying the hash function(s) to get the index.

A problem occurs when two keys yield the same index. For Instance, say we wanted to include:
John Smith --> 948 % 10 --> 8

We have a collision because Sarah Jones is already stored at array index 8.

We need a method to resolve this. The resolution comes in how you create your hash table. There two major approaches given in the book:

Linear Probing
Chaining

The approach used in this lab is referred to as chaining.

The details are left as class material, but recognize that in chaining the hash table is basically an array of linked lists. These linked lists are called buckets. All the data in the same bucket, have colliding index values.

Consider a diagram of the above example. Remember, there was a collision with Sarah Jones and John Smith. Notice that John Smith is "chained" or "linked" after Sarah Jones; John Smith and Sarah Jones are in the same bucket.

1.5 Applications of a Hash Table

Hash tables are good in situations where you have enormous amounts of data from which you would like to quickly search and retrieve information. A few typical hash table implementations would be in the following situations:

For driver's license record's. With a hash table, you could quickly get information about the driver (ie. name, address, age) given the licence number.
For compiler symbol tables. The compiler uses a symbol table to keep track of the user-defined symbols in a C++ program. This allows the compiler to quickly look up attributes associated with symbols (for example, variable names)
For internet search engines. For more information, click here
For telephone book databases. You could make use of a hash table implementatation to quickly look up John Smith's telephone number.
For electronic library catalogs. Hash Table implementations allow for a fast find among the millions of materials stored in the library.
For implementing passwords for systems with multiple users. Hash Tables allow for a fast retrieval of the password which corresponds to a given username.

2 STL Hash Table Based Containers

As of C++11, the STL provides four hash table based containers:

unordered_set: The elements themselves are used as hash keys. Elements must be unique. Elements cannot be changed while in the hash table.
unordered_multiset: Like unordered_set but duplicate elements (keys) are allowed.
unordered_map: The elements are made up of a pair of types: a key_type and a mapped_type. Keys are stored separately from mapped values. Key values must be unique. The mapped value may be changed while in the hash table, while the key may not.
unordered_multimap: Like unordered_map but duplicate keys are allowed.

Of these, the easiest to work wih is the unordered_map. Elements in the map are stored as a pair: the key and its corresponding value. Storage and retrieval of the elements requires that the key have defined methods for hashing and checking for equality. So long as the key is a simple built-in type like char, int, float or string, C++ and the STL provide this functionality. For more complex keys, you would need to define your own hashing function object and provide an overloaded operator==.

Before you can use unordered_maps you need to understand two new concepts: the STL pair type, and function objects.

2.1 STL Pair

This is a simple utility type that allows STL functions to return or accept two items as if they were one. You will see them used to indicate whether an operation was successful, or to pass and return a key and value as one.

Here is an sample program that declares a pair and accesses its two members: first and second.

// Using a pair. Notice that the pair has two members named first and second.
// The pair is templated, and you may specialize each member to separate or to
// the same types.
//
// Written by: Alex Clarke Nov. 7 2014
 
#include <iostream>
#include <string>
#include <utility> //pair lives in this header. 
                   //You do not normally include it... 
                   //it is included for you by containers that require it.
 
using namespace std;
 
template <class A, class B>
void printMyPair(pair<A, B>& p)
{
   cout << p.first << " " << p.second << endl;
}
 
int main()
{
   //declare, set and print a pair
   pair<string, int> a;
   a.first = "Pair number: ";
   a.second = 1;
   printMyPair(a);
 
   //alternate way to set a pair
   a = make_pair("New pair number: ", 2);
   printMyPair(a);
 
   //initialize the pair at construction time
   pair<double, double> b(2.2, 3.3);
   printMyPair(b);
 
   return 0;
}

2.2 Function Objects

The easiest way to define your own hash function for an unordered_map, is to make a class whose instances can be used as functions. To do this, you simply add an operator() to a class that accepts whatever argument you need. For our hash table exercise, we will need to hash a string. Although STL provides a superior hashing function for strings, it is good to know how to write your own. Here is a simple example of such a function.

// String hashing with a function object.
//
// Although STL provides a hashing function object for strings, it is nice to know
// how to define and use your own hashing function object should the need arise.
//
// This demo compares STL's string hashing function to our own simple one.
//
// Written by: Alex Clarke Nov. 7 2014
 
#include <iostream>
#include <string>
#include <functional> //STL hash lives in this header.
                      //You do not normally include it... 
                      //it is included for you by containers that require it.
 
using namespace std;
 
 
// Our special string hashing class
class stringHash
{
public:
   //this operator allows us to use stringHash objects as functions to hash strings.
   size_t operator()(const string& hashit) const
   {
      size_t val = 0;
 
      for (size_t i = 0; i < hashit.length(); i++)
         val += hashit[i];
      return val;
   }
};
 
 
int main()
{
   // This example shows how to instance and use a function object on its own.
   stringHash myHash;
   hash<string> stlHash;
   cout << "Hashes for the sentence \"What number will this produce?\"\n\n";
   cout << "stringHash:\t" << myHash("What number will this produce?") << endl;
   cout << "hash<string>:\t" << stlHash("What number will this produce?") << "\n\n\n";
 
   // This example shows how to use a function object to specialize a template.
   // This is similar to what you need to do with unordered_map, but the hash function
   // will be called for you by various member functions.
   pair<string, stringHash> mine;
   pair<string, hash<string> > STLs;
   cout << "Hashes for the word \"mary\"\n\n";
   mine.first = STLs.first = "mary";
   cout << "stringHash:\t" << mine.second(mine.first) << endl;
   cout << "hash<string>:\t" << STLs.second(STLs.first) << "\n\n\n";
 
   return 0;
}

2.3 Using STL unordered_map

In order to use STL unordered map, you must

#include <unordered_map>

The number of methods for creating and working with an unordered map are overwhelming at first. Here is an abbreviated list of some typical functions associated with the STL unordered map. Note that size_type type is an unsigned integral value.

create a map	`unordered_map<key_type, mapped_type> name;` To create an unordered map you must provide at least a key type, and the mapped value type, like so: unordered_map<string, float> mymap; `unordered_map<key_type, mapped_type, hasher> name;`You may also want to provide a custom hashing function, especially if your key type is not supported. For example, to use the `stringHash` class discussed above, you would write: unordered_map<string, float, stringHash> mymap; `unordered_map<key_type, mapped_type> name(size_type n);` By default, an unordered map will resize itself (known as rehashing) when its size is equal to the number of buckets. Rehashing is expensive. If you know in advance how many items your map will likely hold, you can specify an appropriate minimum number `n` of buckets. The implementation may choose to use any number larger than `n`. unordered_map<string, float> mymap( 64 ); // mymap will have at least 64 buckets
empty	`bool empty() const noexcept;` Returns a true value if the number of elements is zero, false otherwise. if (!mymap.empty()) { ... }
create / update an element	`mapped_type& operator[](const key_type &key);` Used to insert new mapped values, or to retrieve and modify existing ones. You must be careful when using it, since if the key was not in the table, it will be added with a default value. An insert function exists, but this is much easier to use. Note: Should not be used if you only want to get the mapped value of a key. mymap["quarter"] = 0.25; mymap["dime"] = 0.1; mymap["nickel"] = 0.5; //whoops - wrong value mymap["nickle"] = 0.05; //double whoops - wrong spelling mymap["nickel"] = 0.05; //partly fixed
erase	`size_type erase (const key_type &key);` Searches the hash table for the data item with the key `key`. If the data item is found, it removes the data item and returns `1`. Otherwise, it returns `0`. if (mymap.erase("nickle") == 1) { //"nickel" was found and removed } else { //"nickel" was not found }
retrieve	`mapped_type& at (const key_type &key);` Searches the hash table for the data item with the key `key`. If the data item is found, it returns a reference to the mapped value. Otherwise, it throws an out_of_range exception.. try { float value = mymap.at("nickel"); cout << "A nickle is worth " << 100value << " cents." << endl; } catch (out_of_range) { cerr << "Could not find the last key you searched for" << endl; } `iterator find (const key_type &key);` Searches the hash table for the data item with the key `key`. If the data item is found, it returns a iterator to the stored pair. Otherwise, it returns an iterator to `end()`. unordered_set<string, float>::iterator findit; findit = find("nickel"); if (findit != mymap.end()) { cout << "A " << findit->first << " is worth " << 100findit->second << " cents." << endl; } else { cerr << "Could not find the last key you searched for" << endl; }
begin	`iterator begin() noexcept;` Returns an iterator that references the first element in the unordered map. This could be any element, since the unordered map container does not guarantee which element comes first.
end	`iterator end() noexcept;` Returns an iterator that references a special element just past the last valid element in the table.
size	`size_type size() const noexcept;` Returns the number of elements (pairs) currently stored in the map. The `size_type` type is an unsigned integral value. // Print the number of elements in the list. cout << mymap.size() << endl;
bucket count	`size_type bucket_count() const noexcept;` Returns the number of buckets in the map. If this number changes, you know the map has been rehashed. The `size_type` type is an unsigned integral value. // Print the number of buckets in the list. cout << mymap.size() << endl;
bucket number	`size_type bucket(const key_type& key) const;` Returns the bucket number corresponding to the given key. cout << mymap.bucket(myset.begin()->first) << endl;
print contents / show structure	This is not a member function, but rather a demonstration of using `begin()` and `end()` to print all members of the hash table. The order of elements between `begin()` and `end()` is not guaranteed. For this reason, I have asked for the bucket number of each element so we can better see the structure of the hash table. for (unordered_set<string, float>::iterator i = myset.begin(); i != myset.end(); i++) { cout << "Key: " << i->first << " Value: " << i->second << " Bucket number: " << myset.bucket(i->first) << endl; }

3. Application: Looking up Passwords

The following section outlines an algorithm for authenticating a user's password. Later, in the lab exercise, you will be given the skeleton code and asked to add lines to make it work.

One possible use for a hash table is to store computer user login usernames and passwords.

There are two major steps to this program:

The program will load username/userinfo sets from the file password.dat and insert them into the hash table until the end of file is reached on password.dat. The password.dat file will look something like this with one username and its associated information per line:
```
                         first   last
username	password		name	namejack		broken.crown	Jack	Hill
jill		tumbling!down	Jill	Hill
mary		contrary		Mary	Stuart
bopeep	sheep!lost		Bo		Peep
... etc.
```
The program will then present a login prompt, read one username, present a password prompt, and after looking up the username's password in the hash table, will print either "Authentication succeeded." and some user information, or "Authentication failed.". The output might look something like this:
```
Login: mary
Password: contrary
Authentication succeeded. Welcome Mary Stuart.

Login: jim
Password: contrary
Authentication failed.

Login: bopeep
Password: sheeplost
Authentication failed.
```

Step 2 will be repeated until the end of the input data (EOF) is reached on the console input stream (cin). The EOF character on the PC's is the CTRL Z character.

Let's fill in some of the details:

Here's a closer look at the algorithm used in the string hashing function we saw earlier. A simple way to convert from a string to an integer is to add the ASCII value of each character together. For instance, mary's conversion from string to integer might look something like this:

109('m') + 97('a') + 114('r') + 121('y')=441

The code will look like this:

    int hash(const string str) const
    {
        int val = 0;

        for (int i=0; i<str.length(); i++) 
            val += str[i];
        return val;
    }

We've converted the string to an integer, but we still need to convert the integer to an index. For an array of 16 elements we can divide by 16 and use the remainder as an index. This is what most implementations of unordered map do. Combining these two hash functions, we will get some code that looks like this:

   int index = hash ( searchKey ) % 16;

Therefore mary's index will be:

 441 % 16 = 9.

Notice that if there were a student named ramy, this hash function would generate exactly the same result as for mary because both names have the same letters. This is only one of the bad behaviours of this particular string hasher.

3. Lab Exercise

Get the files:

Click here

To get the zipped files for this exercise. 
Note: the code for this exercise requires a C++11 capable compiler.

Extract all of the files to the WORKAREA. Open the WORKAREA and double click on STL_Hash_Map.sln. This will open up the project. There are two files used in this program:
- login.cpp -- the main program. An unordered map is already set up for you. For keys it uses a string corresponding to usernames. For mapped values it uses the UserInfo struct.
- password.dat -- contains all the users and corresponding passwords and other user information.
Your primary tasks for this exercise are to edit login.cpp to add in lines so that it does the following:
1. Loads user information into the unordered map
2. Authenticates users by asking for usernames and passwords, comparing retrieved user password to input password and printing appropriate feedback.
3. Uses a custom hashing function like the one discussed in Section 2.2

Steps include:

3.0 Explore

Try to run this program. You should find that it will prompt you for "Login:" and "Password:" (type in random words at these prompts). You will notice that it continuously cycles around asking you for this information.
To stop the program from running, at the "Login:" prompt, type CTRL and z (simultaneously) and then the Enter key .

3.1 Load User Information

Add in a line to insert passwords into the table.
- Hint: notice that the name of the unordered map is passwords and that you want to create a new element with the username as the key and tempInfo as the mapped value.
Add in code to print out the contents of the hash table including:
- the number of entries in the hash table
- the number of buckets in the hash table
- each username and its bucket number
- each user's information.
- Hints:
  - most of what you need is at the end of the table in Section 2.3
  - use setw(n) to set column widths for data where n is: 10, 4, 15, 10, 10 for each column
  - use cout << left; to change the default column alignment from right to left.

Build and Run this program. If all is working well, you should get some output that looks like this:

Number of entries in table: 11
Number of buckets: 64
apollo    42 pres,kobol     Lee       Adama

jack      26 broken.crown   Jack      Hill

starbuck  26 not@ghost      Kara      Thrace

jill      20 tumbling!down  Jill      Hill

mary      54 contrary       Mary      Stuart

gbaltar   40 angel          Gaius     Baltar

bopeep    40 sheep!lost     Bo        Peep

jhyneman  61 bigboom        Jamie     Hyneman

badama    25 find.earth     William   Adama

asavage   9  busted         Adam      Savage

lroslin   48 live4today     Laura     Roslin

Login:

This shows the hash table that has resulted from inserting data from the password.dat file (mentioned in Section 3).

3.2 Authenticate Logins

Add lines to compare the true password to the input password and print either:
- "Authentication succeeded. Welcome firstname lastname."
- Or "Authentication failed".
- Hint: Compare the input password to the password within the tempPass struct (which has been retrieved).
Build and Run your program. You should get results like the following:

Login: mary
Password: contrary
Authentication succeeded. Welcome Mary Stuart.

Login: jim
Password: contrary
Authentication failed.

Login: bopeep
Password: sheeplost
Authentication failed.

3.3 Change the Hasher

Add the string hashing class from Section 2.2 to your program
make the passwords unordered map use it instead of the default hasher.
- You will know it works by examining the bucket numbers from your printed hash table.

3.4 Explore and Answer

Now it's time to explore the behaviour of the hash table a little

Microsoft's hash table implementation has a minimum size of 8, which is too small for our data so it gets rehashed to 64. Try starting with a bigger hashtable:
```
unordered_map<  Answer Redacted  > passwords(16);        
```
- What happens to the final size of the hash table? Why?
- Notice that mary is in the bucket we predicted in Section 3.
Edit the password.dat file. This file has been added to the project (under "Resource Files" in the Solution Explorer).
- Add the username ramy. Give it a simple password and other user details.
- Run the program... ramy should be in the same bucket as mary. Why?
- Switch back to the default hasher and run again. Are ramy and mary still together? Why?
- Do the above two results stay true for all sizes of hash table?

4. Postlab Exercises

For postlab exercises, click here.

CS Dept Home Page CS Dept Class Files CS210 Class Files

Copyright: Department of Computer Science, University of Regina.