Hash Table

Before defining hash tables, it helps to explain why we use them.

For instance, say we have an array of 100 items. If we know the position of a specific item in the array, we can access it quickly. For example, we might know that the item we want is at position 3. We can retrieve it quickly and simply like this: myitem = myarray[3];

We don't have to search through each element in the array. We don't have to do a binary search either. We just access position 3.

The question is, how do we know that position 3 stores the data that we are interested in?

This is where hashing comes in handy. Given some key, we can apply a hash function to it to find an index or position that we want to access.

1.1 What is the hash function?

There are many different hash functions, and each data type requires a slightly different technique. There are even hash functions that take an integer key and turn it into an index. This is an important last step when hashing other data types: the key first becomes an integer, then we hash the integer. A common hash function for integers is the division method.

1.2 Division method (one hash method for integers)

Let's say you had the following numbers or keys that you wanted to map into an array of 10 elements:

To apply the division method, you could divide the number by 10 (the number of elements in the array) and use the remainder (the modulo) as an index. The following would result:

These numbers would be inserted into the array at positions 6, 7, and 0 respectively. It might look something like this:
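The division method can be sketched in a few lines of C++. This is only an illustration: the function name `divisionHash` and the keys used in the usage note are hypothetical and were chosen just so that they land on positions 6, 7, and 0.

```cpp
#include <cassert>
#include <cstddef>

// Division method: the remainder of key / tableSize is the array index.
// divisionHash is a hypothetical name used for this sketch.
std::size_t divisionHash(unsigned long key, std::size_t tableSize) {
    assert(tableSize > 0);  // a table needs at least one slot
    return key % tableSize;
}
```

With a table of 10 elements, `divisionHash(36, 10)`, `divisionHash(27, 10)`, and `divisionHash(50, 10)` give 6, 7, and 0.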

1.3 What happens when the keys aren't integers?

You have to apply another hash function to turn them into integers. Effectively, you get two hash functions in one:

What do we mean when we say that the keys aren't integers? Well, let's say that the keys are people's names, such as:

The goal is to type in one of these names and get an index to an array in order to access that information. How do we do this?
The hash function has to do two things:

1.4 What would that look like in the array?

The array is known as a hash table. It stores the key (used to find the index) along with associated values. In the above example, we might have a hash table that looked something like this:

Again, the idea is that we will insert items into the hash table using the key and applying the hash function(s) to get the index.

A problem occurs when two keys yield the same index. For instance, say we wanted to include:
John Smith --> 948 % 10 --> 8
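The collision can be reproduced with a small sketch. It assumes that the string is turned into an integer by summing its character codes; that is an assumption, although it does give 948 for "John Smith". The function names `sumOfChars` and `nameToIndex` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Step 1 (assumed technique): turn the string into an integer
// by summing the codes of its characters, spaces included.
unsigned long sumOfChars(const std::string& key) {
    unsigned long total = 0;
    for (unsigned char c : key) {
        total += c;
    }
    return total;
}

// Step 2: apply the division method to that integer to get an index.
std::size_t nameToIndex(const std::string& key, std::size_t tableSize) {
    assert(tableSize > 0);  // a table needs at least one slot
    return sumOfChars(key) % tableSize;
}
```

Under this assumption, `nameToIndex("John Smith", 10)` and `nameToIndex("Sarah Jones", 10)` both return 8, so the two names compete for the same slot.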

We need a method to resolve this. The resolution comes in how you create your hash table. There are two major approaches given in the book:

The details are left as class material, but recognize that in chaining the hash table is basically an array of linked lists. These linked lists are called buckets. All data in the same bucket have colliding index values.

Consider a diagram of the above example. Remember, there was a collision with Sarah Jones and John Smith. Notice that John Smith is "chained" or "linked" after Sarah Jones; John Smith and Sarah Jones are in the same bucket.
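Chaining as described above might be sketched like this. The class `ChainedTable`, its member names, and the string values stored with each name are hypothetical, and the hash (sum of character codes, then the division method) is an assumption carried over from the John Smith example.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <utility>
#include <vector>

// A hash table built as an array (vector) of linked lists (buckets).
class ChainedTable {
public:
    explicit ChainedTable(std::size_t bucketCount) : buckets_(bucketCount) {
        assert(bucketCount > 0);  // at least one bucket is required
    }

    // Update the value if the key exists; otherwise chain a new pair
    // onto the end of the bucket the key hashes to.
    void insert(const std::string& key, const std::string& value) {
        auto& bucket = buckets_[indexOf(key)];
        for (auto& entry : bucket) {
            if (entry.first == key) {
                entry.second = value;
                return;
            }
        }
        bucket.emplace_back(key, value);
    }

    // Return a pointer to the stored value, or nullptr if the key is absent.
    const std::string* find(const std::string& key) const {
        for (const auto& entry : buckets_[indexOf(key)]) {
            if (entry.first == key) {
                return &entry.second;
            }
        }
        return nullptr;
    }

    // Number of pairs chained in one bucket.
    std::size_t bucketSize(std::size_t index) const {
        return buckets_.at(index).size();
    }

    // Assumed hash: sum of character codes, then the division method.
    std::size_t indexOf(const std::string& key) const {
        unsigned long total = 0;
        for (unsigned char c : key) {
            total += c;
        }
        return total % buckets_.size();
    }

private:
    std::vector<std::list<std::pair<std::string, std::string>>> buckets_;
};
```

Inserting "Sarah Jones" and then "John Smith" into a `ChainedTable` with 10 buckets leaves both pairs in bucket 8, with John Smith linked after Sarah Jones.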

1.5 Applications of a Hash Table

2 STL Hash Table Based Containers

Of these, the easiest to work with is the unordered_map. Elements in the map are stored as a pair: the key and its corresponding value. Storing and retrieving elements requires that the key type have defined methods for hashing and checking for equality. So long as the key is a simple built-in type like char, int, or float, or a string, C++ and the STL provide this functionality. For more complex keys, you would need to define your own hashing function object and provide an overloaded operator==.

Before you can use unordered_maps you need to understand two new concepts: the STL pair type, and function objects.

2.1 STL Pair

This is a simple utility type that allows STL functions to return or accept two items as if they were one. You will see them used to indicate whether an operation was successful, or to pass and return a key and value as one.

Here is a sample program that declares a pair and accesses its two members: first and second.
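A minimal sketch of both uses of a pair follows. The function names `coinValueInCents` and `describe` and the coin values are hypothetical. The pair's `first` member reports whether the lookup succeeded and `second` carries the value.

```cpp
#include <string>
#include <utility>

// Returns {true, cents} when the coin name is recognised,
// and {false, 0} otherwise. Two results travel back as one pair.
std::pair<bool, int> coinValueInCents(const std::string& name) {
    if (name == "quarter") return {true, 25};
    if (name == "dime")    return {true, 10};
    if (name == "nickel")  return {true, 5};
    return {false, 0};
}

// Declares a pair and reads its two members, first and second.
std::string describe(const std::string& name) {
    std::pair<bool, int> result = coinValueInCents(name);
    if (!result.first) {
        return name + ": unknown coin";
    }
    return name + ": " + std::to_string(result.second) + " cents";
}
```

For example, `describe("dime")` returns "dime: 10 cents", while `describe("nickle")` returns "nickle: unknown coin".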

2.2 Function Objects

The easiest way to define your own hash function for an unordered_map is to make a class whose instances can be used as functions. To do this, you simply add an operator() to a class that accepts whatever argument you need. For our hash table exercise, we will need to hash a string. Although the STL provides a superior hashing function for strings, it is good to know how to write your own. Here is a simple example of such a function.
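One way such a function object might look is sketched below. The class name `stringHash` matches the name this lab uses when declaring a map with a custom hasher. The body is an assumption: it simply sums the character codes, which is a deliberately weak hash because any two strings with the same letters produce the same result.

```cpp
#include <cstddef>
#include <string>

// A function object: an instance can be called like a function
// because the class defines operator().
class stringHash {
public:
    // Assumed hashing technique: sum the character codes of the key.
    std::size_t operator()(const std::string& key) const {
        std::size_t total = 0;
        for (unsigned char c : key) {
            total += c;
        }
        return total;
    }
};
```

After `stringHash hasher;`, the call `hasher("John Smith")` returns 948, and `hasher("mary")` equals `hasher("ramy")`.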

2.3 Using STL unordered_map

The number of methods for creating and working with an unordered map can be overwhelming at first. Here is an abbreviated list of some typical functions associated with the STL unordered map. Note that `size_type` is an unsigned integral type.

declare a map
`unordered_map<key_type, mapped_type> name;`
To declare an unordered map you must provide at least a key type and a mapped value type, like so:
    unordered_map<string, float> mymap;
`unordered_map<key_type, mapped_type, hasher> name;`
You may also want to provide a custom hashing function, especially if your key type is not supported. For example, to use the `stringHash` class discussed above, you would write:
    unordered_map<string, float, stringHash> mymap;
`unordered_map<key_type, mapped_type> name(size_type n);`
By default, an unordered map will rehash, or resize, itself when an insertion would leave it with more elements than it has buckets. Rehashing is expensive. If you know in advance how many items your map will likely hold, you can specify an appropriate minimum number `n` of buckets. The implementation may choose any number of buckets that is at least `n`.
    unordered_map<string, float> mymap(64); // mymap will have at least 64 buckets

empty
`bool empty() const noexcept;`
Returns true if the number of elements is zero, false otherwise.
    if (!mymap.empty()) {
        //...
    }

insert / update an element
`mapped_type& operator[](const key_type &key);`
Used to insert new mapped values, or to retrieve and modify existing ones. You must be careful when using it: if the key is not in the table, it will be added with a default value. An insert function exists, but this is much easier to use. Note: it should not be used if you only want to get the mapped value of a key.
In our example, the key is a string and the mapped value is a float. Let's store some coin values.
    mymap["quarter"] = 0.25;
    mymap["dime"] = 0.1;
    mymap["nickel"] = 0.5;  //whoops - wrong value
    mymap["nickle"] = 0.05; //double whoops - wrong spelling
    mymap["nickel"] = 0.05; //partly fixed

erase
`size_type erase (const key_type &key);`
Searches the hash table for the data item with the key `key`. If the data item is found, it removes the data item and returns `1`. Otherwise, it returns `0`.
    if (mymap.erase("nickle") == 1) {
        //"nickle" was found and removed
    } else {
        //"nickle" was not found
    }

retrieve
`mapped_type& at (const key_type &key);`
Searches the hash table for the data item with the key `key`. If the data item is found, it returns a reference to the mapped value. Otherwise, it throws an out_of_range exception.
    try {
        float value = mymap.at("nickel");
        cout << "A nickel is worth " << 100 * value << " cents." << endl;
    } catch (const out_of_range&) {
        cerr << "Could not find the last key you searched for" << endl;
    }
`iterator find (const key_type &key);`
Searches the hash table for the data item with the key `key`. If the data item is found, it returns an iterator to the stored pair. Otherwise, it returns `end()`.
    auto findit = mymap.find("nickel");
    if (findit != mymap.end()) {
        cout << "A " << findit->first << " is worth " << 100 * findit->second << " cents." << endl;
    } else {
        cerr << "Could not find the last key you searched for" << endl;
    }

begin
`iterator begin() noexcept;`
Returns an iterator that references the first element in the unordered map. This could be any element, since the unordered map container does not guarantee which element comes first.

end
`iterator end() noexcept;`
Returns an iterator that references a special position just past the last valid element in the table.

size
`size_type size() const noexcept;`
Returns the number of elements (pairs) currently stored in the map.
    // Print the number of elements in the map.
    cout << mymap.size() << endl;

bucket count
`size_type bucket_count() const noexcept;`
Returns the number of buckets in the map. If this number changes, you know the map has been rehashed.
    // Print the number of buckets in the map.
    cout << mymap.bucket_count() << endl;

bucket number
`size_type bucket(const key_type& key) const;`
Returns the bucket number corresponding to the given key.
    cout << mymap.bucket(mymap.begin()->first) << endl;

print out / show structure
This is not a member function, but rather a demonstration of using a range-based for loop to print all members of a hash table. The order of elements is not guaranteed. For this reason, the code also shows the bucket number of each element so we can begin to understand the structure of the hash table.
    for (const auto& i : mymap) {
        cout << "Key: " << i.first << " Value: " << i.second << " Bucket number: " << mymap.bucket(i.first) << endl;
    }
You can also loop through the hash table from `.begin()` to `.end()`, just as with an STL list.
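The operations listed above can be combined into one short sketch built on the coin example. The helper names `buildCoins`, `hasCoin`, `valueOrZero`, and `sizeAfterBracketLookup` are hypothetical; only standard `std::unordered_map` members are called.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <unordered_map>

using CoinMap = std::unordered_map<std::string, float>;

// Build the coin table, including the misspelled key, then remove it.
CoinMap buildCoins() {
    CoinMap coins(64);         // ask for at least 64 buckets up front
    coins["quarter"] = 0.25f;
    coins["dime"] = 0.1f;
    coins["nickle"] = 0.05f;   // misspelled key gets its own entry
    coins["nickel"] = 0.05f;
    coins.erase("nickle");     // returns 1 because the key existed
    return coins;
}

// find() does not modify the map when the key is missing.
bool hasCoin(const CoinMap& coins, const std::string& name) {
    return coins.find(name) != coins.end();
}

// at() throws std::out_of_range for a missing key; report 0 instead.
float valueOrZero(const CoinMap& coins, const std::string& name) {
    try {
        return coins.at(name);
    } catch (const std::out_of_range&) {
        return 0.0f;
    }
}

// operator[] inserts a default value for a missing key, so even a
// "read" can grow the map. The map is copied so the caller's is unchanged.
std::size_t sizeAfterBracketLookup(CoinMap coins, const std::string& name) {
    (void)coins[name];
    return coins.size();
}
```

Calling `sizeAfterBracketLookup(buildCoins(), "penny")` returns 4 rather than 3, which is why `find` or `at` is the safer choice when you only want to read a value.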

3. Application: Looking up Passwords

One possible use for a hash table is to store computer user login usernames and passwords.

Let's fill in some of the details:

Notice that if there were a student named ramy, this hash function would generate exactly the same result as for mary because both names have the same letters. This is only one of the bad behaviours of this particular string hasher.
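A rough sketch of such a lookup, and of the mary/ramy behaviour, follows. It assumes a `stringHash` that sums character codes, and it uses a hypothetical `PasswordTable` alias and `authenticate` function. The usernames and passwords in the usage note are made up for illustration.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Assumed hasher: sums the character codes of the key, so strings
// made of the same letters always collide.
struct stringHash {
    std::size_t operator()(const std::string& key) const {
        std::size_t total = 0;
        for (unsigned char c : key) {
            total += c;
        }
        return total;
    }
};

// Username is the key; the password is the mapped value.
using PasswordTable = std::unordered_map<std::string, std::string, stringHash>;

// True only if the user exists and the stored password matches.
// find() is used so that a failed login never inserts a new user.
bool authenticate(const PasswordTable& table,
                  const std::string& user,
                  const std::string& password) {
    auto it = table.find(user);
    return it != table.end() && it->second == password;
}
```

If the table holds both `mary` and `ramy`, `table.bucket("mary") == table.bucket("ramy")` is true because the two hash values are equal. The map still tells the two users apart, since keys within a bucket are compared with `operator==`.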

CS210 Lab: Hash Table

Highlights of This Lab:

Lab Exercise:


4. Lab Exercise

4.0 Explore

4.1 Load User Information

4.2 Authenticate Logins

4.3 Change the Hasher

4.4 Explore and Answer

5. Postlab Exercises