Before we get into the definition of Hash Tables, it is good to introduce WHY to use Hash tables.
Hash tables are good for doing a quick search on things.
For instance, say we have an array full 100 items. If we know the position of a specific item in an array, then we can quickly access it. For example, we might know that the item we want is at position 3. We can retrieve it quickly and simply like this: myitem=myarray[3];
We don't have to search through each element in the array. We don't have to do a binary search either. We just access position 3.
The question is, how do we know that position 3 stores the data that we are interested in?
This is where hashing comes in handy. Given some key, we can apply a hash function to it to find an index or position that we want to access.
There are many different hash functions. Each data type will require a slightly different technique. There's even hash functions to take an integer key and turn it into an index. This is a very important last step in hashing other data types... they become integers, then we hash the integer. A common one for integers is the division method.
Let's learn through an example:
Let's say you had the following numbers or keys that you wanted to map into an array of 10 elements:
123456To apply the division method, you could divide the number by 10 (or the maximum number of elements in the array) and use the remainder (the modulo) as an index. The following would result:
123456 % 10 = 6 (the remainder is 6 when dividing by 10)These numbers would be inserted into the array at positions 6, 7, and 0 respectively. It might look something like this:
The important thing with the division method is that the keys are integers.
You have to apply another hash function to turn them into integers. Effectively, you get two hash functions in one:
What do we mean that the keys aren't integers? Well, let's say that the keys are people's names. Such as:
Sarah Jones
The goal is to type in one of these names and get an index to an array in
order to access that information. How do we do this?
The hash function has to do two things:
The array is known as a hash table. It stores the key (used to find the index) along with associated values. In the above example, we might have a hash table that looked something like this:
Again, the idea is that we will insert items into the hash table using the key and applying the hash function(s) to get the index.
A problem occurs when two keys yield the same index. For Instance, say we
wanted to include:
John Smith --> 948 % 10 --> 8
We have a collision because Sarah Jones is already stored at array index 8.
We need a method to resolve this. The resolution comes in how you create your hash table. There two major approaches given in the book:
The details are left as class material, but recognize that in chaining the hash table is basically an array of linked lists. These linked lists are called buckets. All data in the same bucket have colliding index values.
Consider a diagram of the above example. Remember, there was a collision with Sarah Jones and John Smith. Notice that John Smith is "chained" or "linked" after Sarah Jones; John Smith and Sarah Jones are in the same bucket.
As of C++11, the STL provides four hash table based containers:
Of these, the easiest to work wih is the unordered_map. Elements in the map are stored as a pair: the key and its corresponding value. Storage and retrieval of the elements requires that the key have defined methods for hashing and checking for equality. So long as the key is a simple built-in type like char, int, float or string, C++ and the STL provide this functionality. For more complex keys, you would need to define your own hashing function object and provide an overloaded operator==.
Before you can use unordered_maps you need to understand two new concepts: the STL pair type, and function objects.
This is a simple utility type that allows STL functions to return or accept two items as if they were one. You will see them used to indicate whether an operation was successful, or to pass and return a key and value as one.
Here is an sample program that declares a pair and accesses its two members: first and second.
// Using a pair. Notice that the pair has two members named first and second. // The pair is templated, and you may specialize each member to separate or to // the same types. // // Written by: Alex Clarke Nov. 7 2014 #include <iostream> #include <string> #include <utility> //pair lives in this header. //You do not normally include it... //it is included for you by containers that require it. using namespace std; template <class A, class B> void printMyPair(pair<A, B>& p) { cout << p.first << " " << p.second << endl; } int main() { //declare, set and print a pair pair<string, int> a; a.first = "Pair number: "; a.second = 1; printMyPair(a); //alternate way to set a pair a = make_pair("New pair number: ", 2); printMyPair(a); //initialize the pair at construction time pair<double, double> b(2.2, 3.3); printMyPair(b); return 0; }
The easiest way to define your own hash function for an unordered_map, is to make a class whose instances can be used as functions. To do this, you simply add an operator() to a class that accepts whatever argument you need. For our hash table exercise, we will need to hash a string. Although STL provides a superior hashing function for strings, it is good to know how to write your own. Here is a simple example of such a function.
// String hashing with a function object. // // Although STL provides a hashing function object for strings, it is nice to know // how to define and use your own hashing function object should the need arise. // // This demo compares STL's string hashing function to our own simple one. // // Written by: Alex Clarke Nov. 7 2014 #include <iostream> #include <string> #include <functional> //STL hash lives in this header. //You do not normally include it... //it is included for you by containers that require it. using namespace std; // Our special string hashing class class stringHash { public: //this operator allows us to use stringHash objects as functions to hash strings. size_t operator()(const string& hashit) const { size_t val = 0; for (size_t i = 0; i < hashit.length(); i++) val += hashit[i]; return val; } }; int main() { // This example shows how to instance and use a function object on its own. stringHash myHash; hash<string> stlHash; cout << "Hashes for the sentence \"What number will this produce?\"\n\n"; cout << "stringHash:\t" << myHash("What number will this produce?") << endl; cout << "hash<string>:\t" << stlHash("What number will this produce?") << "\n\n\n"; // This example shows how to use a function object to specialize a template. // This is similar to what you need to do with unordered_map, but the hash function // will be called for you by various member functions. pair<string, stringHash> mine; pair<string, hash<string> > STLs; cout << "Hashes for the word \"mary\"\n\n"; mine.first = STLs.first = "mary"; cout << "stringHash:\t" << mine.second(mine.first) << endl; cout << "hash<string>:\t" << STLs.second(STLs.first) << "\n\n\n"; return 0; }
In order to use STL unordered map, you must
#include <unordered_map>
The number of methods for creating and working with an unordered map are overwhelming at first. Here is an abbreviated list of some typical functions associated with the STL unordered map. Note that size_type type is an unsigned integral value.
declare a map | unordered_map<key_type, mapped_type> name; unordered_map<string, float> mymap; unordered_map<key_type, mapped_type, hasher> name; unordered_map<string, float, stringHash> mymap; unordered_map<key_type, mapped_type> name(size_type n); unordered_map<string, float> mymap( 64 ); // mymap will have at least 64 buckets |
empty | bool empty() const noexcept; Returns a true value if the number of elements is zero, false otherwise. if (!mymap.empty()) { //... } |
insert / |
mapped_type& operator[](const key_type &key); Note: Should not be used if you only want to get the mapped value of a key. In your example, the key is a string, and the mapped value is a double. Let's store some coin values. mymap["quarter"] = 0.25; mymap["dime"] = 0.1; mymap["nickel"] = 0.5; //whoops - wrong value mymap["nickle"] = 0.05; //double whoops - wrong spelling mymap["nickel"] = 0.05; //partly fixed |
erase |
size_type erase (const key_type &key); Searches the hash table for the data item with the key key. If the data item is found, it removes the data item and returns 1. Otherwise, it returns 0. if (mymap.erase("nickle") == 1) { //"nickel" was found and removed } else { //"nickel" was not found } |
retrieve |
mapped_type& at (const key_type &key); Searches the hash table for the data item with the key key. If the data item is found, it returns a reference to the mapped value. Otherwise, it throws an out_of_range exception.. try { float value = mymap.at("nickel"); cout << "A nickle is worth " << 100*value << " cents." << endl; } catch (out_of_range) { cerr << "Could not find the last key you searched for" << endl; }iterator find (const key_type &key); Searches the hash table for the data item with the key key. If the data item is found, it returns a iterator to the stored pair. Otherwise, it returns an iterator to end(). auto findit = mymap.end(); findit = mymap.find("nickel"); if (findit != mymap.end()) { cout << "A " << findit->first << " is worth " << 100*findit->second << " cents." << endl; } else { cerr << "Could not find the last key you searched for" << endl; } |
begin |
iterator begin() noexcept; Returns an iterator that references the first element in the unordered map. This could be any element, since the unordered map container does not guarantee which element comes first. |
end |
iterator end() noexcept; |
size | size_type size() const noexcept; Returns the number of elements (pairs) currently stored in the map. The size_type type is an unsigned integral value. // Print the number of elements in the list. cout << mymap.size() << endl; |
bucket count | size_type bucket_count() const noexcept; Returns the number of buckets in the map. If this number changes, you know the map has been rehashed. The size_type type is an unsigned integral value. // Print the number of buckets in the list. cout << mymap.bucket_count() << endl; |
bucket number | size_type bucket(const key_type& key) const; Returns the bucket number corresponding to the given key. cout << mymap.bucket(mymap.begin()->first) << endl; |
print out / |
This is not a member function, but rather a demonstration of using a ranged-for loop to print all members of a hash table. The order of elements is not guaranteed. For this reason, the code also shows the bucket number of each element so we can begin to understand the structure of the hash table. for (auto i: mymap) { cout << "Key: " << i.first << " Value: " << i.second << " Bucket number: " << mymap.bucket(i.first) << endl; } You can also loop through the hash table from .begin() to .end() just like with STL List. |
One possible use for a hash table is to store computer user login usernames and passwords.
There are two major steps to this program:first last username password name namejack broken.crown Jack Hill jill tumbling!down Jill Hill mary contrary Mary Stuart bopeep sheep!lost Bo Peep ... etc.
Login: mary Password: contrary Authentication succeeded. Welcome Mary Stuart. Login: jim Password: contrary Authentication failed. Login: bopeep Password: sheeplost Authentication failed.
109('m') + 97('a') + 114('r') + 121('y')=441The code will look like this:
int hash(const string str) const { int val = 0; for (int i=0; i<str.length(); i++) val += str[i]; return val; }We've converted the string to an integer, but we still need to convert the integer to an index. For an array of 16 elements we can divide by 16 and use the remainder as an index. This is what most implementations of unordered map do. Combining these two hash functions, we will get some code that looks like this:
int index = hash ( searchKey ) % 16;Therefore mary's index will be:
441 % 16 = 9.
Notice that if there were a student named ramy, this hash function would generate exactly the same result as for mary because both names have the same letters. This is only one of the bad behaviours of this particular string hasher.
Click here To get the zipped files for this exercise. Note: the code for this exercise requires a C++11 capable compiler.
Your primary tasks for this exercise are to edit login.cpp to add in lines so that it does the following:
Steps include:
Number of entries in table: 11 Number of buckets: 64 apollo 42 pres,kobol Lee Adama jack 26 broken.crown Jack Hill starbuck 26 not@ghost Kara Thrace jill 20 tumbling!down Jill Hill mary 54 contrary Mary Stuart gbaltar 40 angel Gaius Baltar bopeep 40 sheep!lost Bo Peep jhyneman 61 bigboom Jamie Hyneman badama 25 find.earth William Adama asavage 9 busted Adam Savage lroslin 48 live4today Laura Roslin Login:This shows the hash table that has resulted from inserting data from the password.dat file (mentioned in Section 3).
Login: mary Password: contrary Authentication succeeded. Welcome Mary Stuart. Login: jim Password: contrary Authentication failed. Login: bopeep Password: sheeplost Authentication failed.
Now it's time to explore the behaviour of the hash table a little
unordered_map< Answer to 4.3 B > passwords(16);
|
|
|