Hashing
There are times when you need to retrieve values from a table instantaneously. The tables studied so far do not permit this.
A hash table maps a specific key directly to the value stored. Thus, by passing the key we can retrieve the value in one step.
A hash table is similar to an array. With an array, you have an index number mapped to a value. With a hash table, you use the search key to calculate the index position in which to place the record. You calculate the index position using what's called a hash function. The index position returned from a hash function is called a hash key.
Definitions:
search key: the unique value identifying the record
hash key: the index position in which to store the record
The requirements for a hash function are:
1. Fast and easy to compute
2. Places items evenly throughout the table
3. The hash key must be a valid index, so it must be smaller than the table size
Some common hash functions for numeric search keys:
Selecting digits
Choose specific digits from the number to form the hash key
If the value is an SSN, you could choose the 4th and 9th digits
h(001364825) = 35
The record would be stored in T[35]
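As a minimal sketch of digit selection, the function below picks the 4th and 9th digits of a 9-digit key. The name selectDigits is hypothetical, and the key is assumed to arrive as a string of exactly nine digits so that leading zeros are not lost.

```cpp
#include <string>

// Hypothetical helper: builds a hash key from the 4th and 9th digits
// (counting from 1) of a 9-digit key. The key is passed as text so that
// leading zeros such as "001..." are preserved.
int selectDigits(const std::string& ssn) {
    int fourth = ssn.at(3) - '0';   // 4th digit lives at index 3
    int ninth  = ssn.at(8) - '0';   // 9th digit lives at index 8
    return fourth * 10 + ninth;     // two-digit hash key in the range 0..99
}
```

Calling selectDigits("001364825") gives 35, matching the example above.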
Folding
Simply add the digits together to generate the hash key
h(001364825) = 0 + 0 + 1 + 3 + 6 + 4 + 8 + 2 + 5 = 29
The record would be stored in T[29]
With this hash function, the largest possible hash key is 81, so the table needs at most 82 slots (T[0] through T[81])
h(999999999) = 9 + 9 + 9 + 9 + 9 + 9 + 9 + 9 + 9 = 81
Another option is to group the digits in the search key
h(001364825) = 001 + 364 + 825 = 1190
With this hash function, the largest possible hash key is 2997, so the table needs at most 2998 slots (T[0] through T[2997])
h(999999999) = 999 + 999 + 999 = 2997
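Both folding variants can be sketched in a few lines. The names foldDigits and foldGroups are made up for illustration. The key is assumed to be a string of decimal digits, and groupSize is assumed to be greater than zero.

```cpp
#include <cstddef>
#include <string>

// Folding, variant 1: add the individual digits together.
int foldDigits(const std::string& key) {
    int sum = 0;
    for (char c : key) {
        sum += c - '0';             // convert the character to its digit value
    }
    return sum;
}

// Folding, variant 2: split the key into groups of groupSize digits
// and add the groups together. Assumes groupSize > 0.
int foldGroups(const std::string& key, std::size_t groupSize) {
    int sum = 0;
    for (std::size_t i = 0; i < key.size(); i += groupSize) {
        sum += std::stoi(key.substr(i, groupSize));  // "001" parses as 1
    }
    return sum;
}
```

With the key "001364825", foldDigits returns 29 and foldGroups with a group size of 3 returns 1190.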
Modulo Arithmetic
This is the simplest hash function and often the most effective. Take the remainder after dividing by the table size:
h(x) = x mod TableSize
This works most effectively when the table size is a prime number. We'll look at this in more detail later.
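A small sketch of the modulo hash follows. The table size of 101 is an arbitrary prime chosen for illustration, and the search key is assumed to be non-negative (in C++ the % operator can return a negative result for a negative operand).

```cpp
// Hypothetical table size: a small prime picked only for this example.
const long TABLE_SIZE = 101;

// h(x) = x mod TableSize. Assumes x >= 0, so the result is a valid
// index in the range 0..TABLE_SIZE-1.
long modHash(long x) {
    return x % TABLE_SIZE;
}
```

For example, modHash(1364825) is 12, so that record would go in T[12].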
It is acceptable to use more than one hash function at a time.
1. Hash the search key to return a hash key.
2. Hash the hash key from step one using a different hash function
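One possible combination, sketched below, folds the key in groups of three digits and then reduces the result modulo the table size. The names foldByThrees and twoStepHash, and the table size of 101, are assumptions made for the example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical table size (a prime), chosen only for illustration.
const int SLOT_COUNT = 101;

// Step 1: fold the digit string by adding groups of three digits.
int foldByThrees(const std::string& key) {
    int sum = 0;
    for (std::size_t i = 0; i < key.size(); i += 3) {
        sum += std::stoi(key.substr(i, 3));
    }
    return sum;
}

// Step 2: hash the result of step 1 again, this time with modulo,
// so the final hash key always fits the table.
int twoStepHash(const std::string& key) {
    return foldByThrees(key) % SLOT_COUNT;
}
```

For "001364825" the first step gives 1190 and the second gives 1190 mod 101 = 79. The second step is what lets a folding result as large as 2997 fit into a much smaller table.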
Converting strings to numbers
There are times when the search key is a string instead of a number. It will be necessary to convert the string to a number before applying the hash function. A few common methods:
Convert the letters to ASCII values
h("NOTE") = 78 + 79 + 84 + 69 = 310
Convert the letters to their position in the alphabet
h("NOTE") = 14 + 15 + 20 + 5 = 54
The problem with the two methods above is that other search keys may convert to the same value:
h("TONE") = 20 + 15 + 14 + 5 = 54
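The collision is easy to reproduce. The sketch below implements both conversions. The names asciiSum and alphabetSum are hypothetical, and alphabetSum assumes the key contains only the uppercase letters A to Z.

```cpp
#include <string>

// Adds up the character codes of the key (ASCII values for plain text).
int asciiSum(const std::string& s) {
    int sum = 0;
    for (unsigned char c : s) {
        sum += c;
    }
    return sum;
}

// Adds up each letter's position in the alphabet (A = 1 ... Z = 26).
// Assumes the key holds only uppercase letters A-Z.
int alphabetSum(const std::string& s) {
    int sum = 0;
    for (char c : s) {
        sum += c - 'A' + 1;
    }
    return sum;
}
```

Because addition ignores order, any two keys made of the same letters collide: alphabetSum("NOTE") and alphabetSum("TONE") are both 54, and the two ASCII sums are equal as well.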
One solution would be to convert the numbers to 5-bit binary numbers, concatenate, then convert to base 10
h("NOTE") -> 14, 15, 20, 5
= "01110" + "01111" + "10100" + "00101"
= 01110011111010000101 (binary)
= 474757 (base 10)
Why a 5-bit binary number? I chose the position in the alphabet for the number. The largest number would be 26. Converted to binary this would be 11010, a 5-bit binary number.
The steps involved would be as follows:
1. Convert each letter to its numeric equivalent
2. Convert numbers to 5-bit binary
3. Concatenate the binary numbers
4. Convert the new binary number to base 10
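The four steps can be sketched directly with bit operations: shifting the running value left by 5 bits makes room for the next letter, which is the same as concatenating 5-bit strings. The name concatFiveBit is hypothetical. The sketch assumes uppercase letters A to Z and keys of at most 12 letters, so the result fits in 64 bits.

```cpp
#include <string>

// Hypothetical helper following the four steps above.
// Assumes uppercase A-Z and at most 12 letters (12 * 5 = 60 bits).
unsigned long long concatFiveBit(const std::string& word) {
    unsigned long long result = 0;
    for (char c : word) {
        unsigned long long position = c - 'A' + 1;  // step 1: A = 1 ... Z = 26
        result = (result << 5) | position;          // steps 2-3: append 5 bits
    }
    return result;  // step 4: the integer, when printed, is the base 10 value
}
```

concatFiveBit("NOTE") gives 474757, and "TONE" now produces a different value because the order of the letters affects the result.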
Let's develop a simpler process for this computation.
The first step is recognizing that converting each letter's number to a 5-bit binary number and concatenating the results is essentially the same as treating the numbers as the digits of a base 32 number (2^5 = 32):
14-15-20-5 base 32
To convert a base 32 to base 10:
14 * 32^3 + 15 * 32^2 + 20 * 32^1 + 5 * 32^0
Notice the pattern of the exponent: it starts at 0 on the far right side and increases by one for each position to the left. We could easily develop an algorithm to handle this:
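long hashKey(string searchKey) {
    int len = searchKey.length();
    long lRetVal = 0;
    // One way to write the loop (assumes uppercase A-Z): multiply the running total by 32, then add the next letter's position. The result equals the sum of position * 32^exponent.
    for ( int i = 0; i < len; i++ )
        lRetVal = lRetVal * 32 + ( searchKey[i] - 'A' + 1 );
    return lRetVal;
}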