The Java Course provides a general introduction to programming in Java. It is based on A.B. Downey's book, How to Think Like a Computer Scientist. Click here for details.


Hash Functions

And that's where hash functions come in. We need some way to look at a key and know, without searching, which list it will be in. We'll assume that the lists are in an array (or Vector) so we can refer to them by index.

The solution is to come up with some mapping---almost any mapping---between the key values and the indices of the lists. For every possible key there has to be a single index, but there might be many keys that map to the same index.

For example, imagine an array of 8 lists and a table made up of keys that are Integers and values that are Strings. It might be tempting to use the intValue of the Integers as indices, since they are the right type, but there are a whole lot of integers that do not fall between 0 and 7, which are the only legal indices.

The modulus operator provides a simple (in terms of code) and efficient (in terms of run time) way to map all the integers into the range (0, 7). The expression

    key.intValue() % 8

is guaranteed to produce a value in the range from -7 to 7 (including both). If you take its absolute value (using Math.abs) you will get a legal index.

For other types, we can play similar games. For example, to convert a Character to an integer, we can use the built-in method Character.getNumericValue and for Doubles there is intValue.

For Strings we could get the numeric value of each character and add them up, or instead we might use a shifted sum. To calculate a shifted sum, alternate between adding new values to the accumulator and shifting the accumulator to the left. By "shift to the left" I mean "multiply by a constant."

To see how this works, take the list of numbers { 1, 2, 3, 4, 5, 6 } and compute their shifted sum as follows. First, initialize the accumulator to 0. Then,

  1. Multiply the accumulator by 10.
  2. Add the next element of the list to the accumulator.
  3. Repeat until the list is finished.

As an exercise, write a method that calculates the shifted sum of the numeric values of the characters in a String using a multiplier of 32.

For each type, we can come up with a function that takes values of that type and generates a corresponding integer value. These functions are called hash functions, because they often involve making a hash of the components of the object. The integer value for each object is called its hash code.

There is one other way we might generate a hash code for Java objects. Every Java object provides a method called hashCode that returns an integer that corresponds to that object. For the built-in types, the hashCode method is implemented so that if two objects contain the same data, they will have the same hash code (as in deep equality). The documentation of these methods explains what the hash function is. You should check them out.

For user-defined types, it is up to the implementor to provide an appropriate hash function. The default hash function, provided in the Object class, often uses the location of the object to generate a hash code, so its notion of "sameness" is shallow equality. Most often when we are searching a hash table for a key, shallow equality is not what we want.

Regardless of how the hash code is generated, the last step is to use modulus and absolute value to map the hash code into the range of legal indices.



Last Update: 2011-01-24