How to implement a hash table in Java (Part 2)
In the first post of this series we've implemented the initial version of our hash table. The current implementation allows for "putting" and "getting" mappings. But it is only possible to do so in a very limited set of conditions; the previous post focused on implementing happy path test cases.
In this second post of the series we'll start to deviate from that path. We will do it in a incremental manner, as we have been working so far.
In this post we will add test cases that:
-
uses keys having negative hash code values;
-
replaces the value of an existing mapping; and
-
have the
get
method handle non-existing keys.
Let's continue.
Before we continue
While not strictly required, I recommend reading the first part of this series if you haven't already. The "Writing style" section, in particular, explains why the examples are written the way they are.
Iteration 05: support negative hash code values
Up until this point, we have been working with keys having positive hash code values.
Let's introduce keys with negative hash code values.
Iteration 05: test case
Let's start with a test case:
public class HashTableTest {
@Test(description = """
put() & get() methods
- negative hash code
""")
public void iter05() {
var ht = new HashTable<Integer, String>();
assertEquals(ht.size(), 0);
assertEquals(ht.put(-1, "Minus One"), null);
assertEquals(ht.size(), 1);
assertEquals(ht.get(-1), "Minus One");
assertEquals(ht.put(-2, "Minus Two"), null);
assertEquals(ht.size(), 2);
assertEquals(ht.get(-2), "Minus Two");
}
}
It is a variation of the test written in the previous iteration. The difference is that the keys are guaranteed to produce negative hash code values.
Iteration 05: running our test
Let's run this new test with the implementation from the previous iteration:
java.lang.ArrayIndexOutOfBoundsException: Index -1 out of bounds for length 4
at iter03.HashTable.put0(HashTable.java:33)
at iter02.HashTable.put(HashTable.java:25)
at iter05.HashTableTest.iter05(HashTableTest.java:32)
It fails as it tries to insert the key at a negative array index.
This is expected. We mentioned it during iteration #03.
The problem is in our current bucket
method implementation:
// current implementation
protected int bucket(Object key) {
var hc = key.hashCode();
return hc % keys.length;
}
If a key's hash code is negative then the method result will also be negative.
Iteration 05: implementation
To prevent the method from returning a negative value, we can use the absolute value of the key's hash code instead:
public class HashTable<K, V> extends iter04.HashTable<K, V> {
@Override
protected int bucket(Object key) {
int hc = key.hashCode();
return Math.abs(hc) % keys.length;
}
}
It works. Our test passes.
Iteration 06: distinct hash code values having the same absolute value
As mentioned in this post's introduction, our objective is to incrementally deviate from the happy path.
In the previous iteration, the keys of both mappings we added to our map:
-
had the same sign, both were negative; and
-
had different absolute values.
Let's, in this iteration, do something a little different. We will, once more, add two mappings. But the key's hash code values:
-
will have distinct signs, i.e., one will be positive and the other negative; and
-
will have the same absolute value.
Iteration 06: test case
Here's our test case:
public class HashTableTest {
@Test(description = """
put() & get() methods
- negative hash code
- positive hash code
""")
public void iter06() {
var ht = new HashTable<Integer, String>();
assertEquals(ht.size(), 0);
assertEquals(ht.put(-3, "Minus Three"), null);
assertEquals(ht.size(), 1);
assertEquals(ht.get(-3), "Minus Three");
assertEquals(ht.put(3, "Three"), null);
assertEquals(ht.size(), 2);
assertEquals(ht.get(3), "Three");
}
}
It is very similar to the last few tests. Except the keys were selected so their hash code values meet our requirements.
Iteration 06: running our test
Let's run this test with the implementation from the previous iteration:
java.lang.UnsupportedOperationException: Implement me
at iter03.HashTable.put1(HashTable.java:43)
at iter03.HashTable.put0(HashTable.java:39)
at iter02.HashTable.put(HashTable.java:25)
at iter06.HashTableTest.iter06(HashTableTest.java:37)
It fails as it reaches a branch of our code we did not implement yet. We can see from the stack trace it is a code written during iteration #03.
Iteration 06: hash collision
The current test failure is expected. We mentioned it in the "same key or collision?" section of the previous post.
Remember that, in the previous iteration, we fixed the negative array index issue by using the absolute value of the key's hash code. But, in doing so, it caused a positive and a negative hash code having the same absolute value to yield the same "bucket".
Take, for example, two distinct keys A
and B
having the following hash code values:
-
A = +2
-
B = -2
As the following is our current bucket
implementation:
int hc = key.hashCode();
return Math.abs(hc) % keys.length;
Both keys will be put in the same "bucket".
When keys.length
is 4
, for example, both keys will be put at the bucket = 2
.
When two distinct keys produce the same "bucket" it configures a situation called hash collision.
Any hash table must handle hash collisions. At the same time, a "good" hash function will try to minimize hash collisions.
Iteration 06: implementation
We will handle hash collisions. As mentioned, we must.
But not right now. We can make our test pass by improving our hash function.
The following implementation makes our test pass:
public class HashTable<K, V> extends iter05.HashTable<K, V> {
@Override
protected int bucket(Object key) {
int hc = key.hashCode();
return Math.floorMod(hc, keys.length);
}
}
Our solution uses the Math.floorMod
static method.
From its Javadocs we can see that:
-
as the divisor,
keys.length
, is always positive the result offloorMod
is guaranteed to be positive. Therefore, thebucket
method is guaranteed to return a positive value (or zero); -
when the hash code value
hc
is positive, thebucket
return value is equivalent to:return hc % keys.length;
-
when the hash code value
hc
is negative, it might give a distinct result than:return Math.abs(hc) & keys.length;
Put in other words, when two distinct keys produce different hash code values having the same absolute value, our new solution causes fewer hash collisions than the previous solution.
The following listing tries to illustrate this fact for an array of length 4
:
Math.floorMod( 1, 4) = 1
Math.floorMod(-1, 4) = 3
Math.floorMod( 2, 4) = 2
Math.floorMod(-2, 4) = 2
Math.floorMod( 3, 4) = 3
Math.floorMod(-3, 4) = 1
Math.floorMod( 4, 4) = 0
Math.floorMod(-4, 4) = 0
Math.floorMod( 5, 4) = 1
Math.floorMod(-5, 4) = 3
The previous solution would cause hash collisions for all of listed positive/negative pairs.
The new solution, on the other hand, causes hash collisions only for the (2, -2)
and (4, -4)
pairs.
A small improvement. An improvement nonetheless.
This technique is used by:
Iteration 07: replace the value mapped to an existing key
When the put
method is invoked with a key-value pair such that:
-
the map already contains a mapping for the specified key; and
-
the existing mapped value is distinct from the specified value.
Then the put
method must replace the existing mapped value to the specified one.
We will implement this use-case in this iteration.
Iteration 07: test case
And here's the test case for this iteration:
public class HashTableTest {
@Test(description = """
put() method
- replace the value mapped to an existing key
""")
public void iter07() {
var ht = new HashTable<Integer, String>();
assertEquals(ht.size(), 0);
assertEquals(ht.put(1, "Won"), null);
assertEquals(ht.size(), 1);
assertEquals(ht.put(1, "One"), "Won");
assertEquals(ht.size(), 1);
assertEquals(ht.put(2, "Two"), null);
assertEquals(ht.size(), 2);
assertEquals(ht.get(1), "One");
assertEquals(ht.get(2), "Two");
}
}
We start by associating the Won
string to the 1
key.
Next, we replace the Won
value to the "correct" One
value.
As it is a replacement, we verify that the put
returns the old value.
We additionally verify that the map size remains the same.
Finally we add the 2 = Two
mapping.
And, for completeness, we verify that we can correctly "get" both mappings from the map.
Iteration 07: running our test
The test fails when run against the implementation of the previous iteration:
java.lang.UnsupportedOperationException: Implement me
at iter03.HashTable.put1(HashTable.java:43)
at iter03.HashTable.put0(HashTable.java:39)
at iter02.HashTable.put(HashTable.java:25)
at iter07.HashTableTest.iter07(HashTableTest.java:35)
As in the previous iteration, our test fails as it reaches a branch of our code we did not implement yet. The stack trace indicates it is the code written during iteration #03.
Iteration 07: implementation
Here's an implementation that make our test pass:
public class HashTable<K, V> extends iter06.HashTable<K, V> {
@Override
protected final V put1(K key, V value, int bucket, Object existing) {
if (existing.equals(key)) {
return putReplace(key, value, bucket);
}
return put2(key, value, bucket);
}
protected V put2(K key, V value, int bucket) {
throw new UnsupportedOperationException("Implement me");
}
@SuppressWarnings("unchecked")
protected final V putReplace(K key, V value, int bucket) {
var oldValue = values[bucket];
keys[bucket] = key;
values[bucket] = value;
return (V) oldValue;
}
}
It starts by overriding the put1
method defined during iteration #03.
Remember, the put1
parameters are:
-
key
= the key theput
method was invoked with; -
value
= the value theput
method was invoked with; -
bucket
= the computed array index forkey
; and -
existing
= the existing key, if one exists, found at the computedbucket
index.
If the existing
key is equal to the specified key
it means the current mapped value must be replaced.
Iteration 07: the putReplace
method
The actual replacement is done by the putReplace
method:
@SuppressWarnings("unchecked")
protected final V putReplace(K key, V value, int bucket) {
var oldValue = values[bucket];
keys[bucket] = key;
values[bucket] = value;
return (V) oldValue;
}
It starts by assigning the current mapped value to the oldValue
variable.
Next, both the key
and the value
are stored at their respective arrays.
Finally, the oldValue
is returned, as required by the put
method contract.
Note that, since the put
method only allows values of type V
to be associated, the cast operation is safe.
As the cast is safe, we can safely suppress the unchecked conversion warning.
Iteration 07: hash collision
If the existing
key is not equal to the specified key
it means we have hit a hash collision.
We will handle it eventually, but, for now, we simply invoke the yet to be implemented put2
method:
protected V put2(K key, V value, int bucket) {
throw new UnsupportedOperationException("Implement me");
}
Iteration 08: querying for a non-existing mapping
Let's focus once more on the get
method.
In this iteration we will try to get
the mapping of a non-existing key.
We expected the get
method to return null
in this case.
Iteration 08: test case
We write a minimal test exercising our use-case:
public class HashTableTest {
@Test(description = """
get() method
- non-existing key should return null
""")
public void iter08() {
var ht = new HashTable<Integer, String>();
assertEquals(ht.get(1), null);
assertEquals(ht.put(1, "One"), null);
assertEquals(ht.get(1), "One");
}
}
We start by immediately querying for a non-existing key.
We assert that the get
method returns null
in this case.
For completeness, we associate a value to the previously queried key and verify that the get
method now returns the associated value.
Iteration 08: running our test
Let's run our new test against the implementation of the previous iteration:
java.lang.UnsupportedOperationException: Implement me
at iter04.HashTable.get0(HashTable.java:37)
at iter04.HashTable.get(HashTable.java:33)
at iter08.HashTableTest.iter08(HashTableTest.java:31)
It fails at the code written during iteration #04. To be more precise, it fails because it reaches a branch left unimplemented during that iteration:
if (key.equals(candidate)) {
return (V) values[bucket];
}
// method get0 just throws UOE
return get0(key, bucket, candidate);
It reaches this path as the key candidate
found at the bucket
index is not equal to the key
that was supplied to the get
method.
Iteration 08: implementation
Here is an implementation that make the test pass:
public class HashTable<K, V> extends iter07.HashTable<K, V> {
@Override
protected final V get0(Object key, int bucket, Object candidate) {
if (candidate == null) {
return null;
}
return get1(key, bucket);
}
protected V get1(Object key, int bucket) {
throw new UnsupportedOperationException("Implement me");
}
}
It is somewhat trivial.
If candidate
is null it means there was not key found at the bucket
index.
In other words, the map does not contain the specified key.
In this case, and as required by the get
method contract, it must return null
.
On the other hand, if candidate
is not null then we have hit a hash collision once again.
Except, this time, it occurs during a "read" operation.
We will handle hash collisions in the next post of this series.
For now, we invoke a get1
method that just throws UnsupportedOperationException
.
In the next blog post of this series
In this second blog post of the series we started to deviate from the "happy path". In doing so we have incrementally improved our hash table implementation:
-
it now supports keys with negative hash code values;
-
the hash function produces fewer hash collisions when two distinct hash code values have the same absolute value;
-
the
put
method now supports replacing an existing value mapping; and -
the
get
method now properly handles a non-existing key mapping.
In the next blog posts of this series we will:
-
handle hash collisions; and
-
perform a rehash operation, i.e., resize the inner arrays so our hash table can hold more mappings
Stay tuned!
The source code for all of the examples in this post can be found at this GitHub repository.
Continue reading
The third part of this series is already available.
You can continue reading by following the link.