How to implement a hash table in Java (Part 3)
In the previous post of this series we've improved on the initial version of our hash table. It is capable of:
-
"putting" mappings;
-
"getting" mappings; and
-
replacing the value of existing mappings.
However, it still misses a major functionality: it does not handle hash collisions yet.
That's what we will work on during this third post of our series.
We will have our hash table handle hash collisions during both put
and get
operations.
Let's continue.
Before we continue
While not strictly required, I recommend reading the previous parts of this series if you haven't already:
In particular:
-
the "Writing style" section of Part 1 explains why the examples are written the way they are; and
-
the "Iteration 06: hash collision" section of Part 2 gives a brief explanation of what a hash collision is.
Iteration 09: two testing conveniences
In order to properly test that our hash table is capable of handling hash collisions we need:
-
an easy way to create key values capable of causing hash collisions; and
-
a way to assert the internal disposition of our hash table contents.
Therefore, in this iteration, we will introduce two testing conveniences:
-
a
Key
class for causing hash collisions; and -
a
toString
implementation for displaying the contents of our hash table.
Iteration 09: the Key
class
Two distinct keys a
and b
are guaranteed to cause a hash collision in our hash table if:
-
they are distinct, i.e.,
!a.equals(b)
evaluates totrue
; and -
they both have the same hash code values, i.e.,
a.hashCode() == b.hashCode()
evaluates totrue
.
The following class allows to easily create instances satisfying these criteria:
public final class Key {
private final String value;
private final int hash;
public Key(String value, int hash) {
this.value = value;
this.hash = hash;
}
@Override
public final boolean equals(Object obj) {
return obj == this
|| obj instanceof Key that
&& Objects.equals(value, that.value);
}
@Override
public final int hashCode() { return hash; }
@Override
public final String toString() { return value; }
}
As an example, the following test passes:
var a = new Key("A", 123);
var b = new Key("B", 123);
assertNotEquals(a, b);
assertEquals(a.hashCode(), 123);
assertEquals(b.hashCode(), 123);
Of course the class also makes it easy to violate the equals
and hashCode
contract.
For example, we can easily create two instances that are equal to each other but have distinct hash code values.
This is a violation of the hashCode
contract.
Keep in mind we are only using it to cause hash collisions in a testing environment. In other words, I am not advocating for the use of a class like this one in production.
Iteration 09: test case
Let's now focus on the toString
method.
Here's our test case:
public class HashTableTest {
@Test(description = """
toString() method
- renders an ascii table with columns:
1. array index
2. key toString()
3. value toString()
""")
public void iter09() {
var ht = new HashTable<Key, String>();
var a = new Key("AAA", 4);
var c = new Key("CCC", 6);
ht.put(a, "aaa");
ht.put(c, "ccc");
assertEquals(
ht.toString(),
"""
+-----+-----+-----+
| idx | key | val |
+-----+-----+-----+
| 0 | AAA | aaa |
| 1 | | |
| 2 | CCC | ccc |
| 3 | | |
+-----+-----+-----+
"""
);
}
}
First, we create two instances, a
and c
, of our Key
class.
As seen in the previous section, their hash code values are given by the constructor's second argument.
The hash code values were chosen so that the mapping stays at a known array index.
Given their values, we know that:
-
key
a
will be stored at indexidx = 0
; and -
key
c
will be stored at indexidx = 2
.
Remember, we know how our hash function is implemented.
We also know the length of the internal keys
and values
arrays.
Therefore, we can "manually" compute the array index.
Next, we associate the keys to their values.
We then write how we want the toString
output to look like.
It is a table showing the internal contents of our hash table.
It has three columns:
-
the array index
-
the key
toString
output -
the value
toString
output
When the key and the value are null
their respective table cells are blank.
Iteration 09: implementation
Here's an implementation that make the test pass:
public class HashTable<K, V> extends iter08.HashTable<K, V> {
@Override
public final String toString() {
var sb = new StringBuilder();
sb.append(
"""
+-----+-----+-----+
| idx | key | val |
+-----+-----+-----+
""");
for (int idx = 0; idx < keys.length; idx++) {
var key = keys[idx];
if (key != null) {
var value = values[idx];
sb.append("| %3d | %3s | %3s |\n".formatted(idx, key, value));
} else {
sb.append("| %3d | | |\n".formatted(idx));
}
}
sb.append("+-----+-----+-----+\n");
return sb.toString();
}
}
It starts by writing the table header.
It then loops through the keys
array.
If a non-null key
is found, it writes, in order, the current array index, the key itself and the mapped value.
If the key
is null
it only writes the array index.
It ends by writing the table footer.
Iteration 10: handling hash collision during put
operations
Let's (at last) begin the implementation of the hash collision handling code of our hash table.
We'll focus on put
operations initially.
In this iteration we will put
a series of distinct mappings that "want to stay in the same bucket".
Iteration 10: test case
With the help of our Key
class created earlier, we create our test case:
public class HashTableTest {
@Test(description = """
put() method
- handle hash collisions
- no value replacement
""")
public void iter10() {
var ht = new HashTable<Key, String>();
var a = new Key("AAA", 2);
var b = new Key("BBB", 2);
var c = new Key("CCC", 2);
assertEquals(ht.put(a, "aaa"), null);
assertEquals(ht.put(b, "bbb"), null);
assertEquals(ht.put(c, "ccc"), null);
assertEquals(
ht.toString(),
"""
+-----+-----+-----+
| idx | key | val |
+-----+-----+-----+
| 0 | CCC | ccc |
| 1 | | |
| 2 | AAA | aaa |
| 3 | BBB | bbb |
+-----+-----+-----+
"""
);
}
}
We create three Key
instances a
, b
and c
in such way that:
-
they are all distinct to each other; and
-
they all produce the same hash code value.
Therefore, keys a
, b
and c
are guaranteed to cause hash collisions in our hash table.
When associated with a value via the put
method, they will all try to occupy the bucket idx = 2
.
But, of course, only a single key can occupy that bucket at any given time.
In other words, the first put
will occupy the bucket;
the keys of the subsequent put
operations will have to look for alternative buckets.
The final internal disposition in our hash table is given by the toString
assertion.
How did we get to this exact disposition depicted by the toString
output?
Iteration 10: linear probing
As all of the keys in our current test produce the same hash code value, they will all try to occupy the same bucket.
The hash table is empty when we associate key a
to its value.
So the a
mapping occupies the bucket with index idx = 2
.
When we try to associate key b
to its value, that bucket it is already taken by key a
.
So we have key b
occupy the next empty cell in the array.
The next empty bucket is the one with index idx = 3
and that's where the b
mapping is stored.
When we try to associate key c
to its value it also wants to stay at bucket idx = 2
.
As it is occupied by key a
, it tries the next cell.
As the next cell is occupied by key b
it tries the next cell.
Except there's no next cell: we are at the end of the array.
So it tries again from the start of the array, idx = 0
.
As idx = 0
is empty that's where key c
stays.
So that's how we got to the toString
assertion:
assertEquals(
ht.toString(),
"""
+-----+-----+-----+
| idx | key | val |
+-----+-----+-----+
| 0 | CCC | ccc |
| 1 | | |
| 2 | AAA | aaa |
| 3 | BBB | bbb |
+-----+-----+-----+
"""
);
This technique of resolving hash collisions by "probing" for the next empty cell is called linear probing. You should know that there are more ways to handle hash collisions. This is just one of the possible solutions.
In the JDK, the following JDK classes use the same linear probing solution:
The first is the one returned by the Map.of
methods when two or more mapping are specified.
The second is the one returned by the Set.of
methods when three or more elements are specified.
I should note that these are the values returned by the methods at time of writing.
Iteration 10: running our test
Running our test against the previous implementation fails:
java.lang.UnsupportedOperationException: Implement me
at iter07.HashTable.put2(HashTable.java:29)
at iter07.HashTable.put1(HashTable.java:25)
at iter03.HashTable.put0(HashTable.java:39)
at iter02.HashTable.put(HashTable.java:25)
at iter10.HashTableTest.iter10(HashTableTest.java:38)
It reaches a branch of our code we did not implement yet. The stack trace indicates it is part of the code written during iteration #07.
Iteration 10: implementation
The following code makes our current test pass:
public class HashTable<K, V> extends iter09.HashTable<K, V> {
@Override
protected V put2(K key, V value, int bucket) {
var index = bucket + 1;
while (true) {
if (index == keys.length) {
index = 0;
}
var existing = keys[index];
if (existing == null) {
return putInsert(key, value, index);
}
index++;
}
}
}
It overrides the put2
method created during iteration #07.
This method is invoked with an unresolved hash collision.
The parameters are:
-
key
= the key theput
method was invoked with; -
value
= the value theput
method was invoked with; and -
bucket
= the computed array index forkey
.
Being a hash collision the specified bucket
index is already occupied by a mapping.
We need to look for an empty bucket.
Iteration 10: looking for an empty bucket
We start looking at the next adjacent bucket:
var index = bucket + 1;
The index
variable indicates our current array index candidate.
The search itself is done in a while
loop:
while (true) {
if (index == keys.length) {
index = 0;
}
...
index++;
}
At the very end of the loop the index
variable is incremented.
So, if an empty bucket is not found at the current candidate, we continue the search with the next one.
Except index
might already be the last index of the array.
When this is the case, we have to continue the search from the beginning of the array.
That's the responsibility of the if
statement at the beginning of the loop.
At the "core" of the loop, we test the current index
candidate:
var existing = keys[index];
if (existing == null) {
return putInsert(key, value, index);
}
If the hash table contains no mappings at the current index
candidate it means we have found our bucket.
We proceed to store the specified key-value pair, at the current index, by invoking the putInsert
method.
Iteration 11: hash collisions and value replacements
During iteration #07 of Part 2 we allowed the put
method to replace the value of an existing mapping.
In this iteration we will implement the same use-case. Except we will do it for a mapping participating in a hash collision.
Iteration 11: test case
Here's our test case:
public class HashTableTest {
@Test(description = """
put() method
- hash collision
- value replacement
""")
public void iter11() {
var ht = new HashTable<Key, String>();
var a = new Key("AAA", 2);
var b = new Key("BBB", 2);
assertEquals(ht.put(a, "aaa"), null);
assertEquals(ht.put(b, "123"), null);
assertEquals(ht.put(b, "bbb"), "123", "\n" + ht.toString());
var c = new Key("CCC", 3);
assertEquals(ht.put(c, "ccc"), null);
assertEquals(
ht.toString(),
"""
+-----+-----+-----+
| idx | key | val |
+-----+-----+-----+
| 0 | CCC | ccc |
| 1 | | |
| 2 | AAA | aaa |
| 3 | BBB | bbb |
+-----+-----+-----+
"""
);
}
}
We create two Key
instances.
Keys a
and b
are guaranteed to cause a hash collision.
We start by associating key a
to the "aaa"
string value.
Then, immediately after we associate key b
to the "123"
value, we replace its value with "bbb"
.
As this is a replacement operation, we verify that the put
method returns the previous value.
Finally, to additionally verify the correctness of our put
method implementation, we associate the "ccc"
value to key c
.
Unlike keys a
and b
, key c
wants to stay at index idx = 3
.
In other words, it does not cause a hash collision with neither key a
nor key b
.
But, as its "natural bucket" is already taken by key b
, it must be put in an alternative location.
So it is stored at the beginning of the array.
Iteration 11: running the test
Let's run our current test against the implementation of the previous iteration:
java.lang.AssertionError:
+-----+-----+-----+
| idx | key | val |
+-----+-----+-----+
| 0 | BBB | bbb |
| 1 | | |
| 2 | AAA | aaa |
| 3 | BBB | 123 |
+-----+-----+-----+
expected [123] but found [null]
at org.testng.Assert.fail(Assert.java:110)
at org.testng.Assert.failNotEquals(Assert.java:1413)
at org.testng.Assert.assertEqualsImpl(Assert.java:149)
at org.testng.Assert.assertEquals(Assert.java:131)
at org.testng.Assert.assertEquals(Assert.java:655)
at iter11.HashTableTest.iter11(HashTableTest.java:39)
The test fails: the put
method does not return the previous mapping.
In fact, observing the error message, it seems our hash table now contains different mappings for the same key.
Iteration 11: implementation
Here's an implementation that makes our test pass:
public class HashTable<K, V> extends iter10.HashTable<K, V> {
@Override
protected final V put2(K key, V value, int bucket) {
var index = bucket + 1;
while (true) {
if (index == keys.length) {
index = 0;
}
var existing = keys[index];
if (existing == null) {
return putInsert(key, value, index);
}
if (existing.equals(key)) {
return putReplace(key, value, index);
}
index++;
}
}
}
Once again, we override the put2
method.
It is the same method we worked on during the previous iteration.
The implementation is almost identical to the one from the last iteration.
The difference is an additional if
statement:
var existing = keys[index];
...
if (existing.equals(key)) {
return putReplace(key, value, index);
}
It tests whether existing
is equal to the specified key
.
When the test evaluates to true
, the corresponding mapped value must be replaced.
We do so by invoking the putReplace
method.
Iteration 11: infinite loop?
You may have noticed that our while
statement contains an always true expression:
while (true) {
...
}
So, unless one of the return
statements in its "body" is executed, the while
statement will run indefinitely.
In fact, we can create a test case that does just that:
@Test(enabled = false, description = """
put() method
- guaranteed to cause an infinite loop
""")
public void iter11_infiniteLoop() {
var ht = new HashTable<Key, String>();
var a = new Key("AAA", 2);
var b = new Key("BBB", 2);
var c = new Key("CCC", 2);
var d = new Key("DDD", 2);
var e = new Key("EEE", 2);
ht.put(a, "aaa");
ht.put(b, "bbb");
ht.put(c, "ccc");
ht.put(d, "ddd");
ht.put(e, "eee");
}
Notice that we are trying to put more mappings than the hash table's capacity of 4
.
The infinite loop is expected in this case.
And there's nothing wrong (as far as I can tell) with our while
loop.
The actual problem is that our hash table does not grow to accommodate more mappings. If the arrays are large enough, then an empty bucket will always be found.
Don't worry. We will implement the resize operation in the next post of this series.
Iteration 12: handling hash collision during get
operations
We are done implementing the hash collision handling for put
operations.
We can now focus on the get
operation.
Let's now have our get
method support retrieving the values associated to keys that cause hash collisions.
Iteration 12: test case
This is our test case:
public class HashTableTest {
@Test(description = """
get() test case
- hash collision
- non existing key
""")
public void iter12() {
var ht = new HashTable<Key, String>();
var a = new Key("AAA", 3);
var b = new Key("BBB", 3);
var c = new Key("CCC", 3);
assertEquals(ht.put(a, "aaa"), null);
assertEquals(ht.put(b, "bbb"), null);
assertEquals(ht.put(c, "ccc"), null);
assertEquals(ht.get(a), "aaa");
assertEquals(ht.get(b), "bbb");
assertEquals(ht.get(c), "ccc");
var d = new Key("DDD", 3);
assertEquals(ht.get(d), null);
}
}
The first section of our test is very similar to the one we wrote during iteration #10:
-
we create three
Key
instances that produce the same hash code value; and -
we proceed to associate a distinct value to each key.
Next, we try to retrieve the values we have just inserted by invoking the get
method.
We verify that the value returned is the one we expect it to be.
For completeness, we verify that the get
method returns null
when the hash table does not contain a mapping for the specified key:
var d = new Key("DDD", 3);
assertEquals(ht.get(d), null);
Note that key d
has the same hash code value as the other three keys.
In other words, if we were to put
it in our map, it would cause a hash collision at the same bucket as the other keys.
Iteration 12: running our test
As expected, running our test against the current implementation fails:
java.lang.UnsupportedOperationException: Implement me
at iter08.HashTable.get1(HashTable.java:29)
at iter08.HashTable.get0(HashTable.java:25)
at iter04.HashTable.get(HashTable.java:33)
at iter12.HashTableTest.iter12(HashTableTest.java:42)
It reaches the method get1
.
It was left unimplemented during iteration #08.
Iteration 12: implementation
The following is the iteration's implementation:
public class HashTable<K, V> extends iter11.HashTable<K, V> {
@SuppressWarnings("unchecked")
@Override
protected final V get1(Object key, int bucket) {
var index = bucket + 1;
while (true) {
if (index == keys.length) {
index = 0;
}
var existing = keys[index];
if (existing == null) {
return null;
}
if (key.equals(existing)) {
return (V) values[index];
}
index++;
}
}
}
We override the get1
method created during iteration #08.
It is called when a hash collision occurs during a get
operation.
The parameters are:
-
key
= the key theget
method was invoked with; and -
bucket
= the computed array index forkey
.
Iteration 12: looking for the specified key
The implementation is very similar to the one of the put
operation:
In fact, the "outer layer" is identical to the put
operation:
we perform the search in a while
loop starting at the next adjacent bucket.
var index = bucket + 1;
while (true) {
if (index == keys.length) {
index = 0;
}
...
index++;
}
The differences are in the "core" of the loop.
If the current bucket is empty, it means our hash table does not contain the specified key
.
We must return null
in this case:
var existing = keys[index];
if (existing == null) {
// empty bucket == key not found
return null;
}
On the other hand, if the bucket is not empty and the stored existing
key is equal to the specified key
, it means we have found the key'. In this case, we return the corresponding value from the
values` array:
var existing = keys[index];
...
if (key.equals(existing)) {
// we've found the key!
// return the corresponding value
return (V) values[index];
}
In the next blog post of this series
In this third blog post of our series our hash table is finally capable of handling hash collisions.
We have implemented the linear probing technique for both the put
and the get
operations.
But the implementation is still incomplete. Trying to put a fifth mapping to our hash table causes an infinite loop.
To fix this issue, our hash table needs to dynamically grow so it can store more mappings. In doing so, all of the key-value mappings must be re-inserted.
This is what we call a rehash operation and will be the topic of the next and final blog post of this series.
Stay tuned!
The source code for all of the examples in this post can be found at this GitHub repository.
Continue reading
The fourth part of this series is already available.
You can continue reading by following the link.