The Java System::identityHashCode method
There's an ongoing OpenJDK project called Lilliput. It aims to reduce the size of the object header in the HotSpot JVM.
Roman Kennke recently posted about the effort to have compact identity hash codes.
It got me curious: how does the JVM compute the hash code value of an object when the programmer does not provide one?
In other words, if we do not override the Object::hashCode method in our class, what value is returned when we call the hashCode method on an instance of our class?
Let's learn.
Our running example
Let's create a class named HashCode. It will be a direct Object subclass.
We will simply print an instance of our class to the console:
package blog;
public class HashCode {
public static void main(String[] args) {
HashCode o = new HashCode();
System.out.println(o);
}
}
When executed, it prints familiar output:
blog.HashCode@543c6f6d
You have probably seen this "class name"@"hex number" output before.
It comes from the Object::toString implementation.
The Object::toString method
Here's the Object::toString implementation:
public String toString() {
return getClass().getName() + "@" + Integer.toHexString(hashCode());
}
The value after the '@' character is the hash code of the object in hexadecimal.
This implementation is required by the specification in the method's Javadoc:
The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `@', and the unsigned hexadecimal representation of the hash code of the object.
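We can check this contract from Java itself. For a class that overrides neither toString nor hashCode, the printed form should equal the expression from the Javadoc. The class name ToStringCheck is only for illustration.

```java
public class ToStringCheck {
    public static void main(String[] args) {
        Object o = new ToStringCheck();
        // Rebuild the string exactly the way the Object::toString Javadoc describes it
        String expected = o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
        System.out.println(o);                              // of the form ToStringCheck@<hex digits>
        System.out.println(expected.equals(o.toString()));  // true
    }
}
```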
We did not override the Object::hashCode method either.
So the hex value is provided by the hashCode method of the Object class.
The Object::hashCode method
Here's the Object::hashCode implementation:
@IntrinsicCandidate
public native int hashCode();
It is native. Now what?
Let's look at the System::identityHashCode method.
The System::identityHashCode method
Here's an excerpt from the System::identityHashCode documentation:
Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode().
So can we assume that Object::hashCode uses it internally?
Maybe. Maybe not.
But the values are guaranteed to be the same.
So the Object::hashCode implementation can be thought of as the following:
public int hashCode() {
return System.identityHashCode(this);
}
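The guarantee quoted above can be observed directly: for a class that does not override hashCode, both methods return the same value, while for a class that does override it, such as String, System::identityHashCode still returns the identity-based value. The class name IdentityCheck is illustrative.

```java
public class IdentityCheck {
    public static void main(String[] args) {
        Object plain = new IdentityCheck();
        // IdentityCheck does not override hashCode, so both calls must agree
        System.out.println(plain.hashCode() == System.identityHashCode(plain)); // true

        String s = "hello";
        // String overrides hashCode: this value is computed from the characters
        System.out.println(s.hashCode());               // 99162322
        // identityHashCode ignores the override and returns the identity-based value
        System.out.println(System.identityHashCode(s));
    }
}
```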
And here's the System::identityHashCode method implementation:
@IntrinsicCandidate
public static native int identityHashCode(Object x);
It is also native.
We will have to look for the implementation in the JVM source code.
Searching the JVM source code for the hash code generation
We want to find out how the hash code is computed in the JVM.
The native java.lang.Object class
We know that System::identityHashCode computes the value we are looking for.
But, just to be sure, let's look for the native Object::hashCode implementation.
The best I could find was the src/java.base/share/native/libjava/Object.c file.
But it only provides the implementation of the native getClass method:
JNIEXPORT jclass JNICALL
Java_java_lang_Object_getClass(JNIEnv *env, jobject this)
{
if (this == NULL) {
JNU_ThrowNullPointerException(env, NULL);
return 0;
} else {
return (*env)->GetObjectClass(env, this);
}
}
So I had no luck finding the native Object::hashCode implementation.
Let's move on to the System::identityHashCode method.
The native java.lang.System class
The src/java.base/share/native/libjava/System.c file is where one can find the native parts of the java.lang.System class.
And here's the code for the identityHashCode method:
JNIEXPORT jint JNICALL
Java_java_lang_System_identityHashCode(JNIEnv *env, jobject this, jobject x)
{
return JVM_IHashCode(env, x);
}
It delegates to the JVM_IHashCode symbol.
I don't know C++, so I can't tell whether the symbol refers to a function, a method or a macro.
In any case, let's continue our search.
The JVM_IHashCode symbol
We find the JVM_IHashCode symbol in the src/hotspot/share/prims/jvm.cpp file:
JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
// as implemented in the classic virtual machine; return 0 if object is null
return handle == nullptr ? 0 :
checked_cast<jint>(ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)));
JVM_END
Once again, I don't know C++. Nor do I know the JVM source code.
But I assume it delegates to the ObjectSynchronizer::FastHashCode method.
The ObjectSynchronizer::FastHashCode method
In the src/hotspot/share/runtime/synchronizer.cpp file we find the FastHashCode method.
It is quite a long method, so here's the relevant section:
hash = mark.hash();
if (hash != 0) { // if it has a hash, just return it
return hash;
}
hash = get_next_hash(current, obj); // get a new hash
I assume the mark variable refers to the mark word of the Java object header.
So, if the hash has not been computed yet, it calls the get_next_hash function.
I believe that's the code we are looking for.
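The "return the stored hash, otherwise generate and store one" logic can be modeled in plain Java. This is only a sketch: the hash field standing in for the hash bits of the mark word and the generator passed in are hypothetical, and the model ignores the locking and races the real JVM has to handle. Like the C++ excerpt, it treats 0 as "no hash yet", so it assumes the generator never returns 0.

```java
import java.util.function.IntSupplier;

public class LazyIdentityHash {
    // Hypothetical stand-in for the hash bits of the object header; 0 means "not computed yet"
    private int hash;

    // Mirrors the shape of the FastHashCode excerpt: reuse the stored hash or generate one
    public int fastHashCode(IntSupplier nextHash) {
        int h = hash;
        if (h != 0) {              // if it has a hash, just return it
            return h;
        }
        h = nextHash.getAsInt();   // get a new hash (assumed never to be 0)
        hash = h;                  // store it so later calls return the same value
        return h;
    }

    public static void main(String[] args) {
        LazyIdentityHash obj = new LazyIdentityHash();
        int[] counter = {0};
        IntSupplier generator = () -> ++counter[0];        // hypothetical generator
        System.out.println(obj.fastHashCode(generator));   // 1
        System.out.println(obj.fastHashCode(generator));   // 1 again: the stored value is reused
    }
}
```

This caching is why the identity hash code of an object stays the same for its whole lifetime, no matter which generation method produced it.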
The get_next_hash function
In the same file we find the get_next_hash function.
Here's the code with most of the comments removed:
static inline intptr_t get_next_hash(Thread* current, oop obj) {
intptr_t value = 0;
if (hashCode == 0) {
value = os::random();
} else if (hashCode == 1) {
intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
} else if (hashCode == 2) {
value = 1; // for sensitivity testing
} else if (hashCode == 3) {
value = ++GVars.hc_sequence;
} else if (hashCode == 4) {
value = cast_from_oop<intptr_t>(obj);
} else {
// Marsaglia's xor-shift scheme with thread-specific state
...
unsigned t = current->_hashStateX;
t ^= (t << 11);
current->_hashStateX = current->_hashStateY;
current->_hashStateY = current->_hashStateZ;
current->_hashStateZ = current->_hashStateW;
unsigned v = current->_hashStateW;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
current->_hashStateW = v;
value = v;
}
...
}
The hash code generation depends on a JVM flag named hashCode.
There are currently six different generation methods:
- the first one, when hashCode == 0, seems to take the next value from os::random(), the JVM's own pseudo-random number generator;
- the second one, when hashCode == 1, seems to use the memory address of the object, mixing its bits with GVars.stw_random before producing the final value;
- the third one, when hashCode == 2, simply sets the value to 1;
- the fourth one, when hashCode == 3, seems to increment a global counter and use its new value;
- the fifth one, when hashCode == 4, seems to use the memory address of the object as is; and
- the sixth one, for any other value, seems to use Marsaglia's xorshift scheme, judging by the comment. In other words, it generates the next pseudo-random number from state kept in the executing thread.
Please take this list with a grain of salt. These are only my assumptions from reading the code.
Now the questions are:
- which generation method is used by default? and
- how can one choose which generation method to use?
Let's investigate.
The hashCode flag
In the src/hotspot/share/runtime/globals.hpp file we find the hashCode flag:
product(intx, hashCode, 5, EXPERIMENTAL,
"(Unstable) select hashCode generation algorithm")
So it seems that:
- its default value is 5; and
- it is an experimental flag.
We can confirm it by running the following command:
java -XX:+UnlockExperimentalVMOptions \
-XX:+PrintFlagsFinal 2>/dev/null | grep hashCode
It prints the following:
intx hashCode = 5 {experimental} {default}
So we can set it via the -XX:hashCode=value JVM option.
Since it is an experimental flag, it must be preceded by -XX:+UnlockExperimentalVMOptions.
The native Thread class
So, by default, the JVM uses the Marsaglia xorshift algorithm for the identityHashCode method implementation:
// Marsaglia's xor-shift scheme with thread-specific state
...
unsigned t = current->_hashStateX;
t ^= (t << 11);
current->_hashStateX = current->_hashStateY;
current->_hashStateY = current->_hashStateZ;
current->_hashStateZ = current->_hashStateW;
unsigned v = current->_hashStateW;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
current->_hashStateW = v;
value = v;
The current variable refers to the current thread.
The code uses the X and W hash state values directly.
The Y and Z values are used to store the two previously generated values.
In the src/hotspot/share/runtime/thread.cpp file we can find their initial values:
// thread-specific hashCode stream generator state - Marsaglia shift-xor form
_hashStateX = os::random();
_hashStateY = 842502087;
_hashStateZ = 0x8767; // (int)(3579807591LL & 0xffff) ;
_hashStateW = 273326509;
The important thing to notice is that the X value is initialized with the next pseudo-random number from os::random().
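To see how the per-thread state evolves, here is a Java port of the xorshift branch using the initial Y, Z and W values from the thread.cpp excerpt. It is a sketch under assumptions: the X seed is passed in by the caller instead of coming from os::random(), and it does not try to reproduce the exact hash codes the JVM prints, since the elided parts of get_next_hash are not modeled. The C code operates on unsigned values, so its right shift is a logical shift, which in Java is >>> on an int.

```java
public class XorShiftHashState {
    // Initial state from the thread.cpp excerpt; in the JVM, X comes from os::random()
    private int x;
    private int y = 842502087;
    private int z = 0x8767;
    private int w = 273326509;

    public XorShiftHashState(int seedX) {
        this.x = seedX;
    }

    // Port of the hashCode == 5 branch of get_next_hash
    public int next() {
        int t = x;
        t ^= (t << 11);
        x = y;           // slide the state window: X <- Y <- Z <- W
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;           // the newest value becomes W
        return v;
    }

    public static void main(String[] args) {
        XorShiftHashState a = new XorShiftHashState(12345);
        XorShiftHashState b = new XorShiftHashState(12345);
        XorShiftHashState c = new XorShiftHashState(54321);
        for (int i = 0; i < 3; i++) {
            // a and b share the same X seed, so they print the same column; c diverges
            System.out.println(Integer.toHexString(a.next()) + " "
                    + Integer.toHexString(b.next()) + " "
                    + Integer.toHexString(c.next()));
        }
    }
}
```

Because Y, Z and W start from constants, the whole stream of a thread is determined by its X seed. If the X seed is the same in every run, so is the sequence of identity hash codes.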
Testing our findings
Let's get back to our running example.
For each of the six different generation methods we'll run our example three times.
So, in the command line, we define the following function:
hashCode() { for i in {1..3}; do java -XX:+UnlockExperimentalVMOptions -XX:hashCode=${1} HashCode.java; done }
That's what we'll use to run our example.
Running with -XX:hashCode=0
It should use the following hash code generation:
value = os::random();
When we run hashCode 0 on the command line, we get:
blog.HashCode@387162d9
blog.HashCode@387162d9
blog.HashCode@387162d9
Interesting. It produces the same hash code value across different JVM runs.
Either the -XX:hashCode=0 option had no effect or os::random() uses a fixed seed.
I think it is the latter.
Running with -XX:hashCode=1
It should use the following hash code generation:
intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
As it seems to use the memory address of the object, we should expect different values in different runs.
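The bit mixing itself is easy to port to Java. In this sketch, address and stwRandom are hypothetical inputs standing in for the object's address and GVars.stw_random, which Java code cannot read. Shifting right by 3 presumably drops bits that are always zero because objects are 8-byte aligned; that reading is my assumption.

```java
public class AddressMixHash {
    // Port of the hashCode == 1 branch; both parameters are hypothetical stand-ins
    public static long mix(long address, long stwRandom) {
        long addrBits = address >> 3;   // drop the low 3 bits of the address
        return addrBits ^ (addrBits >> 5) ^ stwRandom;
    }

    public static void main(String[] args) {
        long stwRandom = 0x5eedL;       // hypothetical GVars.stw_random value
        long first = 0x7f0000001000L;   // hypothetical object address
        long second = first + 16;       // another object 16 bytes later
        System.out.println(Long.toHexString(mix(first, stwRandom)));
        System.out.println(Long.toHexString(mix(second, stwRandom)));
    }
}
```

Since the result depends on where the object happens to be allocated, the same program can produce a different hash in each run.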
When we run hashCode 1 on the command line, we get:
blog.HashCode@633c0297
blog.HashCode@633c02eb
blog.HashCode@633df072
OK. We got different values for each run.
Running with -XX:hashCode=2
It should use the following hash code generation:
value = 1; // for sensitivity testing
When we run hashCode 2 on the command line, we get:
blog.HashCode@1
blog.HashCode@1
blog.HashCode@1
So we get the constant hash code 1.
Running with -XX:hashCode=3
It should use the following hash code generation:
value = ++GVars.hc_sequence;
When we run hashCode 3 on the command line, we get:
blog.HashCode@5d1
blog.HashCode@5d1
blog.HashCode@5d1
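The hash code values are all the same. Each run starts a fresh JVM, so the global counter presumably reaches the same value by the time our single object is hashed.
To see the sequence, we need several objects in the same run, so we use a different program: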
package blog;
public class HashCode3 {
public static void main(String[] args) {
System.out.println(new HashCode3());
System.out.println(new HashCode3());
System.out.println(new HashCode3());
}
}
It creates three different instances and prints each one of them.
Let's run it a single time:
java -XX:+UnlockExperimentalVMOptions -XX:hashCode=3 HashCode3.java
It prints:
blog.HashCode3@5da
blog.HashCode3@5db
blog.HashCode3@5dc
The hash code values are in sequence.
Running with -XX:hashCode=4
It should use the following hash code generation:
value = cast_from_oop<intptr_t>(obj);
It also seems to use the memory address of the object. We should expect different values for each run.
When we run hashCode 4 on the command line, we get:
blog.HashCode@20e8c4d0
blog.HashCode@20e72e28
blog.HashCode@20e76398
So we got different values for each run.
Running with -XX:hashCode=5
It should use the Marsaglia xorshift hash code generation:
unsigned t = current->_hashStateX;
t ^= (t << 11);
current->_hashStateX = current->_hashStateY;
current->_hashStateY = current->_hashStateZ;
current->_hashStateZ = current->_hashStateW;
unsigned v = current->_hashStateW;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
current->_hashStateW = v;
value = v;
When we run hashCode 5 on the command line, we get:
blog.HashCode@543c6f6d
blog.HashCode@543c6f6d
blog.HashCode@543c6f6d
As in our first run, we get the same value across distinct runs.
As discussed in the "The native Thread class" section, this generation method relies on os::random() to initialize the X state.
And, as mentioned earlier, the seed of the os::random() generator seems to be a fixed number:
volatile unsigned int os::_rand_seed = 1234567;
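Why does a fixed seed make the output repeat? Any deterministic generator started from the same seed yields the same sequence. My reading of HotSpot's os.cpp is that os::random() is a Park-Miller "minimal standard" generator (multiplier 16807, modulus 2^31 - 1); treat that as an assumption. The sketch below uses that recurrence with the seed shown above. It only illustrates the determinism and does not claim to reproduce the JVM's actual hash values.

```java
public class FixedSeedRandom {
    private static final long MODULUS = 2147483647L; // 2^31 - 1
    private static final long MULTIPLIER = 16807L;
    private long seed;

    public FixedSeedRandom(long seed) {
        this.seed = seed;
    }

    // One Park-Miller step (assumed to match os::random()): seed = seed * 16807 mod (2^31 - 1)
    public int next() {
        seed = (seed * MULTIPLIER) % MODULUS;
        return (int) seed;
    }

    public static void main(String[] args) {
        // Two "JVM runs" that both start from the fixed seed
        FixedSeedRandom run1 = new FixedSeedRandom(1234567);
        FixedSeedRandom run2 = new FixedSeedRandom(1234567);
        for (int i = 0; i < 3; i++) {
            System.out.println(run1.next() == run2.next()); // true
        }
    }
}
```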
As a final note, since this is the default option, we should get the same value when we run our program without options:
$ java HashCode.java
blog.HashCode@543c6f6d
And we get the same hash code value.
Conclusion
So, if we do not override the Object::hashCode method in our class, what value is returned when we call the hashCode method on an instance of our class?
It is the object's identity hash code, the same value returned by the System::identityHashCode static method.
If the object does not have an identity hash code yet, a new one is generated by the get_next_hash function in the (HotSpot) JVM.
At the time of writing, the get_next_hash function provides six different ways to generate the hash code.
We can select the generation method by using the -XX:hashCode=value JVM option, together with -XX:+UnlockExperimentalVMOptions.
The default generation method is a thread-local pseudo-random generator that uses Marsaglia's xorshift method.
You can find the source code of the examples in this GitHub repository.
If you want to learn more about the identity hash code then read Aleksey Shipilëv's post on the subject.