The Java System::identityHashCode method

Marcio Endo
Feb 25, 2024

There's an ongoing OpenJDK project called Lilliput. It is about reducing the size of the object header in the HotSpot JVM.

Roman Kennke recently posted about the effort to have compact identity hash codes.

It got me curious: how does the JVM compute the hash code value of an object when the programmer does not provide one?

In other words, if we do not override the Object::hashCode method in our class, what value is returned when we call the hashCode method on an instance of our class?

Let's learn.

Our running example

Let's create a class named HashCode. It will be a direct Object subclass.

We will simply print an instance of our class to the console:

package blog;

public class HashCode {
  public static void main(String[] args) {
    HashCode o = new HashCode();
    System.out.println(o);
  }
}

When executed it prints a familiar output:

blog.HashCode@543c6f6d

You have probably seen this "class name"@"hex number" output before.

The output comes from the Object::toString implementation.

The Object::toString method

Here's the Object::toString implementation:

public String toString() {
  return getClass().getName() + "@" + Integer.toHexString(hashCode());
}

The value after the '@' character is the hash code of the object in hexadecimal.

This implementation is required by the specification in the method Javadocs:

The toString method for class Object returns a string consisting of the name of the class of which the object is an instance, the at-sign character `@', and the unsigned hexadecimal representation of the hash code of the object.

We did not override the Object::hashCode method either.

So the hex value is being provided by the hashCode method from the Object class.
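To make the formula concrete, here is a minimal sketch that rebuilds the default output by hand and prints it next to the real one. The class name ToStringDemo and the helper describe are made up for this illustration; they are not part of the JDK.

```java
// Sketch only: ToStringDemo and describe() are hypothetical names.
final class ToStringDemo {

    // Mirrors the Object::toString formula quoted above.
    static String describe(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(o.hashCode());
    }

    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(o);            // default Object::toString
        System.out.println(describe(o));  // same text, built manually
    }
}
```

Both lines should be identical, since neither toString nor hashCode is overridden for a plain Object.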

The Object::hashCode method

Here's the Object::hashCode implementation:

@IntrinsicCandidate
public native int hashCode();

It is native. Now what?

Let's see the System::identityHashCode method.

The System::identityHashCode method

Here's an excerpt from the System::identityHashCode documentation:

Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode().

So can we assume that the Object::hashCode uses it internally?

Maybe. Maybe not.

But the values are guaranteed to be the same.

So the Object::hashCode implementation can be thought of as being the following:

public int hashCode() {
  return System.identityHashCode(this);
}

And here's the System::identityHashCode method implementation:

@IntrinsicCandidate
public static native int identityHashCode(Object x);

It is also native.

We will have to look for the implementation in the JVM source code.
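Before leaving Java, the documented guarantee can be checked with a small program. This is only a sketch: the IdentityDemo and Overriding classes are hypothetical, and the constant 42 is an arbitrary choice.

```java
// Sketch only: IdentityDemo and Overriding are hypothetical names.
final class IdentityDemo {

    // A class that overrides hashCode with an arbitrary constant.
    static final class Overriding {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    public static void main(String[] args) {
        Object plain = new Object();
        // No override: both calls return the same value.
        System.out.println(plain.hashCode() == System.identityHashCode(plain));

        Overriding o = new Overriding();
        System.out.println(o.hashCode());                  // value from the override
        System.out.println(System.identityHashCode(o));    // JVM-generated value
        System.out.println(System.identityHashCode(null)); // documented to be zero for null
    }
}
```

The override changes what hashCode returns, but System::identityHashCode still reports the value the default implementation would have produced.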

Searching the JVM source code for the hash code generation

We want to find out how the hash code is computed in the JVM.

The native java.lang.Object class

We know that the System::identityHashCode computes the value we are looking for.

But, just to be sure, let's look for the native Object::hashCode implementation.

The best I could find was the src/java.base/share/native/libjava/Object.c file.

But it only provides the native getClass method implementation:

JNIEXPORT jclass JNICALL
Java_java_lang_Object_getClass(JNIEnv *env, jobject this)
{
    if (this == NULL) {
        JNU_ThrowNullPointerException(env, NULL);
        return 0;
    } else {
        return (*env)->GetObjectClass(env, this);
    }
}

So I had no luck finding the native Object::hashCode implementation.

Let's move on to the System::identityHashCode method.

The native java.lang.System class

The src/java.base/share/native/libjava/System.c file is where one can find the native parts of the java.lang.System class.

And here's the code for the identityHashCode method:

JNIEXPORT jint JNICALL
Java_java_lang_System_identityHashCode(JNIEnv *env, jobject this, jobject x)
{
    return JVM_IHashCode(env, x);
}

It delegates to the JVM_IHashCode symbol.

I don't know C++ so I don't know if the symbol refers to a function, a method or a macro.

In any case, let's continue our search.

The JVM_IHashCode symbol

We find the JVM_IHashCode symbol in the src/hotspot/share/prims/jvm.cpp file:

JVM_ENTRY(jint, JVM_IHashCode(JNIEnv* env, jobject handle))
  // as implemented in the classic virtual machine; return 0 if object is null
  return handle == nullptr ? 0 :
         checked_cast<jint>(ObjectSynchronizer::FastHashCode (THREAD, JNIHandles::resolve_non_null(handle)));
JVM_END

Once again, I don't know C++. Nor do I know the JVM source code.

But I assume it is delegating to the ObjectSynchronizer::FastHashCode method.

The ObjectSynchronizer::FastHashCode method

In the src/hotspot/share/runtime/synchronizer.cpp file we find the FastHashCode method.

It is quite a long method. So here's a relevant section:

hash = mark.hash();
if (hash != 0) {                     // if it has a hash, just return it
  return hash;
}
hash = get_next_hash(current, obj);  // get a new hash

I assume the mark variable refers to the mark word of the Java object header.

So, if the hash is not already computed, it calls the get_next_hash function.

I believe that's the code we are looking for.
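The "compute once, then return the stored value" behavior can be observed from Java. The sketch below assumes nothing beyond the hashCode contract; the class name StableHashDemo and the helper stableAcrossGc are hypothetical, and System.gc() is only a hint, so the object may or may not actually be moved.

```java
// Sketch only: StableHashDemo and stableAcrossGc() are hypothetical names.
final class StableHashDemo {

    // Returns true if the identity hash is unchanged after a GC request.
    static boolean stableAcrossGc(Object o) {
        int before = System.identityHashCode(o);
        System.gc(); // a hint only; the collector may or may not relocate the object
        int after = System.identityHashCode(o);
        return before == after;
    }

    public static void main(String[] args) {
        System.out.println(stableAcrossGc(new Object()));
    }
}
```

The value has to stay the same for the lifetime of the object, which is consistent with it being generated once and then read back on later calls.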

The get_next_hash function

In the same file we find the get_next_hash function.

Here's the code with most of the comments removed:

static inline intptr_t get_next_hash(Thread* current, oop obj) {
  intptr_t value = 0;
  if (hashCode == 0) {
    value = os::random();
  } else if (hashCode == 1) {
    intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
    value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
  } else if (hashCode == 2) {
    value = 1;            // for sensitivity testing
  } else if (hashCode == 3) {
    value = ++GVars.hc_sequence;
  } else if (hashCode == 4) {
    value = cast_from_oop<intptr_t>(obj);
  } else {
    // Marsaglia's xor-shift scheme with thread-specific state
    ...
    unsigned t = current->_hashStateX;
    t ^= (t << 11);
    current->_hashStateX = current->_hashStateY;
    current->_hashStateY = current->_hashStateZ;
    current->_hashStateZ = current->_hashStateW;
    unsigned v = current->_hashStateW;
    v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
    current->_hashStateW = v;
    value = v;
  }
  ...
}

The hash code generation depends on an external flag named hashCode.

There are currently six different generation methods:

the first one, when hashCode == 0, seems to get the next pseudo random value from os::random(). Despite the name, as far as I can tell this is a generator implemented inside the JVM and not a call into the operating system;
the second one, when hashCode == 1, seems to use the memory address of the object. It mixes the bits around before generating the final value;
the third one, when hashCode == 2, simply sets the value to 1;
the fourth one, when hashCode == 3, seems to increment and then get the value of a global variable;
the fifth one, when hashCode == 4, seems to use the memory address of the object as it is; and
the sixth one, from the comment, seems to use Marsaglia's xorshift scheme. In other words, from some state coming from the executing thread, it generates the next pseudo random number.

Please take this list with a grain of salt. These are only my assumptions from reading the code.

Now the questions are:

which generation method is used by default? and
how can one choose which generation method to use?

Let's investigate.

The hashCode flag

In the src/hotspot/share/runtime/globals.hpp file we find the hashCode flag:

product(intx, hashCode, 5, EXPERIMENTAL,
        "(Unstable) select hashCode generation algorithm")

So it seems that:

its default value is 5; and
it is an experimental flag.

We can confirm it by running the following command:

java -XX:+UnlockExperimentalVMOptions \
     -XX:+PrintFlagsFinal 2>/dev/null | grep hashCode

It prints the following:

intx hashCode = 5 {experimental} {default}

So we can set it via the -XX:hashCode=value JVM option. As it is an experimental flag, it must be combined with -XX:+UnlockExperimentalVMOptions.
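The flag can probably also be read from inside a running program. The sketch below uses the HotSpotDiagnosticMXBean from the jdk.management module; the class name HashCodeFlagDemo and the helper hashCodeFlag are hypothetical. I have not verified how the lookup behaves for a locked experimental flag, so the code treats any failure as "not available" and I would expect an empty result unless the JVM is started with -XX:+UnlockExperimentalVMOptions.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

// Sketch only: HashCodeFlagDemo and hashCodeFlag() are hypothetical names.
final class HashCodeFlagDemo {

    // Returns the current value of the hashCode flag, or empty if the JVM does not expose it.
    static Optional<String> hashCodeFlag() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty(); // not a HotSpot-based JVM
            }
            return Optional.of(bean.getVMOption("hashCode").getValue());
        } catch (RuntimeException e) {
            // Assumption: a locked or unknown flag is reported with an exception.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(hashCodeFlag().orElse("hashCode flag not available"));
    }
}
```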

The native Thread class

So, by default, the JVM will use the Marsaglia Xorshift algorithm for the identityHashCode method implementation:

// Marsaglia's xor-shift scheme with thread-specific state
...
unsigned t = current->_hashStateX;
t ^= (t << 11);
current->_hashStateX = current->_hashStateY;
current->_hashStateY = current->_hashStateZ;
current->_hashStateZ = current->_hashStateW;
unsigned v = current->_hashStateW;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
current->_hashStateW = v;
value = v;

The current variable refers to the current thread. The code uses the X and W hash state values directly. The Y and Z are used to store the two previously generated values.

In the src/hotspot/share/runtime/thread.cpp file we can find their initial values:

// thread-specific hashCode stream generator state - Marsaglia shift-xor form
_hashStateX = os::random();
_hashStateY = 842502087;
_hashStateZ = 0x8767;    // (int)(3579807591LL & 0xffff) ;
_hashStateW = 273326509;

The important thing to notice is that the X value is initialized with the next pseudo random number returned by os::random().
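To get a feel for the algorithm, here is a Java transcription of the xorshift step shown above. It is a sketch under assumptions: the class name XorShiftSketch is hypothetical, the seed for X is an arbitrary value chosen by me (the real one comes from os::random()), and whatever the elided parts of get_next_hash do to the value is left out. The unsigned C++ shifts are written with Java's >>> operator.

```java
// Sketch only: XorShiftSketch is a hypothetical name; seedX stands in for os::random().
final class XorShiftSketch {

    private int x;
    private int y = 842502087;
    private int z = 0x8767;
    private int w = 273326509;

    XorShiftSketch(int seedX) {
        this.x = seedX;
    }

    // One step of the generator, following the C++ excerpt line by line.
    int next() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8)); // >>> because the C++ variables are unsigned
        w = v;
        return v;
    }

    public static void main(String[] args) {
        XorShiftSketch generator = new XorShiftSketch(12345); // arbitrary seed, not the JVM's
        for (int i = 0; i < 3; i++) {
            System.out.println(Integer.toHexString(generator.next()));
        }
    }
}
```

With Y, Z and W fixed, the whole sequence is determined by the initial X. Two generators created with the same seed produce the same values, which matters for the test runs that follow.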

Testing our findings

Let's get back to our running example.

For each of the six different generation methods we'll run our example three times.

So, in the command line, we define the following function:

hashCode() { for i in {1..3}; do java -XX:+UnlockExperimentalVMOptions -XX:hashCode=${1} HashCode.java; done }

That's what we'll use to run our example.

Running with -XX:hashCode=0

It should use the following hash code generation:

value = os::random();

When we run hashCode 0 in the command line we get:

blog.HashCode@387162d9
blog.HashCode@387162d9
blog.HashCode@387162d9

Interesting. It produces the same hash code value for different JVM runs.

Either the -XX:hashCode=0 option had no effect or os::random() uses a fixed random seed.

I think it is the latter.

Running with -XX:hashCode=1

It should use the following hash code generation:

intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;

As it seems to use the memory address of the object, we should expect different values in different runs.

When we run hashCode 1 in the command line we get:

blog.HashCode@633c0297
blog.HashCode@633c02eb
blog.HashCode@633df072

OK. We got different values for each run.
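The bit mixing can be tried out in Java with made-up numbers. In this sketch the class name AddressHashSketch, the two addresses and the stwRandom value are all hypothetical, as a Java program has no access to real object addresses or to GVars.stw_random.

```java
// Sketch only: AddressHashSketch and all input values are hypothetical.
final class AddressHashSketch {

    // Same arithmetic as the C++ excerpt, with the address passed in as a plain long.
    static long mix(long address, long stwRandom) {
        long addrBits = address >> 3;
        return addrBits ^ (addrBits >> 5) ^ stwRandom;
    }

    public static void main(String[] args) {
        long stwRandom = 0x5bd1e995L;   // made-up stand-in for GVars.stw_random
        long first = 0x7f0012345678L;   // made-up address
        long second = first + 16;       // a neighbor 16 bytes away
        System.out.println(Long.toHexString(mix(first, stwRandom)));
        System.out.println(Long.toHexString(mix(second, stwRandom)));
    }
}
```

Nearby addresses only differ in their low bits after mixing, which may be why the three values above share their leading hex digits.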

Running with -XX:hashCode=2

It should use the following hash code generation:

value = 1

When we run hashCode 2 in the command line we get:

blog.HashCode@1
blog.HashCode@1
blog.HashCode@1

So we get the constant hash code.
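A constant hash code is legal, it just makes every object collide. The sketch below imitates the situation in plain Java by overriding hashCode to return 1, so it does not need the JVM flag. The names ConstantHashDemo, Key and lastValue are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: ConstantHashDemo, Key and lastValue() are hypothetical names.
final class ConstantHashDemo {

    // Every key reports the same hash code, imitating -XX:hashCode=2.
    static final class Key {
        private final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 1;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Key && ((Key) other).name.equals(name);
        }
    }

    // Stores n colliding keys and looks up the last one.
    static int lastValue(int n) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key("k" + i), i);
        }
        return map.get(new Key("k" + (n - 1)));
    }

    public static void main(String[] args) {
        System.out.println(lastValue(1000));
    }
}
```

The map still returns the right answer because equals resolves the collisions. Only the performance suffers, which is presumably what the "for sensitivity testing" comment is about.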

Running with -XX:hashCode=3

It should use the following hash code generation:

value = ++GVars.hc_sequence;

When we run hashCode 3 in the command line we get:

blog.HashCode@5d1
blog.HashCode@5d1
blog.HashCode@5d1

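The hash code values are all the same. Each run starts a fresh JVM, so the sequence presumably restarts from the same value and our object lands on the same position every time.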

We need to use a different program:

package blog;

public class HashCode3 {
  public static void main(String[] args) {
    System.out.println(new HashCode3());
    System.out.println(new HashCode3());
    System.out.println(new HashCode3());
  }
}

It creates three different instances and prints each one of them.

Let's run it a single time:

java -XX:+UnlockExperimentalVMOptions -XX:hashCode=3 HashCode3.java

It prints:

blog.HashCode3@5da
blog.HashCode3@5db
blog.HashCode3@5dc

The hash code values are in sequence.

Running with -XX:hashCode=4

It should use the following hash code generation:

value = cast_from_oop<intptr_t>(obj);

It also seems to use the memory address of the object. We should expect different values for each run.

When we run hashCode 4 in the command line we get:

blog.HashCode@20e8c4d0
blog.HashCode@20e72e28
blog.HashCode@20e76398

So we got different values for each run.

Running with -XX:hashCode=5

It should use the Marsaglia Xorshift hash code generation:

unsigned t = current->_hashStateX;
t ^= (t << 11);
current->_hashStateX = current->_hashStateY;
current->_hashStateY = current->_hashStateZ;
current->_hashStateZ = current->_hashStateW;
unsigned v = current->_hashStateW;
v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
current->_hashStateW = v;
value = v;

When we run hashCode 5 in the command line we get:

blog.HashCode@543c6f6d
blog.HashCode@543c6f6d
blog.HashCode@543c6f6d

Similarly to our first run, we get the same value for each distinct run.

As discussed in the "The native Thread class" section, this generation method takes its initial X state from os::random().

And, as mentioned earlier, the seed for the os::random() generator seems to be a fixed number:

volatile unsigned int os::_rand_seed      = 1234567;

As a final note, since this is the default option, we should get the same value when we run our program without options:

$ java HashCode.java
blog.HashCode@543c6f6d

And we get the same hash code value.

Conclusion

So, if we do not override the Object::hashCode method in our class, what value is returned when we call the hashCode method on an instance of our class?

The System::identityHashCode static method is responsible for returning the value.

If the object does not already have an identity hash code computed then a new one is generated by the get_next_hash function in the (HotSpot) JVM.

At the time of writing, the get_next_hash function provides six different ways to generate the hash code.

We can select the generation method by using the -XX:hashCode=value JVM option.

The default generation method is a thread-local pseudo random generator which uses the Marsaglia Xorshift method.

You can find the source code of the examples in this GitHub repository.

More on this subject

If you want to learn more about the identity hash code then read Aleksey Shipilëv's post on the subject.