Scanning Java Class Files #3: The Constant Pool

Marcio Endo, Mar 9, 2025

In the previous post in this series, we learned that a Java class file consists of a single ClassFile structure, whose first three items are magic, minor_version and major_version. We started building our Java class file scanner, and, by the end of that post, our scanner could successfully read these initial elements.

These three items immediately precede the constant pool. So, in this third blog post in the series, we'll improve our implementation and have it traverse the constant pool of a Java class file.

A Note on the Class-File API

We will not use the Class-File API in this blog post series, as it abstracts away the low-level details of the Java class file format we want to learn about.

Our Class File

Here's the source code of the class whose constant pool we'll traverse:

void main() {
  IO.println("Hello, World!");
}

It is the same class from the previous post. Let's compile it using JDK 23 with preview language features enabled:

$ javac --enable-preview --release 23 HelloWorld.java

To verify the correctness of our implementation, we'll compare our output to the following simplified output from the javap tool:

Constant pool:
   #1 = Methodref          #2.#3
   #2 = Class              #4
   #3 = NameAndType        #5:#6
   #4 = Utf8               java/lang/Object
   #5 = Utf8               <init>
   #6 = Utf8               ()V
   #7 = String             #8
   #8 = Utf8               Hello, World!
   #9 = Methodref          #10.#11
  #10 = Class              #12
  #11 = NameAndType        #13:#14
  #12 = Utf8               java/io/IO
  #13 = Utf8               println
  #14 = Utf8               (Ljava/lang/Object;)V
  #15 = Class              #16
  #16 = Utf8               HelloWorld
  #17 = Utf8               Code
  #18 = Utf8               LineNumberTable
  #19 = Utf8               main
  #20 = Utf8               SourceFile
  #21 = Utf8               HelloWorld.java

Let's continue our implementation.

The Constant Pool

Our implementation currently reads the first three items of the ClassFile structure. The next two items make up the constant pool of the Java class file, as shown below:

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    ...
}

The fourth item is the constant_pool_count. It is a u2 value indicating the number of entries in the constant pool plus one. For example, if its value is four, then the constant pool contains three entries, labeled #1, #2, and #3. In other words, the constant pool uses 1-based indexing, starting at #1 rather than #0.
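As a small illustration of this arithmetic, the sketch below reads the u2 found at offset 8 (right after the 4-byte magic and the two 2-byte version items) of a hand-written header, and derives the number of the last entry from it. The CountDemo class, its method names and the header bytes are made up for this example. They are not part of our scanner, and the sketch assumes the big-endian byte order used by the class file format.

```java
final class CountDemo {

  // Reads a big-endian u2 (unsigned 16-bit value) starting at the given offset.
  static int readU2(byte[] data, int offset) {
    int b0 = data[offset] & 0xFF;
    int b1 = data[offset + 1] & 0xFF;
    return (b0 << 8) | b1;
  }

  // constant_pool_count is the number of entries plus one,
  // so the last entry is numbered constant_pool_count - 1.
  static int lastEntryNumber(int constantPoolCount) {
    return constantPoolCount - 1;
  }

  public static void main(String[] args) {
    // Hypothetical header: magic, minor_version, major_version, constant_pool_count
    byte[] header = {
        (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
        0x00, 0x00,                                         // minor_version
        0x00, 0x43,                                         // major_version
        0x00, 0x04                                          // constant_pool_count = 4
    };

    int count = readU2(header, 8);

    System.out.println("constant_pool_count = " + count);
    System.out.println("entries: #1 to #" + lastEntryNumber(count));
  }
}
```

With a count of four, the sketch reports entries #1 to #3, matching the example in the text.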

The fifth item is the constant_pool table. It is a stream of constant_pool_count - 1 elements of type cp_info, which vary in size. As a result, we can only know where the constant pool ends by traversing it.

Entries of Varying Size

We can see the varying-size aspect of the entries in the constant pool in the javap output. For example, let's look at entries #7 and #8:

Constant pool:
   ...
   #7 = String             #8
   #8 = Utf8               Hello, World!
   ...

We see that entry #7 is of type String and, from the output, we can assume that its size is equal to "one constant pool index". The type of constant_pool_count is u2, so it is safe to assume that "one constant pool index" is a 2-byte quantity. We see that entry #8 is of type Utf8 and its size seems to be equal to the length of the Hello, World! string, which is greater than two bytes.

So, even though we know the number of entries in the constant pool table, we don't know what its total length is. In order to find the end of the constant pool we have to traverse it.

Additionally, a constant pool index does not translate automatically to an offset inside the constant pool table. So, while traversing the constant pool, we need to store the position where we found the first byte of each entry.
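To make the size difference concrete, here is a rough sketch that builds the bytes of two entries shaped like #7 and #8 and prints their sizes. It looks ahead slightly: the tag values (8 for String, 1 for Utf8) and the 2-byte length prefix of a Utf8 entry come from the JVMS layouts discussed later in this post. The EntrySizeDemo class and its methods are hypothetical helpers, and the Utf8 helper only handles ASCII text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class EntrySizeDemo {

  // Builds the bytes of a String entry (tag 8) that refers to entry #8.
  static byte[] stringEntry() {
    return ByteBuffer.allocate(3)
        .put((byte) 8)        // tag
        .putShort((short) 8)  // one constant pool index (u2)
        .array();
  }

  // Builds the bytes of a Utf8 entry (tag 1). Assumes ASCII-only text.
  static byte[] utf8Entry(String asciiText) {
    byte[] bytes = asciiText.getBytes(StandardCharsets.US_ASCII);
    return ByteBuffer.allocate(3 + bytes.length)
        .put((byte) 1)                   // tag
        .putShort((short) bytes.length)  // length (u2)
        .put(bytes)                      // the characters themselves
        .array();
  }

  public static void main(String[] args) {
    System.out.println("String entry size: " + stringEntry().length);
    System.out.println("Utf8 entry size:   " + utf8Entry("Hello, World!").length);
  }
}
```

The String entry always takes 3 bytes, while the Utf8 entry for Hello, World! takes 16. If the String entry starts at some offset, the Utf8 entry that follows it starts 3 bytes later, which is why an entry number alone cannot tell us where an entry begins.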

05: The Constant Pool Loop

With the information we have right now, we can start writing the constant pool traversing code:

private int[] constantPoolIndex;

private int entry = 1;

private void traverseConstantPool() throws IOException {
  System.out.println("Constant pool:");

  final int constantPoolCount;
  constantPoolCount = readU2();

  constantPoolIndex = new int[constantPoolCount];

  while (entry < constantPoolCount) {
    constantPoolIndex[entry] = idx;

    readConstantPoolEntry();

    entry++;
  }
}

private void readConstantPoolEntry() {
  ...
}

And here's a breakdown of the code above:

  1. We begin the method by printing the "Constant pool:" message that mimics the output from the javap tool.

  2. Next, using the readU2 method created in the previous post, we read the fourth item of the ClassFile structure, namely the constant_pool_count value, and store its value in the constantPoolCount local variable.

  3. We initialize the constantPoolIndex array with a size equal to constantPoolCount obtained in the previous step. This array stores the position in the ClassFile structure where we'll find the first byte of a given constant pool entry.

  4. Next, we have the constant pool loop itself. The entry instance variable represents the number of the constant pool entry being currently read. As mentioned earlier, the constant pool uses 1-based indexing, so the entry variable starts with the number 1. We keep iterating while the entry number is smaller than the constantPoolCount value.

  5. At the beginning of each iteration, we store the current value of the idx instance variable in the constantPoolIndex array, at the position given by the entry number. In other words, we record the position of the first byte of the current constant pool entry.

  6. Next, we invoke the readConstantPoolEntry method, currently a stub with no real implementation. It will be responsible for reading each constant pool entry.

  7. Finally, at the end of each iteration, we increment the entry variable value.

Next, in order to implement the readConstantPoolEntry method, we'll examine the structure of the entries of the constant pool.

The Constant Pool Entry Structure

We find the definition of the cp_info structure in Section 4.4 of the Java Virtual Machine Specification (JVMS). It defines that all constant pool entries have the following format:

cp_info {
    u1 tag;
    u1 info[];
}

So, each entry in the constant pool:

  • Begins with a 1-byte value named tag, which indicates the kind of the constant of the entry.

  • Ends with two or more bytes, providing additional information specific to the kind of the constant.

As of Java 23, there are 17 kinds of constant pool entries, each with a distinct tag value.

06: The Constant Pool Tags

We find a listing of the constant kinds and their tag values in Table 4.4-B of the JVMS. Translating it to Java code gives us the following:

private static final byte CONSTANT_Utf8 = 1;
private static final byte CONSTANT_Integer = 3;
private static final byte CONSTANT_Float = 4;
private static final byte CONSTANT_Long = 5;
private static final byte CONSTANT_Double = 6;
private static final byte CONSTANT_Class = 7;
private static final byte CONSTANT_String = 8;
private static final byte CONSTANT_Fieldref = 9;
private static final byte CONSTANT_Methodref = 10;
private static final byte CONSTANT_InterfaceMethodref = 11;
private static final byte CONSTANT_NameAndType = 12;
private static final byte CONSTANT_MethodHandle = 15;
private static final byte CONSTANT_MethodType = 16;
private static final byte CONSTANT_Dynamic = 17;
private static final byte CONSTANT_InvokeDynamic = 18;
private static final byte CONSTANT_Module = 19;
private static final byte CONSTANT_Package = 20;

So, a Utf8 constant pool entry has a tag value of 1, an Integer constant pool entry has a tag value of 3, a Float constant pool entry has a tag value of 4, and so on.
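Note that the tag values are not contiguous: the listing has no kind for the values 2, 13 and 14. The following sketch, which is not part of our scanner, maps a tag value to the kind name shown by javap and makes those gaps visible. The TagNames class and its nameOf method are names invented for this example.

```java
final class TagNames {

  // Maps a tag value to the kind name used in the javap output.
  // Returns null for values that do not appear in the listing above.
  static String nameOf(int tag) {
    return switch (tag) {
      case 1 -> "Utf8";
      case 3 -> "Integer";
      case 4 -> "Float";
      case 5 -> "Long";
      case 6 -> "Double";
      case 7 -> "Class";
      case 8 -> "String";
      case 9 -> "Fieldref";
      case 10 -> "Methodref";
      case 11 -> "InterfaceMethodref";
      case 12 -> "NameAndType";
      case 15 -> "MethodHandle";
      case 16 -> "MethodType";
      case 17 -> "Dynamic";
      case 18 -> "InvokeDynamic";
      case 19 -> "Module";
      case 20 -> "Package";
      default -> null;
    };
  }

  public static void main(String[] args) {
    for (int tag = 1; tag <= 20; tag++) {
      String name = nameOf(tag);
      System.out.println(tag + " -> " + (name != null ? name : "(not assigned)"));
    }
  }
}
```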

07: Partial readConstantPoolEntry Implementation

With all of the constant pool tags listed, we write the following partial readConstantPoolEntry implementation:

private void readConstantPoolEntry() throws IOException {
  byte tag;
  tag = readU1();

  switch (tag) {
    case CONSTANT_Utf8 -> { ... }
    case CONSTANT_Integer -> { ... }
    case CONSTANT_Float -> { ... }
    case CONSTANT_Long -> { ... }
    case CONSTANT_Double -> { ... }
    case CONSTANT_Class -> { ... }
    case CONSTANT_String -> { ... }
    case CONSTANT_Fieldref -> { ... }
    case CONSTANT_Methodref -> { ... }
    case CONSTANT_InterfaceMethodref -> { ... }
    case CONSTANT_NameAndType -> { ... }
    case CONSTANT_MethodHandle -> { ... }
    case CONSTANT_MethodType -> { ... }
    case CONSTANT_Dynamic -> { ... }
    case CONSTANT_InvokeDynamic -> { ... }
    case CONSTANT_Module -> { ... }
    case CONSTANT_Package -> { ... }

    default -> throw new IOException("Unknown constant pool tag=" + tag);
  }
}

private byte readU1() throws IOException {
  check(1);

  return data[idx++];
}

We begin the method by reading the tag value of the current constant pool entry. Then, we switch over the tag value, with each case being a constant we defined in the previous section. In the default case, when the tag value does not correspond to any of the specified values, we complete the switch statement by throwing an IOException.

We didn't implement the body of the switch rules, but we know that each rule should:

  1. Print the entry number and its type.

  2. Print the contents of each entry, if possible.

  3. Leave idx pointing to the first byte of the next constant pool entry or, after the last entry, to the first byte of the structure that comes after the constant pool.

What's in a Constant Pool Entry?

At the time of writing, there are 17 kinds of constant pool entries. Discussing all of them is beyond the scope of this blog post. Right now, we're interested in traversing the constant pool. In order to do that, we need to know the size of each entry kind.

As an example, let's look at the String constant pool entry. Its structure is defined in Section 4.4.3 of the JVMS:

CONSTANT_String_info {
    u1 tag;
    u2 string_index;
}

So, after its tag value, we have the string_index value whose type is u2. The latter represents the index of another entry of type Utf8 in the same constant pool, which should contain the actual string literal contents. So, if we know we are reading a String entry, meaning we've already read its tag value, we only need to read a u2 value in order to get to the next constant pool entry.
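The relationship between a String entry and its Utf8 entry can be sketched as follows. This is only an illustration under a few assumptions: the StringEntryDemo class is hypothetical, the byte array holds just two hand-written entries instead of a full class file, and the offsets array plays the role of our constantPoolIndex array. The sketch also decodes the bytes as standard UTF-8, which only matches the encoding used by class files for ASCII text without the NUL character.

```java
import java.nio.charset.StandardCharsets;

final class StringEntryDemo {

  // Reads a big-endian u2 starting at the given offset.
  static int readU2(byte[] data, int offset) {
    return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
  }

  // Follows a String entry to the Utf8 entry it refers to and returns the text.
  // offsets[n] is the position of the first byte (the tag) of entry #n.
  static String resolveString(byte[] data, int[] offsets, int entry) {
    int at = offsets[entry];
    if (data[at] != 8) {
      throw new IllegalArgumentException("Entry #" + entry + " is not a String entry");
    }
    int stringIndex = readU2(data, at + 1);

    int utf8At = offsets[stringIndex];
    if (data[utf8At] != 1) {
      throw new IllegalArgumentException("Entry #" + stringIndex + " is not a Utf8 entry");
    }
    int length = readU2(data, utf8At + 1);

    // Simplification: decodes as standard UTF-8 (fine for plain ASCII text).
    return new String(data, utf8At + 3, length, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] pool = {
        8, 0, 2,   // #1 = String #2
        1, 0, 13,  // #2 = Utf8, 13 bytes follow
        'H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '!'
    };
    int[] offsets = {0, 0, 3}; // slot 0 is unused: entries start at #1

    System.out.println(resolveString(pool, offsets, 1));
  }
}
```

For traversal purposes none of this decoding is needed. It is enough to know that a String entry is a tag followed by a single u2.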

Reading about the other entry kinds, we can assemble the following table:

Utf8               = u1 len=u2 u1[len]
Integer            = u1 u4
Float              = u1 u4
Long               = u1 u4 u4
Double             = u1 u4 u4
Class              = u1 u2
String             = u1 u2
Fieldref           = u1 u2 u2
Methodref          = u1 u2 u2
InterfaceMethodref = u1 u2 u2
NameAndType        = u1 u2 u2
MethodHandle       = u1 u1 u2
MethodType         = u1 u2
Dynamic            = u1 u2 u2
InvokeDynamic      = u1 u2 u2
Module             = u1 u2
Package            = u1 u2

Suppose we've read the tag of an entry and its value corresponds to the Utf8 kind, the first line in our table. In order to get to the next entry of the constant pool, we need to read a u2 value, representing a length, and skip this number of bytes.

If our tag corresponds to the Integer kind, the second line in our table, then we need to skip a u4 value, or 4 bytes. It is the same if the kind is Float, the third line in our table. If the tag corresponds to the Long kind, then we need to skip two u4 values, or 8 bytes. And so on.
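The table can also be read as a function from a tag value to a size in bytes. Here is a minimal sketch of that reading. The EntrySizes class and the entrySize method are assumptions made for this example, and the returned size includes the tag byte itself.

```java
final class EntrySizes {

  // Total size in bytes of an entry, tag included.
  // utf8Length is only used for Utf8 entries (tag 1).
  static int entrySize(int tag, int utf8Length) {
    return switch (tag) {
      case 1 -> 1 + 2 + utf8Length;            // Utf8
      case 3, 4 -> 1 + 4;                      // Integer, Float
      case 5, 6 -> 1 + 4 + 4;                  // Long, Double
      case 7, 8, 16, 19, 20 -> 1 + 2;          // Class, String, MethodType, Module, Package
      case 9, 10, 11, 12, 17, 18 -> 1 + 2 + 2; // Fieldref, Methodref, InterfaceMethodref,
                                               // NameAndType, Dynamic, InvokeDynamic
      case 15 -> 1 + 1 + 2;                    // MethodHandle
      default -> throw new IllegalArgumentException("Unknown constant pool tag=" + tag);
    };
  }

  public static void main(String[] args) {
    System.out.println("String:               " + entrySize(8, 0));
    System.out.println("Utf8 (Hello, World!): " + entrySize(1, 13));
    System.out.println("Long:                 " + entrySize(5, 0));
  }
}
```

Once the tag has been consumed, the number of bytes left to skip is the returned size minus one.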

08: The readConstantPoolEntry Method

We can finalize the implementation of our readConstantPoolEntry method:

private void readConstantPoolEntry() throws IOException {
  byte tag;
  tag = readU1();

  switch (tag) {
    case CONSTANT_Utf8 -> { p("Utf8"); int l = readU2(); idx += l; }
    case CONSTANT_Integer -> { p("Integer"); idx += 4; }
    case CONSTANT_Float -> { p("Float"); idx += 4; }
    case CONSTANT_Long -> { p("Long"); idx += 8; entry++; }
    case CONSTANT_Double -> { p("Double"); idx += 8; entry++; }
    case CONSTANT_Class -> { p("Class"); idx += 2; }
    case CONSTANT_String -> { p("String"); idx += 2; }
    case CONSTANT_Fieldref -> { p("Fieldref"); idx += 4; }
    case CONSTANT_Methodref -> { p("Methodref"); idx += 4; }
    case CONSTANT_InterfaceMethodref -> { p("InterfaceMethodref"); idx += 4; }
    case CONSTANT_NameAndType -> { p("NameAndType"); idx += 4; }
    case CONSTANT_MethodHandle -> { p("MethodHandle"); idx += 3; }
    case CONSTANT_MethodType -> { p("MethodType"); idx += 2; }
    case CONSTANT_Dynamic -> { p("Dynamic"); idx += 4; }
    case CONSTANT_InvokeDynamic -> { p("InvokeDynamic"); idx += 4; }
    case CONSTANT_Module -> { p("Module"); idx += 2; }
    case CONSTANT_Package -> { p("Package"); idx += 2; }

    default -> throw new IOException("Unknown constant pool tag=" + tag);
  }
}

private void p(String name) {
  System.out.printf("%5s = %s%n", "#" + entry, name);
}

At the start of each switch rule, we invoke the p method. It prints the current entry number and the name of the entry kind. Next, each rule skips the contents of its entry using the information from the table in the previous section.

It is worth mentioning that the switch rules for the Long and Double entry kinds, apart from skipping the required number of bytes, also increment the entry number. The specification of both kinds states that they take up two consecutive entries in the constant pool. The specification also says the following:

In retrospect, making 8-byte constants take two constant pool entries was a poor choice.

But, apart from this remark, I could not find an explanation on why this decision was initially made.
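To see the effect of this rule on the entry numbering, consider the following sketch. It walks a hand-written pool containing a Long followed by an Integer and records where each entry starts. The TwoSlotDemo class is hypothetical and, to keep it short, it only understands the four numeric kinds. It also marks unused slots with -1, which our scanner does not do.

```java
import java.util.Arrays;

final class TwoSlotDemo {

  // Returns the offset of each entry's tag byte; unused slots are marked with -1.
  static int[] entryOffsets(byte[] pool, int constantPoolCount) {
    int[] offsets = new int[constantPoolCount];
    Arrays.fill(offsets, -1);

    int idx = 0;
    int entry = 1;

    while (entry < constantPoolCount) {
      offsets[entry] = idx;

      int tag = pool[idx];

      switch (tag) {
        case 3, 4 -> { idx += 1 + 4; entry += 1; } // Integer, Float: one slot
        case 5, 6 -> { idx += 1 + 8; entry += 2; } // Long, Double: two slots
        default -> throw new IllegalArgumentException("Tag not covered by this sketch: " + tag);
      }
    }

    return offsets;
  }

  public static void main(String[] args) {
    byte[] pool = {
        5, 0, 0, 0, 0, 0, 0, 0, 42, // #1 = Long 42 (also occupies #2)
        3, 0, 0, 0, 7               // #3 = Integer 7
    };

    // Two entries but three slots, so constant_pool_count is 4.
    System.out.println(Arrays.toString(entryOffsets(pool, 4)));
  }
}
```

Running it prints [-1, 0, -1, 9]: the Long starts at offset 0 as entry #1, slot #2 is never assigned, and the Integer starts at offset 9 as entry #3.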

Running Our Scanner

We should be ready to test our scanner implementation. First, let's update the execute method:

private void execute(String[] args) throws IOException {
  readClassFile(args);

  verifyMagic();

  printVersion();

  traverseConstantPool();
}

Finally, let's run our scanner on our HelloWorld class file. Here's the output:

ClassFile /tmp/HelloWorld.class
  minor version 65535
  major version 67
Constant pool:
   #1 = Methodref
   #2 = Class
   #3 = NameAndType
   #4 = Utf8
   #5 = Utf8
   #6 = Utf8
   #7 = String
   #8 = Utf8
   #9 = Methodref
  #10 = Class
  #11 = NameAndType
  #12 = Utf8
  #13 = Utf8
  #14 = Utf8
  #15 = Class
  #16 = Utf8
  #17 = Utf8
  #18 = Utf8
  #19 = Utf8
  #20 = Utf8
  #21 = Utf8

Apart from not printing the contents of each entry, it matches the output from the javap tool.

In the Next Blog Post in This Series

In this third blog post in this series, we learned that the constant pool is a sequence of varying-size elements. While the ClassFile structure provides the number of entries in a constant pool, we can only know the location of a particular entry by walking the constant pool up to that entry. So, in this blog post, we traversed the entire constant pool and we stored where each entry begins in the constantPoolIndex array.

In the next blog post in this series, we'll read the value of the string literals found in the constant pool. It's an additional step towards a Tailwind CSS inspired engine, which scans Java class files for string literals and processes their values for CSS utility class names.

You can find the source code used in this blog post in this Gist.