Scanning Java Class Files #3: The Constant Pool

In the previous post in this series, we learned that a Java class file consists of a single ClassFile structure, whose first three items are magic, minor_version, and major_version.
We started building our Java class file scanner and, by the end of that post, it could successfully read these initial elements.
These three items immediately precede the constant pool. So, in this third blog post in the series, we'll improve our implementation and have it traverse the constant pool of a Java class file.
A Note on the Class-File API
We will not use the Class-File API in this blog post series, as it abstracts away the low-level details of the Java class file format we want to learn about.
Our Class File
Here's the source code of the class whose constant pool we'll traverse:
void main() {
    IO.println("Hello, World!");
}
It is the same class from the previous post. Let's compile it using JDK 23 with preview language features enabled:
$ javac --enable-preview --release 23 HelloWorld.java
To verify the correctness of our implementation, we'll compare our output to the following simplified output from the javap tool:
Constant pool:
#1 = Methodref #2.#3
#2 = Class #4
#3 = NameAndType #5:#6
#4 = Utf8 java/lang/Object
#5 = Utf8 <init>
#6 = Utf8 ()V
#7 = String #8
#8 = Utf8 Hello, World!
#9 = Methodref #10.#11
#10 = Class #12
#11 = NameAndType #13:#14
#12 = Utf8 java/io/IO
#13 = Utf8 println
#14 = Utf8 (Ljava/lang/Object;)V
#15 = Class #16
#16 = Utf8 HelloWorld
#17 = Utf8 Code
#18 = Utf8 LineNumberTable
#19 = Utf8 main
#20 = Utf8 SourceFile
#21 = Utf8 HelloWorld.java
Let's continue our implementation.
The Constant Pool
Our implementation currently reads the first three items of the ClassFile structure.
The next two items make up the constant pool of the Java class file, as shown below:
ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count-1];
    ...
}
The fourth item is constant_pool_count.
It is a u2 value indicating the number of entries in the constant pool plus one.
For example, if its value is four, then the constant pool contains three entries, labeled #1, #2, and #3.
In other words, the constant pool uses 1-based indexing, starting at #1 rather than #0.
The fifth item is the constant_pool table.
It is a stream of constant_pool_count - 1 elements of type cp_info, which vary in size.
As a result, we can only know where the constant pool ends by traversing it.
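The arithmetic behind constant_pool_count can be sketched in a few lines. The PoolCount class and its methods below are hypothetical names used only for illustration, they are not part of the scanner from this series, and the header bytes are hand-written to mirror our HelloWorld class file rather than read from disk.

```java
final class PoolCount {
    // Reads an unsigned 16-bit big-endian value (a u2) starting at offset.
    static int readU2(byte[] data, int offset) {
        int hi = data[offset] & 0xFF;
        int lo = data[offset + 1] & 0xFF;
        return (hi << 8) | lo;
    }

    // constant_pool_count is the number of entries plus one.
    static int entryCount(int constantPoolCount) {
        return constantPoolCount - 1;
    }

    public static void main(String[] args) {
        // Hand-written bytes (an assumption, not read from a real file):
        // magic, minor_version, major_version, constant_pool_count.
        byte[] header = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
            (byte) 0xFF, (byte) 0xFF,
            0x00, 0x43,
            0x00, 0x16
        };
        int count = readU2(header, 8);
        System.out.println("constant_pool_count = " + count);          // 22
        System.out.println("indexes: #1 to #" + entryCount(count));    // #1 to #21
    }
}
```

A count of 22 corresponds to the 21 entries listed in the javap output above.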
Entries of Varying Size
We can see the varying-size aspect of the entries in the constant pool in the javap output.
For example, let's look at entries #7 and #8:
Constant pool:
...
#7 = String #8
#8 = Utf8 Hello, World!
...
We see that entry #7 is of type String and, from the output, we can assume that its size is equal to "one constant pool index".
The type of constant_pool_count is u2, so it is safe to assume that "one constant pool index" is a 2-byte quantity.
We see that entry #8 is of type Utf8 and its size seems to be equal to the length of the Hello, World! string, which is greater than two bytes.
So, even though we know the number of entries in the constant pool table, we don't know what its total length is. In order to find the end of the constant pool we have to traverse it.
Additionally, a constant pool index does not translate automatically to an offset inside the constant pool table. So, while traversing the constant pool, we need to store the offset at which we found the first byte of each entry.
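To make the index-versus-offset distinction concrete, here is a small sketch that turns a list of entry sizes into starting offsets. The EntryOffsets class is a hypothetical helper, and the sizes are assumptions derived from the javap output and the entry layouts in the JVMS (discussed later in this post), not values read from the class file.

```java
final class EntryOffsets {
    // Returns the byte offset of each entry, given the offset of the first
    // entry and the total size in bytes (tag included) of every entry.
    // Slot 0 is unused because the constant pool is 1-based.
    static int[] offsets(int firstOffset, int[] sizes) {
        int[] result = new int[sizes.length + 1];
        int offset = firstOffset;
        for (int i = 0; i < sizes.length; i++) {
            result[i + 1] = offset;
            offset += sizes[i];
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed sizes of entries #1 to #8 of HelloWorld.class:
        // Methodref, Class, NameAndType, Utf8, Utf8, Utf8, String, Utf8.
        int[] sizes = {5, 3, 5, 19, 9, 6, 3, 16};
        // 4 (magic) + 2 + 2 (versions) + 2 (constant_pool_count) = 10
        int[] starts = offsets(10, sizes);
        for (int i = 1; i < starts.length; i++) {
            System.out.println("#" + i + " starts at byte " + starts[i]);
        }
    }
}
```

Under these assumptions, entry #7 starts at byte 57 and entry #8 at byte 60, numbers we could not have derived from the indexes alone.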
05: The Constant Pool Loop
With the information we have right now, we can start writing the constant pool traversing code:
private int[] constantPoolIndex;

private int entry = 1;

private void traverseConstantPool() throws IOException {
    System.out.println("Constant pool:");

    final int constantPoolCount;
    constantPoolCount = readU2();

    constantPoolIndex = new int[constantPoolCount];

    while (entry < constantPoolCount) {
        constantPoolIndex[entry] = idx;

        readConstantPoolEntry();

        entry++;
    }
}

private void readConstantPoolEntry() { ... }
And here's a breakdown of the code above:
- We begin the method by printing the "Constant pool:" message, which mimics the output from the javap tool.
- Next, using the readU2 method created in the previous post, we read the fourth item of the ClassFile structure, namely the constant_pool_count value, and store it in the constantPoolCount local variable.
- We initialize the constantPoolIndex array with a size equal to the constantPoolCount obtained in the previous step. This array stores the position in the ClassFile structure where we'll find the first byte of a given constant pool entry.
- Next, we have the constant pool loop itself. The entry instance variable holds the number of the constant pool entry currently being read. As mentioned earlier, the constant pool uses 1-based indexing, so the entry variable starts at 1. We keep iterating while the entry number is smaller than the constantPoolCount value.
- At the beginning of each iteration, we store the value of the idx instance variable in the constantPoolIndex array, at the position of the current entry. In other words, we record the offset of the first byte of the current constant pool entry.
- Next, we invoke the readConstantPoolEntry method, currently a stub with no real implementation. It will be responsible for reading each constant pool entry.
- Finally, at the end of each iteration, we increment the entry variable.
Next, in order to implement the readConstantPoolEntry method, we'll examine the structure of the entries of the constant pool.
The Constant Pool Entry Structure
We find the definition of the cp_info structure in Section 4.4 of the Java Virtual Machine Specification (JVMS).
It defines that all constant pool entries have the following format:
cp_info {
    u1 tag;
    u1 info[];
}
So, each entry in the constant pool:
- Begins with a 1-byte value named tag, which indicates the kind of the constant of the entry.
- Ends with two or more bytes, providing additional information specific to the kind of the constant.
As of Java 23, there are 17 kinds of constant pool entries, each with a distinct tag value.
06: The Constant Pool Tags
We find a listing of the constant kinds and their tag values in Table 4.4-B of the JVMS. Translating it to Java code gives us the following:
private static final byte CONSTANT_Utf8 = 1;
private static final byte CONSTANT_Integer = 3;
private static final byte CONSTANT_Float = 4;
private static final byte CONSTANT_Long = 5;
private static final byte CONSTANT_Double = 6;
private static final byte CONSTANT_Class = 7;
private static final byte CONSTANT_String = 8;
private static final byte CONSTANT_Fieldref = 9;
private static final byte CONSTANT_Methodref = 10;
private static final byte CONSTANT_InterfaceMethodref = 11;
private static final byte CONSTANT_NameAndType = 12;
private static final byte CONSTANT_MethodHandle = 15;
private static final byte CONSTANT_MethodType = 16;
private static final byte CONSTANT_Dynamic = 17;
private static final byte CONSTANT_InvokeDynamic = 18;
private static final byte CONSTANT_Module = 19;
private static final byte CONSTANT_Package = 20;
So, a Utf8 constant pool entry has a tag value of 1, an Integer constant pool entry has a tag value of 3, a Float constant pool entry has a tag value of 4, and so on.
07: Partial readConstantPoolEntry Implementation
With all of the constant pool tags listed, we write the following partial readConstantPoolEntry implementation:
private void readConstantPoolEntry() throws IOException {
    byte tag;
    tag = readU1();

    switch (tag) {
        case CONSTANT_Utf8 -> { ... }
        case CONSTANT_Integer -> { ... }
        case CONSTANT_Float -> { ... }
        case CONSTANT_Long -> { ... }
        case CONSTANT_Double -> { ... }
        case CONSTANT_Class -> { ... }
        case CONSTANT_String -> { ... }
        case CONSTANT_Fieldref -> { ... }
        case CONSTANT_Methodref -> { ... }
        case CONSTANT_InterfaceMethodref -> { ... }
        case CONSTANT_NameAndType -> { ... }
        case CONSTANT_MethodHandle -> { ... }
        case CONSTANT_MethodType -> { ... }
        case CONSTANT_Dynamic -> { ... }
        case CONSTANT_InvokeDynamic -> { ... }
        case CONSTANT_Module -> { ... }
        case CONSTANT_Package -> { ... }
        default -> throw new IOException("Unknown constant pool tag=" + tag);
    }
}

private byte readU1() throws IOException {
    check(1);
    return data[idx++];
}
We begin the method by reading the tag value of the current constant pool entry.
Then, we switch over the tag value, with each case being a constant we defined in the previous section.
In the default case, when the tag value does not correspond to any of the specified values, we complete the switch statement by throwing an IOException.
We didn't implement the body of the switch rules, but we know that each rule should:
- Print the entry number and its type.
- Print the contents of each entry, if possible.
- Leave idx pointing to the first byte of the next constant pool entry or, for the last entry, to the first byte of the structure that comes after the constant pool.
What's in a Constant Pool Entry?
At the time of writing, there are 17 kinds of constant pool entries. Discussing all of them is beyond the scope of this blog post. Right now, we're interested in traversing the constant pool. In order to do that, we need to know the size of each entry kind.
As an example, let's look at the String constant pool entry.
Its structure is defined in Section 4.4.3 of the JVMS:
CONSTANT_String_info {
    u1 tag;
    u2 string_index;
}
So, after its tag value, we have the string_index value whose type is u2.
The latter is the index of another entry, of type Utf8, in the same constant pool, which should contain the actual string literal contents.
So, if we know we are reading a String entry, meaning we've already read its tag value, we only need to read a u2 value in order to get to the next constant pool entry.
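As an illustration of this layout, the sketch below decodes a single String entry from a hand-written byte array. The StringEntry class and its stringIndex method are hypothetical names, and the three bytes are an assumption based on entry #7 of the javap output, not bytes copied from the class file.

```java
final class StringEntry {
    static final int CONSTANT_String = 8;

    // Returns the string_index of the CONSTANT_String_info that starts at offset.
    static int stringIndex(byte[] data, int offset) {
        int tag = data[offset] & 0xFF;
        if (tag != CONSTANT_String) {
            throw new IllegalArgumentException("Not a String entry, tag=" + tag);
        }
        // string_index is a big-endian u2 that follows the tag byte.
        return ((data[offset + 1] & 0xFF) << 8) | (data[offset + 2] & 0xFF);
    }

    public static void main(String[] args) {
        // Assumed bytes for entry #7: tag 8, then string_index 8.
        byte[] entry = {8, 0, 8};
        System.out.println("String #" + stringIndex(entry, 0)); // String #8
    }
}
```

The whole entry takes three bytes: one for the tag and two for the index.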
Reading about the other entry kinds, we can assemble the following table:
Utf8               = u1 len=u2 u1[len]
Integer            = u1 u4
Float              = u1 u4
Long               = u1 u4 u4
Double             = u1 u4 u4
Class              = u1 u2
String             = u1 u2
Fieldref           = u1 u2 u2
Methodref          = u1 u2 u2
InterfaceMethodref = u1 u2 u2
NameAndType        = u1 u2 u2
MethodHandle       = u1 u1 u2
MethodType         = u1 u2
Dynamic            = u1 u2 u2
InvokeDynamic      = u1 u2 u2
Module             = u1 u2
Package            = u1 u2
In every line of the table, the first u1 is the tag itself.
Suppose we've read the tag of an entry and its value corresponds to the Utf8 kind, the first line in our table.
In order to get to the next entry of the constant pool, we need to read a u2 value, representing a length, and skip this number of bytes.
If our tag corresponds to the Integer kind, the second line in our table, then we need to skip a u4 value, or 4 bytes.
It is the same if the kind is Float, the third line in our table.
If the tag value corresponds to the Long kind, then we need to skip two u4 values, or 8 bytes.
And so on.
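One way to express the table in code is a lookup array indexed by tag value. This is only a sketch of an alternative to a switch statement, it is not the approach the scanner in this series takes, and the EntrySizes class, its FIXED array, and the bytesAfterTag method are hypothetical names.

```java
final class EntrySizes {
    // Number of bytes that follow the tag, indexed by tag value.
    // 0 marks unused tag values; -1 marks Utf8, whose size is variable.
    private static final int[] FIXED = {
        0, -1, 0, 4, 4, 8, 8, 2, 2, 4, 4, 4, 4, 0, 0, 3, 2, 4, 4, 2, 2
    };

    // Returns how many bytes to skip after the tag byte found at tagOffset.
    static int bytesAfterTag(byte[] data, int tagOffset) {
        int tag = data[tagOffset] & 0xFF;
        if (tag >= FIXED.length || FIXED[tag] == 0) {
            throw new IllegalArgumentException("Unknown constant pool tag=" + tag);
        }
        if (FIXED[tag] > 0) {
            return FIXED[tag];
        }
        // Utf8: a u2 length followed by that many bytes.
        int length = ((data[tagOffset + 1] & 0xFF) << 8) | (data[tagOffset + 2] & 0xFF);
        return 2 + length;
    }

    public static void main(String[] args) {
        byte[] utf8 = {1, 0, 3, '(', ')', 'V'};  // Utf8 "()V"
        byte[] methodref = {10, 0, 2, 0, 3};     // Methodref #2.#3
        System.out.println(bytesAfterTag(utf8, 0));      // 5
        System.out.println(bytesAfterTag(methodref, 0)); // 4
    }
}
```

A switch statement reads more explicitly and leaves room for per-kind logic, which is why it suits a scanner that will later print the contents of each entry.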
08: The readConstantPoolEntry Method
We can finalize the implementation of our readConstantPoolEntry method:
private void readConstantPoolEntry() throws IOException {
    byte tag;
    tag = readU1();

    switch (tag) {
        case CONSTANT_Utf8 -> { p("Utf8"); int l = readU2(); idx += l; }
        case CONSTANT_Integer -> { p("Integer"); idx += 4; }
        case CONSTANT_Float -> { p("Float"); idx += 4; }
        case CONSTANT_Long -> { p("Long"); idx += 8; entry++; }
        case CONSTANT_Double -> { p("Double"); idx += 8; entry++; }
        case CONSTANT_Class -> { p("Class"); idx += 2; }
        case CONSTANT_String -> { p("String"); idx += 2; }
        case CONSTANT_Fieldref -> { p("Fieldref"); idx += 4; }
        case CONSTANT_Methodref -> { p("Methodref"); idx += 4; }
        case CONSTANT_InterfaceMethodref -> { p("InterfaceMethodref"); idx += 4; }
        case CONSTANT_NameAndType -> { p("NameAndType"); idx += 4; }
        case CONSTANT_MethodHandle -> { p("MethodHandle"); idx += 3; }
        case CONSTANT_MethodType -> { p("MethodType"); idx += 2; }
        case CONSTANT_Dynamic -> { p("Dynamic"); idx += 4; }
        case CONSTANT_InvokeDynamic -> { p("InvokeDynamic"); idx += 4; }
        case CONSTANT_Module -> { p("Module"); idx += 2; }
        case CONSTANT_Package -> { p("Package"); idx += 2; }
        default -> throw new IOException("Unknown constant pool tag=" + tag);
    }
}

private void p(String name) {
    System.out.printf("%5s = %s%n", "#" + entry, name);
}
At the start of each switch rule, we invoke the p method.
It prints the current entry number and the name of the entry kind.
Next, each rule skips the contents of its entry using the information from the table in the previous section.
It is worth mentioning that the switch rules for the Long and Double entry kinds, apart from skipping the required number of bytes, also increment the entry number.
The specification of both kinds states that they take up two consecutive entries in the constant pool.
The specification also says the following:
In retrospect, making 8-byte constants take two constant pool entries was a poor choice.
But, apart from this remark, I could not find an explanation of why this decision was initially made.
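The effect of the two-slot rule on entry numbering can be seen with a tiny hand-built constant pool. The sketch below is an assumption-laden illustration: the WideEntries class is a hypothetical name, the byte array is hand-written, and only the Utf8, Long, and Double kinds are handled to keep it short.

```java
import java.util.ArrayList;
import java.util.List;

final class WideEntries {
    // Walks a constant pool that starts at its constant_pool_count item and
    // returns the index of every entry found.
    static List<Integer> indexes(byte[] pool) {
        int idx = 0;
        int count = ((pool[idx++] & 0xFF) << 8) | (pool[idx++] & 0xFF);
        List<Integer> found = new ArrayList<>();
        int entry = 1;
        while (entry < count) {
            found.add(entry);
            int tag = pool[idx++] & 0xFF;
            switch (tag) {
                case 1 -> { // Utf8: u2 length, then the bytes
                    int length = ((pool[idx] & 0xFF) << 8) | (pool[idx + 1] & 0xFF);
                    idx += 2 + length;
                    entry += 1;
                }
                case 5, 6 -> { // Long, Double: 8 bytes
                    idx += 8;
                    entry += 2; // the index right after is unusable
                }
                default -> throw new IllegalArgumentException("Unhandled tag=" + tag);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        byte[] pool = {
            0, 4,                        // constant_pool_count = 4
            5, 0, 0, 0, 0, 0, 0, 0, 42,  // #1 = Long 42
            1, 0, 1, 'x'                 // #3 = Utf8 "x"
        };
        System.out.println(indexes(pool)); // [1, 3]
    }
}
```

Index #2 never shows up: the Long at #1 occupies it, so the Utf8 entry that physically follows is numbered #3.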
Running Our Scanner
We should be ready to test our scanner implementation.
First, let's update the execute method:
private void execute(String[] args) throws IOException {
    readClassFile(args);

    verifyMagic();

    printVersion();

    traverseConstantPool();
}
Finally, let's run our scanner on our HelloWorld class file.
Here's the output:
ClassFile /tmp/HelloWorld.class
minor version 65535
major version 67
Constant pool:
#1 = Methodref
#2 = Class
#3 = NameAndType
#4 = Utf8
#5 = Utf8
#6 = Utf8
#7 = String
#8 = Utf8
#9 = Methodref
#10 = Class
#11 = NameAndType
#12 = Utf8
#13 = Utf8
#14 = Utf8
#15 = Class
#16 = Utf8
#17 = Utf8
#18 = Utf8
#19 = Utf8
#20 = Utf8
#21 = Utf8
Apart from not printing the contents of each entry, it matches the output from the javap tool.
In the Next Blog Post in This Series
In this third blog post in this series, we learned that the constant pool is a sequence of varying-size elements.
While the ClassFile structure provides the number of entries in a constant pool, we can only know the location of a particular entry by walking the constant pool up to that entry.
So, in this blog post, we traversed the entire constant pool and we stored where each entry begins in the constantPoolIndex array.
In the next blog post in this series, we'll read the value of the string literals found in the constant pool. It's an additional step towards a Tailwind CSS inspired engine, which scans Java class files for string literals and processes their values for CSS utility class names.
You can find the source code used in this blog post in this Gist.