Java switch internals: the tableswitch and lookupswitch JVM instructions
I am currently writing a bare-bones AsciiDoc parser. It will be used to generate both this blog and the Objectos documentation.
AsciiDoc is a markup language for writing technical content. I chose it over Markdown because it provides more features suited for my use-cases.
In a initial iteration much of the parser code looked like the following:
private static final int DOCUMENT_HEADING = 1;
private static final int PREAMBLE = 2;
private static final int PREAMBLE_PARAGRAPH = 3;
private static final int SECTION = 4;
private static final int SECTION_HEADING = 5;
private static final int SECTION_PARAGRAPH = 6;
private int parseLineEndEof() {
return switch (state) {
case DOCUMENT_HEADING -> { ... }
case PREAMBLE -> { ... }
case PREAMBLE_PARAGRAPH -> { ... }
case SECTION -> { ... }
case SECTION_HEADING -> { ... }
case SECTION_PARAGRAPH -> { ... }
default -> { ... };
};
}
Recently the code was changed to the following:
private static final int DOCUMENT = 1 << 0;
private static final int HEADING = 1 << 1;
private static final int PREAMBLE = 1 << 2;
private static final int SECTION = 1 << 3;
private static final int PARAGRAPH = 1 << 4;
private int parseLineEndEof() {
return switch (state) {
case DOCUMENT | HEADING -> { ... }
case PREAMBLE -> { ... }
case PREAMBLE | PARAGRAPH -> { ... }
case SECTION -> { ... }
case SECTION | HEADING -> { ... }
case SECTION | PARAGRAPH -> { ... }
default -> { ... };
};
}
So the state
was changed from:
-
a fixed number of possible values; to
-
a combination of previously defined "sub-states".
This refactoring made the code easier to maintain. In particular as the number of "sub-states" and their combinations increases.
But this changes the way the Java compiler treats the switch
expression:
-
the former switch expression compiles to a
tableswitch
JVM instruction; while -
the latter switch expression compiles to a
lookupswitch
JVM instruction.
Can this change have runtime consequences?
In particular, does the use of lookupswitch
instead of a tableswitch
have an impact on performance?
But wait, what is a tableswitch
?
And what is a lookupswitch
?
In this blog post I will discuss these JVM instructions. Let's begin.
How does the compiler decides which instruction to use?
The first thing to know is that any Java switch expression or statement will be compiled into one of two distinct JVM instructions:
Before we go into the details of how each instruction work, let's first see how the compiler decides which instruction to use.
The following code generates a tableswitch
:
public String tableSwitch() {
return switch (value) {
case 1 -> "one";
case 2 -> "two";
case 4 -> "four";
default -> "other";
};
}
The following code, on the other hand, generates a lookupswitch
:
public String lookupSwitch() {
return switch (value) {
case 1 -> "one";
case 10 -> "ten";
case 100 -> "one hundred";
default -> "other";
};
}
Do you see the difference?
When the values on the case labels are close together the compiler generates a tableswitch
.
On the other hand, when the values are far apart from each other the compiler generates a lookupswitch
.
Of course, this is a simplification.
The actual logic is a bit more complicated.
If you're interested, you can find the actual javac
code here.
Next, let's take a closer look at how these instructions work.
The tableswitch
instruction
Let's take a closer look at the tableswitch
instruction.
Consider our running example:
public String tableSwitch(int value) {
return switch (value) {
case 5 -> five();
case 6 -> six();
case 8 -> eight();
default -> default_();
};
}
private String five() { return "five"; }
private String six() { return "six"; }
private String eight() { return "eight"; }
private String default_() { return "other"; }
The compiler would generate a tableswitch
instruction similar to the following:
tableswitch // instruction
default_() // default case: goto default_ method
5 // low value
8 // high value
five() // case 5: goto five() method
six() // case 6: goto six() method
default_() // case 7: goto default_ method
eight() // case 8: goto eight() method
The first value after the instruction itself is the location of the default
code.
In other words, if the value does not match any of the case
labels then execution would jump to the location given by this value.
Next are, in order, the low value and the high value.
These are respectively the minimum and maximum values of all of the case
labels.
The last values are the locations of the code to be run for each case
label.
They are listed in order from the low value to the high value.
Understanding the tableswitch
instruction using Java code
The tableswitch
is a JVM instruction.
Understanding computer architecture is beyond the scope of this post.
Since this is a blog written primarily for Java developers, I will try to explain the instruction using plain Java code.
We could execute the instruction defined in the previous section with the following Java code:
private int[] instructions;
private int ip;
private void tableswitch(int value) {
var def = instructions[ip++];
var low = instructions[ip++];
var high = instructions[ip++];
if (value >= low && value <= high) {
var offset = value - low;
ip = instructions[ip + offset];
} else {
ip = def;
}
}
There are two field declarations:
-
instructions
is anint
array simulating the CPU instructions in memory; and -
ip
is anint
field simulating a instruction pointer.
The method is called when a tableswitch
instruction is found.
The switch
value is supplied as the method's argument.
Three values are then decoded from the instructions. These are in order:
-
the
default
jump location; -
the
low
value, i.e., the lowest case label; and -
the
high
value, i.e., the highest case label.
If the specified value
is between the low
and the high
values, then an offset
is computed.
It indicates the offset from the current instruction from where to decode the jump value.
The instruction pointer is updated with said value and execution branches to the location indicated.
If value
is not between the low
and high
values then execution branches to the location indicated by the default
value.
Simulating our tableswitch
with a micro virtual machine
To see our tableswitch
instruction in action we can write a bare-bones micro virtual machine.
Here's a version:
public class VirtualMachine {
static final int TABLESWITCH = 0xaa;
private static final int PRINT_STRING = -1;
private static final int STOP = -2;
private static final int NOOP = -3;
private final int[] instructions = new int[256];
private final String[] strings = new String[16];
private int stringsIndex;
private int ip;
public VirtualMachine() {
Arrays.fill(instructions, NOOP);
}
public final void execute(int value) {
ip = 0;
while (true) {
var inst = instructions[ip++];
if (inst == NOOP) {
// noop
}
else if (inst == TABLESWITCH) {
tableswitch(value);
}
else if (inst == PRINT_STRING) {
var index = instructions[ip++];
var s = strings[index];
System.out.println(s);
}
else if (inst == STOP) {
return;
}
else {
throw new UnsupportedOperationException("instruction=" + inst);
}
}
}
// tableswitch impl...
// instruction setting methods
}
Apart from the tableswitch
our virtual machine defines three more instructions:
-
a no-operation instruction (NOOP)
-
a print to the standard output instruction (PRINT_STRING)
-
a stop execution (STOP)
They are added for the sole purpose of simplifying our implementation.
We can write a small program:
public class TableSwitch {
public static void main(String[] args) {
var vm = new VirtualMachine();
// add our tableswitch instruction
// values taken from `javap` output
vm.tableswitch(
53, // default
5, // low
8, // high
32, // case 5
39, // case 6
53, // case 7 == default
46 // case 8
);
// our 'methods'
// i.e. the jump locations for our 'cases'
vm.printString(32, "five");
vm.printString(39, "six");
vm.printString(46, "eight");
vm.printString(53, "other");
// executes the switch with
// the specificed values
vm.execute(5);
vm.execute(6);
vm.execute(7);
vm.execute(8);
vm.execute(9);
}
}
To make it more realistically the jump values were extracted from the javap
output of our running example:
Code:
stack=4, locals=2, args_size=2
0: iload_1
1: tableswitch { // 5 to 8
5: 32
6: 39
7: 53
8: 46
default: 53
}
32: aload_0
33: invokevirtual #16 // Method five:()Ljava/lang/String;
36: goto 57
39: aload_0
40: invokevirtual #20 // Method six:()Ljava/lang/String;
43: goto 57
46: aload_0
47: invokevirtual #23 // Method eight:()Ljava/lang/String;
50: goto 57
53: aload_0
54: invokevirtual #26 // Method default_:()Ljava/lang/String;
57: areturn
Running our TableSwitch
program results in the following output:
five
six
other
eight
other
Great.
Let's now investigate the lookupswitch
instruction.
The lookupswitch
instruction
We will take a closer look at the lookupswitch
instruction.
Consider the following running example:
public String lookupSwitch(int value) {
return switch (value) {
case 100 -> oneHundred();
case 10 -> ten();
case 1000 -> oneThousand();
default -> default_();
};
}
private String ten() { return "ten"; }
private String oneHundred() { return "one hundred"; }
private String oneThousand() { return "one thousand"; }
private String default_() { return "other"; }
The compiler would generate something similar to this:
lookupswitch // instruction
default_() // default case: goto default_ method
3 // number of cases
10 // first pair: value
ten() // first pair: jump
100 // second pair: value
oneHundred() // second pair: jump
1000 // third pair: value
oneThousand() // third pair: jump
Notice that the pairs are sorted.
Even though, in source code, the case
order is:
-
100
-
10
-
1000
The order in the compiled instruction is:
-
10
-
100
-
1000
So the compiler, before generating the instruction, sorts the values in ascending order.
Why is this necessary? We will see in the next section.
Understanding the lookupswitch
instruction using Java code
Let's translate the previous section to Java code. Before we do that, and in order to simplify our implementation, we will store the case pairs in slightly different manner. Instead of:
10 // first pair: value
ten() // first pair: jump
100 // second pair: value
oneHundred() // second pair: jump
1000 // third pair: value
oneThousand() // third pair: jump
We will store it like so:
10 // first pair: value
100 // second pair: value
1000 // third pair: value
ten() // first pair: jump
oneHundred() // second pair: jump
oneThousand() // third pair: jump
So the match values are grouped together at the beginning while the jump locations are grouped at the end.
With this in mind, let's see a possible implementation:
private void lookupswitch(int value) {
var def = instructions[ip++];
var count = instructions[ip++];
var res = Arrays.binarySearch(instructions, ip, ip + count, value);
if (res >= 0) {
var offset = res - ip;
var index = ip + count + offset;
ip = instructions[index];
} else {
ip = def;
}
}
So two values are decoded from the instructions:
-
the
default
jump location; and -
the total count of
case
labels.
Using a binary search it looks for the switch
value among the case
labels.
As the values were sorted by the compiler a binary search is possible.
If a match is found then execution branches to the corresponding jump location.
If a match is not found then execution branches to the jump location given by the default
value.
Simulating our lookupswitch
with our micro virtual machine
Let's simulate our lookupswitch
instruction like we did with the tableswitch
instruction.
We will use the same bare-bones VirtualMachine
.
It was modified to include the lookupswitch
method from the previous section.
The following program does just that:
public class LookupSwitch {
public static void main(String[] args) {
var vm = new VirtualMachine();
// add our lookupswitch instruction
// values taken from `javap` output
vm.lookupswitch(
57, // default
3, // count
10, 43, // case 10 value
100, 36, // case 100 value
1000, 50 // case 1000 value
);
// our 'methods'
// i.e. the jump locations for our 'cases'
vm.printString(36, "one hundred");
vm.printString(43, "ten");
vm.printString(50, "one thousand");
vm.printString(57, "other");
// executes the switch with
// the specified values
vm.execute(10);
vm.execute(50);
vm.execute(100);
vm.execute(1000);
vm.execute(5000);
}
}
Once again, we are using the addresses from the javap
output:
Code:
stack=4, locals=2, args_size=2
0: iload_1
1: lookupswitch { // 3
10: 43
100: 36
1000: 50
default: 57
}
36: aload_0
37: invokevirtual #16 // Method oneHundred:()Ljava/lang/String;
40: goto 61
43: aload_0
44: invokevirtual #20 // Method ten:()Ljava/lang/String;
47: goto 61
50: aload_0
51: invokevirtual #23 // Method oneThousand:()Ljava/lang/String;
54: goto 61
57: aload_0
58: invokevirtual #26 // Method default_:()Ljava/lang/String;
61: areturn
Running our LookupSwitch
program:
ten
other
one hundred
one thousand
other
Nice.
Our lookupswitch
implementation seems correct.
Benchmarking the instructions with JMH
Are there any performance differences between the two instructions? From the previous sections it would seem so:
-
lookupswitch
requires a binary search; while -
tableswitch
does a simpler array component access.
But, as Java applications typically run on a JVM having a JIT compiler, we need to investigate it further. We will use the JMH tool for the investigation.
Before we go any further, I must say that I have very little knowledge:
-
in benchmarking in general; and
-
in writing and interpreting JMH benchmarks in particular.
With that said, let's begin.
We will use the following class:
@State(Scope.Thread)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(time = 2, timeUnit = TimeUnit.SECONDS)
@Measurement(time = 2, timeUnit = TimeUnit.SECONDS)
public class Benchmarks {
int[] lookup;
int[] table;
int x = 123;
@Setup
public void _setup() {
lookup = new int[] {
257, 68, 13, 16, 529, 528,
144, 146, 32, 96, 160, 162
};
table = new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12};
}
@Benchmark
public int lookupSwitch() {
var result = 0;
for (int i = 0; i < lookup.length; i++) {
var state = lookup[i];
result += switch (state) {
case 257 -> compute(1001);
case 68 -> compute(1002);
case 13 -> compute(1003);
case 16 -> compute(1004);
case 529 -> x;
case 528 -> compute(1006);
case 144 -> compute(1007);
case 146 -> compute(1008);
case 32 -> 1234;
case 96 -> compute(1010);
case 160 -> state;
case 162 -> state;
default -> 0;
};
}
return result;
}
@Benchmark
public int tableSwitch() {
var result = 0;
for (int i = 0; i < table.length; i++) {
var state = table[i];
result += switch (state) {
case 1 -> compute(1001);
case 2 -> compute(1002);
case 3 -> compute(1003);
case 4 -> compute(1004);
case 5 -> x;
case 6 -> compute(1006);
case 7 -> compute(1007);
case 8 -> compute(1008);
case 9 -> 1234;
case 10 -> compute(1010);
case 11 -> state;
case 12 -> state;
default -> 0;
};
}
return result;
}
private int compute(int value) {
return value * x;
}
}
There are two benchmark methods.
Each has a switch expression with twelve case
labels.
The methods do the same work.
The only difference is that each method is compiled to a different instruction:
javap -v target/classes/post/Benchmarks.class | grep switch
Classfile target/classes/post/Benchmarks.class
16: lookupswitch { // 12
16: tableswitch { // 1 to 12
Running this benchmark generates the following results:
Benchmark Mode Cnt Score Error Units
Benchmarks.lookupSwitch avgt 25 43.368 ± 0.843 ns/op
Benchmarks.tableSwitch avgt 25 41.615 ± 1.505 ns/op
There are no significant differences. From this result it is possible to say that the refactoring discussed in the introduction has no impact on the performance.
Conclusion
In this blog post we have discussed two JVM instructions:
-
tableswitch
-
lookupswitch
Every Java switch
expression or statement is compiled to one of the two instructions.
In a simple interpreter implementation it would seem that tableswitch
would execute faster than a lookupswitch
.
However, as shown by a JMH benchmark, they both seem to have equivalent performance.
All of the examples in this post can be found at this GitHub repository.