Java switch internals: the tableswitch and lookupswitch JVM instructions

Marcio EndoAug 1, 2022

I am currently writing a bare-bones AsciiDoc parser. It will be used to generate both this blog and the Objectos documentation.

AsciiDoc is a markup language for writing technical content. I chose it over Markdown because it provides more features suited for my use-cases.

In a initial iteration much of the parser code looked like the following:

private static final int DOCUMENT_HEADING = 1;private static final int PREAMBLE = 2;private static final int PREAMBLE_PARAGRAPH = 3;private static final int SECTION = 4;private static final int SECTION_HEADING = 5;private static final int SECTION_PARAGRAPH = 6;private int parseLineEndEof() {  return switch (state) {    case DOCUMENT_HEADING -> { ... }    case PREAMBLE -> { ... }    case PREAMBLE_PARAGRAPH -> { ... }    case SECTION -> { ... }    case SECTION_HEADING -> { ... }    case SECTION_PARAGRAPH -> { ... }    default -> { ... };  };}

Recently the code was changed to the following:

private static final int DOCUMENT = 1 << 0;private static final int HEADING = 1 << 1;private static final int PREAMBLE = 1 << 2;private static final int SECTION = 1 << 3;private static final int PARAGRAPH = 1 << 4;private int parseLineEndEof() {  return switch (state) {    case DOCUMENT | HEADING -> { ... }    case PREAMBLE -> { ... }    case PREAMBLE | PARAGRAPH -> { ... }    case SECTION -> { ... }    case SECTION | HEADING -> { ... }    case SECTION | PARAGRAPH -> { ... }    default -> { ... };  };}

So the state was changed from:

a fixed number of possible values; to
a combination of previously defined "sub-states".

This refactoring made the code easier to maintain. In particular as the number of "sub-states" and their combinations increases.

But this changes the way the Java compiler treats the switch expression:

the former switch expression compiles to a tableswitch JVM instruction; while
the latter switch expression compiles to a lookupswitch JVM instruction.

Can this change have runtime consequences? In particular, does the use of lookupswitch instead of a tableswitch have an impact on performance?

But wait, what is a tableswitch? And what is a lookupswitch?

In this blog post I will discuss these JVM instructions. Let's begin.

How does the compiler decides which instruction to use?

The first thing to know is that any Java switch expression or statement will be compiled into one of two distinct JVM instructions:

Before we go into the details of how each instruction work, let's first see how the compiler decides which instruction to use.

The following code generates a tableswitch:

public String tableSwitch() {  return switch (value) {    case 1 -> "one";    case 2 -> "two";    case 4 -> "four";    default -> "other";  };}

The following code, on the other hand, generates a lookupswitch:

public String lookupSwitch() {  return switch (value) {    case 1 -> "one";    case 10 -> "ten";    case 100 -> "one hundred";    default -> "other";  };}

Do you see the difference?

When the values on the case labels are close together the compiler generates a tableswitch.

On the other hand, when the values are far apart from each other the compiler generates a lookupswitch.

Of course, this is a simplification. The actual logic is a bit more complicated. If you're interested, you can find the actual javac code here.

Next, let's take a closer look at how these instructions work.

The tableswitch instruction

Let's take a closer look at the tableswitch instruction.

Consider our running example:

public String tableSwitch(int value) {  return switch (value) {    case 5 -> five();    case 6 -> six();    case 8 -> eight();    default -> default_();  };}private String five() { return "five"; }private String six() { return "six"; }private String eight() { return "eight"; }private String default_() { return "other"; }

The compiler would generate a tableswitch instruction similar to the following:

tableswitch // instruction
default_()  // default case: goto default_ method
5           // low value
8           // high value
five()      // case 5: goto five() method
six()       // case 6: goto six() method
default_()  // case 7: goto default_ method
eight()     // case 8: goto eight() method

The first value after the instruction itself is the location of the default code. In other words, if the value does not match any of the case labels then execution would jump to the location given by this value.

Next are, in order, the low value and the high value. These are respectively the minimum and maximum values of all of the case labels.

The last values are the locations of the code to be run for each case label. They are listed in order from the low value to the high value.

Understanding the tableswitch instruction using Java code

The tableswitch is a JVM instruction. Understanding computer architecture is beyond the scope of this post. Since this is a blog written primarily for Java developers, I will try to explain the instruction using plain Java code.

We could execute the instruction defined in the previous section with the following Java code:

private int[] instructions;private int ip;private void tableswitch(int value) {  var def = instructions[ip++];  var low = instructions[ip++];  var high = instructions[ip++];  if (value >= low && value <= high) {    var offset = value - low;    ip = instructions[ip + offset];  } else {    ip = def;  }}

There are two field declarations:

instructions is an int array simulating the CPU instructions in memory; and
ip is an int field simulating a instruction pointer.

The method is called when a tableswitch instruction is found. The switch value is supplied as the method's argument.

Three values are then decoded from the instructions. These are in order:

the default jump location;
the low value, i.e., the lowest case label; and
the high value, i.e., the highest case label.

If the specified value is between the low and the high values, then an offset is computed. It indicates the offset from the current instruction from where to decode the jump value. The instruction pointer is updated with said value and execution branches to the location indicated.

If value is not between the low and high values then execution branches to the location indicated by the default value.

Simulating our tableswitch with a micro virtual machine

To see our tableswitch instruction in action we can write a bare-bones micro virtual machine.

Here's a version:

public class VirtualMachine {  static final int TABLESWITCH = 0xaa;  private static final int PRINT_STRING = -1;  private static final int STOP = -2;  private static final int NOOP = -3;  private final int[] instructions = new int[256];  private final String[] strings = new String[16];  private int stringsIndex;  private int ip;  public VirtualMachine() {    Arrays.fill(instructions, NOOP);  }  public final void execute(int value) {    ip = 0;    while (true) {      var inst = instructions[ip++];      if (inst == NOOP) {        // noop      }      else if (inst == TABLESWITCH) {        tableswitch(value);      }      else if (inst == PRINT_STRING) {        var index = instructions[ip++];        var s = strings[index];        System.out.println(s);      }      else if (inst == STOP) {        return;      }      else {        throw new UnsupportedOperationException("instruction=" + inst);      }    }  }  // tableswitch impl...  // instruction setting methods}

Apart from the tableswitch our virtual machine defines three more instructions:

a no-operation instruction (NOOP)
a print to the standard output instruction (PRINT_STRING)
a stop execution (STOP)

They are added for the sole purpose of simplifying our implementation.

We can write a small program:

public class TableSwitch {  public static void main(String[] args) {    var vm = new VirtualMachine();    // add our tableswitch instruction    // values taken from `javap` output    vm.tableswitch(      53, // default      5, // low      8, // high      32, // case 5      39, // case 6      53, // case 7 == default      46 // case 8    );    // our 'methods'    // i.e. the jump locations for our 'cases'    vm.printString(32, "five");    vm.printString(39, "six");    vm.printString(46, "eight");    vm.printString(53, "other");    // executes the switch with    // the specificed values    vm.execute(5);    vm.execute(6);    vm.execute(7);    vm.execute(8);    vm.execute(9);  }}

To make it more realistically the jump values were extracted from the javap output of our running example:

Code:
  stack=4, locals=2, args_size=2
     0: iload_1
     1: tableswitch   { // 5 to 8
                   5: 32
                   6: 39
                   7: 53
                   8: 46
             default: 53
        }
    32: aload_0
    33: invokevirtual #16                 // Method five:()Ljava/lang/String;
    36: goto          57
    39: aload_0
    40: invokevirtual #20                 // Method six:()Ljava/lang/String;
    43: goto          57
    46: aload_0
    47: invokevirtual #23                 // Method eight:()Ljava/lang/String;
    50: goto          57
    53: aload_0
    54: invokevirtual #26                 // Method default_:()Ljava/lang/String;
    57: areturn

Running our TableSwitch program results in the following output:

five
six
other
eight
other

Great. Let's now investigate the lookupswitch instruction.

The lookupswitch instruction

We will take a closer look at the lookupswitch instruction.

Consider the following running example:

public String lookupSwitch(int value) {  return switch (value) {    case 100 -> oneHundred();    case 10 -> ten();    case 1000 -> oneThousand();    default -> default_();  };}private String ten() { return "ten"; }private String oneHundred() { return "one hundred"; }private String oneThousand() { return "one thousand"; }private String default_() { return "other"; }

The compiler would generate something similar to this:

lookupswitch  // instruction
default_()    // default case: goto default_ method
3             // number of cases
10            // first pair: value
ten()         // first pair: jump
100           // second pair: value
oneHundred()  // second pair: jump
1000          // third pair: value
oneThousand() // third pair: jump

Notice that the pairs are sorted. Even though, in source code, the case order is:

100
10
1000

The order in the compiled instruction is:

10
100
1000

So the compiler, before generating the instruction, sorts the values in ascending order.

Why is this necessary? We will see in the next section.

Understanding the lookupswitch instruction using Java code

Let's translate the previous section to Java code. Before we do that, and in order to simplify our implementation, we will store the case pairs in slightly different manner. Instead of:

10            // first pair: value
ten()         // first pair: jump
100           // second pair: value
oneHundred()  // second pair: jump
1000          // third pair: value
oneThousand() // third pair: jump

We will store it like so:

10            // first pair: value
100           // second pair: value
1000          // third pair: value
ten()         // first pair: jump
oneHundred()  // second pair: jump
oneThousand() // third pair: jump

So the match values are grouped together at the beginning while the jump locations are grouped at the end.

With this in mind, let's see a possible implementation:

private void lookupswitch(int value) {  var def = instructions[ip++];  var count = instructions[ip++];  var res = Arrays.binarySearch(instructions, ip, ip + count, value);  if (res >= 0) {    var offset = res - ip;    var index = ip + count + offset;    ip = instructions[index];  } else {    ip = def;  }}

So two values are decoded from the instructions:

the default jump location; and
the total count of case labels.

Using a binary search it looks for the switch value among the case labels. As the values were sorted by the compiler a binary search is possible.

If a match is found then execution branches to the corresponding jump location.

If a match is not found then execution branches to the jump location given by the default value.

Simulating our lookupswitch with our micro virtual machine

Let's simulate our lookupswitch instruction like we did with the tableswitch instruction. We will use the same bare-bones VirtualMachine. It was modified to include the lookupswitch method from the previous section.

The following program does just that:

public class LookupSwitch {  public static void main(String[] args) {    var vm = new VirtualMachine();    // add our lookupswitch instruction    // values taken from `javap` output    vm.lookupswitch(      57, // default      3, // count      10, 43, // case 10 value      100, 36, // case 100 value      1000, 50 // case 1000 value    );    // our 'methods'    // i.e. the jump locations for our 'cases'    vm.printString(36, "one hundred");    vm.printString(43, "ten");    vm.printString(50, "one thousand");    vm.printString(57, "other");    // executes the switch with    // the specified values    vm.execute(10);    vm.execute(50);    vm.execute(100);    vm.execute(1000);    vm.execute(5000);  }}

Once again, we are using the addresses from the javap output:

Code:
  stack=4, locals=2, args_size=2
     0: iload_1
     1: lookupswitch  { // 3
                  10: 43
                 100: 36
                1000: 50
             default: 57
        }
    36: aload_0
    37: invokevirtual #16                 // Method oneHundred:()Ljava/lang/String;
    40: goto          61
    43: aload_0
    44: invokevirtual #20                 // Method ten:()Ljava/lang/String;
    47: goto          61
    50: aload_0
    51: invokevirtual #23                 // Method oneThousand:()Ljava/lang/String;
    54: goto          61
    57: aload_0
    58: invokevirtual #26                 // Method default_:()Ljava/lang/String;
    61: areturn

Running our LookupSwitch program:

ten
other
one hundred
one thousand
other

Nice. Our lookupswitch implementation seems correct.

Benchmarking the instructions with JMH

Are there any performance differences between the two instructions? From the previous sections it would seem so:

lookupswitch requires a binary search; while
tableswitch does a simpler array component access.

But, as Java applications typically run on a JVM having a JIT compiler, we need to investigate it further. We will use the JMH tool for the investigation.

Before we go any further, I must say that I have very little knowledge:

in benchmarking in general; and
in writing and interpreting JMH benchmarks in particular.

With that said, let's begin.

We will use the following class:

@State(Scope.Thread)@BenchmarkMode(Mode.AverageTime)@OutputTimeUnit(TimeUnit.NANOSECONDS)@Warmup(time = 2, timeUnit = TimeUnit.SECONDS)@Measurement(time = 2, timeUnit = TimeUnit.SECONDS)public class Benchmarks {  int[] lookup;  int[] table;  int x = 123;  @Setup  public void _setup() {    lookup = new int[] {        257, 68, 13, 16, 529, 528,        144, 146, 32, 96, 160, 162    };    table = new int[] {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12};  }  @Benchmark  public int lookupSwitch() {    var result = 0;    for (int i = 0; i < lookup.length; i++) {      var state = lookup[i];      result += switch (state) {        case 257 -> compute(1001);        case 68 -> compute(1002);        case 13 -> compute(1003);        case 16 -> compute(1004);        case 529 -> x;        case 528 -> compute(1006);        case 144 -> compute(1007);        case 146 -> compute(1008);        case 32 -> 1234;        case 96 -> compute(1010);        case 160 -> state;        case 162 -> state;        default -> 0;      };    }    return result;  }  @Benchmark  public int tableSwitch() {    var result = 0;    for (int i = 0; i < table.length; i++) {      var state = table[i];      result += switch (state) {        case 1 -> compute(1001);        case 2 -> compute(1002);        case 3 -> compute(1003);        case 4 -> compute(1004);        case 5 -> x;        case 6 -> compute(1006);        case 7 -> compute(1007);        case 8 -> compute(1008);        case 9 -> 1234;        case 10 -> compute(1010);        case 11 -> state;        case 12 -> state;        default -> 0;      };    }    return result;  }  private int compute(int value) {    return value * x;  }}

There are two benchmark methods. Each has a switch expression with twelve case labels. The methods do the same work. The only difference is that each method is compiled to a different instruction:

javap -v target/classes/post/Benchmarks.class | grep switch
Classfile target/classes/post/Benchmarks.class
        16: lookupswitch  { // 12
        16: tableswitch   { // 1 to 12

Running this benchmark generates the following results:

Benchmark                Mode  Cnt   Score   Error  Units
Benchmarks.lookupSwitch  avgt   25  43.368 ± 0.843  ns/op
Benchmarks.tableSwitch   avgt   25  41.615 ± 1.505  ns/op

There are no significant differences. From this result it is possible to say that the refactoring discussed in the introduction has no impact on the performance.

Conclusion

In this blog post we have discussed two JVM instructions:

tableswitch
lookupswitch

Every Java switch expression or statement is compiled to one of the two instructions.

In a simple interpreter implementation it would seem that tableswitch would execute faster than a lookupswitch. However, as shown by a JMH benchmark, they both seem to have equivalent performance.

All of the examples in this post can be found at this GitHub repository.