HN Debrief

Computed goto for efficient dispatch tables (2012)

  • Programming
  • Compilers
  • Infrastructure
  • Performance

The article is a 2012 walkthrough of "computed goto" in GNU C, a nonstandard extension that lets an interpreter jump directly to opcode handlers through a table of label addresses instead of re-entering a switch on every bytecode. It shows the classic result: in tight virtual machine dispatch loops, this often beats a switch, partly because the compiler emits less bookkeeping and partly because the CPU can predict the control flow better.

If you maintain an interpreter, state machine, or hot dispatch loop, measure dispatch shape on modern CPUs instead of assuming the compiler or a switch will do the right thing. The win here depends on branch prediction behavior, opcode layout, compiler output, and how much complexity you are willing to trade for a few percent to double-digit speedup.

Discussion mood

Interested and mostly positive. People still buy the technique for interpreter hot loops, but the comments are less impressed by the old "switch is slower" framing and more focused on branch predictors, undefined behavior, and how much modern compilers and CPUs already narrow the gap.

Key insights

  1. 01

    Indirect predictor behavior drives the win

    The performance gain comes from how CPUs index indirect branch predictions, not from some mystical property of goto. A switch loop usually gives the predictor one branch address that must map to many opcode targets, which quickly trashes its history. Computed goto creates separate branch sites for each handler, so the processor can learn local opcode transitions instead of one blended mess. One comment adds that branch target buffers help even direct and conditional branches because they let fetch move to the likely target before decode finishes.

    Look at generated assembly and count the hot indirect branch sites in your dispatcher. If all traffic goes through one site, branch prediction is probably your bottleneck even when the source code looks clean.

      Attribution:
    • adrian_b #1
    • MobiusHorizons #1
    • monocasa #1
  2. 02

    The bounds check is a type guarantee issue

    The extra check in the switch version is not some unavoidable tax on structured control flow. It exists because the opcode is just an integer, so out-of-range values are legal inputs the compiler must handle. If you mark the default path as __builtin_unreachable, or otherwise promise the values are closed, the compiler can remove that guard and you have traded safety for speed just like computed goto. The stronger claim is that a real enum type would let that proof live outside the dispatch loop instead of being reconstructed by optimizer tricks.

    Separate two questions in your own code: whether dispatch is unpredictable, and whether you are paying validation costs in the hot path. If invalid opcodes are impossible after decoding, prove that once and keep the proof out of the inner loop.

      Attribution:
    • variadix #1
    • fweimer #1
    • kibwen #1
    • flohofwoe #1
  3. 03

    Compiler quality can erase part of the advantage

    Computed goto is partly compensating for code generation that a compiler could in principle produce from a switch inside a loop. Comments point out that some compilers already replicate jump structure in favorable cases, and that Rust or Clang may do related transforms. Another note says the trick matters less on modern CPUs with stronger indirect predictors and global history. That does not kill the optimization, but it turns it from a timeless rule into a target-specific measurement problem.

    Do not cargo-cult the 2012 benchmark. Recheck on your current compiler, target CPU, and workload before accepting the complexity of GNU C extensions or interpreter-specific rewrites.

      Attribution:
    • adrian_b #1
    • ack_complete #1
    • dhosek #1
  4. 04

    People push the trick beyond simple opcode tables

    One line of experimentation goes further than the article by storing computed-goto addresses directly in runtime structures, skipping even the enum-to-table index in dispatch. That can save a few more percent and preserve predictor and prefetch resources, but it requires ugly setup tricks, weak portability, and compiler-specific features. Another comment suggests a static global array hack to expose labels more directly. The shared conclusion is that the last bits of speed are available, but the engineering cost climbs fast.

    If you are already using direct-threaded dispatch, treat any attempt to embed label addresses in data structures as a late-stage optimization. It only makes sense after profiling shows dispatch itself still dominates runtime.

      Attribution:
    • wahern #1
    • cygx #1

Against the grain

  1. 01

    Dense opcodes are not optional

    The classic computed-goto pattern assumes opcode values map cleanly into a jump table. If your cases are sparse, the switch may compile into a chain or tree of conditional branches instead of one indexed jump, and the comparison against computed goto changes completely. In that world you are not choosing between two equivalent dispatch mechanisms. You are redesigning the opcode representation as much as the control flow.

    Before benchmarking dispatch style, inspect whether your opcode set is dense enough for jump-table code generation. If it is not, normalize the encoding first or expect the comparison to be misleading.

      Attribution:
    • nly #1
    • adrian_b #1

In plain english

__builtin_unreachable
A compiler builtin that tells the optimizer a code path can never be executed, allowing more aggressive optimization at the cost of undefined behavior if it is reached.
branch prediction
A CPU technique that guesses the next instruction path before the current branch is fully resolved so execution can keep moving.
bytecode
A compact instruction format executed by an interpreter or virtual machine instead of directly by the hardware CPU.
Clang
A widely used C, C++, and Objective-C compiler front end built on LLVM.
computed goto
A GNU C extension that lets code jump to a label whose address is stored in a variable or table, often used to build very fast dispatch loops.
GNU C
The C language as implemented by GCC with extra compiler-specific extensions beyond the standard.
indirect branch
A jump where the destination address comes from a register or memory rather than being hard-coded in the instruction.
opcode
An operation code, meaning the numeric value that identifies which instruction a bytecode interpreter should execute.
Rust
A systems programming language focused on memory safety and performance.

Reference links

Compiler and code generation examples