HN Debrief

I hate compilers

  • Programming
  • Developer Tools
  • Open Source
  • Security
  • Infrastructure

The post starts from a practical problem in Anubis, a browser proof-of-work gate used to slow abusive scrapers. Its challenge code is written in WebAssembly, but some clients have WebAssembly disabled, so the author added a fallback by recompiling the WebAssembly to JavaScript. That should have been routine. Instead, it exposed how fragile reproducible builds still are at the low end of the stack. The author eliminated the obvious sources of nondeterminism, such as timestamps, and then found a nastier one: Clang appeared to emit different output depending on the memory layout that address space layout randomization (ASLR) gave the compiler process on the build machine. Several commenters zeroed in on that as the important fact. Build-time date macros and similar inputs are boring and expected; a compiler or linker path whose output changes with pointer layout is a genuine bug.

If you ship security-sensitive or verifiable binaries, treat reproducibility as an explicit requirement and test for it early across machines and runs. If your output changes with address layout or other ambient state, assume a real toolchain bug and reduce it to an upstreamable test case instead of papering over it locally.
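The advice to test reproducibility across runs can be sketched as a small harness that builds the same inputs several times and compares digests. This is a toy under stated assumptions: `good_build`, `leaky_build`, and the pinned timestamp are hypothetical, and a counter stands in for ASLR or other hidden process state. It also shows why pinning declared inputs, for example with SOURCE_DATE_EPOCH, is not enough on its own.

```python
"""Sketch: pinning declared inputs does not make a build deterministic.

Assumptions: `good_build` and `leaky_build` are hypothetical stand-ins for a
compiler run; a counter plays the role of ASLR or other hidden process state.
"""
import hashlib
import itertools

PINNED_ENV = {"SOURCE_DATE_EPOCH": "1700000000"}  # fixed timestamp, as Nix or a CI job might set
_hidden_state = itertools.count()  # changes on every run, like a fresh address layout


def good_build(env: dict) -> bytes:
    # Output depends only on declared inputs.
    return f"built-at={env['SOURCE_DATE_EPOCH']};blocks=entry,exit".encode()


def leaky_build(env: dict) -> bytes:
    # Same pinned inputs, but block order flips with hidden state.
    order = "entry,exit" if next(_hidden_state) % 2 == 0 else "exit,entry"
    return f"built-at={env['SOURCE_DATE_EPOCH']};blocks={order}".encode()


def reproducibility_check(build, env: dict, runs: int = 3) -> set:
    """Run the build several times with identical inputs; return the distinct digests."""
    return {hashlib.sha256(build(dict(env))).hexdigest() for _ in range(runs)}


print(len(reproducibility_check(good_build, PINNED_ENV)))   # 1
print(len(reproducibility_check(leaky_build, PINNED_ENV)))  # 2
```

A real harness would call the actual build, ideally on more than one machine, and diff the artifacts whenever more than one digest comes back.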

Discussion mood

Mostly sympathetic to the debugging pain, and largely convinced that the ASLR-dependent output is a real Clang or LLVM bug. The mood soured around Anubis itself, with plenty of hostility to browser proof-of-work on grounds of accessibility, battery life, and principle, even from people who accepted that scraper abuse is a real cost problem.

Key insights

  1. 01

    ASLR-dependent codegen points to a real LLVM bug

    When ASLR in the compiler process changes the compiled program, that is not normal ambient variance. It suggests some pass depends on an unstable iteration order, likely through a hash container such as DenseMap: when the keys are pointers, their hash values derive from addresses, so memory layout decides the iteration order and that order bleeds into the emitted output. The useful shift here is from "reproducible builds are hard" to "this specific behavior is a fixable compiler defect," and commenters backed that with LLVM's own guidance to use deterministic containers when iteration order matters.

    If repeated builds differ only when the compiler process's memory layout changes, stop treating it as inevitable noise. Capture that condition in a reproducer and file it against the toolchain, because upstream already treats deterministic iteration as a requirement in these paths.

      Attribution:
    • biglost #1
    • RyanSquared #1
    • pertymcpert #1
    • ammar2 #1
  2. 02

    Signatures do not replace reproducible builds

    Cryptographic signatures prove who shipped a binary. They do not prove that the binary matches the published source. That distinction matters for audited systems, regulated software delivery, and post-incident verification like checking whether a distro package matches the source after a supply chain scare. Reproducibility is what lets outside parties validate the build, not just trust the builder.

    If you rely on signed release artifacts today, decide where independent rebuild verification would catch a class of failure your current process misses. That is especially relevant for customer audits, open source distribution, and high-trust internal tools.

      Attribution:
    • robinsonb5 #1 #2
    • harrouet #1
  3. 03

    Nix helps with inputs, not compiler correctness

    Several people reached for Nix as the obvious answer, but the stronger clarification was narrower. Nix and SOURCE_DATE_EPOCH can pin timestamps, dependencies, and much of the host environment. They cannot make a buggy compiler deterministic, and Nix does not even need deterministic outputs to work, since its default store paths are hashes of the build inputs rather than of the results. That separates hermetic builds from reproducible binaries, two ideas that often get blurred together in practice.

    Use Nix or similar systems to remove easy environmental drift, but keep a separate check for byte-for-byte reproducibility. Passing one does not imply the other.

      Attribution:
    • edude03 #1
    • trexd #1
    • xena #1
    • lloeki #1
    • stabbles #1
  4. 04

    Anubis works by pricing out cheap scraping

    The practical defense of Anubis was not that it can overpower OpenAI or other well-funded crawlers on raw compute. It is that most abusive traffic does not come from perfectly optimized, GPU-backed crawlers. A challenge that adapts to client signals, IP reputation, and request behavior can push low-quality scrapers and opportunistic abuse below the threshold where they swamp a small site. That makes it a traffic-shaping tool, not an absolute gate.

    If you run a small service under scraper pressure, evaluate defenses by whether they reduce your actual bad traffic and hosting bill, not by whether they are theoretically unbeatable by the richest attacker.

      Attribution:
    • xena #1
    • saintfire #1
    • lifthrasiir #1
    • Analemma_ #1
  5. 05

    There is a clear path to an upstreamable reproducer

    Commenters did more than diagnose the bug class. They pointed to cvise and to LLVM's existing WebAssembly exception-handling tests as the likely route to a minimal failing case, and even named a likely culprit, the pass implemented in WebAssemblyCFGStackify.cpp. That turns a frustrating anecdote into tractable compiler work. The hard part is reducing a large codebase like Binaryen to a small failing input, not figuring out where to start.

    When you hit a likely compiler bug, look for a test-case reducer and the nearest existing tests first. That shortens the path from "weird local failure" to something maintainers can actually merge a fix for.

      Attribution:
    • amatria #1
    • ammar2 #1
    • xena #1
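The mechanism behind insight 01 can be modeled in a few lines. This is a toy in Python rather than LLVM's C++: `ToyPointerMap`, `InsertionOrderedMap`, `emit_blocks`, the object size, and the heap addresses are all hypothetical, and only the hash shape follows LLVM's pointer hashing. It shows a table keyed by object addresses emitting the same blocks in a different order when the heap base moves, while an insertion-ordered container, the role LLVM's MapVector plays, does not.

```python
"""Toy model: pointer-keyed hash iteration leaks memory layout into output.

Assumptions: the hash mirrors the shape of LLVM's pointer hash
((p >> 4) ^ (p >> 9)), but linear probing and the fixed 64-slot table are
simplifications, not DenseMap's real implementation.
"""


def pointer_hash(addr: int) -> int:
    # Hash derived purely from the address, as for pointer keys in DenseMap.
    return (addr >> 4) ^ (addr >> 9)


class ToyPointerMap:
    """Open-addressing table keyed by simulated object addresses."""

    def __init__(self, capacity: int = 64) -> None:
        self.slots: list = [None] * capacity

    def insert(self, addr: int, value: str) -> None:
        i = pointer_hash(addr) % len(self.slots)
        while self.slots[i] is not None and self.slots[i][0] != addr:
            i = (i + 1) % len(self.slots)  # linear probing on collision
        self.slots[i] = (addr, value)

    def values(self) -> list:
        # Bucket order follows hash(address), not insertion order.
        return [slot[1] for slot in self.slots if slot is not None]


class InsertionOrderedMap:
    """Deterministic alternative in the spirit of LLVM's MapVector."""

    def __init__(self) -> None:
        self.items: dict = {}  # Python dicts preserve insertion order

    def insert(self, addr: int, value: str) -> None:
        self.items[addr] = value

    def values(self) -> list:
        return list(self.items.values())


def emit_blocks(names, heap_base, map_type):
    """Pretend compiler pass: allocate one object per block, emit in map order.

    heap_base stands in for where ASLR placed the compiler's heap this run.
    """
    blocks = map_type()
    for i, name in enumerate(names):
        blocks.insert(heap_base + 48 * i, name)  # 48-byte objects, hypothetical
    return blocks.values()


BLOCKS = ["entry", "loop", "body", "exit"]
print(emit_blocks(BLOCKS, 0x1000, ToyPointerMap))        # ['exit', 'entry', 'loop', 'body']
print(emit_blocks(BLOCKS, 0x2000, ToyPointerMap))        # ['entry', 'loop', 'body', 'exit']
print(emit_blocks(BLOCKS, 0x1000, InsertionOrderedMap))  # ['entry', 'loop', 'body', 'exit']
```

Sorting by a stable key, such as a block number or name, before emitting is the other common fix when insertion order itself carries no meaning.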
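Insight 02's distinction is easy to see in code. In this sketch, HMAC stands in for a real signature scheme such as GPG or Sigstore so that it runs on the standard library alone, and `build` is a hypothetical deterministic build. A binary tampered with before signing still carries a valid signature; only an independent rebuild exposes it.

```python
"""Sketch: a valid signature says who shipped an artifact, not what source it came from.

Assumption: HMAC stands in for a real signature scheme (GPG, minisign,
Sigstore, ...) so the example runs with only the standard library.
"""
import hashlib
import hmac

RELEASE_KEY = b"hypothetical-release-key"


def sign(artifact: bytes) -> bytes:
    return hmac.new(RELEASE_KEY, artifact, hashlib.sha256).digest()


def signature_ok(artifact: bytes, signature: bytes) -> bool:
    # Proves the key holder produced exactly these bytes.
    return hmac.compare_digest(sign(artifact), signature)


def rebuild_matches(artifact: bytes, rebuilt: bytes) -> bool:
    # Proves these bytes follow from the source, provided the build is reproducible.
    return hashlib.sha256(artifact).digest() == hashlib.sha256(rebuilt).digest()


def build(source: bytes) -> bytes:
    """Hypothetical deterministic build: same source in, same bytes out."""
    return b"BIN:" + hashlib.sha256(source).digest()


source = b"int main(void) { return 0; }"
shipped = build(source) + b"<injected>"  # tampered on the builder's machine
signature = sign(shipped)                # ...and then signed as usual

print(signature_ok(shipped, signature))         # True
print(rebuild_matches(shipped, build(source)))  # False
```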
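Insight 04's "pricing out" argument rests on how hashcash-style puzzles scale: each extra bit of difficulty doubles the expected client work, while the server verifies with a single hash. The sketch below is a generic illustration, not Anubis's actual challenge format; `pick_difficulty` and its weights are hypothetical stand-ins for the client signals and IP reputation the commenters described.

```python
"""Hashcash-style proof of work with a difficulty chosen from request signals.

Assumptions: this is a generic sketch, not Anubis's actual challenge format;
`pick_difficulty` and its weights are hypothetical.
"""
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    bits = 0
    for byte in digest:
        if byte:
            return bits + 8 - byte.bit_length()
        bits += 8
    return bits


def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    # Server side: one hash, so checking stays cheap at any difficulty.
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge: bytes, difficulty: int) -> int:
    # Client side: about 2**difficulty hashes on average.
    nonce = 0
    while not verify(challenge, nonce, difficulty):
        nonce += 1
    return nonce


def pick_difficulty(bad_ip_reputation: bool, suspicious_headers: bool, base: int = 4) -> int:
    # Hypothetical policy: ordinary visitors get a cheap puzzle, suspect traffic pays more.
    return base + (6 if bad_ip_reputation else 0) + (2 if suspicious_headers else 0)


challenge = b"per-request-random-challenge"
difficulty = pick_difficulty(bad_ip_reputation=False, suspicious_headers=False)
nonce = solve(challenge, difficulty)
print(verify(challenge, nonce, difficulty))  # True
```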
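For insight 05, C-Vise decides what to keep by running an "interestingness test" that exits 0 while the reduced file still shows the bug. For an ASLR-dependent bug, "still shows the bug" means repeated compiles disagree. A minimal sketch of such a test, assuming a hypothetical `reduced.cpp` and illustrative clang flags that you would replace with those from the failing build:

```python
"""Sketch of a C-Vise interestingness test for output that varies between runs.

C-Vise repeatedly shrinks a source file and keeps a candidate only if this
test exits 0. Assumptions: the file name `reduced.cpp`, the clang flags, and
the run count are hypothetical; copy the real flags from the failing build.
"""
import hashlib
import os
import subprocess
import tempfile

SOURCE = "reduced.cpp"
CLANG_ARGS = ["clang++", "--target=wasm32", "-O2", "-fwasm-exceptions", "-c", SOURCE]
RUNS = 8


def compile_once() -> bytes:
    # Each clang process gets a fresh ASLR layout, so repeated runs sample it.
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "out.o")
        subprocess.run(CLANG_ARGS + ["-o", out], check=True, capture_output=True)
        with open(out, "rb") as f:
            return f.read()


def outputs_differ(compile_fn, runs: int = RUNS) -> bool:
    digests = {hashlib.sha256(compile_fn()).hexdigest() for _ in range(runs)}
    return len(digests) > 1


def main() -> int:
    try:
        interesting = outputs_differ(compile_once)
    except (OSError, subprocess.CalledProcessError):
        interesting = False  # a candidate that no longer compiles is not interesting
    return 0 if interesting else 1
```

C-Vise runs the test as an executable, so the file would also need a `#!/usr/bin/env python3` line at the top and `raise SystemExit(main())` at the bottom, then something like `cvise ./interesting.py reduced.cpp`. The entry point is left out here so the functions can be imported and checked. Since each run samples only one address layout, a larger `RUNS` makes the test less likely to miss the bug, at the cost of slower reduction.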

Against the grain

  1. 01

    Proof of work punishes users more than major scrapers

    The strongest pushback on Anubis was that browser proof-of-work is a poor weapon against the largest targets. A puzzle that is tolerable on a phone is not serious friction for a well-funded crawler, while the cost in waiting time, battery drain, and accessibility lands immediately on legitimate visitors. That reframes Anubis as a tax on users, not a robust anti-AI defense.

    Before adding proof-of-work to a public site, measure user-visible latency and abandonment on low-end devices. If the friction is noticeable, you need evidence that the reduction in abusive traffic is large enough to justify it.

      Attribution:
    • antirez #1
    • ericpauley #1
    • Animats #1
  2. 02

    Deterministic binaries may be overvalued operationally

    A minority view argued that reproducible outputs are less important than they are often made out to be. In many real organizations, the answer is simply to build from reviewed source in your own environment and rely on CI traceability, signing, and logs for assurance. From that perspective, byte-identical rebuilds are nice to have, but not worth large engineering cost unless a concrete audit need demands them.

    If you are considering a reproducible-build push, tie it to a specific threat model or compliance need. Otherwise you risk burning time on infrastructure work that does not change how your releases are actually trusted.

      Attribution:
    • charcircuit #1 #2
    • skydhash #1 #2
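The asymmetry in the first objection is plain arithmetic: a puzzle requiring about 2^d hashes takes time inversely proportional to the solver's hash rate. The rates below are made-up illustrative figures, not measurements, so treat the output as the shape of the argument rather than real numbers.

```python
"""Back-of-envelope cost of a hashcash-style puzzle for different solvers.

Assumption: both hash rates below are made-up illustrative figures, not
measurements; substitute numbers from your own test devices.
"""


def expected_hashes(difficulty_bits: int) -> int:
    # Each attempt succeeds with probability 2**-difficulty_bits (geometric trials).
    return 2 ** difficulty_bits


def expected_seconds(difficulty_bits: int, hashes_per_second: float) -> float:
    return expected_hashes(difficulty_bits) / hashes_per_second


LOW_END_PHONE_RATE = 100_000  # hypothetical hashes/second in a browser
CRAWLER_RATE = 100_000_000    # hypothetical hashes/second on server hardware

for bits in (16, 20):
    phone = expected_seconds(bits, LOW_END_PHONE_RATE)
    crawler = expected_seconds(bits, CRAWLER_RATE)
    print(f"{bits} bits: phone ~{phone:.2f} s, crawler ~{crawler * 1000:.2f} ms")
# 16 bits: phone ~0.66 s, crawler ~0.66 ms
# 20 bits: phone ~10.49 s, crawler ~10.49 ms
```

Whatever the real rates are, the ratio between them, not the puzzle size, decides whether the friction lands on visitors or on crawlers.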

In plain english

Binaryen
A compiler and toolchain library for WebAssembly, used to optimize and transform WASM programs.
CI
Continuous Integration, an automated system that builds and tests code changes.
Clang
A widely used open source compiler front end for C, C++, and related languages, built as part of LLVM.
DenseMap
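An LLVM hash map container optimized for speed. Its iteration order follows the hash values of its keys, which for pointer keys come from memory addresses, so the order can change between runs; LLVM provides MapVector when deterministic iteration is needed.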
JavaScript
The scripting language built into web browsers and widely used for web applications.
LLVM
A large open source compiler infrastructure project that includes optimization passes, code generators, and tools like Clang.
Nix
A package manager and build system focused on declarative, isolated, and repeatable software environments.
reproducible builds
A build process where the same source code and inputs produce byte-for-byte identical binaries every time.
SOURCE_DATE_EPOCH
A standard environment variable used by build tools to replace the current time with a fixed timestamp for reproducible builds.
WebAssembly
A low-level binary format for code that can run in web browsers and other environments, often abbreviated as WASM.

Reference links

Compiler debugging and determinism

Reproducible build references

  • SOURCE_DATE_EPOCH documentation
    Linked to clarify that fixed build timestamps are part of a broader reproducible builds standard, not a Nix-specific feature.

Anubis documentation

Related background reading

  • Much ado about nothing
    Referenced to explain why the author no longer uses Nix despite commenters suggesting it as the obvious solution.
  • Firefox WebAssembly sandboxing article
    Used as an example of real systems compiling C or C++ to WebAssembly for sandboxing inside the browser stack.
  • polywasm
    Suggested as an alternative to recompiling WebAssembly to JavaScript for fallback execution.
  • Author character guide
    Shared to explain the author's in-post character voices and styling choices.