I hate compilers

Programming
Developer Tools
Open Source
Security
Infrastructure

The post starts from a practical problem in Anubis, a browser proof-of-work gate used to slow abusive scrapers. Its challenge path is written in WebAssembly, but some clients have WebAssembly disabled, so the author added a fallback by recompiling the WebAssembly to JavaScript. That should have been routine. Instead, it exposed how fragile reproducible builds still are at the low end of the stack. The author could eliminate obvious sources of nondeterminism like timestamps, then discovered a nastier one: Clang appeared to emit different output depending on address space layout randomization (ASLR) on the machine doing the compile. Several people zeroed in on that as the important fact. Build-time date macros and similar inputs are boring and expected. A compiler or linker path whose output changes with pointer layout is a genuine bug.
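Timestamps are the easy part because a standard already covers them. Below is a minimal sketch of how a build step might honor SOURCE_DATE_EPOCH, assuming Python tooling. The function name `build_timestamp` is hypothetical and is not taken from the post or from Anubis.

```python
import os
from datetime import datetime, timezone


def build_timestamp(env=None) -> datetime:
    """Return the timestamp a build step should embed (hypothetical helper).

    Honors SOURCE_DATE_EPOCH when set, so two builds of the same source
    embed the same time; otherwise it falls back to the wall clock.
    """
    env = os.environ if env is None else env
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is None:
        # No pinned time: this branch is what makes naive builds nondeterministic.
        return datetime.now(timezone.utc)
    if not raw.isdigit():
        # The specification recommends failing loudly on a malformed value.
        raise ValueError(f"malformed SOURCE_DATE_EPOCH: {raw!r}")
    # Always interpret the value as UTC so the host time zone cannot leak in.
    return datetime.fromtimestamp(int(raw), tz=timezone.utc)


if __name__ == "__main__":
    stamp = build_timestamp({"SOURCE_DATE_EPOCH": "1700000000"})
    print(stamp.isoformat())  # 2023-11-14T22:13:20+00:00
```

Pinning the clock this way removes one class of drift. It does nothing about a compiler whose output depends on memory layout.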

If you ship security-sensitive or verifiable binaries, treat reproducibility as an explicit requirement and test for it early across machines and runs. If your output changes with address layout or other ambient state, assume a real toolchain bug and reduce it to an upstreamable test case instead of papering over it locally.
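Here is a minimal sketch of "test for it early across runs". It assumes the build step can be wrapped as a callable that returns the artifact bytes. `digest_runs` and `is_reproducible` are hypothetical helpers, and the two lambdas stand in for a real compiler invocation.

```python
import hashlib
import os
from collections import Counter
from typing import Callable


def digest_runs(build: Callable[[], bytes], runs: int = 5) -> Counter:
    """Run a build step several times and count distinct SHA-256 output digests."""
    counts: Counter = Counter()
    for _ in range(runs):
        counts[hashlib.sha256(build()).hexdigest()] += 1
    return counts


def is_reproducible(build: Callable[[], bytes], runs: int = 5) -> bool:
    # Exactly one digest means every run produced byte-identical output.
    return len(digest_runs(build, runs)) == 1


if __name__ == "__main__":
    stable = lambda: b"(module)"                   # stand-in for deterministic output
    flaky = lambda: b"(module)" + os.urandom(4)    # stand-in for layout-dependent output
    print(is_reproducible(stable), is_reproducible(flaky))  # True False
```

Run the same harness on more than one machine as well. On Linux, `setarch "$(uname -m)" -R <command>` starts a command with address randomization disabled. That gives you one way to check whether layout is the variable: if runs agree under it and disagree without it, ASLR is the likely culprit.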

June 18, 2026
xeiaso.net
Discuss on HN

Discussion mood

Mostly sympathetic to the debugging pain and convinced the ASLR-dependent output is a real Clang or LLVM bug. The mood turned sour around Anubis itself, with a lot of hostility to browser proof-of-work on grounds of accessibility, battery life, and principle, even from people who accepted that scraper abuse is a real cost problem.

Key insights

ASLR-dependent codegen points to a real LLVM bug

Address space layout randomization affecting the compiler process and then changing the compiled program is not normal ambient variance. It suggests some pass depends on unstable iteration order, likely through a container like DenseMap keyed on pointers. Such a container iterates in an order derived from hashed addresses, so memory layout bleeds into the emitted output. The useful shift here is from "reproducible builds are hard" to "this specific behavior is a fixable compiler defect," and commenters backed that with LLVM's own guidance to use deterministic containers when order matters.

If repeated builds differ only when compiler process layout changes, stop treating it as inevitable noise. Capture that condition in a reproducer and file it against the toolchain, because upstream already considers deterministic iteration a requirement in these paths.
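The mechanism can be shown with a Python analogy. It assumes CPython, where an object without a custom `__hash__` is hashed from its identity, and that identity is its memory address. A set of such objects therefore iterates in an address-dependent order, much like a DenseMap keyed on pointers. An insertion-ordered dict plays the role of LLVM's SetVector. `Block`, `emit_unstable`, and `emit_stable` are hypothetical names.

```python
class Block:
    """Stand-in for an IR basic block, hashed by identity like a pointer key."""

    def __init__(self, name: str) -> None:
        self.name = name


def emit_unstable(blocks: list) -> list:
    # A set of identity-hashed objects iterates in an order derived from
    # their addresses, so the emitted order can change between runs.
    return [b.name for b in set(blocks)]


def emit_stable(blocks: list) -> list:
    # dict.fromkeys deduplicates but keeps first-insertion order,
    # the property SetVector provides in LLVM.
    return [b.name for b in dict.fromkeys(blocks)]


if __name__ == "__main__":
    blocks = [Block(f"bb{i}") for i in range(8)]
    blocks.append(blocks[0])        # a duplicate, to show deduplication
    print(emit_stable(blocks))      # bb0 through bb7, in insertion order, every run
    print(emit_unstable(blocks))    # same names, order may vary with memory layout
```

A fix in a real pass has the same shape: keep the fast lookup, but iterate over something whose order comes from the program rather than from the allocator.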

Attribution:

biglost #1
RyanSquared #1
pertymcpert #1
ammar2 #1

Signatures do not replace reproducible builds

Cryptographic signatures prove who shipped a binary. They do not prove that the binary matches the published source. That distinction matters for audited systems, regulated software delivery, and post-incident verification like checking whether a distro package matches the source after a supply chain scare. Reproducibility is what lets outside parties validate the build, not just trust the builder.

If you rely on signed release artifacts today, decide where independent rebuild verification would catch a class of failure your current process misses. That is especially relevant for customer audits, open source distribution, and high-trust internal tools.
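Below is a minimal sketch of the check an independent rebuilder adds. It assumes the project publishes a SHA-256 digest for each release, for example in a signed manifest; verifying the signature itself is left out. `verify_rebuild` and `check_rebuilders` are hypothetical helpers.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_rebuild(published_sha256: str, rebuilt: bytes) -> bool:
    """Compare bytes you rebuilt from source with the digest the project published.

    A signature on the release says who shipped it; this comparison says the
    shipped bytes match what your own build of the source produced.
    """
    return sha256_hex(rebuilt) == published_sha256.strip().lower()


def check_rebuilders(published_sha256: str, rebuilds: dict) -> list:
    """Return the names of independent rebuilders whose output does not match."""
    return [name for name, artifact in rebuilds.items()
            if not verify_rebuild(published_sha256, artifact)]


if __name__ == "__main__":
    release = b"release bytes"
    published = sha256_hex(release)
    rebuilds = {"ci": release, "volunteer": release, "laptop": release + b"\x00"}
    print(check_rebuilders(published, rebuilds))  # ['laptop']
```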

Attribution:

robinsonb5 #1 #2
harrouet #1

Nix helps with inputs, not compiler correctness

Several people reached for Nix as the obvious answer, but the stronger clarification was narrower. Nix and SOURCE_DATE_EPOCH can freeze timestamps, dependencies, and much of the host environment. They cannot make a buggy compiler deterministic. Nix's default input-addressed model also does not need outputs to be deterministic, because it names store paths by hashing the build inputs. That separates hermetic builds from reproducible binaries, which often get blurred together in practice.

Use Nix or similar systems to remove easy environmental drift, but keep a separate check for byte-for-byte reproducibility. Passing one does not imply the other.
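Here is a sketch of why the two checks are separate, assuming build inputs can be described as a flat mapping. `input_key` hashes the declared inputs in the spirit of an input-addressed store, though it is not Nix's actual derivation hashing. `unreproducible_keys` then flags any key whose builds produced different bytes. All names and input values are hypothetical.

```python
import hashlib
import json
from collections import defaultdict


def input_key(inputs: dict) -> str:
    """Hash the declared build inputs; sorted keys make the result order-independent."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def unreproducible_keys(builds: list) -> list:
    """Group (inputs, artifact) pairs by input key; return keys whose outputs differ.

    A hermetic setup keeps the keys stable. Only this output comparison shows
    whether identical inputs actually produced identical bytes.
    """
    outputs = defaultdict(set)
    for inputs, artifact in builds:
        outputs[input_key(inputs)].add(hashlib.sha256(artifact).hexdigest())
    return [key for key, digests in outputs.items() if len(digests) > 1]


if __name__ == "__main__":
    inputs = {"src": "abc123", "clang": "hypothetical-version", "SOURCE_DATE_EPOCH": "1700000000"}
    builds = [(inputs, b"wasm-a"), (dict(inputs), b"wasm-b")]
    print(len(unreproducible_keys(builds)))  # 1: same inputs, different bytes
```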

Attribution:

edude03 #1
trexd #1
xena #1
lloeki #1
stabbles #1

Anubis works by pricing out cheap scraping

The practical defense of Anubis was not that it can overpower OpenAI or other well-funded crawlers on raw compute. It is that most abusive traffic is not coming from perfectly optimized, GPU-backed crawlers. A challenge that adapts its difficulty to client signals, IP reputation, and request behavior can push low-quality scrapers and opportunistic abuse below the threshold where they swamp a small site. That makes it a traffic-shaping tool, not an absolute gate.

If you run a small service under scraper pressure, evaluate defenses by whether they reduce your actual bad traffic and hosting bill, not by whether they are theoretically unbeatable by the richest attacker.
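Below is a sketch of the pricing argument. It assumes a generic hashcash-style puzzle: the client searches for a nonce such that the SHA-256 hex digest of the challenge followed by the nonce starts with a given number of zero digits. Anubis's real challenge format and difficulty policy may differ, so check its documentation. The function names are hypothetical.

```python
import hashlib
from itertools import count


def pow_digest(challenge: str, nonce: int) -> str:
    # Hypothetical encoding: challenge string followed by the decimal nonce.
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> int:
    """Brute-force the first nonce whose digest starts with `difficulty` zero hex digits."""
    target = "0" * difficulty
    for nonce in count():
        if pow_digest(challenge, nonce).startswith(target):
            return nonce
    raise AssertionError("unreachable: count() never ends")


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    # The server's side costs a single hash regardless of difficulty.
    return pow_digest(challenge, nonce).startswith("0" * difficulty)


def expected_hashes(difficulty: int, solves: int = 1) -> int:
    # Each hex digit is zero with probability 1/16, so one solve averages
    # 16**difficulty attempts, and total cost grows linearly with solves.
    return solves * 16**difficulty


if __name__ == "__main__":
    nonce = solve("example-challenge", 3)
    print(nonce, verify("example-challenge", nonce, 3))
    print(expected_hashes(4), expected_hashes(4, solves=100_000))  # 65536 6553600000
```

The asymmetry is the point. The server checks a solution with one hash, while each solve costs 16^difficulty hashes on average. A crawler that needs many fresh solves pays many times over, and raising the difficulty by one multiplies its bill by 16.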

Attribution:

xena #1
saintfire #1
lifthrasiir #1
Analemma_ #1

There is a clear path to an upstreamable reproducer

Commenters did more than diagnose the bug class. They pointed to cvise and existing WebAssembly exception-handling tests in LLVM as the likely route to a minimal failing case, and even named the likely culprit, the pass implemented in WebAssemblyCFGStackify.cpp. That turns a frustrating anecdote into tractable compiler work. The hard part is reduction from a large codebase like Binaryen, not figuring out where to start.

When you hit a likely compiler bug, ask for reducer tooling and nearest existing tests first. That shortens the path from "weird local failure" to something maintainers can actually merge a fix for.
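cvise drives reduction with an interestingness test. This is a script it runs in a scratch directory containing the current candidate, and cvise keeps a reduction only if the script exits with status 0. Below is a minimal sketch for layout-dependent output. It assumes a preprocessed `test.cpp` and a `clang++` invocation whose flags are placeholders; copy the exact command from the failing build instead.

```python
#!/usr/bin/env python3
"""Hypothetical cvise interestingness test for layout-dependent codegen.

cvise runs this script next to the current candidate and keeps a
reduction only when the script exits with status 0.
"""
import hashlib
import subprocess
import sys
from typing import Callable, Optional

SOURCE = "test.cpp"  # assumption: preprocessed source handed to cvise
OUTPUT = "out.o"
# Placeholder flags: replace with the exact command line from the failing build.
COMPILE = ["clang++", "--target=wasm32", "-O2", "-fwasm-exceptions",
           "-c", SOURCE, "-o", OUTPUT]


def compile_once() -> Optional[bytes]:
    """Compile the candidate once; None means it no longer compiles."""
    result = subprocess.run(COMPILE, capture_output=True)
    if result.returncode != 0:
        return None
    with open(OUTPUT, "rb") as f:
        return f.read()


def is_interesting(compile_fn: Callable[[], Optional[bytes]], tries: int = 8) -> bool:
    """True if every compile succeeds and the outputs are not all identical."""
    digests = set()
    for _ in range(tries):
        out = compile_fn()
        if out is None:
            return False  # a broken candidate must not count as reproducing the bug
        digests.add(hashlib.sha256(out).hexdigest())
    return len(digests) > 1


if __name__ == "__main__":
    sys.exit(0 if is_interesting(compile_once) else 1)
```

Make the script executable and run something like `cvise ./interesting.py test.cpp`. ASLR-dependent differences may not appear on every pair of runs, so `tries` trades reduction speed against false negatives.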

Attribution:

amatria #1
ammar2 #1
xena #1

Against the grain

Proof of work punishes users more than major scrapers

The strongest pushback on Anubis was that browser proof-of-work is a poor weapon against the largest targets. A puzzle that is tolerable on a phone is not serious friction for a well-funded crawler, while the cost in waiting time, battery drain, and accessibility lands immediately on legitimate visitors. That reframes Anubis as a tax on users, not a robust anti-AI defense.

Before adding proof-of-work to a public site, measure user-visible latency and abandonment on low-end devices. If the friction is noticeable, you need evidence that the reduction in abusive traffic is large enough to justify it.
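Averages understate the cost to unlucky users. This sketch keeps the same hashcash-style assumption: the SHA-256 hex digest must start with `difficulty` zero digits. Under that assumption, the number of attempts is geometric with success probability p = 16^-difficulty, so the q-quantile of attempts is ln(1 - q) / ln(1 - p). The device hash rates below are hypothetical placeholders for your own measurements.

```python
import math


def solve_time_percentile(difficulty: int, hashes_per_second: float, q: float) -> float:
    """Seconds within which a fraction q of solves finish.

    Attempts are geometric with success probability p = 16**-difficulty,
    so P(unsolved after n attempts) = (1 - p)**n; solving for n gives
    n = ln(1 - q) / ln(1 - p).
    """
    if difficulty < 1:
        raise ValueError("difficulty must be at least 1")
    if not 0.0 <= q < 1.0:
        raise ValueError("q must be in [0, 1)")
    p = 16.0 ** -difficulty
    attempts = math.log1p(-q) / math.log1p(-p)
    return attempts / hashes_per_second


if __name__ == "__main__":
    # Hypothetical hash rates: replace with numbers measured on real devices.
    for device, rate in [("fast desktop", 2_000_000.0), ("old phone", 50_000.0)]:
        p50 = solve_time_percentile(4, rate, 0.50)
        p95 = solve_time_percentile(4, rate, 0.95)
        print(f"{device}: median {p50:.2f} s, p95 {p95:.2f} s")
```

For small p, the 95th percentile is about ln 20 ≈ 3 times the mean. One visitor in twenty therefore waits roughly three times as long as the average suggests.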

Attribution:

antirez #1
ericpauley #1
Animats #1

Deterministic binaries may be overvalued operationally

A minority view argued that reproducible outputs are less important than they are often made out to be. In many real organizations, the answer is simply to build from reviewed source in your own environment and rely on CI traceability, signing, and logs for assurance. From that perspective, byte-identical rebuilds are nice to have, but not worth large engineering cost unless a concrete audit need demands them.

If you are considering a reproducible-build push, tie it to a specific threat model or compliance need. Otherwise you risk burning time on infrastructure work that does not change how your releases are actually trusted.

Attribution:

charcircuit #1 #2
skydhash #1 #2

In plain English

Binaryen

A compiler and toolchain library for WebAssembly, used to optimize and transform WASM programs.

CI

Continuous integration, automated build and test pipelines that run when code changes are made.

Clang

A compiler front end for C, C++, and related languages built on the LLVM project.

DenseMap

An LLVM hash map container optimized for speed. Its iteration order follows the hashes of its keys, so when the keys are pointers, the order can change with memory layout from one run to the next.

JavaScript

The main programming language used to add behavior and interactivity to web pages.

LLVM

A widely used compiler infrastructure whose back ends generate code for many targets, including WebAssembly.

Nix

A package manager and configuration system aimed at making software environments reproducible.

Reproducible builds

A build process designed so independent people can rebuild the same source code and get bit-for-bit identical outputs.

SOURCE_DATE_EPOCH

A standard environment variable used by build tools to replace the current time with a fixed timestamp for reproducible builds.

WebAssembly

A portable binary format that lets code run in web browsers at near-native speed.

Reference links

Compiler debugging and determinism

cvise reduction tutorial comment on LLVM issue
Suggested as a practical way to reduce a large failing case to a minimal compiler bug reproducer.
LLVM reverse iteration bots discussion
Referenced as background on LLVM efforts to catch nondeterministic behavior caused by unstable iteration order.
LLVM commit fixing nondeterminism
Used as evidence that LLVM does treat deterministic output as worth fixing.
LLVM SetVector documentation
Cited to show LLVM has explicit guidance on deterministic iteration containers.

Reproducible build references

SOURCE_DATE_EPOCH documentation
Linked to clarify that fixed build timestamps are part of a broader reproducible builds standard, not a Nix-specific feature.

Anubis documentation

Anubis FAQ on crypto mining
Cited to show that using client proof-of-work for cryptocurrency mining is an explicitly rejected design choice.
Anubis honeypot overview
Linked to support the claim that Anubis uses trap content and resulting behavior to adjust challenge difficulty.
Anubis metarefresh challenge documentation
Provided as the non-JavaScript fallback answer when users disable JavaScript entirely.
Anubis GitHub repository
Referenced for the project's own framing of proof-of-work as a last-resort defense.
