Formal methods and the future of programming

Programming
AI
Developer Tools
Security
Infrastructure

Jane Street’s post is an index to its work on formal methods, meaning techniques that state what software must do in a mathematically precise way and then prove the implementation satisfies it. The big claim is not that every team should write proofs for everything. It is that tighter integration between programming languages, proof systems, and AI could shift more engineering effort from hand-reviewing generated code toward machine-checked verification. That landed with readers because Jane Street is one of the few places where the economics make immediate sense: deterministic systems, strong internal language tooling, and a lot of value tied to avoiding subtle failures.

If your systems have crisp semantics, start moving invariants into types, specs, model checkers, or proof tools now, especially around security, concurrency, compilers, and infrastructure. If your product is still discovering its requirements, treat formal methods as a targeted tool for high-risk components rather than a whole-company development style.

June 14, 2026
blog.janestreet.com
Discuss on HN

Discussion mood

Cautiously optimistic. People liked the direction and agreed AI plus stronger specs could make verification much more practical, but they were blunt that the real cost sits in writing correct specs, choosing domains with stable semantics, and avoiding proof systems that become more complex than the code they justify.

Key insights

AI changes the proof bottleneck

Manual proof assistants like Rocq have usually been the painful fallback when SMT solvers such as CVC5 or Z3 cannot discharge an obligation automatically. The interesting shift is that frontier models are now good enough to brute-force many of those manual proofs in minutes, which changes how you structure verified software. If proofs become cheap enough to generate, you no longer have to contort code and lemmas for prover convenience first. You can optimize for human-readable structure and let the AI absorb more of the proof maintenance work, as long as the harness prevents it from weakening contracts or editing semantics behind your back.

If you are experimenting with verification, invest as much in the agent harness as in the prover. Lock down who can change specs, contracts, and code, or you will end up proving the wrong thing faster.

Attribution:

addaon #1 #2

Proofs cover what tests cannot

The useful line is not 'formal methods versus testing'. It is exhaustive reasoning versus sampling. Unit tests, fuzzing, and property-based testing explore many executions, but they still leave unvisited behaviors and give you confidence by experiment. Formal verification can prove a property over all inputs and can establish things that test generation cannot reach in principle, like certain termination or refinement guarantees. That is why people keep using tests in practice, but still reach for proofs in places where missing one edge case is too expensive.

Use property tests and fuzzers to find bugs fast, then reserve proofs for invariants where probabilistic confidence is not enough. Security boundaries, protocol state machines, and compiler correctness are the obvious candidates.

Attribution:

jlebar #1
Jtsummers #1
Agingcoder #1
akoboldfrying #1
jcranmer #1

A spec is usually simpler than code

The best answer to 'isn't the spec just another implementation' was that a good spec lives at a higher level and leaves many irrelevant choices unspecified. Saying that a sort returns an ordered permutation of its input is far shorter than writing quicksort, timsort, or a pipelined hardware sorter. The point of the spec is to describe the required behavior while ignoring details like register allocation, data layout, or algorithm choice. In some cases the gap is so large that deriving an implementation from the spec would imply a breakthrough or a cryptographic failure.

When a spec feels like duplicate code, it is often the wrong level of abstraction. Rewrite specs around externally visible properties and invariants, not step-by-step execution.

Attribution:

bluGill #1
nemo1618 #1
IshKebab #1
gsnedders #1

Well-defined subsystems are the real target

The practical boundary is not 'backend good, frontend bad'. It is whether the semantics are crisp enough to formalize. Kernels, CPU logic, protocols, parsers, compilers, cryptography, and concurrency primitives are obvious fits. Even UI code can often be reduced to state machines and verified at that layer, while the higher-level product question remains fuzzy. That framing makes formal methods look less like a universal programming paradigm and more like precision tooling for the parts of the stack where ambiguity is low and failure costs are high.

Map your system by semantic clarity, not by team or repo. Apply formal methods to the components with stable state transitions and costly failure, even if the overall product remains exploratory.

Attribution:

benlivengood #1
sn9 #1
petra #1
stephenlf #1

Tooling matters more than elegant logic

People with older verification experience argued that the banal work dominates. The hard human task is feeding the prover the right lemmas and integrating verification into the language and build flow so it feels like programming, not like maintaining shadow files and comment syntax. That is why language control matters so much. Integrated specs, strong automation, and dumb proof checkers that validate explicit proof traces are more valuable than ever-more-clever logical formalisms if the goal is commercial reliability rather than academic novelty.

Favor proof systems that fit your language and CI pipeline cleanly. Developer friction will decide adoption long before expressive power does.

Attribution:

Animats #1 #2
nextos #1

Against the grain

Specs can outgrow the software

The clean story that requirements are a smaller surface than the program got a direct rebuttal. For nontrivial systems, full formal specifications often become larger, stranger, and harder to review than the code itself, because they also need proof-oriented scaffolding and edge-case detail. seL4 was cited as a concrete reminder that even when the top-level spec is only thousands of lines, the proof burden can explode into hundreds of thousands. AI may reduce labor, but it does not erase the cost of deciding which properties are worth proving or checking that the spec matches reality.

Before committing to verification, estimate spec and proof maintenance as first-class engineering work. If you cannot afford to review the spec deeply, you probably cannot afford to rely on it.

Attribution:

Veserv #1
rzmmm #1
eddiepete #1
pron #1

Startups may still need offensive programming

For teams still searching for product fit, the complaint was not that formal methods are wrong. It was that they are misaligned with work where requirements are still being discovered and speed matters more than airtight correctness. That view sharpened into a useful distinction: AI lowers the cost of defensive programming, but only after you have enough stable domain knowledge to formalize the right contracts. Jane Street likely does. A founder chasing uncertain demand often does not.

Do not impose proof-heavy process on the parts of the business that are still learning what to build. Keep the rigor for settled, high-risk components and let discovery work stay lightweight.

Attribution:

jdw64 #1 #2
ngruhn #1
stephenlf #1

Verification narrows trust, not eliminates it

seL4 came up as a warning against treating 'formally verified' as magic. Bugs can still exist in unverified configurations, adjacent components, assumptions about the environment, or the boundary between the formal model and the deployed system. The useful mental model is not total certainty. It is shrinking the trusted computing base and making the remaining assumptions explicit.

When evaluating a verified system, ask exactly what was proven and what remains outside the proof boundary. The trust surface is the product, not the badge.

Attribution:

xvilka #1
cayley_graph #1
Paracompact #1

In plain english

Dafny ↩

A programming language and verifier designed to make proving correctness properties more approachable.

formal methods ↩

Mathematical techniques for specifying and checking software or system behavior more rigorously than ordinary testing.

Frama-C ↩

A platform for analyzing and verifying C programs using formal methods.

fuzzing ↩

A testing technique that feeds many unexpected or random inputs into software to trigger crashes, security issues, or logic bugs.

Isabelle ↩

A general-purpose interactive theorem prover for writing and checking formal proofs.

LLM ↩

Large language model, a machine learning system trained to generate and understand text.

property-based testing ↩

A testing style where you state general properties a function should satisfy and then generate many inputs to try to break them.

Rocq ↩

The new name for Coq, a proof assistant based on dependent type theory.

seL4 ↩

A formally verified microkernel designed to provide strong isolation and high assurance security.

SMT ↩

Satisfiability Modulo Theories, a class of automated solvers used to check logical formulas involving arithmetic, arrays, and other structured domains.

SPARK ↩

A programming and verification environment for a restricted subset of Ada, aimed at high-assurance software.

trusted computing base ↩

The set of software components that must be correct and trustworthy for a system’s safety or security claims to hold.

type system ↩

The part of a programming language that defines what kinds of values exist and what operations are valid on them.

Reference links

Background essays and talks

Previous post on AI and formal verification
Earlier essay cited as context for the same argument that AI could change the economics of formal verification.
Formal methods won, now what?
Related keynote write-up offered as a broader framing for where formal methods are already succeeding.
Formally specifying UIs
Used to argue that even UI logic can often be modeled as state machines and formally specified.

Papers on logic and verification

On the Unusual Effectiveness of Logic in Computer Science
Cited to support the claim that richer industrial logics are practical and foundational to software engineering.
From Philosophical to Industrial Logics
Cited alongside the Vardi paper to argue that modern logics are useful engineering tools, not just academic formalism.

Verified systems and specs

seL4 specification PDF
Referenced as a concrete example of the size and complexity of a real industrial-strength formal specification.
seL4 issue #1199
Linked in a cautionary note that formal verification is not a blanket guarantee across every configuration or component.

Books and older proof systems

Program = Proof
Recommended to illustrate how proofs can establish correctness far beyond what any practical test suite could cover.
Boyer-Moore theorem prover manual
Shared by a commenter with historical experience to show older proof automation and how long these ideas have been around.

Tools and projects mentioned

mdspan-cute repository
Example project offered as evidence that proofs can be used to verify tensor layout and scheduling machinery for fast code generation.
Lambda hierarchy diagram
Supplementary link from the same commenter describing a proof-heavy AI and systems stack.
QED verified frontends framework
Shared as a Lean-based framework that might fit Jane Street’s interest in verified tooling.
Hyperconsole demo video
Demo for one of the proof-heavy systems projects described in the comments.