HN Debrief

Formal methods and the future of programming

  • Programming
  • AI
  • Developer Tools
  • Security
  • Infrastructure

Jane Street’s post is an index to its work on formal methods, meaning techniques that state what software must do in a mathematically precise way and then prove the implementation satisfies it. The big claim is not that every team should write proofs for everything. It is that tighter integration between programming languages, proof systems, and AI could shift more engineering effort from hand-reviewing generated code toward machine-checked verification. That landed with readers because Jane Street is one of the few places where the economics make immediate sense: deterministic systems, strong internal language tooling, and a lot of value tied to avoiding subtle failures.

If your systems have crisp semantics, start moving invariants into types, specs, model checkers, or proof tools now, especially around security, concurrency, compilers, and infrastructure. If your product is still discovering its requirements, treat formal methods as a targeted tool for high-risk components rather than a whole-company development style.

Discussion mood

Cautiously optimistic. People liked the direction and agreed AI plus stronger specs could make verification much more practical, but they were blunt that the real cost sits in writing correct specs, choosing domains with stable semantics, and avoiding proof systems that become more complex than the code they justify.

Key insights

  1. 01

    AI changes the proof bottleneck

    Manual proof assistants like Rocq have usually been the painful fallback when SMT solvers such as CVC5 or Z3 cannot discharge an obligation automatically. The interesting shift is that frontier models are now good enough to brute-force many of those manual proofs in minutes, which changes how you structure verified software. If proofs become cheap enough to generate, you no longer have to contort code and lemmas for prover convenience first. You can optimize for human-readable structure and let the AI absorb more of the proof maintenance work, as long as the harness prevents it from weakening contracts or editing semantics behind your back.

    If you are experimenting with verification, invest as much in the agent harness as in the prover. Lock down who can change specs, contracts, and code, or you will end up proving the wrong thing faster.

      Attribution:
    • addaon #1 #2
  2. 02

    Proofs cover what tests cannot

    The useful line is not 'formal methods versus testing'. It is exhaustive reasoning versus sampling. Unit tests, fuzzing, and property-based testing explore many executions, but they still leave unvisited behaviors and give you confidence by experiment. Formal verification can prove a property over all inputs and can establish things that test generation cannot reach in principle, like certain termination or refinement guarantees. That is why people keep using tests in practice, but still reach for proofs in places where missing one edge case is too expensive.

    Use property tests and fuzzers to find bugs fast, then reserve proofs for invariants where probabilistic confidence is not enough. Security boundaries, protocol state machines, and compiler correctness are the obvious candidates.

      Attribution:
    • jlebar #1
    • Jtsummers #1
    • Agingcoder #1
    • akoboldfrying #1
    • jcranmer #1
  3. 03

    A spec is usually simpler than code

    The best answer to 'isn't the spec just another implementation' was that a good spec lives at a higher level and leaves many irrelevant choices unspecified. Saying that a sort returns an ordered permutation of its input is far shorter than writing quicksort, timsort, or a pipelined hardware sorter. The point of the spec is to describe the required behavior while ignoring details like register allocation, data layout, or algorithm choice. In some cases the gap is so large that deriving an implementation from the spec would imply a breakthrough or a cryptographic failure.

    When a spec feels like duplicate code, it is often the wrong level of abstraction. Rewrite specs around externally visible properties and invariants, not step-by-step execution.

      Attribution:
    • bluGill #1
    • nemo1618 #1
    • IshKebab #1
    • gsnedders #1
  4. 04

    Well-defined subsystems are the real target

    The practical boundary is not 'backend good, frontend bad'. It is whether the semantics are crisp enough to formalize. Kernels, CPU logic, protocols, parsers, compilers, cryptography, and concurrency primitives are obvious fits. Even UI code can often be reduced to state machines and verified at that layer, while the higher-level product question remains fuzzy. That framing makes formal methods look less like a universal programming paradigm and more like precision tooling for the parts of the stack where ambiguity is low and failure costs are high.

    Map your system by semantic clarity, not by team or repo. Apply formal methods to the components with stable state transitions and costly failure, even if the overall product remains exploratory.

      Attribution:
    • benlivengood #1
    • sn9 #1
    • petra #1
    • stephenlf #1
  5. 05

    Tooling matters more than elegant logic

    People with older verification experience argued that the banal work dominates. The hard human task is feeding the prover the right lemmas and integrating verification into the language and build flow so it feels like programming, not like maintaining shadow files and comment syntax. That is why language control matters so much. Integrated specs, strong automation, and dumb proof checkers that validate explicit proof traces are more valuable than ever-more-clever logical formalisms if the goal is commercial reliability rather than academic novelty.

    Favor proof systems that fit your language and CI pipeline cleanly. Developer friction will decide adoption long before expressive power does.

      Attribution:
    • Animats #1 #2
    • nextos #1

Against the grain

  1. 01

    Specs can outgrow the software

    The clean story that requirements are a smaller surface than the program got a direct rebuttal. For nontrivial systems, full formal specifications often become larger, stranger, and harder to review than the code itself, because they also need proof-oriented scaffolding and edge-case detail. seL4 was cited as a concrete reminder that even when the top-level spec is only thousands of lines, the proof burden can explode into hundreds of thousands. AI may reduce labor, but it does not erase the cost of deciding which properties are worth proving or checking that the spec matches reality.

    Before committing to verification, estimate spec and proof maintenance as first-class engineering work. If you cannot afford to review the spec deeply, you probably cannot afford to rely on it.

      Attribution:
    • Veserv #1
    • rzmmm #1
    • eddiepete #1
    • pron #1
  2. 02

    Startups may still need offensive programming

    For teams still searching for product fit, the complaint was not that formal methods are wrong. It was that they are misaligned with work where requirements are still being discovered and speed matters more than airtight correctness. That view sharpened into a useful distinction: AI lowers the cost of defensive programming, but only after you have enough stable domain knowledge to formalize the right contracts. Jane Street likely does. A founder chasing uncertain demand often does not.

    Do not impose proof-heavy process on the parts of the business that are still learning what to build. Keep the rigor for settled, high-risk components and let discovery work stay lightweight.

      Attribution:
    • jdw64 #1 #2
    • ngruhn #1
    • stephenlf #1
  3. 03

    Verification narrows trust, not eliminates it

    seL4 came up as a warning against treating 'formally verified' as magic. Bugs can still exist in unverified configurations, adjacent components, assumptions about the environment, or the boundary between the formal model and the deployed system. The useful mental model is not total certainty. It is shrinking the trusted computing base and making the remaining assumptions explicit.

    When evaluating a verified system, ask exactly what was proven and what remains outside the proof boundary. The trust surface is the product, not the badge.

      Attribution:
    • xvilka #1
    • cayley_graph #1
    • Paracompact #1

In plain english

Dafny
A verification-oriented programming language that lets developers write code with specifications and prove correctness properties automatically.
formal methods
Mathematical techniques for specifying and proving that software behaves correctly.
Frama-C
A framework for analyzing and verifying C programs, often using annotations and external provers.
fuzzing
Automatically feeding many unexpected or random inputs into software to find crashes, security flaws, or logic bugs.
Isabelle
A general-purpose interactive theorem prover used for formal verification and mathematical proofs.
LLM
Large Language Model, an AI system trained to generate and analyze text.
property-based testing
A testing approach where you define general properties of correct behavior and the tool generates many random inputs to try to falsify them.
Rocq
The new name for Coq, an interactive theorem prover used to write machine-checked mathematical proofs and verified programs.
seL4
A high-assurance microkernel known for extensive formal verification of core kernel properties.
SMT
Satisfiability Modulo Theories, a class of solvers that extend SAT solving with reasoning about numbers, arrays, bit vectors, and other structured data.
SPARK
A formally analyzable subset of the Ada programming language used for high-assurance software.
trusted computing base
The set of components that must be correct for the security or correctness guarantees of a system to hold.
type system
The part of a programming language that classifies values and can reject whole classes of invalid operations before the program runs.

Reference links

Background essays and talks

Papers on logic and verification

Verified systems and specs

  • seL4 specification PDF
    Referenced as a concrete example of the size and complexity of a real industrial-strength formal specification.
  • seL4 issue #1199
    Linked in a cautionary note that formal verification is not a blanket guarantee across every configuration or component.

Books and older proof systems

  • Program = Proof
    Recommended to illustrate how proofs can establish correctness far beyond what any practical test suite could cover.
  • Boyer-Moore theorem prover manual
    Shared by a commenter with historical experience to show older proof automation and how long these ideas have been around.

Tools and projects mentioned