What can you confidently guarantee about your software?

Programming
Developer Tools
Security
Infrastructure

The article asks what software teams can confidently guarantee and makes the case that formal verification is becoming more practical, with AI potentially reducing the cost of producing proofs. It uses the familiar framing that tests sample behavior while proofs can establish that certain properties always hold. The catch, which quickly became the center of gravity here, is that those guarantees stop at the verification boundary. Pure logic can be proven. UI behavior, network calls, databases, and third-party systems usually cannot, at least not without assumptions.

If you run an app team, treat formal methods as a selective tool for high-value logic, protocols, parsers, and safety-critical components, not a blanket path to bug-free software. The gating issue is not just proof tooling but whether you can state stable, useful properties precisely enough to justify the effort.

June 29, 2026
queue.acm.org
Discuss on HN

Key insights

Model checking fits stateful app logic

Model checking and model-based testing land in a much more usable place for ordinary software than full proofs. They force you to write down system behavior in a formal way, then exhaustively explore small state spaces or generated traces, which is often enough to expose bad transitions, missing cases, and broken assumptions in CRUD-like systems without pretending you have solved end-to-end correctness.

If your team cannot justify proof engineering, start by modeling a workflow or state machine and checking invariants over small cases. You will often catch design bugs before code review or production, and the model can double as executable documentation.

Attribution:

agentultra #1
inaseer #1

Properties help teams communicate requirements

Property-based work is not just a bug-finding technique. It can act as a shared language with domain experts. A concrete example from supply chain simulation showed that expressing expected metric behavior as mathematical properties cut down on specification churn, made debugging faster, and later enabled a safer rewrite in Rust because the behavioral contract was already captured.

Use properties when product or ops stakeholders struggle to describe edge cases in prose. A short list of behavioral rules can align implementation and unlock refactors later.

Attribution:

tikhonj #1

You can prove useful safety properties first

Even without a rich business specification, verification tools can still enforce baseline guarantees like bounds safety, null safety, termination in bounded contexts, and absence of undefined behavior. That shifts formal methods from a moonshot about perfect business correctness into something closer to machine-checked defensive programming, especially in languages and codebases where these failure modes are still common.

Aim early formal effort at low-level correctness conditions that are stable and cheap to maintain. This works best in systems code, legacy C, and critical components where crashes or undefined behavior carry real operational cost.

Attribution:

derdi #1

Verified flows still need escape hatches

Real workflows do not stay inside neat state machines. Shipments get destroyed, people die, fraud happens, and disputes force manual overrides. That means even a well-verified happy path needs an explicit recovery path for contested or exceptional states. Verification helps keep routine execution sane, but it does not remove the need for human resolution mechanisms.

When designing a verified workflow, budget for admin overrides, compensating actions, and audit trails from the start. If your model has no place for exceptions, the production system will invent one under pressure.

Attribution:

taeric #1 #2
AnimalMuppet #1

AI does not erase proof economics

The expensive part of formal verification does not disappear just because an LLM writes some proof artifacts. Someone still pays for the search, the tool runs, and the expert review of what properties are actually worth proving. People with hands-on experience said proof-guided development can pay off by shrinking debugging time, but maintaining proofs as code evolves is still painful and the article gave little evidence that AI changes the cost curve enough for broad adoption.

Do not budget formal verification like autocomplete for tests. Treat it like a focused investment with ongoing maintenance cost, and ask which properties save enough incident or debugging time to earn that spend.

Attribution:

pron #1 #2
GregDavidson #1

Certification-heavy domains are the real near term market

Safety-critical software already lives in a world of rigorous specifications, audits, and evidence generation. In those settings, auto-generated proof evidence is not extra process. It fits the process teams already endure. The same goes for vulnerable parsers and similar narrow components where the security payoff is immediate and the behavior is constrained enough to model.

If you sell into regulated or security-sensitive environments, formal methods are closer to a product advantage than a research toy. Start where you already need assurance evidence, not in the most fluid parts of your app.

Attribution:

nilkn #1
thewillowcat #1

Against the grain

Partial guarantees are still valuable

Treating formal verification as useless unless it proves the whole application misses how software is built. If you specify the behavior of a database call, a parser, or a refund state transition, you can still prove meaningful facts under those assumptions. That is the same way teams reason about typed languages, functional cores, and library contracts every day. Shrinking the untrusted surface is progress even if the world outside it stays messy.

Do not reject formal methods because they stop at system boundaries. Ask which assumptions you are already making informally, then decide whether turning a few of them into checked contracts reduces real risk.

Attribution:

gf000 #1
microgpt #1
harperlee #1

Even simple validation benefits from properties

The claim that CRUD apps rarely have durable invariants was challenged with mundane examples like email handling. The deeper point was not that RFC-perfect email validation is wise. It was that developers routinely get even simple acceptance rules wrong, and property-based tests can expose surprising Unicode and edge-case failures that example-based tests miss. Low complexity is not a reason to skip stronger checks.

For user-input rules, write down the narrow behavior you actually want and test it generatively. This is especially useful when the implementation looks too trivial to deserve careful review.

Attribution:

win311fwg #1 #2 #3

Static typing already proves useful things

A practical counterpoint reframed limited formal verification as something developers already rely on. Type systems prove small correctness facts before runtime and save real debugging time. That does not settle how much farther teams should go, but it undercuts the idea that formal methods are only exotic tools for specialists.

Frame the discussion as a spectrum, not a cliff. If your team values compile-time guarantees from types, it may be ready for stronger machine-checked invariants in a few critical modules.

Attribution:

win311fwg #1

In plain english

CRUD ↩

Create, Read, Update, Delete, the basic operations in many business applications that mostly store and retrieve data.

formal verification ↩

A set of techniques that use mathematics and logic to prove that software satisfies specified properties, instead of only testing selected cases.

LLM ↩

Large language model, an AI system trained on large amounts of text that can generate and transform language and code.

model checking ↩

A formal method that exhaustively explores the states of a simplified system model to see whether specified properties hold.

model-based testing ↩

A testing approach where tests are generated from a model of system behavior, often for stateful workflows.

property-based testing ↩

A testing style where you define general rules that should always hold, then generate many inputs automatically to look for violations.

Rust ↩

A programming language designed for performance and memory safety, especially in systems programming.

undefined behavior ↩

Program behavior that a language specification does not define, which can lead to unpredictable results.

Reference links

Tools and projects

Accordant
Example framework mentioned for model-based testing of stateful and CRUD-like systems.
Qed
A project presented as a formally verified web frontend.
Dafny
The verification language used in a college class example.

Background references

Lamport's bakery algorithm
Referenced in an anecdote about formally verifying a classic locking algorithm and discovering bounded-counter issues.
What is wrong with email validation
Cited to argue that strict syntactic email validation is often a poor match for real-world email behavior.
Metamorphic Testing of Web APIs
Referenced as evidence that property-style testing can find surprising API bugs such as bad pagination behavior.

What can you confidently guarantee about your software?

Discussion mood

Key insights

Against the grain

In plain english

Reference links

Tools and projects

Background references