Transformers are inherently succinct

AI
Machine Learning
Research
Verification

The paper is a complexity-theory result about the transformer architecture, not a benchmark on chatbots. Its core claim is that transformers can be exponentially more succinct than linear temporal logic, meaning they can describe some languages with far smaller representations than older symbolic formalisms. The sting is in the corollary. Once a model class gets that compact, even basic questions like whether it accepts anything at all or whether two models are equivalent become EXPSPACE-complete in the worst case. That landed as the practical headline for most readers: if you want a system whose behavior you can prove, a transformer is a bad substrate for the thing being proved.

If you are betting on LLMs in safety-critical or formally verified systems, assume verification will stay a major bottleneck and design hard boundaries around what the model is allowed to do. Also do not overread a complexity-theory result as a direct statement about today's trained models or product behavior.

June 5, 2026
openreview.net
Discuss on HN

Discussion mood

Interested and respectful, with a strong tilt toward the verification implication. People liked seeing transformer behavior framed in classic complexity theory, but a lot of the energy went into stopping others from overclaiming what the paper says about trained LLMs, practical verification, or transformer expressivity in general.

Key insights

Worst-case hardness does not kill all analysis

The EXPSPACE result rules out a general efficient verifier for arbitrary transformers. It does not say useful restricted analyses are impossible. That is the same gap software engineering lives with already. General bug-finding is intractable or undecidable in many settings, yet testing, abstract interpretation, SAT-based methods, and narrow-domain verifiers still deliver value on real systems.

Treat this as a boundary on universal proofs, not a reason to stop building practical checks. If you need assurance around model behavior, focus on restricted model classes, bounded properties, and testable interfaces instead of full equivalence claims.

Attribution:

dfabulich #1
ebiederm #1
AlotOfReading #1

The representation choice drives how dramatic the gap looks

The sharpest technical pushback was that the paper's comparison target matters a lot. One commenter argued the construction effectively encodes binary decision diagrams without the reductions and canonical ordering that make reduced ordered binary decision diagrams useful in verification. Another replied that unrestricted binary decision diagrams can in fact be much more compact than reduced ordered ones, so swapping them is not a small detail. The underlying point stands: succinctness claims are highly sensitive to which formal representation you compare against and what constraints you impose on it.

Do not repeat the headline as if it were architecture-independent truth. When evaluating theory papers for product or research strategy, check the exact comparison class and whether it matches the representations your team actually uses.

Attribution:

thesz #1
dragontamer #1 #2

Succinctness and expressivity are different questions

A compact representation can still be weak in what it can compute, and a larger representation can solve classes of problems the compact one cannot. That distinction mattered here because commenters pointed to related results showing transformers and recurrent models are not simply on one ladder of capability. Their solvable problem classes can be incomparable. That makes hybrid architectures more than engineering compromise. They are a response to real theoretical gaps between attention-heavy models and stateful sequence models.

Do not treat a succinctness result as evidence that transformers are the endpoint for sequence modeling. If your workload depends on persistent state or structured recurrence, hybrids deserve serious consideration.

Attribution:

nextos #1
aesthesia #1
measurablefunc #1
yorwba #1

This says nothing about terse chatbot writing

Several readers briefly mapped the paper's use of "succinct" onto the way Claude or other assistants sometimes answer in compressed language. That is the wrong level of abstraction. Here, succinctness is a formal property about how compactly an architecture can represent languages, closer to circuit complexity than product behavior. Token-saving style changes, batching, or model tuning may explain terse outputs, but they are unrelated to the theorem.

Keep the paper in the theory bucket. Do not use it to explain UX changes in commercial models or infer anything direct about why a chatbot writes the way it does.

Attribution:

AlotOfReading #1
parasti #1
7e #1

Against the grain

LLMs can still help build verified systems

The strongest overreach in the reactions was the jump from "transformers are hard to verify" to "do not use LLMs around formal verification at all." That collapses two very different roles. A model can still assist with code generation, proof search, or implementation against a formal spec even if the model itself is not the verified artifact. The paper cuts against making the transformer the thing you certify. It does not cut against using one as a tool in a larger verified pipeline.

Separate toolchain roles from trusted components. You can keep using LLMs to speed development as long as the final trusted boundary sits in code, proofs, or systems you can actually check.

Attribution:

doug_durham #1
platinumrad #1

Succinctness is not a path to optimality

Some readers initially took exponential succinctness as evidence that transformers are nearing some theoretical optimum. That confuses compact description length with overall capability. A more succinct formalism can be harder to reason about, less expressive on some tasks, or simply sitting at a different point in the capability-versus-tractability tradeoff. Turing machines are far more general. Regexes are far more tractable. Transformers do not dominate that space just because they compress one representation family well.

Avoid turning complexity headlines into architecture rankings. For system design, the right question is what tradeoff you want among capability, memory, interpretability, and verifiability.

Attribution:

thesz #1
muvlon #1
drdeca #1

In plain english

abstract interpretation ↩

A static analysis technique that approximates program behavior to prove properties without running every possible case.

EXPSPACE-complete ↩

A complexity class for problems that, in the worst case, require an amount of memory that grows exponentially with input size and are among the hardest problems in that class.

linear temporal logic ↩

A formal language for specifying how properties of a system should hold over sequences of time steps.

Transformer ↩

A neural network architecture commonly used for language and sequence modeling that predicts the next element in a sequence from prior context.

Reference links

Background reading on sequence models

Attention and transformers notebook
Linked as the likely source for an example contrasting finite-memory Markov models with more general stateful processes.
bactra.org
Referenced as a source discussing why stateful models may have advantages over finite-order Markov views.

Prior discussion

Previous Hacker News discussion of the paper
Earlier discussion of the same paper was linked for extra context, including a detailed criticism repeated here.

Transformers are inherently succinct

Discussion mood

Key insights

Against the grain

In plain english

Reference links

Related papers and technical references

Background reading on sequence models

Prior discussion