Rich Sutton on AI creativity and discovery

AI
Programming
Developer Tools
Science

Rich Sutton’s post argues that novelty alone is cheap. Real creativity and discovery need three things: variation, evaluation, and selective retention. His point is not that AI can never discover anything. It is that a standalone generative model can spray out novel candidates, but without a mechanism to test them, score them, and preserve the good ones, those candidates do not become discoveries. He points to systems like AlphaGo and coding agents as the more interesting pattern because they sit inside a loop tied to an external source of truth.

If you are building AI products, focus less on raw model output and more on the evaluation loop around it. The near-term leverage is in domains with cheap, reliable feedback like code, math, simulation, and other verifiable tasks.

June 10, 2026
twitter.com

Discussion mood

Mostly positive toward Sutton’s core framing, but impatient with how narrowly he described current AI. Readers broadly agreed that evaluation and retention are essential, then argued that modern LLM systems already rely on exactly those loops in coding and math, so the interesting question is not whether AI can discover but where we can build strong enough evaluators.

Key insights

Harness engineering is doing the real work

The strongest reading of current AI progress is that the model is only one component in a larger search system. Coding works because an agent can generate options, run them through compilers, terminals, tests, or Lean, and keep the ones that survive. That reframes recent product wins as infrastructure wins around the model, not as proof that next-token prediction alone became a scientist.

When you assess an AI product, inspect its feedback loop before its benchmark score. Teams with strong verification hooks and retry logic will keep outrunning teams that just swap in a larger base model.

Attribution:

musebox35 #1
flir #1
piker #1
anthonypasq #1
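
As an illustration of the generate, test, and keep pattern described in this insight, here is a minimal Python sketch. It is not taken from Sutton's post or from the discussion. The candidate pool is a hypothetical stand-in for model output, the three test cases stand in for a compiler or test suite, and every name (CANDIDATE_POOL, generate, evaluate, search) is an assumption made for the example.

```python
import random
from typing import Callable, List, Tuple

Candidate = Tuple[str, Callable[[int], int]]

# Hypothetical stand-in for a model: a fixed pool of guesses at abs().
CANDIDATE_POOL: List[Candidate] = [
    ("identity", lambda x: x),
    ("negate", lambda x: -x),
    ("square", lambda x: x * x),
    ("max_with_neg", lambda x: max(x, -x)),
]

# External source of truth: (input, expected output) pairs.
TESTS: List[Tuple[int, int]] = [(-3, 3), (0, 0), (5, 5)]


def generate(rng: random.Random, n: int) -> List[Candidate]:
    """Variation: draw n distinct candidates in random order."""
    return rng.sample(CANDIDATE_POOL, n)


def evaluate(fn: Callable[[int], int]) -> float:
    """Evaluation: fraction of test cases the candidate passes."""
    passed = sum(1 for x, expected in TESTS if fn(x) == expected)
    return passed / len(TESTS)


def search(seed: int = 0, rounds: int = 2, batch: int = 4) -> Tuple[str, float]:
    """Selective retention: keep only the best-scoring candidate seen so far."""
    rng = random.Random(seed)
    best_name, best_score = "", -1.0
    for _ in range(rounds):
        for name, fn in generate(rng, batch):
            score = evaluate(fn)
            if score > best_score:
                best_name, best_score = name, score
    return best_name, best_score
```

With the default batch of four, every candidate in the pool is tried in each round, so search() returns ("max_with_neg", 1.0): it is the only candidate that passes all three tests. The generator is deliberately dumb here; the evaluator and the retention step do the work.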

RL helps most when rewards are verifiable

Commenters zeroed in on Reinforcement Learning with Verifiable Rewards (RLVR) as the reason models suddenly got much better at coding and formal tasks. Where outputs can be checked by execution, proofs, or simulations, reinforcement learning can reshape the model toward high-scoring behavior. The unresolved part is how far that really expands capability versus just concentrating probability mass on already nearby solutions.

Expect the next reliable gains in domains that can be scored automatically. If your problem cannot be turned into a crisp verifier or simulator, today’s RL-heavy recipe will transfer badly.

Attribution:

kibibu #1
LarsDu88 #1
porridgeraisin #1 #2
highfrequency #1
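
To make "verifiable reward" concrete, here is a minimal Python sketch of a reward function that scores a candidate by executing it against test cases. It covers only the scoring step, not the reinforcement learning update. The function name verifiable_reward, the requirement that the candidate define a function called solve, and the binary 1.0 / 0.0 reward are all assumptions made for the example, not a description of any particular training pipeline.

```python
from typing import Any, List, Tuple


def verifiable_reward(candidate_source: str,
                      tests: List[Tuple[tuple, Any]]) -> float:
    """Return 1.0 if the candidate passes every test, else 0.0.

    Assumes the candidate source defines a function named `solve`.
    WARNING: exec() on untrusted code is unsafe; a real system would
    run candidates in a sandbox with time and memory limits.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)
        solve = namespace["solve"]
        ok = all(solve(*args) == expected for args, expected in tests)
        return 1.0 if ok else 0.0
    except Exception:
        # Syntax errors, a missing `solve`, and runtime errors all score zero.
        return 0.0
```

For example, with tests = [((1, 2), 3), ((0, 0), 0)], the source "def solve(a, b):\n    return a + b" scores 1.0, while a version that subtracts or fails to parse scores 0.0. The reward needs no human judgment, which is what makes it cheap to apply at scale.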

Open-ended discovery still lacks a fitness function

The hard wall is not generating candidates but knowing what to reward before the answer is obvious. That is why open-ended learning and novelty search came up. In deceptive search spaces, a direct objective can trap you, so systems may need stepping-stone incentives rather than one clean target. This is a much harder setup than code compilation or theorem checking.

For research automation, spend time designing intermediate signals and search scaffolding. A vague top-level goal without usable rewards will produce lots of activity and very little progress.

Attribution:

flir #1
visarga #1
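
For readers unfamiliar with novelty search, here is a minimal Python sketch of the scoring idea: a candidate is rewarded for behaving differently from what has already been seen, not for progress toward a fixed objective. Novelty is computed as the mean distance to the k nearest behaviors in an archive, which is a common formulation. Representing a behavior as a tuple of floats and the choice of k are assumptions made for the example.

```python
import math
from typing import List, Sequence


def novelty(behavior: Sequence[float],
            archive: List[Sequence[float]],
            k: int = 3) -> float:
    """Mean Euclidean distance from `behavior` to its k nearest archive entries."""
    if not archive:
        # Nothing has been seen yet, so anything counts as maximally novel.
        return float("inf")
    distances = sorted(math.dist(behavior, seen) for seen in archive)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)
```

A behavior far from everything in the archive scores high even if it is no closer to the final goal, which is how this kind of search can collect stepping stones in a deceptive space. Designing the behavior descriptor is the hard part and is not addressed by this sketch.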

Supervision may be the scaffold for later search

One useful framing treated supervised fine-tuning as the source of inductive bias and reinforcement learning as the search process that exploits it. That makes discovery less mysterious. The model first learns the shape of viable trajectories, then search pushes within and around that space. The open question is whether that scaffold also traps the system inside inherited assumptions and limits more radical novelty.

Do not treat pretraining, supervised fine-tuning, and RL as interchangeable knobs. In your own systems, decide explicitly which stage is teaching priors and which stage is allowed to explore away from them.

Attribution:

visarga #1
musebox35 #1

Useful recombination may be enough for now

Several comments cut through the AGI framing and pointed out that many valuable advances do not require a civilization-changing insight. Recombining known methods better, searching more thoroughly, and executing with fewer bad choices can already create huge economic value. Rare genius-level breakthroughs matter, but most work benefits from faster accumulation of smaller wins.

Do not block adoption waiting for proof of machine originality. In product and operations work, better search and execution on known ideas can already justify deployment.

Attribution:

balazstorok #1
whatever1 #1
whiplash451 #1

Continual backpropagation points at persistent plasticity

One commenter highlighted Sutton’s own proposed fix for static models: continual backpropagation, where underused neurons are periodically reset to restore plasticity. The claim is that a model needs ongoing capacity for variation, not just one giant pretraining pass, if it is going to keep adapting instead of settling into a fixed representation.

Watch for model architectures and training schemes that preserve adaptability after initial training. Products that need long-lived learning may benefit more from plastic systems than from ever larger frozen models.

Attribution:

skybrian #1
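
The reset step described in this insight can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm from the paper: the utility values are taken as given, and the function name reset_underused, the uniform re-initialization range, and the reset fraction are assumptions made for the example. Zeroing the outgoing weights of a reset unit is one way to keep the reset from disturbing the network's current output.

```python
import random
from typing import List


def reset_underused(w_in: List[List[float]],
                    w_out: List[List[float]],
                    utility: List[float],
                    fraction: float,
                    rng: random.Random,
                    init_scale: float = 0.1) -> List[int]:
    """Reinitialize the lowest-utility hidden units in place.

    w_in[i]    : incoming weights of hidden unit i
    w_out[i]   : outgoing weights of hidden unit i
    utility[i] : assumed running estimate of how much unit i is used
    Returns the indices of the units that were reset.
    """
    n_units = len(utility)
    n_reset = int(fraction * n_units)
    # Pick the units with the smallest utility.
    lowest = sorted(range(n_units), key=lambda i: utility[i])[:n_reset]
    for i in lowest:
        # Fresh random incoming weights restore the unit's ability to learn.
        w_in[i] = [rng.uniform(-init_scale, init_scale) for _ in w_in[i]]
        # Zero outgoing weights so the reset does not change current outputs.
        w_out[i] = [0.0 for _ in w_out[i]]
        utility[i] = 0.0
    return lowest
```

Called periodically during training, a step like this keeps a small supply of fresh units available, which is the "ongoing capacity for variation" the commenter pointed to.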

Against the grain

RLVR may polish more than it discovers

A skeptical line held that Reinforcement Learning with Verifiable Rewards mostly improves selection and retention, not genuine variation or planning. In that view, the model still depends on an external planner, whether that is AlphaEvolve’s search procedure or a human steering Claude Code. If that is right, current gains say more about scaffolding than about autonomous discovery.

Be careful about attributing system-level performance to the model alone. If a product needs a human or hard-coded search loop to keep making progress, that dependency will shape both cost and scaling.

Attribution:

porridgeraisin #1 #2

RL-tuned models can get worse off-domain

One commenter argued that models optimized for verifiable tasks often feel narrower everywhere else. The same push toward decisive, checkable answers that helps on code and math can hurt ambiguous tasks such as diagnosis or nuanced advisory work, where premature certainty is a bug.

Do not assume a model that excels on benchmarkable domains is a better general assistant. Evaluate separately for ambiguous, high-judgment workflows where overconfident convergence can create risk.

Attribution:

code_biologist #1

Human-like creativity is the wrong benchmark

A more pragmatic pushback said Sutton’s framing overweights philosophical questions about creativity and underweights usefulness. Planes are not birds and still transformed transport. Likewise, a system that farms high-value solutions without human-style originality can still be enormously important for science, engineering, and business.

Measure systems by the quality and volume of useful outcomes they produce in your domain. Debates about whether they qualify as truly creative do not change procurement or product decisions.

Attribution:

balazstorok #1

In plain English

AGI

Artificial general intelligence, the idea of an AI system with broad human-like capability across many tasks.

AlphaEvolve

An AI system mentioned in comments as using an external evolutionary search or planning loop around a model.

AlphaGo

A Go-playing AI system that combined neural networks with search and self-play to achieve superhuman performance.

Continual backpropagation

A training approach mentioned in comments that periodically resets underused neurons so a model stays adaptable.

Inductive bias

The built-in assumptions or structural preferences that make a system learn some kinds of patterns more easily than others.

Lean

A formal proof assistant and programming language used to write machine-checkable mathematical proofs.

Novelty search

A search method that rewards finding new behaviors or states instead of directly optimizing one fixed objective.

Open-ended learning

Research on systems that keep generating new behaviors, goals, or capabilities without a single fixed endpoint.

RL

Reinforcement learning, a training method where a system learns by trying actions and getting rewards or penalties.

Reference links

Primary source and access links

YouTube video of Rich Sutton on AI creativity and discovery
Main submission linked in the story text
Xcancel mirror of Sutton tweet thread
Alternative access link for the tweet thread when X was unavailable

Papers and technical references

Box on mathematics and scientific method
Referenced to connect LLM plus evaluation loops to an older generate-test-refine view of science
In-context Learning and Induction Heads
Mentioned in discussion of whether RL could create test-time discovery behavior inside models
TRM architecture paper
Suggested as related work on modeling the problem and solution jointly
Continual backpropagation Nature paper
Cited as Sutton’s group proposal for preserving plasticity in deep learning systems

Books and research directions

Why Greatness Cannot Be Planned
Brought up as a pointer to novelty search and open-ended learning
TRIZ
Referenced as a structured invention and problem-solving framework that AI systems resemble
