Rich Sutton’s post argues that novelty alone is cheap. Real creativity and discovery need three things: variation, evaluation, and selective retention. His point is not that AI can never discover anything. It is that a standalone generative model can spray out novel candidates, but without a mechanism to test them, score them, and preserve the good ones, those candidates do not become discoveries. He points to systems like AlphaGo and coding agents as the more interesting pattern because they sit inside a loop tied to an external source of truth.
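The three ingredients can be sketched as a small loop. This is a minimal illustration, not anything taken from Sutton's post: the names `discover`, `propose`, and `evaluate` are hypothetical, and the hidden-string target is a toy stand-in for an external source of truth such as a game result or a test suite.

```python
import random

# Toy external "source of truth": the generator never sees TARGET directly,
# it only gets a score back from evaluate(). All names here are hypothetical.
TARGET = "hello world"
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def evaluate(candidate):
    """Evaluation: count positions that match the hidden target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def propose(candidate, rng):
    """Variation: change one randomly chosen character."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]


def discover(evaluate, propose, start, steps=20000, rng=None):
    """Run variation, evaluation, and selective retention in one loop."""
    rng = rng or random.Random(0)
    best, best_score = start, evaluate(start)
    for _ in range(steps):
        candidate = propose(best, rng)   # variation
        score = evaluate(candidate)      # evaluation
        if score > best_score:           # selective retention
            best, best_score = candidate, score
    return best, best_score


if __name__ == "__main__":
    print(discover(evaluate, propose, "a" * len(TARGET)))
```

Removing the `if score > best_score` check leaves a process that still emits novel strings on every step but keeps none of its progress, which is the distinction the post draws between novelty and discovery.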
Most readers accepted that framing but thought Sutton aimed it at the wrong target. The pushback was that modern AI practice is already built around that loop. People kept pointing to RL with verifiable rewards, compiler feedback, theorem provers like Lean, and coding agents such as Claude Code as evidence that current systems do not stop at next-token generation. The useful unit is the whole system, not the base model. That turned the discussion away from metaphysical arguments about whether LLMs are “really creative” and toward a narrower engineering claim: progress is strongest where evaluation is cheap, reliable, and automatable.
That is where the conversation landed. Coding and some areas of math look good because the harness is strong. Science and open-ended discovery remain hard because the fitness function is weak or missing. Several commenters said this is the real bottleneck, not generation. You can produce endless candidates if you have enough compute, but if you cannot tell which ones are promising, you just burn money. A related split emerged over what RL actually buys you. Some argued RL with verifiable rewards already pushes models beyond their base behavior distribution. Others said it mostly sharpens search inside a space the pretrained model already made reachable, so the true missing piece is better planning and better evaluators, not more polishing of the same model.
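The evaluator bottleneck can be made concrete with a toy best-of-N simulation. It is a sketch under stated assumptions rather than a model of any real system: candidate quality is drawn from a standard normal distribution, the evaluator sees that quality plus Gaussian noise, and the function names are hypothetical.

```python
import random


def best_of_n(true_quality, noise, rng):
    """Pick the candidate with the highest observed score; return its true quality."""
    observed = [q + rng.gauss(0.0, noise) for q in true_quality]
    chosen = max(range(len(true_quality)), key=lambda i: observed[i])
    return true_quality[chosen]


def average_selected_quality(n_candidates, noise, trials=2000, seed=0):
    """Average true quality of the selected candidate over many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Assumption: candidate quality follows a standard normal distribution.
        pool = [rng.gauss(0.0, 1.0) for _ in range(n_candidates)]
        total += best_of_n(pool, noise, rng)
    return total / trials


if __name__ == "__main__":
    for noise in (0.0, 1.0, 10.0):
        print(noise, round(average_selected_quality(16, noise), 3))
```

With a noiseless evaluator, sampling more candidates reliably surfaces better ones. As the noise grows, the selected candidate drifts back toward average quality, so extra generation buys little, which is the "burn money" case commenters described.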
A smaller but recurring theme was that Sutton’s claim matters even if it undersells current systems. You do not need human-like creativity for business value. High-quality recombination plus a tight test loop is already enough to produce useful software, math, and optimization wins. The practical reading was blunt: agentic harnesses are carrying more of the current AI wave than people admit, and the next gains will come from better environments, rewards, memory, and search rather than from treating generative models alone as discovery machines.