The text in Claude Code’s “Extended Thinking” output

AI
Developer Tools
Security
Open Source

The post claims Anthropic presents Claude Code’s “Extended Thinking” as if it were the model’s live reasoning, when in practice users usually see a summary or rewritten version rather than the original token stream. The author’s complaint is not philosophical; it is operational. If you want to inspect model drift, understand a bad tool choice, or keep a faithful record of what an agent did during a long coding session, a polished recap is not what you need.

Treat hidden or summarized reasoning as a product constraint, not a temporary quirk. If your workflow depends on auditability, reproducibility, or debugging why an agent made a bad call, favor tools and models that expose more of their process, or be prepared to build your own instrumentation around opaque APIs.

June 22, 2026
patrickmccanna.net

Discussion mood

Mostly negative toward vendor opacity. People were not shocked that the visible text is summarized, but they were frustrated that major labs optimize for anti-distillation and safety optics over user observability, especially in coding agents where hidden reasoning makes bad decisions harder to debug.

Key insights

Reasoning retention changes across sessions

Anthropic’s own docs and postmortem show that hidden reasoning is not a stable artifact you can rely on from turn to turn. Depending on model class, cache state, and whether a session sat idle, prior thinking may be preserved, trimmed to the last turn, or cleared entirely. That means even perfect access to visible traces would not give you a consistent longitudinal record unless you also control the product path and session lifecycle.

Do not build evals or regression checks that assume a hosted model carries its prior hidden reasoning forward in a fixed way. Log explicit intermediate artifacts you care about in normal output, or your reproducibility will fall apart when vendor behavior shifts.

Attribution:

btown #1
flaghacker #1
Roritharr #1
haus20xx #1
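
The following is a minimal sketch of logging explicit artifacts. It assumes you instruct the agent to end each reply with a labeled `ASSUMPTIONS:` bullet block in its normal output. That convention, the function names `extract_assumptions` and `log_turn`, and the JSONL record fields are hypothetical choices for illustration. None of them are part of any Anthropic or OpenAI API. The idea is that a later regression check compares records you own, not whatever hidden reasoning the vendor happened to retain.

```python
import hashlib
import json
import re
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical convention: the agent ends each reply with a block like
#   ASSUMPTIONS:
#   - first assumption
#   - second assumption
ASSUMPTIONS_RE = re.compile(r"^ASSUMPTIONS:[ \t]*\n((?:- .*\n?)+)", re.MULTILINE)


def extract_assumptions(output: str) -> list[str]:
    """Return the bullet items of the ASSUMPTIONS block, or [] if it is absent."""
    match = ASSUMPTIONS_RE.search(output)
    if not match:
        return []
    return [line[2:].strip() for line in match.group(1).splitlines() if line.startswith("- ")]


def log_turn(log_path: Path, session_id: str, turn: int, prompt: str, output: str) -> dict:
    """Append one turn record to a JSONL file and return the record."""
    record = {
        "session_id": session_id,
        "turn": turn,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "output": output,
        "assumptions": extract_assumptions(output),
        # Hash of the visible output so later runs can detect drift cheaply.
        "output_sha256": hashlib.sha256(output.encode("utf-8")).hexdigest(),
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Each line of the log is an independent JSON object. A regression script can therefore diff the `assumptions` lists from two runs of the same task. It does not need to know how the session's hidden thinking was preserved, trimmed, or cleared.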

Hidden rationale blocks real agent debugging

When a coding agent makes a strange implementation choice, asking it afterward why it did so can produce a neat story instead of the real cause. In one concrete example, a bad architectural decision was traced back to a line in CLAUDE.md that the model had misread as a ban on touching an existing module. The visible summaries did not expose that; raw traces would likely have made the failure mode obvious much earlier.

If you use AI coding agents in review or production, treat retrospective explanations as unreliable. Put more effort into prompt constraints, file-level policies, and observable checkpoints during execution rather than trusting the model to explain itself afterward.

Attribution:

drdexebtjl #1
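
One way to make a file-level policy observable is to add a checkpoint before each edit is applied. The checkpoint compares every path the agent proposes to modify against explicit patterns and keeps each decision as a record. This is a hypothetical sketch. `FilePolicy`, `Checkpoint`, and `review_proposed_edits` are illustrative names, not Claude Code features, and the sketch assumes your harness can intercept the list of paths before anything is written.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class FilePolicy:
    """Hypothetical repo policy expressed as glob patterns."""
    protected: list[str] = field(default_factory=list)  # edits here need human review
    allowed: list[str] = field(default_factory=lambda: ["*"])


@dataclass
class Checkpoint:
    path: str
    decision: str  # "apply" or "needs_review"
    reason: str


def review_proposed_edits(paths: list[str], policy: FilePolicy) -> list[Checkpoint]:
    """Classify each proposed edit before it is applied, leaving an observable record."""
    results = []
    for path in paths:
        # Protected patterns win over allowed ones.
        hit = next((p for p in policy.protected if fnmatchcase(path, p)), None)
        if hit is not None:
            results.append(Checkpoint(path, "needs_review", f"matches protected pattern {hit!r}"))
        elif any(fnmatchcase(path, p) for p in policy.allowed):
            results.append(Checkpoint(path, "apply", "allowed by policy"))
        else:
            results.append(Checkpoint(path, "needs_review", "not covered by any allowed pattern"))
    return results
```

A machine-checked pattern list leaves less room for the kind of misreading described above than a sentence in CLAUDE.md. A `needs_review` record also appears during execution, not in an explanation written afterward.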

Readable thought traces can still be gibberish

Examples from DeepSeek R1 and Mythos show that a model can reach the right answer while emitting reasoning text that looks like compressed private shorthand or outright nonsense. That weakens the idea that exposing chain-of-thought automatically gives you human-legible interpretability. At best, it gives you another model artifact, and sometimes not even a very readable one.

Do not confuse access to chain-of-thought with true transparency. If you need confidence in high-stakes workflows, combine traces with output tests, tool logs, and task-level evals instead of assuming readable prose will explain the model’s behavior.

Attribution:

ekidd #1
arjie #1
drdaeman #1

Opaque reasoning widens the tool exfiltration risk

The security concern is not just hidden prose. It is the combination of hidden reasoning, web retrieval, and server-side tools. A poisoned page or retrieved document can smuggle instructions into context, and a model can then make follow-on requests that leak context or secrets while the human only sees a benign summary. Even if client-side tool calls are visible, server-side integrations like search or cloud storage access reduce what the operator can audit.

If your agent can browse, search, or touch sensitive systems, model transparency is only one part of the control surface. Reduce privileges, isolate secrets, and log tool activity independently from whatever reasoning summary the vendor shows you.

Attribution:

irthomasthomas #1 #2 #3
exit #1
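
The following is a minimal sketch of logging tool activity independently of the vendor's summary. Each tool function is wrapped so that every call, including failed ones, is appended to a log that your own code owns. The wrapper name `audited`, the sensitive-key set, and the single AWS-style key pattern are illustrative assumptions. Real redaction needs a much broader ruleset. Server-side tools that the vendor executes never pass through a wrapper like this at all.

```python
import json
import re
import time
from typing import Any, Callable

# Illustrative only: argument names whose values are never written to the log.
SENSITIVE_KEYS = {"api_key", "token", "password", "secret"}
# Shape of an AWS access key ID; one example of a pattern to scrub from free text.
AWS_KEY_RE = re.compile(r"AKIA[0-9A-Z]{16}")


def redact_text(value: Any) -> str:
    return AWS_KEY_RE.sub("[REDACTED]", str(value))


def redact_args(kwargs: dict[str, Any]) -> dict[str, str]:
    return {
        key: "[REDACTED]" if key.lower() in SENSITIVE_KEYS else redact_text(val)
        for key, val in kwargs.items()
    }


def audited(tool_name: str, fn: Callable[..., Any], sink: list[str]) -> Callable[..., Any]:
    """Wrap a keyword-argument tool so each call, success or failure, is logged to sink."""
    def wrapper(**kwargs: Any) -> Any:
        entry: dict[str, Any] = {"tool": tool_name, "ts": time.time(), "args": redact_args(kwargs)}
        try:
            result = fn(**kwargs)
            entry["status"] = "ok"
            # Redact before truncating so a key split at the cutoff cannot slip through.
            entry["result_preview"] = redact_text(result)[:200]
            return result
        except Exception as exc:
            entry["status"] = f"error: {type(exc).__name__}"
            raise
        finally:
            # In practice sink would be an append-only file or an external log service.
            sink.append(json.dumps(entry))
    return wrapper
```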

Prompted reasoning still exposes some process

Several practitioners said they get more actionable insight by forcing explicit planning into the normal output channel than by relying on vendor “thinking mode.” Old-school chain-of-thought prompting, specs, checklists, and review artifacts are cruder, but they are at least visible, durable, and under user control. That also sidesteps model-specific rules about which hidden thinking blocks survive into later turns.

If inspectability matters, ask the model to produce planning artifacts you can store and review. You may give up some raw capability compared with hidden reasoning mode, but you gain process visibility you can actually use.

Attribution:

stingraycharles #1
stavros #1
nomel #1
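
The following is a small sketch of keeping a planning artifact under your own control. It assumes you prompt the model to maintain a checklist of `- [ ] step` and `- [x] step` lines in its visible output. The helpers `parse_plan`, `progress`, and `plan_changes` are hypothetical. `plan_changes` uses Python's `difflib.ndiff` to show which steps were added or dropped between two stored revisions, which can surface a silent change of approach.

```python
import difflib
import re

# Matches checklist lines such as "- [ ] add tests" or "- [x] read config".
CHECKBOX_RE = re.compile(r"^[ \t]*- \[([ xX])\] (.+)$", re.MULTILINE)


def parse_plan(text: str) -> list[tuple[bool, str]]:
    """Return (done, step) pairs for every checklist line in the text."""
    return [(mark.lower() == "x", step.strip()) for mark, step in CHECKBOX_RE.findall(text)]


def progress(text: str) -> tuple[int, int]:
    """Return (completed steps, total steps)."""
    steps = parse_plan(text)
    return sum(done for done, _ in steps), len(steps)


def plan_changes(old: str, new: str) -> list[str]:
    """List steps dropped ('- ') or added ('+ ') between two plan revisions."""
    old_steps = [step for _, step in parse_plan(old)]
    new_steps = [step for _, step in parse_plan(new)]
    # ndiff also emits "  " (unchanged) and "? " (hint) lines; keep only real changes.
    return [line for line in difflib.ndiff(old_steps, new_steps) if line.startswith(("- ", "+ "))]
```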

Against the grain

Raw chain-of-thought may not help much

Several commenters argued that the post asks too much from reasoning traces in the first place. Models often arrive at correct answers through token sequences that look wrong, ugly, or disconnected from the real underlying computation. If that is true, then a demand for raw traces risks overvaluing an artifact that was never a faithful window into cognition.

Be careful not to turn chain-of-thought access into a proxy for trust. Validate models by task performance and controlled experiments, not by whether their visible traces feel sensible to a human reader.

Attribution:

datastoat #1
VulgarExigency #1
CamperBob2 #1
andai #1
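
The following is a minimal sketch of judging a model by task outcomes, not by how its trace reads. `model` is any hypothetical callable that maps a prompt to a final output. Each `Case` carries a deterministic check on that output only. Running the same cases against two configurations, for example with thinking enabled and disabled, gives a controlled comparison. Nothing in the harness inspects the reasoning text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    prompt: str
    check: Callable[[str], bool]  # deterministic pass/fail on the final output only


def pass_rate(model: Callable[[str], str], cases: list[Case], trials: int = 3) -> float:
    """Fraction of (case, trial) runs whose output passes its check."""
    if not cases or trials < 1:
        raise ValueError("need at least one case and one trial")
    passed = 0
    for case in cases:
        for _ in range(trials):
            try:
                ok = case.check(model(case.prompt))
            except Exception:
                ok = False  # a crash in the model call or the checker counts as a failure
            passed += bool(ok)
    return passed / (len(cases) * trials)
```

Repeated trials matter because hosted models are not deterministic, so a single run says little about either configuration.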

Summaries are better for routine use

A smaller but credible view was that terse summaries are actually the right product default. Full traces are expensive to scan and often add noise, while a short recap gives enough context for daily use without burying the answer in pages of internal monologue. For these users, the loss is mostly about power-user inspection, not ordinary productivity.

Separate your needs for convenience and for auditability. A summarized mode may be fine for day-to-day work, but keep a different toolchain for debugging, benchmarking, or sensitive workflows.

Attribution:

_fat_santa #1
solarkraft #1

This was plainly disclosed already

Some pushback said the controversy is overstated because vendors have been explicit for months that they provide “summarized thinking” rather than raw traces. The issue is less deception than the fact that many users did not internalize what the product labels implied. That weakens the claim of a shocking discovery, even if the product tradeoff is still unpopular.

Read model docs literally, especially around context retention, thinking preservation, and tool behavior. Product names like “thinking” or “reasoning” are marketing wrappers, not guarantees about what artifact you are actually seeing.

Attribution:

layer8 #1
InsideOutSanta #1
sigmar #1

In plain English

chain-of-thought

The model’s intermediate reasoning text, whether visible or hidden, used to help produce an answer.

CLAUDE.md

Anthropic’s repository-specific instruction file for Claude Code, similar in purpose to AGENTS.md.

distillation

Training a smaller or cheaper model to imitate a stronger model’s outputs so it can copy some of its behavior.

Extended Thinking

Anthropic’s feature that lets Claude spend extra tokens on intermediate reasoning before producing an answer.

tool calls

Requests from a model to external functions or services such as search, file access, or code execution.

Reference links

Vendor documentation and postmortems

Anthropic postmortem on April 23 context clearing bug
Used to explain how Anthropic handled hidden thinking after idle periods and how a bug changed session behavior.
Anthropic context windows documentation
Cited for Anthropic’s description of how thinking blocks are stripped or preserved across turns.
Anthropic extended thinking preservation by model
Lists model-specific rules for keeping prior thinking blocks.
OpenAI encrypted reasoning items documentation
Referenced as OpenAI’s equivalent mechanism for hidden or encrypted reasoning across turns.

Security and hidden reasoning risks

Fooling Around with Encrypted Reasoning Blobs
Referenced repeatedly in concerns about hidden reasoning, prompt injection, and what encrypted traces mean for security and control.

Interpretability and chain-of-thought reliability

Word Magic blog note on illegible chains of thought
Example of a garbled reasoning trace that still led to a correct chemistry answer.
Even illegible Mythos reasoning traces seem pretty legible
Example of compressed, odd-looking reasoning text used to argue that visible traces may drift toward nonhuman shorthand.
Attribution graphs biology dive on chain-of-thought
Cited to support the claim that visible chain-of-thought can diverge from the actual internal computation.
Meta paper on neuralese-style latent reasoning
Mentioned in discussion of future or hypothetical reasoning that stays in internal vectors instead of text tokens.

Distillation examples and alternative workflows

Qwen distilled from Claude 4.6 Opus reasoning
Concrete example offered to show that labs and hobbyists already train on exposed reasoning traces.
AI loops and collaboration prompting workflow
Shared as a practical workflow using multiple visible planning and review artifacts instead of hidden thinking alone.

Related Hacker News threads

HN thread on Claude context clearing rationale
Referenced for additional explanation from a Claude team member about why thinking was elided after idle periods.
HN thread on OpenAI dropping reasoning tokens
Referenced as a comparable case where OpenAI reportedly drops reasoning tokens selectively.
HN thread on GLM vs Opus transparency
Pointed to as an example of users valuing models that reveal more of what they are doing.