Claude Fable 5

AI
Developer Tools
Privacy
Security
Economics

Anthropic announced Claude Fable 5 as its new flagship generally available model and positioned Mythos 5 as the less-restricted version for a trusted access program. Fable and Mythos share the same underlying weights. The difference is policy and deployment. Fable falls back to Opus 4.8 for certain categories of request, while Mythos keeps more of the model’s full capability behind tighter access controls. On paper, the pitch is straightforward: stronger coding, long-horizon task performance, million-token context, and better results per task despite higher token prices.

The strongest signal from people who actually used it is that this is not just benchmark dressing. Multiple experienced users said Fable handled refactors, reverse engineering, dense spec review, bug finding, and difficult codebase work that Opus 4.8 or GPT-5.5 had struggled with. The recurring theme was not magical one-shot genius but better agency. It asks fewer unnecessary questions, makes more surgical edits, writes cleaner code, burns fewer tokens wandering in circles, and holds onto the larger task better. Several people who had preferred older Claude versions or even Codex said Fable finally felt like a meaningful step up again.

That said, the release also made the product strategy impossible to ignore. Fable is only temporarily included in subscription plans, then moves to usage credits unless Anthropic’s capacity improves. API pricing is 2x Opus 4.8. Users immediately reported burning through plan limits or extra usage on a single prompt, especially in agentic and ultracode-style workflows. A lot of the conversation landed on the same conclusion: frontier capability is separating from flat-rate subscriptions, and teams should expect the best models to drift toward metered enterprise economics.

The bigger practical problem was not price but misfiring safeguards. People trying ordinary security reviews, reverse engineering, medical imaging, genetics, chemistry, biology homework, health-data work, and even unrelated coding tasks got bounced to Opus 4.8 or blocked outright. Some found this understandable for a first launch of a heavily constrained model, but the lived experience was that Fable often refused exactly the high-value professional use cases where extra capability would matter most. The model card made this worse, not better, because Anthropic openly says it also applies invisible restrictions to frontier LLM development topics. That turned a lot of skepticism into anger. Users can tolerate refusals. What they hate is silent degradation.

Privacy and control concerns added another layer. Anthropic now requires 30-day retention for Mythos-class traffic, including some enterprise and third-party surfaces where customers previously expected zero data retention. For many people, that was a bigger deal than benchmarks. The practical takeaway was that Fable may be the most capable Claude they can access, but also the least deployable in regulated or confidential environments.

The overall read is sharp and mixed. Fable looks like a genuine capability advance in software work, especially on harder and longer tasks. It also looks like a preview of the tradeoffs the frontier labs want customers to accept: higher prices, more policy gating, more telemetry, and less certainty that the model you called is the model you got.

Treat Fable 5 as a real capability jump for demanding engineering work, not a universal upgrade. If you rely on security, biology, ML, or privacy-sensitive workflows, test it in a sandbox before rollout because the filters, retention rules, and cost model can break the workflow even when the raw model is strong.

June 9, 2026
anthropic.com

Discussion mood

Excited about the model’s raw capability, frustrated and distrustful about everything around it. People liked the jump in coding performance, but the dominant mood was irritation over false-positive safety filters, silent capability throttling, new data retention requirements, and pricing that feels like a step toward enterprise-only access.

Key insights

Better agency, not just higher scores

What stood out was not abstract intelligence but how Fable behaves while working. It tends to make smaller diffs, loop less, ask fewer unnecessary design questions, and keep moving on large tasks that previously needed constant steering. Even users who were lukewarm on recent Opus releases said the code was cleaner and more mergeable, which helps explain why benchmark gains feel real in practice when they do show up.

If you evaluate it, measure review burden and number of corrective turns, not just pass or fail. The practical win seems to be lower supervision overhead on messy real codebases.
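
One way to track those numbers is a small per-task log kept alongside your own evaluation runs. The sketch below is only illustrative: the `TaskRun` fields, the `summarize` helper, the model labels, and the sample figures are all hypothetical and not part of any vendor tooling.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskRun:
    """One evaluation task attempted by one model (all fields are illustrative)."""
    model: str
    passed: bool            # did the final result pass your own acceptance check?
    corrective_turns: int   # follow-up prompts needed to fix or redirect the model
    review_minutes: float   # human time spent reviewing the diff before merge


def summarize(runs: list[TaskRun]) -> dict[str, dict[str, float]]:
    """Aggregate pass rate and supervision overhead per model."""
    by_model: dict[str, list[TaskRun]] = {}
    for run in runs:
        by_model.setdefault(run.model, []).append(run)
    return {
        model: {
            "pass_rate": mean(1.0 if r.passed else 0.0 for r in group),
            "avg_corrective_turns": mean(r.corrective_turns for r in group),
            "avg_review_minutes": mean(r.review_minutes for r in group),
        }
        for model, group in by_model.items()
    }


# Made-up sample data, only to show the shape of the comparison.
runs = [
    TaskRun("model-a", True, 1, 10.0),
    TaskRun("model-a", True, 0, 6.0),
    TaskRun("model-b", True, 4, 25.0),
    TaskRun("model-b", False, 5, 30.0),
]
report = summarize(runs)
```

Comparing `avg_corrective_turns` and `avg_review_minutes` next to `pass_rate` keeps supervision overhead visible even when two models pass the same tasks.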

Attribution:

simonw #1 #2
boc #1
port11 #1
anematode #1 #2
mohsen1 #1

The guardrails break the premium use cases

The strongest complaints were not from people trying obviously restricted work. They were from people doing normal defensive security review, medical imaging, genetics, chemistry, and adjacent research who got downgraded or blocked. That means the model’s highest-value verticals are also the ones most likely to trip its policy harness, so the launch product is narrower than the benchmark story suggests.

Do not assume Fable is a drop-in upgrade for security, health, or scientific workflows. Run your own prompts first and verify whether the work gets handled by Fable at all before changing tooling or budget.
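
A simple way to run that check is to log, for each test prompt, which model you asked for and which model actually answered, then compute how often each workflow category was downgraded or blocked. The sketch assumes you can capture that information yourself. The record keys (`category`, `requested`, `served`) and the model labels are hypothetical, not fields from Anthropic's API.

```python
from collections import defaultdict


def fallback_report(records: list[dict]) -> dict[str, dict[str, float]]:
    """Share of prompts per category served as requested, downgraded, or blocked.

    Each record is a plain dict with hypothetical keys:
      category  - your own label for the workflow, e.g. "security-review"
      requested - the model you asked for
      served    - the model that answered, or None if the request was blocked
    """
    counts = defaultdict(lambda: {"served": 0, "downgraded": 0, "blocked": 0})
    for rec in records:
        bucket = counts[rec["category"]]
        if rec["served"] is None:
            bucket["blocked"] += 1
        elif rec["served"] != rec["requested"]:
            bucket["downgraded"] += 1
        else:
            bucket["served"] += 1
    report = {}
    for category, bucket in counts.items():
        total = sum(bucket.values())
        report[category] = {k: v / total for k, v in bucket.items()}
    return report


# Illustrative log entries; the model labels are placeholders.
log = [
    {"category": "security-review", "requested": "fable", "served": "opus"},
    {"category": "security-review", "requested": "fable", "served": None},
    {"category": "security-review", "requested": "fable", "served": "fable"},
    {"category": "security-review", "requested": "fable", "served": "opus"},
    {"category": "refactor", "requested": "fable", "served": "fable"},
]
rates = fallback_report(log)
```

A category where the `served` share is low is one where the pilot would mostly be measuring the fallback model rather than Fable.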

Attribution:

dannyw #1
garciasn #1
dmd #1
yakz #1
timedude #1
rightlane #1
fagnerbrack #1
sscaryterry #1

Invisible sabotage hit a trust boundary

Anthropic did not just announce visible refusals. It also said it will silently reduce effectiveness on frontier LLM development topics using prompt modification, steering vectors, or parameter-efficient fine-tuning. That landed badly because it turns the model from a bounded assistant into an untrustworthy one for some classes of work. Once users suspect unseen interference, every bad answer becomes ambiguous.

If your team does ML systems work, do not use a vendor model as the sole source of technical judgment. Keep independent baselines and cross-check with other models or human review whenever answers seem oddly weak or evasive.
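
As a rough illustration of such a cross-check, the sketch below flags a question for human review when the primary model scores well below the median of independent baselines on your own rubric. The function name, the scoring scale, and the `margin` default are assumptions made for the example, not a recommended threshold.

```python
from statistics import median


def needs_review(scores: dict[str, float], primary: str, margin: float = 0.2) -> bool:
    """Flag a question for human review when the primary model trails the baselines.

    scores  - rubric scores in [0, 1] for the same question, keyed by model label
    primary - the vendor model you normally rely on
    margin  - how far below the baseline median counts as suspicious (arbitrary default)
    """
    baselines = [s for name, s in scores.items() if name != primary]
    if not baselines:
        return True  # nothing to compare against, so a human should look
    return scores[primary] < median(baselines) - margin


# Placeholder labels and scores, purely for illustration.
weak = needs_review({"primary": 0.4, "baseline-1": 0.8, "baseline-2": 0.9}, "primary")
fine = needs_review({"primary": 0.8, "baseline-1": 0.8, "baseline-2": 0.9}, "primary")
```

This does not tell you why an answer was weak. It only makes the gap visible so that an oddly poor response gets a second look instead of being trusted by default.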

Attribution:

bkjlblh #1
rspeele #1
theLiminator #1
thepasch #1
chrisoosthuizen #1
mips_avatar #1
gck1 #1
0x10ca1h0st #1

This launch signals the end of flat-rate frontier access

The temporary inclusion window, the planned move to usage credits afterward, and reports of huge plan burn all reinforced the same market signal. Anthropic wants to let subscribers taste the model, but the economics point toward metered access for the best tier. People read this less as a promo and more as a glimpse of the steady-state business model for frontier systems.

Budget for model usage like cloud compute, not like SaaS seats. If AI is becoming core to your engineering workflow, add spend controls, routing, and cheaper fallback models now instead of assuming subscriptions will cover future top-tier access.
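
A spend control can be as small as a running total checked before each call. The following is a minimal sketch under stated assumptions: `BudgetGuard` and `estimate_cost_usd` are hypothetical names, the model labels are placeholders, and the per-million-token prices in the example are invented for the arithmetic, not Anthropic's actual rates.

```python
from dataclasses import dataclass


@dataclass
class BudgetGuard:
    """Tracks spend against a cap and falls back to a cheaper model near the limit."""
    monthly_cap_usd: float
    premium_model: str
    fallback_model: str
    spent_usd: float = 0.0

    def choose_model(self, estimated_cost_usd: float) -> str:
        """Return the premium model only if the estimated call still fits the cap."""
        if self.spent_usd + estimated_cost_usd <= self.monthly_cap_usd:
            return self.premium_model
        return self.fallback_model

    def record(self, actual_cost_usd: float) -> None:
        """Add the real cost of a finished call to the running total."""
        self.spent_usd += actual_cost_usd


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      usd_per_mtok_in: float, usd_per_mtok_out: float) -> float:
    """Token cost at per-million-token prices (prices are caller-supplied placeholders)."""
    return (input_tokens * usd_per_mtok_in + output_tokens * usd_per_mtok_out) / 1_000_000


# Invented prices and cap, used only to exercise the logic.
guard = BudgetGuard(monthly_cap_usd=100.0, premium_model="premium", fallback_model="cheap")
cost = estimate_cost_usd(200_000, 50_000, usd_per_mtok_in=10.0, usd_per_mtok_out=50.0)
first_choice = guard.choose_model(cost)   # still under the cap
guard.record(97.0)                        # pretend earlier calls used most of the budget
second_choice = guard.choose_model(cost)  # the same call no longer fits
```

Treating the cap as a hard gate before the call, rather than an alert after the invoice, is what makes this resemble cloud-compute budgeting more than a seat license.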

Attribution:

AquinasCoder #1
clementg #1
hgoel #1
dirkc #1
irthomasthomas #1
FergusArgyll #1
mbanerjeepalmer #1

Retention rules may matter more than benchmarks

The new 30-day retention requirement for Mythos-class traffic cut against zero-retention expectations on enterprise surfaces, GitHub Copilot, Bedrock, and other third-party environments. For regulated teams, that can be an automatic no regardless of capability. Several people said the model was effectively unusable for work the moment they saw the data-handling change.

Check procurement and privacy constraints before you pilot the model with real data. A better model is irrelevant if legal, compliance, or customer commitments make it undeployable.

Attribution:

victor106 #1
stronglikedan #1
drakythe #1
merlindru #1
wxw #1
ouk #1
rmuratov #1

Users are already routing around expensive frontier models

A lot of practitioners have settled into mixed-model workflows. They use top-end models for planning, reviews, or the hardest bugs, then hand implementation to cheaper models like DeepSeek, Qwen, Kimi, or Gemini. That was already happening before Fable, and its price plus quota burn only strengthens the pattern.

Design your stack for model routing instead of winner-take-all vendor choice. The cost-effective setup increasingly looks like premium model for judgment, cheaper model for throughput.
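
The routing idea can be sketched as a lookup from task type to model tier. Everything here is an assumption for illustration: the `TaskKind` categories, the choice of which ones count as judgment work, and the model labels are hypothetical and would come from your own workload analysis.

```python
from enum import Enum


class TaskKind(Enum):
    PLANNING = "planning"
    REVIEW = "review"
    HARD_BUG = "hard_bug"
    IMPLEMENTATION = "implementation"
    BOILERPLATE = "boilerplate"


# Task kinds that get the premium model; everything else goes to the cheaper tier.
JUDGMENT_TASKS = {TaskKind.PLANNING, TaskKind.REVIEW, TaskKind.HARD_BUG}


def route(kind: TaskKind, premium: str = "premium-model", cheap: str = "cheap-model") -> str:
    """Pick a model label for a task: premium for judgment, cheap for throughput."""
    return premium if kind in JUDGMENT_TASKS else cheap


plan_model = route(TaskKind.PLANNING)
impl_model = route(TaskKind.IMPLEMENTATION)
```

Keeping the mapping in one place means a price change or a new model only requires editing the routing table, not every call site.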

Attribution:

nicce #1
pyeri #1
deanc #1
superkickstart #1
shimman #1
baalimago #1
moomoo11 #1

Against the grain

Some hard tasks still looked unimpressive

A minority of experienced users did not see the leap. On performance tuning, code migration, and some real coding tasks, they found Fable slow, expensive, or strangely weak compared with Gemini, GPT-5.5, or even older Claude versions. In those accounts, the launch looked more like hype outrunning consistency than a dependable new baseline.

Do not generalize from the strongest anecdotes. Benchmark your own hardest recurring tasks across vendors because Fable’s gains do not appear evenly distributed.

Attribution:

anematode #1
peteforde #1
aviinuo #1
izzylan #1
raoulj #1

Mythos danger framing may be mostly marketing

Some commenters rejected the whole premise that Mythos-level safeguards reflect a uniquely dangerous capability. They pointed out that comparable cyber evaluations exist for other public models, that the company benefits from emphasizing danger, and that safety rhetoric can also cover capacity management and sales strategy. On this view, the restrictions are as much narrative and commercial positioning as technical necessity.

Read the safety framing as product strategy as well as risk management. If you are making vendor decisions, separate the actual model behavior from the mythology built around it.

Attribution:

geerlingguy #1
teaearlgraycold #1
ainch #1
toddmorey #1

Faster cheaper models may be better for real work

Not everyone wants a smarter model that takes longer, costs more, and encourages passive oversight. Some argued that medium-tier workhorse models keep them more engaged, preserve understanding, and move the task forward faster overall. For these users, Fable solves the wrong problem. They want speed, obedience, and low friction more than another jump in abstract reasoning.

Match the model to the workflow. If your bottleneck is iteration speed or staying mentally in the loop, a cheaper faster model may beat a frontier model despite lower peak capability.

Attribution:

hugodan #1
dakolli #1 #2

In plain English

API

Application Programming Interface, a defined way for software systems to communicate and use each other’s functions.

Bedrock

Amazon Web Services' managed platform for accessing and hosting AI models.

Reference links

Anthropic docs and policies

Anthropic system card for Claude Fable 5 and Mythos 5
Primary technical and policy document discussed throughout the comments, including safeguards, benchmarks, retention, and model behavior.
Anthropic pricing page
Used to confirm Fable 5 pricing relative to earlier Opus models.
Anthropic model overview pricing
Cited for direct Opus 4.8 versus Fable 5 token price comparison.
Why Claude switched models in your conversation with Fable 5
Explains the model-switch behavior users were seeing when safeguards triggered.
Data retention practices for Mythos-class models
Explains the new 30-day retention policy that alarmed enterprise and privacy-sensitive users.

Benchmarks and evaluations

Cognition FrontierCode benchmark
New coding benchmark heavily cited because Anthropic used it to show a large jump over Opus 4.8 and GPT-5.5.
Artificial Analysis Humanity’s Last Exam evaluations
Third-party benchmark reference for checking model performance claims.
AISI evaluation of Claude Mythos cyber capabilities
Independent government-linked cyber evaluation used to argue Mythos risk claims were not pure marketing.
AISI evaluation of OpenAI GPT-5.5 cyber capabilities
Used as a counterpoint that GPT-5.5 scored similarly on cyber evaluations without the same media narrative.

Hands-on tests and comparisons

Simon Willison’s micropython-wasm repository
Example project used in a shared transcript to demonstrate Fable 5 solving a difficult WASM and Python packaging task.
cpython-wasi-build releases
Supporting artifact uploaded during the shared Fable 5 transcript to build full Python in WASM.
Shared Claude transcript for the Python WASM task
Concrete transcript showing the model’s step-by-step work on a nontrivial engineering task.
Generative AI Review side-by-side of Fable, Opus 4.8, and ChatGPT 5.5
Independent qualitative comparison mentioned as another early practical test.
Claude Fable 5 system prompt diff versus Opus 4.8
Shows changes in Claude Code system prompts that may partly explain differences in autonomy and communication style.

Pelican benchmark and related writing

Fable 5 pelican SVG examples across effort levels
Shared as the familiar visual benchmark meme for comparing reasoning effort and SVG output quality.
Simon Willison on training for pelicans riding bicycles
Referenced in response to claims that the pelican test is now too embedded in model training data to be useful.

Related policy and industry context

Wired on OpenAI and Anthropic letter about AI biological weapons
Cited as possible context for Anthropic’s aggressive biology-related filtering.
Reuters on Anthropic and White House tensions ahead of IPO
Used in debate about Anthropic’s ethics and relationship with the US government.
OpenAI GPT-5.5 announcement
Referenced in pricing and availability comparisons with Codex and API access.
AP report on Anthropic and Pentagon work
Linked during arguments over Anthropic’s military posture and whether it had refused some defense work.