Feds freaked over Fable 5 after 'fix this code', not jailbreak, say researchers
- AI
- Security
- Regulation
- Developer Tools
- Open Source
The article argues that federal concern around Anthropic’s Fable 5 was tied to an embarrassingly simple path around its cyber restrictions. Instead of asking for vulnerability discovery directly, researchers reportedly asked the model to fix code and produce tests. Because security fixes usually expose the flaw they patch, and tests can look a lot like exploit proofs, that was enough to recover the same information the guardrails were meant to block. The key context is that Anthropic had positioned Mythos as unusually dangerous for offensive cyber use, then shipped Fable as a constrained version that was supposed to hand sensitive requests off to a weaker model. That framing set a trap for them. If the model can still surface vulnerabilities through normal development work, the denials were never robust. If you harden the denials enough to stop that, you cripple the product as a coding assistant.
If you rely on hosted coding models, treat policy shutdown risk as seriously as model quality. On security work, assume useful bug-finding and offensive capability are inseparable, so plan around open models, multi-vendor options, and stricter internal review rather than expecting provider guardrails to protect you.
- theregister.com
- Discuss on HN