Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable
- AI
- Security
- Developer Tools
- Open Source
The article reports that cybersecurity researchers are unhappy with Anthropic’s safety controls around Fable, a new flagship model that Anthropic says needs stronger protections for cyber and biology-related use. What set people off was not merely refusals on obviously dangerous prompts; it was the breadth and opacity of the filtering. Multiple people said routine work such as secure coding, Docker log analysis, reverse engineering, privacy tooling, home automation logs, white papers, and CTF tasks kept triggering downgrades or refusals. Several pointed to language in Anthropic’s own model card saying that some interventions targeting model distillation and competing-model research may be invisible to users, applied through prompt modification, steering vectors, or parameter-efficient fine-tuning rather than an explicit fallback. That turned the mood from annoyed to hostile. People can live with a hard “no.” They do not want a model that quietly gets worse while still billing and presenting itself as the same product.
If you rely on frontier models for security, privacy, reverse engineering, or life science work, treat vendor guardrails as a product risk, not an edge case. Keep fallback providers and local options ready, because what a model can do for you now depends as much on its policy layer as on its benchmark quality.
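One way to keep fallbacks ready is a small routing layer that tries providers in order and moves on when one errors out or explicitly refuses. The sketch below is a minimal illustration under stated assumptions, not any vendor's API: `Provider`, `complete_with_fallback`, and the `REFUSAL_MARKERS` strings are all hypothetical, and each provider's `complete` callable stands in for whatever hosted or local client you actually use.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical refusal phrases; a real detector would need tuning per vendor.
REFUSAL_MARKERS = ("i can't help with", "i cannot assist", "unable to help with")


@dataclass
class Provider:
    name: str
    complete: Callable[[str], str]  # stand-in for a real client call


def looks_like_refusal(text: str) -> bool:
    """Crude heuristic: flag replies containing a known refusal phrase."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def complete_with_fallback(
    prompt: str, providers: Sequence[Provider]
) -> Tuple[str, str]:
    """Return (provider_name, reply) from the first provider that answers."""
    failures: List[str] = []
    for provider in providers:
        try:
            reply = provider.complete(prompt)
        except Exception as exc:  # network error, quota, etc.
            failures.append(f"{provider.name}: {exc}")
            continue
        if looks_like_refusal(reply):
            failures.append(f"{provider.name}: refused")
            continue
        return provider.name, reply
    raise RuntimeError("all providers failed: " + "; ".join(failures))


if __name__ == "__main__":
    # Stub providers for illustration only.
    hosted = Provider("hosted", lambda p: "I can't help with that request.")
    local = Provider("local", lambda p: f"summary of: {p}")
    print(complete_with_fallback("docker logs from last night", [hosted, local]))
```

A check like this only catches explicit refusals and errors. It cannot detect the quiet degradation described above, which would need something closer to regression-testing known prompts against expected answers.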
- techcrunch.com
- Discuss on HN