Prompt Injection as Role Confusion
- AI
- Security
- Research
- Developer Tools
The paper and writeup frame prompt injection as “role confusion.” Modern chat models are fed one token stream containing system instructions, user text, tool output, and sometimes reasoning traces. The claim is that models do not reliably honor those formal role labels. They often infer role from the style of the text instead, so user or tool content that sounds like internal reasoning or policy can be treated as higher-trust instructions. That gives a cleaner explanation for why simple tag filtering and chat formatting keep failing.
Do not treat system, user, tool, or chain-of-thought labels as security controls. If your product lets an LLM take actions, design it as if every text input is adversarial and keep authority in code, sandboxes, and tightly scoped tools.
- role-confusion.github.io
- Discuss on HN