That framing shaped almost all of the reaction. People did not read this as evidence that
prompt injection is solved. They read it as evidence that a
frontier model with strong instructions can survive a pile of mostly single-shot email jailbreaks when the useful behavior is heavily constrained. The sharpest criticism was that the test ducks the hard part. A secure assistant is not one that ignores an input channel. It is one that can tell good requests from malicious ones while still doing real work. Several commenters pushed on the missing metric: false positives versus false negatives. Without legitimate email tasks in the mix, no one can tell whether Fiu was robust or just overly defensive.
The second big theme was threat model realism. Resetting context per email removes the multi-turn “frog boiling” style attacks many people see as the practical failure mode. Not letting the agent freely reply also removes attacker feedback loops that would normally help refine an exploit. Others noted that direct prompt injection by email is the easier case anyway. Real exfiltration risk often comes from indirect channels like fetched web pages, tool outputs, file handling, calendar invites, or shell actions that move data out of band without ever printing the secret in a reply.
The comments still gave the experiment some credit. A lot of casual
jailbreak lore says frontier models are trivial to break. This result cuts against that. People accepted that modern models like Opus are better at spotting obvious social engineering and refusing blatant secret extraction attempts than they were two years ago. But the dominant conclusion was narrower and more practical: stronger instruction following raises the cost of attack, it does not remove it, and permissions matter more than confidence. Even the author echoed that line by saying the experience made them less worried, but not comfortable enough to grant agents permission to send email. Cost became part of the security story too. The system burned several hundred dollars, Gmail rate-limited the account, and multiple commenters pointed out that “
denial of wallet” is itself a viable attack surface for public-facing agents.