Lockdown Mode

AI
Security
Developer Tools
Enterprise Software

The Help Center page introduces Lockdown Mode as a safer operating mode for ChatGPT, aimed at users worried about prompt injection and data exfiltration. In plain terms, it reduces what the product can do: it turns off or limits features that could let a compromised agent move sensitive information out through the web, connectors, or generated content. That framing is the real message. OpenAI is not claiming it can solve prompt injection at the model level. It is narrowing the blast radius by removing capabilities.

If you are evaluating AI agents for internal company use, treat prompt injection as an unsolved systems problem, not a filter-tuning problem. Ask exactly which tools, network paths, and file access routes remain available in each product mode, because headline safety switches may leave critical escape hatches open.

June 6, 2026
help.openai.com

Key insights

Default mode still has real exfil paths

The core implication is uncomfortable. If Lockdown Mode reduces risk by blocking ways an agent can communicate stolen data out, then standard operation still leaves those channels open enough to matter. The useful definition of “robust protection” here is not clever prompt analysis. It is eliminating every outbound route a compromised agent could use.

Do not treat an optional lockdown switch as proof the base product is safe enough for sensitive work. Review the default permission model first, then decide whether the secure mode is your baseline rather than an exception.

Attribution:

simonw #1 #2

Prompt injection filters are losing ground

The sharpest technical read is that Lockdown Mode amounts to a product-level concession that prevention systems are not dependable. Once the model can read untrusted content, use tools, and access secrets, the problem stops looking like sanitizing prompts and starts looking like containing a motivated process. That makes capability restriction more credible than any claim of universal prompt-injection detection.

If a vendor pitches prompt-injection prevention as a standalone control, push for concrete containment measures instead. Ask what happens after compromise, not just how they try to prevent it.
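The three conditions named above (reading untrusted content, using tools, and accessing secrets) can be turned into a simple review checklist. The following is a minimal sketch under stated assumptions: `AgentProfile`, its field names, and the rule that all three together require containment are hypothetical illustrations, not any vendor's API or policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Hypothetical summary of one agent deployment's capabilities."""
    name: str
    reads_untrusted_content: bool  # e.g. web pages, inbound email, shared documents
    uses_tools: bool               # e.g. browsing, connectors, code execution
    accesses_secrets: bool         # e.g. credentials, private files, internal data


def risky_capabilities(profile: AgentProfile) -> list[str]:
    """Return the names of the risky capabilities this profile has enabled."""
    flags = {
        "reads_untrusted_content": profile.reads_untrusted_content,
        "uses_tools": profile.uses_tools,
        "accesses_secrets": profile.accesses_secrets,
    }
    return [name for name, enabled in flags.items() if enabled]


def needs_containment(profile: AgentProfile) -> bool:
    """True when all three capabilities are present at once (assumed rule)."""
    return len(risky_capabilities(profile)) == 3
```

A reviewer could fill in one profile per deployment and ask the vendor what containment applies to every profile where `needs_containment` returns `True`.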

Attribution:

kirtivr #1 #2
sigmoid10 #1

LLM security looks more like human security

The most useful framing shift was to stop expecting agent failures to resemble traditional deterministic software bugs. Several commenters argued that these systems fail more like people do. They can be manipulated by context, conflicting instructions, and persuasive input, much like phishing works on humans. That does not excuse the risk. It means the right defenses look more like supervision, constrained privileges, and independent validation than classic input sanitization alone.

Design agent workflows the way you would design access for a fallible employee. Use least privilege, approvals for high-risk actions, and separate validators for outputs that matter.
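As a rough illustration of least privilege plus approvals, the sketch below gates every tool call through an allowlist and routes high-risk tools to a separate approver. The tool names, the `ALLOWED_TOOLS` and `HIGH_RISK_TOOLS` sets, and the `approver` callback are all hypothetical; a real deployment would back the approver with a human review step or an independent validator.

```python
from typing import Callable

# Hypothetical policy: tools this agent may use at all (least privilege).
ALLOWED_TOOLS = frozenset({"search_docs", "send_email"})

# Hypothetical policy: tools that need sign-off before each use.
HIGH_RISK_TOOLS = frozenset({"send_email", "delete_file"})


def authorize(tool: str, approver: Callable[[str], bool]) -> bool:
    """Decide whether a requested tool call may proceed.

    Tools outside the allowlist are always denied. Allowed tools that are
    also high-risk proceed only if the independent approver says yes.
    """
    if tool not in ALLOWED_TOOLS:
        return False
    if tool in HIGH_RISK_TOOLS:
        return approver(tool)
    return True
```

The design choice here is that the model never decides its own permissions: the allowlist and the approver both live outside the agent, much like access controls for a fallible employee.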

Attribution:

kijin #1
mapontosevenths #1 #2
Smaug123 #1
hypeatei #1

Useful agents need secret and network mediation

A stronger path than simply turning features off is to put narrow controls between the model and sensitive resources. Comments pointed to configurable outbound allowlists, secret proxies, and other intermediaries that preserve some utility while cutting obvious leak paths. That reframes Lockdown Mode as a coarse emergency brake, not the shape of a mature enterprise architecture.

If you want agent usefulness without full exposure, build or buy mediation layers around credentials and outbound network access. Product-level on/off switches are too blunt for serious internal deployments.
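To make the idea of mediation concrete, here is a minimal sketch of an outbound allowlist combined with a secret proxy. Everything named in it is an assumption for illustration: the host names, the `SECRETS` store, and the handle `crm-token` are placeholders, and this is not how any particular product implements the control. The agent only ever sees a secret's handle, while the real value is attached by the proxy and only for approved destinations.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of exact host names the agent may contact.
ALLOWED_HOSTS = frozenset({"api.internal.example", "docs.example.com"})

# Hypothetical secret store. The agent knows the handle, never the value.
SECRETS = {"crm-token": "placeholder-secret-value"}


def is_allowed(url: str) -> bool:
    """Allow only HTTPS requests to an exactly matching host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    host = parts.hostname  # already lowercased, excludes userinfo and port
    return host is not None and host in ALLOWED_HOSTS


def attach_credentials(url: str, secret_handle: str) -> dict[str, str]:
    """Build auth headers for an approved destination, or refuse."""
    if not is_allowed(url):
        raise PermissionError(f"outbound request blocked: {url}")
    return {"Authorization": f"Bearer {SECRETS[secret_handle]}"}
```

Exact host matching is deliberate: suffix or substring checks would accept look-alike hosts such as `docs.example.com.evil.net`.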

Attribution:

zerobees #1
cosmicriver #1
ACCount37 #1
noir_lord #1

Codex leaves a separate network hole

The help doc itself apparently excludes Codex from Lockdown Mode network restrictions. That means a team could enable the new mode in ChatGPT and still have an outbound path through Codex when it works against internal codebases. The gap matters because it breaks the intuitive assumption that one security mode covers the whole product surface.

Map controls per product and per tool, not per brand. A single vendor can expose very different risk depending on whether users are in chat, research, or coding workflows.
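One way to keep that mapping honest is to record controls per surface and query for gaps. The sketch below is illustrative only: the surface names and every setting in `SURFACES` are placeholder values, not a description of what any real product enforces, and should be replaced with what the vendor's documentation states.

```python
# Placeholder inventory: one entry per product surface, not per brand.
# All values are illustrative assumptions, not vendor facts.
SURFACES = {
    "chat": {"network_restricted": True, "local_file_access": False},
    "research": {"network_restricted": True, "local_file_access": False},
    "coding": {"network_restricted": False, "local_file_access": True},
}


def uncovered_surfaces(surfaces: dict[str, dict[str, bool]]) -> list[str]:
    """Return the surfaces whose outbound network access is not restricted."""
    return sorted(
        name
        for name, controls in surfaces.items()
        if not controls["network_restricted"]
    )
```

With the placeholder data, `uncovered_surfaces(SURFACES)` flags only the `coding` entry, which is the kind of gap a single brand-level switch would hide.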

Attribution:

madanparas #1

Against the grain

Use OS permissions before blaming the app

The complaint that Codex can read all local files was pushed back on with a simpler point. Desktop operating systems already have permission and containment mechanisms, and users should not expect an AI coding tool to invent those from scratch. That does not clear the product of poor defaults, but it does shift some responsibility back to deployment hygiene.

Before adopting any local AI tool, decide whether your endpoint controls already provide the sandboxing you need. If they do not, fix that at the OS or container layer instead of assuming the application will save you.

Attribution:

thomas34298 #1
BSDobelix #1

Humans are not safer than agents

One pushback rejected the idea that AI is uniquely untrustworthy. Human employees are already compromised through phishing and insider abuse, so “acts like a risky human” is not a novel security category. The sharper interpretation is that LLMs automate familiar human failure modes at machine speed, which makes old security weaknesses more visible rather than creating an entirely alien problem.

Compare agent risk to the real behavior of your staff and contractors, not to an idealized careful employee. That comparison will tell you where automation is merely scaling existing exposure and where it creates genuinely new attack paths.

Attribution:

ACCount37 #1

In plain English

agent

An AI system that can take actions using tools, files, websites, or other resources instead of only generating text.

Codex

A code-focused language model and tool interface associated with OpenAI for generating and editing code.

data exfiltration

The unauthorized transfer of sensitive data out of a system.

LLM

Large language model, a machine learning system trained on large amounts of text that can generate and analyze language and code.

Lockdown Mode

A restricted operating mode in ChatGPT that disables or limits certain features to reduce security risks such as data leakage.

prompt injection

A technique for manipulating an AI system by embedding hidden or malicious instructions in its input.

Reference links

OpenAI and vendor docs

OpenAI Help Center: Lockdown Mode
The source document announcing and describing the new restricted mode.
GitHub issue: Codex can read all files on PC unless containerized
Cited as evidence that OpenAI's coding tools may have overly broad file access by default.

Related analysis and comparisons

Simon Willison on OpenAI’s new Lockdown Mode
Referenced as related commentary and as the source of the “lethal trifecta” framing mentioned in comments.
Apple Support: Lockdown Mode
Used to compare OpenAI’s naming and concept with Apple’s existing Lockdown Mode feature.
Sam Altman post on X
Shared as a related executive post tied to the announcement.
