Codex logging bug may write TBs to local SSDs

AI
Developer Tools
Programming
Open Source
Software Quality

The submitted issue says Codex has a logging bug that can dump massive amounts of trace data into a local SQLite log file, with users reporting databases in the tens of gigabytes and constant background writes even while sessions sit idle. A later comment points to a fix already merged into the open source Codex repo, so the immediate incident looks tractable. What stuck with people was not the bug itself, but what it says about the state of AI coding products. The mood was that this is exactly the kind of boring, preventable failure these companies claim their own tools should be eliminating, yet both Codex and Claude Code keep shipping resource-heavy desktop software that burns CPU, GPU, RAM, disk, or battery for no obvious user benefit.

If you use coding agents, audit their local disk, CPU, GPU, and memory behavior like any other untrusted dependency. For teams betting on AI-assisted development, the weak point is no longer model capability alone but whether the surrounding client software and review process are disciplined enough to ship safely.

June 22, 2026
github.com
Discuss on HN

Key insights

SQLite workarounds exposed how bad it got

Users were not just complaining in the abstract. They were applying database-level damage control. One workaround installs a SQLite trigger that drops all future log inserts. Another report said a Codex log database shrank from 27 GB to 73 MB after VACUUM FULL. A separate user wrote a script to keep deleting the write-ahead log because disk pressure was locking up a server. That turns this from a cosmetic bug into a local operations problem with a clear failure mode and a clear blast radius.
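Both SQLite workarounds can be reproduced with Python's standard `sqlite3` module. This is a minimal sketch against a throwaway database. The `logs` table and its columns are hypothetical stand-ins, not Codex's actual schema, so inspect the real file (for example with `.schema` in the sqlite3 shell) before adapting it. Work on a copy with the agent stopped. VACUUM rebuilds the database through a temporary copy and can need up to twice the original file size in free disk space, which matters when the disk is already full.

```python
import os
import sqlite3
import tempfile

def shrink_demo(db_path: str) -> tuple[int, int, int]:
    """Reproduce the trigger and VACUUM workarounds on a scratch database.

    Returns (rows after a blocked insert, file size before VACUUM, file size after VACUUM).
    """
    conn = sqlite3.connect(db_path)
    try:
        # Make the default explicit: without auto-vacuum, deleted pages stay in the file.
        conn.execute("PRAGMA auto_vacuum = NONE")
        # Hypothetical schema standing in for an agent's log table.
        conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, ts REAL, body TEXT)")
        conn.executemany(
            "INSERT INTO logs (ts, body) VALUES (?, ?)",
            ((float(i), "x" * 1000) for i in range(5000)),
        )
        conn.commit()

        # Workaround 1: a BEFORE INSERT trigger that silently discards every new row.
        conn.execute(
            "CREATE TRIGGER drop_log_inserts BEFORE INSERT ON logs "
            "BEGIN SELECT RAISE(IGNORE); END"
        )
        conn.execute("INSERT INTO logs (ts, body) VALUES (0, 'ignored')")
        conn.commit()
        rows = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

        # Deleting rows only frees pages inside the file; its size on disk stays put.
        conn.execute("DELETE FROM logs")
        conn.commit()
        before = os.path.getsize(db_path)

        # Workaround 2: VACUUM rebuilds the file and returns the free pages to the OS.
        conn.execute("VACUUM")
        after = os.path.getsize(db_path)
    finally:
        conn.close()
    return rows, before, after

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        rows, before, after = shrink_demo(os.path.join(d, "logs.sqlite"))
        print(rows, before, after)  # rows stays 5000; after is far smaller than before
```

The trigger stops growth but also discards all future logs, so treat it as a stopgap until a patched release.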

Check the on-disk state of any local agent, not just whether the UI feels slow. If you see SQLite or write-ahead log growth, cap it immediately with a workaround or remove the tool until the patched release lands.
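A quick on-disk check might look like the sketch below. It sums a SQLite file and its `-wal` and `-shm` sidecars and compares the total with a budget. The 1 GiB default is an arbitrary assumption, and you have to locate the agent's log database yourself. Deleting the `-wal` file by hand, as one user's script did, can discard committed transactions that have not been checkpointed yet. As a swapped-in alternative, this sketch can ask SQLite to run `PRAGMA wal_checkpoint(TRUNCATE)`, which copies the WAL back into the main file and truncates the WAL to zero bytes when no other connection blocks it.

```python
import os
import sqlite3
import sys
from contextlib import closing

def sqlite_footprint(db_path: str) -> dict[str, int]:
    """Size in bytes of a SQLite file plus its -wal and -shm sidecars (0 if absent)."""
    sizes = {}
    for suffix in ("", "-wal", "-shm"):
        path = db_path + suffix
        sizes[path] = os.path.getsize(path) if os.path.exists(path) else 0
    return sizes

def within_budget(db_path: str, budget_bytes: int = 1 << 30,
                  checkpoint: bool = False) -> bool:
    """True if the total footprint fits the budget.

    When over budget and checkpoint=True, fold the WAL back into the main file
    and truncate it instead of deleting the -wal file by hand.
    """
    if sum(sqlite_footprint(db_path).values()) <= budget_bytes:
        return True
    if checkpoint:
        with closing(sqlite3.connect(db_path)) as conn:
            # Result row is (busy, wal_frames, checkpointed_frames);
            # busy == 1 means another connection prevented a full checkpoint.
            conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
    return sum(sqlite_footprint(db_path).values()) <= budget_bytes

if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # path to the agent's log database
    for name, size in sqlite_footprint(path).items():
        print(f"{size / 2**20:10.1f} MiB  {name}")
    print("within budget:", within_budget(path))
```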

Attribution:

woadwarrior01 #1
Zenul_Abidin #1
ewsbr #1

The harness layer is where trust breaks

The sharpest criticism was not about model quality. It was about the thin client and tool layer around the model. Commenters pointed to incomplete Model Context Protocol support, inability to use preferred harnesses on some plans, and missing basics like exposing usage cleanly or supporting self-invoked commands. The argument was that the models may be valuable, but the official wrappers are now the unstable part of the stack. That is why power users are peeling away to Pi, Opencode, or custom setups while keeping the same underlying models.

Separate your model choice from your tool choice. If a vendor’s official client is the unstable component, switch harnesses before you switch models.
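If you build or patch your own harness, one way to keep that separation is to make the harness depend on a narrow model interface and choose the backend from configuration. This sketch is purely illustrative: `ModelBackend`, `EchoBackend`, `make_backend`, and `run_task` are hypothetical names, not part of any vendor SDK, and a real backend would wrap whichever client library you already use.

```python
from typing import Protocol

class ModelBackend(Protocol):
    """Everything the harness is allowed to know about a model."""
    def complete(self, prompt: str) -> str: ...

class EchoBackend:
    """Hypothetical stand-in; a real backend would wrap a vendor client."""
    def __init__(self, model_id: str) -> None:
        self.model_id = model_id

    def complete(self, prompt: str) -> str:
        return f"[{self.model_id}] {prompt}"

# Registry of available backends; adding a vendor does not touch harness code.
BACKENDS: dict[str, type] = {"echo": EchoBackend}

def make_backend(config: dict[str, str]) -> ModelBackend:
    """Pick the model from configuration, not from harness code."""
    return BACKENDS[config["provider"]](config["model"])

def run_task(backend: ModelBackend, task: str) -> str:
    """Harness logic (tools, retries, logging policy) stays model-agnostic."""
    return backend.complete(f"Task: {task}")

if __name__ == "__main__":
    backend = make_backend({"provider": "echo", "model": "example-model"})
    print(run_task(backend, "fix the flaky test"))  # [example-model] Task: fix the flaky test
```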

Attribution:

CharlieDigital #1
deathbob #1
thewebguyd #1

AI code still leaves humans fully accountable

A concrete production incident grounded the broader complaints. One commenter described a company incident where Claude-generated code mishandled ordering and transaction guarantees under normal load, and the cleanup fell back to engineers. Another recounted someone trying to defend bad production code with “the code wasn't written by me,” which was treated as absurd. The useful frame here is not whether AI authored the code. It is that unsupervised generation does not reduce accountability, and pressure to appear “10x with AI” can make teams hide that fact until a postmortem forces it out.

Make AI provenance explicit in reviews and incident reports, then hold the same owner accountable anyway. If management is measuring adoption instead of defect rate, expect hidden failures and bad incentives.

Attribution:

cryo32 #1
flir #1
Imustaskforhelp #1

Prompt boundaries are not real boundaries

Several people said they no longer trust instruction-only safety in coding agents. Even when a model respects repo rules or sandbox guidance today, context loss or a future regression can erase that behavior. That is why users described moving agents into Podman containers, full virtual machines, or pi-sandbox with explicit override flows. The point was practical, not philosophical. If the only guardrail lives in the prompt, it is not a guardrail.

Put coding agents behind hard technical containment before giving them meaningful repo or system access. Treat prompt instructions as hints, not policy enforcement.
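As one concrete form of hard containment, the sketch below assembles a `podman run` command that mounts only the repository being worked on, drops Linux capabilities, blocks privilege escalation, and makes the container's root filesystem read-only. It assumes rootless Podman. The image name and agent command are hypothetical. Check the flags against your Podman version and the agent's real needs: an agent that must reach a hosted model API cannot run with the default `network="none"` shown here.

```python
import shlex
from pathlib import Path

def podman_sandbox_cmd(repo: Path, image: str, agent_cmd: list[str],
                       network: str = "none") -> list[str]:
    """Build a podman argv that confines an agent to a single repository."""
    repo = repo.resolve()
    return [
        "podman", "run", "--rm", "-it",
        f"--network={network}",                  # no network unless explicitly allowed
        "--read-only",                           # container root filesystem is read-only
        "--cap-drop=all",                        # drop every Linux capability
        "--security-opt", "no-new-privileges",   # block setuid-style escalation
        "--userns=keep-id",                      # rootless: repo files stay owned by you
        "--pids-limit", "256",                   # cap runaway process spawning
        "--memory", "4g",                        # cap memory use
        "-v", f"{repo}:/workspace:Z",            # only the repo is visible; :Z relabels for SELinux
        "-w", "/workspace",
        image,
        *agent_cmd,
    ]

if __name__ == "__main__":
    # Hypothetical image; pass the list to subprocess.run(cmd, check=True) to launch it.
    cmd = podman_sandbox_cmd(Path.cwd(), "localhost/agent-sandbox:latest", ["bash"])
    print(shlex.join(cmd))
```

Any override (opening the network, mounting another directory) then becomes an explicit change to the command line rather than a sentence in a prompt.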

Attribution:

l33tman #1
drakythe #1
matheusmoreira #1
newtwilly #1

This needed resource budgets, not just correctness tests

People pushed past “why no QA” into a more precise complaint. A bug like this is easy to miss with normal functional tests because logging is technically doing what it was asked to do. What would have caught it is resource-aware integration or load testing with limits on disk growth, memory use, and long-running idle behavior. One commenter argued that this kind of stop-and-think systems judgment is exactly what disappears when teams over-trust generated glue code.

Add explicit non-functional budgets to CI for agent-facing apps. Test idle sessions, long runs, and log growth, not just whether features return the right output.
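A disk-growth budget can be written as an ordinary test: measure the app's data directory, let the app idle or run a scripted session, and fail if it grew more than allowed. In this sketch, `idle_session` and `runaway_logger` are hypothetical stand-ins that write to a temp directory. In real CI you would launch the agent against an isolated data directory for minutes rather than a fraction of a second. The 64 KiB budget is an arbitrary placeholder.

```python
import tempfile
import time
from pathlib import Path
from typing import Callable

def dir_size(root: Path) -> int:
    """Total size in bytes of every regular file under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())

def growth_during(root: Path, workload: Callable[[Path], None]) -> int:
    """Run a workload and return how many bytes the directory grew."""
    before = dir_size(root)
    workload(root)
    return dir_size(root) - before

def check_disk_budget(root: Path, workload: Callable[[Path], None],
                      max_growth_bytes: int) -> int:
    """Fail loudly if the workload grows the directory past its budget."""
    grown = growth_during(root, workload)
    if grown > max_growth_bytes:
        raise AssertionError(
            f"disk budget exceeded: grew {grown} bytes, limit {max_growth_bytes}")
    return grown

# Hypothetical stand-ins for "agent sits idle" and "agent logs every trace event".
def idle_session(root: Path) -> None:
    time.sleep(0.1)  # a well-behaved idle session writes nothing

def runaway_logger(root: Path) -> None:
    with open(root / "trace.log", "ab") as f:
        for _ in range(2000):
            f.write(b"x" * 1024)  # 2,048,000 bytes of trace output

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        print(check_disk_budget(root, idle_session, max_growth_bytes=64 * 1024))  # 0
        try:
            check_disk_budget(root, runaway_logger, max_growth_bytes=64 * 1024)
        except AssertionError as err:
            print(err)
```

The same shape extends to memory or CPU budgets. Only the measurement function changes.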

Attribution:

altcognito #1
cute_boi #1
tomjakubowski #1
bakugo #1

Against the grain

Fast growth explains some rough edges

One commenter argued that people are pretending a one-year-old product with massive adoption should already have every operational problem solved. The claim was that the goalposts will keep moving until critics are forced to admit the tools are working, because no fast-scaling software product fixes every painful edge case on day one. That does not excuse the bug, but it does push back on treating every flaw as proof the whole category is fake.

Do not confuse an ugly client bug with a full verdict on model utility. Judge the product in layers, because the surrounding app can be bad while the underlying capability is still commercially useful.

Attribution:

reducesuffering #1

Bad polish is not proof AI cannot code

A minority view held that this is an ordinary product-prioritization failure. Humans decide whether to spend cycles on polish or new work, even in AI-heavy teams, and buggy software built with AI does not prove the model itself is useless. The useful part of that argument is narrow but real. Some of the anger is aimed at marketing claims, while the immediate cause may simply be teams shipping low-priority client code without enough care.

When you evaluate AI coding tools, split capability questions from product-management questions. A weak desktop app may reflect bad priorities more than hard limits of the model.

Attribution:

hombre_fatal #1 #2 #3

Humans shipped this class of bug too

A few commenters resisted turning the incident into a uniquely AI-made failure. They pointed to pre-AI software that looped forever writing backups or logs because of mundane edge cases like daylight saving time logic. The deeper point was that average production code has always been messier than developers like to admit, and frontier labs may just be reproducing the industry’s usual “good enough” engineering habits at higher speed.

Use this incident to tighten your own engineering standards, not just to laugh at AI labs. If your process would miss runaway logging from a human-written feature, it will miss it from AI-written code too.
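One cheap defense against runaway logging, whoever wrote the feature, is a hard cap at the logging layer. The sketch below is a standard-library `logging.Filter` that passes at most a fixed number of records per time window and counts the ones it drops. The class name and limits are illustrative choices, not something the thread proposed. If logs go to files, pair it with size-based rotation such as `logging.handlers.RotatingFileHandler`.

```python
import logging
import time
from typing import Callable

class RateLimitFilter(logging.Filter):
    """Pass at most `limit` records per `window` seconds and drop the rest."""

    def __init__(self, limit: int, window: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        super().__init__()
        self.limit = limit
        self.window = window
        self.clock = clock  # monotonic by default, so system clock adjustments cannot skew windows
        self.window_start = clock()
        self.count = 0
        self.dropped = 0  # suppressed records, useful for a periodic summary line

    def filter(self, record: logging.LogRecord) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window:
            self.window_start, self.count = now, 0  # start a fresh window
        if self.count < self.limit:
            self.count += 1
            return True
        self.dropped += 1
        return False

if __name__ == "__main__":
    handler = logging.StreamHandler()
    # Attached to the handler, it also limits records propagated from child loggers.
    handler.addFilter(RateLimitFilter(limit=5, window=1.0))
    log = logging.getLogger("agent.trace")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    for i in range(100):
        log.info("trace event %d", i)  # at most 5 of these reach stderr per second
```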

Attribution:

indiv0 #1
lifthrasiir #1

In plain English

ACP

Agent Client Protocol, a way for an editor or client to talk to coding agents and models.

Electron

A framework for building desktop apps using web technologies, typically with broader access to the local system than a website has.

GPU

Graphics Processing Unit, a chip often used to run AI models because it handles parallel computation well.

Model Context Protocol

A protocol for connecting AI assistants to external tools, prompts, and resources in a standard way.

Podman

A container runtime similar to Docker that can isolate processes and filesystems from the host machine.

QA

Quality Assurance, the process of testing software to find defects and verify behavior before release.

SQLite

A lightweight embedded database that stores its data in a local file instead of running as a separate server.

VACUUM FULL

A database maintenance operation that rebuilds storage to recover unused space and shrink a bloated database file. VACUUM FULL is the PostgreSQL spelling; in SQLite the command is plain VACUUM.

Reference links

Bug reports and fixes

OpenAI Codex logging fix commit
Shows that a fix for the logging bug was merged and should appear in the next release
OpenAI Codex issue 28224 workaround comment
User-provided script for deleting the growing write-ahead log file
OpenAI openai-python issue 2472
Cited as another example of an OpenAI issue that was demoed as fixed but remained open
Anthropic Claude Code permission bypass issue 16180
Raised as an example of long-lived security-relevant bugs and weak issue triage

Alternative harnesses and clients

OpenAI Codex repository
Used to argue that the CLI is patchable and customizable even if the desktop app is proprietary
Quillcode
An open source native Swift alternative to the official Codex desktop app
claude-agents-md
Plugin to add AGENTS.md support to Claude Code workflows
pi-sandbox issue 50
Shows a sandbox plus override setup for running coding agents more safely

Standards and workflow discussions

Claude Code AGENTS.md support issue 6235
Central example in the discussion about fragmented instruction file conventions across tools
claude-md-symlinker
Shared as an overengineered workaround for CLAUDE.md and AGENTS.md naming differences
Anthropic Claude directory docs
Referenced to dispute claims about current Claude Code log writing behavior
Anthropic settings docs
Referenced to show that transcript cleanup period is configurable

Background reading on AI-generated code quality

OpenAI Harness Engineering post
Cited as evidence that OpenAI publicly emphasized AI-written Codex development
Our first outage from LLM-written code
Used to support the idea of comprehension debt and production risk from AI-generated code
OpenAI unmerged demo blog post
Used to criticize OpenAI for showcasing fixes that were not actually merged
Joel on Software: Fire and Motion
Referenced to frame partial standards support as a possible competitive distraction tactic
