HN Debrief

When I reject AI code even if it works

  • AI
  • Programming
  • Developer Tools
  • Software Engineering

The post lays out a simple rule set for AI-generated code: passing tests is only the floor. Code still gets rejected if the author cannot explain it in plain language, if the change is larger than the problem, if it introduces abstractions before they are needed, or if it leaves the system harder to maintain. That framing landed with most readers because it maps cleanly onto normal engineering judgment. Plenty of people said you should hold coworker code to the same bar. The difference is that AI makes it cheap to produce a lot of plausible code very quickly, which turns ordinary bad habits into a scaling problem.

Treat AI code like accelerated junior output, not trusted automation. If your team cannot enforce understanding, scoped changes, and strong review, AI will amplify your existing process failures faster than it creates value.

Discussion mood

Cautious to negative. Most commenters use AI and see clear value for boilerplate, ports, and low-risk tasks, but they are frustrated by sycophancy, overengineered output, and the way fast code generation overwhelms review and magnifies weak engineering culture.

Key insights

  1. ML leakage errors look legitimate

    Model-generated machine learning code can hide evaluation bugs that are hard to spot even for experienced practitioners. The concrete example was data leakage in calibration and holdout logic, where the code and metrics looked plausible until someone with domain knowledge traced the data flow and found that future information had leaked into evaluation. That changes the risk profile of AI coding in ML because passing tests and decent scores do not tell you the experiment is valid.

    Do not let agents design or validate ML evaluation pipelines without expert review. Add explicit checks for leakage, row splits, and label contamination before you trust any reported metric.

      Attribution:
    • abhgh #1
    • nostrebored #1
  2. Agents default to needless abstraction

    The common failure mode was not just wrong code. It was code that solved simple problems with elaborate scaffolding, duplicate helpers, and architecture that fights the existing system. Examples included payment flows with subtle accounting errors, frontend loops doing work that an obvious database aggregation should handle, and layout changes that ignored established patterns. The code often looked polished at a glance, which is exactly why it is dangerous.

    Constrain agents with examples from your existing codebase and ask for the smallest change that fits local conventions. Reject any first pass that expands the abstraction surface faster than the feature demands.

      Attribution:
    • figassis #1
    • itopaloglu83 #1
    • danfritz #1
  3. AI removes the senior engineer's refusal instinct

    Good senior engineers do not start invasive work by guessing. They ask questions, map the system, write tests around unknown behavior, and sometimes refuse work until they understand it. Coding agents never do that on their own. They charge ahead with full confidence, which means the user has to supply the caution and the stopping power that an experienced human would naturally bring.

    Build explicit stop conditions into your workflow for unknown code paths, risky changes, and missing context. If the task would require a human to slow down and investigate first, your agent workflow should do the same.

      Attribution:
    • kerkeslager #1
    • mkozlows #1
    • Agentlien #1
  4. AI accelerates existing org dysfunction

    The problem is bigger than model quality. Organizations already reward giant commits, hero behavior, weak review, and shipping at any cost. AI lets those same incentives produce more code, more hidden coupling, and more cleanup work for the few people who still understand the system. The likely outcome is not dramatic 'software bankruptcy' announcements but slower delivery, senior attrition, and expensive rewrite or modernization efforts that arrive too late.

    Audit your incentives before you roll out coding agents broadly. If you reward output volume over code ownership and review quality, AI adoption will show up later as retention, reliability, and velocity problems.

      Attribution:
    • busterarm #1
    • onion2k #1
    • danaris #1
  5. The productive middle ground is narrow

    The strongest practical pattern was using AI as a power tool rather than an autonomous coder. People reported real gains when they kept architectural control and used models for boilerplate, examples, docs spelunking, ports, tests, and other fussy but readable work. That middle ground exists, but only when changes are scoped tightly enough that a human can still absorb and defend them.

    Start with use cases where review is cheap and failure is reversible. Measure value on reduced drudgery, not on total lines generated or number of tickets closed.

      Attribution:
    • coffeefirst #1
    • Snacklive #1
    • resonious #1
    • unknownfuture #1
    • teaearlgraycold #1
  6. The real open question is verification

    A few comments pushed past the usual 'just read the code' answer. They pointed out that software already relies on abstractions, third-party libraries, and code no single person fully understands. The harder problem is whether AI-generated modules can be trusted through contracts, tests, isolation, and architecture boundaries rather than line-by-line comprehension. That is a more interesting question than whether every generated diff feels elegant.

    Invest in stronger interface contracts, property tests, and sandboxed component boundaries if you want AI use to scale responsibly. Without those verification layers, you are forced back to expensive manual code reading.

      Attribution:
    • edanm #1
    • CraigJPerry #1
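
A minimal sketch of the leakage checks recommended in the first insight above, written in Python. It assumes rows are plain dictionaries with a unique `id` and a sortable `ts` timestamp. The function name `check_split`, the field names, and the `label` column are hypothetical, chosen for illustration and not taken from the discussion. Checks like these catch only mechanical mistakes; they do not replace the expert review the commenters called for.

```python
def check_split(train_rows, test_rows, feature_names,
                label="label", key="id", time_key="ts"):
    """Return a list of human-readable problems found in a train/test split.

    All parameter names are illustrative assumptions about the data layout.
    """
    problems = []

    # Label contamination: the target column must not be used as a feature.
    if label in feature_names:
        problems.append(f"label column '{label}' is listed as a feature")

    # Row splits: the same row must not appear in both splits.
    train_ids = {row[key] for row in train_rows}
    test_ids = {row[key] for row in test_rows}
    overlap = train_ids & test_ids
    if overlap:
        problems.append(f"{len(overlap)} row id(s) appear in both splits")

    # Temporal leakage: every training row should predate every test row.
    if train_rows and test_rows:
        latest_train = max(row[time_key] for row in train_rows)
        earliest_test = min(row[time_key] for row in test_rows)
        if latest_train >= earliest_test:
            problems.append("training data is not strictly earlier than test data")

    return problems


if __name__ == "__main__":
    train = [{"id": 1, "ts": 1}, {"id": 2, "ts": 5}]
    test = [{"id": 2, "ts": 4}, {"id": 3, "ts": 6}]
    for problem in check_split(train, test, ["x", "label"]):
        print("LEAK CHECK:", problem)
```

Running a check like this before reporting any metric turns "trace the data flow" into something a pipeline can refuse to skip, though it covers only the failure modes someone has already thought to encode.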

Against the grain

  1. Custom lint harnesses can tame agents

    One experienced user argued that rejecting rough first drafts misses the point. In this view, the right move is to encode recurring AI mistakes into custom linters, pre-commit hooks, and agent feedback loops so the model fixes dumb patterns before a human ever looks at the change. The claim is not blind trust. It is that teams can move quality checks earlier and make agent output conform to local rules at scale.

    If your codebase has clear mechanical standards, try turning them into executable checks instead of relying on repeated human correction. This works best for structural mistakes, duplication, and known house-style failures.

      Attribution:
    • cadamsdotcom #1 #2 #3
  2. Throughput pressure is already changing behavior

    One senior developer said most of their daily output is now AI-written and admitted they can no longer review everything in depth. The justification was speed pressure from management and market expectations, plus the belief that experienced developers can still tell when deep scrutiny is required. It is an uncomfortable data point because it sounds irresponsible to many readers, yet it is probably close to how adoption is actually happening inside companies.

    Assume some teams are already trading review depth for delivery speed. If you lead engineering, decide that policy explicitly now rather than letting it emerge from deadline pressure and tool availability.

      Attribution:
    • SunboX #1 #2 #3
  3. Multi-model review can work on greenfield projects

    A few builders described using several models to critique one another's plans and implementations while keeping architecture docs current and inserting human direction at key points. They reported greenfield projects growing to tens of thousands of lines over six months with acceptable results, even when the human did not understand every detail before merge. That does not settle the long-term maintainability question, but it shows some teams are operationalizing agent-heavy workflows rather than merely experimenting.

    If you want to test agent-heavy development, do it first on greenfield internal projects with clear rollback options. Track maintenance burden over time, not just initial delivery speed.

      Attribution:
    • BobbyTables2 #1
    • wwind123 #1
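
The first contrarian point above, turning recurring agent mistakes into executable checks, can be sketched with Python's standard `ast` module. This is an illustration under assumptions, not the commenter's actual harness: the two rules (bare `except:` clauses and overlong functions), the function name `lint_source`, and the 40-line threshold are all hypothetical stand-ins for whatever patterns a team keeps correcting by hand.

```python
import ast


def lint_source(source, max_function_lines=40):
    """Return sorted (line, message) findings for two illustrative house rules.

    Relies on `end_lineno`, which the ast module provides in Python 3.8+.
    """
    findings = []
    tree = ast.parse(source)
    for node in ast.walk(tree):
        # Rule 1 (hypothetical): bare `except:` swallows every error.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            findings.append((node.lineno, "bare except hides errors"))
        # Rule 2 (hypothetical): flag functions longer than the threshold.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_function_lines:
                findings.append(
                    (node.lineno, f"function '{node.name}' is {length} lines long")
                )
    return sorted(findings)


if __name__ == "__main__":
    sample = "def f():\n    try:\n        pass\n    except:\n        pass\n"
    for line, message in lint_source(sample, max_function_lines=3):
        print(f"line {line}: {message}")
```

A script like this could run as a pre-commit hook or be fed back to the agent as review comments, so that a human only sees changes that already pass the mechanical rules.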

In plain English

API
Application programming interface, the defined way one piece of software interacts with another.
data leakage
A modeling mistake where information from the evaluation or future data accidentally reaches the training process, making results look better than they really are.
ML
Machine learning, a field of software that learns patterns from data to make predictions or decisions.
