HN Debrief

Nobody ever gets credit for fixing problems that never happened (2001) [pdf]

  • Management
  • Software Engineering
  • Incentives
  • IT Operations
  • Business

The article is a management piece built around Toyota and the broader idea of a "capability trap": when teams are overloaded, they cut maintenance, training, and process improvement to hit short-term output targets. That weakens the system, creates more crises, and then rewards the people who swoop in visibly to save the day. Preventive work disappears into the background because the proof of its success is that nothing happened.

If your org only notices outages, heroics, and ticket counts, it is probably training people to defer maintenance and manufacture visibility. Put explicit risk, prevention, and reliability measures into planning and reviews, or expect more firefighting and burnout.
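The reinforcing loop is easier to see in a toy model. The sketch below is not the paper's model. Every parameter (hours, capability decay, learning rate, demand) is a made-up assumption. Capability erodes each period unless improvement hours rebuild it. A "reactive" team puts every hour it needs into production to meet demand. A "protected" team always reserves some improvement hours.

```python
from dataclasses import dataclass


@dataclass
class Team:
    # All numbers are made-up illustrative assumptions, not values from the paper.
    capability: float = 1.0   # units of output per hour of production work
    hours: float = 40.0       # hours available per period
    decay: float = 0.05       # share of capability lost each period without upkeep
    learn_rate: float = 0.01  # capability gained per hour of improvement work


def simulate(team: Team, demand: float, periods: int, protected_hours: float = 0.0) -> list[float]:
    """Return output per period.

    Each period the team puts just enough hours into production to meet demand,
    except that `protected_hours` are reserved for improvement and never
    reassigned. Any production hours left over also go to improvement.
    """
    cap = team.capability
    outputs = []
    for _ in range(periods):
        needed = demand / cap  # hours needed to hit demand at current capability
        work = min(team.hours - protected_hours, needed)
        improve = team.hours - work
        outputs.append(cap * work)
        # Capability erodes on its own and is rebuilt only by improvement hours.
        cap = cap * (1 - team.decay) + team.learn_rate * improve
    return outputs


reactive = simulate(Team(), demand=40, periods=30)
protected = simulate(Team(), demand=40, periods=30, protected_hours=10)
print(f"first period: reactive={reactive[0]:.1f}, protected={protected[0]:.1f}")
print(f"last period:  reactive={reactive[-1]:.1f}, protected={protected[-1]:.1f}")
print(f"total output: reactive={sum(reactive):.0f}, protected={sum(protected):.0f}")
```

With these invented numbers, the reactive team looks better at first (40 units against 30), but it never frees any hours for upkeep, so its output shrinks every period. The protected team accepts the early dip, reaches the 40-unit demand within about ten periods, and stays there. That is the "worse before better" shape the debrief describes.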

Discussion mood

Resigned and cynical, with a lot of lived experience behind it. Most commenters agreed the article names a real pattern, especially in engineering, IT, and support functions. In those areas, reliable systems make the work invisible and incentive systems reward visible heroics instead.

Key insights

  1. 01

    Capability traps require getting worse first

    Escaping a capability trap often means accepting a short-term dip in output so you can rebuild slack, maintenance, or training. That is exactly what many managers refuse to tolerate. The trap is not just that prevention is undervalued. Recovery also demands headroom that the organization has already spent.

    If a team is permanently at capacity, treat that as a structural failure, not a utilization win. Budget time for reliability work before the crisis forces the same tradeoff at much higher cost.

      Attribution:
    • jacques_chester #1
    • left-struck #1
    • thx67 #1
  2. 02

    Risk communication is part of the job

    Several operators said the difference between being ignored and being funded was not the technical work itself. It was making near misses concrete for nontechnical leaders. The effective pattern was simple: name the failure mode, estimate its probability and business blast radius, and keep a written record that the risk was accepted. Quiet competence alone does not travel up the org chart.

    Translate preventive work into probabilities, customer impact, and decision logs. If leadership declines the fix, make that choice explicit so future outages are seen as accepted risk, not engineering surprise.

      Attribution:
    • atoav #1 #2
    • Forgeties79 #1
  3. 03

    Stack ranking turns heroics into strategy

    In forced ranking systems, prevention can actively hurt careers because it removes the visible incidents that people use to prove impact. That pushes rational employees to optimize for rescue scenes, not system health. Once that happens, the problem is no longer bad judgment by a few managers. It is game theory baked into the org chart.

    If your company uses forced ranking or demands constant "exceeds expectations," assume reliability work will be underproduced. Counterweight it with promotion criteria tied to reduced incidents, less toil, and cleaner operations.

      Attribution:
    • qurren #1 #2
    • fgonzag #1
  4. 04

    Y2K shows why prevention gets mocked

    Y2K came up as the cleanest example of successful prevention being rebranded as waste. Systems kept working because people spent years fixing brittle date logic, especially in old COBOL systems at banks and utilities. Once the rollover passed quietly, outsiders treated the calm outcome as proof the warnings were fake. That is the preparedness paradox in a single example.

    For major risk-reduction projects, plan the postmortem before the event. Capture the vulnerabilities, remediation work, and plausible failure paths in advance so success is not rewritten later as unnecessary spending.

      Attribution:
    • random3 #1
    • SteveGerencser #1
    • everyone #1
    • marcus_holmes #1
  5. 05

    Technical managers are not a silver bullet

    A useful pushback rejected the lazy fix of "just promote engineers into management." Good management still matters as a distinct skill. Bell Labs was cited not as proof that management is useless, but as proof that exceptional technical organizations had capable managers who understood the work and created stable conditions for it. The load-bearing word was not "engineer" but "capable."

    Hire and train managers for judgment, context, and incentive design, not just domain pedigree. Technical literacy helps, but it does not replace management competence.

      Attribution:
    • Sharlin #1
    • fabianholzer #1 #2
    • sdfsd233fsdf #1
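The risk-communication pattern from insight 02 (name the failure mode, estimate probability and blast radius, record who accepted the risk) can be sketched as one entry in a risk register. The field names and the expected-loss formula (probability times impact) are illustrative assumptions, not a standard the commenters cited. The dollar figures are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RiskEntry:
    # Hypothetical risk-register fields; adapt to whatever your org already tracks.
    failure_mode: str
    annual_probability: float  # rough estimate of the chance per year, 0.0 to 1.0
    impact_cost: float         # estimated business cost if the failure happens
    fix_cost: float            # estimated one-off cost of the preventive fix
    decision: str = "pending"  # "pending", "fix", or "accept"
    decided_by: Optional[str] = None
    decided_on: Optional[date] = None

    def expected_annual_loss(self) -> float:
        """Probability times blast radius: a crude but legible number for leadership."""
        return self.annual_probability * self.impact_cost

    def record_decision(self, decision: str, who: str, when: date) -> None:
        """Write down who chose to fix or accept the risk, and when."""
        if decision not in ("fix", "accept"):
            raise ValueError("decision must be 'fix' or 'accept'")
        self.decision, self.decided_by, self.decided_on = decision, who, when

    def summary(self) -> str:
        line = (f"{self.failure_mode}: ~{self.annual_probability:.0%}/yr x "
                f"${self.impact_cost:,.0f} = ${self.expected_annual_loss():,.0f}/yr expected loss; "
                f"fix costs ${self.fix_cost:,.0f}")
        if self.decision == "pending" or self.decided_on is None:
            return line + " [pending]"
        return line + f" [decision: {self.decision} by {self.decided_by} on {self.decided_on.isoformat()}]"


risk = RiskEntry("Untested database restore", annual_probability=0.25,
                 impact_cost=400_000, fix_cost=30_000)
risk.record_decision("accept", who="VP Engineering", when=date(2024, 3, 1))
print(risk.summary())
```

The point is not precision. A $100,000/yr expected loss set against a $30,000 fix, with a named person who chose "accept", turns a later outage into a documented tradeoff rather than an engineering surprise.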

Against the grain

  1. 01

    Markets can reward prevention directly

    Reliability is sometimes legible without internal heroics. Toyota was the example: customers buy its cars partly to avoid future hassle, and that preference turns thousands of invisible design decisions into a visible sales advantage. The claim is that prevention gets rewarded once the buyer, not an internal manager, bears the downside of failure over time.

    If you can tie reliability to customer retention, warranty cost, or brand preference, do it. Prevention becomes easier to defend when it shows up in revenue and not just in internal risk memos.

      Attribution:
    • nostrademons #1
  2. 02

    Engineers cause plenty of chaos too

    Some comments pushed back on treating "the suits" as the whole problem. Engineers can manufacture complexity, overestimate their management ability, and contribute to the same visibility games they complain about. That framing matters because it shifts the diagnosis from bad nontechnical bosses to a wider systems problem with incentives and cognitive bias on every side.

    Do not frame this as business versus engineering. Audit how both groups create opacity, complexity, and perverse incentives, then fix the system that rewards them.

      Attribution:
    • maccard #1
    • dfhgdfghdfgdf #1
  3. 03

    Sometimes prevention really is just routine work

    A minority view said the article overstates the injustice. Utilities, inspectors, and other operators are already paid to keep failures from happening, and not every avoided problem deserves special acclaim. The useful distinction is between expected maintenance and exceptional foresight that prevented a material loss. Complaints about missing praise can blur the two.

    Separate baseline operational duty from unusually high leverage preventive work. Reward the second explicitly, but do not build a culture where every routine check needs applause.

      Attribution:
    • jdw64 #1
    • Steve16384 #1 #2

In plain English

capability trap
A reinforcing cycle where short-term pressure causes teams to cut maintenance, training, or improvement work, which makes performance worse and creates even more pressure.
COBOL
Common Business-Oriented Language, an old programming language still used in many banks, governments, and large business systems.
stack ranking
A performance system, also called forced ranking, that makes managers rank employees against each other, often requiring a fixed percentage to be labeled low performers.
Y2K
The 'Year 2000' computer problem: older software that stored years as two digits risked reading 00 as 1900 instead of 2000, breaking date arithmetic and sorting.
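The "brittle date logic" behind Y2K is easy to show. The sketch below contrasts the naive assumption that every two-digit year is in the 1900s with "windowing", one common remediation that maps years below a pivot into the 2000s. The other common fix was expanding stored years to four digits. The pivot value of 50 and the function names are illustrative assumptions.

```python
from typing import Callable


def naive_year(yy: int) -> int:
    """Pre-remediation logic: assume every two-digit year belongs to the 1900s."""
    return 1900 + yy


def windowed_year(yy: int, pivot: int = 50) -> int:
    """Windowing fix: 00..pivot-1 map to 20xx, pivot..99 map to 19xx.

    The pivot of 50 is an illustrative assumption; real systems picked their
    own, and windowing only moves the ambiguity to a later year.
    """
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy < pivot else 1900 + yy


def years_elapsed(start_yy: int, end_yy: int, expand: Callable[[int], int]) -> int:
    """Whole years between two two-digit years, using the given expansion rule."""
    return expand(end_yy) - expand(start_yy)


# An account opened in '98 and checked in '00 (that is, 1998 and 2000).
print(years_elapsed(98, 0, naive_year))     # -98: 2000 was read as 1900
print(years_elapsed(98, 0, windowed_year))  # 2
```

A negative account age like this is the kind of silent failure that remediation teams hunted down. Because the fix worked, the failure never showed up.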
