LLMs are eroding my software engineering career and I don't know what to do

AI
Programming
Developer Tools
Careers
Economics

The post is a first-person account from a software engineer in fintech who says LLMs have eaten through the parts of the job they spent a decade building up: domain expertise, debugging instinct, and much of the craft of implementation. Their argument is not that models are fully autonomous or flawless. It is that a senior engineer with an agent can now get close enough to specialist output that accumulated domain knowledge stops feeling like a moat, leaving “taste” and review as the last meaningful differentiators. The piece lands as a career anxiety post, but it is also a claim about hiring and org design: teams no longer need the same mix of specialists if agents can flatten the gap.

Most of the high-signal reaction rejected that flattening story, especially in regulated or high-stakes systems. People working in finance, healthcare, airlines, and similar domains said current models still hallucinate rules, miss context that lives outside the codebase, and confidently propose changes that would be reckless to ship without expert review. Several pointed out that compliance is not a literal rules-engine problem in the first place. It is an interpretive process shaped by legal guidance, auditors, regulators, internal risk tolerance, and business tradeoffs. That makes “just give it the docs” a weak answer. The consensus was not anti-LLM. Many commenters use Claude, GPT, Codex, or similar tools constantly. The settled view was narrower and more practical: agents are great at draft generation, refactors, bug hunting, boilerplate, and lowering the cost of context switching, but they still need a human who knows what good looks like.

A second theme was that the bottleneck has shifted, not disappeared. Multiple engineers said time to first PR is much shorter, but review is longer and more fatiguing because AI tends to overproduce code, duplicate logic, violate architecture boundaries, and optimize locally instead of fitting changes into the larger system. That means some of the productivity gain is real, but it shows up as faster exploration and iteration, not as permission to stop understanding the system. Others described their orgs moving away from “let the agent do everything” toward deterministic scripts, tests, and toolchains that both humans and LLMs can call. That workflow looked like the most durable one in the discussion.

The mood was uneasy rather than uniformly panicked. Plenty of people think the field will shrink, junior paths will get worse, and management is already using AI hype to justify speed pressure, thinner teams, and bad metrics. But the dominant technical judgment was blunt: today’s systems are not replacing experienced engineers in the parts of the job where tacit knowledge, accountability, and architectural judgment matter most. They are changing the shape of the work fast enough that many people no longer trust the old career ladder.

Use LLMs where they clearly compress toil, but do not confuse faster draft generation with reduced need for judgment, accountability, or deep domain context. If you run teams, optimize for workflows that keep human expertise sharp and make deterministic tooling, tests, and review more central rather than less.

June 7, 2026
human-in-the-loop.bearblog.dev

Discussion mood

Uneasy and defensive. Most commenters accept that LLMs are already useful and changing workflows, but they strongly reject the idea that domain expertise, architecture judgment, and accountability have become irrelevant. The anxiety comes less from model capability alone and more from management hype, hiring pressure, and the likelihood that organizations will cut headcount before they understand the risks.

Key insights

Compliance is judgment, not checklist parsing

In regulated work, the hard part is not reading a rule and translating it into code. It is negotiating between law, guidance, accepted industry practice, internal risk tolerance, and what a regulator will actually accept. That makes LLM failures here more structural than incidental, because they tend to act like an overeager junior reading the text literally and missing the institutional context that humans resolve through meetings, precedent, and documented signoff.

Do not treat compliance review as a pure AI classification problem. Build workflows where engineering, legal, and compliance jointly define source-of-truth interpretations and record signoff, because that documentation is what protects you when the model is wrong.

Attribution:

t34t34r43 #1
PeterStuer #1 #2
scott_w #1
bobkb #1

Error shape matters more than error rate

Several commenters pushed back on the easy claim that LLMs are fine if they make fewer mistakes than humans. Human review can catch false positives from an AI auditor, but it cannot reveal the true problems the model never flagged unless someone does the full work anyway. More importantly, model errors are distributed differently from human errors. Teams already have instincts and process for spotting human blind spots, but plausible nonsense from an LLM is harder to reason about and easier to miss.

Evaluate AI assistance by failure mode, not average benchmark wins. In any high-stakes workflow, assume missed defects are the real risk and design audits, tests, and fallbacks around what the model is likely to omit.

Attribution:

csallen #1
skillina #1
porridgeraisin #1
genxy #1
Terr_ #1
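
One way to make "evaluate by failure mode" concrete is to seed a change set with known defects and measure what an AI reviewer fails to flag, not only what it flags wrongly. The sketch below is a minimal illustration under that assumption. The names `AuditResult` and `score_reviewer` and the defect labels are hypothetical and do not come from the discussion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditResult:
    caught: frozenset        # known defects the reviewer flagged
    missed: frozenset        # known defects the reviewer never flagged
    false_alarms: frozenset  # flags that match no known defect

    @property
    def miss_rate(self) -> float:
        """Share of known defects that went unflagged (0.0 if none were seeded)."""
        known = len(self.caught) + len(self.missed)
        return len(self.missed) / known if known else 0.0


def score_reviewer(known_defects: set, flagged: set) -> AuditResult:
    """Compare a reviewer's flags against a set of deliberately seeded defects."""
    return AuditResult(
        caught=frozenset(known_defects & flagged),
        missed=frozenset(known_defects - flagged),
        false_alarms=frozenset(flagged - known_defects),
    )


# Hypothetical seeded defects and reviewer output, for illustration only.
known = {"rounding-error", "stale-fx-rate", "missing-idempotency-key"}
flagged = {"rounding-error", "unused-import"}

result = score_reviewer(known, flagged)
print(sorted(result.missed))       # ['missing-idempotency-key', 'stale-fx-rate']
print(round(result.miss_rate, 2))  # 0.67
```

A human skimming the reviewer's output would see one real finding and one false alarm. The two missed defects only become visible because the ground truth was established independently, which is the point the commenters were making.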

The durable pattern is agents plus deterministic tools

A strong operations-oriented view emerged from people who tried the full agentic path and backed off. Letting an LLM directly do everything is slow, expensive, and hard to trust during incidents. The better pattern is to use agents to generate or glue together deterministic scripts, tests, and deployment tools that humans can also run. That gives you repeatability, binary outputs, and a clean fallback when production is on fire.

Spend your AI effort on building reusable internal tooling, not just generating one-off code. The teams that win will have agents sitting on top of reliable scripts and checks, not agents improvising core operational behavior from scratch.

Attribution:

alexpotato #1
theshrike79 #1
enraged_camel #1
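
As a rough illustration of a deterministic tool that both a human and an agent can call, the sketch below validates a deploy configuration and reduces the result to a pass/fail exit code. The required keys, the `replicas` rule, and the function names are assumptions made up for this example, not anything described in the discussion.

```python
import json

# Hypothetical required fields for an illustrative deploy config.
REQUIRED_KEYS = ("service", "replicas", "healthcheck_path")


def check_deploy_config(config: dict) -> list:
    """Return human-readable failures; an empty list means the config passes."""
    failures = [
        f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config
    ]
    replicas = config.get("replicas")
    # bool is a subclass of int in Python, so reject it explicitly.
    if "replicas" in config and (
        isinstance(replicas, bool) or not isinstance(replicas, int) or replicas < 1
    ):
        failures.append("replicas must be an integer >= 1")
    return failures


def run_check(json_text: str) -> int:
    """Return a process-style exit code: 0 for pass, 1 for fail."""
    try:
        config = json.loads(json_text)
    except json.JSONDecodeError as exc:
        print(f"FAIL: invalid JSON ({exc.msg})")
        return 1
    if not isinstance(config, dict):
        print("FAIL: top-level JSON value must be an object")
        return 1
    failures = check_deploy_config(config)
    for failure in failures:
        print(f"FAIL: {failure}")
    if not failures:
        print("PASS")
    return 1 if failures else 0
```

A thin command-line wrapper could read a file and pass the result of `run_check` to `sys.exit`, so an on-call engineer and an agent get the same answer from the same input. The model's role is then to write or invoke checks like this rather than to improvise the judgment each time.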

Specs and tests do not eliminate ambiguity

The idea that you can solve agent reliability by writing exhaustive specs and walling off tests got a sharp rebuttal. Natural language specs are incomplete by default, especially in business and regulatory domains where important constraints are implicit and contested. LLMs fill the gaps with guesses. Tightening the constraints does not make the model smarter. It often just encourages brittle or underhanded behavior that passes the visible checks while missing the intent.

Treat spec-first agent workflows as a useful discipline, not a correctness proof. If the intent behind the system cannot be made explicit and testable, assume a human still needs to own the ambiguous parts.

Attribution:

franze #1
torben-friis #1
hedora #1
officialchicken #1

Implicit business context remains the real moat

One of the better framings split knowledge into what is explicit in code and docs versus what is implicit in business practice, partner relationships, and regulation. LLMs do well on the explicit layer. They are much worse at the implicit layer where the answer to "why does this work this way" lives in old decisions, exceptions, and constraints that are obvious only to people embedded in the business. Attempts to put non-technical domain experts directly in front of the code with an agent reportedly failed badly.

If you want to stay valuable, get closer to the hidden constraints that shape software rather than the visible code alone. The harder your knowledge is to recover from repo context and docs, the safer your role is for now.

Attribution:

throwaway201606 #1
torben-friis #1
mikeocool #1

AI often shifts effort from coding to review debt

Several engineers described the same pattern: AI makes initial implementation cheap, then hands humans a bigger, noisier review and maintenance problem. PRs arrive faster but contain extra layers, duplicate logic, architecture drift, or code that works while making the system harder to change later. That means velocity can look better on dashboards while actual system understanding gets worse.

Track review time, rework, and codebase complexity alongside output. If your team is producing more code but understanding less of it, you are borrowing from future delivery rather than accelerating it.

Attribution:

abhgh #1
cmiles74 #1 #2
Reason077 #1
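
A minimal sketch of what tracking review cost and rework alongside output might look like, assuming a team can export per-PR records. The `PullRequest` fields, and the 30-day rework window in particular, are hypothetical choices for illustration rather than an established metric.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class PullRequest:
    lines_added: int
    review_hours: float
    lines_reworked_30d: int  # hypothetical: lines changed again within 30 days of merge


def summarize(prs: list) -> dict:
    """Report output next to the review and rework cost it generated."""
    total_added = sum(pr.lines_added for pr in prs)
    total_reworked = sum(pr.lines_reworked_30d for pr in prs)
    return {
        "prs": len(prs),
        "lines_added": total_added,
        "median_review_hours": median([pr.review_hours for pr in prs]) if prs else 0.0,
        "rework_ratio": total_reworked / total_added if total_added else 0.0,
    }


# Illustrative numbers only.
summary = summarize([
    PullRequest(lines_added=100, review_hours=1.0, lines_reworked_30d=10),
    PullRequest(lines_added=300, review_hours=3.0, lines_reworked_30d=90),
])
print(summary["median_review_hours"])  # 2.0
print(summary["rework_ratio"])         # 0.25
```

Comparing these figures across periods is more informative than any single value: rising `lines_added` with rising `rework_ratio` and review hours is the "borrowing from future delivery" pattern described above.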

Expertise shows up in steering, not typing

A repeated point was that experienced engineers underestimate how much hidden skill they are using when they work well with agents. They know what to ask, what to challenge, where the model is likely to go wrong, and how to recognize an answer that is technically plausible but operationally stupid. That is why senior users report huge gains while non-engineers and weaker engineers often produce garbage with the same tools.

Do not frame your value as manual code production alone. Senior leverage now comes from problem framing, model steering, and ruthless evaluation of outputs, which means your hiring and training loops should test those skills explicitly.

Attribution:

PeterStuer #1
jrockway #1
iandanforth #1
enraged_camel #1

Against the grain

Hallucinations depend heavily on harness and context

A minority of heavy users said the constant-hallucination narrative no longer matches their day-to-day experience. With strong harnesses, constrained inputs, and domains where the source of truth can be provided directly, they claim frontier models can be highly reliable and sometimes outperform existing tools or compilers on narrow tasks. That does not refute the broader caution, but it does suggest many arguments are really about bad operating conditions rather than model limits alone.

Before writing off a model, separate failures of the model from failures of setup. If your use case has a clean spec, narrow domain, and auditable source of truth, investing in a better harness may matter more than switching tools.

Attribution:

rfgplk #1 #2
solenoid0937 #1

Some engineers prefer the higher abstraction layer

Not everyone sees AI-assisted development as the death of craft. Some said ordinary software jobs had already become assembly-line work long before LLMs, and that agents remove the least satisfying parts like boilerplate, syntax recall, and rote plumbing. For them, the shift feels like moving up the stack toward intent, design, and language rather than losing something sacred.

If your team is split on AI, do not assume opposition is purely about quality or support. Different engineers value different parts of the work, so role design and retention will increasingly hinge on whether people get to keep the parts they actually enjoy.

Attribution:

hax0ron3 #1
onlyrealcuzzo #1
awill88 #1

This may be another painful but familiar tooling shift

Some veteran commenters argued that software has always forced people to watch valuable skills age out through successive waves: COBOL, client-server, web, mobile, cloud, test dogmas, framework fads. The field has repeatedly rewarded people who adapt to new abstractions rather than attaching identity to a single layer of the stack. From that angle, AI is harsher and faster, but not fundamentally different in kind.

Do not overfit your response to the latest tool while ignoring the longer pattern. The safest personal strategy is still the old one: keep learning, stay flexible, and avoid defining yourself by one implementation layer.

Attribution:

ddingus #1
Verdex #1
SoftTalker #1

In plain English

FinTech

Financial technology, companies and software that provide banking, payments, investing, or other financial services.

LLM

Large language model, a machine learning system trained on large amounts of text that can generate and analyze language and code.

PR

Pull request, a proposed code change submitted for review before being merged into a shared codebase.

Reference links

Incidents and postmortems

Our first outage from LLM-written code
Used as a concrete example of production risk from AI-generated code
Anthropic April 23 postmortem
Cited as evidence that AI-heavy engineering can still create avoidable operational issues

Workflow and engineering practice

Agentic development philosophy
Shared as a detailed methodology for keeping humans in control of AI-assisted development
Make a list of tasks, write scripts for each task
Referenced to argue for deterministic tooling over freeform agent behavior
NASA rules for code that can't fail
Suggested as a harness or discipline layer for high-reliability code

Career and industry analysis

Software engineering is the new manufacturing engineering
Offered as a framing for how LLMs pull software work toward process and manufacturing logic
The joy of programming
Linked in a discussion about LLMs removing the enjoyable part of software work
AI risks taxonomy: economic risks
Used to support the argument that AI concentrates power with capital owners

Technology forecasts and history

Our World in Data technology trends
Used to support the claim that technological change follows repeated S-curves
The AI Revolution
Referenced for a popular explanation of accelerating technological progress
Computer occupation
Linked to show that once-common skilled jobs can disappear entirely after automation

Compliance and legal examples

AP story on Volkswagen engineer James Robert Liang
Cited in a debate about whether engineers can be personally liable for illegal technical work

Books and cultural references

Simple Sabotage Field Manual
Mentioned to compare unnecessary AI-driven rechecking and process churn to deliberate sabotage tactics
Extruded Book Product trope
Used as an analogy for mass-produced AI-generated software
10 awful truths about publishing
Linked for statistics on book oversupply as an analogy for future software oversupply

Music and creative AI examples

Suno generated track example 1
Shared to illustrate the uncanny but interesting outputs AI music tools can produce
Suno generated track example 2
Shared as a second example in the same AI music discussion
Psytrance reference track
Used as a human-made reference point to show what Suno fails to capture in a genre