Ask HN: What was your "oh shit" moment with GenAI?

AI
Programming
Developer Tools
Security
Education

The post asked for the specific moment people went from dismissing generative AI to taking it seriously, and the answers converged on a clear pattern. For many, the trigger was not chatty demos or clever prose. It was watching Claude, Gemini, Codex, or ChatGPT solve ugly real-world tasks that had been blocked by time, missing documentation, or obscure tooling. A large share of the most compelling examples involved reverse engineering old or proprietary systems: synthesizers controlled by SysEx, bricked pianos, camper van firmware, printer status pages, Kodi on Chromecast, camera and lens updaters, old media archives, obsolete USB devices, travel scanners, and abandoned software. People repeatedly described the same feeling: the work was technically possible before, but not worth days or weeks of decompiling binaries, tracing protocols, or learning niche stacks. With an agent and tools like Ghidra, Wireshark, adb, or SSH, the cost collapsed to an evening or less.

The same pattern showed up in ordinary operations work. People described AI diagnosing furnace and dryer faults from photos and video, walking them through HVAC fixes, figuring out Linux driver and printer issues, tracing production bugs from logs, reading cloud logs and databases, or chewing through old tech debt and rewrites that had sat untouched for years. On the coding side, the most credible productivity stories were not “it replaced engineering.” They were “I wrote the spec, set constraints, reviewed aggressively, and it did the tedious middle.” Several experienced developers said the payoff arrives when you use models to plan, surface edge cases, write tests and boilerplate, and iterate against a real environment. A recurring theme was that the real unlock is not doing your normal expert work slightly faster. It is crossing into adjacent domains you would never have had time to learn well enough to even start.

The mood was mostly excited and a little stunned. People feel newly overpowered. At the same time, the thread was full of warnings from practitioners who have hit the failure modes. Novices are dazzled precisely where they are least able to spot nonsense. Generated code often looks complete while hiding structural mistakes, fake certainty, or tests rewritten to bless broken behavior. Several commenters said this raises review burden rather than removing it. Others were more worried about what happens outside code: students turning in model-written work, teams outsourcing thinking, support bots giving dangerous or wrong advice, private company data flowing into external models, and a broader culture shift toward accepting plausible text as truth. A few people also argued that the apparent magic is often strongest in domains the user barely knows, while experts still see jagged capability and brittle reasoning.

Where the discussion landed was blunt. Generative AI is already reshaping who can attempt technical work, especially when paired with tools and a real execution environment. Reverse engineering, migration, integration, troubleshooting, and one-off internal software are the biggest immediate beneficiaries. But the people getting the most value are still bringing judgment, domain context, and a willingness to verify every important step. The consensus practical frame was not “AI replaces expertise.” It was “AI makes previously uneconomic work worth doing,” which is a different and more disruptive claim.

If you still evaluate GenAI as “better autocomplete,” you are missing where people now get the most leverage: niche troubleshooting, reverse engineering, and spec-to-implementation work across unfamiliar domains. The practical move is to use it on expensive-to-learn side quests and hard-to-search operational problems, while tightening review and permissions anywhere real money, data, or physical systems are involved.

June 6, 2026
news.ycombinator.com

Discussion mood

Strongly positive and impressed, driven by stories of AI solving obscure practical problems and making formerly uneconomic technical work feasible. The caution underneath that enthusiasm is real, though, and centers on hallucinations, novice overconfidence, unsafe autonomy, review bottlenecks, and the loss of craft or understanding.

Key insights

Reverse engineering is eroding software moats

AI-assisted decompilation and protocol analysis are not just hacker parlor tricks. They weaken whole categories of lock-in that depended on integration pain, undocumented APIs, and customer inertia. Several examples showed SaaS clones, migration tooling, and unofficial MCP integrations being built from screenshots, network traffic, exported data, or internal APIs in hours instead of weeks.

If your product moat depends on opaque interfaces, painful export paths, or enterprise-only access to obvious integration surfaces, assume that moat is shrinking fast. Invest in product advantage and service quality, not reverse-engineering friction.

Attribution:

gyomu #1
aero142 #1
SyneRyder #1
StanAngeloff #1

Spec-driven workflows are where coding agents pay off

The highest-leverage coding stories were not people tossing vague prompts at a model and hoping for genius. They came from turning engineering work into specs, constraints, implementation plans, and iterative review against a live codebase. One commenter described spending an hour writing the spec, using Claude to surface edge cases and produce a nine-step plan, then moving through implementation and tests in about twelve hours instead of a week. Another said voice dictation makes this much more effective by letting experienced developers dump years of intuition into the prompt stream quickly.

Treat agents like fast implementers and reviewers, not autonomous product thinkers. Put more effort into specs, constraints, and verification and you will get a much larger return than by tuning prompts endlessly.

Attribution:

pmontra #1 #2
PeterStuer #1
jorl17 #1

Dead ends are evidence of real synthesis

One useful way to separate retrieval from genuine problem solving is to watch whether the model gets lost, updates its hypothesis, and recovers when you add new evidence. In the reverse-engineering examples, the agents did not cleanly recite a hidden blog post. They guessed the wrong processor family, chased blind alleys, corrected based on DFU identifiers, and combined firmware, packet captures, and executable analysis to isolate bugs or hidden diagnostics. That behavior looks much closer to compressed expertise than simple copy-paste plagiarism.

When evaluating model performance on niche technical work, inspect the path, not just the answer. Systems that can revise their plan from new evidence are much more valuable than ones that only sound polished.

Attribution:

mattmanser #1
tonyarkles #1

The main risk is novice misuse, not expert acceleration

Several engineers argued that the biggest gains and biggest dangers hit the same users. People with enough skill to recognize bad output can use models to remove drudge work, write tests faster, and explore more options. People without that skill see working UIs, passing tests, and plausible prose and assume the system did competent engineering. The result is more review load on senior staff and more latent defects making it into teams and classrooms.

Adopt AI with explicit guardrails around code review, test validity, and production permissions. The bottleneck is moving from writing code to validating it, so plan staffing and process around that shift.
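One way to make the "test validity" guardrail concrete is a review aid that flags changes where assertions were removed or rewritten in test files, since the thread warned about tests being rewritten to bless broken behavior. The following is a minimal sketch, not a tool from the thread: the function name `flag_removed_assertions`, the test-file naming convention, and the sample diff are all assumptions made for illustration. It is a crude heuristic that only routes a change to a human reviewer.

```python
import re

# Assumed convention: tests live under test/ or tests/, or are named
# test_*.py / *_test.py. Adjust for your own repository layout.
TEST_PATH = re.compile(r"(^|/)(tests?/|test_[^/]+\.py$|[^/]+_test\.py$)")


def flag_removed_assertions(diff_text: str) -> dict[str, int]:
    """Count removed lines containing 'assert' per test file in a unified diff.

    A non-zero count does not prove the test was weakened; it only marks
    the file as worth a closer human look.
    """
    counts: dict[str, int] = {}
    current = None  # test file whose hunks we are currently reading
    for line in diff_text.splitlines():
        if line.startswith("--- "):
            current = None  # header of the next file in the diff
        elif line.startswith("+++ "):
            path = line[4:].strip()
            path = path[2:] if path.startswith("b/") else path
            current = path if TEST_PATH.search(path) else None
        elif current and line.startswith("-") and "assert" in line:
            counts[current] = counts.get(current, 0) + 1
    return counts


# Hypothetical diff: the code change and the test change arrive together.
SAMPLE_DIFF = "\n".join([
    "--- a/src/pay.py",
    "+++ b/src/pay.py",
    "@@ -1,2 +1,2 @@",
    "-    return total",
    "+    return total + 1",
    "--- a/tests/test_pay.py",
    "+++ b/tests/test_pay.py",
    "@@ -1,3 +1,2 @@",
    "-    assert pay(2) == 2",
    "-    assert pay(3) == 3",
    "+    assert pay(2) == 3",
])

print(flag_removed_assertions(SAMPLE_DIFF))  # {'tests/test_pay.py': 2}
```

A check like this could run before review to sort agent-generated changes into "tests untouched" and "tests altered" piles, so senior reviewers spend their attention on the second pile first.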

Attribution:

jesse_dot_id #1
bawolff #1
OJFord #1
seventytwo #1
throwawaycan #1

The sharper threat is deskilling and bogus confidence

A recurring negative reaction was not that the models are weak. It was that they make people comfortable shipping work they do not understand. Examples ranged from dissertation writing and AI-generated docs to developers sending large AI-heavy pull requests they could not defend. The concern is less immediate replacement than long-term erosion of learning loops, ownership, and the habit of thinking through a problem yourself.

Use models to compress toil, not to bypass comprehension on work that compounds into expertise. If someone cannot explain and maintain what the model produced, the organization has gained speed and lost capability.

Attribution:

grumblepeet #1
mft_ #1
zkry #1
erelong #1

Privacy and compliance break faster than the demos

One teacher described using Claude to structure feedback and email students, then quickly ran into the obvious GDPR problem once others pointed out that identifiable student data was being sent to Anthropic without a proper data processing agreement. That exchange captured a broader issue across the thread. People are discovering powerful workflows faster than institutions are updating privacy, procurement, and compliance practices.

Before operationalizing any AI workflow that touches customer, employee, student, legal, or health data, check where the data is going and what contract covers it. Many “small productivity wins” are policy violations in disguise.
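As an illustration of checking what leaves your machine, the sketch below replaces email addresses and a known list of names with placeholder tokens before text is sent anywhere, and keeps the mapping locally so the response can be restored afterwards. Everything here is hypothetical: the function names, the token format, and the sample student are invented for the example. Pseudonymized data of this kind generally still counts as personal data under GDPR, so this reduces exposure but does not replace a data processing agreement.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(text: str, known_names: list[str]) -> tuple[str, dict[str, str]]:
    """Replace emails and known names with tokens; return text and mapping."""
    mapping: dict[str, str] = {}  # original value -> placeholder token

    def token_for(value: str, kind: str) -> str:
        if value not in mapping:
            mapping[value] = f"[{kind}_{len(mapping) + 1}]"
        return mapping[value]

    text = EMAIL.sub(lambda m: token_for(m.group(0), "EMAIL"), text)
    # Longest names first so "Anna Berg" is replaced before a bare "Anna".
    for name in sorted(known_names, key=len, reverse=True):
        pattern = rf"\b{re.escape(name)}\b"
        text = re.sub(pattern, lambda m, n=name: token_for(n, "STUDENT"), text)
    return text, mapping


def restore(text: str, mapping: dict[str, str]) -> str:
    """Put the original values back into text returned by the model."""
    for original, token in mapping.items():
        text = text.replace(token, original)
    return text


# Hypothetical feedback note with an invented student.
note = "Anna Berg (anna.berg@example.edu) should cite sources. Anna Berg improved."
safe, mapping = pseudonymize(note, ["Anna Berg"])
print(safe)
# [STUDENT_2] ([EMAIL_1]) should cite sources. [STUDENT_2] improved.
```

Only names in the supplied list are caught, so free-text details that identify a person indirectly would still pass through untouched.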

Attribution:

plagasul #1 #2 #3
47282847 #1

Against the grain

The wow phase may give way to visible failures

One commenter argued that today's amazement is partly a function of novelty and user inexperience, and predicted a second wave of disillusionment as people encounter more cases where LLMs fail badly in ordinary use. That pushes against the dominant mood of steady capability growth and suggests the social impact may flatten as users calibrate.

Do not build plans around current user awe persisting indefinitely. Expect a normalization phase where buyers and teams demand measurable reliability rather than demos.

Attribution:

otabdeveloper4 #1

The deeper story is models failing weirdly

A contributor involved with early OpenAI work pointed to the opposite side of the “oh shit” experience. The striking thing was not just that models could answer strange prompts. It was later realizing where their apparent generality breaks, and how easy it is to overread competence from fluent output. That frames current enthusiasm as partly a measurement problem rather than pure capability growth.

Benchmark models on your actual edge cases and failure costs, not on impressive broad demos. The important question is often where they stop being trustworthy, not where they start being useful.
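To show why failure costs matter as much as pass rates, here is a small sketch that scores a set of evaluation cases two ways: the plain pass rate, and the share of total failure cost sitting in the cases that failed. The `Case` type, the case names, and the cost figures are invented for illustration and do not come from the thread.

```python
from dataclasses import dataclass


@dataclass
class Case:
    name: str
    passed: bool
    failure_cost: float  # hypothetical relative cost if this case goes wrong


def summarize(cases: list[Case]) -> dict[str, float]:
    """Return the plain pass rate and the cost-weighted share of failures."""
    total_cost = sum(c.failure_cost for c in cases)
    failed_cost = sum(c.failure_cost for c in cases if not c.passed)
    return {
        "pass_rate": sum(c.passed for c in cases) / len(cases),
        "cost_weighted_risk": failed_cost / total_cost,
    }


# Three cheap cases pass; the one expensive edge case fails.
cases = [
    Case("format a log line", True, 1.0),
    Case("rename a variable", True, 1.0),
    Case("write a docstring", True, 1.0),
    Case("migrate billing records", False, 17.0),
]
print(summarize(cases))  # {'pass_rate': 0.75, 'cost_weighted_risk': 0.85}
```

In this made-up set a 75 percent pass rate looks respectable, yet 85 percent of the cost at stake sits in the single failing case, which is the kind of gap a broad demo tends to hide.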

Attribution:

CompleteSkeptic #1
varshar #1

Experts still hit hard ceilings in specialized domains

A systems programmer working on SIMD and GPGPU (general-purpose GPU computing) code said LLMs remain weak in the places where top-end technical performance matters most. They are useful for review and for adjacent technologies, but not yet reliable for writing high-quality low-level code in those specialties. That cuts against the stronger claims that expert use is now universally transformative.

If your work depends on niche optimization, specialized hardware, or unusually high correctness demands, test current models carefully before redesigning workflows around them. Gains in common software tasks do not automatically transfer to expert edge domains.

Attribution:

vishvananda #1

The bigger risk is institutional credulity

A few commenters were less worried about model capability than about organizations treating plausible output as trustworthy by default. Examples included insecure proxy code dressed up with RFC-sounding rationale, executives treating AI summaries as factual analysis, and support or enterprise workflows quietly replacing judgment with generated text. In this view, the dangerous part is not superintelligence. It is managerial and procedural gullibility.

Audit where generated outputs are being accepted without adversarial review. Many AI failures are governance failures first and model failures second.

Attribution:

void-star #1
ChiperSoft #1
patdoli #1

In plain English

ADB

Android Debug Bridge, a command-line tool used to communicate with Android devices for debugging, installing apps, and other developer tasks.

CAN

Controller Area Network, a communication system that lets electronic modules inside a vehicle exchange data.

Claude

A large language model product from Anthropic used here as a coding assistant example.

Codex

A code-focused language model and tool interface associated with OpenAI for generating and editing code.

DFU

Device Firmware Update, a mode some hardware enters to allow firmware flashing or recovery.

GDPR

General Data Protection Regulation, a European Union privacy law that governs how personal data can be processed.

Gemini

Google’s family of AI models and assistant products.

Ghidra

A software reverse-engineering tool from the United States National Security Agency used to disassemble and decompile binaries.

MCP

Model Context Protocol, a way for AI tools to connect to external tools, data sources, or services.

SaaS

Software as a Service, software delivered over the internet by subscription.

SIMD

Single instruction, multiple data, a processor feature that applies the same operation to many values at once.

SysEx

System Exclusive, a MIDI message format used to send device-specific data to electronic instruments and other music hardware.
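
For readers who want to see what a SysEx message looks like on the wire, the sketch below frames one: a start byte `0xF0`, a manufacturer ID, 7-bit data bytes, and an end byte `0xF7`. The function name `build_sysex` and the payload are made up for illustration. The ID `0x7D` is the one set aside for non-commercial use, and real devices define their own payload layouts that this example does not attempt to model.

```python
SYSEX_START = 0xF0
SYSEX_END = 0xF7


def build_sysex(manufacturer_id: bytes, payload: bytes) -> bytes:
    """Frame a SysEx message; every byte between the markers must be 7-bit."""
    body = manufacturer_id + payload
    if any(b > 0x7F for b in body):
        raise ValueError("SysEx data bytes must be in the range 0x00-0x7F")
    return bytes([SYSEX_START]) + body + bytes([SYSEX_END])


# Hypothetical two-byte payload under the non-commercial manufacturer ID.
message = build_sysex(b"\x7d", b"\x01\x02")
print(message.hex(" "))  # f0 7d 01 02 f7
```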

Reference links

Reverse engineering and firmware projects

Patching my guitar amp's firmware
Example of firmware patching work related to the thread’s reverse-engineering stories
Schwung
Project mentioned as part of the growing ecosystem of LLM-assisted firmware hacking
AMIQ license key generation writeup
Detailed external example of using AI to reverse engineer a firmware image and generate license keys

Agent workflows and coding methods

superpowers
Prompt and workflow toolkit cited for methodical multi-step coding with agents
Spec-Driven Development with Coding Agents
Course recommended as a walkthrough of collaborative agentic programming
Jon Gjengset live streams
Suggested as a place to watch someone use coding agents live

AI futures and governance

AI 2027
Scenario piece cited in discussion about automated research and recursive improvement
Anthropic Institute on recursive self-improvement
Referenced as evidence for a steep capability trajectory
Superintelligence
Book recommendation tied to discussion of recursive self-improvement and long-term AI risk

Education and literacy

Handwriting and learning outcomes paper
Cited in a subthread about handwriting versus keyboard use in education
Code by Charles Petzold
Recommended as a way to keep understanding how computers actually work
The Feeling of Power
Science fiction reference used in discussion about losing core skills to automation

Law and courts

AI documents not protected by privilege ruling
Warning that sharing legal strategy with consumer AI may expose it to discovery
Reuters on Americans suing with AI help
Used to show that AI-assisted self-representation is already becoming common
MIT Technology Review on courts coping with AI lawsuits
Referenced for the review-burden effect of AI-generated legal filings

Media, culture, and early demos

OpenAI DALL-E announcement
Referenced as an early memorable moment for image generation
AI Dungeon
Named as an early experience that made conversational generation feel qualitatively different
Microsoft Research TextWorld
Related research project mentioned in the AI Dungeon subthread
Dr. Know scene from A.I.
Film reference used to compare today’s AI assistants to earlier science fiction

Search, filtering, and information overload

xkcd 2501
Referenced in discussion about what counts as common knowledge
Clay Shirky on filter failure
Used to frame the coming flood of AI-generated media as a filtering problem

Open source and local tools

Handy
Open source dictation tool recommended for voice-heavy coding workflows
Handy GitHub repository
Source repository for the same dictation tool
Pangram history page
Mentioned in the side discussion about detecting AI-written posts