Prefer duplication over the wrong abstraction (2016)

Programming
Software Architecture
Developer Tools
AI

Sandi Metz’s post argues that the expensive mistake is not duplication itself but collapsing superficially similar code into one abstraction before you know whether the cases really belong together. Her warning targets the familiar failure mode where a shared helper starts simple, then accretes booleans, enums, and branching to satisfy callers that only looked alike at one moment. The practical advice commenters kept returning to was that code should be shared when it changes for the same reasons, not when it merely looks similar today. That produced a rough consensus around a narrower heuristic than the headline suggests: small, local duplication is often fine, especially early, but large-scale repeated logic with real shared meaning should still be consolidated.

The strongest additions were about locality and semantics. Several commenters said the main cost of a bad abstraction is that it destroys local reasoning. A small change in one feature suddenly requires understanding unrelated callers and preserving a broad API surface. Others sharpened the distinction between "single source of truth" and copy-paste avoidance. If divergence would be a bug, refactor. If two pieces of code are semantically different and are likely to evolve differently, keeping them separate is not a DRY violation at all. That is why the common smell mentioned over and over was the reusable function that grows feature flags and mutually exclusive parameters.

Where people pushed back, they usually did so on scale and org reality. In messy production systems, duplication rarely stays at two copies. It becomes dozens of drifted variants that no one can reliably find, test, or update together. Some argued that even an imperfect abstraction at least gives you a named entry point to grep for, while duplicated logic disappears into the codebase and compounds under deadline pressure. That led to a more grounded middle position than the slogan implies: use duplication as a discovery tool, then abstract once the common shape is empirically clear. Several people explicitly cited a rule of three, or said they leave comments and revisit later rather than forcing reuse on the second sighting.

LLMs came up as a new variable, but not a settled one. Some think they lower the cost of duplication because they can find repeated patterns and apply parallel edits. Others said the opposite is already happening in large codebases, with agents spraying near-duplicate code and then missing one variant when asked to update it. The more durable point was that AI does not rescue a bad abstraction and may amplify whichever maintenance habit a team already has.

The thread landed on an old lesson that still holds: prefer abstractions that are obvious from the domain and easy to explain without reference to quirky callers. If you cannot justify the shared code on its own terms, the reuse is probably premature.

Treat DRY as a test of shared meaning, not visual similarity. When reuse starts demanding flags, special cases, or awkward call sites, stop and either split the abstraction or leave the code separate until a cleaner seam appears.
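
As a minimal Python sketch of that smell, compare a helper that has grown mutually exclusive flags with a split version in which each caller owns its wording. The function names and the invoice/email scenario are hypothetical illustrations, not code from the post or the thread.

```python
# Before: one "shared" helper serving two callers through flags.
# for_invoice and for_email are mutually exclusive, but nothing enforces it.
def render_total(subtotal_cents, for_invoice=False, for_email=False):
    amount = f"{subtotal_cents / 100:.2f}"
    if for_invoice:
        return f"TOTAL DUE: {amount}"
    if for_email:
        return f"Your order total is {amount}."
    return amount


# After: only the step that means the same thing everywhere stays shared.
def format_amount(cents: int) -> str:
    return f"{cents / 100:.2f}"


# Each caller owns its own wording and can change it without touching the other.
def invoice_total_line(subtotal_cents: int) -> str:
    return f"TOTAL DUE: {format_amount(subtotal_cents)}"


def email_total_line(subtotal_cents: int) -> str:
    return f"Your order total is {format_amount(subtotal_cents)}."
```

In the flagged version, passing both flags silently picks the invoice branch, which is the kind of hidden coupling the split version avoids.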

June 21, 2026
sandimetz.com
Discuss on HN

Key insights

Locality beats theoretical reuse

The real damage from a premature abstraction is loss of locality. A change that should stay inside one feature now forces you to reason about distant callers and side effects, because the abstraction has tied unrelated behaviors together. That framing is stronger than generic anti-DRY rhetoric because it gives you a concrete test for maintainability: can someone make this change without loading half the system into their head?

Judge abstractions by how much unrelated context they force into routine edits. If a shared helper turns a local change into a cross-cutting one, unwind it before adding more callers.

Attribution:

jonahx #1 #2
stanmancan #1

Shared meaning matters more than shared shape

Several comments tightened the vague "don't duplicate" advice into a semantic rule. Code should be unified when divergence would be a bug, not when two blocks merely resemble each other. That separates single source of truth from cosmetic deduplication and explains why two identical-looking formulas can deserve different implementations if they represent different business concepts and will be owned or changed separately.

Before deduplicating, write down what each copy means in domain terms. If the names, owners, or reasons for change differ, keep them separate even when the code is nearly identical.
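
A small Python sketch may make the distinction concrete. The two functions below are hypothetical (the names and the 10% figure are assumptions for illustration): their bodies are identical today, yet they encode different business concepts with different owners, so merging them would couple changes that have nothing to do with each other.

```python
def sales_tax_cents(price_cents: int) -> int:
    # Meaning: a rate set by tax rules; changes only when those rules change.
    return price_cents * 10 // 100


def loyalty_discount_cents(price_cents: int) -> int:
    # Meaning: a rate set by a promotion; may change whenever the promotion does.
    return price_cents * 10 // 100
```

If the discount later moves to 15%, only `loyalty_discount_cents` changes, and no one has to check whether the tax calculation was relying on the same helper.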

Attribution:

lg5689 #1
alberto467 #1
infinitebit #1
nullbio #1

Use tests to police unavoidable duplication

When two representations really must coexist, a merge-blocking test can turn a risky second source of truth into a controlled one. The example given was keeping pyproject.toml and requirements.txt synchronized. That is a pragmatic escape hatch for cases where perfect consolidation is not available yet but silent drift is unacceptable.

If you cannot collapse duplicate definitions today, add an automated check that proves they still match. That buys safety without forcing a rushed redesign.

Attribution:

QuadmasterXLII #1

A good abstraction should justify itself

One useful test was whether an abstraction makes sense without first studying its callers. If a function or module only exists to juggle the odd, mutually exclusive needs of three call sites, it has no natural place in the architecture. That turns "wrong abstraction" from a vibe into an architectural smell you can inspect directly.

Review shared code in isolation. If you need caller-specific trivia to explain why it exists, split it or push the logic back outward.

Attribution:

ninkendo #1

Abstract for replacement, not generic reuse

A thin wrapper around a JSON parser was offered as a better kind of abstraction. It was not invented to compress code. It existed to isolate a likely future change, and years later the parser was swapped without affecting users of the wrapper. This is a cleaner decision rule than chasing reuse for its own sake.

Put abstraction effort where vendor swaps, infrastructure choices, or unstable dependencies are likely to change. Do not spend the same effort compressing business logic that is still evolving.
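
A rough Python sketch of that kind of seam follows. The wrapper in the thread was not shown, so the names here (`parse`, `dump`, `ParseError`) are assumptions; the point is only that callers depend on the wrapper's functions and error type, never on the underlying parser.

```python
import json
from typing import Any


class ParseError(ValueError):
    """Wrapper-owned error so callers never catch a parser-specific exception."""


def parse(text: str) -> Any:
    # The only place that knows which JSON library is in use.
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        raise ParseError(str(exc)) from exc


def dump(value: Any) -> str:
    return json.dumps(value)
```

Swapping in a different parser later would mean editing these two functions and the exception mapping, while call sites that use `parse`, `dump`, and `ParseError` stay untouched.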

Attribution:

corysama #1

LLMs act like junior maintainers

The analogy that landed was an LLM as a junior analyst keeping many duplicate spreadsheets in sync. That captures both the promise and the limit. Models can help spot inconsistencies and perform repetitive edits, but they do not eliminate the need for a sane structure, and they do not magically turn accidental similarity into a sound interface.

Use AI to reduce the clerical cost of repeated edits, not to justify sloppy structure. Keep the code organization understandable to humans first, then let tools assist with maintenance.

Attribution:

dang #1

Against the grain

Imperfect abstractions are still easier to find

One persistent dissent argued that in real legacy systems, drifted duplication is the bigger operational risk. A bad shared interface at least gives you a name, a surface area, and call sites you can grep. Copy-pasted logic fragments mutate, spread, and stop looking alike, which makes urgent fixes much harder when requirements or infrastructure change underneath you.

If the same pattern is already spreading across many files, the window for leisurely discovery is over. Create at least a minimal shared entry point before the copies become untraceable.

Attribution:

jbvlkt #1
dofm #1 #2 #3

Inlining a weak abstraction can be cheaper

A few commenters said low-ceremony abstractions are often reversible with tooling, while consolidating code that has already drifted is slow and error-prone. This challenges the blanket claim that duplication is safer. If the abstraction is just a thin function boundary, undoing it later may be far easier than rediscovering and merging ten variants of the same logic.

Prefer abstractions that are cheap to inline and easy to inspect. Reversible seams give you a safer way to centralize early without committing to a heavy framework.

Attribution:

joshmoody24 #1
bazoom42 #1

Teams rarely come back to dedupe later

A practical objection was that "we'll abstract it when the pattern settles" often never happens. Under deadline pressure, copied code ships, then becomes permanent. The result is not careful duplication as a discovery method but slow accumulation of slop that nobody gets budget to clean up.

If your team has a weak refactoring habit, do not rely on future cleanup as part of the design. Pair any deliberate duplication with an owner, a trigger point, or a scheduled revisit.

Attribution:

TexanFeller #1
bluefirebrand #1

In plain English

API

Application programming interface, the defined way one piece of software interacts with another.

DRY

Don't Repeat Yourself, a software design principle that says knowledge or logic should usually have one authoritative implementation.

JSON

JavaScript Object Notation, a widely used text format for structured data exchange.

LLM

Large language model, a machine learning system trained to generate and analyze text, including source code.

pyproject.toml

A Python project configuration file used to define build settings, metadata, and tool configuration.

requirements.txt

A Python text file listing package dependencies to install for a project.

single source of truth

A design principle where one canonical place defines a piece of information or logic so other copies cannot drift out of sync.

Reference links

Talks and essays on abstraction

Data-Oriented Design and C++
Recommended talk about designing around data rather than real-world object metaphors
The Complexity of Simplicity
Recommended talk about why finding the right abstraction is hard
Semantic Compression
Shared as a related framing for abstraction and compression in code
CMS Trap
Linked to support the argument for preferring build-time or static structure over runtime flexibility

Books and longer references

Data-Oriented Design
Book site referenced in a comment about designing data as normalized tables
How Software Groups Rot: Legacy of the Expert Beginner
Cited in discussion about overzealous deduplication and overengineering habits
99 Bottles of OOP
Recommended as a practical refactoring book from the article page discussion

Rules of thumb and prior references

Duplication Refactoring Threshold
Referenced as a rule-of-thumb resource for when to refactor duplicates
Three Strikes and You Refactor
Referenced to support the rule-of-three heuristic
XKCD 1425: Tasks
Used to illustrate how requirements changes can break neat abstractions

Entertainment references

The Library of Babel
Joking reply to the idea of a universal function tree for all software reuse
Partial application
Linked while clarifying a functional-programming interpretation of calling a function in parts
Taskmaster interview clip
Shared as a humorous analogy for constantly changing requirements
