The only scalable delete in Postgres is DROP TABLE

Databases
Infrastructure
Open Source
Developer Tools

The post says PostgreSQL does not really have a cheap way to remove huge amounts of data with DELETE, because every deleted row still creates transactional and storage work that later has to be cleaned up by VACUUM. Its practical advice is to avoid row-by-row deletion when the shape of the workload allows it, especially for retention use cases, and instead delete whole chunks of data by dropping a table or partition. That is fast because the database mostly throws away metadata instead of touching every row.

If you own a high-volume Postgres system, treat bulk deletion as a data-modeling problem, not just a query problem. Use partitioning and retention boundaries where you can, and be wary of copy-swap-drop or DROP TABLE tricks in live OLTP paths because locking, foreign keys, and concurrency can turn a fast idea into an outage.

June 14, 2026
planetscale.com
Discuss on HN

Discussion mood

Mostly positive on the core point that row-by-row bulk deletion is expensive, but skeptical of the title and framing. The mood was that the post teaches a real operational lesson, yet overstates it by presenting a retention-and-partitioning strategy as if it applied broadly to all PostgreSQL deletes.

Key insights

Partition dropping is a retention pattern

It works when data expires in chunks you can isolate ahead of time, usually by timestamp. That makes this a standard time-series retention strategy, not a general answer for applications that delete individual records by primary key or business relationship. The comments tied the advice to existing tooling like pg_partman and to systems like ClickHouse that make partition-based time-to-live natural.

If your deletion policy follows time windows, encode that in the schema now and automate partition lifecycle. If deletions are object-level and irregular, do not expect partition dropping to rescue the design later.

Attribution:

saisrirampur #1
CharlieDigital #1
pstuart #1
awinter-py #1

Autovacuum tuning changes the picture

Large DELETE is not automatically catastrophic in Postgres. With well-tuned autovacuum, people reported it working at terabyte scale, and one comment noted the hidden gotcha in high-churn tables: performance depends on dead tuples since the last VACUUM, not just live row count. That means the bottleneck is often cleanup policy and churn pattern, not the mere presence of DELETE.

Before redesigning around table swaps, inspect autovacuum behavior, dead tuple growth, and churn hotspots. A tuning pass may buy back enough headroom for your current workload.

Attribution:

saisrirampur #1 #2
baq #1
Retr0id #1

The database cannot safely rewrite DELETE for you

A commenter asked why the optimizer does not secretly copy survivors into a new table and drop the old one. The answer was that doing so would change observable behavior under concurrent writes, require stronger locking, and interact badly with foreign keys and dependent objects. You could bolt on logging and replay to recover consistency, but then you introduce long and surprising pauses, which is worse than making the tradeoff explicit.

Do not assume the database is missing an obvious optimization. If you want copy-swap-drop behavior, treat it as an operational migration with explicit coordination around writers, dependencies, and latency spikes.

Attribution:

crazygringo #1 #2
nostrademons #1
Retr0id #1
layer8 #1
mordae #1

UPDATE often inherits delete pain

One important extension was that this is not only about DELETE. In Postgres, many UPDATEs behave like delete-plus-insert under MVCC, especially for wide rows or variable-length text that cannot be updated in place. That is why one commenter saw a multi-terabyte correction run vastly slower than a much larger fresh load, and why storage-engine work like zHeap or OrioleDB keeps coming up in these discussions.

When planning large data corrections, benchmark UPDATE against rebuild strategies instead of assuming in-place edits are cheaper. Wide rows and many indexes can make rewrite-based approaches win by a lot.

Attribution:

jandrewrogers #1
crazygringo #1
gopalv #1

Bulk deletes can poison a cluster for days

A concrete MySQL story made the operational risk vivid. A retention cron deleted hundreds of millions of rows, the purge backlog exploded, replica lag spiked, and the cluster stayed unhealthy for days. The recovery pattern was familiar: move to partition-and-drop for append-only retention tables, and choose migration tactics by table size, from gh-ost for safer schema changes to slow nibble deletes when you cannot repartition immediately.

Treat first-run retention jobs as migrations, not housekeeping. Estimate backlog size before enabling them, and have a rollout plan that matches table size and replication sensitivity.

Attribution:

twotwotwo #1

Against the grain

The post confuses scalable with bulk efficient

The sharpest pushback was that DELETE does scale fine for the workload it was built for, which is lots of small transactional deletions. The problem is bulk deletion of millions of rows at once, and every relational database has separate patterns for bulk operations. Framing that as “the only scalable delete” makes ordinary DELETE sound broken when the real distinction is transactional versus bulk work.

Explain this issue inside your team as a bulk-processing concern, not as a blanket indictment of DELETE. That framing leads to better choices about batching, retention design, and migration tooling.

Attribution:

crazygringo #1
coldtea #1

Sometimes the need to wipe data is self-inflicted

Another skeptical view was that frequent whole-table clearing usually signals a design smell, not a normal operating mode. Some cleanup paths are small enough that plain DELETE is faster than TRUNCATE, especially across multiple related tables in tests, and regular VACUUM is just part of living with Postgres rather than evidence that something has gone wrong.

Separate test cleanup, one-off remediation, and real production retention policies. They need different primitives, and using the heaviest one everywhere usually creates more complexity than it saves.

Attribution:

cowthulhu #1
baq #1
foreigner #1
smithcheck4 #1

In plain english

autovacuum ↩

PostgreSQL's background process that reclaims storage from dead tuples and updates statistics used by the query planner.

ClickHouse ↩

An open source analytical database designed for very fast queries over large datasets.

CRUD ↩

Create, Read, Update, Delete, the basic operations of many ordinary business applications.

DELETE ↩

An HTTP method used to remove a resource and defined so that repeating it should leave the system in the same end state.

DROP TABLE ↩

A SQL command that removes an entire table definition and its stored data from the database.

gh-ost ↩

A MySQL online schema migration tool that reduces locking and replication impact during table changes.

MVCC ↩

Multi-Version Concurrency Control, the PostgreSQL mechanism that keeps old row versions around so readers and writers can work concurrently.

OLTP ↩

Online transaction processing, workloads made of many small reads and writes such as application transactions.

OrioleDB ↩

An alternative PostgreSQL storage engine project that aims to change how data is stored and updated.

partition ↩

A physically separate chunk of a table, often split by time or key range, that can be managed independently.

pg_partman ↩

A PostgreSQL extension that helps create and manage partitioned tables over time.

Postgres ↩

PostgreSQL, an open source relational database system.

PostgreSQL ↩

A widely used open source relational database system for storing and querying structured data.

replica lag ↩

The delay between a primary database applying changes and its replicas catching up.

time-to-live ↩

A retention setting that automatically expires data after a set amount of time.

TRUNCATE ↩

A database operation that quickly removes all rows from a table without deleting them one by one.

VACUUM ↩

A PostgreSQL maintenance process that cleans up dead row versions left behind by updates and deletes and can reclaim space for reuse.

zHeap ↩

A PostgreSQL storage-engine project that explores using undo logging to reduce update and delete overhead.

Reference links

Postgres partitioning and vacuum

pg_partman documentation
Referenced as tooling for managing partitioned tables and retention in PostgreSQL.
Debugging Postgres Autovacuum Problems: 13 Tips
Shared as a practical guide for tuning autovacuum instead of assuming large DELETE is hopeless.
PostgreSQL SET TRANSACTION documentation
Linked to correct a claim about PostgreSQL not supporting isolation-level changes.

Prior discussions and migration tooling

HN discussion of Uber switching to MySQL from Postgres
Cited as background on wide tables and index-update costs in PostgreSQL.

Data systems mentioned as alternatives

ClickHouse
Mentioned as a system whose partition-based retention is convenient for time-series workloads.