HN Debrief

The only scalable delete in Postgres is DROP TABLE

  • Databases
  • Infrastructure
  • Open Source
  • Developer Tools

The post says PostgreSQL does not really have a cheap way to remove huge amounts of data with DELETE, because every deleted row still creates transactional and storage work that later has to be cleaned up by VACUUM. Its practical advice is to avoid row-by-row deletion when the shape of the workload allows it, especially for retention use cases, and instead delete whole chunks of data by dropping a table or partition. That is fast because the database mostly throws away metadata instead of touching every row.

If you own a high-volume Postgres system, treat bulk deletion as a data-modeling problem, not just a query problem. Use partitioning and retention boundaries where you can, and be wary of copy-swap-drop or DROP TABLE tricks in live OLTP paths because locking, foreign keys, and concurrency can turn a fast idea into an outage.

Discussion mood

Mostly positive on the core point that row-by-row bulk deletion is expensive, but skeptical of the title and framing. The mood was that the post teaches a real operational lesson, yet overstates it by presenting a retention-and-partitioning strategy as if it applied broadly to all PostgreSQL deletes.

Key insights

  1. 01

    Partition dropping is a retention pattern

    It works when data expires in chunks you can isolate ahead of time, usually by timestamp. That makes this a standard time-series retention strategy, not a general answer for applications that delete individual records by primary key or business relationship. The comments tied the advice to existing tooling like pg_partman and to systems like ClickHouse that make partition-based time-to-live natural.

    If your deletion policy follows time windows, encode that in the schema now and automate partition lifecycle. If deletions are object-level and irregular, do not expect partition dropping to rescue the design later.

      Attribution:
    • saisrirampur #1
    • CharlieDigital #1
    • pstuart #1
    • awinter-py #1
  2. 02

    Autovacuum tuning changes the picture

    Large DELETE is not automatically catastrophic in Postgres. With well-tuned autovacuum, people reported it working at terabyte scale, and one comment noted the hidden gotcha in high-churn tables: performance depends on dead tuples since the last VACUUM, not just live row count. That means the bottleneck is often cleanup policy and churn pattern, not the mere presence of DELETE.

    Before redesigning around table swaps, inspect autovacuum behavior, dead tuple growth, and churn hotspots. A tuning pass may buy back enough headroom for your current workload.

      Attribution:
    • saisrirampur #1 #2
    • baq #1
    • Retr0id #1
  3. 03

    The database cannot safely rewrite DELETE for you

    A commenter asked why the optimizer does not secretly copy survivors into a new table and drop the old one. The answer was that doing so would change observable behavior under concurrent writes, require stronger locking, and interact badly with foreign keys and dependent objects. You could bolt on logging and replay to recover consistency, but then you introduce long and surprising pauses, which is worse than making the tradeoff explicit.

    Do not assume the database is missing an obvious optimization. If you want copy-swap-drop behavior, treat it as an operational migration with explicit coordination around writers, dependencies, and latency spikes.

      Attribution:
    • crazygringo #1 #2
    • nostrademons #1
    • Retr0id #1
    • layer8 #1
    • mordae #1
  4. 04

    UPDATE often inherits delete pain

    One important extension was that this is not only about DELETE. In Postgres, many UPDATEs behave like delete-plus-insert under MVCC, especially for wide rows or variable-length text that cannot be updated in place. That is why one commenter saw a multi-terabyte correction run vastly slower than a much larger fresh load, and why storage-engine work like zHeap or OrioleDB keeps coming up in these discussions.

    When planning large data corrections, benchmark UPDATE against rebuild strategies instead of assuming in-place edits are cheaper. Wide rows and many indexes can make rewrite-based approaches win by a lot.

      Attribution:
    • jandrewrogers #1
    • crazygringo #1
    • gopalv #1
  5. 05

    Bulk deletes can poison a cluster for days

    A concrete MySQL story made the operational risk vivid. A retention cron deleted hundreds of millions of rows, the purge backlog exploded, replica lag spiked, and the cluster stayed unhealthy for days. The recovery pattern was familiar: move to partition-and-drop for append-only retention tables, and choose migration tactics by table size, from gh-ost for safer schema changes to slow nibble deletes when you cannot repartition immediately.

    Treat first-run retention jobs as migrations, not housekeeping. Estimate backlog size before enabling them, and have a rollout plan that matches table size and replication sensitivity.

      Attribution:
    • twotwotwo #1

Against the grain

  1. 01

    The post confuses scalable with bulk efficient

    The sharpest pushback was that DELETE does scale fine for the workload it was built for, which is lots of small transactional deletions. The problem is bulk deletion of millions of rows at once, and every relational database has separate patterns for bulk operations. Framing that as “the only scalable delete” makes ordinary DELETE sound broken when the real distinction is transactional versus bulk work.

    Explain this issue inside your team as a bulk-processing concern, not as a blanket indictment of DELETE. That framing leads to better choices about batching, retention design, and migration tooling.

      Attribution:
    • crazygringo #1
    • coldtea #1
  2. 02

    Sometimes the need to wipe data is self-inflicted

    Another skeptical view was that frequent whole-table clearing usually signals a design smell, not a normal operating mode. Some cleanup paths are small enough that plain DELETE is faster than TRUNCATE, especially across multiple related tables in tests, and regular VACUUM is just part of living with Postgres rather than evidence that something has gone wrong.

    Separate test cleanup, one-off remediation, and real production retention policies. They need different primitives, and using the heaviest one everywhere usually creates more complexity than it saves.

      Attribution:
    • cowthulhu #1
    • baq #1
    • foreigner #1
    • smithcheck4 #1

In plain english

autovacuum
PostgreSQL’s automatic background process for running VACUUM and related maintenance without manual intervention.
ClickHouse
An open source column-oriented database designed for fast analytics.
CRUD
Create, Read, Update, Delete, the four basic kinds of database operations in typical applications.
DELETE
A SQL command that removes selected rows from a table.
DROP TABLE
A SQL command that removes an entire table definition and its stored data from the database.
gh-ost
A MySQL online schema migration tool that reduces locking and replication impact during table changes.
MVCC
Multi-Version Concurrency Control, a database technique that keeps old and new row versions so readers and writers interfere less with each other.
OLTP
Online Transaction Processing, workloads made up of many small, concurrent database transactions such as user-facing app actions.
OrioleDB
An alternative PostgreSQL storage engine project that aims to change how data is stored and updated.
partition
A physically separate chunk of a table, often split by time or key range, that can be managed independently.
pg_partman
A PostgreSQL extension that helps create and manage partitioned tables over time.
Postgres
Postgres, short for PostgreSQL, is a widely used open source relational database.
PostgreSQL
A widely used open source relational database system, often shortened to Postgres or PG.
replica lag
The delay between a primary database applying changes and its replicas catching up.
time-to-live
A retention setting that automatically expires data after a set amount of time.
TRUNCATE
A SQL command that quickly removes all rows from a table, usually with different locking and transactional behavior than DELETE.
VACUUM
A PostgreSQL maintenance process that cleans up dead row versions left behind by updates and deletes and can reclaim space for reuse.
zHeap
A PostgreSQL storage-engine project that explores using undo logging to reduce update and delete overhead.

Reference links

Postgres partitioning and vacuum

Prior discussions and migration tooling

Data systems mentioned as alternatives

  • ClickHouse
    Mentioned as a system whose partition-based retention is convenient for time-series workloads.