HN Debrief

PostgresBench: A Reproducible Benchmark for Postgres Services

  • Databases
  • Infrastructure
  • Open Source
  • Cloud

ClickHouse introduced PostgresBench as a reproducible way to compare Postgres services under the same transactional workload, with the stated goal of making cloud Postgres performance easier to evaluate. The project is open source and the author confirmed you can run it against your own deployment in about 15 minutes, which made it immediately interesting to people trying to measure CNPG on Kubernetes or compare providers that look similar on paper.

Treat the current results as a starting point for screening vendors, not a buying decision. If you run Postgres seriously, extend any benchmark to include your HA mode, your tuning, and throughput-over-time before trusting the rankings.

Discussion mood

Mostly positive about having a public, reproducible benchmark at all, but skeptical of the current results as a realistic proxy for production. The criticism centered on short run times, missing HA overhead, no pricing, limited deployment coverage, and reliance on default configs.

Key insights

  1. Checkpoint effects are missing from the graphs

    Longer benchmark windows would expose checkpoint-driven stalls that a 10-minute run can dodge or smooth away. That changes the meaning of the published numbers: average TPS can look fine while latency degrades in bursts, and those bursts are exactly what operators need to see when they set SLAs.

    Ask for TPS and latency over time, not just aggregate averages. If you benchmark your own stack, run long enough to cross multiple checkpoint cycles and inspect the shape of the dips.

      Attribution:
    • ahachete #1 #2
  2. HA overhead can change the ranking

    Disabling high availability makes the comparison cleaner, but it also removes a cost many teams actually pay in production. One commenter said semi-synchronous replication on local NVMe cut performance by roughly 24 to 27 percent in their tests, which is large enough to reshuffle vendor standings once replicas are in the loop.

    Re-run any serious evaluation with the exact durability and failover mode you plan to ship. A fast single-node result is not enough if your real deployment waits on secondaries.

      Attribution:
    • ahachete #1 #2
    • saisrirampur #1
  3. Default configs measure vendor posture too

    Using out-of-the-box settings is not neutral. It captures how aggressively each provider tunes shared buffers, WAL behavior, and other defaults, which is a product decision as much as a database capability. That is valuable if you want a zero-touch service, but it is a poor proxy for the ceiling you can reach with light tuning.

    Separate two questions in your own evaluation. First test the default experience, then test a tuned configuration so you can see whether you are buying convenience or leaving real headroom unused.

      Attribution:
    • ahachete #1
    • saisrirampur #1
  4. Customers also need non-managed baselines

    Including vanilla Postgres on a VPS, bare metal, or a self-managed cloud setup would make the benchmark more decision-useful, even if the environments are not perfectly comparable. Buyers do not care whether that is academically pure. They care whether managed services are earning their premium against realistic alternatives they could actually operate.

    If you are comparing providers, include at least one self-managed baseline in the same exercise. It gives finance and engineering a concrete reference for how much convenience is costing you.

      Attribution:
    • cuu508 #1
    • nwhnwh #1
    • saisrirampur #1
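
The first two insights both reduce to simple arithmetic on benchmark output. The sketch below is only an illustration under stated assumptions: the sample values, service numbers, the 50 percent dip threshold, and the helper names `find_dips`, `ha_adjusted_range`, and `gap_within_penalty` are all hypothetical and are not part of PostgresBench. The 24 to 27 percent penalty is the range one commenter reported for their own setup, so it may not transfer to yours.

```python
from statistics import median


def find_dips(tps_series, threshold=0.5):
    """Return (index, tps) for intervals whose TPS falls below threshold * median."""
    baseline = median(tps_series)
    return [(i, tps) for i, tps in enumerate(tps_series) if tps < threshold * baseline]


def ha_adjusted_range(single_node_tps, penalties=(0.24, 0.27)):
    """Return (low, high) TPS after applying the largest and smallest penalty."""
    low = single_node_tps * (1 - max(penalties))
    high = single_node_tps * (1 - min(penalties))
    return low, high


def gap_within_penalty(leader_tps, runner_up_tps, penalties=(0.24, 0.27)):
    """True if the leader's relative lead is smaller than the smallest penalty."""
    lead = (leader_tps - runner_up_tps) / leader_tps
    return lead < min(penalties)


# Hypothetical per-interval TPS samples (one value per reporting interval).
# These are made-up numbers, not results from any provider.
samples = [1000, 1010, 990, 400, 1005, 995, 380, 1000]
mean_tps = sum(samples) / len(samples)
dips = find_dips(samples)

# Hypothetical single-node results for two made-up services, HA disabled.
service_a = 1000
service_b = 800
a_low, a_high = ha_adjusted_range(service_a)

print(f"mean TPS: {mean_tps:.1f}")
print(f"intervals flagged as dips: {dips}")
print(f"service A under the reported HA penalty: {a_low:.0f} to {a_high:.0f} TPS")
print("lead smaller than HA penalty:", gap_within_penalty(service_a, service_b))
```

The aggregate mean hides the two flagged intervals, which is why a time series is worth asking for. The last check shows that when a lead is smaller than the HA penalty itself, the single-node ranking says little about the order you would see with replicas in the loop.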

Against the grain

  1. The project has not earned community trust yet

    Sparse stars and few contributors were taken as a signal that the benchmark has not yet become a broadly validated community standard. That does not make the tool bad, but it does mean the methodology still looks vendor-driven rather than field-tested by many operators with different agendas.

    Use the harness, but do not outsource judgment to it yet. Watch whether outside operators add coverage, challenge assumptions, and keep the methodology honest over time.

      Attribution:
    • karlmush #1

In plain English

CNPG: CloudNativePG, an open-source operator for running PostgreSQL on Kubernetes.
HA: High availability, a system design that keeps a service running during failures by using redundancy and failover.
NVMe: Non-Volatile Memory Express, a fast storage interface commonly used by modern solid-state drives.
TPS: Transactions per second, a measure of how many database transactions a system completes each second.
VPS: Virtual private server, a rented virtual machine used to host services on the internet.
WAL: Write-ahead log, the Postgres record of changes written durably before data pages are updated.

Reference links

Benchmark project