HN Debrief

Ten years of ClickHouse in open source

  • Open Source
  • Databases
  • Infrastructure
  • Developer Tools

ClickHouse’s anniversary post lays out how the project evolved from an internal analytics engine into a commercial open source company, and argues that deep technical investment plus a durable business model is what keeps infrastructure projects alive. The reaction in the comments was more practical: many people have already made the switch, and they report the same pattern again and again. ClickHouse takes workloads that were awkward or expensive in PostgreSQL, Elasticsearch, Loki, TimescaleDB, Redshift, or hand-rolled pipelines and makes them fast and cheap enough that whole categories of operational pain disappear.

If you are still stretching Postgres, Elasticsearch, Loki, or a TSDB to handle large analytical or observability workloads, ClickHouse is now the default option to benchmark. Plan for it as a companion to Postgres rather than a replacement, and test your ingestion path carefully before trusting production metrics.

Discussion mood

Strongly positive. Most comments came from people running ClickHouse in production and describing dramatic gains in query speed, storage efficiency, retention, or operational simplicity. The main source of friction was not capability but boundaries: knowing it should complement Postgres rather than replace it, watching for ingestion and config footguns, and recognizing that some high-availability features appear reserved for the company’s cloud offering.

Key insights

  1. Retention and storage economics change fast

    Moving large event and log datasets into ClickHouse did more than speed up dashboards. It changed what teams could afford to keep. One operator said a multi-million-dollar storage layer dropped to costs comparable to S3, while another said moving more than 5 TB of events out of Postgres ended the cycle of constant storage upgrades and made long retention practical again. That is a different kind of win from “queries got faster.” It means product teams can expose more historical analytics without treating every extra month of data as a budget fight.

    When you benchmark ClickHouse, measure retention and storage policy outcomes alongside latency. The bigger win may be being able to keep more raw data and ship better analytics, not just shaving milliseconds off queries.

      Attribution:
    • oooyay #1
    • ezekg #1
    • rozenmd #1
  2. Observability works with simple schemas

    For logs and ad hoc observability queries, people are getting good results without elaborate upfront modeling. One setup used a basic MergeTree table with a few core fields plus a raw message column, then queried it through Grafana with the ClickHouse plugin. The notable part is not that structured data performs well. It is that even messy logs and on-the-fly JSON extraction were still fast enough to beat a tuned Loki deployment. That lowers the migration barrier for teams who assume observability on ClickHouse demands a heavy data engineering pass first.

    If Loki or a custom log stack is struggling, try a narrow pilot with a minimal schema and Grafana before designing a perfect model. You may get most of the benefit with only a few extracted columns and a raw fallback field.

  3. The winning architecture is Postgres plus ClickHouse

    The useful framing here is not database replacement but table evacuation. Keep transactional tables and strong consistency in Postgres, then move the large append-heavy analytics, metrics, and log tables into ClickHouse once they become expensive to store and painful to query. That gives up some clean cross-table joins and pushes more coordination into replication or application code, but commenters who made the move treated that as a tolerable price for getting back predictable performance and simpler operations.

    Look for the biggest append-heavy tables in Postgres first. Those are the best candidates to peel off into ClickHouse while leaving the core application database alone.

      Attribution:
    • saisrirampur #1
    • spprashant #1
    • gempir #1
    • eklavya #1
  4. The engineering culture is built around brutal testing

    The anniversary post’s claim that ClickHouse welcomes experimental pull requests resonated because a maintainer backed it up with specifics. They described multiple fuzzers across different layers, huge configuration coverage, and a complete continuous integration run measured in hundreds of hours for a single commit. That helps explain why so many production users sound unusually confident about a system that is both fast and ambitious. The product’s reputation rests on more than query speed. It also rests on a willingness to stress every dependency and edge case until bugs fall out, including in upstream libraries and the Linux kernel.

    If you evaluate ClickHouse for critical infrastructure, treat its testing discipline as part of the product. It helps justify taking a chance on a newer analytical stack versus bolting together several less focused components.

      Attribution:
    • benjamkovi #1
    • lazyasciiart #1
  5. JSON ingestion can quietly corrupt dashboards

    The sharpest operational warning was about JSONEachRow and permissive parsing. One commenter said numeric values can come back as strings, so a missed cast turns arithmetic into string concatenation, and that skipping unknown fields can silently drop misspelled columns instead of failing the write. Another mitigated this by enforcing schema types in the application and using materialized columns where needed. This kind of issue does not show up in a happy-path benchmark but does show up in executive dashboards.

    Add insert-and-read-back tests for ClickHouse ingestion in CI, especially if you use JSONEachRow or permissive parser settings. Validate types and field names before you trust any production metric derived from those pipelines.

      Attribution:
    • haeseong #1
    • charrondev #1
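
As a rough illustration of the insert-and-read-back check suggested for JSON ingestion, the sketch below validates field names and types before rows are serialized as JSONEachRow, then compares a read-back response against what was sent. The schema, column names, and helper functions are hypothetical and not from the thread, and the network calls to ClickHouse are left out so the logic runs with the standard library alone. The thread does not name specific settings; `input_format_skip_unknown_fields` and `output_format_json_quote_64bit_integers` are ClickHouse settings that appear to match the described behavior, which is an assumption worth checking against your version.

```python
import json

# Hypothetical schema for an events table: column name -> required Python type.
EXPECTED_SCHEMA = {"event_id": str, "user_id": int, "amount_cents": int}


def validate_row(row: dict) -> list:
    """Return a list of problems; an empty list means the row is safe to send."""
    problems = []
    # A misspelled column is reported instead of being silently dropped.
    for name in row:
        if name not in EXPECTED_SCHEMA:
            problems.append(f"unknown field: {name}")
    for name, expected in EXPECTED_SCHEMA.items():
        if name not in row:
            problems.append(f"missing field: {name}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(row[name], bool) or not isinstance(row[name], expected):
            problems.append(f"wrong type for {name}: {type(row[name]).__name__}")
    return problems


def to_json_each_row(rows: list) -> str:
    """Serialize rows as one JSON object per line, refusing any invalid row."""
    lines = []
    for row in rows:
        problems = validate_row(row)
        if problems:
            raise ValueError("; ".join(problems))
        lines.append(json.dumps(row, separators=(",", ":")))
    return "\n".join(lines)


def check_read_back(sent: list, response_body: str) -> list:
    """Compare rows read back in JSONEachRow form against the rows that were sent."""
    received = [json.loads(line) for line in response_body.splitlines() if line.strip()]
    mismatches = []
    if len(received) != len(sent):
        mismatches.append(f"row count: sent {len(sent)}, got {len(received)}")
    for i, (want, got) in enumerate(zip(sent, received)):
        for name, value in want.items():
            if name not in got:
                mismatches.append(f"row {i} field {name}: missing in read-back")
            # Compare types too, so a number returned as a string is caught.
            elif got[name] != value or type(got[name]) is not type(value):
                mismatches.append(f"row {i} field {name}: sent {value!r}, got {got[name]!r}")
    return mismatches
```

In a CI job, the string from `to_json_each_row` would be posted to a test table and the body of a `SELECT ... FORMAT JSONEachRow` passed to `check_read_back`; a non-empty result should fail the build.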

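The minimal log schema described under “Observability works with simple schemas” might look like the sketch below: a MergeTree table with a few core fields plus the raw line, an ad hoc query that extracts a JSON key at read time, and a small helper that shapes incoming lines into rows. The table name, column names, sort key, the `route` key, and the helper functions are all assumptions for illustration; the thread only describes the general shape.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical table: a few extracted columns plus the raw line as a fallback.
LOGS_DDL = """
CREATE TABLE logs
(
    timestamp DateTime64(3),
    service   LowCardinality(String),
    level     LowCardinality(String),
    message   String
)
ENGINE = MergeTree
ORDER BY (service, timestamp)
"""

# Ad hoc extraction from the raw column at query time, with no schema change.
ERRORS_BY_SERVICE_QUERY = """
SELECT service, count() AS hits
FROM logs
WHERE level = 'error'
  AND JSONExtractString(message, 'route') = '/checkout'
GROUP BY service
ORDER BY hits DESC
"""

LEVEL_PATTERN = re.compile(r"\b(debug|info|warn|error)\b", re.IGNORECASE)


def to_log_row(service: str, raw_line: str, received_at: datetime) -> dict:
    """Build one row: best-effort level extraction, raw line kept untouched.

    received_at is expected to be timezone-aware; it is stored as UTC.
    """
    match = LEVEL_PATTERN.search(raw_line)
    utc = received_at.astimezone(timezone.utc)
    return {
        # Millisecond precision to match DateTime64(3).
        "timestamp": utc.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3],
        "service": service,
        "level": match.group(1).lower() if match else "unknown",
        "message": raw_line,
    }


def to_json_each_row_line(row: dict) -> str:
    """Serialize a row as a single JSON object, one per line."""
    return json.dumps(row, separators=(",", ":"))
```

Lines with no recognizable level still land in the table with `level` set to `unknown`, so nothing is lost while the set of extracted columns is still being worked out.
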
Against the grain

  1. It is not a Postgres replacement

    The broad enthusiasm can make ClickHouse sound like the answer to every database scaling problem. It is not. Several comments pushed back on that framing and grounded the distinction in workload shape. ClickHouse is columnar OLAP infrastructure, while Postgres is a row-oriented OLTP system with stronger transactional semantics. Replacing one with the other wholesale usually means smuggling analytical assumptions into transactional code and then paying for it later.

    Do not start a migration with the goal of getting rid of Postgres. Start with a single analytical workload that is hurting today and keep transactional ownership where it is.

      Attribution:
    • spprashant #1
    • fsuts #1
    • saisrirampur #1
    • eklavya #1
  2. Some HA features look cloud-gated

    The praise for the open source project ran into a hard business complaint around zero-copy replication and object-storage-backed high availability. Commenters read the absence of those capabilities in the open source version as a deliberate open-core boundary that nudges serious production users toward ClickHouse Cloud. That does not invalidate the software’s strengths, but it does change the procurement conversation. Self-hosting may stop being attractive right where your availability requirements get serious.

    If your roadmap depends on object storage and high-availability replication, verify feature availability in the exact edition you plan to run. Do that before you standardize on self-hosting and discover the critical piece lives behind the managed offering.

      Attribution:
    • ddorian43 #1
    • orian #1
    • pepperoni_pizza #1

In plain English

Elasticsearch
An open source search and analytics engine often used for log search, filtering, and aggregations over large datasets.
Grafana
A dashboard and visualization tool commonly used for metrics, logs, and observability data.
JSONEachRow
A ClickHouse input and output format where each row is represented as a separate JSON object.
Loki
An open source log aggregation system often paired with Grafana for observability.
MergeTree
ClickHouse’s main table engine for large analytical datasets, optimized for sorting, partitioning, and fast reads.
OLAP
Online analytical processing, workloads focused on large scans, aggregations, and reporting across big datasets.
OLTP
Online transaction processing, workloads focused on many small reads and writes with strong consistency, such as core application databases.
Redshift
Amazon’s managed cloud data warehouse for analytical queries.
S3
Amazon Simple Storage Service, a cloud object storage service often used as a cheap baseline for storing large amounts of data.
TimescaleDB
A PostgreSQL-based extension and product for time-series data storage and querying.
zero-copy replication
A replication approach that avoids duplicating underlying data files, often by reusing shared object storage instead of copying data between nodes.

Reference links

Observability tools on ClickHouse

  • SigNoz
    Named as an observability interface built on ClickHouse.
  • HyperDX
    Mentioned as another observability product using ClickHouse.
  • Maple
    Mentioned as another observability option built around ClickHouse.