How TimescaleDB compresses time-series data

The post explains TimescaleDB’s newer compressed storage path for time-series workloads. Data is reorganized from row-oriented storage into a more columnar layout, then each column is encoded with a scheme matched to its data type, including Gorilla-style compression for timestamps and floating-point values. The headline promise is a storage reduction of up to 98% inside PostgreSQL, aimed at telemetry and IoT datasets where values repeat, move slowly, or can be represented as small deltas.
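To make the "small deltas" idea concrete, here is a minimal Python sketch of delta-of-delta encoding for timestamps, the approach described in the Gorilla paper. It is an illustration only, not TimescaleDB's implementation: the function names are hypothetical, and a real codec would pack the resulting small integers into a variable-width bit stream rather than a Python list.

```python
from typing import List


def delta_of_delta_encode(timestamps: List[int]) -> List[int]:
    """Encode integer timestamps as [first, delta-of-delta, delta-of-delta, ...].

    For regularly sampled data the deltas are constant, so most
    delta-of-delta values are 0 and need very few bits to store.
    """
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev_ts = timestamps[0]
    prev_delta = 0
    for ts in timestamps[1:]:
        delta = ts - prev_ts
        encoded.append(delta - prev_delta)
        prev_ts, prev_delta = ts, delta
    return encoded


def delta_of_delta_decode(encoded: List[int]) -> List[int]:
    """Invert delta_of_delta_encode exactly (the scheme is lossless)."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


if __name__ == "__main__":
    samples = [1000, 1010, 1020, 1030, 1041]  # one late sample at the end
    packed = delta_of_delta_encode(samples)
    print(packed)  # [1000, 10, 0, 0, 1]
    assert delta_of_delta_decode(packed) == samples
```

A sensor reporting every 10 seconds produces a long run of zeros after the first two entries, which is why slowly changing telemetry compresses so well.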

If you use PostgreSQL for telemetry or IoT, treat compression as a query-engine design choice, not a storage checkbox. Also verify your TimescaleDB package actually includes compression features before you plan around them, especially on distro builds.

June 15, 2026
roszigit.com

Key insights

Compression only wins if queries get cheaper

The key test is not the compression ratio. It is whether the encoding lets the database skip reads, turn expensive string work into integer filters, or run deterministic functions like UPPER() once on a dictionary instead of once per row. That framing shifts the story from storage savings to execution plan quality, which is where these systems actually earn their keep.
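The dictionary point can be sketched in a few lines of Python. This is a simplified illustration under assumed names (`dictionary_encode` and `filter_upper_equals` are hypothetical), not how any particular engine implements it: the string function runs once per distinct value, and the per-row work becomes an integer membership test.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Replace each string with an integer code into a list of distinct values."""
    dictionary: List[str] = []
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def filter_upper_equals(dictionary: List[str], codes: List[int], target: str) -> List[int]:
    """Return row indexes where UPPER(value) == target.

    upper() is called once per dictionary entry, not once per row;
    rows are then matched by comparing integer codes.
    """
    matching_codes = {
        code for code, value in enumerate(dictionary) if value.upper() == target
    }
    return [row for row, code in enumerate(codes) if code in matching_codes]


if __name__ == "__main__":
    regions = ["eu-west", "us-east", "eu-west", "EU-West", "us-east"]
    dictionary, codes = dictionary_encode(regions)
    print(dictionary)  # ['eu-west', 'us-east', 'EU-West']
    print(codes)       # [0, 1, 0, 2, 1]
    print(filter_upper_equals(dictionary, codes, "EU-WEST"))  # [0, 2, 3]
```

With millions of rows and a few dozen distinct values, the saving in string work is roughly the ratio of rows to distinct values.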

When evaluating compressed storage, benchmark filtered and aggregated queries, not just disk usage. Ask specifically whether the engine can prune data, operate on dictionary codes, and avoid row-by-row decompression.

Attribution:

gopalv #1

Metadata and layout do most of the work

Per-segment stats like min, max, distinct counts, and Bloom filters can answer or prune many analytic queries before any payload is decompressed. The comment also points out that disk layout, top-N early stopping, filter pushdown, and parallel execution are first-order design choices. Compression is only one layer in a much larger optimization stack.
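As a rough sketch of how per-segment min and max statistics prune work, the Python below counts rows above a threshold while touching as few payloads as possible. The `Segment` class and function names are hypothetical, and the `payload` list merely stands in for compressed column data; real engines keep richer metadata than this.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    min_value: float
    max_value: float
    row_count: int
    payload: List[float]  # stands in for the compressed column data


def build_segment(values: List[float]) -> Segment:
    """Compute the metadata once, at write time."""
    return Segment(min(values), max(values), len(values), list(values))


def count_greater_than(segments: List[Segment], threshold: float) -> Tuple[int, int]:
    """Return (matching_rows, segments_decompressed) for value > threshold."""
    matched = 0
    decompressed = 0
    for seg in segments:
        if seg.max_value <= threshold:
            continue  # pruned: no row in this segment can match
        if seg.min_value > threshold:
            matched += seg.row_count  # answered from metadata alone
            continue
        decompressed += 1  # only now is the payload actually read
        matched += sum(1 for v in seg.payload if v > threshold)
    return matched, decompressed


if __name__ == "__main__":
    segments = [
        build_segment([1.0, 2.0, 3.0]),
        build_segment([4.0, 5.0, 6.0]),
        build_segment([10.0, 11.0, 12.0]),
    ]
    print(count_greater_than(segments, 4.5))  # (5, 1)
```

Only one of the three segments is read in this example, which is the effect the comment attributes to metadata and layout rather than to the codec.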

If you build or buy a time-series store, inspect segment metadata, pruning behavior, and execution strategy. A flashy codec will not rescue a poor on-disk layout or weak pushdown path.

Attribution:

tudorg #1

Modern compression weakens the case for lossy historians

For IoT and industrial telemetry, commenters argued that older historian-era compromises, such as lossy swinging-door compression, came from expensive storage and weak compression for floating-point streams. With Gorilla-style lossless encoding and cheap storage, keeping every sample is often affordable. That changes the default architecture from specialized lossy historians toward general databases and open table formats like Parquet-backed Delta tables.
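The lossless claim for floating-point streams rests on XOR encoding, as described in the Gorilla paper. The Python sketch below shows only the XOR step, under assumed function names; a real codec goes on to bit-pack the leading and trailing zeros of each XOR result, which is omitted here.

```python
import struct
from typing import List


def float_to_bits(value: float) -> int:
    """Reinterpret a float as its 64-bit IEEE 754 pattern."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit pattern as a float."""
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def xor_encode(values: List[float]) -> List[int]:
    """XOR each value's bit pattern with the previous one.

    The first entry is the raw bit pattern (XOR with 0). Repeated
    values become 0 and nearby values share most of their bits,
    so the results are cheap to bit-pack.
    """
    encoded = []
    prev_bits = 0
    for value in values:
        bits = float_to_bits(value)
        encoded.append(bits ^ prev_bits)
        prev_bits = bits
    return encoded


def xor_decode(encoded: List[int]) -> List[float]:
    """Recover the original floats bit for bit."""
    values = []
    prev_bits = 0
    for xored in encoded:
        prev_bits ^= xored
        values.append(bits_to_float(prev_bits))
    return values


if __name__ == "__main__":
    readings = [21.5, 21.5, 21.5, 21.75]
    packed = xor_encode(readings)
    print(packed[1], packed[2])  # 0 0
    assert xor_decode(packed) == readings  # nothing was dropped
```

Unlike a lossy historian rule, decoding returns every original sample exactly, so downstream analysis and audits see the raw signal.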

If you still rely on lossy ingest rules mainly to save space, rerun the math with current codecs and storage costs. You may be able to keep raw signals and simplify downstream analysis and auditing.

Attribution:

heliosAtwork #1
lkanwoqwp #1
niltecedu #1

JSONB is still a bad fit

TimescaleDB has a long-standing issue around JSONB compression, and a commenter with production use says medium-sized JSON blobs remain a pain point. Another comment points to Iceberg Variant encoding proposals from Databricks and Snowflake as a more promising direction. The important idea is to break semi-structured payloads into typed, columnar chunks so the engine can prune and filter them like ordinary columns.
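The idea of breaking semi-structured payloads into typed columns can be illustrated with a small Python sketch. It is a deliberate simplification with hypothetical names and sample fields, not the Variant encoding itself: known hot fields are pulled out of each JSON document into their own columns, and whatever is left stays in a remainder.

```python
import json
from typing import Any, Dict, List, Tuple


def shred(
    rows: List[str], hot_fields: List[str]
) -> Tuple[Dict[str, List[Any]], List[Dict[str, Any]]]:
    """Split JSON documents into per-field columns plus a remainder.

    Each hot field becomes a column with one entry per row (None when
    the field is absent), so it can be filtered and compressed like an
    ordinary typed column.
    """
    columns: Dict[str, List[Any]] = {field: [] for field in hot_fields}
    remainder: List[Dict[str, Any]] = []
    for raw in rows:
        doc = json.loads(raw)
        for field in hot_fields:
            columns[field].append(doc.pop(field, None))
        remainder.append(doc)
    return columns, remainder


if __name__ == "__main__":
    rows = [
        '{"temp": 21.5, "unit": "C", "fw": "1.2"}',
        '{"temp": 21.7, "unit": "C"}',
    ]
    columns, remainder = shred(rows, ["temp", "unit"])
    print(columns)    # {'temp': [21.5, 21.7], 'unit': ['C', 'C']}
    print(remainder)  # [{'fw': '1.2'}, {}]
```

Once `temp` is a column of floats and `unit` a column of repeated strings, the float and dictionary techniques used for ordinary columns apply to them.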

If your telemetry schema hides most value inside JSONB, do not assume time-series compression will save you. Model hot fields explicitly or watch emerging typed semi-structured formats before committing to large JSON-heavy tables.

Attribution:

PaulWaldman #1
kevinob11 #1
gopalv #1

Packaging and licensing can block the feature

Compression is not just a technical capability. It may be absent from the TimescaleDB package your Linux distribution ships because those builds only include Apache-licensed parts. That creates an easy failure mode where teams think they are evaluating TimescaleDB, but they are really evaluating a stripped-down build.

Check the exact package and license terms in your environment before designing around compression. Validate feature availability in staging with the same install path you plan to use in production.

Attribution:

self_awareness #1

Against the grain

The headline ratio reads like marketing

The criticism is that 'up to 98%' is the kind of claim that hides workload dependence and says little about what a real deployment should expect. The reply says the number came from an actual MQTT-backed database, but that still leaves the central problem: without query benchmarks and data-shape context, the ratio is hard to interpret.
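One reason a percentage headline is hard to interpret is that it flattens large differences in the underlying ratio. The short Python sketch below, with hypothetical helper names and made-up sizes, converts between the two views.

```python
def reduction_percent(bytes_before: int, bytes_after: int) -> float:
    """Storage reduction as a percentage of the original size."""
    return (bytes_before - bytes_after) * 100 / bytes_before


def compression_ratio(reduction_pct: float) -> float:
    """Convert a percentage reduction into an N:1 compression ratio."""
    return 100 / (100 - reduction_pct)


if __name__ == "__main__":
    print(reduction_percent(1000, 20))  # 98.0
    print(compression_ratio(98))        # 50.0
    print(compression_ratio(95))        # 20.0
    print(compression_ratio(90))        # 10.0
```

A 98% reduction is a 50:1 ratio while 95% is 20:1, so a gap of a few percentage points between the headline and your own data can mean several times more disk.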

Treat headline compression numbers as anecdotal until you see your own data distribution and query mix. Ask for before-and-after results on latency, CPU, and scan volume, not just storage charts.

Attribution:

robocat #1
lkanwoqwp #1

In plain English

Apache

Usually short for the Apache License, a permissive open source license that allows broad reuse and includes an explicit patent grant.

Databricks

A data and AI platform commonly used for analytics, data engineering, and machine learning workflows.

Delta tables

Tables managed with the Delta Lake format, which adds metadata and transactional features on top of file-based data storage.

filter pushdown

An optimization where filtering is applied as early as possible to avoid processing unnecessary data.

Gorilla

A compression scheme for time-series data, introduced by Facebook, that stores timestamps as delta-of-delta values and floating-point values as the XOR against the previous value, both packed with bit-level encoding.

Iceberg

Apache Iceberg, an open table format for large analytic datasets stored in files like Parquet.

IoT

Internet of Things, a broad term for connected devices like TVs, cameras, thermostats, and appliances.

JSONB

A PostgreSQL data type for storing JSON in a binary format that can be indexed and queried efficiently.

MQTT

Message Queuing Telemetry Transport, a lightweight publish/subscribe messaging protocol widely used by IoT devices and telemetry systems.

OT

Operational Technology, meaning industrial systems and control environments such as factories, plants, or field equipment.

Parquet

A columnar data file format designed for efficient storage and querying of structured data.

partition pruning

Skipping whole partitions of data during a query because metadata shows they cannot match the filter.

PostgreSQL

A widely used open source relational database system for storing and querying structured data.

Snowflake

A cloud data warehouse platform used to store and analyze large amounts of business data.

swinging-door compression

A lossy time-series compression method used in industrial historians that drops points while keeping values within an error bound.

time-series

Data recorded over time, usually as timestamped measurements or events.

TimescaleDB

A PostgreSQL-based extension and product for time-series data storage and querying.

Variant encoding

A proposed way to store semi-structured data in typed, more query-friendly form inside columnar systems.

Reference links

Semi-structured data and compression proposals

Apache Parquet VariantEncoding proposal
Referenced as a model for turning JSON-like payloads into typed columnar chunks that can be compressed and queried more effectively.
TimescaleDB issue for JSONB compression
Used to show that JSONB compression support is a long-standing gap for some real-world TimescaleDB users.

Related time-series engine work

xataio/deltax
A PostgreSQL extension project mentioned by a commenter as another attempt to optimize time-series analytics and ClickBench performance.

Industrial historian context

AVEVA PI Server documentation on swinging-door compression
Provided as background for the older lossy compression approach used in industrial data historians.
