Queues Don't Fix Overload (2014)

The post’s core claim is simple. A queue can absorb short-lived mismatch between producers and consumers, but it cannot fix sustained overload. If work arrives faster than it can be completed on average, the backlog only gets bigger. The queue delays the pain, adds latency, and often hides the real bottleneck until recovery gets expensive.

People largely agreed with that framing and sharpened it. The useful distinction that emerged is “volatility versus mismatch.” Queues are good at smoothing bursty arrivals over short windows. They are bad at handling demand spikes that last longer than your latency budget, like traffic surges tied to news or promotions.

Several comments recast queues as decouplers, closer to manufacturing buffers or async FIFOs between different clock domains. That framing landed better than “queues are bad,” because it explains why they remain valuable for isolating systems, moving non-urgent work off the request path, and scaling readers and workers independently. The catch is that decoupling weakens feedback. Once producers no longer feel downstream pain immediately, you need explicit backpressure and tight queue limits or you just hide trouble.

A recurring practical point was that small bounded buffers are healthier than giant ones. Large buffers increase work in progress, mask sync bugs and deadlocks, and can turn a small incident into a long recovery cascade across services.

The comments also pushed the article on one nuance: for low-urgency tasks like analytics, counters, or background updates, delaying work by minutes or even hours is often perfectly acceptable. In those cases, queues are not pretending to fix overload. They are trading freshness for simpler, more scalable request handling. The consensus was not anti-queue. It was anti-magical thinking about what queues can actually buy you.

Treat queues as a buffering and isolation tool, not as capacity planning. Put explicit backpressure, bounded queues, and drop or defer policies in place before load arrives, or you will turn a visible overload into a slower, harder-to-recover failure.
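
The claim that sustained overload only grows the backlog can be checked with a few lines of arithmetic. The sketch below is a deliberately simple fluid model, not anything from the post itself: the function names and the 110 versus 100 requests-per-second figures are illustrative assumptions, and real arrivals are burstier than a constant rate.

```python
def backlog_after(arrival_rate, service_rate, seconds, initial_backlog=0.0):
    """Fluid-model backlog (items) after `seconds` of steady rates."""
    backlog = initial_backlog
    for _ in range(seconds):
        # Backlog grows by the per-second surplus and can never go negative.
        backlog = max(0.0, backlog + arrival_rate - service_rate)
    return backlog


def wait_for_new_arrival(backlog, service_rate):
    """Seconds a newly queued item waits behind the existing backlog (FIFO)."""
    return backlog / service_rate


# Assumed numbers: 10% sustained overload for ten minutes.
backlog = backlog_after(110, 100, seconds=600)
print(backlog)                             # 6000.0
print(wait_for_new_arrival(backlog, 100))  # 60.0
```

Under these assumptions a modest 10% overload turns into a one-minute wait after ten minutes, and the wait keeps growing for as long as the surplus lasts. With the rates reversed, the backlog stays at zero.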

June 11, 2026
ferd.ca

Discussion mood

Mostly positive on the core point, with a pragmatic tone. People agreed that queues are routinely misused as a substitute for backpressure and capacity planning, but they also defended them as an essential tool for smoothing short bursts, decoupling components, and offloading non-urgent work when latency can slip.

Key insights

Queues trade burst smoothing for latency

This sharpens the article’s claim by separating short-term burstiness from true capacity mismatch. A queue can reconcile uneven production and consumption over a narrow time window, but it does so by increasing how long work waits before completion or before admission. That makes queues a poor answer for demand spikes that last hours when your service-level objective is measured in seconds.

Match queue depth to the burst window you can tolerate, not to abstract peak traffic. If the arrival surge lasts longer than your latency budget, reject, defer, or shed work at the edge instead of pretending the backlog is harmless.
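
One way to turn that advice into a number: in a FIFO queue, an item that enters behind `depth` others waits roughly `depth / service_rate` seconds, so the deepest queue that still fits the budget is about `service_rate * latency_budget`. The helper names and the example figures below are assumptions for illustration, and the estimate ignores variance in service time.

```python
import math


def max_queue_depth(service_rate_per_s, latency_budget_s):
    """Largest depth at which a newly queued item still meets the budget."""
    return math.floor(service_rate_per_s * latency_budget_s)


def should_admit(current_depth, service_rate_per_s, latency_budget_s):
    """Admit only if the item would wait less than the latency budget."""
    return current_depth < max_queue_depth(service_rate_per_s, latency_budget_s)


# Assumed figures: 200 requests/s of capacity and a 0.5 s queueing budget.
print(max_queue_depth(200, 0.5))    # 100
print(should_admit(99, 200, 0.5))   # True
print(should_admit(100, 200, 0.5))  # False
```

Anything that arrives once the queue is at that depth is a candidate for rejection, deferral, or shedding at the edge rather than for a longer queue.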

Attribution:

10000truths #1
jstimpfle #1

Small buffers expose trouble earlier

Keeping buffers tight forces backpressure to show up while the problem is still small. Big queues let work in progress pile up, delay the signal that something is wrong, and in concurrent systems can hide deadlocks or synchronization bugs until the worst possible moment. The queue is not just storage. It is also an instrument panel, and a constantly full one is telling you exactly where the bottleneck sits.

Set hard queue limits and alert on occupancy, age, and drain rate. Treat sustained high fill levels as an incident, not as proof that buffering is working.
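
A minimal sketch of a queue that enforces a hard limit and exposes the signals mentioned above. `MonitoredQueue` and its method names are hypothetical, not from any particular library. The clock is injectable only so that the example is deterministic, and drain rate would be derived by sampling the `drained` counter over time.

```python
import time
from collections import deque


class MonitoredQueue:
    """Bounded FIFO that tracks occupancy, oldest-item age, and drain count."""

    def __init__(self, max_items, clock=time.monotonic):
        self._items = deque()
        self._max = max_items
        self._clock = clock
        self.rejected = 0  # offers refused because the queue was full
        self.drained = 0   # items handed to consumers so far

    def offer(self, item):
        """Enqueue if there is room; otherwise count the rejection."""
        if len(self._items) >= self._max:
            self.rejected += 1
            return False
        self._items.append((self._clock(), item))
        return True

    def poll(self):
        """Return the oldest item, or None if the queue is empty."""
        if not self._items:
            return None
        _, item = self._items.popleft()
        self.drained += 1
        return item

    def occupancy(self):
        """Fill level between 0.0 and 1.0."""
        return len(self._items) / self._max

    def oldest_age(self):
        """Seconds the item at the head has been waiting (0.0 if empty)."""
        if not self._items:
            return 0.0
        return self._clock() - self._items[0][0]
```

An alerting rule could then fire when `occupancy()` stays near 1.0 or `oldest_age()` exceeds the latency budget for more than a few sampling intervals.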

Attribution:

marcosdumay #1
thwarted #1
MyHonestOpinon #1
nasretdinov #1

Decoupling reduces immediate feedback

The manufacturing and hardware analogies make the upside of queues clearer. They let adjacent stages run at different rhythms and isolate failure domains, much like buffers between factory stations or an async FIFO between clock domains. The hidden cost is weaker feedback. Once the producer is insulated from the consumer, overload stops being self-limiting unless you deliberately add backpressure.

When you introduce a queue between services, add the feedback path in the same design. Decide upfront how producers slow down, what gets dropped, and who gets paged before the queue quietly becomes your outage sponge.
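
As a sketch of what a feedback path can look like in-process, Python's standard `queue.Queue` with a `maxsize` makes `put` block when the queue is full. The `submit` wrapper and its timeout value are assumptions for illustration. Across services the same idea would more likely surface as a rejection response or a rate-limit signal.

```python
import queue


def submit(jobs, job, wait_s=0.05):
    """Try to enqueue; block up to wait_s, then report overload to the caller."""
    try:
        jobs.put(job, timeout=wait_s)
        return True
    except queue.Full:
        # The producer feels the consumer's slowness here instead of
        # silently growing an unbounded backlog.
        return False


jobs = queue.Queue(maxsize=2)
results = [submit(jobs, n, wait_s=0.01) for n in range(3)]
print(results)  # [True, True, False]
```

The `False` return is the point where the producer has to decide whether to slow down, retry later, or drop the job.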

Attribution:

MyHonestOpinon #1
quentindanjou #1
derefr #1
aidenn0 #1

Async background work is the right fit

The strongest defense of queues was for low-value or delay-tolerant work. View counters, last-seen updates, analytics, and similar side effects do not need to block a user-facing GET and can safely lag behind. That lets request handlers stay simple, read from replicas, and scale horizontally. The important distinction is that this is not using a queue to save an overloaded critical path. It is removing non-critical work from that path entirely.

Audit synchronous request flows for writes that do not need immediate consistency. Move those to background jobs first, then add explicit degradation rules so they are the first thing shed under pressure.
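
A rough sketch of that pattern, assuming a hypothetical page-view counter: the handler serves the read and hands the side effect to a bounded queue without waiting, and when the queue is full the event is shed and counted. `SideEffects`, `handle_get`, and the `pages` dictionary are made up for illustration.

```python
import queue


class SideEffects:
    """Bounded hand-off for delay-tolerant writes; sheds when full."""

    def __init__(self, max_pending):
        self.pending = queue.Queue(maxsize=max_pending)
        self.shed = 0  # events dropped because the queue was full

    def record(self, event):
        try:
            self.pending.put_nowait(event)
        except queue.Full:
            # Degradation rule: non-critical work is the first thing dropped.
            self.shed += 1


def handle_get(page_id, pages, side_effects):
    """Serve the read; never let the counter update block or fail the request."""
    body = pages[page_id]
    side_effects.record(("view", page_id))
    return body
```

A background worker would drain `pending` at its own pace. Watching `shed` shows how often the degradation rule is actually firing.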

Attribution:

xp84 #1
milesvp #1

FIFO is not always the right queue

A linked queueing theory post introduced a more operational point. Fair first-in first-out processing can be the wrong policy for workloads where freshness matters more than fairness. For some systems, last-in first-out or other scheduling choices can cut perceived latency and avoid grinding through stale work that no longer matters.

Do not default to FIFO just because it feels fair. For each queued workload, define whether you care more about fairness, freshness, tail latency, or discardability, then pick the discipline that matches.
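
To make the alternative concrete, here is a sketch of a last-in first-out discipline that also discards stale work. The `FreshestFirst` class, the `max_age_s` parameter, and the injectable clock are assumptions for illustration, not something taken from the linked post.

```python
import time


class FreshestFirst:
    """LIFO buffer that serves the newest item and discards stale ones."""

    def __init__(self, max_age_s, clock=time.monotonic):
        self._stack = []  # (enqueue_time, item), oldest at index 0
        self._max_age = max_age_s
        self._clock = clock
        self.discarded = 0

    def push(self, item):
        self._stack.append((self._clock(), item))

    def pop(self):
        """Return the newest fresh item, or None if nothing fresh remains."""
        if not self._stack:
            return None
        enqueued_at, item = self._stack.pop()
        if self._clock() - enqueued_at <= self._max_age:
            return item
        # Items are in arrival order, so if the newest one is stale,
        # everything beneath it is at least as old: drop it all.
        self.discarded += 1 + len(self._stack)
        self._stack.clear()
        return None
```

The trade-off is explicit: recent requests see low latency, while older ones may never be served. That only makes sense when the caller has probably given up on them anyway.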

Attribution:

cmrdporcupine #1

Against the grain

Mid-pipeline buffers can still be useful

The automation-game example pushes back on the blanket dislike of internal buffers. When bottlenecks move over time, buffering intermediate work can be an intentional optimization that saves build cost or keeps the system productive while a temporary constraint shifts elsewhere. The failure mode is not buffers themselves. It is forgetting that they can conceal the reason throughput fell.

Do not ban internal queues on principle. Use them when they buy cost or utilization, but pair them with visibility into where waiting time accumulates so the hidden bottleneck does not stay hidden.

Attribution:

MaulingMonkey #1

User-visible retries are not obviously better

Pushing backpressure to the frontend does not magically improve the situation if the user just sees retries or failures instead of an accepted job. For some products, taking the work and processing it later is the better experience, especially when the task is not urgent and eventual completion matters more than immediacy. That is a product decision as much as a systems one.

Choose overload behavior per workflow, not per architecture slogan. For each endpoint, decide whether users prefer fast rejection, delayed completion, stale success, or silent dropping, then design the queue and UI around that choice.

Attribution:

xp84 #1
binsquare #1

Queueing theory books can miss practitioners

The book recommendations drew a useful warning. Formal queueing theory is powerful, but some engineers find the standard texts too math-heavy or too detached from the messy failure modes they fight at work. Practical flow books may be easier to apply even if they give less analytical depth.

If you want your team to get better at load and latency, do not assume one canonical textbook will land. Mix lightweight operational heuristics, observability work, and selective theory instead of waiting for everyone to absorb the math.

Attribution:

kqr #1
wreath #1
jstimpfle #1

In plain English

async FIFO

An asynchronous first-in, first-out buffer used to pass data safely between hardware components running on different clocks.

backpressure

A signal or mechanism by which a slower receiver or downstream system forces a sender to slow down, wait, or shed work instead of letting data pile up.

work in progress

Tasks or items that have started but are not yet finished, often abbreviated as WIP.

Reference links

Books and learning resources

Performance Modeling and Design of Computer Systems: Queueing Theory in Action
Linked in a review discussion about whether Harchol-Balter’s book is a practical introduction to queueing theory.

Related essays and prior discussions

FIFO considered harmful
Referenced as a classic post arguing that last-in first-out scheduling can outperform FIFO for some workloads.
Previous Hacker News discussion from 2024
Shared as an earlier submission with a larger comment archive on the same article.
Previous Hacker News discussion from 2014
Shared as the original older discussion of the same article.