Epoll vs. io_uring in Linux


The post compares epoll and io_uring through the lens of a small reverse proxy project. It contrasts the older Linux model, in which a program waits on epoll for readiness and then performs the I/O itself, with io_uring’s newer submission and completion rings, where user space and the kernel share queues to cut syscall overhead and support operation chaining. The comments mostly landed on a blunt conclusion: the API choice matters, but it is rarely the first bottleneck. Several people pointed out that a proxy can leave huge performance on the table through cross-core traffic, cache contention, allocator choices, and NIC queue layout long before epoll versus io_uring decides the outcome. That is why raw “CPU went up” or “req/s improved” anecdotes were treated with suspicion unless they came with latency figures and behavior at maximum load.

If you run latency-sensitive or very high-throughput Linux services, treat io_uring as a targeted optimization, not an automatic upgrade. Benchmark end-to-end with your actual workload, and check early whether your kernels, containers, and security policies even allow it before you redesign around it.

June 21, 2026
sibexi.co

Discussion mood

Interested and cautiously positive. People liked the walkthrough and generally agreed io_uring can be faster, but they pushed back on simplistic conclusions and kept steering the conversation toward workload-specific measurement, system architecture, and real deployment constraints like sandboxing and kernel security.

Key insights

CPU and NIC affinity can dwarf API gains

Aligning threads, listen sockets, and packet flow to specific CPUs can remove cross-core handoffs that kill a proxy’s throughput. The concrete claim was that on large multi-queue NIC systems, receive-side scaling, socket affinity like SO_INCOMING_CPU, and avoiding false sharing can produce order-of-magnitude gains that make epoll versus io_uring look like a second-order choice.

Before rewriting around io_uring, profile how packets move across cores and NIC queues. On multi-core servers, add CPU pinning and data-layout checks to the benchmark plan, because that may buy more capacity than an API migration.

Attribution:

toast0 #1 #2 #3
camkego #1
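
As an editorial illustration of the affinity point above, here is a minimal Python sketch of one-listener-per-CPU placement. It is not taken from the post or the comments. The helper names plan_workers and open_pinned_listener are hypothetical, the fallback value 49 for SO_INCOMING_CPU assumes a mainstream Linux architecture, and whether the kernel actually steers connections by that hint depends on the kernel version.

```python
import os
import socket

# SO_INCOMING_CPU is 49 on common Linux architectures; newer Python
# versions expose it by name, so fall back to the number only if needed.
SO_INCOMING_CPU = getattr(socket, "SO_INCOMING_CPU", 49)


def plan_workers(cpus, port):
    """Return one worker assignment per CPU, in ascending CPU order."""
    return [{"cpu": cpu, "port": port} for cpu in sorted(cpus)]


def open_pinned_listener(cpu, port, host="127.0.0.1"):
    """Pin the caller to `cpu` and open a listener hinted to that CPU."""
    os.sched_setaffinity(0, {cpu})  # 0 means the caller itself
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEPORT lets every worker bind its own socket to the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    try:
        sock.setsockopt(socket.SOL_SOCKET, SO_INCOMING_CPU, cpu)
    except OSError:
        pass  # the hint is optional; the CPU pin above still applies
    sock.bind((host, port))
    sock.listen(128)
    return sock
```

In a real deployment each worker process would take one entry from plan_workers(os.sched_getaffinity(0), port) and call open_pinned_listener for it. NIC queue and interrupt placement still has to be configured outside the program.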

Higher CPU usage is not a regression by itself

A busier CPU after switching to io_uring can mean the machine is spending less time in kernel overhead and more time doing useful work. The useful yardsticks are throughput, tail latency, and behavior at saturation, not whether htop shows a larger percentage. That correction matters because io_uring’s job is often to trade waiting and syscall churn for more active work.

When you compare epoll and io_uring, collect p99 latency, max throughput, and system versus user CPU time. Do not treat lower CPU utilization as the win condition for an I/O stack.

Attribution:

vlovich123 #1 #2 #3
saghm #1
FooBarWidget #1
toast0 #1
topspin #1
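
To make the measurement advice above concrete, here is a small Python sketch that records the metrics named: throughput, p50 and p99 latency, and user versus system CPU time. The function names are hypothetical, and the percentile uses the simple nearest-rank method. Note that os.times() covers only the process that calls it, so a real proxy benchmark would need to read CPU counters for the server process rather than the load generator.

```python
import math
import os
import time


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty sequence of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure(request_fn, count):
    """Call request_fn `count` times; summarize latency, rate, and CPU split."""
    cpu_before = os.times()
    started = time.perf_counter()
    latencies = []
    for _ in range(count):
        t0 = time.perf_counter()
        request_fn()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - started
    cpu_after = os.times()
    return {
        "req_per_s": count / elapsed,
        "p50_s": percentile(latencies, 50),
        "p99_s": percentile(latencies, 99),
        # CPU seconds spent by this process in user and kernel mode.
        "user_cpu_s": cpu_after.user - cpu_before.user,
        "system_cpu_s": cpu_after.system - cpu_before.system,
    }
```

Comparing two I/O backends then means comparing these dictionaries side by side at several load levels, not comparing a single CPU percentage.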

Most async frameworks blunt io_uring’s advantages

Libraries built around a poll-style event loop often treat io_uring as just another readiness backend. That misses the features that make it interesting, especially linked operations and low-syscall execution paths. The result is a common trap where swapping backends raises complexity and CPU cost without unlocking the design changes required for real gains.

If your stack hides I/O behind a poll-shaped abstraction, expect limited benefit from flipping on io_uring. Check whether the framework can express chained operations and completion-driven flows before betting on benchmark wins.

Attribution:

Asmod4n #1
MathMonkeyMan #1
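
For readers unsure what a "poll-shaped" abstraction looks like, here is a minimal Python sketch of the readiness model using the standard library's select.epoll (Linux only). It is an illustration, not code from the project discussed; read_when_ready and demo are hypothetical names. The point to notice is the two steps: epoll only reports that the socket is readable, and the program then issues the read itself.

```python
import select
import socket


def read_when_ready(sock, timeout=1.0):
    """Readiness model: wait until epoll says 'readable', then call recv."""
    ep = select.epoll()
    try:
        ep.register(sock.fileno(), select.EPOLLIN)
        chunks = []
        for fd, events in ep.poll(timeout):      # step 1: wait for readiness
            if fd == sock.fileno() and events & select.EPOLLIN:
                chunks.append(sock.recv(4096))   # step 2: perform the read
        return b"".join(chunks)
    finally:
        ep.close()

# A completion-style API is instead handed the buffer up front and later
# reports "N bytes were read", so the separate recv step is not needed.


def demo():
    """Send four bytes over a socket pair and read them via epoll."""
    left, right = socket.socketpair()
    try:
        left.sendall(b"ping")
        return read_when_ready(right)
    finally:
        left.close()
        right.close()
```

A framework whose core interface is "tell me when this descriptor is ready" has to emulate that first step on top of io_uring, which is where the chaining and low-syscall benefits tend to get lost.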

io_uring is broader than socket multiplexing

The useful framing is not just “faster epoll.” io_uring can cover non-socket interfaces that have poor or no non-blocking user APIs, and it can express sequences of operations as one pipeline. That makes it attractive for file, storage, device, and mixed I/O paths even when pure network readiness handling sees only small gains.

Look at io_uring first in code paths that mix network and file or device I/O, or where you need operation chaining. The value is often bigger there than in a clean socket-only event loop.

Attribution:

Cloudef #1
lukeh #1
kshri24 #1
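
Operation chaining can be hard to picture, so here is a toy model in plain Python. It is not an io_uring binding and makes no kernel calls. It only imitates the documented behavior of linked submissions (the IOSQE_IO_LINK flag): operations run in order, and once one fails, the remaining ones in the chain complete as cancelled. The function name run_linked_chain is hypothetical.

```python
import errno


def run_linked_chain(operations):
    """Run callables in order; after the first failure, cancel the rest.

    Each callable returns a non-negative result or a negative errno value,
    mirroring how completion results are reported.
    """
    results = []
    failed = False
    for op in operations:
        if failed:
            # Later links never run once an earlier link has failed.
            results.append(-errno.ECANCELED)
            continue
        res = op()
        results.append(res)
        if res < 0:
            failed = True
    return results
```

With real io_uring the whole chain is submitted at once and the kernel applies this ordering, which is what lets a sequence such as open, read, and send proceed without returning to user space between steps.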

Security support is ahead of deployment reality

Per-operation filtering for io_uring now exists, but it is too new to count on in the environments most companies actually run. Enterprise kernels, seccomp policies, and container runtimes still lag, and recent vulnerability history means many platforms will keep blocking io_uring for a while even if the upstream kernel story improves.

Treat io_uring availability as a deployment dependency, not a code dependency. Verify kernel versions, sandbox policy, and container runtime support before committing to it in a product roadmap.

Attribution:

insanitybit #1 #2
Asmod4n #1 #2
cyphar #1
mort96 #1
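
A deployment check along these lines can start small. The Python sketch below looks at two things that are readable without creating a ring: the kernel release (io_uring first shipped in Linux 5.1) and the kernel.io_uring_disabled sysctl found on newer kernels. The function names are hypothetical, and the version test is only a rough heuristic because vendor kernels backport features. It also says nothing about seccomp or container runtime policy, which can only be confirmed by attempting to set up a ring inside the target sandbox.

```python
import platform
import re
from pathlib import Path

# io_uring first shipped in Linux 5.1.
MIN_KERNEL = (5, 1)
# Sysctl on newer kernels: 0 = allowed, 1 = restricted, 2 = disabled.
SYSCTL_PATH = Path("/proc/sys/kernel/io_uring_disabled")


def parse_kernel_release(release):
    """Turn a release string such as '5.14.0-70.el9' into (major, minor)."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def io_uring_preflight(release=None, sysctl_path=SYSCTL_PATH):
    """Report what can be learned without actually creating a ring."""
    release = release or platform.release()
    report = {"kernel": parse_kernel_release(release), "sysctl": None}
    report["kernel_new_enough"] = report["kernel"] >= MIN_KERNEL
    if sysctl_path.exists():
        report["sysctl"] = int(sysctl_path.read_text().strip())
    # A value of 2 turns io_uring off for every process on the host.
    report["possibly_available"] = (
        report["kernel_new_enough"] and report["sysctl"] in (None, 0, 1)
    )
    return report
```

Running io_uring_preflight() on each target image gives an early signal, and a result of possibly_available being False is a reason to stop before any redesign work begins.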

Busy-poll epoll is a serious low-latency option

For dedicated proxy boxes, epoll-based busy polling tied to NAPI contexts can push latency down without jumping all the way to DPDK or a full kernel-bypass design. That puts a useful middle ground on the table for teams that need better packet responsiveness but cannot absorb the complexity of user-space networking stacks.

If your goal is lower network latency rather than a general async rewrite, test epoll busy polling and NAPI-aware worker placement. It may get you close enough without the operational cost of DPDK or AF_XDP.

Attribution:

buybackoff #1
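
Before testing busy polling, it helps to know what the host is already configured to do. This Python sketch reads the two global busy-poll sysctls, net.core.busy_poll and net.core.busy_read, whose values are timeouts in microseconds. It is a read-only check under the assumption that /proc/sys is mounted. The function names are hypothetical, and per-socket or per-epoll settings, which may override the global values, are not covered.

```python
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/net/core")


def read_busy_poll_settings(sysctl_dir=SYSCTL_DIR):
    """Return busy_poll and busy_read in microseconds, or None if unreadable."""
    settings = {}
    for name in ("busy_poll", "busy_read"):
        try:
            settings[name] = int((sysctl_dir / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


def busy_polling_enabled(settings):
    """The global busy-poll timeout is active when busy_poll is above zero."""
    return bool(settings.get("busy_poll"))
```

A value of zero in both files means the kernel is using interrupt-driven wakeups by default, so any latency baseline taken on that host reflects no busy polling.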

Against the grain

Fast servers are not defined by the API

A well-built server can perform well with either readiness multiplexing or completion-based async I/O, because implementation quality dominates in most real systems. The useful cross-platform perspective was that Windows has long had interfaces like Registered I/O, yet Linux servers were hardly uncompetitive before io_uring arrived.

Do not let the existence of a newer kernel API force a rewrite narrative. If your existing epoll design is sound, demand evidence from your workload before paying the migration cost.

Attribution:

up2isomorphism #1
RossBencina #1
muststopmyths #1

The real next step may be kernel bypass

Once you push hard on packets per second, the limiting factor can become the Linux network stack itself rather than epoll or io_uring. At that point, features like GSO and GRO help, but AF_XDP, DPDK, or even FPGA paths become the relevant comparison set if raw performance is the goal.

If you are already near line-rate networking limits, benchmark against AF_XDP or DPDK instead of assuming io_uring is the endgame. That changes the engineering tradeoff from API design to operational complexity and hardware tuning.

Attribution:

Cloudef #1
gafferongames #1
inigyou #1

In plain English

AF_XDP

A Linux socket family for high-performance packet processing that can bypass much of the normal network stack.

DPDK

Data Plane Development Kit, a set of user-space libraries and drivers for very high-speed packet processing outside the normal kernel network stack.

epoll

A Linux kernel API that lets a program wait for many file descriptors, such as sockets, to become ready for reading or writing.

false sharing

A performance problem where different CPU cores modify separate data that happens to sit on the same cache line, causing extra cache traffic.

GRO

Generic Receive Offload, a Linux networking feature that combines incoming packets to reduce per-packet processing overhead.

GSO

Generic Segmentation Offload, a Linux networking feature that lets large packets be split efficiently later in the stack or by hardware.

io_uring

A Linux kernel interface that uses shared submission and completion queues so programs can submit I/O work and receive results with less syscall overhead.

NAPI

New API, the Linux mechanism that manages how network drivers switch between interrupt-driven and polling-based packet processing.

NIC

Network interface card, the hardware that connects a machine to a network.

receive-side scaling

A network hardware and driver technique that spreads incoming packets across multiple CPU cores and NIC queues.

Registered I/O

A Windows networking API, often shortened to RIO, designed for high-performance asynchronous socket I/O.

reverse proxy

A server that sits in front of other servers, accepts client requests, and forwards them to backend services.

seccomp

Secure computing mode, a Linux feature that restricts which system calls a process is allowed to make.

SO_INCOMING_CPU

A Linux socket option that helps associate incoming socket processing with a specific CPU.

syscall

A request from a user-space program to the operating system kernel to perform a privileged operation.

tail latency

The slow end of the latency distribution, often measured as p95 or p99 response time rather than the average.

Reference links

Related writeups and examples

TinyGate GitHub repository
Source code for the reverse proxy project discussed in the post and comments.
io_uring, kTLS, and Rust for zero-syscall HTTPS server
A related implementation writeup exploring io_uring for a web server.
Serving files three ways
Another comparison of HTTP file server implementations meant to teach the I/O model differences.

Low-latency networking and busy polling

Fastly busy-polling presentation slides
Explains epoll-based busy polling with diagrams and operational notes.
LWN on epoll-based busy polling
Background reading on the kernel feature discussed as an alternative to io_uring for low latency.
LWN on epoll-based busy polling
Additional reporting on the same kernel work and its tradeoffs.
LWN on networking busy polling
Further context on how Linux networking polling behavior is evolving.
Linux kernel NAPI documentation
Kernel docs for the NAPI mechanics underlying the busy-polling discussion.

Performance tooling and libraries

Concurrency Kit
Suggested as a low-level concurrency library for building a high-performance proxy.
mimalloc
Suggested allocator option for aligned and efficient memory use in networking code.
libxdp documentation
Suggested for adding DDoS protection and more advanced layer 4 packet handling.

Support and security references

Red Hat io_uring support note
Cited to show that recent RHEL releases now support io_uring by default.
liburing discussion on kernel limits
Referenced to support the point that packet-per-second limits can be in the kernel stack, not the API.