Task Failed Successfully: Saturating NIC and Disk Bandwidth
- Infrastructure
- Hardware
- Programming
The post is a case study in chasing a hidden systems bottleneck. The author was trying to saturate a NIC while reading from NVMe with io_uring and RDMA. An early fix, switching to READ_FIXED so that pages no longer had to be pinned for each I/O, removed the obvious overhead, but the full deployment still stalled at around half of the expected throughput. The write-up walks through the dead ends first: io-wq backlog, request splitting, file descriptor lookup, and CRC work were each ruled out as the limiter. What finally broke the case open was that the workload kept scanning roughly 1 MiB buffers backed by ordinary 4 KiB pages, which drove a heavy dTLB miss cost. Moving the read arena to hugepages brought throughput close to NIC saturation.
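To make the page-size argument concrete, here is a minimal arithmetic sketch of how many pages, and therefore how many distinct address translations, one scan of a buffer touches. The 2 MiB huge page size is an assumption (it is the common default on x86-64); the summary only says "hugepages". The helper name `pages_spanned` is hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def pages_spanned(buffer_bytes: int, page_bytes: int) -> int:
    """Number of pages needed to back a page-aligned buffer (ceiling division)."""
    return -(-buffer_bytes // page_bytes)


# A 1 MiB buffer on ordinary 4 KiB pages needs one translation per page.
print(pages_spanned(1 * MIB, 4 * KIB))  # 256

# The same buffer fits inside a single (assumed) 2 MiB huge page.
print(pages_spanned(1 * MIB, 2 * MIB))  # 1
```

Every buffer scanned this way multiplies the number of translations the dTLB must hold, which is why the page size matters far more for large sequential buffers than for small ones.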
If a data path looks "CPU-bound" while both disks and NICs sit below their target throughput, treat virtual memory behavior as a first-class suspect. Add top-down profiling and page-size experiments early, especially for large sequential buffers or zero-copy-style pipelines.
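A page-size experiment can be as small as allocating the same arena twice, once with a huge-page hint and once without, and comparing throughput or dTLB miss counters between the two runs. The sketch below is Linux-only and uses transparent huge pages via `madvise`, which is an assumption chosen for simplicity; the summary does not say which hugepage mechanism the author used. The names `alloc_arena` and `HUGE_PAGE_SIZE` are hypothetical.

```python
import mmap

HUGE_PAGE_SIZE = 2 * 1024 * 1024  # assumed 2 MiB huge page size (common on x86-64)


def alloc_arena(size: int, want_huge: bool):
    """Map an anonymous private arena, optionally hinting for huge pages.

    Returns (arena, hinted), where hinted says whether the kernel accepted
    the hint. An accepted hint does not guarantee huge-page backing.
    """
    # Round up so the arena covers a whole number of huge pages.
    size = -(-size // HUGE_PAGE_SIZE) * HUGE_PAGE_SIZE
    arena = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)

    hinted = False
    # MADV_HUGEPAGE is only exposed where the platform defines it.
    advice = getattr(mmap, "MADV_HUGEPAGE", None)
    if want_huge and advice is not None:
        try:
            arena.madvise(advice)
            hinted = True
        except OSError:
            # Transparent huge pages may be disabled or unsupported.
            hinted = False
    return arena, hinted


baseline, _ = alloc_arena(64 * 1024 * 1024, want_huge=False)
candidate, hinted = alloc_arena(64 * 1024 * 1024, want_huge=True)
print("huge-page hint accepted:", hinted)
```

Running the same scan loop over `baseline` and `candidate` gives a direct A/B comparison. If the hint is rejected, the two arenas are equivalent and the experiment says nothing about page size, so check `hinted` before drawing conclusions.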
- blog.mrcroxx.com