Branchless Quicksort faster than std::sort and pdqsort, with C and C++ API
- Programming
- Developer Tools
- Infrastructure
The post presents BLQSort, a C and C++ sorting implementation built around branchless quicksort partitioning. The pitch is straightforward: in quicksort, a good pivot produces roughly 50/50 splits, which also makes the comparison outcome hard for a CPU branch predictor to guess. BLQSort sidesteps that by turning the hot partition step into arithmetic and unconditional stores instead of conditional jumps, and the benchmark charts in the post show it beating `std::sort` and pdqsort on random inputs.
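To make the idea concrete, here is a minimal sketch of a branchless Lomuto-style partition in C. It is not BLQSort's actual code: the names `partition_branchless` and `sketch_sort`, the `int` element type, and the middle-element pivot are all assumptions chosen for illustration. The point is only that the comparison result is added to an index as a 0 or 1 instead of steering an `if`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from BLQSort.
 * Rearranges a[0..n) so that every element < pivot comes first,
 * and returns how many such elements there are. */
static size_t partition_branchless(int *a, size_t n, int pivot)
{
    size_t store = 0;               /* a[0..store) holds elements < pivot */
    for (size_t i = 0; i < n; i++) {
        int v = a[i];
        a[i] = a[store];            /* unconditional stores: always swap */
        a[store] = v;
        store += (size_t)(v < pivot); /* comparison used as 0/1, no if */
    }
    return store;
}

/* Hypothetical driver: plain quicksort on top of the partition above.
 * Middle-element pivot, no small-array or worst-case fallback, so it
 * can still degrade to quadratic time on adversarial inputs. */
void sketch_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    while (n > 1) {
        size_t mid = n / 2;
        int pivot = a[mid];
        a[mid] = a[n - 1];          /* park the pivot in the last slot */
        a[n - 1] = pivot;

        size_t k = partition_branchless(a, n - 1, pivot);

        a[n - 1] = a[k];            /* move the pivot to its final position */
        a[k] = pivot;

        sketch_sort(a, k);          /* recurse on the left part */
        a += k + 1;                 /* loop on the right part */
        n -= k + 1;
    }
}
```

The only branch left in the inner loop is the loop condition itself, which is easy to predict. Whether the comparison really compiles to a flag-setting instruction rather than a jump depends on the compiler and optimization level, so it is worth checking the generated assembly.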
Treat this as a specialized systems optimization, not a universal replacement sort. If your input is random, fits in cache, and is cheap to compare, branchless partitioning can pay off. If your data is partially ordered, expensive to compare, or made of nontrivial C++ types, benchmark before you switch and inspect the API constraints closely.
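A rough way to act on that advice is a small timing harness that runs a candidate sort over several input patterns, not just random data. The sketch below uses only the C standard library. The `sort_fn` adapter signature, the pattern names, and `time_sort` are hypothetical, and `qsort` stands in as the baseline because BLQSort's real API is not shown here.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical adapter signature: wrap any sort under test in this shape. */
typedef void (*sort_fn)(int *a, size_t n);

static int cmp_int(const void *x, const void *y)
{
    int a = *(const int *)x, b = *(const int *)y;
    return (a > b) - (a < b);       /* avoids overflow of a - b */
}

/* Baseline: the C library's qsort behind the adapter signature. */
void sort_with_qsort(int *a, size_t n)
{
    qsort(a, n, sizeof *a, cmp_int);
}

enum pattern { PATTERN_RANDOM, PATTERN_SORTED, PATTERN_FEW_DISTINCT };

void fill_pattern(int *a, size_t n, enum pattern p, unsigned seed)
{
    srand(seed);                    /* same seed gives the same input */
    for (size_t i = 0; i < n; i++) {
        switch (p) {
        case PATTERN_RANDOM:       a[i] = rand();     break;
        case PATTERN_SORTED:       a[i] = (int)i;     break;
        case PATTERN_FEW_DISTINCT: a[i] = rand() % 4; break;
        }
    }
}

int is_sorted(const int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (a[i - 1] > a[i])
            return 0;
    return 1;
}

/* Returns CPU seconds spent in one call to sort, or -1.0 if allocation
 * failed or the output was not sorted. */
double time_sort(sort_fn sort, size_t n, enum pattern p, unsigned seed)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return -1.0;
    fill_pattern(a, n, p, seed);

    clock_t t0 = clock();
    sort(a, n);
    clock_t t1 = clock();

    int ok = is_sorted(a, n);
    free(a);
    return ok ? (double)(t1 - t0) / CLOCKS_PER_SEC : -1.0;
}
```

A call such as `time_sort(sort_with_qsort, 1000000, PATTERN_SORTED, 1)` gives one sample. For a fair comparison, repeat each measurement several times, use array sizes that match production, and add patterns that resemble your real data. `clock()` has coarse resolution, so small arrays need many repetitions to produce a usable number.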
- tiki.li