That framing landed with a lot of people. The main consensus was that token savings by themselves are a weak metric. Several commenters said the only number that actually matters is something like cost per correct answer or success per session. People working with agents said this is hard to measure at small scale, which is exactly why so much of the ecosystem runs on vibes and anecdotes. A linked writeup using RTK, caveman, and headroom together became the most concrete data point in the discussion because it reported only about 3 to 4 percent API cost savings on a $926 bill, with no strong evidence yet on quality impact. That gave the skeptics a sharper point than the original post had on its own.
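
To make the "cost per correct answer" idea concrete, here is a minimal sketch of how such a metric could be computed. The `Run` record, the `cost_per_correct` helper, and every number in it are hypothetical and chosen only for illustration; none of them come from the thread or the linked writeup.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    cost_usd: float  # API spend for one session (made-up figure)
    correct: bool    # did the result pass whatever check you trust?


def cost_per_correct(runs: List[Run]) -> Optional[float]:
    """Total spend divided by the number of correct outcomes.

    Returns None when nothing succeeded, since the ratio is undefined.
    """
    total = sum(r.cost_usd for r in runs)
    wins = sum(1 for r in runs if r.correct)
    return total / wins if wins else None


# Hypothetical data: compression makes each session 20% cheaper,
# but one extra task fails.
baseline = [Run(0.50, True), Run(0.50, True), Run(0.50, True), Run(0.50, False)]
compressed = [Run(0.40, True), Run(0.40, True), Run(0.40, False), Run(0.40, False)]

for name, runs in [("baseline", baseline), ("compressed", compressed)]:
    total = sum(r.cost_usd for r in runs)
    print(f"{name}: total ${total:.2f}, per correct answer ${cost_per_correct(runs):.2f}")
```

In this invented example the compressed configuration has the lower total bill but the higher cost per correct answer, which is the kind of gap a raw token-savings number would hide. With only a handful of sessions the ratio is very noisy, which matches the commenters' point about small-scale measurement.
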
At the same time, the thread did not reject the whole category. Multiple commenters said the underlying idea is sound because normal CLI output is often wasteful for both humans and models, and local models can see noticeable speed gains from smaller tool output. The more credible pro-RTK comments narrowed the claim. RTK only compresses command output, not the whole session, so its upside depends heavily on whether tool calls are actually a big part of your context budget. Some users reported that this makes little difference in long coding sessions where messages dominate context, while others said they saw meaningful token and latency savings when they limited RTK to a specific set of commands. The practical center of gravity was that token compression is not fake, but broad headline percentages can be very misleading about actual workflow savings.
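
The dependence on tool-call share can be shown with simple arithmetic: if only command output is compressed, the overall saving is roughly the share of context that is tool output multiplied by the reduction achieved on that output. The sketch below assumes that simplified model; the function name `overall_savings` and the percentages are illustrative assumptions, not measurements reported by anyone in the thread.

```python
def overall_savings(tool_share: float, tool_reduction: float) -> float:
    """Fraction of total context tokens saved.

    tool_share:     fraction of context tokens that come from command output
    tool_reduction: fraction of those tokens the compressor removes
    """
    for value in (tool_share, tool_reduction):
        if not 0.0 <= value <= 1.0:
            raise ValueError("both arguments must be fractions between 0 and 1")
    return tool_share * tool_reduction


# Hypothetical sessions with the same 60% reduction on tool output.
for label, share in [("message-heavy coding session", 0.10),
                     ("tool-heavy session", 0.50)]:
    print(f"{label}: {overall_savings(share, 0.60):.0%} of context saved")
```

Under these assumed numbers the same headline reduction yields 6% in one session and 30% in the other, which is why a single advertised percentage says little about a particular workflow.
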
The conversation also shifted toward alternatives that look less brittle than a giant universal output rewriter. Commenters pointed to better harness design, direct use of CLI flags like `--short` or compact modes, tree-sitter indexing, subagents for context management, and lightweight local summarizers in front of tool output. Even some people who liked RTK argued that the clean long-term fix is native compact or structured output from the tools themselves, or compression logic integrated into the model harness with explicit user control over the quality versus savings tradeoff. The strongest takeaway was not “never use RTK.” It was that teams should treat token compression as an eval problem, not a marketing number, and should prefer narrow, testable reductions over magical wrappers that promise to tame every command on the system.
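
As a rough illustration of what a "narrow, testable reduction" might look like inside a harness, the sketch below compacts output only for an explicit allowlist of commands and passes everything else through unchanged, with a user-controlled switch. This is not how RTK works; the `COMPACTORS` table, the helper names, and the chosen commands are all hypothetical.

```python
from typing import Callable, Dict


def drop_blank_lines(text: str) -> str:
    """Remove empty or whitespace-only lines."""
    return "\n".join(line for line in text.splitlines() if line.strip())


def head(limit: int) -> Callable[[str], str]:
    """Build a compactor that keeps the first `limit` lines and notes the rest."""
    def _compact(text: str) -> str:
        lines = text.splitlines()
        if len(lines) <= limit:
            return text
        omitted = len(lines) - limit
        return "\n".join(lines[:limit] + [f"... ({omitted} more lines omitted)"])
    return _compact


# Hypothetical allowlist: only these command prefixes are ever rewritten.
COMPACTORS: Dict[str, Callable[[str], str]] = {
    "git status": drop_blank_lines,
    "ls": head(50),
}


def compact_output(command: str, output: str, enabled: bool = True) -> str:
    """Compact output for allowlisted commands; pass everything else through."""
    if not enabled:  # explicit user control over the tradeoff
        return output
    for prefix, compactor in COMPACTORS.items():
        if command == prefix or command.startswith(prefix + " "):
            return compactor(output)
    return output


print(compact_output("git status", "On branch main\n\nnothing to commit"))
```

Because each rule is a small pure function tied to one command, it can be unit-tested and evaluated on its own, and switching it off restores the raw output. Where a tool already offers a compact mode, calling that mode directly avoids the rewriting step altogether.
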