HN Debrief

Show HN: Smart model routing directly in Claude, Codex and Cursor

  • AI
  • Developer Tools
  • Infrastructure
  • Open Source

Weave released a self-hostable model router for coding agents. It exposes an Anthropic or OpenAI-compatible endpoint, watches each inference request from tools like Claude Code, Codex, Cursor, or OpenCode, and chooses which model should handle that step. The claim is that coding sessions do not need frontier models all the time, so the router can send planning or gnarly debugging to expensive models like Opus while pushing simpler exploration or implementation work to cheaper models. Weave says it trained the router on tens of thousands of agent traces with reinforcement learning and has seen about 40% token savings internally with no visible hit to quality or velocity.

If you are spending real money on coding agents, model routing is becoming an infrastructure problem, not just a prompt tweak. But the useful version likely needs cache-aware session management, subagent support, and hard evals against your own workload before it is safer than simply locking a few known-good model choices.

Discussion mood

Interested but skeptical. People liked the cost-saving goal and agreed token spend has become painful, but they kept circling back to cache invalidation, loss of agent-level control, weak public evidence, and the risk that routing is far less useful than just picking a small set of known-good models per workflow.

Key insights

  1. 01

    Subagents are the easiest routing win

    Subagents make the product sound more plausible than the main agent loop does. Because they start with a fresh context window, they do not inherit cache from the parent session, so routing them to cheaper or specialized models avoids the biggest penalty that hurts mid-session switching. That turns routing from a brittle per-turn optimization into a cleaner orchestration problem around task decomposition.

    If you want to test model routing, start with subagent tasks like code search, summarization, or isolated implementation steps. You will learn faster there than by trying to dynamically swap the primary model in a long-running cached session.

      Attribution:
    • alansaber #1
    • adchurch #1 #2
  2. 02

    Harness behavior can dominate model choice

    The wrapper around a model appears to change outcomes enough that routing on model names alone can miss the real source of performance. People described Opus behaving noticeably differently in Copilot CLI, Claude Code, OpenCode, and Pi, with context limits and tool orchestration likely doing as much work as the base model. That means a router trained on traces from one harness may generalize poorly to another, even when the nominal model is the same.

    Treat harness and model as a combined unit in your evals, logging, and routing rules. If you swap shells, context windows, or tool setups, assume your routing policy may need to be retrained or re-tuned.

      Attribution:
    • devmor #1
    • ValentineC #1
    • adchurch #1
  3. 03

    Locked workflows may beat live routing

    For repeatable classes of work, predefining which model to use at each phase may be more reliable than making decisions on the fly. If you already have evals, holdback data, and stable task shapes, you can tune prompts and model choices offline and avoid paying routing mistakes during production sessions. That reframes live routing as a convenience layer for messy interactive work, not the default best practice for everything.

    Separate your workloads before adopting a router. Keep deterministic internal flows on fixed model policies, and reserve dynamic routing for exploratory sessions where the task shape really changes midstream.

      Attribution:
    • peterbell_nyc #1
    • gopher_space #1
    • jpease #1
  4. 04

    Recovery logic matters more than first-pass accuracy

    The hard part is not guessing the perfect model from the opening prompt. It is detecting when a cheaper model is stuck and escalating quickly enough that the savings survive the mistake. Weave said the system uses prompt and context embeddings plus rescue guardrails, which suggests the router's value depends as much on fallback policy as on the learned classifier itself.

    When you evaluate a router, measure cost and latency after retries, stalls, and escalations, not just first-choice accuracy. A weak first route can still be acceptable if failure detection is fast and cheap.

      Attribution:
    • GodelNumbering #1
    • adchurch #1 #2
    • mjb #1

Against the grain

  1. 01

    Two-model setups may already capture most value

    The strongest skeptical case was that cache pressure collapses the fancy routing story into something much simpler. Once switching gets expensive, you mostly end up with one strong planner and one cheaper executor, which many teams already do without adding a proxy. On that view, a learned router is extra moving parts chasing marginal gains.

    Before adding a routing layer, benchmark a simple two-model policy against your current setup. If it gets close to the claimed savings, the operational overhead of a smarter router may not be worth it.

      Attribution:
    • GodelNumbering #1 #2
  2. 02

    Model vendors may absorb this feature

    A credible counterpoint was that the best long-term router may come from the model providers themselves, not an independent proxy. They control pricing, caching, and native agent behavior, so they can route within their own model families more cheaply than a third party can. The limit is that they have little reason to send traffic to competing models.

    Do not assume an external router will stay structurally advantaged. If your strategy depends on this layer, watch whether Anthropic, OpenAI, or Cursor add enough native routing to erase the benefit for same-vendor stacks.

      Attribution:
    • asdev #1 #2
    • adchurch #1

In plain english

cache-aware
Designed to account for the cost or benefit of reusing cached context when making decisions.
Claude Code
Anthropic's coding-agent command-line or tool environment built around Claude models.
Codex
OpenAI’s coding-focused product and model interface for software development tasks.
context window
The amount of prior text and instructions a model can consider at once when generating an answer.
Cursor
An AI-assisted code editor that includes model-selection features such as Auto mode and subagents.
embeddings
Numerical representations of text or other data that let systems compare similarity or cluster related inputs.
evals
Short for evaluations, the tests and benchmarks used to compare model or system performance.
holdback data
A reserved test dataset that is not used during tuning, so it can measure how well a system generalizes.
OpenCode
An open coding-agent harness or interface mentioned as one of the tools this router can sit behind.
prompt caching
A provider feature that reuses computation for repeated prompt context so later requests are cheaper or faster if the earlier context stays the same.

Reference links

Project and demo

Related routing projects and benchmarks

Related tools and references