The unbearable cheapness of open weight models

AI
Open Source
Infrastructure
Startups
Regulation

The post argues that the market price for running modern AI models is collapsing, especially for open-weight models that can be hosted cheaply or run locally. The core claim is not just that a few open models are inexpensive today. It is that inference itself is turning into a commodity, which threatens the business model of labs that have been valued like software companies with huge margins. The conversation largely accepted that premise. People kept coming back to the same distinction: training and post-training are still brutally expensive, but serving tokens is getting cheaper fast, and many buyers do not need the absolute best model for most work.

If you buy or build with AI, stop assuming frontier API pricing is durable. Re-evaluate where you truly need top-end models and where local or commodity open-weight options are already good enough at a fraction of the cost.

June 25, 2026
jamesoclaire.com
Discuss on HN

Discussion mood

Bearish on the long-term margins of frontier labs and bullish on open-weight adoption. The mood is energized and a little cynical, driven by the belief that inference is becoming cheap, most real workloads do not need the best model, and incumbents will try to preserve pricing power through enterprise lock-in or regulation.

Key insights

Cash burn is not inference cost

The labs' ugly finances do not prove that serving tokens is inherently expensive. The sharper claim is that the big spend sits in training, reinforcement learning from human feedback, reinforcement learning from AI feedback, and oversized organizations. That changes the economics completely. If inference is already high-margin and prices stay high mainly because buyers still pay them, then a price war can happen much faster than many people expect.

Do not use a lab's total losses as a proxy for the future unit cost of inference. Model your own AI usage assuming token prices can drop sharply once demand softens or competition bites.

Attribution:

Tuna-Fish #1 #2
manwithopinions #1
Onavo #1
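
One way to act on the takeaway above is a small sensitivity table for your own spend. The sketch below is illustrative only: the function names, the token volume, the current price, and the drop factors are all hypothetical assumptions, not figures from the post or the discussion.

```python
def monthly_spend(tokens_per_month, price_per_mtok):
    """Dollar spend for a month, given a price per million tokens."""
    return tokens_per_month / 1e6 * price_per_mtok


def price_drop_scenarios(tokens_per_month, current_price_per_mtok, drop_factors):
    """Map each assumed price-drop factor (2 means 'half price') to monthly spend."""
    return {
        factor: monthly_spend(tokens_per_month, current_price_per_mtok / factor)
        for factor in drop_factors
    }


if __name__ == "__main__":
    # Hypothetical workload: 500M tokens per month at $10 per million tokens.
    for factor, spend in price_drop_scenarios(500_000_000, 10.0, [1, 2, 5, 10]).items():
        print(f"price / {factor}: ${spend:,.2f} per month")
```

Running the scenarios side by side makes it easier to see which build-versus-buy decisions only make sense at today's prices.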

Agent loops make prompt caching decisive

Coding agents and other multi-step workflows resend most of the same context on every turn. Providers that cache the unchanged prefix can make later requests far cheaper, which is why some people are reporting startlingly low real-world costs. This means raw per-token list prices are often the wrong comparison. Architecture and provider support for cache-first workflows can matter as much as headline model pricing.

When benchmarking vendors, test end-to-end agent workloads with caching enabled instead of comparing bare token rates. A cheaper cache policy can beat a nominally cheaper model.

Attribution:

jcparkyn #1
crazylogger #1
arikrahman #1
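
The caching argument above can be sketched as simple arithmetic. This is a minimal model under stated assumptions, not any provider's actual billing: it counts input tokens only, assumes the whole previous prompt is served from cache on the next turn, ignores cache-write surcharges and cache expiry, and uses hypothetical prices and a hypothetical `cached_price_fraction`.

```python
def agent_loop_input_cost(turns, base_tokens, new_tokens_per_turn,
                          price_per_mtok, cached_price_fraction):
    """Return (uncached_cost, cached_cost) in dollars for input tokens.

    Each turn resends the full context so far. With caching, the prefix
    already sent on the previous turn is billed at a fraction of the price
    and only the newly appended tokens are billed in full.
    """
    uncached = 0.0
    cached = 0.0
    prefix = 0            # tokens already sent on the previous turn
    prompt = base_tokens  # tokens sent on the current turn
    for _ in range(turns):
        fresh = prompt - prefix
        uncached += prompt * price_per_mtok / 1e6
        cached += (prefix * cached_price_fraction + fresh) * price_per_mtok / 1e6
        prefix = prompt
        prompt += new_tokens_per_turn
    return uncached, cached


if __name__ == "__main__":
    # Hypothetical agent session: 50 turns, 40k-token starting context,
    # 2k new tokens per turn, $3 per million input tokens, cached reads at 10%.
    full, with_cache = agent_loop_input_cost(50, 40_000, 2_000, 3.0, 0.1)
    print(f"uncached: ${full:.2f}, cached: ${with_cache:.2f}")
```

Because the resent prefix grows every turn, the gap between the two totals widens with session length, which is why a list-price comparison can mislead for agent workloads.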

Enterprises buy accountability, not just tokens

Large companies often care less about the cheapest capable model than about contracts, support, KPIs, security reviews, and someone to blame when things break. That creates room for closed providers even if open-weight models are technically good enough. The catch is that this only holds while the price gap stays within enterprise tolerance. A modest premium is normal enterprise software behavior. A 20x or 50x premium is much harder to wave through.

If you sell into enterprises, package reliability, governance, and support as the product. If you buy for an enterprise, quantify how much premium you are paying for that wrapper versus the model itself.

Attribution:

actionfromafar #1
NitpickLawyer #1
spwa4 #1
tuatoru #1
anax32 #1
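
To quantify the premium described above, a buyer could compare the closed provider's bill with an all-in estimate for an open-weight alternative. The sketch below is a rough illustration: the function name, the `self_run_overhead` term (support, security review, and operations you would have to fund yourself), and every number are hypothetical assumptions.

```python
def wrapper_premium(tokens_per_month, closed_price_per_mtok,
                    open_price_per_mtok, self_run_overhead):
    """Compare a closed provider's monthly bill with an open-weight alternative.

    self_run_overhead is the assumed monthly dollar cost of the support and
    governance work the closed provider would otherwise bundle in.
    """
    closed_total = tokens_per_month / 1e6 * closed_price_per_mtok
    open_total = tokens_per_month / 1e6 * open_price_per_mtok + self_run_overhead
    return {
        "closed_total": closed_total,
        "open_total": open_total,
        "multiple": closed_total / open_total,
        "premium_dollars": closed_total - open_total,
    }


if __name__ == "__main__":
    # Hypothetical: 1B tokens per month, $15 vs $0.50 per million tokens,
    # plus $4,000 per month of in-house overhead for the open-weight route.
    print(wrapper_premium(1_000_000_000, 15.0, 0.5, 4_000.0))
```

Including the overhead term matters: the raw per-token gap overstates the premium when the cheaper option needs staffing the contract would have covered.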

Security gains may come from the harness

Claims that a frontier model was the clear winner on cybersecurity were undercut by examples where smaller local models also found serious bugs once wrapped in a strong toolchain. The useful lens is that model quality and orchestration quality are getting confounded. Better harnesses, tool use, and evaluation loops can extract a lot more value from non-frontier models than benchmark rankings suggest.

Invest in scaffolding before assuming you need the most expensive model for security or coding tasks. A better wrapper around a cheaper model may close more of the gap than another model upgrade.

Attribution:

kyleomalley #1
orwin #1 #2

Open weights is not the same as open source

Several people had to untangle terminology. An open-weights model usually lets you download the trained parameters and the files needed to run inference. It usually does not include the original training data or the full training pipeline. Fully open models like OLMo are rarer. That distinction matters because many business claims about openness quietly assume more reproducibility and auditability than most open-weights releases actually provide.

If your strategy depends on auditability, reproducibility, or retraining rights, verify what is actually open before committing. Downloadable weights alone do not guarantee full transparency or control.

Attribution:

philipkglass #1 #2
adrian_b #1

Incumbents above and below can squeeze labs

Frontier labs are getting boxed in from both directions. Open-weight models and cheap hosts attack from below on price. Domain incumbents in legal, healthcare, finance, and other verticals can absorb those models into products they already sell and trust relationships they already own. That leaves pure model labs trying to capture margin in the middle, which is rarely the safest place to be.

If you are building on AI, look for places where distribution, workflow ownership, or domain data give you leverage beyond the model. Pure model reselling looks increasingly fragile.

Attribution:

christkv #1
forshaper #1
rectang #1
sofixa #1
orwin #1

Against the grain

Best models still win where mistakes are expensive

For coding and other high-stakes work, some people said the quality gap is still obvious and worth paying for. The argument is not that open-weight models are bad. It is that when a missed bug, weak analysis, or reputational failure is costly, the best available model still has pricing power. That keeps a premium tier alive even if the mass market commoditizes.

Segment workloads by error cost, not by enthusiasm for openness. Keep premium models in the loop where failure is expensive and cheaper models where the work is routine.

Attribution:

Schiendelman #1
tuatoru #1
isoprophlex #1
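
Segmenting by error cost, as suggested above, can be framed as an expected-cost comparison: the price of a task plus the chance of a mistake times what that mistake costs. The sketch below assumes you can estimate per-task prices and error rates; the model names, prices, and rates are hypothetical placeholders, not measurements.

```python
def expected_cost(price_per_task, error_rate, error_cost):
    """Price of running the task plus the expected cost of getting it wrong."""
    return price_per_task + error_rate * error_cost


def pick_model(error_cost, models):
    """Return the model name with the lowest expected cost.

    models maps a name to a (price_per_task, error_rate) tuple.
    """
    return min(
        models,
        key=lambda name: expected_cost(models[name][0], models[name][1], error_cost),
    )


if __name__ == "__main__":
    # Hypothetical options: a cheap open-weight model and a premium model.
    models = {"cheap": (0.01, 0.10), "premium": (0.50, 0.02)}
    for cost_of_mistake in (1.0, 100.0):
        print(cost_of_mistake, pick_model(cost_of_mistake, models))
```

With these placeholder numbers the cheaper model wins when a mistake is nearly free and the premium model wins once a mistake is costly, which is the segmentation the takeaway describes.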

Proprietary and synthetic data may preserve moats

The claim that labs are simply out of useful data got challenged on two fronts. One is that the real limit is public text, not all data. Proprietary industrial, scientific, and enterprise datasets remain largely untapped. The other is that synthetic data can still help, though the cited paper was read as showing conditional gains rather than a clean escape hatch. Together, those points weaken the idea that open models automatically erase frontier advantage.

Watch who controls scarce domain data and post-training pipelines. Open weights narrow the gap, but they do not remove the value of exclusive data or better feedback loops.

Attribution:

CuriouslyC #1
nomel #1
Schiendelman #1
nyrikki #1

Some incumbents may be positioned better than expected

Not everyone bought the blanket doom for established players. Oracle was described as exposed, but likely too connected to fail. Anthropic was described as closer to profitability than peers because it rents datacenter capacity and has a strong position in enterprise coding. If compute prices fall after overbuild, the companies with demand and lighter capital commitments could benefit rather than collapse.

Do not treat all closed-model companies as interchangeable. Balance-sheet structure and workload mix may matter more than ideology about open versus closed.

Attribution:

InsideOutSanta #1
dualvariable #1

In plain English

frontier model

A model near the current state of the art, usually built by the largest AI labs with the most compute and data.

inference

Running a trained AI model to generate outputs such as text, code, or predictions.

OLMo

A family of fully open language models released with weights, code, and training data by the Allen Institute for AI.

Reference links

Primers on how language models work

3Blue1Brown Mini LLM
Suggested as a short introduction to model weights and how language models work.
Andrej Karpathy: Intro to Large Language Models
Suggested as a longer, more technical primer on large language models.

Open models and model artifacts

Allen Institute for AI OLMo
Used to illustrate what a fully open model looks like compared with merely open weights.
Gemma 4 12B IT model files on Hugging Face
Example of the files included with an open-weights model release.

Agent tooling and security examples

Reasonix architecture docs
Referenced to explain cache-first agent loop design and why caching cuts costs so much.
evilsocket audit harness
Shared as an example of orchestration tooling that can boost model performance in security work.
Aisle discovers six new CVEs in curl
Used as an example of security findings from smaller or local models with strong wrappers.

Papers and industry scenarios

Synthetic data pretraining paper
Cited in a debate over whether frontier labs are compute-limited or data-limited and how much synthetic data helps.
Europe 2031
Shared as a reference for European AI ambitions and industrial strategy.

Space and infrastructure side discussion

Orbital datacenter economics paper
Cited in a side debate about whether orbital data centers are commercially plausible.
Microsoft shelves underwater data center
Used as an analogy for the maintenance and physical-operations problems of unconventional datacenter locations.