Scaling Laws, Carefully
- AI
- Machine Learning
- Infrastructure
- Economics
Lilian Weng’s post is a technical overview of scaling laws in machine learning, with a focus on the now-famous result that training loss often follows simple power-law trends as model size, dataset size, and compute grow. The big claim is not that gains are infinite. It is that gains have been smooth and forecastable over huge ranges, which is why labs could justify enormous spending before the latest systems existed. That framing shaped most of the reaction. People who had worked on early scaling-law papers called the result genuinely shocking, because deep learning looked too messy to be captured by a compact empirical equation. Several commenters treated that simplicity as the key fact of the last decade in AI.
The sharper discussion was about scope. Scaling laws hold for a fixed error metric and a fixed data distribution; they are not a universal law that spans every model generation. That matters because newer, smaller models can beat older, larger ones through better data curation, better training recipes, and transfer. The thread also pushed back on a common misreading of “diminishing returns.” Power laws do imply diminishing marginal gains, but the practical lesson from Kaplan and the later Chinchilla-style results was that the curve stayed smooth instead of hitting a visible wall, and that the best available tradeoff between compute and data was more favorable than some early readings implied.
A side debate focused on the entropy floor of language and whether next-token loss is nearing a hard cap that would limit capability gains. The more convincing reading was narrower: irreducible uncertainty in language is real, but an average loss close to the entropy floor does not tell you how much capability headroom remains on the rare, hard predictions that matter for reasoning and useful work.
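To make the power-law shape and the entropy floor concrete, here is a minimal Python sketch. It assumes the commonly used form L(N) = E + A / N^alpha, where N is parameter count and E is the irreducible loss. The constants `E`, `A`, and `ALPHA` are hypothetical values picked for illustration, not numbers from the post or from any paper, and the fit treats the floor as known, which it is not in practice.

```python
import math

# Hypothetical constants for illustration only; not fitted to any real model.
E = 1.7       # assumed irreducible loss (the "entropy floor")
A = 400.0     # assumed scale coefficient
ALPHA = 0.34  # assumed power-law exponent


def loss(n_params: float) -> float:
    """Loss under the assumed form L(N) = E + A / N**ALPHA."""
    return E + A / n_params ** ALPHA


def fit_power_law(sizes, losses, floor):
    """Ordinary least squares on log(L - floor) = log(A) - alpha * log(N).

    Returns (A_hat, alpha_hat). The floor is taken as given here; a real
    fit would have to estimate it along with the other parameters.
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l - floor) for l in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# "Small" runs used to fit the curve, each 10x larger than the last.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [loss(n) for n in sizes]

a_hat, alpha_hat = fit_power_law(sizes, losses, E)

# Extrapolate one order of magnitude beyond the fitted range.
predicted = E + a_hat / 1e10 ** alpha_hat

# Absolute improvement bought by each successive 10x in model size.
gains = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]

if __name__ == "__main__":
    print(f"fitted alpha: {alpha_hat:.3f}")
    print(f"predicted loss at 1e10 params: {predicted:.4f}")
    print("gain per 10x:", [round(g, 4) for g in gains])
```

Each 10x in size buys a smaller absolute improvement, yet the curve stays smooth enough to extrapolate, which is the sense in which the thread separated "diminishing returns" from "hitting a wall." The sketch says nothing about what happens when the metric, data distribution, or training recipe changes, which is exactly the scope limit the discussion stressed.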
If you build or fund AI products, treat scaling curves as a serious planning tool rather than hype. But do not confuse a reliable within-regime trend with a guarantee that the same recipe, metric, or data distribution will keep delivering the next generation of capability.
- lilianweng.github.io
- Discuss on HN