
Scaling Laws

Empirical rules showing that AI model performance follows predictable power-law curves as parameters, data, and compute grow

#Scaling Laws #LLM #Chinchilla #Model Size #Training Compute

What Are Scaling Laws?

Scaling laws are empirical observations that AI model performance improves along predictable power-law curves as parameter count, training data, and compute increase together. The foundational results came from OpenAI's Kaplan et al. (2020) and DeepMind's Chinchilla paper (Hoffmann et al., 2022).

Simply put: train a bigger model on more data for longer, and loss drops at a predictable rate. This regularity is the quantitative case behind multi-billion-dollar investments in frontier models.
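The "predictable rate" has a concrete functional form. A minimal sketch of the Chinchilla-style parametric loss fit, using the constants reported by Hoffmann et al. (2022) — treat them as illustrative values from that paper's fit, not universal truths:

```python
# Parametric loss fit from the Chinchilla paper:
#   L(N, D) = E + A / N**alpha + B / D**beta
# E is the irreducible loss; the two power-law terms shrink as
# parameters (N) and training tokens (D) grow.
def loss(n_params, n_tokens,
         E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling both model size and data lowers the predicted loss:
small = loss(70e9, 1.4e12)    # Chinchilla-scale: 70B params, 1.4T tokens
big = loss(140e9, 2.8e12)     # everything doubled
print(small, big)             # big < small
```

The power-law exponents (≈0.3) are why progress feels expensive: each constant-sized drop in loss requires a multiplicative increase in resources.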

How Do They Work?

Three variables need to move together to hit an efficient frontier:

  • Parameters (N): model size. e.g., GPT-3 at 175B, GPT-4 estimated ~1.8T
  • Training Tokens (D): data volume. Chinchilla's compute-optimal recipe is roughly 20 tokens per parameter (N:D ≈ 1:20)
  • Compute (C): total FLOPs, approximated as C ≈ 6 · N · D
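Combining the two rules of thumb above (C ≈ 6·N·D and D ≈ 20·N) gives C ≈ 120·N², which can be inverted to split a compute budget. A small sketch under those approximations:

```python
import math

def chinchilla_optimal(compute_flops):
    """Split a FLOP budget into compute-optimal N (params) and D (tokens).

    Uses the approximations from the text: C = 6*N*D and D = 20*N,
    so C = 120*N**2 and N = sqrt(C / 120).
    """
    n_params = math.sqrt(compute_flops / 120)
    n_tokens = 20 * n_params
    return n_params, n_tokens

# Roughly the Chinchilla training budget: 6 * 70e9 * 1.4e12 ≈ 5.88e23 FLOPs
n, d = chinchilla_optimal(5.88e23)
print(f"N = {n:.3g} params, D = {d:.3g} tokens")  # ~70B params, ~1.4T tokens
```

Plugging in Chinchilla's own budget recovers its published configuration (70B parameters, 1.4T tokens), which is exactly the point: the recipe is a closed-form function of compute.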

Before Chinchilla, the industry tended to overshoot on parameters alone. The insight that a smaller model trained on more data can outperform a larger one under the same compute budget elevated the role of data scaling in training recipes.
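This trade-off can be checked numerically. The sketch below scores two ways of spending the same ~5.88e23 FLOP budget with the Chinchilla parametric loss fit (constants from Hoffmann et al., 2022; illustrative, not a guarantee):

```python
# L(N, D) = E + A / N**alpha + B / D**beta  (Chinchilla parametric fit)
def loss(n, d, E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    return E + A / n**alpha + B / d**beta

C = 5.88e23  # fixed compute budget in FLOPs; D is determined by C = 6*N*D

# "Pre-Chinchilla" style: large model, correspondingly fewer tokens
big_n = 175e9
big_d = C / (6 * big_n)

# Chinchilla-balanced: smaller model, ~20 tokens per parameter
bal_n = 70e9
bal_d = C / (6 * bal_n)

print(loss(big_n, big_d), loss(bal_n, bal_d))  # balanced loss is lower
```

Under this fit, the 70B model trained on ~1.4T tokens beats the 175B model trained on ~560B tokens at identical compute, which is the paper's headline result in miniature.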

Why Do They Matter?

Scaling laws give AI labs a forecasting tool for investment and product roadmaps. Extrapolating loss curves answers "given this compute budget, what quality should I expect?" However, recent analyses point to limits of naive scaling — data exhaustion, spiraling compute cost, reasoning plateaus — shifting attention to test-time compute, agent architectures, and other scaling dimensions. Even so, scaling laws remain central to the economics and engineering decisions of the LLM ecosystem.
