Era 14 / 15 · Foundation Models & Scaling Laws 2018–2022

Foundation Models & Scaling Laws

Just add scale — bigger models, predictable gains, surprising emergence.

Beat 1 · Concrete

Same question, three sizes

One prompt. The small model fails, the medium fumbles, the large one nails it.

Figure: three language models of growing size answer the same arithmetic prompt, “what is 47 × 6 ?”. The 0.1B model answers “about 90” (wrong), the 1.3B model answers “272” (close), and the 175B model answers 282 (correct). Bigger scale buys the right answer.

Beat 2 · Abstract

The scaling law

Loss versus compute on log–log axes: a straight descending line. Gains are predictable.

Figure: test loss versus compute on log–log axes, with log compute on the horizontal axis and log loss on the vertical. Measured runs land on a single straight descending line, a power law. As compute grows ten-fold, loss drops predictably along the line.
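To make the straight line concrete, here is a minimal sketch that recovers a power law from a handful of runs by fitting a line in log-log space. The data is synthetic, and the constants `A_TRUE` and `B_TRUE` and the helper `fit_power_law` are illustrative assumptions, not values from any published fit.

```python
import math

def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b), done as a straight line in log-log space."""
    xs = [math.log10(c) for c in compute]
    ys = [math.log10(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the line on log-log axes
    intercept = mean_y - slope * mean_x
    return 10 ** intercept, -slope         # (a, b)

# Synthetic "measured runs": assumed constants, chosen only for illustration.
A_TRUE, B_TRUE = 10.0, 0.05
compute = [10.0 ** k for k in range(18, 24)]        # 1e18 .. 1e23 FLOPs
loss = [A_TRUE * c ** (-B_TRUE) for c in compute]

a, b = fit_power_law(compute, loss)
ratio = 10 ** (-b)   # factor by which loss shrinks for every ten-fold increase in compute
print(f"a = {a:.3f}, b = {b:.3f}, loss ratio per 10x compute = {ratio:.3f}")
```

With these assumed constants, every ten-fold increase in compute multiplies the loss by the same factor (roughly 0.89), which is what "predictable" means on a log-log plot.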

Beat 3 · Interactive

Slide the scale

Drag the dial: loss slides down the line, and capabilities switch on as thresholds are crossed.

Figure (interactive): a scale dial slides a loss dot down the descending power-law line, with loss on the vertical axis and log scale on the horizontal. Three capability thresholds, labelled arithmetic, translation, and reasoning, light up as the growing scale crosses each one. Capabilities emerge with scale. Example readout: loss 2.6 · 1 unlocked.

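The dial's behaviour can be sketched in a few lines: loss follows a smooth power law, while each capability is a simple on/off threshold. Every number below (`LOSS_AT_1B`, `EXPONENT`, and the entries in `THRESHOLDS`) is a hypothetical value picked for illustration, and the function names are invented for this sketch. None of it is taken from the page's own script or from measured results.

```python
# Hypothetical constants, for illustration only.
LOSS_AT_1B = 2.6       # assumed loss at a reference scale of 1B parameters
EXPONENT = 0.07        # assumed power-law exponent
THRESHOLDS = {         # assumed scales at which each capability switches on
    "arithmetic": 1e9,
    "translation": 1e10,
    "reasoning": 1e11,
}

def loss_at(params):
    """Smooth part: loss slides down a power law as scale grows."""
    return LOSS_AT_1B * (params / 1e9) ** (-EXPONENT)

def unlocked(params):
    """Abrupt part: a capability is either past its threshold or not."""
    return [name for name, threshold in THRESHOLDS.items() if params >= threshold]

def readout(params):
    """Status line in the style of the dial's readout."""
    return f"loss {loss_at(params):.1f} · {len(unlocked(params))} unlocked"

for params in (1e8, 1e9, 1e10, 1e11):
    print(f"{params:.0e} params -> {readout(params)}")
```

The contrast is the point of the sketch: `loss_at` changes a little with every step of the dial, while `unlocked` changes only when a threshold is crossed.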

Footnotes & further reading

2020 · 2022

Scaling laws

Kaplan et al. (2020) fit loss as a power law in compute, data, and parameters. Chinchilla (2022) later rebalanced the recipe: for a fixed compute budget, train a smaller model on far more tokens.
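As a rough worked example of that rebalancing, the sketch below splits a training budget between parameters and tokens. It rests on two widely quoted approximations that are assumptions here rather than statements from this page: training cost of about 6 FLOPs per parameter per token, and a compute-optimal ratio of about 20 tokens per parameter. The function name `compute_optimal` is hypothetical.

```python
import math

# Assumed rules of thumb, not exact laws.
FLOPS_PER_PARAM_TOKEN = 6.0   # training FLOPs ≈ 6 * parameters * tokens
TOKENS_PER_PARAM = 20.0       # approximate compute-optimal ratio often quoted from Chinchilla

def compute_optimal(flops):
    """Split a FLOP budget into (parameters, tokens) under the two assumptions above."""
    # flops = 6 * N * D and D = 20 * N  =>  N = sqrt(flops / 120)
    params = math.sqrt(flops / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    tokens = TOKENS_PER_PARAM * params
    return params, tokens

# A budget of about the size of a 70B-parameter model trained on 1.4T tokens.
budget = 6.0 * 70e9 * 1.4e12
params, tokens = compute_optimal(budget)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
```

Under these assumptions, parameters and tokens both grow with the square root of the budget, so a 100-fold increase in compute suggests about 10 times the parameters and 10 times the tokens.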

2020

GPT-3, few-shot

175B parameters. With no fine-tuning and no weight updates, it picked up tasks from a handful of examples placed in the prompt: in-context learning that grew stronger with scale.
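A few-shot prompt is just text: worked examples followed by the new question, with the model left to continue the pattern. The sketch below builds one. The `Q:`/`A:` layout, the example pairs, and the helper name `build_few_shot_prompt` are illustrative assumptions, not the format used in the GPT-3 paper.

```python
def build_few_shot_prompt(examples, query):
    """Concatenate worked examples and a final unanswered question into one prompt."""
    lines = []
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
    lines.append(f"Q: {query}")
    lines.append("A:")  # left open for the model to complete
    return "\n".join(lines)

# Hypothetical demonstrations; no model weights change when these are supplied.
examples = [
    ("what is 12 × 3 ?", "36"),
    ("what is 25 × 4 ?", "100"),
]
prompt = build_few_shot_prompt(examples, "what is 47 × 6 ?")
print(prompt)
```

The "learning" happens entirely inside the forward pass over this text, which is why no fine-tuning step appears anywhere in the sketch.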

2021 · 2022

Emergence & “foundation models”

Stanford coined “foundation models” for one pretrained base adapted to many tasks; researchers catalogued “emergent abilities” that appear abruptly past a scale threshold.