Beat 1 · Concrete
Seeing, layer by layer
A pixel image becomes a label by composing features — edges build textures build parts build “cat”.
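As a rough illustration of the first rung of that ladder, the sketch below runs one hand-written edge filter over a tiny made-up image, applies a rectifier, and pools the result into a single score. The image, the kernel, and the helper names (`conv2d_valid`, `relu_map`, `global_max`) are all assumptions for illustration; a trained network learns its filters and stacks many such stages.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu_map(fmap):
    """Keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in fmap]


def global_max(fmap):
    """Pool a feature map down to its strongest response."""
    return max(max(row) for row in fmap)


# Hypothetical 4x4 image: dark left half, bright right half.
image = [[0, 0, 1, 1] for _ in range(4)]

# Hand-written filter that responds to a dark-to-bright vertical edge.
vertical_edge = [[-1, 1],
                 [-1, 1]]

edges = relu_map(conv2d_valid(image, vertical_edge))  # stage 1: edge map
score = global_max(edges)                             # stage 2: "is there an edge?"
```

Here the edge map lights up only in the column where dark meets bright. Deeper stages would take maps like `edges` as their input and respond to combinations of them, which is the sense in which features compose.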
Beat 2 · Abstract
The cliff at 2012
ImageNet top-5 error sat above 25% for years, then AlexNet cut it to about 15% in 2012, and by 2015 it had fallen below the roughly 5% estimated for humans.
Beat 3 · Interactive
Reveal the depth
Pick an image, then reveal one layer at a time — watch the features fire toward the correct label.
Footnotes — the three things that lined up
2012
AlexNet
Krizhevsky, Sutskever & Hinton won ILSVRC 2012 with a deep convolutional net, scoring 15.3% top-5 error against 26.2% for the runner-up: the result that convinced the field that depth wins.
Fuel
ImageNet + GPUs
About 1.2 million labelled training images gave the data; two consumer GPUs (GTX 580s) gave the compute. A dataset finally large enough met an architecture finally deep enough.
Tricks
ReLU & dropout
ReLU activations do not saturate for positive inputs, so gradients keep flowing through many layers; dropout fought overfitting by randomly silencing units during training. Small ideas that made deep networks actually trainable.
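Both tricks fit in a few lines. The sketch below is a minimal illustration, not AlexNet's implementation: it uses "inverted" dropout (rescaling survivors at training time), a common modern variant, whereas the original paper scaled outputs at test time. The function names and the depth of 8 are assumptions, and the gradient comparison ignores weights entirely.

```python
import math
import random


def relu(x):
    """Rectified linear unit: pass positives through, clamp negatives to zero."""
    return [max(0.0, v) for v in x]


def relu_grad(x):
    """Derivative of ReLU: 1 where the input is positive, else 0."""
    return [1.0 if v > 0 else 0.0 for v in x]


def sigmoid_grad(v):
    """Derivative of the logistic sigmoid; it never exceeds 0.25."""
    s = 1.0 / (1.0 + math.exp(-v))
    return s * (1.0 - s)


def dropout(x, p_drop=0.5, training=True, rng=None):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale the survivors by 1 / (1 - p_drop). Identity at test time."""
    if not training or p_drop == 0.0:
        return list(x)
    rng = rng or random.Random()
    keep = 1.0 - p_drop
    return [v / keep if rng.random() < keep else 0.0 for v in x]


# Gradient factor contributed by the activations alone across 8 layers
# (an illustrative depth), taking the best case for the sigmoid.
depth = 8
sigmoid_chain = sigmoid_grad(0.0) ** depth  # shrinks geometrically
relu_chain = 1.0 ** depth                   # active ReLU units pass gradient unchanged

activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
dropped = dropout(activations, p_drop=0.5, rng=random.Random(0))
```

Even at its steepest point the sigmoid multiplies the gradient by 0.25 per layer, so eight layers shrink it by a factor of 0.25 ** 8, while an active ReLU path leaves it untouched. Dropout, for its part, only changes behaviour while `training=True`.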