Era 12 · The Deep Learning Revolution (2012, AlexNet)

Beat 1 · Concrete

Seeing, layer by layer

A network turns a pixel image into a label by composing features: edges build textures, textures build parts, and parts build “cat”.
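To make the composition concrete, here is a minimal sketch in plain Python. It assumes a toy 4x4 image and two hand-picked kernels (`vertical_edge` and `vertical_bar` are illustrative names, not part of any real model); a trained network would learn its kernels from data instead. The first layer finds edges, and the second layer combines those edge responses into a larger "part".

```python
def conv2d(image, kernel):
    """Valid cross-correlation of a 2D list with a 2D kernel (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Layer 1 (hypothetical kernel): responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

# Layer 2 (hypothetical kernel): stacks three edge responses into one "bar".
vertical_bar = [
    [1],
    [1],
    [1],
]

edges = relu(conv2d(image, vertical_edge))
parts = relu(conv2d(edges, vertical_bar))

print(edges)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
print(parts)  # [[0, 6, 0]]
```

The second layer never looks at pixels, only at the first layer's output. That is the sense in which deeper features are built from shallower ones.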

Beat 2 · Abstract

The cliff at 2012

Top-5 error on ImageNet sat above 25% through 2011, then AlexNet cut it to about 15% in 2012, and later networks pushed it below the roughly 5% estimated for human labellers.

Beat 3 · Interactive

Reveal the depth

Pick an image, then reveal one layer at a time — watch the features fire toward the correct label.


Footnotes — the three things that lined up

2012

AlexNet

Krizhevsky, Sutskever & Hinton won ILSVRC 2012 by a landslide with a deep convolutional net, posting about 15% top-5 error against roughly 26% for the runner-up. It was the result that convinced the field that depth wins.

Fuel

ImageNet + GPUs

About 1.2 million labelled training images gave the data; two consumer GPUs gave the compute. A dataset finally large enough met an architecture finally deep enough.

Tricks

ReLU & dropout

ReLU activations let gradients flow through many layers; dropout fought overfitting. Small ideas that made deep networks actually trainable.
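Both tricks fit in a few lines. The sketch below is illustrative only. It compares how a gradient shrinks through ten layers under a sigmoid versus an active ReLU, using the best-case local slope for each. It then applies "inverted" dropout, the variant that rescales surviving activations during training; this is assumed here for simplicity, and the original AlexNet formulation may differ. The helper names (`chain_gradient`, `dropout`) are hypothetical.

```python
import math
import random


def sigmoid_grad(x):
    """Derivative of the logistic sigmoid; it peaks at 0.25 when x == 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def relu_grad(x):
    """Derivative of ReLU: 1 for active units, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def chain_gradient(local_grad, depth):
    """Gradient scale after backpropagating through `depth` identical layers."""
    return local_grad ** depth


def dropout(activations, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale the survivors so the expected activation is unchanged."""
    if not training or p_drop == 0.0:
        return list(activations)
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Best case for sigmoid (input 0) versus an active ReLU, across 10 layers.
sigmoid_scale = chain_gradient(sigmoid_grad(0.0), 10)
relu_scale = chain_gradient(relu_grad(1.0), 10)
print(sigmoid_scale)  # about 9.5e-07: the gradient has all but vanished
print(relu_scale)     # 1.0: the gradient passes through unchanged

rng = random.Random(0)  # fixed seed so the run is repeatable
dropped = dropout([1.0] * 8, p_drop=0.5, rng=rng)
print(dropped)  # each entry is either 0.0 (dropped) or 2.0 (kept, rescaled)
```

At inference time, `training=False` returns the activations untouched, which is why the training-time rescaling is needed to keep the two modes consistent.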