ResNet

Residual networks
Up to 152 layers (ResNet-152)
Skip connection every two layers (every three in the bottleneck blocks of the deeper variants)
- Residual block
Later layers can learn the identity function
- Skip connections help: the block outputs F(x) + x, so pushing F(x) to zero gives the identity
- A deeper network should be at least as good as a shallower one, since the extra layers can do very little
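
A minimal sketch of the residual block idea, assuming dense layers on plain Python lists in place of the convolutions and batch normalization a real ResNet uses. The names matvec, relu and residual_block are illustrative, not from any library. It shows that with zero weights the block reduces to the identity.

```python
# Simplified two-layer residual block: out = relu(F(x) + x).
# Assumption: dense (matrix) layers stand in for ResNet's conv + batch-norm layers.

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def relu(v):
    """Element-wise max(0, v)."""
    return [max(0.0, vi) for vi in v]

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x) with F(x) = w2 @ relu(w1 @ x)."""
    fx = matvec(w2, relu(matvec(w1, x)))
    return relu([f + xi for f, xi in zip(fx, x)])

# With all-zero weights F(x) = 0, so the block passes x through unchanged.
# x is non-negative here, as it would be after a preceding ReLU.
zeros = [[0.0, 0.0, 0.0] for _ in range(3)]
x = [1.0, 2.0, 3.0]
out = residual_block(x, zeros, zeros)
print(out)  # [1.0, 2.0, 3.0]
```

The point of the design is that "do nothing" is the default behaviour of the block: the layers only have to learn a correction F(x) on top of x, not the full mapping.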
Vanishing gradient
- Skip connections give gradients shortcut paths back to earlier layers
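
A rough scalar illustration of the shortcut path, under the assumption that each layer contributes one small local derivative d. A plain layer y = f(x) multiplies the backpropagated gradient by d, while a residual layer y = x + f(x) multiplies it by 1 + d. The function name chain_gradient and the values 0.01 and 20 are made up for the example.

```python
# Toy model of backpropagation through a stack of scalar layers.
# Assumption: every layer has the same small local derivative d = f'(x).

def chain_gradient(local_derivs, residual):
    """Multiply per-layer gradient factors from output back to input.

    Plain layer    y = f(x):     factor d
    Residual layer y = x + f(x): factor 1 + d (the 1 is the skip path)
    """
    grad = 1.0
    for d in local_derivs:
        grad *= (1.0 + d) if residual else d
    return grad

local_derivs = [0.01] * 20  # 20 layers, each with a small derivative
plain = chain_gradient(local_derivs, residual=False)
skip = chain_gradient(local_derivs, residual=True)
print(plain, skip)
```

In this toy setting the plain product collapses toward zero, while the residual product stays close to 1 because each factor keeps the identity term.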
Accuracy saturation (degradation)
- Adding more layers to an already suitably deep plain network increases training error, so the cause is not overfitting

Design

[Figures: resnet-arch, resnet-arch2]