Weight Init
- Randomly
- Gaussian noise with mean = 0
- Small network (e.g. 8 layers): a fixed sigma is fine
- Sigma too large: activations and gradients can blow up as depth grows
- Sigma too small: gradient won't propagate back through many layers
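
The effect of a fixed sigma at different depths can be sketched with a toy experiment. This is only an illustration under stated assumptions: the layers are purely linear (no nonlinearity), the width of 32 and the sigma values 0.15 and 0.21 are arbitrary choices, and `signal_scale` is a hypothetical helper name. It tracks the forward signal; in a linear stack the backward pass multiplies gradients by the same weight matrices, so the scaling trend is comparable.

```python
import math
import random

def rms(v):
    """Root-mean-square magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v) / len(v))

def signal_scale(depth, sigma, width=32, seed=0):
    """Ratio of output RMS to input RMS after `depth` random linear layers."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(width)]
    start = rms(x)
    for _ in range(depth):
        # Each weight is zero-mean Gaussian noise with the same fixed sigma.
        w = [[rng.gauss(0.0, sigma) for _ in range(width)] for _ in range(width)]
        x = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
    return rms(x) / start

# Compare a shallow (8-layer) and a deep (50-layer) stack for two fixed sigmas.
for sigma in (0.15, 0.21):
    for depth in (8, 50):
        print(f"sigma={sigma} depth={depth} scale={signal_scale(depth, sigma):.3g}")
```

With these settings the 8-layer stack keeps the signal within roughly an order of magnitude of its input, while the 50-layer stack shrinks it sharply for the smaller sigma and grows it sharply for the larger one.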
Xavier System
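- $\sigma = \frac{1}{n_{in} + n_{out}}$
- $\sigma = \frac{2}{n}$, where $n = \text{filter size} \times n_{out}$
- $n_{in}$ and $n_{out}$ refer to the number of image channels into and out of the layer
- As published, both rules are stated for the variance: Glorot and Bengio (Xavier) use $\sigma^2 = \frac{2}{n_{in} + n_{out}}$, and He et al. use $\sigma^2 = \frac{2}{n}$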
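
A minimal sketch of turning layer sizes into a sigma and sampling weights from it. It assumes the variance forms as published (Glorot and Bengio: variance $2 / (n_{in} + n_{out})$; He et al.: variance $2 / n$), so each sigma is the square root of that variance. The function names and the example layer (3x3 filters, 64 channels in, 128 out) are hypothetical, chosen only for illustration.

```python
import math
import random

def xavier_sigma(n_in, n_out):
    """Std dev whose square is the Glorot/Bengio variance 2 / (n_in + n_out)."""
    return math.sqrt(2.0 / (n_in + n_out))

def he_sigma(filter_size, n_out):
    """Std dev whose square is 2 / n, with n = filter_size * n_out as in the notes."""
    n = filter_size * n_out
    return math.sqrt(2.0 / n)

def init_weights(count, sigma, seed=0):
    """Draw `count` weights from zero-mean Gaussian noise with the given sigma."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, sigma) for _ in range(count)]

# Hypothetical layer: 3x3 filters (filter size 9), 64 channels in, 128 out.
sigma_x = xavier_sigma(64, 128)
sigma_h = he_sigma(9, 128)
weights = init_weights(9 * 64 * 128, sigma_h)
print(f"xavier sigma={sigma_x:.4f}, he sigma={sigma_h:.4f}, n_weights={len(weights)}")
```

Unlike a fixed sigma, both rules shrink the noise as the layer gets wider, which is what keeps the signal scale roughly constant from layer to layer.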