Weight Init

  • Randomly
    • Gaussian noise with mean = 0
    • Small network
      • Fixed sigma is fine
        • 0.01
      • E.g. 8 layers
        • AlexNet
    • Sigma too large
      • Won't converge
    • Sigma too small
      • Gradient won't propagate back through many layers
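
A minimal sketch of the fixed-sigma scheme above, assuming NumPy and a toy stack of fully connected ReLU layers. The helper names `gaussian_init` and `output_scale`, the layer width of 256, and the batch size are illustrative choices, not from the notes. The forward activation scale is used here as a rough proxy for how a signal shrinks or blows up across layers. Gradients pass through the same weight matrices on the way back, so they are affected in a similar way.

```python
import numpy as np

def gaussian_init(shape, sigma=0.01, rng=None):
    """Draw weights from a zero-mean Gaussian with a fixed standard deviation."""
    rng = np.random.default_rng(0) if rng is None else rng
    return rng.normal(loc=0.0, scale=sigma, size=shape)

def output_scale(sigma, n_layers=8, width=256, seed=0):
    """Push random inputs through n_layers ReLU layers and return the
    standard deviation of the final activations (hypothetical toy network)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(64, width))          # batch of 64 random inputs
    for _ in range(n_layers):
        w = gaussian_init((width, width), sigma, rng)
        x = np.maximum(x @ w, 0.0)            # linear layer followed by ReLU
    return float(x.std())

if __name__ == "__main__":
    for sigma in (0.01, 0.1, 1.0):
        print(f"sigma={sigma}: final activation std = {output_scale(sigma):.3e}")
```

In this toy setting the final activation scale grows with sigma by many orders of magnitude over 8 layers, which is the sensitivity the bullets above describe. How small a sigma a real network tolerates depends on its width, depth, and nonlinearity.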

Xavier System

$\sigma=\frac{1}{n_{in}+n_{out}}$

$\sigma=\sqrt{2/n}$
  • Where $n=\text{filter size}\times n_{out}$
  • And $n_{in}$ and $n_{out}$ refer to the number of image channels into and out of the layer
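
A short sketch of the sqrt(2/n) rule for a single convolutional layer, assuming NumPy. It takes "filter size" to mean the number of spatial positions in the kernel (height × width), which is an interpretation rather than something the notes state. The function names and the (n_out, n_in, height, width) weight layout are illustrative choices.

```python
import math
import numpy as np

def conv_sigma(kernel_h, kernel_w, n_out):
    """sigma = sqrt(2 / n), with n = filter size * n_out.
    Filter size is assumed to be kernel_h * kernel_w."""
    n = kernel_h * kernel_w * n_out
    return math.sqrt(2.0 / n)

def init_conv_weights(kernel_h, kernel_w, n_in, n_out, seed=0):
    """Sample zero-mean Gaussian conv weights with the sigma above.
    The (n_out, n_in, kernel_h, kernel_w) layout is an assumption."""
    rng = np.random.default_rng(seed)
    sigma = conv_sigma(kernel_h, kernel_w, n_out)
    return rng.normal(0.0, sigma, size=(n_out, n_in, kernel_h, kernel_w))

if __name__ == "__main__":
    # Example: 3x3 filters, 3 input channels, 64 output channels.
    w = init_conv_weights(3, 3, n_in=3, n_out=64)
    print(w.shape, conv_sigma(3, 3, 64), float(w.std()))
```

Unlike the fixed value of 0.01, this sigma shrinks as the filters get larger or the layer gets wider, so it does not have to be re-tuned by hand for each layer.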