WikiAINeural NetworksWeight InitWeight InitRandomlyGaussian noise with mean = 0Small networkFixed sigma is fine0.01E.g. 8 layersAlexNetToo largeWont convergeToo smallGradient wont propagate back many layersXavier System σ=1nin+nout\sigma=\frac 1 {n_{in}+n_{out}}σ=nin+nout1orσ=2/n\sigma=\sqrt{2/n}σ=2/nWhere n=filter size×noutn=\text{filter size}\times n_{out}n=filter size×noutAnd ninn_{in}nin and noutn_{out}nout refer to number of image channels in and out of the layerTraining