Batch Normalization

1. Implementing Batch Normalization in TensorFlow

Batch Normalization operates on the input to each activation function, so that within every batch those activation-function inputs are distributed as a standard normal with mean 0 and standard deviation 1.

  • Batch normalization enables the use of higher learning rates, acts as a regularizer, and can speed up training by as much as 14 times.
  • To solve this problem, the BN2015 paper proposes batch normalizing the input to the activation function of each neuron (e.g., each sigmoid or ReLU) during training, so that across each training batch the input to the activation function has a mean of 0 and a variance of 1. For example, applying batch normalization to the activation σ(Wx+b) yields σ(BN(Wx+b)), where BN is the batch normalizing transform, as sketched below.
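
As an illustration, the batch normalizing transform is just a per-feature standardization of the mini-batch followed by a learned scale and shift. A minimal NumPy sketch (the names batch_norm, gamma, beta, and the eps stability constant are illustrative, not from the source):

```python
import numpy as np

def batch_norm(z, gamma, beta, eps=1e-5):
    """Batch normalizing transform BN(z) for pre-activations z = Wx + b."""
    mu = z.mean(axis=0)                    # per-feature batch mean
    var = z.var(axis=0)                    # per-feature batch variance
    z_hat = (z - mu) / np.sqrt(var + eps)  # mean 0, variance 1 within the batch
    return gamma * z_hat + beta            # learned scale and shift

# sigma(BN(Wx + b)) instead of sigma(Wx + b)
rng = np.random.default_rng(0)
z = rng.normal(size=(32, 100))             # a batch of 32 pre-activation vectors
gamma, beta = np.ones(100), np.zeros(100)
a = 1.0 / (1.0 + np.exp(-batch_norm(z, gamma, beta)))  # sigmoid activation
```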

2. Using TensorFlow’s Batch Normalization Correctly

  • Use the training parameter of the batch_normalization function.
  • Update the moving averages by evaluating the update ops manually or by adding them as a control dependency (see the sketch below).
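
A minimal sketch of both points, assuming the TensorFlow 1.x graph-mode API (tf.layers.batch_normalization is the function whose training parameter is referred to above; the layer sizes, optimizer, and learning rate are illustrative):

```python
import tensorflow as tf  # TensorFlow 1.x graph-mode API assumed

is_training = tf.placeholder(tf.bool, name='is_training')
x = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.int64, [None])

# `training` switches between batch statistics (training)
# and the learned moving averages (inference).
h = tf.layers.dense(x, 100, activation=None)
h = tf.layers.batch_normalization(h, training=is_training)
h = tf.nn.relu(h)
logits = tf.layers.dense(h, 10)

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))

# The moving-average update ops live in the UPDATE_OPS collection; attach
# them as a control dependency so they run with every training step.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
```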
