
ResNet: 100% accuracy during training, but 33% prediction accuracy on the same data


It's because of the batch normalization layers.

In the training phase, the batch is normalized w.r.t. its own mean and variance. In the testing phase, however, the batch is normalized w.r.t. the moving averages of the previously observed means and variances.
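
A minimal numpy sketch of the two modes (my own illustration, not the answer's code; it ignores the learned gamma/beta parameters and uses Keras' defaults of momentum=0.99 and epsilon=1e-3):

import numpy as np

def bn_train(x, state, momentum=0.99, eps=1e-3):
    # Training mode: normalize with the current batch's own statistics,
    # and update the moving averages as a side effect.
    mean, var = x.mean(axis=0), x.var(axis=0)
    state['mean'] = momentum * state['mean'] + (1 - momentum) * mean
    state['var'] = momentum * state['var'] + (1 - momentum) * var
    return (x - mean) / np.sqrt(var + eps)

def bn_infer(x, state, eps=1e-3):
    # Inference mode: normalize with the stored moving averages instead,
    # which in Keras start out at mean=0 and variance=1.
    return (x - state['mean']) / np.sqrt(state['var'] + eps)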

Now this is a problem when the number of observed batches is small (e.g., 5 in your example), because in the BatchNormalization layer, moving_mean is initialized to 0 and moving_variance is initialized to 1 by default.

Given also that the default momentum is 0.99, you'll need to update the moving averages many times before they converge to the "real" mean and variance.
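
As a back-of-the-envelope check (my own arithmetic, not from the answer): if the batch statistics stay roughly constant, then after n updates the moving average still retains a fraction momentum**n of its (wrong) initial value:

# moving = momentum**n * initial + (1 - momentum**n) * batch_stat,
# so momentum**n is the fraction of the initial value still remaining.
momentum = 0.99
for n in (5, 100, 1000):
    print(n, momentum ** n)  # 5 -> ~0.95, 100 -> ~0.37, 1000 -> ~4.3e-05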

That's why the prediction is wrong in the early stage, but is correct after 1000 epochs.

You can verify it by forcing the BatchNormalization layers to operate in "training mode".

During training, the accuracy is 1 and the loss is close to zero:

model.fit(imgs, y, epochs=5, shuffle=True)
Epoch 1/5
3/3 [==============================] - 19s 6s/step - loss: 1.4624 - acc: 0.3333
Epoch 2/5
3/3 [==============================] - 0s 63ms/step - loss: 0.6051 - acc: 0.6667
Epoch 3/5
3/3 [==============================] - 0s 57ms/step - loss: 0.2168 - acc: 1.0000
Epoch 4/5
3/3 [==============================] - 0s 56ms/step - loss: 1.1921e-07 - acc: 1.0000
Epoch 5/5
3/3 [==============================] - 0s 53ms/step - loss: 1.1921e-07 - acc: 1.0000

Now if we evaluate the model, we'll observe high loss and low accuracy because after 5 updates, the moving averages are still pretty close to the initial values:

model.evaluate(imgs, y)
3/3 [==============================] - 3s 890ms/step
[10.745396614074707, 0.3333333432674408]

However, if we manually specify the "learning phase" variable and let the BatchNormalization layers use the "real" batch mean and variance, the result becomes the same as what's observed in fit().

import numpy as np

sample_weights = np.ones(3)       # one weight per sample (3 images)
learning_phase = 1                # 1 means "training": use batch statistics
ins = [imgs, y, sample_weights, learning_phase]
model.test_function(ins)          # returns [loss, accuracy]
[1.192093e-07, 1.0]
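
If you are on a more recent tf.keras (TF 2.x), model.test_function is no longer public API; a rough equivalent (an adaptation on my part, not the original answer's code) is to call the model with training=True so that BatchNormalization uses the batch statistics:

import numpy as np
import tensorflow as tf

# Forward pass in "training mode"; note this also updates the
# moving averages as a side effect.
preds = model(imgs, training=True)
loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(y, preds))
acc = np.mean(np.argmax(preds, axis=-1) == np.argmax(y, axis=-1))
print(float(loss), acc)  # should match the metrics reported by fit()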


It's also possible to verify it by changing the momentum to a smaller value.

For example, by adding momentum=0.01 to all the batch norm layers in ResNet50, the prediction after 20 epochs is:

model.predict(imgs)
array([[  1.00000000e+00,   1.34882026e-08,   3.92139575e-22],
       [  0.00000000e+00,   1.00000000e+00,   0.00000000e+00],
       [  8.70998792e-06,   5.31159838e-10,   9.99991298e-01]], dtype=float32)
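
For reference, a sketch of one way to set that momentum (assuming tf.keras and the stock ResNet50 application; the original answer predates this API, so treat it as an adaptation rather than the author's exact code):

import tensorflow as tf
from tensorflow.keras.applications import ResNet50

model = ResNet50(weights=None, classes=3)
for layer in model.layers:
    if isinstance(layer, tf.keras.layers.BatchNormalization):
        layer.momentum = 0.01  # default is 0.99

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['acc'])

With momentum this low the moving statistics are noisy, but on a tiny fixed dataset they track the batch statistics almost immediately, which is why the predictions line up with the training labels after only 20 epochs.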