There are a lot of ways you can end up with non-finite results.
But optimizers, especially simple ones like gradient descent, can diverge if the learning rate is too high.
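A minimal sketch of why this happens, using plain-Python gradient descent on f(x) = x²; the function and learning rates are illustrative, not from the original question:

```python
# Gradient descent on f(x) = x**2, whose gradient is 2*x.
# The update is x <- x - lr * 2*x = x * (1 - 2*lr), which shrinks toward 0
# when |1 - 2*lr| < 1 but blows up for any lr > 1.0.
def run(lr, steps=5, x=1.0):
    for _ in range(steps):
        x = x - lr * 2 * x
    return x

print(run(0.1))  # 0.32768 -- converging toward the minimum at 0
print(run(1.5))  # -32.0   -- diverging: 1 -> -2 -> 4 -> -8 -> 16 -> -32
```

The same mechanism, with the loss surface of a real network in place of x², is how a too-high learning rate turns into NaN/Inf values.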
Have you tried simply dividing your learning rate by 10/100/1000? Or normalizing by pixels * batch_size to get the average error per pixel?
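A sketch of both suggestions in TensorFlow 1.x; the tensor names, shapes, and learning rate here are hypothetical stand-ins, not taken from the original question:

```python
import tensorflow as tf

# Hypothetical setup: y_true and y_pred are [batch_size, height, width]
# image tensors produced elsewhere by your model.
batch_size, height, width = 32, 64, 64
y_true = tf.placeholder(tf.float32, [batch_size, height, width])
y_pred = tf.placeholder(tf.float32, [batch_size, height, width])

# A raw sum-of-squares loss grows with both image size and batch size,
# so its gradients can be enormous. Dividing by pixels * batch_size
# gives the average error per pixel, independent of those sizes.
pixels = height * width
raw_loss = tf.reduce_sum(tf.square(y_true - y_pred))
loss = raw_loss / (pixels * batch_size)
# Equivalent shortcut: loss = tf.reduce_mean(tf.square(y_true - y_pred))

# A learning rate divided by 100 relative to a diverging run, e.g. 1e-2 -> 1e-4.
train_op = tf.train.GradientDescentOptimizer(learning_rate=1e-4).minimize(loss)
```

Normalizing the loss and shrinking the learning rate are two handles on the same quantity (the effective step size), so you usually only need to fix one of them.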
Or one of the more advanced optimizers? For example tf.train.AdamOptimizer() with default options.
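For instance, assuming the normalized `loss` from the sketch above:

```python
# Adam adapts the step size per parameter, so it is far more forgiving
# of a poorly tuned learning rate than plain gradient descent.
# Defaults: learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08.
optimizer = tf.train.AdamOptimizer()
train_op = optimizer.minimize(loss)
```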