Keras model fails to decrease loss

Your code has a single critical problem: dimensionality shuffling. The one dimension you should never touch is the batch dimension - as it, by definition, holds independent samples of your data. In your first reshape, you mix feature dimensions with the batch dimension:

Tensor("input_1:0", shape=(12, 6, 16, 16, 16, 3), dtype=float32)
Tensor("lambda/Reshape:0", shape=(72, 16, 16, 16, 3), dtype=float32)

This is like feeding 72 independent samples of shape (16,16,16,3). Further layers suffer similar problems.
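
To make the mixing concrete, here is a minimal sketch (an illustration, not code from the question) contrasting a raw reshape, which folds the extra dimension of 6 into the batch axis, with Keras' Reshape layer, which by design never touches the batch dimension:

import tensorflow as tf
from tensorflow.keras.layers import Reshape

x = tf.zeros((12, 6, 16, 16, 16, 3))       # (batch, extra dim, D, H, W, C)

# Raw reshape: merges the batch dim (12) with the extra dim (6) -> 72 "independent" samples
bad = tf.reshape(x, (-1, 16, 16, 16, 3))
print(bad.shape)                            # (72, 16, 16, 16, 3)

# Reshape layer: only non-batch dims are touched, so the batch dim (12) is preserved
ok = Reshape((6 * 16, 16, 16, 3))(x)
print(ok.shape)                             # (12, 96, 16, 16, 3)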


Solutions:
  • Instead of reshaping every step of the way (for which you should use Reshape), shape your existing Conv and pooling layers to make everything work out directly.
  • Aside from the input and output layers, it's better to title each layer something short and simple - no clarity is lost, as each line is well-defined by the layer name.
  • GlobalAveragePooling is intended to be the final layer, as it collapses the feature dimensions - in your case, like so: (12,16,16,16,3) --> (12,3); a Conv afterwards serves little purpose.
  • Per above, I replaced Conv1D with Conv3D.
  • Unless you're using variable batch sizes, always go for batch_shape= vs. shape=, as you can inspect layer dimensions in full (very helpful).
  • Your true batch_size here is 6, deduced from your comment reply.
  • kernel_size=1 and (especially) filters=1 is a very weak convolution; I replaced it accordingly - you can revert if you wish.
  • If you have only 2 classes in your intended application, I advise using Dense(1, 'sigmoid') with binary_crossentropy loss; see the sketch after this list.
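
For the last bullet, a minimal sketch of what the binary variant could look like - this is an illustration, not part of the original answer, reusing the same Conv3D backbone as the full model below; labels would then be single 0/1 scalars instead of one-hot pairs:

from tensorflow.keras.layers import Input, Conv3D, GlobalAveragePooling3D, Dense
from tensorflow.keras.models import Model

def create_binary_model(batch_size, input_shape):
    # Same backbone as create_model below, but a single sigmoid unit on top
    ipt = Input(batch_shape=(batch_size, *input_shape))
    x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                             activation='relu', padding='same')(ipt)
    x   = Conv3D(filters=8,  kernel_size=4, strides=(2, 2, 2),
                             activation='relu', padding='same')(x)
    x   = GlobalAveragePooling3D()(x)
    out = Dense(units=1, activation='sigmoid')(x)

    model = Model(inputs=ipt, outputs=out)
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    return model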

As a last note: you can toss all of the above out except for the dimensionality shuffling advice, and still get perfect train set performance; it was the root of the problem.

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv3D, GlobalAveragePooling3D, Dense
from tensorflow.keras.models import Model


def create_model(batch_size, input_shape):

    ipt = Input(batch_shape=(batch_size, *input_shape))
    x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                             activation='relu', padding='same')(ipt)
    x   = Conv3D(filters=8,  kernel_size=4, strides=(2, 2, 2),
                             activation='relu', padding='same')(x)
    x   = GlobalAveragePooling3D()(x)  # collapse spatial dims: (6, 4, 4, 4, 8) -> (6, 8)
    out = Dense(units=2, activation='softmax')(x)

    return Model(inputs=ipt, outputs=out)

BATCH_SIZE = 6
INPUT_SHAPE = (16, 16, 16, 3)
BATCH_SHAPE = (BATCH_SIZE, *INPUT_SHAPE)

def generate_fake_data():
    # 240 synthetic samples: roughly the first half all-ones labelled [0., 1.], the rest all-zeros labelled [1., 0.]
    for j in range(1, 240 + 1):
        if j < 120:
            yield np.ones(INPUT_SHAPE), np.array([0., 1.])
        else:
            yield np.zeros(INPUT_SHAPE), np.array([1., 0.])


def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                 output_types=(tf.float32,
                                               tf.float32),
                                 output_shapes=(tf.TensorShape(INPUT_SHAPE),
                                                tf.TensorShape([2])))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
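
The compile/fit calls are not shown in the excerpt above; below is a minimal training sketch that would produce output like the log that follows, assuming the Adam optimizer with categorical_crossentropy and 240 // BATCH_SIZE = 40 steps per epoch:

model = create_model(BATCH_SIZE, INPUT_SHAPE)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])

train_dataset = make_tfdataset(for_training=True)
model.fit(train_dataset,
          epochs=500,
          steps_per_epoch=240 // BATCH_SIZE)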


Results:

Epoch 28/500
40/40 [==============================] - 0s 3ms/step - loss: 0.0808 - acc: 1.0000