Your code has a single critical problem: dimensionality shuffling. The one dimension you should never touch is the batch dimension, since by definition it holds independent samples of your data. In your first reshape, you mix feature dimensions with the batch dimension:

Tensor("input_1:0", shape=(12, 6, 16, 16, 16, 3), dtype=float32)
Tensor("lambda/Reshape:0", shape=(72, 16, 16, 16, 3), dtype=float32)


This amounts to feeding 72 independent samples of shape (16, 16, 16, 3) instead of 12 samples of shape (6, 16, 16, 16, 3). Later layers suffer from similar problems.
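To see concretely why this is harmful, here is a small NumPy sketch. It is illustrative only: the constant-fill trick and array names are not from the original model. It relies on the fact that `np.reshape` and TensorFlow's reshape both use row-major order. Each original sample is filled with its own index, so we can track where its values end up after the reshape.

```python
import numpy as np

# Illustrative batch: 12 samples, each holding 6 sub-volumes of shape (16, 16, 16, 3).
# Sample i is filled with the constant i so we can track where its values land.
batch = np.stack([np.full((6, 16, 16, 16, 3), i, dtype=np.float32) for i in range(12)])

# The problematic reshape: folds the size-6 axis into the batch axis.
merged = batch.reshape(72, 16, 16, 16, 3)

# Rows 6*i .. 6*i+5 all come from original sample i, but downstream layers now
# treat them as 6 unrelated examples, and there are 72 of them vs. 12 labels.
for i in range(12):
    assert np.all(merged[6 * i:6 * i + 6] == i)

# A batch-safe alternative: merge only feature axes and leave axis 0 alone.
kept = batch.reshape(12, 6 * 16, 16, 16, 3)
for i in range(12):
    assert np.all(kept[i] == i)  # each row is still exactly one original sample

print(batch.shape, merged.shape, kept.shape)
# (12, 6, 16, 16, 16, 3) (72, 16, 16, 16, 3) (12, 96, 16, 16, 3)
```

Whether merging those particular feature axes makes sense for a Conv3D model is a separate question; the point is only that axis 0 must keep meaning "one sample".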

  • Instead of reshaping at every step (and where you must reshape, use the Reshape layer rather than a Lambda), size your existing Conv and pooling layers so the dimensions work out directly.
  • Aside from the input and output layers, it's better to give each intermediate tensor a short, simple name (e.g. x); no clarity is lost, as each line is already well-defined by its layer class
  • GlobalAveragePooling belongs at the end of the feature-extraction stack, right before the Dense output, since it collapses the spatial dimensions and keeps only the channels; in your case: (12,16,16,16,3) --> (12,3). A Conv layer afterwards serves little purpose
  • Per the above, I replaced Conv1D with Conv3D
  • Unless you're using variable batch sizes, prefer batch_shape= over shape=, as it lets you inspect layer dimensions in full, batch axis included (very helpful)
  • Your true batch_size here is 6, judging from your reply in the comments
  • kernel_size=1 and (especially) filters=1 make for a very weak convolution; I replaced them accordingly, but you can revert if you wish
  • If your intended application has only 2 classes, I advise using Dense(1, 'sigmoid') with binary_crossentropy loss
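The reasoning behind the last bullet: with two classes, a 2-unit softmax has one redundant degree of freedom, because its output depends only on the difference of its two logits. Below is a pure-Python sketch that checks this numerically. The function names are hypothetical and unrelated to Keras internals. It shows that a softmax over logits (z0, z1) gives the same class-1 probability as a sigmoid applied to z1 - z0.

```python
import math

def softmax_p1(z0, z1):
    """Class-1 probability from a 2-unit softmax over logits (z0, z1)."""
    m = max(z0, z1)  # subtract the max for numerical stability
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e1 / (e0 + e1)

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def crossentropy(p1, y1):
    """Binary crossentropy for class-1 probability p1 and label y1 in {0, 1}.

    With two classes this equals the categorical crossentropy of the one-hot
    label [1 - y1, y1] against the probabilities [1 - p1, p1].
    """
    return -(y1 * math.log(p1) + (1 - y1) * math.log(1 - p1))

# softmax([z0, z1])[1] = 1 / (1 + exp(z0 - z1)) = sigmoid(z1 - z0)
for z0, z1 in [(0.0, 0.0), (1.5, -2.0), (-3.0, 4.0), (10.0, 9.5)]:
    p_soft = softmax_p1(z0, z1)
    p_sig = sigmoid(z1 - z0)
    print(f"softmax={p_soft:.6f} sigmoid={p_sig:.6f} loss(y=1)={crossentropy(p_sig, 1):.6f}")
```

So Dense(1, 'sigmoid') with binary_crossentropy models the same quantity as Dense(2, 'softmax') with categorical_crossentropy, using half the output weights. Note that the labels then become plain 0/1 values instead of one-hot pairs.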

As a final note: you can toss out all of the above except the dimensionality-shuffling advice and still get perfect train-set performance; that was the root of the problem.

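import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv3D, GlobalAveragePooling3D, Dense
from tensorflow.keras.models import Model

def create_model(batch_size, input_shape):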

    ipt = Input(batch_shape=(batch_size, *input_shape))
    x   = Conv3D(filters=64, kernel_size=8, strides=(2, 2, 2),
                             activation='relu', padding='same')(ipt)
    x   = Conv3D(filters=8,  kernel_size=4, strides=(2, 2, 2),
                             activation='relu', padding='same')(x)
    x   = GlobalAveragePooling3D()(x)
    out = Dense(units=2, activation='softmax')(x)

    return Model(inputs=ipt, outputs=out)

INPUT_SHAPE = (16, 16, 16, 3)
BATCH_SIZE = 6  # true batch size, per the note above

def generate_fake_data():
    for j in range(1, 240 + 1):
        # first half: all-ones volumes labeled [0, 1]; second half: all-zeros labeled [1, 0]
        if j < 120:
            yield np.ones(INPUT_SHAPE), np.array([0., 1.])
        else:
            yield np.zeros(INPUT_SHAPE), np.array([1., 0.])

def make_tfdataset(for_training=True):
    dataset = tf.data.Dataset.from_generator(generator=lambda: generate_fake_data(),
                                             output_types=(tf.float32, tf.float32),
                                             output_shapes=(INPUT_SHAPE, (2,)))
    dataset = dataset.repeat()
    if for_training:
        dataset = dataset.shuffle(buffer_size=1000)
    dataset = dataset.batch(BATCH_SIZE)
    dataset = dataset.prefetch(tf.data.experimental.AUTOTUNE)
    return dataset
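To double-check that the batch axis survives every layer of create_model, you can trace the shapes by hand. Here is a small pure-Python sketch of that. The helper names are hypothetical, and it assumes channels-last data and BATCH_SIZE = 6 as mentioned above. It applies the Keras padding='same' rule, under which each spatial dimension becomes ceil(dim / stride) regardless of kernel_size.

```python
import math

def conv3d_same(shape, filters, strides):
    """Output shape of Conv3D(padding='same'), channels-last.

    Each spatial dim becomes ceil(dim / stride); kernel_size does not
    affect the output shape under 'same' padding.
    """
    batch, *spatial, _channels = shape
    out_spatial = tuple(math.ceil(d / s) for d, s in zip(spatial, strides))
    return (batch, *out_spatial, filters)

def global_avg_pool3d(shape):
    """GlobalAveragePooling3D averages over the 3 spatial dims, keeping channels."""
    return (shape[0], shape[-1])

def dense(shape, units):
    """Dense on a 2D input replaces the last dim with `units`."""
    return (shape[0], units)

BATCH_SIZE = 6
INPUT_SHAPE = (16, 16, 16, 3)

shapes = [("Input", (BATCH_SIZE, *INPUT_SHAPE))]
shapes.append(("Conv3D(64, strides=2)", conv3d_same(shapes[-1][1], 64, (2, 2, 2))))
shapes.append(("Conv3D(8, strides=2)", conv3d_same(shapes[-1][1], 8, (2, 2, 2))))
shapes.append(("GlobalAveragePooling3D", global_avg_pool3d(shapes[-1][1])))
shapes.append(("Dense(2, softmax)", dense(shapes[-1][1], 2)))

for name, shape in shapes:
    print(f"{name:24s} {shape}")
# Input                    (6, 16, 16, 16, 3)
# Conv3D(64, strides=2)    (6, 8, 8, 8, 64)
# Conv3D(8, strides=2)     (6, 4, 4, 4, 8)
# GlobalAveragePooling3D   (6, 8)
# Dense(2, softmax)        (6, 2)
```

Because batch_shape= is set, model.summary() on the real model should report these same shapes, with 6 rather than None on the batch axis.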


Epoch 28/500
40/40 [==============================] - 0s 3ms/step - loss: 0.0808 - acc: 1.0000