且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

使用keras功能API构建(预训练)CNN + LSTM网络

更新时间:2022-04-11 01:07:29

此处是构建模型分类视频序列的正确方法。请注意,我将一个Model实例包装到TimeDistributed中。先前已构建此模型以分别从每个帧提取特征。在第二部分中,我们处理帧序列

here the correct way to build a model to classify video sequences. Note that I wrap into TimeDistributed a model instance. This model was previously build to extract features from each frame individually. In the second part, we deal the frame sequences

frames, channels, rows, columns = 5,3,224,224

video = Input(shape=(frames,
                     rows,
                     columns,
                     channels))
cnn_base = VGG16(input_shape=(rows,
                              columns,
                              channels),
                 weights="imagenet",
                 include_top=False)
cnn_base.trainable = False

cnn_out = GlobalAveragePooling2D()(cnn_base.output)
cnn = Model(cnn_base.input, cnn_out)
encoded_frames = TimeDistributed(cnn)(video)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(10, activation="softmax")(hidden_layer)

model = Model(video, outputs)
model.summary()

如果要使用VGG 1x4096 emb re演示文稿,您可以轻松完成:

if you want to use the VGG 1x4096 emb representation you can simply do:

frames, channels, rows, columns = 5,3,224,224

video = Input(shape=(frames,
                     rows,
                     columns,
                     channels))
cnn_base = VGG16(input_shape=(rows,
                              columns,
                              channels),
                 weights="imagenet",
                 include_top=True) #<=== include_top=True
cnn_base.trainable = False

cnn = Model(cnn_base.input, cnn_base.layers[-3].output) # -3 is the 4096 layer
encoded_frames = TimeDistributed(cnn)(video)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(10, activation="softmax")(hidden_layer)

model = Model(video, outputs)
model.summary()