
Keras attention layer on an LSTM

Updated: 2023-12-01 22:15:28


The first piece of code you have shared is incorrect. The second piece of code looks correct, except for one thing: do not use TimeDistributed, as the weights will be the same. Use a regular Dense layer with a non-linear activation.


    from keras.layers import (Input, GRU, Dense, Flatten, Activation,
                              RepeatVector, Permute, multiply)

    input_ = Input(shape=(input_length, input_dim))
    # Encoder returns the full sequence of hidden states
    lstm = GRU(self.HID_DIM, return_sequences=True)(input_)
    # One score per timestep, normalized into attention weights
    att = Dense(1, activation='tanh')(lstm)
    att = Flatten()(att)
    att = Activation('softmax')(att)
    # Broadcast the weights to (input_length, HID_DIM) and apply them to the states
    att = RepeatVector(self.HID_DIM)(att)
    att = Permute((2, 1))(att)
    mer = multiply([att, lstm])


Now you have the weight-adjusted states. How you use them is up to you. Most versions of attention I have seen simply sum these over the time axis and then use the result as the context vector.
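
As a minimal sketch of that last step (assuming the `mer` tensor from the snippet above), summing the weighted states over the time axis gives a single context vector of size HID_DIM:

    from keras.layers import Lambda
    import keras.backend as K

    # Sum the attention-weighted hidden states over the time axis (axis 1),
    # producing one context vector of shape (batch, HID_DIM)
    context = Lambda(lambda x: K.sum(x, axis=1))(mer)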