Create an LSTM layer with Attention in Keras for a multi-label text classification neural network

Updated: 2022-11-28 23:36:55

Let me summarize the intent. You want to add attention to your code. Yours is a sequence classification task and not a seq-seq translator. You don't really care much about the way it is done, so you are OK with not debugging the error above; you just need a working piece of code. Our main input here is the movie reviews, consisting of 'n' words each, to which you want to apply attention.

Assume you embed the reviews and pass them to an LSTM layer. Now you want to 'attend' to all the hidden states of the LSTM layer and then generate a classification (instead of just using the last hidden state of the encoder). So an attention layer needs to be inserted. A barebones implementation would look like this:

    import tensorflow as tf
    from tensorflow.keras import backend as K

    class peel_the_layer(tf.keras.layers.Layer):

        def __init__(self):
            ##Nothing special to be done here
            super(peel_the_layer, self).__init__()

        def build(self, input_shape):
            ##Define the shape of the weights and bias in this layer
            ##This is a 1 unit layer.
            units = 1
            ##last index of the input_shape is the number of dimensions of the prev
            ##RNN layer. last but 1 index is the num of timesteps
            self.w = self.add_weight(name="att_weights", shape=(input_shape[-1], units), initializer="normal")  #name property is useful for avoiding RuntimeError: Unable to create link.
            self.b = self.add_weight(name="att_bias", shape=(input_shape[-2], units), initializer="zeros")
            super(peel_the_layer, self).build(input_shape)

        def call(self, x):
            ##x is the input tensor..each word that needs to be attended to
            ##Below is the main processing done during training
            ##K is the Keras Backend import
            e = K.tanh(K.dot(x, self.w) + self.b)
            a = K.softmax(e, axis=1)
            output = x * a

            ##return the outputs. 'a' is the set of attention weights
            ##the second variable is the 'attention adjusted o/p state' or context
            return a, K.sum(output, axis=1)

Now call the above Attention layer after your LSTM and before your Dense output layer.

        a, context = peel_the_layer()(lstm_out)
        ##context is the o/p which will be the input to your classification layer
        ##a is the set of attention weights and you may want to route them to a display
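
For reference, here is a minimal end-to-end sketch of how the layer could be wired into a model. The LSTM must be created with return_sequences=True so that all of its hidden states are exposed to the attention layer; the vocabulary size, sequence length, unit counts and number of labels below are illustrative assumptions, not values from the original question.

    ##Illustrative wiring only; all sizes here are assumed for the example
    vocab_size, max_len, n_labels = 20000, 200, 5

    inp = tf.keras.Input(shape=(max_len,))
    embedded = tf.keras.layers.Embedding(vocab_size, 128)(inp)
    ##return_sequences=True exposes every hidden state to the attention layer
    lstm_out = tf.keras.layers.LSTM(64, return_sequences=True)(embedded)
    a, context = peel_the_layer()(lstm_out)
    ##sigmoid + binary_crossentropy since the task is multi-label
    out = tf.keras.layers.Dense(n_labels, activation="sigmoid")(context)

    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam", loss="binary_crossentropy")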

You can build on top of this, as you seem to want to use other features apart from the movie reviews to come up with the final sentiment. Attention largely applies to the reviews themselves, and its benefits show up when the sentences are very long.
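
If those extra features are, say, a small numeric vector per review, one hedged way to use them is to concatenate them with the attention context before the final Dense layer. The sketch below reuses `inp`, `context` and `n_labels` from the example above; the `other_features` input and its size are hypothetical.

    ##Hypothetical extra (non-review) features, concatenated with the attention context
    other_features = tf.keras.Input(shape=(10,))
    merged = tf.keras.layers.Concatenate()([context, other_features])
    out = tf.keras.layers.Dense(n_labels, activation="sigmoid")(merged)
    model = tf.keras.Model([inp, other_features], out)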

For more specific details, please refer to https://towardsdatascience.com/create-your-own-custom-attention-layer-understand-all-flavours-2201b5e8be9e