How can the LSTM's previous output and hidden state be used for an attention mechanism?

Updated: 2023-12-02 22:57:34

I think your model will be much simpler if you use tf.contrib.seq2seq.AttentionWrapper together with one of its attention implementations: BahdanauAttention or LuongAttention.

This way the attention vector is wired in at the cell level, so the cell output already has attention applied. Example from the seq2seq tutorial:

import tensorflow as tf

# 512-unit LSTM cell for the decoder
cell = tf.nn.rnn_cell.LSTMCell(512)
# Luong-style attention computed over the encoder outputs
attention_mechanism = tf.contrib.seq2seq.LuongAttention(512, encoder_outputs)
# Wrap the cell so attention is applied inside every decoder step
# (later TF 1.x releases renamed the argument from attention_size to attention_layer_size)
attn_cell = tf.contrib.seq2seq.AttentionWrapper(cell, attention_mechanism, attention_layer_size=256)

Note that this way you won't need a loop over window_size, because tf.nn.static_rnn or tf.nn.dynamic_rnn will instantiate the attention-wrapped cell for every time step; see the sketch below.
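For example, here is a minimal sketch (assuming decoder_inputs and batch_size are already defined; they are not part of the original answer) of letting tf.nn.dynamic_rnn drive the attention-wrapped cell over the whole sequence:

# dynamic_rnn runs the time loop itself, so no manual window_size loop is needed;
# attention is applied inside every step by the wrapped cell.
initial_state = attn_cell.zero_state(batch_size, dtype=tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(
    attn_cell,
    decoder_inputs,              # shape [batch_size, max_time, input_depth]
    initial_state=initial_state,
    dtype=tf.float32)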

Regarding your question: you should distinguish between Python variables and TensorFlow graph nodes. You can assign last_encoder_state to a different tensor, and the original graph node won't change because of this. That is flexible, but it can also be misleading in the resulting network - you might think you are connecting an LSTM to one tensor when it is actually connected to another. In general, you shouldn't do that.
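As a hypothetical illustration of that last point (build_decoder, encoder_final_state, and some_other_tensor are made-up names for the example), rebinding the Python name does not rewire anything already built into the graph:

last_encoder_state = encoder_final_state             # Python name bound to graph node A
decoder_outputs = build_decoder(last_encoder_state)  # decoder is wired to node A
last_encoder_state = some_other_tensor               # only the Python name is rebound;
                                                     # the decoder still reads node A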