How can the LSTM's previous output and hidden state be used for an attention mechanism?

Updated: 2023-12-02 22:57:34

I think your model will be much simpler if you use tf.contrib.seq2seq.AttentionWrapper together with one of its attention implementations: BahdanauAttention or LuongAttention.

This way the attention vector is wired in at the cell level, so the cell output already has attention applied. Example from the seq2seq tutorial:

import tensorflow as tf

# 512-unit LSTM cell for the decoder
cell = tf.nn.rnn_cell.LSTMCell(512)
# Luong-style attention computed over the encoder outputs
attention_mechanism = tf.contrib.seq2seq.LuongAttention(512, encoder_outputs)
# Wrap the cell so attention is applied inside every decoder step
# (later TF 1.x releases renamed the argument from attention_size to attention_layer_size)
attn_cell = tf.contrib.seq2seq.AttentionWrapper(cell, attention_mechanism, attention_layer_size=256)

Note that this way you won't need a loop over window_size, because tf.nn.static_rnn or tf.nn.dynamic_rnn will instantiate the attention-wrapped cell for every time step; see the sketch below.
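For example, here is a minimal sketch (assuming decoder_inputs and batch_size are already defined; they are not part of the original answer) of letting tf.nn.dynamic_rnn drive the attention-wrapped cell over the whole sequence:

# dynamic_rnn runs the time loop itself, so no manual window_size loop is needed;
# attention is applied inside every step by the wrapped cell.
initial_state = attn_cell.zero_state(batch_size, dtype=tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(
    attn_cell,
    decoder_inputs,              # shape [batch_size, max_time, input_depth]
    initial_state=initial_state,
    dtype=tf.float32)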

Regarding your question: you should distinguish between Python variables and TensorFlow graph nodes. You can assign last_encoder_state to a different tensor, and the original graph node won't change because of this. That is flexible, but it can also be misleading in the resulting network - you might think you are connecting an LSTM to one tensor when it is actually connected to another. In general, you shouldn't do that.
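As a hypothetical illustration of that last point (build_decoder, encoder_final_state, and some_other_tensor are made-up names for the example), rebinding the Python name does not rewire anything already built into the graph:

last_encoder_state = encoder_final_state             # Python name bound to graph node A
decoder_outputs = build_decoder(last_encoder_state)  # decoder is wired to node A
last_encoder_state = some_other_tensor               # only the Python name is rebound;
                                                     # the decoder still reads node A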