
Seq2Seq model learns to output only the EOS token (<\s>) after a few iterations


Recently I have also been working on a seq2seq model. I ran into your problem before; in my case, I solved it by changing the loss function.

You said you use a mask, so I guess you use tf.contrib.seq2seq.sequence_loss as I did.

I changed to tf.nn.softmax_cross_entropy_with_logits, and it works normally (though with a higher computation cost).

(Edit 05/10/2018: Pardon me, I need to edit this answer since I found an egregious mistake in my code.)

tf.contrib.seq2seq.sequence_loss can actually work really well, if the shapes of logits, targets, and weights (the mask) are right, as defined in the official documentation: tf.contrib.seq2seq.sequence_loss

loss = tf.contrib.seq2seq.sequence_loss(logits=decoder_logits,
                                        targets=decoder_targets,
                                        weights=masks)

# logits:  [batch_size, sequence_length, num_decoder_symbols]
# targets: [batch_size, sequence_length]
# weights: [batch_size, sequence_length]

Well, it can still run even if the shapes do not match, but the results can be weird (lots of #EOS, #PAD, ... etc.).

Since decoder_outputs and decoder_targets might not have the required shape (in my case, my decoder_targets had the shape [sequence_length, batch_size], i.e. time-major), try using tf.transpose to help you reshape the tensor.