Updated: 2023-12-02 18:37:46
Keras/TF builds RNN weights in a well-defined order, which can be inspected from the source code or directly via layer.__dict__; this ordering can then be used to fetch per-kernel and per-gate weights, and per-channel treatment can be applied given a tensor's shape. The code and explanations below cover every possible case of a Keras/TF RNN, and should be easily extensible to any future API changes.
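To make that ordering concrete, here is a minimal sketch (mine, not part of the original repository) that fetches an LSTM's weights and slices per-gate kernels; dimensions match EX 1 below:

import tensorflow as tf

# Toy uni-LSTM matching EX 1's dimensions
ipt = tf.keras.Input(batch_shape=(16, 100, 20))
out = tf.keras.layers.LSTM(256, name='lstm')(ipt)
model = tf.keras.Model(ipt, out)

lstm = model.get_layer('lstm')
kernel, recurrent_kernel, bias = lstm.get_weights()
print(kernel.shape, recurrent_kernel.shape, bias.shape)
# (20, 1024) (256, 1024) (1024,) -- 1024 = 4 gates * 256 units

# Keras concatenates LSTM gate weights along the last axis,
# in the fixed order: input (i), forget (f), cell (c), output (o)
units = 256
gate_kernels = {g: kernel[:, i * units:(i + 1) * units]
                for i, g in enumerate('ifco')}
print(gate_kernels['f'].shape)  # (20, 256) -- forget gate, input-to-hidden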
Also see visualizing RNN gradients, and an application to RNN regularization; unlike in the former post, I won't be including a simplified variant here, as it'd still be rather large and complex given the nature of weight extraction and organization; instead, simply view the relevant source code in the repository (see next section).
Code source: See RNN (this post included w/ bigger images), my repository; included are:
from keras
& from tf.keras
Visualization methods:
EX 1: uni-LSTM, 256 units, weights -- batch_shape = (16, 100, 20) (input)
rnn_histogram(model, 'lstm', equate_axes=False, show_bias=False)
rnn_histogram(model, 'lstm', equate_axes=True, show_bias=False)
rnn_heatmap(model, 'lstm')
equate_axes=True enables an even comparison across kernels and gates, improving quality of comparison, but potentially degrading visual appeal.
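For reference, a minimal end-to-end setup for EX 1 might look as follows, assuming see_rnn is the repository's import path and using random data purely for illustration:

import numpy as np
import tensorflow as tf
from see_rnn import rnn_histogram, rnn_heatmap  # assumed import path

ipt = tf.keras.Input(batch_shape=(16, 100, 20))
out = tf.keras.layers.LSTM(256, name='lstm')(ipt)
model = tf.keras.Model(ipt, out)
model.compile('adam', 'mse')

# Train briefly on random data so weights deviate from initialization
x = np.random.randn(16, 100, 20)
y = np.random.randn(16, 256)
model.train_on_batch(x, y)

rnn_histogram(model, 'lstm', equate_axes=False, show_bias=False)
rnn_histogram(model, 'lstm', equate_axes=True, show_bias=False)
rnn_heatmap(model, 'lstm')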
EX 2: bi-CuDNNLSTM, 256 units, weights -- batch_shape = (16, 100, 16) (input)
rnn_histogram(model, 'bidir', equate_axes=2)
rnn_heatmap(model, 'bidir', norm=(-.8, .8))
CuDNNLSTM (and CuDNNGRU) biases are defined and initialized differently - something that can't be inferred from histograms.
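A quick way to verify this (a sketch assuming a GPU environment and the tf.compat.v1 CuDNNLSTM): cuDNN keeps separate input and recurrent biases, so the bias vector is twice as long as a plain LSTM's:

import tensorflow as tf
from tensorflow.compat.v1.keras.layers import CuDNNLSTM  # assumed path

ipt = tf.keras.Input(batch_shape=(16, 100, 16))
out = tf.keras.layers.Bidirectional(CuDNNLSTM(256), name='bidir')(ipt)
model = tf.keras.Model(ipt, out)

fw = model.get_layer('bidir').forward_layer
kernel, recurrent_kernel, bias = fw.get_weights()
print(bias.shape)  # (2048,) == 8 * 256; a plain LSTM would give (1024,)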
EX 3: uni-CuDNNGRU, 64 units, weight gradients -- batch_shape = (16, 100, 16) (input)
rnn_heatmap(model, 'gru', mode='grads', input_data=x, labels=y, cmap=None, absolute_value=True)
Note the use of absolute_value=True and a greyscale colormap (cmap=None).
New is the most active kernel gate (input-to-hidden), suggesting more error correction on permitting information flow.
Reset is the least active recurrent gate (hidden-to-hidden), suggesting least error correction on memory-keeping.
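A runnable sketch of EX 3's call, with plain GRU swapped in for CuDNNGRU so it works without a GPU (x, y are random stand-ins for real data):

import numpy as np
import tensorflow as tf
from see_rnn import rnn_heatmap  # assumed import path

ipt = tf.keras.Input(batch_shape=(16, 100, 16))
out = tf.keras.layers.GRU(64, name='gru')(ipt)
model = tf.keras.Model(ipt, out)
model.compile('adam', 'mse')

x = np.random.randn(16, 100, 16)
y = np.random.randn(16, 64)
model.train_on_batch(x, y)

# mode='grads' plots weight gradients w.r.t. (input_data, labels)
rnn_heatmap(model, 'gru', mode='grads', input_data=x, labels=y,
            cmap=None, absolute_value=True)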
BONUS EX: LSTM NaN detection, 512 units, weights -- batch_shape = (16, 100, 16) (input)
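One hypothetical way to reproduce the NaN scenario (assumption: non-finite weights stand out visibly in the heatmap) is to inject NaNs into a kernel and re-plot:

import numpy as np
import tensorflow as tf
from see_rnn import rnn_heatmap  # assumed import path

ipt = tf.keras.Input(batch_shape=(16, 100, 16))
out = tf.keras.layers.LSTM(512, name='lstm')(ipt)
model = tf.keras.Model(ipt, out)

lstm = model.get_layer('lstm')
kernel, recurrent_kernel, bias = lstm.get_weights()
kernel[np.random.rand(*kernel.shape) < .01] = np.nan  # simulate a blowup
lstm.set_weights([kernel, recurrent_kernel, bias])

rnn_heatmap(model, 'lstm')  # NaN entries render distinctly from finite weights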