且构网


Keras Embedding layer masking: why does input_dim need to be |vocabulary| + 2?

Updated: 2023-12-01 23:24:10

I believe the docs are a bit misleading there. In the normal case you are mapping your n input data indices [0, 1, 2, ..., n-1] to vectors, so input_dim should equal the number of distinct indices you have:

input_dim = len(vocabulary_indices)

An equivalent (but slightly confusing) way to say this, and the way the docs put it, is


1 + maximum integer index occurring in the input data.



input_dim = max(vocabulary_indices) + 1
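
As a quick check that the two formulas agree, here is a minimal sketch. The five-word vocabulary is made up for illustration, and a plain Python list stands in for the embedding table; this is not Keras itself.

```python
# Hypothetical vocabulary: n = 5 tokens mapped to contiguous indices 0..n-1.
vocabulary = ["the", "cat", "sat", "on", "mat"]
vocabulary_indices = list(range(len(vocabulary)))  # [0, 1, 2, 3, 4]

# Two equivalent ways to size the table when indices are contiguous from 0.
input_dim_from_len = len(vocabulary_indices)
input_dim_from_max = max(vocabulary_indices) + 1
assert input_dim_from_len == input_dim_from_max == 5

# Stand-in for an embedding table: one row (vector) per index.
output_dim = 3
table = [[0.0] * output_dim for _ in range(input_dim_from_len)]

# Every index that occurs in the data must be a valid row number.
rows = [table[i] for i in vocabulary_indices]
```

The equivalence only holds because the indices are contiguous and start at 0. With gaps in the numbering, the `max(...) + 1` form is the one that keeps every lookup in range.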

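If you enable masking, the value 0 is treated specially: it is reserved as the padding value and masked out. Your real indices therefore shift up by one, and the input now ranges over [0, 1, 2, ..., n-1, n]. With vocabulary_indices still denoting the original n indices [0, ..., n-1], you need

input_dim = len(vocabulary_indices) + 1

or, equivalently,

input_dim = max(vocabulary_indices) + 2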
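
The shift can be sketched in plain Python as follows. The index list, the `PAD` name and the padded sequence are hypothetical values chosen for illustration.

```python
# Hypothetical unmasked indices for n = 5 tokens: 0..n-1.
vocabulary_indices = [0, 1, 2, 3, 4]

# With masking, 0 is reserved for padding, so every token index moves up by one.
PAD = 0
shifted_indices = [i + 1 for i in vocabulary_indices]  # [1, 2, 3, 4, 5]

# The table needs a row for the padding index plus one row per token.
input_dim = len(vocabulary_indices) + 1
assert input_dim == max(vocabulary_indices) + 2 == max(shifted_indices) + 1

# A padded sequence then only contains values in [0, input_dim - 1].
padded_sequence = [3, 1, 5, PAD, PAD]
assert all(0 <= i < input_dim for i in padded_sequence)
```

In Keras this would correspond to passing that `input_dim` to the `Embedding` layer together with `mask_zero=True`.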

The docs become especially confusing here as they say


(input_dim should equal |vocabulary| + 2)

where I would interpret |x| as the cardinality of a set (equivalent to len(x)), but the authors seem to mean

2 + maximum integer index occurring in the input data.
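
To make the difference between the two readings concrete, here is a small arithmetic sketch. It assumes a hypothetical vocabulary of n = 5 tokens with original indices 0..4, before they are shifted to make room for the mask value.

```python
# Hypothetical original indices for n = 5 tokens, before shifting for the mask.
vocabulary_indices = [0, 1, 2, 3, 4]

# Reading 1: |vocabulary| as the cardinality of the set, i.e. len(...).
input_dim_cardinality = len(vocabulary_indices) + 2

# Reading 2: what the docs appear to mean, 2 + the largest index.
input_dim_max_index = max(vocabulary_indices) + 2

# Rows actually needed: the padding index 0 plus the shifted token indices 1..5.
rows_needed = len(vocabulary_indices) + 1

assert input_dim_max_index == rows_needed            # 6 rows
assert input_dim_cardinality == rows_needed + 1      # 7 rows, one never used
```

Under these assumptions the cardinality reading does not break anything, since every index is still in range; it only allocates one embedding row that is never looked up.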