Keras: Embedding layer

Updated: 2023-12-01 22:23:40

To use words in natural language processing or machine learning tasks, they must first be mapped onto a continuous vector space, producing word vectors, also called word embeddings. The Keras Embedding layer is useful for constructing such word vectors.

input_dim: the vocabulary size, i.e. the number of unique words represented in your corpus. Because words are fed to the layer as integer indices, this value must be at least the largest index plus one.

output_dim: the desired dimension of each word vector. For example, if output_dim = 100, every word is mapped onto a vector with 100 elements, whereas if output_dim = 300, every word is mapped onto a vector with 300 elements.

input_length: the length of your sequences. For example, if your data consists of sentences, this is the number of words in each sentence. Because different sentences typically contain different numbers of words, you usually need to pad (or truncate) your sequences so that all of them have the same length. The keras.preprocessing.sequence.pad_sequences function can be used for this (https://keras.io/preprocessing/sequence/).
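How the three parameters fit together can be sketched without Keras at all. The snippet below is a conceptual illustration in NumPy, not the Keras implementation: it treats the embedding as a lookup table of shape (input_dim, output_dim) and pads each sentence at the front, which is assumed here to mirror the default behaviour of pad_sequences. The helper pad_to_length, the toy sentences, and the choice of index 0 as the padding value are all hypothetical.

```python
import numpy as np

def pad_to_length(sequences, input_length, pad_value=0):
    """Front-pad (or front-truncate) each list of word indices to input_length."""
    padded = np.full((len(sequences), input_length), pad_value, dtype=np.int64)
    for row, seq in enumerate(sequences):
        trimmed = seq[-input_length:]  # keep at most the last input_length indices
        if trimmed:                    # skip empty sentences; the row stays all padding
            padded[row, -len(trimmed):] = trimmed
    return padded

input_dim = 10     # vocabulary size: valid indices are 0..9 (0 is used for padding here)
output_dim = 4     # each word becomes a vector with 4 elements
input_length = 5   # every sentence is padded or truncated to 5 indices

# The embedding is a table with one row per word index.
rng = np.random.default_rng(0)
embedding_matrix = rng.uniform(-0.05, 0.05, size=(input_dim, output_dim))

sentences = [[3, 7, 1], [2, 9, 4, 4, 8, 6]]   # two sentences of different length
padded = pad_to_length(sentences, input_length)

# Looking up the padded indices yields one vector per word position.
vectors = embedding_matrix[padded]
print(padded)
print(vectors.shape)
```

The result has shape (number of sentences, input_length, output_dim), which is the shape an embedding layer hands to the next layer of a model.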

In Keras, you can either 1) use pretrained word vectors such as GloVe or word2vec representations, or 2) learn the word vectors as part of the training process. This blog post (https://blog.keras.io/using-pre-trained-word-embeddings-in-a-keras-model.html) offers a tutorial on how to use GloVe pretrained word vectors. For option 2, Keras by default initializes the vectors randomly and then adjusts them during training, so that they end up fitted to the task the model is trained on.
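The difference between the two options comes down to what the embedding table contains before training starts. The following NumPy sketch builds such a table under stated assumptions: word_index and pretrained are tiny hypothetical stand-ins for a real tokenizer vocabulary and a real GloVe or word2vec file, build_embedding_matrix is an illustrative helper rather than a Keras function, and the random range is an arbitrary choice. Words with a pretrained vector get it copied in (option 1); the remaining rows keep a small random initialization, as they would under option 2.

```python
import numpy as np

def build_embedding_matrix(word_index, pretrained, output_dim, seed=0):
    """Return an (input_dim, output_dim) table, using pretrained rows where available."""
    rng = np.random.default_rng(seed)
    input_dim = max(word_index.values()) + 1       # largest index plus one
    # Start from small random values, as if every vector were to be learned.
    matrix = rng.uniform(-0.05, 0.05, size=(input_dim, output_dim))
    matrix[0] = 0.0                                # row 0 is assumed to be the padding index
    for word, idx in word_index.items():
        vec = pretrained.get(word)
        if vec is not None:                        # overwrite with the pretrained vector
            matrix[idx] = vec
    return matrix

# Hypothetical toy data: "zebra" has no pretrained vector.
word_index = {"cat": 1, "dog": 2, "zebra": 3}
pretrained = {
    "cat": np.array([0.1, 0.2, 0.3]),
    "dog": np.array([0.4, 0.5, 0.6]),
}

matrix = build_embedding_matrix(word_index, pretrained, output_dim=3)
print(matrix)
```

A table built this way could then be supplied to the embedding layer as its initial weights, either frozen to keep the pretrained vectors fixed or left trainable so they are fine-tuned along with the rest of the model.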