Updated: 2023-02-27 09:19:54
Your understanding of the output of LDA from gensim is correct. What you need to remember, though, is that LDA[corpus] will only output topics that exceed a certain threshold (the minimum_probability parameter, set when you initialise the model).
The "document belongs to ONE topic" issue is one you need to decide on your own. LDA gives you a distribution over the topics for each document you feed into it (*). You then need to decide whether a document having, for instance, 50% of a topic is enough for that document to belong to said topic.
(*) Again, keep in mind that LDA[corpus] will only show you the topics that exceed the threshold, not the whole distribution. You can access the whole distribution as well using:
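# inference() returns unnormalised document-topic weights (gamma) as its first value
theta, _ = lda.inference(corpus)
# divide each row by its sum so every document's topic weights add up to 1
theta /= theta.sum(axis=1)[:, None]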
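
Once you have the full distribution, the "one topic per document" decision could look roughly like the sketch below. It is plain Python with made-up weights rather than a real gensim model, and the helper names normalise_rows and assign_topics, as well as the 0.5 cut-off, are assumptions chosen only for illustration.

```python
from typing import List, Optional, Sequence


def normalise_rows(gamma: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each row to sum to 1, like theta /= theta.sum(axis=1)[:, None]."""
    return [[value / sum(row) for value in row] for row in gamma]


def assign_topics(
    theta: Sequence[Sequence[float]], threshold: float = 0.5
) -> List[Optional[int]]:
    """Return each document's dominant topic index, or None if its share is below threshold."""
    assigned: List[Optional[int]] = []
    for row in theta:
        # index of the largest topic share in this document (first one wins on ties)
        best = max(range(len(row)), key=row.__getitem__)
        assigned.append(best if row[best] >= threshold else None)
    return assigned


# Hypothetical unnormalised weights: three documents over three topics.
gamma = [[8.0, 1.0, 1.0], [2.0, 2.0, 1.0], [1.0, 1.0, 6.0]]
theta = normalise_rows(gamma)
labels = assign_topics(theta, threshold=0.5)
print(labels)
```

A document whose strongest topic stays under the cut-off gets None instead of being forced into a topic; whether 0.5 is the right cut-off is exactly the judgement call described above.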