What is the best way to obtain the optimal number of topics for an LDA model using Gensim?

Updated: 2022-11-11 19:19:29

Although I cannot comment on Gensim in particular, I can weigh in with some general advice for optimising your topics.

As you stated, using log likelihood is one method. Another option is to hold a set of documents out of the model training process, infer topics for them once the model is complete, and check whether the results make sense.

A completely different method you could try is a hierarchical Dirichlet process (HDP), which can find the number of topics in the corpus dynamically, without the number being specified in advance.

There are many papers on how best to specify parameters and evaluate your topic model. Depending on your experience level, these may or may not be useful to you:

Rethinking LDA: Why Priors Matter, Wallach, H.M., Mimno, D. and McCallum, A.

Evaluation Methods for Topic Models, Wallach, H.M., Murray, I., Salakhutdinov, R. and Mimno, D.

Also, here is the paper about the hierarchical Dirichlet process:

Hierarchical Dirichlet Processes, Teh, Y.W., Jordan, M.I., Beal, M.J. and Blei, D.M.
