且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何使用gensim LDA获得文档的完整主题分发?

更新时间:2023-02-27 09:19:48

似乎还没有人答复,所以我会尽力回答这个问题,并尽可能给gensim

It doesnt seem that anyone has replied yet, so I'll try and answer this as best I can given the gensim documentation.

似乎在训练模型以获得所需结果时需要将参数minimum_probability设置为0.0:

It seems you need to set a parameter minimum_probability to 0.0 when training the model to get the desired results:

lda = LdaMulticore(corpus=corpus, num_topics=num_topics, id2word=dictionary, workers=num_cores, alpha=1e-5, eta=5e-1,
              minimum_probability=0.0)

lda[corpus[233]]
>>> [(0, 5.8821799358842424e-07),
 (1, 5.8821799358842424e-07),
 (2, 5.8821799358842424e-07),
 (3, 5.8821799358842424e-07),
 (4, 5.8821799358842424e-07),
 (5, 5.8821799358842424e-07),
 (6, 5.8821799358842424e-07),
 (7, 5.8821799358842424e-07),
 (8, 5.8821799358842424e-07),
 (9, 5.8821799358842424e-07),
 (10, 5.8821799358842424e-07),
 (11, 5.8821799358842424e-07),
 (12, 5.8821799358842424e-07),
 (13, 5.8821799358842424e-07),
 (14, 5.8821799358842424e-07),
 (15, 5.8821799358842424e-07),
 (16, 5.8821799358842424e-07),
 (17, 5.8821799358842424e-07),
 (18, 5.8821799358842424e-07),
 (19, 5.8821799358842424e-07),
 (20, 5.8821799358842424e-07),
 (21, 5.8821799358842424e-07),
 (22, 5.8821799358842424e-07),
 (23, 5.8821799358842424e-07),
 (24, 5.8821799358842424e-07),
 (25, 5.8821799358842424e-07),
 (26, 5.8821799358842424e-07),
 (27, 0.99997117731831464),
 (28, 5.8821799358842424e-07),
 (29, 5.8821799358842424e-07),
 (30, 5.8821799358842424e-07),
 (31, 5.8821799358842424e-07),
 (32, 5.8821799358842424e-07),
 (33, 5.8821799358842424e-07),
 (34, 5.8821799358842424e-07),
 (35, 5.8821799358842424e-07),
 (36, 5.8821799358842424e-07),
 (37, 5.8821799358842424e-07),
 (38, 5.8821799358842424e-07),
 (39, 5.8821799358842424e-07),
 (40, 5.8821799358842424e-07),
 (41, 5.8821799358842424e-07),
 (42, 5.8821799358842424e-07),
 (43, 5.8821799358842424e-07),
 (44, 5.8821799358842424e-07),
 (45, 5.8821799358842424e-07),
 (46, 5.8821799358842424e-07),
 (47, 5.8821799358842424e-07),
 (48, 5.8821799358842424e-07),
 (49, 5.8821799358842424e-07)]