Gensim LDA multicore Python script runs too slowly

Updated: 2023-02-27 08:43:35

I noticed these problems in your code, although I'm not sure they are the reason for the slow execution. This loop is useless; it will never run:

for text in author_text['full_text'].tolist():
    word_list = []
    for word in text:
        word_list.append(word)
        author_text.append(word_list)

There is also no need to loop over the words of the text. Calling split() on the text is enough, and it returns a list of words. Do this while looping over the authors cursor.
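A quick way to see the difference, as a minimal sketch: assuming each full_text value is a plain string (the sample text below is made up), iterating over it directly yields single characters, while split() yields whitespace-separated words.

```python
# Hypothetical sample; in the original script this would be one author's full_text.
text = "topic models need tokens"

# Iterating over a string walks it character by character.
chars = [ch for ch in text]

# split() with no arguments breaks on runs of whitespace and returns words.
words = text.split()

print(chars[:5])  # ['t', 'o', 'p', 'i', 'c']
print(words)      # ['topic', 'models', 'need', 'tokens']
```

So a character-level loop would hand the model letters rather than tokens, which is probably not what was intended.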

Try writing it like this. First:

all_authors_text = []
for author in authors:
    all_authors_text.append(author['full_text'].split())
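The snippet above can be tried end to end with made-up data. This sketch assumes the cursor yields dict-like records with a full_text string; the authors list and its contents are hypothetical. It builds a list of token lists, one per author, which is the input shape corpora.Dictionary accepts.

```python
# Hypothetical stand-in for the authors cursor: each record has a full_text string.
authors = [
    {"name": "a1", "full_text": "lda finds topics in text"},
    {"name": "a2", "full_text": "gensim trains lda models"},
]

# One token list per author, the same result as the append loop above.
all_authors_text = [author["full_text"].split() for author in authors]

print(len(all_authors_text))  # 2
print(all_authors_text[0])    # ['lda', 'finds', 'topics', 'in', 'text']
```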

Then create the dictionary:

from gensim import corpora
dictionary = corpora.Dictionary(all_authors_text)