Updated: 2023-02-27 08:43:35
I noticed these problems in your code, but I'm not sure they are the reason for the slow execution. This loop is useless; it will never run:
for text in author_text['full_text'].tolist():
    word_list = []
    for word in text:
        word_list.append(word)
    author_text.append(word_list)
There is also no need to loop over the words of the text yourself. Calling split() on the text is enough and gives you a list of words, which you can do while looping over the authors cursor.
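As a small illustration of that point (the sample string here is made up, not taken from the question): if the text is a plain Python string, iterating over it yields single characters, whereas split() yields whitespace-separated words.

```python
text = "the quick brown fox"  # hypothetical sample text, not from the question

chars = [c for c in text]  # iterating a str yields one character at a time
words = text.split()       # split() with no arguments splits on whitespace

print(chars[:3])
print(words)
```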
Try writing it like this. First:
all_authors_text = []
for author in authors:
    all_authors_text.append(author['full_text'].split())
Then create the dictionary:
dictionary = corpora.Dictionary(all_authors_text)
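For reference, here is a minimal self-contained sketch of the tokenizing step. It assumes each author record behaves like a dict with a 'full_text' string; the `authors` list below is a hypothetical stand-in for your cursor, and the gensim call is left commented out so the snippet runs without gensim installed.

```python
# Hypothetical stand-in for the authors cursor: each record has a 'full_text' string.
authors = [
    {"full_text": "the quick brown fox"},
    {"full_text": "jumps over the lazy dog"},
]

# One list of tokens per author, built in a single pass with a comprehension.
all_authors_text = [author["full_text"].split() for author in authors]

print(all_authors_text)

# With gensim installed, this list of token lists can then be passed on:
# from gensim import corpora
# dictionary = corpora.Dictionary(all_authors_text)
```

The comprehension builds the same list as the explicit loop; which one you use is a matter of taste.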