Memory-efficient LDA training using the gensim library

Last updated: 2023-02-27 08:57:08

Consider wrapping your corpus in an iterable object and passing that instead of a list. A generator will not work, since a generator can be consumed only once and gensim may need to iterate over the corpus more than once.

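import gensim

# `fname` (path to the text file) and `dictionary` (a gensim.corpora.Dictionary)
# are assumed to be defined beforehand.
class MyCorpus:
    def __iter__(self):
        # Reopen the file on every iteration so the corpus can be streamed repeatedly.
        with open(fname) as f:
            for line in f:
                # assume there's one document per line, tokens separated by whitespace
                yield dictionary.doc2bow(line.lower().split())

corpus = MyCorpus()
lda = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                      id2word=dictionary,
                                      num_topics=100,
                                      update_every=1,
                                      chunksize=10000,
                                      passes=1)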
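
To see why an iterable class works where a generator does not, here is a minimal sketch in plain Python (no gensim required). The names `LineCorpus`, `line_generator`, and the temporary demo file are made up for illustration; the point is only that an object with `__iter__` starts a fresh pass each time it is iterated, while a generator is spent after one pass.

```python
import os
import tempfile


class LineCorpus:
    """Re-iterable corpus: every call to __iter__ reopens the file."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                yield line.lower().split()


def line_generator(path):
    """One-shot generator: it is exhausted after a single pass."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield line.lower().split()


# Write a tiny two-document file to demonstrate with (hypothetical data).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Human machine interface\nGraph of trees\n")
    demo_path = tmp.name

iterable_corpus = LineCorpus(demo_path)
first_pass = list(iterable_corpus)
second_pass = list(iterable_corpus)  # the same documents again

one_shot = line_generator(demo_path)
gen_first = list(one_shot)
gen_second = list(one_shot)  # empty: the generator is already spent

os.remove(demo_path)
```

Only one line of the file is held in memory at a time in either case; the difference is that the class can be iterated again, which is what a multi-pass consumer needs.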

Additionally, Gensim has several corpus formats readily available, which are listed in the API reference. You might consider using TextCorpus, which should already fit your format nicely:

import gensim

# `fname` is assumed to be the path to a text file with one document per line.
corpus = gensim.corpora.TextCorpus(fname)
lda = gensim.models.ldamodel.LdaModel(corpus=corpus,
                                      id2word=corpus.dictionary,  # TextCorpus can build the dictionary for you
                                      num_topics=100,
                                      update_every=1,
                                      chunksize=10000,
                                      passes=1)
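
The `chunksize` argument in the calls above is what keeps memory bounded: documents are pulled from the stream in batches rather than all at once. The sketch below illustrates that idea in plain Python. It is not gensim's actual implementation, and `iter_chunks` and `TinyCorpus` are hypothetical names used only for this example.

```python
from itertools import islice


def iter_chunks(documents, chunksize):
    """Yield lists of at most `chunksize` documents from any iterable."""
    iterator = iter(documents)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


class TinyCorpus:
    """Hypothetical streamed corpus of seven bag-of-words documents."""

    def __iter__(self):
        for token_id in range(7):
            # Each document is a list of (token_id, count) pairs.
            yield [(token_id, 1)]


# Only one chunk of documents is materialized at any moment.
chunk_sizes = [len(chunk) for chunk in iter_chunks(TinyCorpus(), 3)]
```

With seven documents and a chunk size of three, the stream is consumed as two full chunks followed by a final partial one, so peak memory depends on the chunk size rather than on the corpus size.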
