Updated: 2023-02-27 08:43:11
I wonder if you have seen this page?
Either way, let me explain a few things. The number of documents you use is small for this method (it works much better when trained on a data source the size of Wikipedia), so the results will be rather crude and you have to be aware of that. This is why you should not aim for a large number of topics (you chose 10, which could perhaps sensibly go up to 20 in your case).
Other parameters:
random_state - this serves as a seed (in case you want to repeat the training process exactly)
chunksize - number of documents to consider at once (affects memory consumption)
update_every - update the model every update_every chunks of chunksize documents each (essentially, this is for memory consumption optimization)
passes - how many times the algorithm is supposed to pass over the whole corpus
alpha - quoting the documentation:
can be set to an explicit array = prior of your choice. It also supports special values of 'asymmetric' and 'auto': the former uses a fixed normalized asymmetric 1.0/topicno prior, the latter learns an asymmetric prior directly from your data.
per_word_topics - setting this to True allows for extraction of the most likely topics given a word. The training process is set up in such a way that every word will be assigned to a topic; otherwise, words that are not indicative are going to be omitted. phi_value is another parameter that steers this process - it is a threshold for whether a word is treated as indicative or not.
The parameters of the training process are described in particular detail in M. Hoffman et al., Online Learning for Latent Dirichlet Allocation.
For memory optimization of the training process or the model see this blog post.