How do I split a list of phrases into words so I can use Counter on them?

Updated: 2023-12-02 17:19:28

You already have a list of words, so you don't need to split anything. Forget calling str.join (i.e. " ".join(meaningful_words)); just create a Counter and update it on each call to post_to_words. You are also doing way too much work: all you need to do is iterate over fiance_forum["Post_Text"], passing each element to the function. You also only need to create the set of stopwords once, not on every iteration:

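import re
from collections import Counter

from bs4 import BeautifulSoup
from nltk.corpus import stopwords


def post_to_words(raw_post, st):
    # Strip HTML markup and keep only the visible text
    HTML_text = BeautifulSoup(raw_post, "html.parser").get_text()
    # Replace every character that is not an ASCII letter with a space
    letters_only = re.sub("[^a-zA-Z]", " ", HTML_text)
    words = letters_only.lower().split()
    # Lazily yield only the words that are not stopwords
    return (w for w in words if w not in st)


cn = Counter()
# Build the stopword set once, outside the loop
st = set(stopwords.words("english"))
for post in fiance_forum["Post_Text"]:
    cn.update(post_to_words(post, st))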
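The join step is unnecessary, and it is also wrong when the result is handed to a Counter. A Counter built from a string counts individual characters, not words. The short sketch below shows the difference. The word list is made-up sample data standing in for meaningful_words.

```python
from collections import Counter

# Hypothetical sample data standing in for the question's meaningful_words
meaningful_words = ["stock", "bond", "stock"]

# Counting the list directly counts whole words
word_counts = Counter(meaningful_words)

# Joining first produces a single string, and a Counter over a string
# counts individual characters instead of words
char_counts = Counter(" ".join(meaningful_words))

print(word_counts)       # Counter({'stock': 2, 'bond': 1})
print(char_counts["s"])  # 2
```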

Updating the Counter post by post also avoids building a huge list of all the words first, because the counting is done as you go.
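Here is a dependency-free sketch of the same streaming approach that you can run without BeautifulSoup or NLTK installed. It assumes two stand-ins: a plain Python list of posts in place of the fiance_forum["Post_Text"] column, and a small hand-written stopword set in place of NLTK's English list. HTML stripping is omitted, so the sample posts are plain text. Because post_to_words returns a generator, each post's words are counted and discarded before the next post is processed.

```python
import re
from collections import Counter

# Hypothetical stand-ins: a tiny stopword set instead of NLTK's English list,
# and a plain list of posts instead of the fiance_forum["Post_Text"] column
STOPWORDS = {"the", "is", "a", "and"}
posts = [
    "The market is up",
    "A bond and a stock",
    "The stock market!",
]


def post_to_words(text, st):
    # Keep only letters, lower-case them, and split on whitespace
    letters_only = re.sub("[^a-zA-Z]", " ", text)
    words = letters_only.lower().split()
    # Generator: words are produced one at a time, never stored as a list
    return (w for w in words if w not in st)


counts = Counter()
for post in posts:
    # update() consumes the generator and adds one to each word's count
    counts.update(post_to_words(post, STOPWORDS))

print(counts.most_common(2))
```

The same loop works unchanged on a pandas column, since iterating over a Series yields its values one by one.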
