Updated: 2021-07-10 01:27:47
First of all, this is how I would generate the cnt that you already build (to reduce memory overhead):
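import collections
import re

def findWords(filepath):
    # Yield lowercase words one line at a time, so the whole file is never held in memory.
    with open(filepath) as infile:
        for line in infile:
            words = re.findall(r'\w+', line.lower())
            yield from words

cnt = collections.Counter(findWords('02.2003.BenBernanke.txt'))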
Now, on to your question about phrases:
from itertools import tee

phrases = {'central bank', 'high inflation'}
fw1, fw2 = tee(findWords('02.2003.BenBernanke.txt'))
next(fw2, None)  # advance the second iterator so zip pairs each word with its successor
for w1, w2 in zip(fw1, fw2):
    phrase = ' '.join([w1, w2])
    if phrase in phrases:
        cnt[phrase] += 1
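To try the same idea without the speech file, here is a minimal, self-contained sketch that runs on a few in-memory lines. The names iter_words, count_words_and_phrases and sample are made up for illustration and are not part of the original code; the sketch also assumes the lines are passed as a list, because they are iterated twice.

```python
import collections
import re
from itertools import tee


def iter_words(lines):
    # Hypothetical helper: same idea as findWords, but reads from any iterable of lines.
    for line in lines:
        yield from re.findall(r'\w+', line.lower())


def count_words_and_phrases(lines, phrases):
    # Assumption: `lines` is a list (re-iterable), since it is walked twice.
    cnt = collections.Counter(iter_words(lines))
    first, second = tee(iter_words(lines))
    next(second, None)  # shift the second stream by one word
    for w1, w2 in zip(first, second):
        phrase = ' '.join([w1, w2])
        if phrase in phrases:
            cnt[phrase] += 1
    return cnt


# Made-up sample text; the second "central bank" is split across two lines.
sample = [
    "The central bank acted.",
    "High inflation worries the central",
    "bank and the public.",
]
result = count_words_and_phrases(sample, {'central bank', 'high inflation'})
print(result['central bank'], result['high inflation'], result['bank'])
```

Because the words are streamed across line boundaries, a phrase that is split over two lines is still counted. Single words and two-word phrases end up in the same Counter, so keep that in mind if you later sum the counts.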
Hope this helps.