且构网

Sharing the ins and outs of programming and development...

How do I use a regular expression to count all occurrences of a phrase in text files?

Updated: 2023-09-02 17:46:22

You can drop the regex entirely: the count method of string objects is enough here, and much of the other code can be simplified as well.
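
As a quick illustration of that point, the sketch below compares str.count with a regex-based count on a made-up sample string (the sample text and variable names are assumptions, not taken from the question). For a literal phrase, both count non-overlapping occurrences.

```python
import re

# Hypothetical sample text, used only for illustration.
sample = "at least once, at least twice, and at least three times"
phrase = "at least"

# str.count returns the number of non-overlapping occurrences of the substring.
count_plain = sample.count(phrase)

# The regex equivalent; re.escape guards against special characters in the phrase.
count_regex = len(re.findall(re.escape(phrase), sample))

print(count_plain, count_regex)
```

A regex only becomes worthwhile once the pattern is more than a fixed string, for example when word boundaries or variable whitespace matter.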

You're also not changing data to lower case; you are only printing a lowercased copy of the string. Note how I use data = data.lower() to actually change the variable.
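
The difference can be seen in a small sketch like the following, which assumes a made-up string (not the asker's data). Python strings are immutable, so lower() returns a new string and leaves the original untouched unless the result is assigned back.

```python
# Hypothetical input, used only for illustration.
data = "At Least one match, AT LEAST two"

# Calling lower() without assigning the result does not modify data.
data.lower()
unchanged = data.count("at least")

# Reassigning stores the lowercased copy, so the search becomes case-insensitive.
data = data.lower()
changed = data.count("at least")

print(unchanged, changed)
```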

Try this code:

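import glob
import os

# Raw string so the backslashes in the Windows path are kept literally.
path = r'c:\script\lab\Tests'

# The surrounding spaces exclude hits at line ends or next to punctuation.
substring = ' at least '

files_with_match = 0    # number of files that contain the phrase
total_occurrences = 0   # number of occurrences across all files

# glob already restricts the listing to .txt files, so no extra suffix check is needed.
for filename in glob.glob(os.path.join(path, '*.txt')):
    # The with-block closes the file even if reading fails.
    with open(filename) as f:
        data = f.read()
    data = data.lower()
    occurrences = data.count(substring)
    if occurrences:
        files_with_match += 1
        total_occurrences += occurrences
        print("'{}' match".format(filename), occurrences)
    else:
        print("'{}' no match".format(filename))
print("Files with at least one match:", files_with_match)
print("Total number of occurrences:", total_occurrences)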
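
The same idea can also be wrapped in a reusable function. The version below is only a sketch under a few assumptions: the function name count_phrase is hypothetical, it uses pathlib instead of glob, it replaces undecodable bytes rather than failing, and it searches for the phrase without surrounding spaces, so it also catches occurrences at line starts or before punctuation (at the cost of matching inside longer words).

```python
from pathlib import Path


def count_phrase(directory, phrase):
    """Return {file name: occurrence count} for every .txt file in directory."""
    phrase = phrase.lower()
    counts = {}
    for file_path in sorted(Path(directory).glob("*.txt")):
        # errors="replace" is an assumption so that odd bytes do not abort the scan.
        text = file_path.read_text(errors="replace").lower()
        # Case-insensitive, non-overlapping count of the literal phrase.
        counts[file_path.name] = text.count(phrase)
    return counts
```

Calling count_phrase(path, "at least") would return a dictionary of per-file counts; summing its values gives the total number of occurrences, and counting the non-zero values gives the number of matching files.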

If anything is unclear, feel free to ask!