且构网


Parsing a huge log file with Python

Updated: 2022-03-23 10:01:06

Calling readlines() loads the entire file into memory at once, which is the problem with a file this large. Read it line by line instead, stopping when you have processed 500,000 lines or reached the end of the file, whichever comes first. Here's what you should do instead:

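import re

# FILE (an open file object) and count_words (a counting dict, e.g.
# collections.defaultdict(int)) are assumed to be defined earlier.
i = 0
while i < 500000:
    line = FILE.readline()
    if line == "":  # readline() returns "" only at end of file
        break
    m = re.search('key=([^&]*)', line)
    if m:  # skip lines without a key= field instead of crashing on None
        count_words[m.group(1)] += 1
    i += 1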
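
The same loop can also be written by iterating over the file object directly, which reads one line at a time just as readline() does. The following is only a sketch of that variant, not part of the original answer: the function name count_keys, the limit parameter, and the sample lines are made up for illustration. The pattern is also tightened to stop at whitespace, so a key at the end of a line does not capture the trailing newline; that is a change from the regex above.

```python
import re
from collections import Counter
from itertools import islice

# Changed from the original pattern: also stop at whitespace, so a key at
# the end of a line does not include the trailing newline.
KEY_PATTERN = re.compile(r"key=([^&\s]*)")


def count_keys(lines, limit=500_000):
    """Count key= values in at most `limit` lines from an iterable of strings.

    `lines` can be an open file object; iterating over it reads lazily,
    one line at a time, so the whole file is never held in memory.
    """
    counts = Counter()
    for line in islice(lines, limit):  # stops at `limit` or EOF, whichever is first
        m = KEY_PATTERN.search(line)
        if m:  # ignore lines without a key= field
            counts[m.group(1)] += 1
    return counts


# Hypothetical sample data standing in for a log file.
sample = ["GET /?key=a&x=1\n", "GET /?key=b\n", "no key here\n", "GET /?key=a\n"]
print(count_keys(sample, limit=3))  # only the first three lines are read
```

With a real file, pass the open file object in place of the list, for example inside a with open(...) block. Counter is used here instead of a plain dict so that unseen keys start at zero without extra setup.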