How to properly loop through two files and compare the strings in both

Updated: 2023-02-22 13:05:33


I am having trouble doing a sentiment analysis of tweets (file 1, standard twitter json response) against a list of words (file 2, tab delimited, two columns) with their sentiment assigned to them (either positive or negative).

The problem: the outer loop runs only once and then the script ends. I loop through file 1 and, nested inside that loop, I loop through file 2, comparing words and keeping a running sum of the combined sentiment for each tweet.

So I have:

def get_sentiments(tweet_file, sentiment_file):


    sent_score = 0
    for line in tweet_file:

        document = json.loads(line)
        tweets = document.get('text')

        if tweets != None:
            tweet = str(tweets.encode('utf-8'))

            #print tweet


            for z in sentiment_file:
                line = z.split('\t')
                word = line[0].strip()
                score = int(line[1].rstrip('\n').strip())

                #print score



                if word in tweet:
                    print "+++++++++++++++++++++++++++++++++++++++"
                    print word, tweet
                    sent_score += score



            print "====", sent_score, "====="

    #PROBLEM, IT'S ONLY DOING THIS FOR THE FIRST TWEET

file1 = open('tweetsfile.txt')
file2 = open('sentimentfile.txt')


get_sentiments(file1, file2)

I've spent the better part of a day trying to figure out why it prints out all the tweets when the nested for loop over file2 is removed, but with that loop in place it only processes the first tweet and then exits.

The reason it only does this once is that the inner for loop has already reached the end of the sentiment file, so it stops because there are no more lines to read.

In other words, the first time your outer loop runs, the inner loop steps through the entire sentiment file. Since it has then reached the end of the file, there are no more lines to read on later iterations, so only the first tweet is ever compared against the word list.

So one way to solve this is to "rewind" the file; you can do that with the seek method of the file object, calling seek(0) on the sentiment file before each pass of the inner loop.
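A minimal sketch of the rewind approach, assuming Python 3. To keep it self-contained, io.StringIO objects stand in for the two open files, the tweets are plain text lines rather than JSON, and matching is by whole word; the function name score_tweets_with_seek and the sample data are illustrative, not from the original post.

```python
import io

def score_tweets_with_seek(tweet_file, sentiment_file):
    """Return one sentiment total per tweet line, rewinding sentiment_file each time."""
    totals = []
    for tweet in tweet_file:
        sentiment_file.seek(0)  # rewind so the inner loop starts at the first line again
        words = tweet.split()
        total = 0
        for entry in sentiment_file:
            word, score = entry.rstrip("\n").split("\t")
            if word.strip() in words:
                total += int(score)
        totals.append(total)
    return totals

# In-memory stand-ins for the two files (hypothetical sample data).
tweets = io.StringIO("good day\nbad day\n")
sentiments = io.StringIO("good\t3\nbad\t-3\n")
print(score_tweets_with_seek(tweets, sentiments))  # [3, -3]
```

Without the seek(0) call, the second tweet would score 0, because the inner loop would find the sentiment file already exhausted.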

If your files aren't big, another approach is to read them all into a list or similar structure and then loop through it.
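As a rough illustration of the list approach, assuming Python 3 and small files: the sentiment file is read once into a list of (word, score) pairs, and that list can then be scanned as many times as needed. The names load_pairs and score_with_list, the plain-text tweets, and the sample data are assumptions made for the sketch.

```python
import io

def load_pairs(sentiment_file):
    """Read every tab-separated (word, score) line into a list, once."""
    pairs = []
    for entry in sentiment_file:
        word, score = entry.rstrip("\n").split("\t")
        pairs.append((word.strip(), int(score)))
    return pairs

def score_with_list(tweet_file, pairs):
    """Return one sentiment total per tweet, re-scanning the in-memory list."""
    totals = []
    for tweet in tweet_file:
        words = tweet.split()
        totals.append(sum(score for word, score in pairs if word in words))
    return totals

# In-memory stand-ins for the two files (hypothetical sample data).
sentiments = io.StringIO("good\t3\nbad\t-3\n")
tweets = io.StringIO("good day\nbad day\nplain day\n")
pairs = load_pairs(sentiments)
print(score_with_list(tweets, pairs))  # [3, -3, 0]
```

Unlike a file object, a list is not consumed by iteration, so the inner scan works for every tweet.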

However, since your sentiment score is a simple lookup, the best approach would be to build a dictionary of the sentiment scores, then look up each word of the tweet in that dictionary to calculate the tweet's overall sentiment:

import csv
import json

scores = {}  # empty dictionary to store scores for each word

with open('sentimentfile.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        scores[row[0].strip()] = int(row[1].strip()) 


with open('tweetsfile.txt') as f:
    for line in f:
        tweet = json.loads(line)
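        # Python 2: encode to a byte string. On Python 3, drop .encode('utf-8')
        # so the words stay str and match the dictionary keys.
        text = tweet.get('text','').encode('utf-8')
        if text:
            total_sentiment = sum(scores.get(word,0) for word in text.split())
            print("{}: {}".format(text, total_sentiment))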

The with statement automatically closes the file when its block ends. I am using the csv module to read the sentiment file; with delimiter='\t' it handles tab-delimited files as well.
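To see both points in isolation, here is a small sketch assuming Python 3. It writes a throwaway tab-delimited file (the contents are made-up sample data), reads it back with csv.reader, and checks that the file is closed once the with block ends.

```python
import csv
import os
import tempfile

# Create a temporary tab-delimited file with hypothetical sample data.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("good\t3\nbad\t-3\n")
    path = tmp.name

# newline="" is the csv module's recommended way to open files on Python 3.
with open(path, newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))

print(rows)      # [['good', '3'], ['bad', '-3']]
print(f.closed)  # True: the with block closed the file on exit
os.remove(path)  # clean up the temporary file
```

Note that csv.reader returns every field as a string, which is why the answer's code converts the score with int().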

This line does the calculation:

total_sentiment = sum(scores.get(word,0) for word in text.split())

It is a shorter way to write this loop:

tweet_score = []
for word in text.split():
    if word in scores:
        tweet_score.append(scores[word])

total_score = sum(tweet_score)

The get method of dictionaries takes an optional second argument: a custom value to return when the key cannot be found. If you omit it, get returns None. In my loop I use it to return 0 when a word has no score.
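A quick demonstration of that behaviour, assuming Python 3; the scores dictionary and the sample sentence are made up for the example.

```python
scores = {"good": 3, "bad": -3}  # hypothetical sentiment scores

print(scores.get("good"))    # 3: key found, its value is returned
print(scores.get("meh"))     # None: key missing, no default given
print(scores.get("meh", 0))  # 0: key missing, default returned instead

# Words without a score contribute 0, so sum() never sees None.
text = "good but not bad or great good"
total_sentiment = sum(scores.get(word, 0) for word in text.split())
print(total_sentiment)       # 3
```

Using 0 as the default matters here because sum() would raise a TypeError on a None value.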