Updated: 2023-02-22 13:05:33
I am having trouble doing a sentiment analysis of tweets (file 1, standard Twitter JSON response) against a list of words (file 2, tab delimited, two columns) with a sentiment score assigned to each word (either positive or negative).
The problem is that the outer loop runs only once and then the script ends. I loop through file 1 and, nested inside that loop, I loop through file 2, comparing words and keeping a running sum of the combined sentiment for each tweet.
So I have:
import json

def get_sentiments(tweet_file, sentiment_file):
    sent_score = 0
    for line in tweet_file:
        document = json.loads(line)
        tweets = document.get('text')
        if tweets != None:
            tweet = str(tweets.encode('utf-8'))
            #print tweet
            for z in sentiment_file:
                line = z.split('\t')
                word = line[0].strip()
                score = int(line[1].rstrip('\n').strip())
                #print score
                if word in tweet:
                    print "+++++++++++++++++++++++++++++++++++++++"
                    print word, tweet
                    sent_score += score
            print "====", sent_score, "====="
            #PROBLEM, IT'S ONLY DOING THIS FOR THE FIRST TWEET

file1 = open('tweetsfile.txt')
file2 = open('sentimentfile.txt')
get_sentiments(file1, file2)
I've spent the better part of a day trying to figure out why it prints all the tweets when the nested for loop over file2 is removed, but with that loop in place it only processes the first tweet and then exits.
The reason it only does this once is that the inner for loop (the one over the sentiment file) has reached the end of the file, so it stops because there are no more lines to read.
In other words, the first time your inner loop runs, it steps through the entire file. After that there are no more lines to read (it has reached the end of the file), so it never loops again, and only the first tweet is actually compared against the word list.
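The exhaustion described above can be reproduced without any files on disk. This is only a small sketch in Python 3: the two io.StringIO objects and the helper name count_inner_passes are hypothetical stand-ins for the opened tweet and sentiment files, not part of the original script.

```python
import io

# Hypothetical in-memory stand-ins for the two open files.
tweets = io.StringIO("tweet one\ntweet two\n")
words = io.StringIO("good\t1\nbad\t-1\n")

def count_inner_passes(outer, inner):
    """Return how many inner lines were visited for each outer line."""
    visited = []
    for _ in outer:
        # The inner file object remembers its position between passes.
        visited.append(sum(1 for _ in inner))
    return visited

passes = count_inner_passes(tweets, words)
print(passes)  # [2, 0]
```

The second outer iteration sees zero inner lines, which matches the behaviour in the question: every tweet after the first is compared against nothing.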
So one way to solve this is to "rewind" the file, which you can do with the seek method of the file object (for example, seek(0) returns to the start).
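A minimal sketch of the rewind approach, in Python 3, assuming in-memory io.StringIO objects in place of the real files (the sample data and the helper name are made up for illustration):

```python
import io

def count_inner_passes_with_rewind(outer, inner):
    """Rewind the inner file before each pass and count its lines."""
    visited = []
    for _ in outer:
        inner.seek(0)  # move back to the start of the inner file
        visited.append(sum(1 for _ in inner))
    return visited

# Hypothetical stand-ins for the tweet file and the sentiment file.
tweets = io.StringIO("tweet one\ntweet two\n")
words = io.StringIO("good\t1\nbad\t-1\n")

rewound = count_inner_passes_with_rewind(tweets, words)
print(rewound)  # [2, 2]
```

This works, but it rereads the whole sentiment file once per tweet, so it gets slow as either file grows.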
If your files aren't big, another approach is to read them all into a list or similar structure and then loop through it.
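As a rough illustration of the list approach (Python 3, with made-up sample data standing in for both files), the sentiment file is read once and the resulting list is reused for every tweet. Note that this sketch matches whole words after splitting on whitespace, which is an assumption; the code in the question used a substring test instead.

```python
import io

# Hypothetical stand-ins: an in-memory sentiment file and two tweet texts.
sentiment_file = io.StringIO("good\t1\nbad\t-1\n")
tweet_texts = ["good day", "bad bad day"]

# Read the sentiment file once; a list can be iterated any number of times.
sentiment_rows = [line.rstrip("\n").split("\t") for line in sentiment_file]

totals = []
for text in tweet_texts:
    tweet_words = text.split()
    total = 0
    for word, score in sentiment_rows:
        if word in tweet_words:  # counts each listed word at most once
            total += int(score)
    totals.append(total)

print(totals)  # [1, -1]
```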
However, since your sentiment score is a simple lookup, the best approach would be to build a dictionary of the sentiment scores, then look up each word in the dictionary to calculate the overall sentiment of the tweet:
import csv
import json

scores = {}  # maps each word to its sentiment score

with open('sentimentfile.txt') as f:
    reader = csv.reader(f, delimiter='\t')
    for row in reader:
        scores[row[0].strip()] = int(row[1].strip())

with open('tweetsfile.txt') as f:
    for line in f:
        tweet = json.loads(line)
        # Python 2: encode so the text matches the byte-string keys read by csv.
        # On Python 3, drop .encode('utf-8') so the words stay str.
        text = tweet.get('text', '').encode('utf-8')
        if text:
            total_sentiment = sum(scores.get(word, 0) for word in text.split())
            print("{}: {}".format(text, total_sentiment))
The with statement automatically closes the file handles. I am using the csv module to read the file (it works for tab-delimited files as well).
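To see the csv module handling tab-delimited input in isolation, here is a short Python 3 sketch. The sample rows are invented, and an io.StringIO object stands in for the opened sentiment file.

```python
import csv
import io

# Hypothetical tab-delimited sample: word (or phrase), then score.
sample = io.StringIO("good\t3\nnot bad\t1\nawful\t-3\n")

word_scores = {}
for row in csv.reader(sample, delimiter="\t"):
    # row is a list of the tab-separated fields on one line
    word_scores[row[0].strip()] = int(row[1].strip())

print(word_scores)  # {'good': 3, 'not bad': 1, 'awful': -3}
```

Because the delimiter is a tab, an entry containing a space such as "not bad" stays in a single field.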
This line does the calculation:
total_sentiment = sum(scores.get(word,0) for word in text.split())
It is a shorter way to write this loop:
tweet_score = []
for word in text.split():
    if word in scores:
        tweet_score.append(scores[word])

total_score = sum(tweet_score)
The get method of dictionaries takes an optional second argument: a custom value to return when the key cannot be found. If you omit this second argument, get returns None. In my loop I use it to return 0 when a word has no score.
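A quick way to check the behaviour of get, and that the one-line sum gives the same result as an explicit loop, is the following Python 3 sketch. The scores and the sample text are made up for illustration.

```python
# Hypothetical sample scores and tweet text.
scores = {"good": 1, "bad": -1}
text = "good movie bad ending good cast"

print(scores.get("movie"))     # None (no default given)
print(scores.get("movie", 0))  # 0 (explicit default)

# One-line version: unknown words contribute 0.
short_total = sum(scores.get(word, 0) for word in text.split())

# Explicit loop: only add the score when the word is known.
long_total = 0
for word in text.split():
    if word in scores:
        long_total += scores[word]

print(short_total, long_total)  # 1 1
```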