Extracting data from Twitter tweets using Python

Updated: 2023-02-19 09:38:27

You could do something like this to issue a query, then get the data you want by looking up the corresponding keys in the response.

import json
import twitter

ckey = 'your consumer key'
csecret = 'your consumer secret'
atoken = 'your access token'
asecret = 'your access token secret'

auth = twitter.oauth.OAuth(atoken, asecret,
                           ckey, csecret)

twitter_api = twitter.Twitter(auth=auth)

q = 'http://on.fb.me'

count = 100

search_results = twitter_api.search.tweets(q=q, count=count)

statuses = search_results['statuses']

# Iterate through 5 more batches of results by following the cursor

for _ in range(5):
    print("Length of statuses: %d" % len(statuses))
    try:
        next_results = search_results['search_metadata']['next_results']
    except KeyError:  # No more results when next_results doesn't exist
        break

    # Create a dictionary from next_results, which has the following form:
    # ?max_id=313519052523986943&q=NCAA&include_entities=1
    kwargs = dict([ kv.split('=') for kv in next_results[1:].split("&") ])

    search_results = twitter_api.search.tweets(**kwargs)
    statuses += search_results['statuses']

# Show one sample search result by indexing into the list
print(json.dumps(statuses[0], indent=1))

# get relevant data into lists
user_names = [ user_mention['name'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

screen_names = [ user_mention['screen_name'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

id_str = [ user_mention['id_str'] 
                 for status in statuses
                     for user_mention in status['entities']['user_mentions'] ]

t_id = [ status['id'] 
         for status in statuses ]

# print out first 5 results
print(json.dumps(screen_names[0:5], indent=1))
print(json.dumps(user_names[0:5], indent=1))
print(json.dumps(id_str[0:5], indent=1))
print(json.dumps(t_id[0:5], indent=1))

Result:

[
 "DijalogNet", 
 "Kihot_ex_of", 
 "Kihot_ex_of", 
 "JAsunshine1011", 
 "RobertCornegyJr"
]
[
 "Dijalog Net", 
 "Sa\u0161a Jankovi\u0107", 
 "Sa\u0161a Jankovi\u0107", 
 "Raycent Edwards", 
 "Robert E Cornegy, Jr"
]
[
 "2380692464", 
 "563692937", 
 "563692937", 
 "15920807", 
 "460051837"
]
[
 542309722385580032, 
 542227367834685440, 
 542202885514461185, 
 542201843448045568, 
 542188061598437376
]
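
Note that the first three lists hold one entry per mention, while t_id holds one entry per tweet, so the lists do not line up index by index. If you need to know which tweet each mention came from, one option is to collect everything in a single pass. The following is only a minimal sketch: sample_statuses is hand-written data shaped like the statuses list above (the values are made up), and extract_mentions is a hypothetical helper name, not part of any library.

```python
# Hand-written sample shaped like the 'statuses' list used above.
# The values are invented for illustration; they are not real API output.
sample_statuses = [
    {
        "id": 1,
        "entities": {
            "user_mentions": [
                {"name": "Alice Example", "screen_name": "alice", "id_str": "101"},
                {"name": "Bob Example", "screen_name": "bob", "id_str": "102"},
            ]
        },
    },
    {"id": 2, "entities": {"user_mentions": []}},  # tweet with no mentions
    {"id": 3},  # defensive case: no 'entities' key at all
]


def extract_mentions(statuses):
    """Return one (tweet_id, screen_name, name, id_str) tuple per mention."""
    rows = []
    for status in statuses:
        # .get() with defaults avoids a KeyError when a key is missing
        mentions = status.get("entities", {}).get("user_mentions", [])
        for mention in mentions:
            rows.append((status["id"], mention["screen_name"],
                         mention["name"], mention["id_str"]))
    return rows


mention_rows = extract_mentions(sample_statuses)
for row in mention_rows:
    print(row)
```

Keeping the tweet id in each row means a mention can always be traced back to its tweet, at the cost of repeating the id for tweets that mention several users.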

Have a look at this site for more examples of how to use the API.
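
One detail of the pagination loop worth knowing: splitting next_results on '&' and '=' leaves the values percent-encoded (for example, the query comes back as http%3A%2F%2Fon.fb.me), which may matter if the client library encodes the parameters again. A possible alternative, assuming Python 3, is to let the standard library decode the query string. The helper name next_results_to_kwargs and the example string are made up for illustration.

```python
from urllib.parse import parse_qsl


def next_results_to_kwargs(next_results):
    """Turn '?max_id=...&q=...' into a dict of decoded keyword arguments."""
    # Drop the leading '?' and let parse_qsl split and percent-decode the pairs
    return dict(parse_qsl(next_results.lstrip("?")))


# Example string in the same form as search_metadata['next_results']
example = "?max_id=313519052523986943&q=http%3A%2F%2Fon.fb.me&include_entities=1"
kwargs = next_results_to_kwargs(example)
print(kwargs)
```

The resulting dict could then be passed on with twitter_api.search.tweets(**kwargs), as in the loop above.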
