且构网

Grouping values by common domain and page value

Updated: 2023-11-26 10:49:16

Use a defaultdict(list) to collect the query parameters of each URL, keyed by the path that urlparse() returns:

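from collections import defaultdict
# Python 3 locations; on Python 2 these lived in the urllib and urlparse modules
from urllib.parse import parse_qsl, quote, urlparse


# Map each URL path to the list of "key=value" parameters seen for it
urls = defaultdict(list)
with open('links.txt') as f:
    for url in f:
        parsed_url = urlparse(url.strip())
        # keep_blank_values=True keeps parameters whose value is empty
        params = parse_qsl(parsed_url.query, keep_blank_values=True)
        for key, value in params:
            # parse_qsl() decodes the values, so quote them again for display
            urls[parsed_url.path].append("%s=%s" % (key, quote(value)))

# Print each path followed by its parameters
for url, params in urls.items():
    print(url)
    for param in params:
        print(param)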

Prints:

ww2.domain.com/cal
date=2007-04-14
date=2007-08-19
www.domain.edu/some/folder/image.php
l=adm
y=5
id=2
page=http%3A//support.domain.com/downloads/index.asp
unique=12345
l=adm
y=5
id=2
page=http%3A//.domain.com/downloads/index.asp
unique=12345
domain.com/cal
view=month
view=day
www.domain.com/page
id_eve=479989
adm=no
id_eve=47
adm=yes
id_eve=479
blog.news.org/news/calendar.php
view=day
date=2011-12-10
view=month
date=2011-12-10
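
For testing without a links.txt file, the same grouping can be sketched as a small function that takes any iterable of lines. This is only a sketch for Python 3: the name group_params and the sample list are assumptions made for illustration and are not part of the original answer.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, quote, urlparse


def group_params(lines):
    """Group "key=value" query parameters by URL path (hypothetical helper)."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = urlparse(line.strip())
        # keep_blank_values=True keeps parameters whose value is empty
        for key, value in parse_qsl(parsed.query, keep_blank_values=True):
            grouped[parsed.path].append("%s=%s" % (key, quote(value)))
    return grouped


# Sample input made up for illustration. The links carry no scheme,
# so urlparse() leaves the host name inside .path.
sample = [
    "domain.com/cal?view=month",
    "ww2.domain.com/cal?date=2007-04-14",
    "domain.com/cal?view=day",
]
result = group_params(sample)
for path, params in result.items():
    print(path)
    for param in params:
        print(param)
```

Because the function returns the dictionary instead of printing inside the loop, the grouped values can be inspected or post-processed (for example sorted or de-duplicated) before display.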

Hope that helps.