Converting an unordered list of tuples to a Pandas DataFrame

Updated: 2021-11-13 03:57:01

Thank you for your responses! I ended up using a completely different workaround:

I checked the usaddress documentation for all possible parse tags and created a DataFrame with one column per tag, plus one column holding the addresses. Then I parsed each address and used regex to fill the tag columns with the extracted information. Code below!

import re

import pandas as pd
import usaddress

parse_tags = ['Recipient','AddressNumber','AddressNumberPrefix','AddressNumberSuffix',
'StreetName','StreetNamePreDirectional','StreetNamePreModifier','StreetNamePreType',
'StreetNamePostDirectional','StreetNamePostModifier','StreetNamePostType','CornerOf',
'IntersectionSeparator','LandmarkName','USPSBoxGroupID','USPSBoxGroupType','USPSBoxID',
'USPSBoxType','BuildingName','OccupancyType','OccupancyIdentifier','SubaddressIdentifier',
'SubaddressType','PlaceName','StateName','ZipCode']

addr = ['123 Pennsylvania Ave NW Washington DC 20008', 
        '652 Polk St San Francisco, CA 94102', 
        '3711 Travis St #800 Houston, TX 77002']

df = pd.DataFrame({'Addresses': addr})
# Add one empty (NaN) column for every possible tag
df = df.reindex(columns=['Addresses'] + parse_tags)

Next, I created a new column called "Info" that holds the usaddress parse result (a list of (token, tag) tuples) converted to a string:

df['Info'] = df['Addresses'].apply(lambda x: str(usaddress.parse(x)))
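For reference, here is roughly what ends up in the "Info" column. usaddress.parse returns a list of (token, tag) tuples, and str() turns that list into the text the regex later searches. The tuples below are hand-written stand-ins for the parser's output on the second sample address (an assumption, not captured output), so this sketch runs without usaddress installed.

```python
# Hand-written stand-in for usaddress.parse('652 Polk St San Francisco, CA 94102').
# Assumed for illustration; the real parser's tokens and labels may differ.
parsed = [
    ("652", "AddressNumber"),
    ("Polk", "StreetName"),
    ("St", "StreetNamePostType"),
    ("San", "PlaceName"),
    ("Francisco,", "PlaceName"),
    ("CA", "StateName"),
    ("94102", "ZipCode"),
]

# This string is what the "Info" column holds for the row.
info = str(parsed)
print(info)

# The same tag can occur more than once, so a dict keyed on tag keeps only the
# last token: "San" is silently overwritten by "Francisco,".
naive = {tag: token for token, tag in parsed}
print(naive["PlaceName"])  # Francisco,
```

The repeated PlaceName tag is one reason the list of tuples cannot simply be handed to a dict or DataFrame constructor as-is.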

Now here's the main workaround. I looped through each tag name, looked for it in the corresponding "Info" cell, and used a regular expression to extract the tagged value wherever it existed!

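for colname in parse_tags:
    # Match the exact "('token', 'Tag')" pairs in the Info string (raw string
    # avoids invalid-escape warnings). Tokens sharing a tag, e.g. "San" and
    # "Francisco,", are joined with spaces; an absent tag gives "".
    pattern = r"\('(\S+)', '{}'\)".format(colname)
    df[colname] = df['Info'].apply(lambda x: " ".join(re.findall(pattern, x)))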
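The regex lookup can be pulled out into a small helper to show how it behaves on a few edge cases. extract_tag is a hypothetical name, and the Info strings are built from hand-written tuples rather than real usaddress output. Two things are worth noting: a plain substring check such as re.search(tag, info) is not a reliable guard, because "StreetName" also occurs inside "StreetNamePostType"; and str() quotes a token containing an apostrophe with double quotes, so a single-quote pattern misses it.

```python
import re


def extract_tag(info: str, tag: str) -> str:
    """Return every token labelled `tag` in a stringified parse list, space-joined.

    Returns "" when the tag is absent instead of indexing into an empty match list.
    """
    # Raw string avoids invalid-escape warnings; re.escape guards the tag name.
    pattern = r"\('(\S+)', '{}'\)".format(re.escape(tag))
    return " ".join(re.findall(pattern, info))


info = str([("652", "AddressNumber"), ("Polk", "StreetName"),
            ("St", "StreetNamePostType"), ("San", "PlaceName"),
            ("Francisco,", "PlaceName")])
print(extract_tag(info, "PlaceName"))   # San Francisco,
print(extract_tag(info, "StreetName"))  # Polk
print(repr(extract_tag(info, "ZipCode")))  # ''

# Substring pitfall: "StreetName" is found inside "StreetNamePostType",
# yet there is no StreetName token to extract.
info2 = str([("123", "AddressNumber"), ("Ave", "StreetNamePostType")])
print(bool(re.search("StreetName", info2)), repr(extract_tag(info2, "StreetName")))  # True ''

# Quoting pitfall: str() renders this token as "O'Farrell" (double quotes),
# so the single-quote pattern cannot see it.
info3 = str([("O'Farrell", "StreetName")])
print(repr(extract_tag(info3, "StreetName")))  # ''
```

Inside the loop, the helper would be used as df[colname] = df['Info'].apply(lambda x: extract_tag(x, colname)).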

This is probably not the most efficient way, but it worked for my purposes. Thanks everyone for providing suggestions!
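Since the regex approach is admittedly not the most efficient, one possible alternative (a sketch, not what the author did) skips the string round trip entirely: group each address's (token, tag) tuples by tag, producing one dict per address. tuples_to_record is a hypothetical helper, and the input tuples are again hand-written stand-ins for usaddress.parse output.

```python
from collections import defaultdict


def tuples_to_record(parsed, tags):
    """Collapse a list of (token, tag) tuples into one dict with a key per tag.

    Tokens sharing a tag are joined with spaces, tags that never occur map to "",
    and tags not listed in `tags` are dropped.
    """
    grouped = defaultdict(list)
    for token, tag in parsed:
        grouped[tag].append(token)
    return {tag: " ".join(grouped[tag]) for tag in tags}


# Hand-written stand-ins for usaddress.parse() output (assumed, not captured).
parsed_rows = [
    [("652", "AddressNumber"), ("Polk", "StreetName"), ("St", "StreetNamePostType"),
     ("San", "PlaceName"), ("Francisco,", "PlaceName"), ("CA", "StateName"),
     ("94102", "ZipCode")],
    [("1", "AddressNumber"), ("O'Farrell", "StreetName"), ("St", "StreetNamePostType")],
]
tags = ["AddressNumber", "StreetName", "StreetNamePostType",
        "PlaceName", "StateName", "ZipCode"]

records = [tuples_to_record(p, tags) for p in parsed_rows]
print(records[0]["PlaceName"])   # San Francisco,
print(records[1]["StreetName"])  # O'Farrell
```

With pandas, pd.DataFrame(records, columns=parse_tags) should then give one row per address and one column per tag, where records = [tuples_to_record(usaddress.parse(a), parse_tags) for a in addr]. Because the helper works on the tuples themselves, tokens containing apostrophes and repeated tags need no special handling.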
