更新时间:2021-11-29 23:08:36
您的数据结构不便.我想专注于:
Your data is structured inconveniently. I want to focus on:
"ID"
中的列表放入字典列表中,这会更加方便.'IDs'
into a list of dictionaries, which would be far more convenient.您的数据
:
{1: {'Name': 'Thrilling Tales of Dragon Slayers',
'IDs': {'StoreID': ['123445452543'],
'BookID': ['543533254353'],
'SalesID': ['543267765345']}},
2: {'Name': 'boring Tales of Dragon Slayers',
'IDs': {'StoreID': ['111111', '1121111'],
'BookID': ['543533254353', '4324232342'],
'SalesID': ['543267765345', '4353543']}}}
我希望它看起来像什么
[{'Name': 'Thrilling Tales of Dragon Slayers',
'IDs': [{'StoreID': '123445452543',
'BookID': '543533254353',
'SalesID': '543267765345'}]},
{'Name': 'boring Tales of Dragon Slayers',
'IDs': [{'StoreID': '111111',
'BookID': '543533254353',
'SalesID': '543267765345'},
{'StoreID': '1121111',
'BookID': '4324232342',
'SalesID': '4353543'}]}]
简单的循环,不要乱扔.这使我们得到了上面显示的内容
Simple loop, don't mess around. This gets us what I showed above
new = []
for v in data.values():
temp = {**v} # This is intended to keep all the other data that might be there
ids = temp.pop('IDs') # I have to focus on this to create the records
temp['IDs'] = [dict(zip(ids, x)) for x in zip(*ids.values())]
new.append(temp)
new = [{**v, 'IDs': [dict(zip(v['IDs'], x)) for x in zip(*v['IDs'].values())]} for v in data.values()]
pd.json_normalize
DataFrame
在对 json_normalize
的调用中,我们需要指定记录的路径,即在'IDs'
键处找到的ID词典列表. json_normalize
将在数据框中为该列表中的每个项目创建一行.这将通过 record_path
参数完成,我们传递一个 tuple
来描述路径(如果是更深层的结构)或字符串(如果键位于顶层,对我们来说就是.)
DataFrame
with pd.json_normalize
In this call to json_normalize
we need to specify the path to the records, i.e. the list of id dictionaries found at the 'IDs'
key. json_normalize
will create one row in the dataframe for every item in that list. This will be done with the the record_path
parameter and we pass a tuple
that describes the path (if it were in a deeper structure) or a string (if the key is at the top layer, which for us, it is).
record_path = 'IDs'
然后我们要告诉 json_normalize
哪些键是记录的元数据.如果像我们一样有多个记录,那么将为每条记录重复元数据.
Then we want to tell json_normalize
what keys are metadata for the records. If there are more than one record, as we have, then the metadata will be repeated for each record.
meta = 'Name'
所以最终的解决方案是这样的:
So the final solution looks like this:
pd.json_normalize(new, record_path='IDs', meta='Name')
StoreID BookID SalesID Name
0 123445452543 543533254353 543267765345 Thrilling Tales of Dragon Slayers
1 111111 543533254353 543267765345 boring Tales of Dragon Slayers
2 1121111 4324232342 4353543 boring Tales of Dragon Slayers
如果我们仍然要进行重组,那么***还是将其传递给数据框构造函数.
If we are restructuring anyway, might as well make it so we can just pass it to the dataframe constructor.
pd.DataFrame([
{'Name': r['Name'], **dict(zip(r['IDs'], x))}
for r in data.values() for x in zip(*r['IDs'].values())
])
Name StoreID BookID SalesID
0 Thrilling Tales of Dragon Slayers 123445452543 543533254353 543267765345
1 boring Tales of Dragon Slayers 111111 543533254353 543267765345
2 boring Tales of Dragon Slayers 1121111 4324232342 4353543
虽然我们在做.关于每个id类型是否具有相同数量的id,数据是模棱两可的.假设他们没有.
While we are at it. The data is ambiguous in regards to whether or not each id type has the same number of ids. Suppose they did not.
data = {1:{
'Name': "Thrilling Tales of Dragon Slayers",
'IDs':{
"StoreID": ['123445452543'],
"BookID": ['543533254353'],
"SalesID": ['543267765345']}},
2:{
'Name': "boring Tales of Dragon Slayers",
'IDs':{
"StoreID": ['111111', '1121111'],
"BookID": ['543533254353', '4324232342'],
"SalesID": ['543267765345', '4353543', 'extra id']}}}
然后我们可以使用 itertools
from itertools import zip_longest
pd.DataFrame([
{'Name': r['Name'], **dict(zip(r['IDs'], x))}
for r in data.values() for x in zip_longest(*r['IDs'].values())
])
Name StoreID BookID SalesID
0 Thrilling Tales of Dragon Slayers 123445452543 543533254353 543267765345
1 boring Tales of Dragon Slayers 111111 543533254353 543267765345
2 boring Tales of Dragon Slayers 1121111 4324232342 4353543
3 boring Tales of Dragon Slayers None None extra id