使用pd.json_normalize展平字典

更新时间：2021-11-29 23:08:36

设置

您的数据结构不便.我想专注于:

Setup

Your data is structured inconveniently. I want to focus on:

将"ID" 中的列表放入字典列表中，这会更加方便.
摆脱父词典中无用的键.我们只关心价值.

Getting the lists in 'IDs' into a list of dictionaries, which would be far more convenient.
Getting rid of the useless keys in the parent dictionary. All we care about are the values.

您的数据:

{1: {'Name': 'Thrilling Tales of Dragon Slayers',
     'IDs': {'StoreID': ['123445452543'],
             'BookID': ['543533254353'],
             'SalesID': ['543267765345']}},
 2: {'Name': 'boring Tales of Dragon Slayers',
     'IDs': {'StoreID': ['111111', '1121111'],
             'BookID': ['543533254353', '4324232342'],
             'SalesID': ['543267765345', '4353543']}}}

我希望它看起来像什么

[{'Name': 'Thrilling Tales of Dragon Slayers',
  'IDs': [{'StoreID': '123445452543',
           'BookID': '543533254353',
           'SalesID': '543267765345'}]},
 {'Name': 'boring Tales of Dragon Slayers',
  'IDs': [{'StoreID': '111111',
           'BookID': '543533254353',
           'SalesID': '543267765345'},
          {'StoreID': '1121111',
           'BookID': '4324232342',
           'SalesID': '4353543'}]}]

重组数据

合理的方式

简单的循环，不要乱扔.这使我们得到了上面显示的内容

Restructure Data

Reasonable Way

Simple loop, don't mess around. This gets us what I showed above

new = []

for v in data.values():
    temp = {**v}           # This is intended to keep all the other data that might be there
    ids = temp.pop('IDs')  # I have to focus on this to create the records
    temp['IDs'] = [dict(zip(ids, x)) for x in zip(*ids.values())]
    new.append(temp)

可爱的单缸纸

new = [{**v, 'IDs': [dict(zip(v['IDs'], x)) for x in zip(*v['IDs'].values())]} for v in data.values()]

使用 `pd.json_normalize`

创建 DataFrame

在对 json_normalize 的调用中，我们需要指定记录的路径，即在'IDs'键处找到的ID词典列表. json_normalize 将在数据框中为该列表中的每个项目创建一行.这将通过 record_path 参数完成，我们传递一个 tuple 来描述路径(如果是更深层的结构)或字符串(如果键位于顶层，对我们来说就是.)

Create `DataFrame` with `pd.json_normalize`

In this call to json_normalize we need to specify the path to the records, i.e. the list of id dictionaries found at the 'IDs' key. json_normalize will create one row in the dataframe for every item in that list. This will be done with the the record_path parameter and we pass a tuple that describes the path (if it were in a deeper structure) or a string (if the key is at the top layer, which for us, it is).

record_path = 'IDs'

然后我们要告诉 json_normalize 哪些键是记录的元数据.如果像我们一样有多个记录，那么将为每条记录重复元数据.

Then we want to tell json_normalize what keys are metadata for the records. If there are more than one record, as we have, then the metadata will be repeated for each record.

meta = 'Name'

所以最终的解决方案是这样的:

So the final solution looks like this:

pd.json_normalize(new, record_path='IDs', meta='Name')

        StoreID        BookID       SalesID                               Name
0  123445452543  543533254353  543267765345  Thrilling Tales of Dragon Slayers
1        111111  543533254353  543267765345     boring Tales of Dragon Slayers
2       1121111    4324232342       4353543     boring Tales of Dragon Slayers

但是

如果我们仍然要进行重组，那么***还是将其传递给数据框构造函数.

However

If we are restructuring anyway, might as well make it so we can just pass it to the dataframe constructor.

pd.DataFrame([
    {'Name': r['Name'], **dict(zip(r['IDs'], x))}
    for r in data.values() for x in zip(*r['IDs'].values())
])

                                Name       StoreID        BookID       SalesID
0  Thrilling Tales of Dragon Slayers  123445452543  543533254353  543267765345
1     boring Tales of Dragon Slayers        111111  543533254353  543267765345
2     boring Tales of Dragon Slayers       1121111    4324232342       4353543

奖励内容

虽然我们在做.关于每个id类型是否具有相同数量的id，数据是模棱两可的.假设他们没有.

Bonus Content

While we are at it. The data is ambiguous in regards to whether or not each id type has the same number of ids. Suppose they did not.

data = {1:{
      'Name': "Thrilling Tales of Dragon Slayers",
      'IDs':{
            "StoreID": ['123445452543'],
            "BookID": ['543533254353'],
            "SalesID": ['543267765345']}},
     2:{
      'Name': "boring Tales of Dragon Slayers",
      'IDs':{
            "StoreID": ['111111', '1121111'],
            "BookID": ['543533254353', '4324232342'],
            "SalesID": ['543267765345', '4353543', 'extra id']}}}

然后我们可以使用 itertools

from itertools import zip_longest

pd.DataFrame([
    {'Name': r['Name'], **dict(zip(r['IDs'], x))}
    for r in data.values() for x in zip_longest(*r['IDs'].values())
])

                                Name       StoreID        BookID       SalesID
0  Thrilling Tales of Dragon Slayers  123445452543  543533254353  543267765345
1     boring Tales of Dragon Slayers        111111  543533254353  543267765345
2     boring Tales of Dragon Slayers       1121111    4324232342       4353543
3     boring Tales of Dragon Slayers          None          None      extra id

上一篇 : ：如何使用Nans将zscore归一化 pandas 列?下一篇 : Json中缺少空值列

使用pd.json_normalize展平字典

设置

Setup

重组数据

合理的方式

Restructure Data

Reasonable Way

可爱的单缸纸

使用 `pd.json_normalize`

Create `DataFrame` with `pd.json_normalize`

但是

However

奖励内容

Bonus Content

相关阅读

技术问答最新文章

使用pd.json_normalize展平字典

设置

Setup

重组数据

合理的方式

Restructure Data

Reasonable Way

可爱的单缸纸

使用 pd.json_normalize

Create DataFrame with pd.json_normalize

但是

However

奖励内容

Bonus Content

相关阅读

技术问答最新文章

使用 `pd.json_normalize`

Create `DataFrame` with `pd.json_normalize`