Convert a dataframe to JSON with 2-level nested arrays

Updated: 2022-11-22 18:01:21

Start by grouping on both the Parameter and FortNight columns, and using .to_dict() on the resulting grouped rows to produce the inner-most dictionaries:

details = df.groupby(['Parameter', 'FortNight']).apply(
    lambda r: r[['Customer', 'Amount']].to_dict(orient='records'))

This gives you a series with a multi-index over Parameter and FortNight, and the values are all the lists in the correct format, each entry a dictionary with Customer and Amount columns. If you need to convert the value types, do so on the r[['Customer', 'Amount']] dataframe result before calling to_dict() on it.
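
As a minimal sketch of the type-conversion step, assuming a small input frame reconstructed from the output shown in this answer (the sample rows and the helper name to_records are illustrative, not taken from the original question), the Amount column could be cast to integers before the records are built:

```python
import pandas as pd

# Hypothetical input, reconstructed from the output shown in this answer.
df = pd.DataFrame({
    'Parameter': ['CustomerSales'] * 5,
    'FortNight': ['Apr-2FN-2018'] * 5,
    'Customer': ['10992', '10994', '10995', '11000', '10990'],
    'Amount': [339632.0, 27282.0, 26353.0, 24797.0, 21093.0],
})

def to_records(r):
    # Convert the value types first, then build the list of dictionaries.
    sub = r[['Customer', 'Amount']].astype({'Amount': int})
    return sub.to_dict(orient='records')

# Series indexed by (Parameter, FortNight); each value is a list of dicts.
details = df.groupby(['Parameter', 'FortNight']).apply(to_records)
first_group = details.loc[('CustomerSales', 'Apr-2FN-2018')]
print(first_group[0])
```

Using a named function instead of a lambda is only a readability choice here; the grouping itself is unchanged.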

You can then unstack the series into a dataframe, giving you a nested Parameter -> FortNight -> details structure; the Parameter values become columns, each list of Customer / Amount dictionaries indexed by FortNight:

nested = details.unstack('Parameter')
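
To make the resulting shape concrete, here is a small sketch with hypothetical data (the second parameter ProductSales, the extra fortnight, and the amounts are invented for illustration):

```python
import pandas as pd

# Hypothetical data: two parameters, two fortnights, one row per combination.
df = pd.DataFrame({
    'Parameter': ['CustomerSales', 'CustomerSales', 'ProductSales', 'ProductSales'],
    'FortNight': ['Apr-1FN-2018', 'Apr-2FN-2018', 'Apr-1FN-2018', 'Apr-2FN-2018'],
    'Customer': ['10992', '10994', '10992', '10995'],
    'Amount': [100.0, 200.0, 300.0, 400.0],
})

details = df.groupby(['Parameter', 'FortNight']).apply(
    lambda r: r[['Customer', 'Amount']].to_dict(orient='records'))

# Move the Parameter level of the index into the columns.
nested = details.unstack('Parameter')

print(nested.columns.tolist())  # the Parameter values
print(nested.index.tolist())    # the FortNight values
print(nested.loc['Apr-2FN-2018', 'ProductSales'])
```

Each cell of nested holds the list of Customer / Amount dictionaries for one Parameter and FortNight pair.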

If you turn this into a dictionary, you'd get a dictionary that's mostly correct already:

>>> from pprint import pprint
>>> pprint(nested.to_dict())
{'CustomerSales': {'Apr-2FN-2018': [{'Amount': 339632.0, 'Customer': '10992'},
                                    {'Amount': 27282.0, 'Customer': '10994'},
                                    {'Amount': 26353.0, 'Customer': '10995'},
                                    {'Amount': 24797.0, 'Customer': '11000'},
                                    {'Amount': 21093.0, 'Customer': '10990'}]}}

But for your format, you'd convert the values in each column to a list of {'FortNight': indexvalue, 'Details': value} mappings, then convert the whole structure to a list of records:

output = nested.apply(lambda s: [
    {s.index.name: idx, 'Details': value}
    for idx, value in s.items()
]).to_dict('records')

This gives you your final output:

>>> pprint(output)
[{'CustomerSales': {'Details': [{'Amount': 339632.0, 'Customer': '10992'},
                                {'Amount': 27282.0, 'Customer': '10994'},
                                {'Amount': 26353.0, 'Customer': '10995'},
                                {'Amount': 24797.0, 'Customer': '11000'},
                                {'Amount': 21093.0, 'Customer': '10990'}],
                    'FortNight': 'Apr-2FN-2018'}}]
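
The lambda is applied to each column of nested in turn. A sketch of what it does to a single column, using a hand-built Series as a stand-in for one column (the Series here is an assumption made for illustration):

```python
import pandas as pd

# Stand-in for one column of `nested`: index named FortNight, values are lists.
s = pd.Series(
    [[{'Customer': '10992', 'Amount': 339632.0}]],
    index=pd.Index(['Apr-2FN-2018'], name='FortNight'),
    name='CustomerSales',
)

# Pair each index label with its list of detail dictionaries.
entries = [
    {s.index.name: idx, 'Details': value}
    for idx, value in s.items()
]
print(entries)
```

Because the key comes from s.index.name, renaming the FortNight index level would rename the key in the output as well.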

If you need a JSON document, use .to_json(orient='records') rather than .to_dict('records').
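
A sketch of the JSON variant, assuming two sample rows reconstructed from the output above; parsing the string back with json.loads is only there to check the structure:

```python
import json
import pandas as pd

# Hypothetical input: the first two rows of the output shown above.
df = pd.DataFrame({
    'Parameter': ['CustomerSales', 'CustomerSales'],
    'FortNight': ['Apr-2FN-2018', 'Apr-2FN-2018'],
    'Customer': ['10992', '10994'],
    'Amount': [339632.0, 27282.0],
})

details = df.groupby(['Parameter', 'FortNight']).apply(
    lambda r: r[['Customer', 'Amount']].to_dict(orient='records'))
nested = details.unstack('Parameter')

# Same reshaping as before, but serialised straight to a JSON string.
json_doc = nested.apply(lambda s: [
    {s.index.name: idx, 'Details': value}
    for idx, value in s.items()
]).to_json(orient='records')

parsed = json.loads(json_doc)
print(json.dumps(parsed, indent=2))
```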

Put together as one expression:

df.groupby(['Parameter', 'FortNight']).apply(
        lambda r: r[['Customer', 'Amount']].to_dict(orient='records')
    ).unstack('Parameter').apply(lambda s: [
        {s.index.name: idx, 'Details': value}
        for idx, value in s.items()]
    ).to_json(orient='records')