且构网

How to explode a list inside a DataFrame cell into separate rows

Updated: 2023-02-09 22:30:07

I'm looking to turn a pandas cell containing a list into rows for each of those values.

So, take this:

If I'd like to unpack and stack the values in the nearest_neighbors column so that each value would be a row within each opponent index, how would I best go about this? Are there pandas methods that are meant for operations like this?

In the code below, I first reset the index to make the row iteration easier.

I create a list of lists where each element of the outer list is a row of the target DataFrame and each element of the inner list is one of its columns. This nested list is ultimately passed to the DataFrame constructor to create the desired DataFrame.

I use a lambda function with a list comprehension to create one row for each element of nearest_neighbors, paired with the relevant name and opponent.

Finally, I create a new DataFrame from this list (using the original column names and setting the index back to name and opponent).

import pandas as pd

df = (pd.DataFrame({'name': ['A.J. Price'] * 3, 
                    'opponent': ['76ers', 'blazers', 'bobcats'], 
                    'nearest_neighbors': [['Zach LaVine', 'Jeremy Lin', 'Nate Robinson', 'Isaia']] * 3})
      .set_index(['name', 'opponent']))

>>> df
                                                    nearest_neighbors
name       opponent                                                  
A.J. Price 76ers     [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
           blazers   [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]
           bobcats   [Zach LaVine, Jeremy Lin, Nate Robinson, Isaia]

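# Turn the name/opponent index levels into ordinary columns (modifies df in place).
df.reset_index(inplace=True)
rows = []
# apply() is used only for its side effect: one [name, opponent, neighbor] row is appended per list element.
_ = df.apply(lambda row: [rows.append([row['name'], row['opponent'], nn])
                         for nn in row.nearest_neighbors], axis=1)
# Build the new DataFrame from the collected rows and restore the original index.
df_new = pd.DataFrame(rows, columns=df.columns).set_index(['name', 'opponent'])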

>>> df_new
                    nearest_neighbors
name       opponent                  
A.J. Price 76ers          Zach LaVine
           76ers           Jeremy Lin
           76ers        Nate Robinson
           76ers                Isaia
           blazers        Zach LaVine
           blazers         Jeremy Lin
           blazers      Nate Robinson
           blazers              Isaia
           bobcats        Zach LaVine
           bobcats         Jeremy Lin
           bobcats      Nate Robinson
           bobcats              Isaia
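
The same rows can also be collected with a plain list comprehension, which avoids calling apply purely for its side effects and leaves the original df untouched. This is a minimal sketch rather than the method used above; it assumes the same column names, and the variable name flat is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A.J. Price"] * 3,
    "opponent": ["76ers", "blazers", "bobcats"],
    "nearest_neighbors": [["Zach LaVine", "Jeremy Lin", "Nate Robinson", "Isaia"]] * 3,
}).set_index(["name", "opponent"])

# reset_index() without inplace=True returns a copy, so df keeps its MultiIndex.
flat = df.reset_index()

# One (name, opponent, neighbor) tuple per element of each list.
rows = [
    (name, opponent, nn)
    for name, opponent, neighbors in zip(
        flat["name"], flat["opponent"], flat["nearest_neighbors"]
    )
    for nn in neighbors
]

# Rebuild the frame with the original column names and restore the index.
df_new = pd.DataFrame(rows, columns=flat.columns).set_index(["name", "opponent"])
```

Because df is not modified here, it can still be used as the indexed input for other approaches.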

EDIT JUNE 2017

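An alternative method is as follows. It must be applied to the original df that is still indexed by name and opponent (that is, before the reset_index call above); otherwise name and opponent are not available as id_vars: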

>>> (pd.melt(df.nearest_neighbors.apply(pd.Series).reset_index(), 
             id_vars=['name', 'opponent'],
             value_name='nearest_neighbors')
     .set_index(['name', 'opponent'])
     .drop('variable', axis=1)
     .dropna()
     .sort_index()
     )
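
On the question of whether pandas has a method meant for this: since pandas 0.25 there is DataFrame.explode, which was not available when the answer above was written. The following is a minimal sketch, assuming pandas 0.25 or later and the original df indexed by name and opponent (rebuilt here so the block is self-contained); the name df_exploded is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A.J. Price"] * 3,
    "opponent": ["76ers", "blazers", "bobcats"],
    "nearest_neighbors": [["Zach LaVine", "Jeremy Lin", "Nate Robinson", "Isaia"]] * 3,
}).set_index(["name", "opponent"])

# explode() emits one row per list element and repeats the index values
# (here the name/opponent MultiIndex) for each of them.
df_exploded = df.explode("nearest_neighbors")
```

One difference to keep in mind: explode keeps a row with NaN for an empty list, whereas the two approaches above produce no row for it.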