且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

Pandas数据透视表可同时显示多列

更新时间:2023-11-26 21:44:28

您可以先进行汇总,然后再使用unstack方法pivot进行聚合:

Instead of doing it in one step, you can do the aggregation firstly and then pivot it using unstack method:

(df.set_index('ptype')
 .groupby(level='ptype')
# to do the count of columns nj, wd, wpt against the column ptype using 
# groupby + value_counts
 .apply(lambda g: g.apply(pd.value_counts))
 .unstack(level=1)
 .fillna(0))

#      nj             wd            wpt
#       1    2    3    1    2    3    1    2    3
#ptype                                  
#1    1.0  1.0  1.0  0.0  2.0  1.0  2.0  1.0  0.0
#2    0.0  1.0  1.0  1.0  0.0  1.0  0.0  1.0  1.0


避免使用apply方法的另一种选择:


Another option to avoid using apply method:

(df.set_index('ptype').stack()
 .groupby(level=[0,1])
 .value_counts()
 .unstack(level=[1,2])
 .fillna(0)
 .sort_index(axis=1))

天真计时:

原始解决方案:

%%timeit
nj = df.pivot_table(index='ptype', columns='nj', aggfunc='count').ix[:, 'wd']
wpt = df.pivot_table(index='ptype', columns='wpt', aggfunc='count').ix[:, 'wd']
wd = df.pivot_table(index='ptype', columns='wd', aggfunc='count').ix[:, 'nj']
out = pd.concat([nj, wd, wpt], axis=1, keys=['nj', 'wd', 'wpt']).fillna(0)
out.columns.names = [None, None]
# 100 loops, best of 3: 12 ms per loop

选项一:

%%timeit
(df.set_index('ptype')
 .groupby(level='ptype')
 .apply(lambda g: g.apply(pd.value_counts))
 .unstack(level=1)
 .fillna(0))
# 100 loops, best of 3: 10.1 ms per loop

选项二:

%%timeit 
(df.set_index('ptype').stack()
 .groupby(level=[0,1])
 .value_counts()
 .unstack(level=[1,2])
 .fillna(0)
 .sort_index(axis=1))
# 100 loops, best of 3: 4.3 ms per loop