且构网

MultiIndex boxplot

Updated: 2023-11-25 20:20:52

Let's say I have a DataFrame whose columns are a MultiIndex. For example:

import numpy as np
import pandas as pd

a = pd.DataFrame(index=range(10),
                 columns=pd.MultiIndex.from_product(
                         iterables=[['2000', '2010'], ['a', 'b']],
                         names=['Year', 'Text']),
                 data=np.random.randn(10, 4))

I'd like to make a boxplot grouped by Year, similar to what the hue argument does in seaborn boxplots. Is there an easy way to achieve that in pandas, seaborn, or matplotlib? I suspect unstacking could do the trick, but I can't get it to work.

Use stack to reshape the frame so that Year becomes a regular column, then plot with DataFrame.boxplot:

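import numpy as np
import pandas as pd

np.random.seed(45)
a = pd.DataFrame(index=range(10),
                 columns=pd.MultiIndex.from_product(
                         iterables=[['2000', '2010'], ['a', 'b']],
                         names=['Year', 'Text']),
                 data=np.random.randn(10, 4))

# Move the Year level from the columns into the rows, drop the original
# row index, then turn Year into an ordinary column.
b = a.stack(level=0).reset_index(level=0, drop=True).reset_index()
print(b)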
Text  Year         a         b
0     2000  0.026375  0.260322
1     2010 -0.395146 -0.204301
2     2000 -1.271633 -2.596879
3     2010  0.289681 -0.873305
4     2000  0.394073  0.935106
5     2010 -0.015685  0.259596
6     2000 -1.473314  0.801927
7     2010 -1.750752 -0.495052
8     2000 -1.008601  0.025244
9     2010 -0.121507 -1.546873
10    2000 -0.606944 -1.393813
11    2010 -0.627695  0.332632
12    2000 -1.541367  1.670300
13    2010 -0.499546  0.673129
14    2000  2.248090 -1.654263
15    2010 -0.474397 -0.301915
16    2000 -0.931026  1.110986
17    2010 -0.189683  1.278410
18    2000 -0.554077  0.354303
19    2010 -0.440276 -0.424449


b.boxplot(by='Year')
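
As a sanity check on what the boxes should show, the per-year quartiles can be computed from the same reshaped frame. This is only a minimal sketch: it rebuilds the example data with the same seed, and the name quartiles is chosen here for illustration and is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the answer (same seed and layout).
np.random.seed(45)
a = pd.DataFrame(
    index=range(10),
    columns=pd.MultiIndex.from_product(
        [['2000', '2010'], ['a', 'b']], names=['Year', 'Text']),
    data=np.random.randn(10, 4))

# Same reshape as in the answer: Year becomes a column, 'a' and 'b' stay columns.
b = a.stack(level=0).reset_index(level=0, drop=True).reset_index()

# Lower quartile, median and upper quartile per Year for each Text column.
# `quartiles` is a hypothetical name used only in this sketch.
quartiles = b.groupby('Year')[['a', 'b']].quantile([0.25, 0.5, 0.75])
print(quartiles)
```

The result has one row per (Year, quantile) pair, so each box edge and median line in the plot can be compared against a number.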

For a seaborn boxplot, use unstack to build a long-format table with one value per row:

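# With a single-level row index, unstack returns a Series indexed by
# (Year, Text, original row index); drop the row index and flatten the rest.
b = a.unstack(level=0).reset_index(level=2, drop=True).reset_index(name='data')
print(b.head(15))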
    Year Text      data
0   2000    a  0.026375
1   2000    a -1.271633
2   2000    a  0.394073
3   2000    a -1.473314
4   2000    a -1.008601
5   2000    a -0.606944
6   2000    a -1.541367
7   2000    a  2.248090
8   2000    a -0.931026
9   2000    a -0.554077
10  2000    b  0.260322
11  2000    b -2.596879
12  2000    b  0.935106
13  2000    b  0.801927
14  2000    b  0.025244


import seaborn as sns

ax = sns.boxplot(x='Text', y='data', hue="Year",
                 data=b, palette="Set3")
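
The same long-format table can probably also be produced with DataFrame.melt, which some readers may find easier to follow than the unstack chain. This is a sketch and not part of the original answer. It assumes that melt, when called on a frame with named MultiIndex columns, uses the level names (Year, Text) as the variable columns. The name long_df is chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the answer (same seed and layout).
np.random.seed(45)
a = pd.DataFrame(
    index=range(10),
    columns=pd.MultiIndex.from_product(
        [['2000', '2010'], ['a', 'b']], names=['Year', 'Text']),
    data=np.random.randn(10, 4))

# melt turns each column level into its own column (Year, Text) and
# collects all values in a single column; 'data' matches the answer's name.
# `long_df` is a hypothetical name used only in this sketch.
long_df = a.melt(value_name='data')
print(long_df.head())
```

Under that assumption, long_df has the columns Year, Text and data and can be passed to sns.boxplot in place of b.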