
Pandas MultiIndex: custom sorting of levels by categorical order instead of alphabetical order

Updated: 2023-11-30 09:56:52

You can use sort_index. Here is a small example:

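import pandas as pd

df = pd.DataFrame(
    {"i1": [1, 1, 1, 1, 2, 4, 4, 2, 3, 3, 3, 3],
     "i2": [1, 3, 2, 2, 1, 1, 2, 2, 1, 1, 3, 2],
     "d1": ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l']}
)
# Move i1 and i2 into a two-level MultiIndex.
df.set_index(['i1', 'i2'], inplace=True)
# sort_index returns a sorted copy; df itself keeps its original row order.
df.sort_index()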

Output:

        d1
i1  i2  
1   1   a
    2   c
    2   d
    3   b
2   1   e
    2   h
3   1   i
    1   j
    2   l
    3   k
4   1   f
    2   g

If you want to change the sort direction per level (that is, per column that went into the index), DataFrame.sort_index takes an ascending= argument, which accepts a list of booleans such as [True, False], one per index level in order.
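As a minimal sketch of the ascending= list, assuming the same example frame as above, the following sorts i1 ascending and i2 descending. The variable name `mixed` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"i1": [1, 1, 1, 1, 2, 4, 4, 2, 3, 3, 3, 3],
     "i2": [1, 3, 2, 2, 1, 1, 2, 2, 1, 1, 3, 2],
     "d1": list("abcdefghijkl")}
).set_index(["i1", "i2"])

# One flag per index level: i1 ascending, i2 descending.
mixed = df.sort_index(ascending=[True, False])
print(mixed)
```

The list must have one entry per index level, in the same order as the levels.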

Categorical is a dedicated dtype in pandas and well worth using, but it is not strictly needed for this operation.
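For reference, a small sketch of what the categorical dtype offers: sorting a categorical column follows the declared category order rather than alphabetical order. The `sizes` frame, its `size` column and the category labels are hypothetical and not part of the example above.

```python
import pandas as pd

# Hypothetical data: labels whose logical order is not alphabetical.
sizes = pd.DataFrame({"size": ["medium", "small", "large", "small"]})

# Declare the desired order; sorting then follows the categories.
sizes["size"] = pd.Categorical(
    sizes["size"], categories=["small", "medium", "large"], ordered=True
)
by_size = sizes.sort_values("size")
print(by_size)
```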

Edit, following the comments:

Sorting plain values always follows their natural order (alphabetical for strings) or the reverse of it. If you want a custom order, create a helper column that sorts naturally but is derived from the column that should determine the order. Series.map does this; the following example sorts the rows so that vowels come first:

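mappings = {'a': 0, 'b': 1, 'c': 1, 'd': 1,
            'e': 0, 'f': 1, 'g': 1, 'h': 1,
            'i': 0, 'j': 1, 'k': 1, 'l': 1}
df['sortby'] = df['d1'].map(mappings)
# DataFrame.sort was removed from pandas; sort_values replaces it.
# kind='mergesort' is stable, so ties keep their original row order.
df.sort_values('sortby', kind='mergesort')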

        d1  sortby
i1  i2      
1   1   a   0
2   1   e   0
3   1   i   0
1   3   b   1
    2   c   1
    2   d   1
4   1   f   1
    2   g   1
2   2   h   1
3   1   j   1
    3   k   1
    2   l   1

If you do not want the sortby column afterwards, you can simply delete it:

del df['sortby']
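
In pandas 1.1 and later, sort_values also accepts a key= callable, which avoids the temporary column altogether. A minimal sketch under that assumption, using the same data; the `vowels_first` helper is hypothetical:

```python
import pandas as pd

df = pd.DataFrame(
    {"i1": [1, 1, 1, 1, 2, 4, 4, 2, 3, 3, 3, 3],
     "i2": [1, 3, 2, 2, 1, 1, 2, 2, 1, 1, 3, 2],
     "d1": list("abcdefghijkl")}
).set_index(["i1", "i2"])


def vowels_first(col: pd.Series) -> pd.Series:
    """Return 0 for vowels and 1 for every other value."""
    return (~col.isin(list("aeiou"))).astype(int)


# The key is applied to the column before sorting; mergesort is stable,
# so rows with equal keys keep their original order.
result = df.sort_values("d1", key=vowels_first, kind="mergesort")
print(result)
```

Because the key is computed on the fly, there is no helper column to delete afterwards.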