且构网

Sharing the ins and outs of programming and software development

Group a pandas DataFrame and select the latest row in each group

Updated: 2023-02-05 18:24:56

How can I group the rows of a pandas DataFrame and select the latest row (by date) from each group?

For example, given a DataFrame sorted by date within each id:

    id     product   date
0   220    6647     2014-09-01 
1   220    6647     2014-09-03 
2   220    6647     2014-10-16
3   826    3380     2014-11-11
4   826    3380     2014-12-09
5   826    3380     2015-05-19
6   901    4555     2014-09-01
7   901    4555     2014-10-05
8   901    4555     2014-11-01

Grouping by id or product and selecting the latest row in each group gives:

    id     product   date
2   220    6647     2014-10-16
5   826    3380     2015-05-19
8   901    4555     2014-11-01

Use idxmax on the grouped date column to get the index label of the latest date in each group, then slice df with loc:

df.loc[df.groupby('id').date.idxmax()]

    id  product       date
2  220     6647 2014-10-16
5  826     3380 2015-05-19
8  901     4555 2014-11-01
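
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the example frame from the question and applies the same idxmax/loc approach. The call to pd.to_datetime is an assumption about how the date column is stored, and the names latest and latest_alt are only illustrative. The sort_values/drop_duplicates variant is a swapped-in alternative, not part of the answer above.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame(
    {
        "id": [220, 220, 220, 826, 826, 826, 901, 901, 901],
        "product": [6647, 6647, 6647, 3380, 3380, 3380, 4555, 4555, 4555],
        "date": [
            "2014-09-01", "2014-09-03", "2014-10-16",
            "2014-11-11", "2014-12-09", "2015-05-19",
            "2014-09-01", "2014-10-05", "2014-11-01",
        ],
    }
)

# Assumption: dates arrive as strings, so parse them to datetime64
# so that "latest" means the chronologically largest value.
df["date"] = pd.to_datetime(df["date"])

# idxmax returns, for each id, the index label of the row with the
# largest date; loc then selects exactly those rows.
latest = df.loc[df.groupby("id")["date"].idxmax()]

# Alternative (not from the answer): sort by date, keep the last row
# seen for each id, then restore the original row order.
latest_alt = (
    df.sort_values("date")
    .drop_duplicates("id", keep="last")
    .sort_index()
)

print(latest)
```

The idxmax version does not depend on the frame being sorted beforehand, because it looks up the maximum date within each group directly. If several rows in a group share the same latest date, idxmax returns only the first of them.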