且构网


Equivalent of R's group_by() + rleid() in Python

Updated: 2022-01-25 05:00:36

Updated answer

The requirement is that the min values within each measurement_id, obj, var group form a consecutive sequence. To check this, we group on measurement_id, obj, var and test whether each min differs from the previous one in its group by more than 1. A row where it does is not counted toward the duration. Summing the remaining rows per group gives expected_output:

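# Flag rows whose min follows the previous row of the same group by exactly 1
# (the first row of each group is treated as consecutive).
# .transform keeps df's index. In recent pandas, .apply can prepend the group
# keys to the index, and the assignment to df['grouper'] then fails.
df['grouper'] = (df.groupby(['measurement_id', 'obj', 'var'])['min']
                 .transform(lambda x: x.diff().fillna(1).eq(1))
                )

# Count the flagged rows per group and broadcast the count to every row.
df['expected_output'] = (
    df.groupby(['measurement_id', 'obj', 'var'])['grouper'].transform('sum').astype(int)
)

df = df.drop(columns='grouper')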

    measurement_id  min obj  var  expected_output
0                1    1   A    1                1
1                1    1   B    2                2
2                1    2   A    2                1
3                1    2   B    2                2
4                1    3   A    1                1
5                1    3   B    1                1
6                2    1   A    2                2
7                2    1   B    1                3
8                2    2   A    2                2
9                2    2   B    1                3
10               2    3   A    1                1
11               2    3   B    1                3
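To check the updated answer end to end, here is a self-contained sketch. The input DataFrame is an assumption: it is rebuilt from the first four columns of the output table above, because the question's original data isn't shown here. `updated_answer` and `run_lengths` are hypothetical helper names. `run_lengths` is a variant that is not part of the answer. It gives each consecutive run of min its own id using the common `diff().ne(1).cumsum()` idiom, then counts rows per run. On this data the two agree. They differ when one group holds more than one run, e.g. min = 1, 2, 4, 5.

```python
import pandas as pd

KEYS = ["measurement_id", "obj", "var"]

def updated_answer(frame: pd.DataFrame) -> pd.Series:
    # Same logic as the updated answer: flag rows whose min follows the
    # previous row of the group by exactly 1, then count the flags per group.
    flags = frame.groupby(KEYS)["min"].transform(lambda x: x.diff().fillna(1).eq(1))
    return (frame.assign(grouper=flags.astype(int))
            .groupby(KEYS)["grouper"].transform("sum"))

def run_lengths(frame: pd.DataFrame) -> pd.Series:
    # Hypothetical variant: a new run starts wherever min does not step by +1
    # (the leading NaN from diff() also starts a run). Then count rows per run.
    run_id = (frame.groupby(KEYS)["min"]
              .transform(lambda x: x.diff().ne(1).cumsum())
              .rename("run_id"))
    return frame.groupby(KEYS + [run_id])["min"].transform("count")

# Input rebuilt from the table above (an assumption, not the OP's data).
df = pd.DataFrame({
    "measurement_id": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "min":            [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "obj":            list("ABABABABABAB"),
    "var":            [1, 2, 2, 2, 1, 1, 2, 1, 2, 1, 1, 1],
})
df["expected_output"] = updated_answer(df)

# A single group with two separate runs of minutes: 1-2 and 4-5.
demo = pd.DataFrame({
    "measurement_id": [1, 1, 1, 1],
    "obj": ["A"] * 4,
    "var": [1] * 4,
    "min": [1, 2, 4, 5],
})
print(updated_answer(demo).tolist())  # [3, 3, 3, 3]
print(run_lengths(demo).tolist())     # [2, 2, 2, 2]
```

For min = 1, 2, 4, 5, the updated answer reports 3 on every row, while the run-based variant reports 2 for each of the two runs. Which one is right depends on how the duration is meant to be defined.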


Old answer (following the OP's logic)

We can get your rleid_output by using GroupBy.diff. Within each measurement_id & obj group, it marks the rows where var changes from the previous row.

Then use GroupBy.nunique to count the distinct min values in each of those groups:

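# diff().abs() is 0 where var repeats and 1 where it changes within the group.
# bfill() fills each group's leading NaN from the following row.
rleid_output = df.groupby(['measurement_id', 'obj'])['var'].diff().abs().bfill()
df['expected_output'] = (df.groupby(['measurement_id', 'obj', rleid_output])['min']
                         .transform('nunique'))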

    measurement_id  min obj  var  expected_output
0                1    1   A    1                2
1                1    1   B    2                2
2                1    2   A    1                2
3                1    2   B    2                2
4                1    3   A    2                1
5                1    3   B    1                1
6                2    1   A    2                2
7                2    1   B    1                3
8                2    2   A    2                2
9                2    2   B    1                3
10               2    3   A    1                1
11               2    3   B    1                3
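R's data.table::rleid() numbers runs of equal values. It starts at 1 and increases every time the value changes. A common pandas analogue is `s.ne(s.shift()).cumsum()`. The sketch below wraps it in a hypothetical `rleid` helper, applies it per measurement_id & obj, and reuses the `nunique` step from the old answer. The input is rebuilt from the table above, which is an assumption. `diff().abs().bfill()` only flags changes, so var = 1, 2, 1 collapses into one group. The `rleid` helper instead gives each run a distinct id.

```python
import pandas as pd

def rleid(s: pd.Series) -> pd.Series:
    # Hypothetical helper mirroring data.table::rleid: starts at 1 and
    # increases by 1 whenever the value differs from the previous row.
    return s.ne(s.shift()).cumsum()

# Input rebuilt from the old answer's output table (an assumption).
df = pd.DataFrame({
    "measurement_id": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "min":            [1, 1, 2, 2, 3, 3, 1, 1, 2, 2, 3, 3],
    "obj":            list("ABABABABABAB"),
    "var":            [1, 2, 1, 2, 2, 1, 2, 1, 2, 1, 1, 1],
})

# Run id of var within each measurement_id & obj group, renamed so it does
# not clash with the existing 'var' column when used as a grouping key.
run = (df.groupby(["measurement_id", "obj"])["var"]
       .transform(rleid)
       .rename("run"))

# Number of distinct minutes in each run, broadcast back to every row.
df["expected_output"] = (df.groupby(["measurement_id", "obj", run])["min"]
                         .transform("nunique"))

# Where the two approaches diverge: var = 1, 2, 1 contains three runs.
s = pd.Series([1, 2, 1])
print(rleid(s).tolist())                # [1, 2, 3]
print(s.diff().abs().bfill().tolist())  # [1.0, 1.0, 1.0]
```

On this data the result matches the table above. The two approaches only disagree when var switches back and forth within a group.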