且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

基于多列创建滞后特征

更新时间:2021-12-04 23:12:20

您想转到下一周,因此从分组中删除'week':

You want to shift to the next week so remove 'week' from the grouping:

df['t-1'] = df.groupby(['id1','id2','id3'],as_index=False)['value'].shift()
#    week  id1  id2  id3  value   t-1
#0     1  101  123    1     45   NaN
#1     1  102  231    4     89   NaN
#2     1  203  435   99     65   NaN
#3     2  101  123    1     48  45.0
#4     2  102  231    4     75  89.0
#5     2  203  435   99     90  65.0


该错误容易导致缺少几周.在这种情况下,我们可以在更改星期之后合并,以确保无论丢失星期如何,它都是前一周.


That's error prone to missing weeks. In this case we can merge after changing the week, which ensures it is the prior week regardless of missing weeks.

df2 = df.assign(week=df.week+1).rename(columns={'value': 't-1'})
df = df.merge(df2, on=['week', 'id1', 'id2', 'id3'], how='left')


带来并重命名许多列的另一种方法是在合并中使用suffixes参数.这将重命名右侧DataFrame中的所有重叠列(不是键).


Another way to bring and rename many columns would be to use the suffixes argument in the merge. This will rename all overlapping columns (that are not keys) in the right DataFrame.

df.merge(df.assign(week=df.week+1),         # Manally lag
         on=['week', 'id1', 'id2', 'id3'], 
         how='left',
         suffixes=['', '_lagged']           # Right df columns -> _lagged
         )
#   week  id1  id2  id3  value  value_lagged
#0     1  101  123    1     45           NaN
#1     1  102  231    4     89           NaN
#2     1  203  435   99     65           NaN
#3     2  101  123    1     48          45.0
#4     2  102  231    4     75          89.0
#5     2  203  435   99     90          65.0