且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

在R中按组连续记录

更新时间:2023-12-01 12:02:34

这是另一种方法是相当凌乱,但我认为你想要的:

Here is another approach that is quite messy but I think does what you want:

library(dplyr)
yesterday <- as.Date(Sys.Date()-1)
df2 <- employ.data %>% group_by(Child) %>%
  mutate(`Consec. Days with Callout`=cumsum(rev(cumprod(rev((yesterday-DATE)==(which(DATE == yesterday)-row_number()) & Callout==Callout[DATE == yesterday]))))) %>%
  filter(DATE == yesterday)
##Source: local data frame [3 x 7]
##Groups: Child [3]
##
##        DATE Parent  Child avg_child_salary salary Callout Consec. Days with Callout
##      <date> <fctr> <fctr>            <dbl>  <dbl>  <fctr>                     <dbl>
##1 2016-12-04      A     ab              500   2000    HIGH                         2
##2 2016-12-04      A     ac              300   1234    HIGH                         2
##3 2016-12-04      B     bd             9000    600     LOW                         1

注意:


  1. (yesterday-DATE)==(which(DATE == yesterday)-row_number())& Callout == Callout [DATE == yesterday] 计算一个条件,如果$ code> Callout TRUE / code>与昨天标注相同,如果行中的行距离是昨天与日期的距离相同。这给出了 Cond 列,如下所示:

  1. (yesterday-DATE)==(which(DATE == yesterday)-row_number()) & Callout==Callout[DATE == yesterday] computes a condition that will be TRUE for the row if the Callout is the same as the Callout for yesterday and if the distance in rows from the row that is yesterday is the same as the distance in days for the date. This gives the Cond column as shown below:

Source: local data frame [8 x 7]
Groups: Child [3]

        DATE Parent  Child avg_child_salary salary Callout  Cond
      <date> <fctr> <fctr>            <dbl>  <dbl>  <fctr> <lgl>
1 2016-12-01      A     ab              500   1000    HIGH  TRUE
2 2016-12-02      A     ab              500    100     LOW FALSE
3 2016-12-03      A     ab              500   4000    HIGH  TRUE
4 2016-12-04      A     ab              500   2000    HIGH  TRUE
5 2016-12-01      A     ac              300   1000    HIGH FALSE
6 2016-12-03      A     ac              300   3455    HIGH  TRUE
7 2016-12-04      A     ac              300   1234    HIGH  TRUE
8 2016-12-04      B     bd             9000    600     LOW  TRUE


  • 鉴于此,我们希望从昨天行的连续 TRUE 的数量倒数c $ c>(按 Child 分组)。为了做到这一点,我们可以使用 rev 来逆转向量,执行一个 cumprod ,它将从 1 0 一旦遇到 FALSE ,反向向量再次使用 rev ,最后执行 cumsum 来累积连续的日子。这样做会给出以下的 Consec。标注日期列解释为以前连续天数与标注之间的日期昨天

  • Given this we want to count backwards the number of consecutive TRUE from the row that is yesterday (grouped by Child). To do this, we can reverse the vector using rev, do a cumprod, which will switch from 1 to 0 as soon as it encounters a FALSE, reverse the vector back again using rev, and finally do the cumsum to accumulate the consecutive days. Doing this gives the following where the Consec. Days with Callout column is interpreted as the number of previous consecutive days with the same Callout as yesterday:

    Source: local data frame [8 x 7]
    Groups: Child [3]
    
            DATE Parent  Child avg_child_salary salary Callout Consec. Days with Callout
          <date> <fctr> <fctr>            <dbl>  <dbl>  <fctr>                     <dbl>
    1 2016-12-01      A     ab              500   1000    HIGH                         0
    2 2016-12-02      A     ab              500    100     LOW                         0
    3 2016-12-03      A     ab              500   4000    HIGH                         1
    4 2016-12-04      A     ab              500   2000    HIGH                         2
    5 2016-12-01      A     ac              300   1000    HIGH                         0
    6 2016-12-03      A     ac              300   3455    HIGH                         1
    7 2016-12-04      A     ac              300   1234    HIGH                         2
    8 2016-12-04      B     bd             9000    600     LOW                         1
    


  • 最后,执行过滤器,就像生成最终结果一样。

  • Finally, do the filter as you did to generate the final result.