且构网

Counting consecutive occurrences of a specific value in each row of a data frame in R

Updated: 2023-08-28 16:37:10

You've identified the two forms the longest run can take: (1) somewhere in the middle of a row, or (2) split between the end and the beginning of a row. Hence you want to calculate each case and take the max, like so:

df<-cbind(
Winter=c(0,0,3),
Spring=c(0,2,4),
Summer=c(0,2,7),
Autumn=c(3,0,4))

#>      Winter Spring Summer Autumn
#> [1,]      0      0      0      3
#> [2,]      0      2      2      0
#> [3,]      3      4      7      4


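# calculate the number of consecutive zeros at the start and end of each row;
# which.min() on a logical vector gives the position of the first FALSE,
# i.e. the first non-zero value (an all-zero row yields 0 here, which is
# harmless because longestRun below already covers that case)
startZeros <- apply(df, 1, function(x) which.min(x == 0) - 1)
#> [1] 3 1 0
endZeros <- apply(df, 1, function(x) which.min(rev(x == 0)) - 1)
#> [1] 0 1 0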

# calculate the longest run of zeros within each row
longestRun <- apply(df, 1, function(x) {
  y <- rle(x)
  # the trailing 0 covers rows that contain no zeros at all
  max(y$lengths[y$values == 0], 0)
})
#> [1] 3 1 0

# take the max of the two values
pmax(longestRun, startZeros + endZeros)
#> [1] 3 2 0

Of course, an even easier solution is:

longestRun <- apply(cbind(df, df), # trick: duplicate the columns so zeros wrap from the end to the start
                    1,             # the margin over which to apply the summary function (rows)
                    function(x) {  # the summary function
                      y <- rle(x)
                      max(y$lengths[y$values == 0],
                          0)       # include zero in case there are no zeros in y$values
                    })
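
One edge case worth noting: once the columns are duplicated, a row made up entirely of zeros is counted twice. A minimal sketch of a guarded version follows, assuming you want the result capped at the number of columns. The helper name `longestWrappedRun`, its `value` argument, and the extra all-zero row are illustrative and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): longest run of `value`
# in each row of a numeric matrix, allowing the run to wrap from the last
# column back to the first.
longestWrappedRun <- function(m, value = 0) {
  apply(cbind(m, m), 1, function(x) {
    y <- rle(unname(x))
    # cap at ncol(m): a row consisting only of `value` would otherwise be
    # counted twice after the columns are duplicated
    min(max(y$lengths[y$values == value], 0), ncol(m))
  })
}

# same data as above, plus an illustrative fourth row that is all zeros
df <- cbind(
  Winter = c(0, 0, 3, 0),
  Spring = c(0, 2, 4, 0),
  Summer = c(0, 2, 7, 0),
  Autumn = c(3, 0, 4, 0))

result <- longestWrappedRun(df)
print(result)
#> [1] 3 2 0 4
```

Without the `min(..., ncol(m))` cap, the fourth row would report 8 rather than 4.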

Note that the above solution works because my df does not include the location field (column); with that column present, its values would be treated as part of each row's sequence.
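
If your data does carry such a column, one possible way to handle it is to drop it before applying the row-wise function and attach the result afterwards. The sketch below assumes a data frame with a character column named `location`; the data frame name `seasons`, the location values, and the result column `longestZeroRun` are made up for illustration.

```r
# Hypothetical data frame with a `location` column (names and values are illustrative)
seasons <- data.frame(
  location = c("A", "B", "C"),
  Winter = c(0, 0, 3),
  Spring = c(0, 2, 4),
  Summer = c(0, 2, 7),
  Autumn = c(3, 0, 4))

# keep only the seasonal columns and convert them to a numeric matrix
m <- as.matrix(seasons[, setdiff(names(seasons), "location")])

# same wrap-around idea as above: duplicate the columns, then take the
# longest run of zeros in each row
wrapped <- apply(cbind(m, m), 1, function(x) {
  y <- rle(unname(x))
  max(y$lengths[y$values == 0], 0)
})

# attach the result back to the original rows
seasons$longestZeroRun <- unname(wrapped)
print(seasons$longestZeroRun)
#> [1] 3 2 0
```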