Updated: 2023-12-01 12:15:28
Using data.table
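library(data.table) # v 1.9.5+
# Group by consecutive runs of mmpd, average tot per run, drop the helper id, keep the 'mm' runs
setDT(df)[, .(my = mean(tot)), by = .(indx = rleid(mmpd), mmpd)][, indx := NULL][mmpd == 'mm']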
#    mmpd       my
#1:    mm 0.500000
#2:    mm 1.250000
#3:    mm 0.000000
#4:    mm 1.666667
There are evidently many ways to do this (see "r search along a vector and calculate the mean"). The data.table method was the fastest there and is adapted here.
Note: rleid can also be used outside of the data.table syntax. The result looks more like "traditional" R syntax and produces the same results.
subset(aggregate(tot ~ indx + mmpd,
data=cbind(df,indx=rleid(df$mmpd)),
FUN=mean),mmpd=="mm")
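To make the run-based grouping concrete, here is a minimal base R sketch. The data frame df_toy and its values are hypothetical (the original df is not shown), and the run id is built with base rle() as a stand-in for rleid, on the assumption that both number consecutive runs in the same way.

```r
# Hypothetical toy data; the original df is not shown in the source.
df_toy <- data.frame(
  mmpd = c("mm", "mm", "pd", "mm", "mm", "mm", "pd"),
  tot  = c(0, 1, 5, 2, 1, 0, 3),
  stringsAsFactors = FALSE
)

# Base R stand-in for a run-length id: every run of equal values gets its own id.
r <- rle(df_toy$mmpd)
df_toy$indx <- rep(seq_along(r$lengths), r$lengths)

# Mean of tot within each run, then keep only the "mm" runs.
run_means <- aggregate(tot ~ indx + mmpd, data = df_toy, FUN = mean)
mm_means  <- run_means[run_means$mmpd == "mm", ]
print(mm_means)
```

Because the two "mm" runs are separated by a "pd" row, they get different ids and are averaged separately instead of being pooled into one group.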
Speed comparison of different ways to generate the run ids that rleid produces (myrleid is from @JasonAizkalns's answer).
library(microbenchmark)
set.seed(1); x <- sample(1:2, 100000, replace = TRUE)
microbenchmark(rleid(x),
               myrleid2 = cumsum(c(1, diff(x) != 0)),
               myrleid(x))
Unit: milliseconds
       expr      min       lq     mean   median       uq       max neval cld
   rleid(x) 1.422263 1.500873 1.586482 1.571315 1.662982  1.938254   100   a
   myrleid2 3.860290 3.908308 4.369646 3.962497 4.177673 15.674611   100   b
 myrleid(x) 7.282868 7.386515 7.753515 7.444008 7.654126 18.864898   100   c
For non-numeric x:
set.seed(1); x <- sample(c('a', 'b'), 100000, replace = TRUE)
microbenchmark(rleid(x),
               myrleid2 = cumsum(c(1, diff(as.numeric(factor(x))) != 0)),
               myrleid(x))
Unit: milliseconds
       expr       min        lq      mean    median       uq       max neval cld
   rleid(x)  1.465466  1.571662  1.684568  1.606614  1.66080  2.900983   100   a
   myrleid2  8.705447  9.276787 12.393393  9.907403 10.35032 61.080374   100   b
 myrleid(x) 11.970271 13.176144 18.779256 13.790767 14.09626 69.845587   100   c
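Before comparing timings it may be worth confirming that the alternatives really yield the same ids. The sketch below does this in base R only. run_id_rle is a hypothetical helper written for this check (it is not the myrleid from the other answer, whose definition is not shown here), and the neighbour-comparison variant for characters is an assumption of mine rather than one of the benchmarked expressions.

```r
set.seed(1)
x <- sample(1:2, 1000, replace = TRUE)

# Hypothetical helper: run ids built from base R rle().
run_id_rle <- function(v) {
  r <- rle(v)
  rep(seq_along(r$lengths), r$lengths)
}

# The cumsum/diff expression from the benchmark (works on numeric input).
run_id_diff <- cumsum(c(1, diff(x) != 0))

# Assumed variant that compares neighbours directly, so it also accepts characters
# without the factor() conversion.
y <- c("a", "b")[x]
run_id_chr <- cumsum(c(TRUE, y[-1] != y[-length(y)]))

# Both checks pass silently when the ids agree.
stopifnot(all(run_id_rle(x) == run_id_diff))
stopifnot(all(run_id_rle(y) == run_id_chr))
```

Comparing neighbours directly avoids the factor() round trip that the character benchmark needs. Whether it is actually faster was not measured here.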