且构网

Sharing the ins and outs of programming and development

R: How do I get the mean of one variable based on ranges of another variable?

Updated: 2023-01-24 17:09:15

If I have a series of observations with two variables X and Y, how can I get the average value of Y based on ranges of variable X?

So for example, with some data like:

df = data.frame(x=runif(50,1,100),y=runif(50,300,700))

How could I get an answer like "When X is 1-10 the average of Y is 332.4, when X is 11-20 the average of Y is 632.3, etc."?

Bin x into intervals with cut(), then compute the mean of y within each interval using ddply() from the plyr package:

# Bin x into right-closed intervals (0,10], (10,20], ..., (90,100]
df$xrange <- cut(df$x, breaks=seq(0, 100, 10))

library(plyr)
# Mean of y within each bin of x
ddply(df, .(xrange), summarize, mean_y=mean(y))
     xrange   mean_y
1    (0,10] 490.7571
2   (10,20] 462.6347
3   (20,30] 507.5614
4   (30,40] 482.6004
5   (40,50] 510.3081
6   (50,60] 480.7927
7   (60,70] 507.8944
8   (70,80] 458.4668
9   (80,90] 501.9672
10 (90,100] 493.4844
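
If you would rather not depend on plyr, the same binning idea can be sketched in base R. This is only an illustration, not part of the answer above: the seed value and the names mean_by_range and means_vec are assumptions made up for the example, and the numbers it prints will differ from the table above because the data are random. Note that aggregate() with a formula omits bins that contain no observations, while tapply() reports them as NA.

```r
# Sample data in the same shape as the question's df.
# Assumption: the seed is arbitrary, fixed only so reruns give the same numbers.
set.seed(42)
df <- data.frame(x = runif(50, 1, 100), y = runif(50, 300, 700))

# Bin x into right-closed intervals (0,10], (10,20], ..., (90,100]
df$xrange <- cut(df$x, breaks = seq(0, 100, 10))

# Base R: mean of y for each non-empty bin of x, returned as a data frame
mean_by_range <- aggregate(y ~ xrange, data = df, FUN = mean)
names(mean_by_range)[2] <- "mean_y"
print(mean_by_range)

# tapply computes the same means as a vector named by bin (NA for empty bins)
means_vec <- tapply(df$y, df$xrange, mean)
print(means_vec)
```

Because cut() produces right-closed intervals by default, a value of exactly 10 falls in (0,10] rather than (10,20]; pass right = FALSE if you want left-closed bins instead.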