且构网


Hourly mean in a time series

Updated: 2023-02-26 19:23:55

This is a time series of hourly smart meter data with freq=24. It was measured over three days, so the first day is [1:24], the second [25:48], and the third [49:72].

I want the mean for each hour of the day across the three days. For example:

(t[1]+t[25]+t[49])/3

so that I can make a boxplot of the 24 hours of the day over the 3 days.

x <- c(0.253, 0.132, 0.144, 0.272, 0.192, 0.132, 0.209, 0.255, 0.131, 
  0.136, 0.267, 0.166, 0.139, 0.238, 0.236, 1.75, 0.32, 0.687, 
  0.528, 1.198, 1.961, 1.171, 0.498, 1.28, 2.267, 2.605, 2.776, 
  4.359, 3.062, 2.264, 1.212, 1.809, 2.536, 2.48, 0.531, 0.515, 
  0.61, 0.867, 0.804, 2.282, 3.016, 0.998, 2.332, 0.612, 0.785, 
  1.292, 2.057, 0.396, 0.455, 0.283, 0.131, 0.147, 0.272, 0.198, 
  0.13, 0.19, 0.257, 0.149, 0.134, 0.251, 0.215, 0.133, 1.755, 
  1.855, 1.938, 1.471, 0.528, 0.842, 0.223, 0.256, 0.239, 0.113)

Because you did not post an easy-to-use set of example data, let's first generate some:

time_series = runif(72)

The next step is to reshape the dataset from a 1-D vector into a 2-D matrix with one row per hour and one column per day. This saves you from having to deal with indices and such:

time_matrix = matrix(time_series, 24, 3)
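
To see why this reshaping lines the hours up, it may help to fill the matrix with the positions 1 to 72 instead of measurements. A minimal sketch (the name idx_matrix is made up for illustration and is not part of the original answer):

```r
# matrix() fills column by column, so each column holds one day
# and each row holds the same hour of every day.
idx_matrix <- matrix(1:72, nrow = 24, ncol = 3)

idx_matrix[1, ]   # positions of hour 1 on each day: 1 25 49
idx_matrix[24, ]  # positions of hour 24 on each day: 24 48 72
dim(idx_matrix)   # 24 3
```

Row 1 therefore collects exactly the three elements that appear in the formula from the question.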

Then use apply to calculate the hourly means (if you like apply, take a look at the plyr package for more functions in the same style; see this link for more detail):

hourly_means = apply(time_matrix, 1, mean)
> hourly_means
 [1] 0.2954238 0.6791355 0.6113670 0.5775792 0.3614329 0.4414882 0.6206761
 [8] 0.2079882 0.6238492 0.4069143 0.6333607 0.5254185 0.6685191 0.3629751
[15] 0.3715500 0.2637383 0.2730713 0.3170541 0.6053016 0.6550780 0.4031117
[22] 0.6857810 0.4492246 0.4795785
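
As a sanity check, the row means can be compared against the explicit formula from the question, (t[1]+t[25]+t[49])/3. The sketch below assumes randomly generated data with a fixed seed; the seed and the manual_* names are illustrative and not from the original post. Base R's rowMeans is shown as an equivalent alternative to apply.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
time_series <- runif(72)
time_matrix <- matrix(time_series, 24, 3)

hourly_means <- apply(time_matrix, 1, mean)

# The explicit formula from the question, for hour 1 and for all 24 hours
manual_first <- (time_series[1] + time_series[25] + time_series[49]) / 3
manual_all <- (time_series[1:24] + time_series[25:48] + time_series[49:72]) / 3

all.equal(hourly_means[1], manual_first)        # TRUE
all.equal(hourly_means, manual_all)             # TRUE
all.equal(hourly_means, rowMeans(time_matrix))  # TRUE
```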

However, if you use ggplot2 there is no need to precalculate anything for the boxplots; ggplot2 computes the statistics for you:

library(ggplot2)
library(reshape2)
# melt() reshapes the matrix to long format with the columns
# Var1 (row index = hour), Var2 (column index = day) and value
# factor() turns Var1 into a categorical variable, so each hour gets its own box
ggplot(melt(time_matrix), aes(x = factor(Var1), y = value)) +
  geom_boxplot()

This yields what I think you were after: a boxplot with the hours of the day on the x-axis and the value on the y-axis.
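
If ggplot2 and reshape2 are not available, a similar plot can be sketched in base R. The code below is an assumption about what is wanted, not part of the original answer: it builds the long format by hand (the column names hour, day and value are chosen for illustration) and lets boxplot group the values by hour.

```r
set.seed(42)  # arbitrary seed, only so the run is repeatable
time_matrix <- matrix(runif(72), 24, 3)

# Long format: one row per measurement, labelled with its hour and day
long <- data.frame(
  hour  = as.vector(row(time_matrix)),
  day   = as.vector(col(time_matrix)),
  value = as.vector(time_matrix)
)

# One box per hour of the day, each built from the three daily values;
# boxplot() also returns the computed statistics invisibly
bp <- boxplot(value ~ hour, data = long,
              xlab = "Hour of day", ylab = "Value")
```

With only three values per hour, each box is a rough summary at best, so the plot is more useful once more days are available.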


Note: the data you have is a time series. R has specific ways of dealing with time series, e.g. the ts function. I normally use ordinary R data objects (arrays, matrices), but you could take a look at the TimeSeries CRAN Task View for an overview of what R can do with time series.

To calculate the hourly means using a ts object (inspired by this SO post):

# Create a ts object
time_ts = ts(time_series, frequency = 24)
# Calculate the mean
> tapply(time_ts, cycle(time_ts), mean)
        1         2         3         4         5         6         7         8 
0.2954238 0.6791355 0.6113670 0.5775792 0.3614329 0.4414882 0.6206761 0.2079882 
        9        10        11        12        13        14        15        16 
0.6238492 0.4069143 0.6333607 0.5254185 0.6685191 0.3629751 0.3715500 0.2637383 
       17        18        19        20        21        22        23        24 
0.2730713 0.3170541 0.6053016 0.6550780 0.4031117 0.6857810 0.4492246 0.4795785 
> aggregate(as.numeric(time_ts), list(hour = cycle(time_ts)), mean)
   hour         x
1     1 0.2954238
2     2 0.6791355
3     3 0.6113670
4     4 0.5775792
....
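
The same ts-based calculation can be applied to the 72 values posted in the question. This is a sketch that assumes, as the question states, that the values are hourly and start at hour 1 of the first day; the names x_ts and hourly_mean are illustrative.

```r
x <- c(0.253, 0.132, 0.144, 0.272, 0.192, 0.132, 0.209, 0.255, 0.131,
  0.136, 0.267, 0.166, 0.139, 0.238, 0.236, 1.75, 0.32, 0.687,
  0.528, 1.198, 1.961, 1.171, 0.498, 1.28, 2.267, 2.605, 2.776,
  4.359, 3.062, 2.264, 1.212, 1.809, 2.536, 2.48, 0.531, 0.515,
  0.61, 0.867, 0.804, 2.282, 3.016, 0.998, 2.332, 0.612, 0.785,
  1.292, 2.057, 0.396, 0.455, 0.283, 0.131, 0.147, 0.272, 0.198,
  0.13, 0.19, 0.257, 0.149, 0.134, 0.251, 0.215, 0.133, 1.755,
  1.855, 1.938, 1.471, 0.528, 0.842, 0.223, 0.256, 0.239, 0.113)

# 24 observations per cycle, i.e. one cycle per day
x_ts <- ts(x, frequency = 24)

# cycle() gives the position within the day (1 to 24) for every observation
hourly_mean <- tapply(as.numeric(x_ts), cycle(x_ts), mean)

length(hourly_mean)  # 24
# Agrees with the matrix approach
all.equal(as.vector(hourly_mean), rowMeans(matrix(x, 24, 3)))    # TRUE
# The first entry is the formula from the question
all.equal(as.vector(hourly_mean)[1], (x[1] + x[25] + x[49]) / 3)  # TRUE
```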