Updated: 2023-02-26 19:23:55
This is a time series of hourly smart meter data with freq=24. It was measured over three days, so the first day is [1:24], the second [25:48], and the third [49:72].
I want the mean for every hour across the three days. For example:
(t[1]+t[25]+t[49])/3
so that I can make a boxplot of the 24 hourly means over 3 days.
x <- c(0.253, 0.132, 0.144, 0.272, 0.192, 0.132, 0.209, 0.255, 0.131,
0.136, 0.267, 0.166, 0.139, 0.238, 0.236, 1.75, 0.32, 0.687,
0.528, 1.198, 1.961, 1.171, 0.498, 1.28, 2.267, 2.605, 2.776,
4.359, 3.062, 2.264, 1.212, 1.809, 2.536, 2.48, 0.531, 0.515,
0.61, 0.867, 0.804, 2.282, 3.016, 0.998, 2.332, 0.612, 0.785,
1.292, 2.057, 0.396, 0.455, 0.283, 0.131, 0.147, 0.272, 0.198,
0.13, 0.19, 0.257, 0.149, 0.134, 0.251, 0.215, 0.133, 1.755,
1.855, 1.938, 1.471, 0.528, 0.842, 0.223, 0.256, 0.239, 0.113)
Because you did not post an easy-to-use set of example data, let's first generate some:
time_series = runif(72)
The next step is to change the structure of the dataset from a 1d vector to a 2d matrix, which saves you a lot of dealing with indices and such:
time_matrix = matrix(time_series, 24, 3)
and use apply to calculate the hourly means (if you like apply, take a look at the plyr package for more nice functions; see this link for more detail):
hourly_means = apply(time_matrix, 1, mean)
> hourly_means
[1] 0.2954238 0.6791355 0.6113670 0.5775792 0.3614329 0.4414882 0.6206761
[8] 0.2079882 0.6238492 0.4069143 0.6333607 0.5254185 0.6685191 0.3629751
[15] 0.3715500 0.2637383 0.2730713 0.3170541 0.6053016 0.6550780 0.4031117
[22] 0.6857810 0.4492246 0.4795785
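As a quick sanity check on the reshaping step, here is a minimal sketch. It relies on matrix() filling column by column, so row i holds the values t[i], t[i+24], t[i+48] from the question. The fixed seed is an assumption added only for reproducibility, and rowMeans() is shown as a base R shortcut that should give the same result as apply(time_matrix, 1, mean):

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
time_series <- runif(72)
time_matrix <- matrix(time_series, nrow = 24, ncol = 3)

# matrix() fills column by column: column j is day j, row i is hour i
first_hour <- time_matrix[1, ]
isTRUE(all.equal(first_hour, time_series[c(1, 25, 49)]))

# Two equivalent ways to get the 24 hourly means
hourly_apply <- apply(time_matrix, 1, mean)
hourly_row   <- rowMeans(time_matrix)
isTRUE(all.equal(hourly_apply, hourly_row))
```

For a plain mean over rows, rowMeans() is usually the shorter option; apply() remains useful when you want a different summary function such as median.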
However, if you use ggplot2, there is no need to precalculate the boxplots; ggplot2 does this for you:
library(ggplot2)
library(reshape2)
# Notice the use of melt to reshape the matrix into long format
# (columns Var1 = row index/hour, Var2 = column index/day, value)
# Also notice the factor to turn Var1 into a categorical variable
ggplot(melt(time_matrix), aes(x = factor(Var1), y = value)) +
  geom_boxplot()
Which yields what I think you were after:
On the x-axis are the hours of the day, on the y-axis the values.
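If you would rather not depend on ggplot2 and reshape2, a similar plot can be sketched with base graphics. This is an alternative, not what the answer above uses: boxplot() called on a matrix draws one box per column, so the 24 x 3 matrix is transposed to get one box per hour. The seed and axis labels are assumptions for illustration:

```r
set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
time_matrix <- matrix(runif(72), nrow = 24, ncol = 3)

# boxplot() on a matrix draws one box per column, so transpose the
# 24 x 3 matrix to get 24 columns (hours) of 3 daily values each
bp <- boxplot(t(time_matrix),
              xlab = "Hour of day", ylab = "Value")

# The returned list holds the statistics behind each box
dim(bp$stats)  # 5 summary statistics for each of the 24 hours
```

With only three observations per hour, each box is a rough summary at best, whichever plotting system you use.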
Note: the data you have is a time series. R has specific ways of dealing with time series, e.g. the ts function. I normally use ordinary R data objects (arrays, matrices), but you could take a look at the TimeSeries CRAN task view for an overview of what R can do with time series.
To calculate the hourly means using a ts object (inspired by this SO post):
# Create a ts object
time_ts = ts(time_series, frequency = 24)
# Calculate the mean
> tapply(time_ts, cycle(time_ts), mean)
1 2 3 4 5 6 7 8
0.2954238 0.6791355 0.6113670 0.5775792 0.3614329 0.4414882 0.6206761 0.2079882
9 10 11 12 13 14 15 16
0.6238492 0.4069143 0.6333607 0.5254185 0.6685191 0.3629751 0.3715500 0.2637383
17 18 19 20 21 22 23 24
0.2730713 0.3170541 0.6053016 0.6550780 0.4031117 0.6857810 0.4492246 0.4795785
> aggregate(as.numeric(time_ts), list(hour = cycle(time_ts)), mean)
hour x
1 1 0.2954238
2 2 0.6791355
3 3 0.6113670
4 4 0.5775792
....
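The same ts approach can be applied to the x vector posted in the question. A minimal sketch follows, assuming x is exactly the 72 values shown there; the names x_ts and hourly_means_x are introduced here for illustration only:

```r
# Data copied from the question: 3 days of hourly readings
x <- c(0.253, 0.132, 0.144, 0.272, 0.192, 0.132, 0.209, 0.255, 0.131,
       0.136, 0.267, 0.166, 0.139, 0.238, 0.236, 1.75, 0.32, 0.687,
       0.528, 1.198, 1.961, 1.171, 0.498, 1.28, 2.267, 2.605, 2.776,
       4.359, 3.062, 2.264, 1.212, 1.809, 2.536, 2.48, 0.531, 0.515,
       0.61, 0.867, 0.804, 2.282, 3.016, 0.998, 2.332, 0.612, 0.785,
       1.292, 2.057, 0.396, 0.455, 0.283, 0.131, 0.147, 0.272, 0.198,
       0.13, 0.19, 0.257, 0.149, 0.134, 0.251, 0.215, 0.133, 1.755,
       1.855, 1.938, 1.471, 0.528, 0.842, 0.223, 0.256, 0.239, 0.113)

# frequency = 24 makes cycle() return the hour of day (1..24)
x_ts <- ts(x, frequency = 24)

# Mean per hour across the three days
hourly_means_x <- tapply(x_ts, cycle(x_ts), mean)

# The first entry should match the formula from the question
manual_first <- (x[1] + x[25] + x[49]) / 3
isTRUE(all.equal(as.numeric(hourly_means_x)[1], manual_first))
```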