Updated: 2023-02-17 17:58:30
Please check whether this serves the purpose. Selecting the maximum possible number of dates under the provided criteria is difficult (at least for me). We can separate the dates into consecutive and non-consecutive groups with the strategy below. But consider a group of, say, 3 consecutive dates: if a random sample from it contains 2 units, these can be either consecutive or non-consecutive. If we instead deliberately picked only the odd rows (2 dates) or the even row (1 date), the sample would be judgmental rather than random, in my opinion. This is the strategy adopted:
The operations are done on each group separately through purrr::map_df, which finally row-binds the data:

library(tidyverse)
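df %>%
  ungroup() %>%
  group_split(Site) %>%                      # one data frame per Site
  map_df(., ~ .x %>%
    ungroup() %>%
    arrange(Date) %>%
    mutate(n = 1) %>%                        # flag the observed rows
    # add the missing calendar days; n is NA on the added rows
    complete(Date = seq.Date(first(Date), last(Date), by = 'days')) %>%
    # the counter increases at every missing day, so each run of
    # consecutive dates ends up with its own group id
    group_by(n = cumsum(is.na(n))) %>%
    filter(!is.na(Site)) %>%                 # drop the filler rows
    sample_n(1) %>%                          # one random date per run
    ungroup() %>%
    sample_n(min(n(), 3))) %>%               # at most 3 dates per Site
  select(-n)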
# A tibble: 86 x 2
   Date       Site
   <date>     <chr>
 1 2020-03-04 HP36P1B
 2 2020-03-04 HP36P3B
 3 2020-03-04 HP36P4B
 4 2020-03-07 HP37P1B
 5 2020-03-12 HP37P1B
 6 2020-03-07 HP37P2B
 7 2020-03-12 HP37P2B
 8 2020-03-07 HP37P4B
 9 2020-03-12 HP37P4B
10 2020-03-04 HP4008R
# ... with 76 more rows
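
To see why one draw per run can never return two consecutive dates, here is a minimal sketch of the run-labelling step on a made-up single-site data set. The names `toy`, `runs`, `run` and `picked` and the dates themselves are assumptions for illustration only, not part of your data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: runs are Mar 1-3, Mar 6 alone, and Mar 9-10
toy <- tibble(
  Date = as.Date(c("2020-03-01", "2020-03-02", "2020-03-03",
                   "2020-03-06", "2020-03-09", "2020-03-10")),
  Site = "A"
)

runs <- toy %>%
  arrange(Date) %>%
  mutate(n = 1) %>%
  # fill in the missing calendar days; n and Site are NA on those rows
  complete(Date = seq.Date(first(Date), last(Date), by = "days")) %>%
  # the counter only moves on missing days, so it is constant within a run
  mutate(run = cumsum(is.na(n))) %>%
  filter(!is.na(Site))

# one random date from each run of consecutive dates
picked <- runs %>%
  group_by(run) %>%
  sample_n(1) %>%
  ungroup()
```

Because any two runs are separated by at least one missing day, dates drawn from different runs are always at least two days apart. The price is the one mentioned above: a run of 3 consecutive dates contributes only 1 date here, although 2 non-consecutive dates (the first and the third) would be possible.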
Note: The data in your dput() was grouped, so I had to add ungroup() as the second line of the code; you may remove it if your data is not grouped.