Resampling and looping with dplyr functions in R

Updated: 2023-11-03 08:54:28

I don't think you need a loop. It is faster to sample 3 * 1000 values per group in a single call, assign a sample_id to every block of three rows, add sample_id to the grouping variables, and finally summarise to get the desired values. This way every function is called only once.

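library(dplyr)

dat %>%
  group_by(fertilizer, crop, level) %>%
  # draw all 3 * 1000 values for each group in one call
  sample_n(3 * 1000, replace = TRUE) %>%
  # every consecutive block of 3 rows forms one resample
  mutate(sample_id = rep(1:1000, each = 3)) %>%
  # dplyr >= 1.0.0; older versions spell this argument `add`
  group_by(sample_id, .add = TRUE) %>%
  summarise(
    mean = mean(growth, na.rm = TRUE),
    var = var(growth)
  ) %>%
  ungroup()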

# A tibble: 8,000 x 6
   fertilizer crop  level sample_id  mean      var
   <chr>      <chr> <chr>     <int> <dbl>    <dbl>
 1 N          alone high          1 30.7  2640.   
 2 N          alone high          2  1       0    
 3 N          alone high          3 60.3  2640.   
 4 N          alone high          4  1.33    0.333
 5 N          alone high          5  1.33    0.333
 6 N          alone high          6 60.3  2640.   
 7 N          alone high          7  1.33    0.333
 8 N          alone high          8 30.3  2670.   
 9 N          alone high          9  1.33    0.333
10 N          alone high         10 60.7  2581.   
# ... with 7,990 more rows
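
The answer does not show `dat`, so here is a minimal self-contained sketch of the same idea on made-up data. Everything in the toy data frame is an assumption for illustration: the factor values "P", "mixed" and "low", the `replicate_id` column, the random `growth` values, and the helper names `n_draws` and `draw_size`. It also swaps `sample_n()` for `slice_sample()`, which dplyr 1.0.0 and later recommend; this is a substitution, not the code from the answer.

```r
library(dplyr)

set.seed(42)  # make the resampling reproducible

# Hypothetical stand-in for `dat`: 2 x 2 x 2 = 8 groups, 5 observations each
dat <- expand.grid(
  fertilizer = c("N", "P"),
  crop = c("alone", "mixed"),
  level = c("high", "low"),
  replicate_id = 1:5,
  stringsAsFactors = FALSE
)
dat$growth <- round(runif(nrow(dat), min = 1, max = 100))

n_draws <- 1000  # number of resamples per group
draw_size <- 3   # observations per resample

resampled <- dat %>%
  group_by(fertilizer, crop, level) %>%
  # draw all n_draws * draw_size values for each group in one call
  slice_sample(n = n_draws * draw_size, replace = TRUE) %>%
  # label every consecutive block of draw_size rows as one resample
  mutate(sample_id = rep(seq_len(n_draws), each = draw_size)) %>%
  group_by(sample_id, .add = TRUE) %>%
  summarise(
    mean = mean(growth, na.rm = TRUE),
    var = var(growth, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped tibble
  )

dim(resampled)
```

With 8 groups and 1000 resamples per group, `resampled` has one row per group and `sample_id`, which matches the shape of the tibble printed above. Changing `n_draws` or `draw_size` is enough to try other resampling schemes without touching the pipeline.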