且构网


R - Find the first non-zero element of each group in data.table

Updated: 2023-01-07 11:28:06



I have modified my code to produce your desired result, although it is not the data.table solution you preferred. What we had missed is that the combinations of the grouping variables are not always unique, so this is not a purely row-wise operation: after finding the first non-zero day in each row, the rows must be grouped and only the earliest day kept for each group. The only remaining issue with my output is the order of the levels of the Maturing and Soil variables, which differs from the row order in your expected output. This can be fixed.
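To see why a per-row result is not enough, here is a small base R sketch on hypothetical toy data: three day columns stand in for `1`:`36`, and the PR/Early/CLAY combination occurs twice. The row-wise step yields one day per row, so the duplicated combination still has two candidate days; only the grouped step reduces them to the earliest one.

```r
# Hypothetical toy data: day columns "1".."3" stand in for `1`:`36`,
# and the combination PR/Early/CLAY appears in two rows.
dat <- data.frame(
  State    = c("PR", "PR", "RS"),
  Maturing = c("Early", "Early", "Late"),
  Soil     = c("CLAY", "CLAY", "SILT"),
  `1` = c(0, 0, 0),
  `2` = c(0, 5, 0),
  `3` = c(4, 1, 2),
  check.names = FALSE
)
day_cols <- c("1", "2", "3")

# Step 1 (row-wise): name of the first day column with a non-zero value,
# converted to a number. This gives one value per row (3, 2, 3), so the
# duplicated PR/Early/CLAY combination still has two candidate days.
first_nonzero <- apply(dat[day_cols], 1, function(x) {
  as.numeric(names(x)[x != 0][1])
})

# Step 2 (grouped): keep the earliest of those days per combination
earliest <- aggregate(
  list(Earliest = first_nonzero),
  by  = dat[c("State", "Maturing", "Soil")],
  FUN = min
)
earliest
```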

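library(dplyr)
library(purrr)

dat %>%
  # Row-wise: name of the first day column (`1` to `36`) with a non-zero
  # value; NA_character_ if every day in the row is zero
  mutate(Earliest = pmap_chr(select(dat, `1`:`36`),
                             ~ names(c(...))[c(...) != 0][1])) %>%
  select(-c(`1`:`36`)) %>%
  # Column names are character, so convert the day to a number
  mutate(Earliest = as.numeric(Earliest)) %>%
  # Grouping combinations are not unique: keep the earliest day per group
  group_by(State, Maturing, Soil) %>%
  summarise(Earliest = min(Earliest), .groups = "drop")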
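Since the question asked for data.table, here is a hedged sketch of the same logic in data.table; it is not the code from this answer, and it uses hypothetical toy data with three day columns in place of `1`:`36`. Instead of a row-wise scan it reshapes to long format with `melt()`. The minimum, over a group's rows, of each row's first non-zero day equals the smallest non-zero day anywhere in that group, so one filter plus a grouped `min()` is enough. This assumes the day columns are named by day number in ascending order.

```r
library(data.table)

# Hypothetical toy data in the same wide layout; PR/Early/CLAY appears twice
dt_wide <- data.table(
  State    = c("PR", "PR", "RS"),
  Maturing = c("Early", "Early", "Late"),
  Soil     = c("CLAY", "CLAY", "SILT"),
  `1` = c(0, 0, 0),
  `2` = c(0, 5, 0),
  `3` = c(4, 1, 2)
)
id_cols <- c("State", "Maturing", "Soil")

# One row per (group, day); keep the day names as character, not factor
long <- melt(dt_wide, id.vars = id_cols, variable.name = "Day",
             value.name = "Value", variable.factor = FALSE)

# Smallest non-zero day per combination of the grouping variables
res <- long[Value != 0, .(Earliest = min(as.numeric(Day))), by = id_cols]
res
```

Two differences from the dplyr version are worth noting: a group whose rows are all zero is dropped here instead of being returned as NA, and `by` keeps groups in order of first appearance rather than sorting them alphabetically.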


# A tibble: 18 x 4
   State Maturing Soil  Earliest
   <chr> <chr>    <chr>    <dbl>
 1 PR    Early    CLAY        26
 2 PR    Early    SANDY       30
 3 PR    Early    SILT        26
 4 PR    Late     CLAY        26
 5 PR    Late     SANDY       26
 6 PR    Late     SILT        26
 7 PR    Medium   CLAY        26
 8 PR    Medium   SANDY       27
 9 PR    Medium   SILT        26
10 RS    Early    CLAY        27
11 RS    Early    SANDY       28
12 RS    Early    SILT        27
13 RS    Late     CLAY        27
14 RS    Late     SANDY       27
15 RS    Late     SILT        27
16 RS    Medium   CLAY        27
17 RS    Medium   SANDY       27
18 RS    Medium   SILT        27
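The level-order issue mentioned above comes from sorting: `group_by()`/`summarise()` and `arrange()` order character columns alphabetically, which is why Late comes before Medium in the output. A minimal sketch of one fix turns Maturing into a factor with explicit levels, because `arrange()` sorts a factor by its levels. The target order Early, Medium, Late is an assumption, since the expected output is not reproduced here; Soil can be handled the same way. The rows below are copied from the result above.

```r
library(dplyr)

# A few rows copied from the result above, in its alphabetical order
res <- tibble(
  State    = c("PR", "PR", "PR", "PR"),
  Maturing = c("Early", "Late", "Medium", "Medium"),
  Soil     = c("CLAY", "CLAY", "CLAY", "SANDY"),
  Earliest = c(26, 26, 26, 27)
)

# Assumed target order Early < Medium < Late; arrange() then sorts
# Maturing by these factor levels instead of alphabetically
res_ordered <- res %>%
  mutate(Maturing = factor(Maturing, levels = c("Early", "Medium", "Late"))) %>%
  arrange(State, Maturing, Soil)

res_ordered
```

Converting to a factor before `group_by()` in the main pipeline would give the same ordering directly in the `summarise()` output.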