且构网

R: mutating multiple columns using an if statement

Updated: 2023-01-29 18:16:32

If you don't need to keep the original columns and are happy to mutate in place, you can use mutate_at with a case_when inside the function that does the mutating.

case_when uses dplyr's between function to set up the conditions (between is inclusive at both ends), and the value to the right of each ~ is what gets assigned when the condition on its left holds. The last argument, T ~ NA_real_, assigns NA to any observation that doesn't match any of the conditions.

library(tidyverse)

cols_to_mutate <- c("X01_01_p","X01_02_p", "X01_03_p", "X01_04", "X01_05","X01_06")

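df %>%
  # Overwrite each listed column with its rescaled values
  mutate_at(cols_to_mutate, function(x) {
    case_when(
      between(x, 1, 2) ~ 0,    # 1 or 2 becomes 0
      x == 3 ~ 0.5,            # 3 becomes 0.5
      between(x, 4, 5) ~ 1,    # 4 or 5 becomes 1
      T ~ NA_real_             # anything else becomes NA
    )
  })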
#>   X01_01 X01_01_p X01_02 X01_02_p X01_03 X01_03_p X01_04 X01_05 X01_06
#> 1      3      0.0      3      0.0      1      0.5      1    1.0    0.5
#> 2      4      0.5      1      0.0      5      0.0      0    0.5    0.5
#> 3      2      0.0      3      0.0      2      0.0      1    0.0    0.0
#> 4      3      0.5      3      0.5      4      0.0      0    0.5    1.0
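In dplyr 1.0.0 and later, mutate_at is superseded by across, so the same in-place rescaling can also be written as below. This is only a sketch under assumptions: the toy data frame and the helper name rescale_score are hypothetical and not taken from the question, and the catch-all is spelled TRUE rather than T.

```r
library(dplyr)

# Hypothetical toy data, standing in for the data frame from the question
toy <- tibble(
  id  = 1:5,
  a_p = c(1L, 3L, 5L, 2L, 9L),
  b_p = c(4L, 2L, 3L, NA, 1L)
)

# Hypothetical helper wrapping the same case_when() rules as above
rescale_score <- function(x) {
  case_when(
    between(x, 1, 2) ~ 0,
    x == 3 ~ 0.5,
    between(x, 4, 5) ~ 1,
    TRUE ~ NA_real_   # out-of-range or missing values become NA
  )
}

# Rescale the selected columns in place; id is left untouched
toy_rescaled <- toy %>%
  mutate(across(all_of(c("a_p", "b_p")), rescale_score))

print(toy_rescaled)
```

Values outside 1 to 5, and missing values, fall through to the final branch and come back as NA.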

If you need to keep the original columns and give the rescaled columns new names, here is some rlang + purrr trickiness. What I did was imap over the columns of the data frame. If the name is in the list of columns to mutate, I use the same case_when as above and output a tibble with two columns: the original column, with its name assigned using quo_name and the := operator, and the new values column, with the same name but with _n appended. If it isn't a column to mutate, the function just returns a tibble containing the original column. Because imap_dfc is used, all the columns are bound back together into one data frame.

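df %>%
  # imap passes each column (x) together with its name
  imap_dfc(function(x, name) {
    if (name %in% cols_to_mutate) {
      new_vals <- case_when(
        between(x, 1, 2) ~ 0,
        x == 3 ~ 0.5,
        between(x, 4, 5) ~ 1,
        T ~ NA_real_
      )
      # Original column plus the rescaled copy, named <name>_n
      tibble(!!quo_name(name) := x, !!quo_name(paste0(name, "_n")) := new_vals)
    } else {
      # Columns that are not being mutated pass through unchanged
      tibble(!!quo_name(name) := x)
    }
  })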
#> # A tibble: 4 x 15
#>   X01_01 X01_01_p X01_01_p_n X01_02 X01_02_p X01_02_p_n X01_03 X01_03_p
#>    <int>    <int>      <dbl>  <int>    <int>      <dbl>  <int>    <int>
#> 1      3        2        0        3        1        0        1        3
#> 2      4        3        0.5      1        1        0        5        2
#> 3      2        1        0        3        1        0        2        2
#> 4      3        3        0.5      3        3        0.5      4        2
#> # ... with 7 more variables: X01_03_p_n <dbl>, X01_04 <int>,
#> #   X01_04_n <dbl>, X01_05 <int>, X01_05_n <dbl>, X01_06 <int>,
#> #   X01_06_n <dbl>
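As an alternative to the imap_dfc approach, across accepts a .names argument (dplyr 1.0.0 and later) that keeps the originals and writes the rescaled values to new columns. The sketch below is not the method described above, and it makes assumptions: the toy data frame, cols, and rescale_score are hypothetical names introduced for illustration. One difference to be aware of is that the new _n columns are appended at the end of the data frame instead of being placed next to their source columns.

```r
library(dplyr)

# Hypothetical toy data, standing in for the data frame from the question
toy <- tibble(
  id  = 1:4,
  a_p = c(2L, 3L, 1L, 3L),
  b_p = c(1L, 1L, 1L, 3L)
)

# Hypothetical list of columns to rescale
cols <- c("a_p", "b_p")

# Hypothetical helper wrapping the same case_when() rules as above
rescale_score <- function(x) {
  case_when(
    between(x, 1, 2) ~ 0,
    x == 3 ~ 0.5,
    between(x, 4, 5) ~ 1,
    TRUE ~ NA_real_
  )
}

# Keep the originals; write rescaled values to <column>_n
toy_both <- toy %>%
  mutate(across(all_of(cols), rescale_score, .names = "{.col}_n"))

print(toy_both)
```

If the interleaved column order matters, the imap_dfc version above produces it directly; with this version the columns can be reordered afterwards with relocate or select.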