Create multiple columns from a single column and clean up the result

Updated: 2023-02-02 22:19:30

You can split the column with tidyr's separate() and then post-process the new columns with dplyr:

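library(dplyr)
library(tidyr)  # separate() comes from tidyr
foo <- foo %>%
  # keep the first three "_"-delimited pieces; pad missing pieces with NA
  separate(Point.Type, c("rpm_nom", "GVF_nom", "p0in_nom"),
           sep = "_", remove = FALSE, extra = "drop", fill = "right") %>%
  # mutate_each()/funs() are deprecated; across() needs dplyr >= 1.0.0
  mutate(across(c(rpm_nom, GVF_nom, p0in_nom), ~ as.numeric(gsub("[^0-9]", "", .x))))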

The gsub() call with the pattern [^0-9] removes every non-digit character. To keep decimal points, use [^0-9.] instead of [^0-9] (as @PierreLafortune did in his answer), but be aware that this also keeps dots that are not decimal points. Wrapping the call in as.numeric converts the remaining strings to numeric values and turns the empty strings into NA. This gives the following result:

> foo
                            Point.Type rpm_nom GVF_nom p0in_nom Point.Value
1                           Zero Start      NA      NA       NA          NA
2                           Zero Start      NA      NA       NA          NA
3                           Zero Start      NA      NA       NA          NA
4 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww    3000      10       13  -1.2361145
5 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww    3000      10       13  -0.8727960
6 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww    3000      10       13   0.9685555
7                            Zero Stop      NA      NA       NA          NA
8                           Zero Start      NA      NA       NA          NA
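
To see how the two character classes discussed above differ, here is a minimal base-R sketch. The sample strings are hypothetical; they are modeled on the pieces of Point.Type shown in the output and are not taken from the original data.

```r
# Hypothetical sample values (assumed for illustration only)
x <- c("3000rpm", "10%", "13barG", "1.0", "Zero Start", "v1.2.3")

# Keep digits only: the decimal point in "1.0" is lost
digits_only <- gsub("[^0-9]", "", x)

# Keep digits and dots: "1.0" survives, but so do the dots in "v1.2.3"
digits_dots <- gsub("[^0-9.]", "", x)

# Empty strings become NA; "1.2.3" is not a valid number, so it also
# becomes NA (with a coercion warning, suppressed here)
num_only <- as.numeric(digits_only)
num_dots <- suppressWarnings(as.numeric(digits_dots))

print(data.frame(x, digits_only, digits_dots, num_only, num_dots))
```

Which pattern fits depends on the data: [^0-9] is safe when every value is an integer, while [^0-9.] only works if stray dots cannot occur in the pieces being converted.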

Alternatively, using data.table (contributed by @DavidArenburg in the comments):

library(data.table)
setDT(foo)[, c("rpm_nom","GVF_nom","p0in_nom") := 
             lapply(tstrsplit(Point.Type, "_", fixed = TRUE)[1:3],
                    function(x) as.numeric(gsub("[^0-9]","",x)))
           ]

This gives a similar result:

> foo
                             Point.Type Point.Value rpm_nom GVF_nom p0in_nom
1:                           Zero Start          NA      NA      NA       NA
2:                           Zero Start          NA      NA      NA       NA
3:                           Zero Start          NA      NA      NA       NA
4: 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww -0.09255445    3000      10       13
5: 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww  1.18581340    3000      10       13
6: 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww  2.14475950    3000      10       13
7:                            Zero Stop          NA      NA      NA       NA
8:                           Zero Start          NA      NA      NA       NA

The advantage of this approach is that foo is updated by reference, without copying the whole table. Because that is faster and more memory-efficient, it is especially valuable with large datasets.
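
What "updated by reference" means can be illustrated with a small sketch. It assumes the data.table package is installed; the two-row table and the helper add_rpm() are hypothetical names introduced only for this example.

```r
library(data.table)

# Hypothetical two-row table (not the original foo)
dt <- data.table(Point.Type = c("Zero Start", "3000rpm_10%_13barG"))

# Hypothetical helper: adds a column with := and returns nothing
add_rpm <- function(d) {
  d[, rpm_nom := as.numeric(gsub("[^0-9]", "",
        tstrsplit(Point.Type, "_", fixed = TRUE)[[1]]))]
  invisible(NULL)
}

add_rpm(dt)   # no assignment back to dt

# The new column is present because := modified dt in place
has_col <- "rpm_nom" %in% names(dt)
print(dt)
```

With a plain data.frame and dplyr, the same helper would have to return the modified copy and the caller would have to reassign it, which is why the dplyr version above starts with foo <- foo %>%.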
