Last updated: 2023-02-02 22:19:30
You can post-process the columns with tidyr and dplyr:
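library(dplyr)
library(tidyr)  # separate() comes from tidyr, not dplyr

foo <- foo %>%
  separate(Point.Type, c("rpm_nom", "GVF_nom", "p0in_nom"),
           sep = "_", remove = FALSE, extra = "drop", fill = "right") %>%
  # strip every non-digit character, then convert to numeric
  # (across() replaces the deprecated mutate_each()/funs())
  mutate(across(c(rpm_nom, GVF_nom, p0in_nom),
                ~ as.numeric(gsub("[^0-9]", "", .))))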
The gsub("[^0-9]", "", .) part removes all non-digit characters. If you want to keep decimal points, you can use [^0-9.] instead of [^0-9] (as @PierreLafortune does in his answer), but be aware that this also keeps dots that are not meant to be decimal points. Wrapping the call in as.numeric converts the results to numeric values and, at the same time, turns the empty strings into NA. This gives the following result:
> foo
                            Point.Type rpm_nom GVF_nom p0in_nom Point.Value
1                           Zero Start      NA      NA       NA          NA
2                           Zero Start      NA      NA       NA          NA
3                           Zero Start      NA      NA       NA          NA
4 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww    3000      10       13  -1.2361145
5 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww    3000      10       13  -0.8727960
6 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww    3000      10       13   0.9685555
7                            Zero Stop      NA      NA       NA          NA
8                           Zero Start      NA      NA       NA          NA
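To see how the two patterns discussed above differ, here is a minimal base R sketch. The vector x and the value "ver.2" are made up for illustration and are not part of the original data:

```r
# Hypothetical sample values; "ver.2" is an invented example of a stray dot
x <- c("3000rpm", "10%", "13barG", "1.0", "ver.2", "Zero Start")

# Keep digits only: the decimal point in "1.0" is dropped, so it becomes 10
digits_only <- as.numeric(gsub("[^0-9]", "", x))

# Keep digits and dots: "1.0" survives as 1, but "ver.2" turns into 0.2
digits_and_dots <- as.numeric(gsub("[^0-9.]", "", x))

print(digits_only)
print(digits_and_dots)
```

In both cases "Zero Start" is reduced to an empty string, which as.numeric turns into NA without a warning.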
Or using data.table (as contributed by @DavidArenburg in the comments):
library(data.table)

# convert foo to a data.table in place and add the three columns by reference
setDT(foo)[, c("rpm_nom", "GVF_nom", "p0in_nom") :=
             lapply(tstrsplit(Point.Type, "_", fixed = TRUE)[1:3],
                    function(x) as.numeric(gsub("[^0-9]", "", x)))]
This gives a similar result:
> foo
                             Point.Type Point.Value rpm_nom GVF_nom p0in_nom
1:                           Zero Start          NA      NA      NA       NA
2:                           Zero Start          NA      NA      NA       NA
3:                           Zero Start          NA      NA      NA       NA
4: 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww -0.09255445    3000      10       13
5: 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww  1.18581340    3000      10       13
6: 3000rpm_10%_13barG_Sdsdsa_1.0_ss_Pww  2.14475950    3000      10       13
7:                            Zero Stop          NA      NA      NA       NA
8:                           Zero Start          NA      NA      NA       NA
The advantage of this approach is that foo is updated by reference. Because that is faster and more memory efficient, it is especially valuable when working with large datasets.
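As a rough illustration of what "updated by reference" means, the sketch below compares the memory address of a small table before and after adding a column with :=. The table dt and its columns are invented for this example; address() is a helper exported by data.table:

```r
library(data.table)

# Hypothetical toy table, not the foo from the question
dt <- data.table(x = 1:3)

addr_before <- address(dt)   # memory address of the data.table object
dt[, y := x * 2L]            # add a column in place, no reassignment needed
addr_after <- address(dt)

# The same object was modified rather than copied
print(identical(addr_before, addr_after))
```

Because no copy of the table is made, there is also no need to assign the result back to dt.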