Extracting part of a filename in R

Updated: 2023-01-04 22:13:56

The pattern here is a date, an optional E\digit or Expt\digit prefix that you don't want, a word that you do want, then an optional SDM that you don't want, followed by 'data copy.txt'...

Here's my test data:

> names
[1] "2012-05-31 CTN1 data copy.txt"          
[2] "2012-05-21 E7 PMA1 data copy.txt"       
[3] "2011-11-29 TDH3 SDM data copy.txt"      
[4] "2012-01-04 POX1 data copy.txt"          
[5] "2011-11-29 ECHO data copy.txt"          
[6] "2011-11-29 E8 ECHO data copy.txt"       
[7] "2011-11-29 ECHO SDM data copy.txt"      
[8] "2011-11-29 Expt2 ECHO SDM data copy.txt"

Here's my sub:

> sub(pattern="^....-..-.. (E\\d+ |Expt\\d+ )*(\\w+) (SDM )*data copy.txt","\\2",names)
[1] "CTN1" "PMA1" "TDH3" "POX1" "ECHO" "ECHO" "ECHO" "ECHO"
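
The pattern above is fairly permissive: `.` matches any character (including the one in `copy.txt`), and `*` allows a prefix or `SDM` to repeat. A stricter variant might look like the sketch below. This is an assumption about what "valid" names look like, not part of the original answer; the names `fnames` and `strict` are made up for illustration, and `perl = TRUE` is used only so that `\d` and `\w` are interpreted by PCRE.

```r
# Test data copied from the post (stored as `fnames` to avoid masking base::names).
fnames <- c(
  "2012-05-31 CTN1 data copy.txt",
  "2012-05-21 E7 PMA1 data copy.txt",
  "2011-11-29 TDH3 SDM data copy.txt",
  "2012-01-04 POX1 data copy.txt",
  "2011-11-29 ECHO data copy.txt",
  "2011-11-29 E8 ECHO data copy.txt",
  "2011-11-29 ECHO SDM data copy.txt",
  "2011-11-29 Expt2 ECHO SDM data copy.txt"
)

# Stricter variant (hypothetical, not the original answer's pattern):
#   ^\\d{4}-\\d{2}-\\d{2}   date made of digits only
#   (E\\d+ |Expt\\d+ )?     optional experiment prefix, at most once (group 1)
#   (\\w+)                  the word to keep (group 2)
#   ( SDM)?                 optional SDM marker, at most once (group 3)
#    data copy\\.txt$       literal suffix with the dot escaped, anchored at the end
strict <- "^\\d{4}-\\d{2}-\\d{2} (E\\d+ |Expt\\d+ )?(\\w+)( SDM)? data copy\\.txt$"

# Replace the whole name with the contents of capture group 2.
result <- sub(strict, "\\2", fnames, perl = TRUE)
print(result)
```

Note that `sub` returns a string unchanged when the pattern does not match, so with a strict pattern any unexpected filename shows up in the result as the full original name rather than as a wrong extraction.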

If your E-prefixes have more than one digit this will also work. I've tried to add some things to my test set starting with E to make sure they get treated properly, as well as the case of an E-prefix and an SDM.
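
As a quick check of the multi-digit claim, the same `sub` call can be run on a few extra names. The filenames in `extra` are hypothetical inputs invented for this sketch; the pattern is the one from the post, stored in a variable named `pattern` for convenience.

```r
# Pattern copied from the sub() call in the post.
pattern <- "^....-..-.. (E\\d+ |Expt\\d+ )*(\\w+) (SDM )*data copy.txt"

# Hypothetical filenames with multi-digit prefixes (not from the original test set).
extra <- c(
  "2011-11-29 E12 ECHO data copy.txt",          # two-digit E-prefix
  "2011-11-29 Expt345 ECHO data copy.txt",      # three-digit Expt-prefix
  "2011-11-29 E10 ECHO SDM data copy.txt"       # E-prefix together with SDM
)

# \\d+ matches one or more digits, so the whole prefix is consumed by group 1
# and group 2 still captures the word that follows it.
extracted <- sub(pattern, "\\2", extra)
print(extracted)
```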