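Updated: 2023-01-30 19:27:56
You could try strsplit: split each GOs string on ";", then group the pieces by their prefix (the part before the colon) and paste each group back together.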
res <- do.call(rbind.data.frame,lapply(strsplit(annot$GOs, ";"),
function(x) tapply(x, sub(':.*', '', x), FUN=paste, collapse=";")))
res1 <- data.frame(Name=annot[,1], setNames(res, c('Component',
'Function', 'P')), stringsAsFactors=FALSE)
res1
# Name Component
#1 dd_1 C:extracellular space;C:cell body
#2 dd_2 C:Signal transduction;C:nucleus
#3 dd_3 C:cardiomyceltes;C:intracellular pace
# Function
#1 F:transport carrier
#2 F:positive regulation
#3 F:putative;F:magnesium ion binding;F:calcium ion binding
# P
#1 P:cell migration process;P:NF/ß pathway
#2 P:single organism;P:positive regulation
#3 P:visual perception;P:blood coagulation
Or you can use extract from tidyr:
library(tidyr)
extract(annot, GOs, c('C', 'F', 'P'), '(C:[^F]+);(F:[^P]+);(P:.*)')
# Name C
#1 dd_1 C:extracellular space;C:cell body
#2 dd_2 C:Signal transduction;C:nucleus
#3 dd_3 C:cardiomyceltes;C:intracellular pace
# F
#1 F:transport carrier
#2 F:positive regulation
#3 F:putative;F:magnesium ion binding;F:calcium ion binding
# P
#1 P:cell migration process;P:NF/ß pathway
#2 P:single organism;P:positive regulation
#3 P:visual perception;P:blood coagulation
In the new dataset, some elements (i.e. "C", "F", etc.) are missing from some rows. You could modify the first solution so that every row is matched against the full set of prefixes and missing ones become NA:
res <- do.call(rbind.data.frame,lapply(strsplit(annot$GOs, "; "),function(x){
x1 <- tapply(x, sub(':.*', '', x), FUN=paste, collapse=";")
x1[match(c('C', 'F', 'P'), names(x1))]}))
res1 <- data.frame(Name=annot[,1], setNames(res, c('Component',
'Function', 'P')), stringsAsFactors=FALSE)
head(res1,2)
# Name Component Function
#1 dd_1 C:extracellular space;C:cell body <NA>
#2 dd_2 C:Signal transduction;C:nucleus F:positive regulation
# P
#1 P:cell migration process;P:NF/ß pathway
#2 P:single organism;P:positive(+) regulation