且构网

Replace the rownames of a data frame with a substring

Updated: 2022-10-19 23:24:29

I have a large dataframe (named test) with different rownames.

> rownames(test)
[1] "U2OS.EV.2.7.9"   "U2OS.PIM.2.7.9"  "U2OS.WDR.2.7.9"  "U2OS.MYC.2.7.9"
[5] "U2OS.OBX.2.7.9"  "U2OS.EV.18.6.9"  "U2O2.PIM.18.6.9" "U2OS.WDR.18.6.9"
[9] "U2OS.MYC.18.6.9" "U2OS.OBX.18.6.9" "X1.U2OS...OBX"   "X2.U2OS...MYC"
[13] "X3.U2OS...WDR82" "X4.U2OS...PIM"   "X5.U2OS...EV"    "exp1.U2OS.EV"
[17] "exp1.U2OS.MYC"   "EXP1.U20S..PIM1" "EXP1.U2OS.WDR82" "EXP1.U20S.OBX"
[21] "EXP2.U2OS.EV"    "EXP2.U2OS.MYC"   "EXP2.U2OS.PIM1"  "EXP2.U2OS.WDR82"
[25] "EXP2.U2OS.OBX"

As you can see, some of the row names share a partial name. For example, for every row whose name contains MYC, I want to change the whole row name to "MYC". Overall, the row names contain 5 factors: MYC, EV, PIM, WDR and OBX.

As @teucer points out, you can't have duplicate row names. Instead, you create a new column in your data frame and use a simple regular expression to extract your factors. For example,

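## Your row names (only the first 12 are shown; in practice use x = rownames(test))
x = c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
      "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9","U2OS.WDR.18.6.9",
      "U2OS.MYC.18.6.9","U2OS.OBX.18.6.9", "X1.U2OS...OBX","X2.U2OS...MYC")

## Replace each name with the factor captured by the group and store it in a new column
## (x must have one entry per row of test)
test$rnames = gsub(".*(MYC|EV|PIM|WDR|OBX).*", "\\1", x)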
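For a version that can be run end to end, here is a minimal sketch applied to all 25 row names from the question. The data frame itself is a hypothetical stand-in: only the row names come from the question, and the `value` column and the `rn`, `pattern` and `res` names are assumptions made for illustration. It also tries to assign the extracted factors back as row names, to show why a new column is used instead.

```r
# Hypothetical stand-in for the asker's data frame: only the row names are
# taken from the question; the `value` column is made up.
rn <- c("U2OS.EV.2.7.9", "U2OS.PIM.2.7.9", "U2OS.WDR.2.7.9", "U2OS.MYC.2.7.9",
        "U2OS.OBX.2.7.9", "U2OS.EV.18.6.9", "U2O2.PIM.18.6.9", "U2OS.WDR.18.6.9",
        "U2OS.MYC.18.6.9", "U2OS.OBX.18.6.9", "X1.U2OS...OBX", "X2.U2OS...MYC",
        "X3.U2OS...WDR82", "X4.U2OS...PIM", "X5.U2OS...EV", "exp1.U2OS.EV",
        "exp1.U2OS.MYC", "EXP1.U20S..PIM1", "EXP1.U2OS.WDR82", "EXP1.U20S.OBX",
        "EXP2.U2OS.EV", "EXP2.U2OS.MYC", "EXP2.U2OS.PIM1", "EXP2.U2OS.WDR82",
        "EXP2.U2OS.OBX")
test <- data.frame(value = seq_along(rn), row.names = rn)

# Same regular expression as in the answer: the capture group keeps the
# factor, and the surrounding .* discard the rest of the name.
pattern <- ".*(MYC|EV|PIM|WDR|OBX).*"
test$rnames <- gsub(pattern, "\\1", rownames(test))

# Tabulate how many rows fall under each factor.
print(table(test$rnames))

# Trying to use the extracted factors as row names raises an error,
# because data frame row names must be unique.
res <- tryCatch({
  rownames(test) <- test$rnames
  "ok"
}, error = function(e) "error")
print(res)
```

Since the new column holds plain character values, it can be passed to `factor()` or used as a grouping variable, for example in `split(test, test$rnames)`.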