且构网

R data.table - extract values from alternative columns based on the value of another column

Updated: 2023-01-30 21:12:02

I have a data table in R where each row represents a visit by a user to a social media platform. For simplicity, an example of this data is as follows:

UserID   Channel   TW_VisitDuration   TW_Activity   FB_VisitDuration   FB_Activity
aaa      TW        30                 High
bbb      FB                                         45                 Low

Each visit has a channel (e.g. FB/TW), and only the columns relevant to that channel are filled. I want a new table in which each group of similar columns is reduced to a single column whose value is taken from the column matching the row's channel. In this case, the new table would look like this:

UserID   Channel   VisitDuration  Activity  
aaa        TW           30          High         
bbb        FB           45          Low

I wrote a for loop which does this evaluation row by row, but I am sure this is not "the R way to do this" (and the performance of the loop would probably be bad as my data will scale). This is the for loop I wrote:

for (i in 1:nrow(res.table)){
   cur.channel = res.table[,Channel][i]
   for (field in specific.fields){
     print(field)
     test.t[[field]][i] = res.table[[paste(cur.channel,field,sep='_')]][i]
   }
}

How can I do it without the need to go row by row?

We can use melt from data.table to convert this to 'long' format. Its measure argument accepts several patterns at once, so each group of matching columns is melted into its own value column.

library(data.table)
melt(setDT(df1), measure = patterns("Visit", "Activity"), 
       value.name = c("VisitDuration", "Activity"), na.rm = TRUE)[, variable := NULL][]
#   UserID Channel VisitDuration Activity
#1:    aaa      TW            30     High
#2:    bbb      FB            45      Low
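
To see what the one-liner above is doing, the same reshape can be split into steps. This is only a sketch: it rebuilds the example data inline, anchors the patterns with $ (an assumption, to avoid accidental matches on other column names), and filters the empty rows explicitly instead of relying on na.rm.

```r
library(data.table)

# Example data rebuilt inline so the block runs on its own
df1 <- data.frame(
  UserID = c("aaa", "bbb"),
  Channel = c("TW", "FB"),
  TW_VisitDuration = c(30L, NA),
  TW_Activity = c("High", NA),
  FB_VisitDuration = c(NA, 45L),
  FB_Activity = c(NA, "Low"),
  stringsAsFactors = FALSE
)

# Step 1: melt both column groups at once. Each pattern becomes one value
# column; 'variable' records the position of the source column in its group.
long <- melt(as.data.table(df1),
             measure.vars = patterns("VisitDuration$", "Activity$"),
             value.name = c("VisitDuration", "Activity"))

# Step 2: every user now has one row per channel prefix (2 users x 2 = 4 rows);
# keep only the rows that actually carry a value.
res <- long[!is.na(VisitDuration)]

# Step 3: drop the helper column and print the result.
res[, variable := NULL][]
```

Note that this version, like the one-liner, never reads Channel: it relies on the question's statement that only the columns for the row's own channel are filled.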

data

df1 <- structure(list(UserID = c("aaa", "bbb"), Channel = c("TW", "FB"
), TW_VisitDuration = c(30L, NA), TW_Activity = c("High", NA), 
FB_VisitDuration = c(NA, 45L), FB_Activity = c(NA, "Low")), .Names = c("UserID", 
 "Channel", "TW_VisitDuration", "TW_Activity", "FB_VisitDuration", 
"FB_Activity"), class = "data.frame", row.names = c(NA, -2L))
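
For comparison with the loop in the question, here is a base R sketch that keeps the "look up the column named by Channel" logic but indexes all rows at once through a two-column matrix index. It is not the melt approach from the answer; pick_by_channel is a hypothetical helper introduced only for illustration, and it assumes every Channel value has a matching <Channel>_<field> column.

```r
# Example data rebuilt inline so the block runs on its own
df1 <- data.frame(
  UserID = c("aaa", "bbb"),
  Channel = c("TW", "FB"),
  TW_VisitDuration = c(30L, NA),
  TW_Activity = c("High", NA),
  FB_VisitDuration = c(NA, 45L),
  FB_Activity = c(NA, "Low"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, take the value from the column
# "<Channel>_<field>" without looping over rows.
pick_by_channel <- function(df, field) {
  wanted <- paste(df$Channel, field, sep = "_")  # target column per row
  cols <- unique(wanted)
  # Only same-typed columns go into the matrix, so the type is preserved
  m <- as.matrix(df[, cols, drop = FALSE])
  # A two-column index matrix selects one (row, column) cell per row
  m[cbind(seq_len(nrow(df)), match(wanted, cols))]
}

specific.fields <- c("VisitDuration", "Activity")
res <- df1[, c("UserID", "Channel")]
for (field in specific.fields) {
  res[[field]] <- pick_by_channel(df1, field)
}
res
```

The remaining loop runs once per field rather than once per row, so its cost does not grow with the number of visits.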