R data.table - extract values from alternative columns based on the value of another column

Updated: 2023-01-30 21:12:02


I have a data table in R where each row represents a visit by a user on a social media platform. For simplicity, here is an example of the data:

UserID   Channel   TW_VisitDuration  TW_Activity  FB_VisitDuration FB_Activity
aaa        TW           30               High         
bbb        FB                                         45             Low

Each visit has a channel (e.g. FB/TW), and only the columns belonging to that channel are filled. I want a new table where each group of similar columns is reduced to a single column, with the value taken from the column that matches the row's channel. In this case, the new table would look like this:

UserID   Channel   VisitDuration  Activity  
aaa        TW           30          High         
bbb        FB           45          Low

I wrote a for loop that does this evaluation row by row, but I am sure this is not "the R way to do this" (and the loop's performance will probably suffer as my data grows). This is the loop I wrote:

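# res.table: input data.table; specific.fields: field suffixes, e.g. c("VisitDuration", "Activity")
# test.t: output table created beforehand (not shown here)
for (i in 1:nrow(res.table)){
   cur.channel = res.table[,Channel][i]
   for (field in specific.fields){
     # copy the value from the channel-specific column, e.g. "TW_VisitDuration"
     test.t[[field]][i] = res.table[[paste(cur.channel,field,sep='_')]][i]
   }
}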

How can I do it without the need to go row by row?

We can use melt from data.table to convert this to 'long' format. The function also accepts multiple patterns, one per output value column:

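library(data.table)
# Each pattern collects one group of measure columns (VisitDuration, Activity);
# na.rm = TRUE drops the rows built from the other channel's NA columns
melt(setDT(df1), measure.vars = patterns("Visit", "Activity"),
       value.name = c("VisitDuration", "Activity"), na.rm = TRUE)[, variable := NULL][]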
#   UserID Channel VisitDuration Activity
#1:    aaa      TW            30     High
#2:    bbb      FB            45      Low
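patterns() matches column names by regular expression, and melt pairs the i-th column matched by each pattern into the same row group (here TW first, then FB, following column order). The sketch below is a quick base-R sanity check, written for this example's column names, that the two groups line up channel by channel; it only mimics the matching step and does not call data.table itself.

```r
# Column names from the example data
cols <- c("UserID", "Channel", "TW_VisitDuration", "TW_Activity",
          "FB_VisitDuration", "FB_Activity")

# The same regular expressions passed to patterns()
visit_cols    <- grep("Visit", cols, value = TRUE)
activity_cols <- grep("Activity", cols, value = TRUE)

# Channel prefix of each matched column; melt pairs them position by position
visit_prefix    <- sub("_.*$", "", visit_cols)
activity_prefix <- sub("_.*$", "", activity_cols)
print(data.frame(visit_cols, activity_cols))

# If this fails, the groups are misaligned and melt would mix channels in one row
stopifnot(identical(visit_prefix, activity_prefix))
```

If the columns were ordered differently in the real data (say, FB_Activity before TW_Activity), the prefixes would no longer match and the measure columns would need to be listed explicitly instead of via patterns().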

data

df1 <- structure(list(UserID = c("aaa", "bbb"), Channel = c("TW", "FB"
), TW_VisitDuration = c(30L, NA), TW_Activity = c("High", NA), 
FB_VisitDuration = c(NA, 45L), FB_Activity = c(NA, "Low")), .Names = c("UserID", 
 "Channel", "TW_VisitDuration", "TW_Activity", "FB_VisitDuration", 
"FB_Activity"), class = "data.frame", row.names = c(NA, -2L))
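The melt approach relies on the other channel's columns being NA. If those columns might hold values, a sketch that picks the source column from Channel explicitly (closer to what the question's loop does) can still avoid going row by row: loop over the few distinct channels and fields, and assign whole subsets of rows at once. This is plain base R; pick_by_channel is a hypothetical helper name, and it assumes every channel value has a "<Channel>_<field>" column for every field.

```r
# Hypothetical helper: for each row, take `field` from the column named
# "<Channel>_<field>". Loops over channels and fields, never over rows.
# Assumes every non-NA channel has a matching column for every field.
pick_by_channel <- function(df, id_cols, channel_col, fields) {
  out <- df[id_cols]
  channel <- df[[channel_col]]
  for (field in fields) {
    values <- rep(NA, nrow(df))            # logical NA; coerced on first assignment
    for (ch in unique(channel[!is.na(channel)])) {
      rows <- which(channel == ch)          # all rows belonging to this channel
      values[rows] <- df[[paste(ch, field, sep = "_")]][rows]
    }
    out[[field]] <- values                  # rows with NA Channel stay NA
  }
  out
}

df1 <- data.frame(
  UserID = c("aaa", "bbb"),
  Channel = c("TW", "FB"),
  TW_VisitDuration = c(30L, NA),
  TW_Activity = c("High", NA),
  FB_VisitDuration = c(NA, 45L),
  FB_Activity = c(NA, "Low"),
  stringsAsFactors = FALSE
)

res <- pick_by_channel(df1, c("UserID", "Channel"), "Channel",
                       c("VisitDuration", "Activity"))
print(res)
```

The cost grows with the number of channels times the number of fields, each step being a vectorized subset assignment, rather than with the number of rows.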
