Updated: 2023-01-30 21:12:02
I have a data table in R where each row represents a visit by a user to a social media platform. For simplicity, an example of this data is as follows:
UserID  Channel  TW_VisitDuration  TW_Activity  FB_VisitDuration  FB_Activity
aaa     TW       30                High         NA                NA
bbb     FB       NA                NA           45                Low
Each visit has a channel (e.g. FB/TW), and the other columns are filled according to this channel (only the relevant columns are filled). I want a new table in which each set of similar columns is reduced to a single column, with the value taken from the column relevant to that row's channel. In this case, the new table would look like this:
UserID Channel VisitDuration Activity
aaa TW 30 High
bbb FB 45 Low
I wrote a for loop that does this row by row, but I am sure this is not "the R way to do this" (and the loop will probably perform badly as my data scales). This is the for loop I wrote:
for (i in 1:nrow(res.table)){
  cur.channel = res.table[,Channel][i]
  for (field in specific.fields){
    print(field)
    test.t[[field]][i] = res.table[[paste(cur.channel,field,sep='_')]][i]
  }
}
How can I do it without the need to go row by row?
We can use melt from data.table to convert this to 'long' format. The function can also take multiple patterns, so each group of matching columns is stacked into its own value column:
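library(data.table)
# setDT() converts df1 (defined below) to a data.table by reference;
# each pattern selects one group of columns to stack into a single value column
melt(setDT(df1), measure = patterns("Visit", "Activity"),
     value.name = c("VisitDuration", "Activity"), na.rm = TRUE)[, variable := NULL][]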
# UserID Channel VisitDuration Activity
#1: aaa TW 30 High
#2: bbb FB 45 Low
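To see why the variable column is dropped and why na.rm = TRUE is needed, it can help to look at the intermediate result. The following is only a sketch: it rebuilds the same example data under the hypothetical name dt so that it runs on its own, and it names the id columns explicitly with id.vars, which the call above leaves implicit.

```r
library(data.table)

# Same example data as in the answer, rebuilt here (as `dt`, a name chosen
# for this sketch) so the snippet is self-contained
dt <- data.table(
  UserID           = c("aaa", "bbb"),
  Channel          = c("TW", "FB"),
  TW_VisitDuration = c(30L, NA),
  TW_Activity      = c("High", NA),
  FB_VisitDuration = c(NA, 45L),
  FB_Activity      = c(NA, "Low")
)

# Melt without na.rm and without dropping `variable`:
# every input row appears once per channel-specific column set
long <- melt(dt,
             id.vars      = c("UserID", "Channel"),
             measure.vars = patterns("Visit", "Activity"),
             value.name   = c("VisitDuration", "Activity"))
print(long)

# Dropping the all-NA rows and the index column gives the compact table
compact <- melt(dt,
                id.vars      = c("UserID", "Channel"),
                measure.vars = patterns("Visit", "Activity"),
                value.name   = c("VisitDuration", "Activity"),
                na.rm        = TRUE)[, variable := NULL][]
print(compact)
```

Here variable only records which of the matched columns a value came from (the first or the second match of each pattern), which duplicates the information already in Channel. The rows removed by na.rm = TRUE are the ones belonging to the channel that was not used for that visit.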
df1 <- structure(list(UserID = c("aaa", "bbb"), Channel = c("TW", "FB"
), TW_VisitDuration = c(30L, NA), TW_Activity = c("High", NA),
FB_VisitDuration = c(NA, 45L), FB_Activity = c(NA, "Low")), .Names = c("UserID",
"Channel", "TW_VisitDuration", "TW_Activity", "FB_VisitDuration",
"FB_Activity"), class = "data.frame", row.names = c(NA, -2L))
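If, as the question states, only the columns of the visit's own channel are ever filled, an alternative that avoids reshaping is to take the first non-missing value across the channel-specific columns. This is a sketch of a different technique from the melt approach above, not a replacement for it: it assumes a reasonably recent data.table that provides fcoalesce(), and it rebuilds the example data under the hypothetical name dt.

```r
library(data.table)

# Example data rebuilt as `dt` (a name chosen for this sketch)
dt <- data.table(
  UserID           = c("aaa", "bbb"),
  Channel          = c("TW", "FB"),
  TW_VisitDuration = c(30L, NA),
  TW_Activity      = c("High", NA),
  FB_VisitDuration = c(NA, 45L),
  FB_Activity      = c(NA, "Low")
)

# fcoalesce() returns, for each row, the first non-NA value among its arguments.
# Assumption: at most one channel's columns are filled per row.
res <- dt[, .(UserID,
              Channel,
              VisitDuration = fcoalesce(TW_VisitDuration, FB_VisitDuration),
              Activity      = fcoalesce(TW_Activity, FB_Activity))]
print(res)
```

Unlike melt, this keeps the original row order and needs one fcoalesce() call per output column, so the column names have to be listed by hand. With many channels or fields, the pattern-based melt is likely the more convenient option.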