且构网

Sharing the things programmers build...

R: predict a glm fit for each column of a data frame using column index numbers

Updated: 2022-12-12 08:34:57

For repeated modelling I use an approach similar to the one shown below. I have implemented it with data.table, but it could be rewritten to use a base data.frame (the code would then probably be more verbose). In this approach I store all the models in a separate object. Below I provide two versions of the code: a more explanatory one, and a more advanced one that aims at a clean output.
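
As a rough illustration of the base data.frame rewrite mentioned above, here is a minimal sketch that uses only base R. It is not part of the original answer: the names `predictors`, `models_base` and `preds_base` are made up for this example, and the data are the same simple values used in the data.table code further down.

```r
# Same simple data as in the data.table example
df <- data.frame(x1 = seq(1, 100, 10),
                 x2 = (1:10)^2,
                 y  = seq(1, 20, 2))

# Names of the predictor columns (hypothetical helper vector)
predictors <- c("x1", "x2")

# Fit one glm per predictor column and keep all fits in a named list
models_base <- lapply(setNames(predictors, predictors), function(v) {
  glm(y ~ x, data = data.frame(x = df[[v]], y = df$y))
})

# Predict on a grid of 200 points spanning the range of each predictor
preds_base <- lapply(setNames(predictors, predictors), function(v) {
  grid <- data.frame(x = seq(min(df[[v]], na.rm = TRUE),
                             max(df[[v]], na.rm = TRUE),
                             length.out = 200))
  grid$predicted <- predict(models_base[[v]], newdata = grid, type = "response")
  grid
})
```

The fitted models stay available in `models_base`, so they can be inspected later, and each element of `preds_base` can be passed to `lines()` in the same way as the data.table results.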

Of course, you could also write a loop/function that fits only one model per iteration without storing it. From my perspective, it is a good idea to save the models, since you will probably have to investigate them for robustness etc., and not only predict new values.
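
For comparison, the loop variant without stored models could look roughly like the sketch below. It addresses the columns by index number, as in the question title. The helper `predict_column()` and the names `predictor_idx` and `preds_loop` are hypothetical and only serve this illustration.

```r
df <- data.frame(x1 = seq(1, 100, 10),
                 x2 = (1:10)^2,
                 y  = seq(1, 20, 2))

# Hypothetical helper: fit y on column j, predict on a grid, then drop the fit
predict_column <- function(data, j, response = "y", n = 200) {
  d <- data.frame(x = data[[j]], y = data[[response]])
  fit <- glm(y ~ x, data = d)
  grid <- data.frame(x = seq(min(d$x, na.rm = TRUE),
                             max(d$x, na.rm = TRUE),
                             length.out = n))
  grid$predicted <- predict(fit, newdata = grid, type = "response")
  grid  # only the predictions are returned, the model itself is discarded
}

# Column index numbers of the predictors (every column except y)
predictor_idx <- which(names(df) != "y")

preds_loop <- vector("list", length(predictor_idx))
names(preds_loop) <- names(df)[predictor_idx]
for (k in seq_along(predictor_idx)) {
  preds_loop[[k]] <- predict_column(df, predictor_idx[k])
}
```

This keeps memory use low, but any later check of a model means refitting it, which is the trade-off described above.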

HINT: Please also have a look at the answer by @AndS., which provides a tidyverse approach. Together with this answer, I think it makes a nice side-by-side comparison for learning and understanding the data.table and tidyverse approaches.

# simpler data are used here so that the plots make it easy to check the output
df <- data.frame(x1 = seq(1, 100, 10),
                 x2 = (1:10)^2,
                 y =  seq(1, 20, 2))
library(data.table)
setDT(df)
# prepare the data by melting it to long format (one row per variable/x/y)
DT = melt(df, measure.vars = paste0("x", 1:2), value.name = "x")
# the model is also kept simple (in this case lm would do as well)
# fit one model per variable (the former columns) and store it in a list column
models = setnames(DT[, data.table(list(glm(y ~ x))), by = "variable"], "V1", "model")
# create a new set of data to be predicted: 200 points across the range of each x
# NOTE: this could, of course, also be added to the models data.table
# as a new column via `:=list(...)`
new_pts = setnames(DT[, seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = 200), by = variable], "V1", "x")
# add the predicted values, looking up the model that belongs to the current group
new_pts[, predicted:= predict(models[variable == unlist(.BY), model][[1]], newdata = data.frame(x = x),  type = "response")
        , by = variable]
# plot and check that it makes sense
# (xlim covers both predictors, otherwise the point at x2 = 100 is cut off)
plot(df$x1, df$y, xlim = range(df$x1, df$x2))
lines(new_pts[variable == "x1", .(x, predicted)])
points(df$x2, df$y)
lines(new_pts[variable == "x2", .(x, predicted)])

# the following version of the code above is also possible
# it creates only one new object in the environment,
# but may look more complicated at first sight
# I am not sure whether this is the best way to do it
# data.table experts might know some shortcuts
setDT(df)
DT = melt(df, measure.vars = paste0("x", 1:2), value.name = "x")
DT = data.table(variable = unique(DT$variable), dat = split(DT, DT$variable))
DT[, models:= list(list(glm(y ~ x, data = dat[[1]]))), by = variable]
DT[, new_pts:= list(list(data.frame(x = dat[[1]][
                                                 ,seq(min(x, na.rm = TRUE)
                                                 , max(x, na.rm = TRUE), len = 200)]
                                    )))
       , by = variable]
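# the fits are stored in the `models` list column of DT, so predict from DT itself
DT[, predicted:= list(list(data.frame(pred = predict(models[[1]]
                                          , newdata = new_pts[[1]]
                                          ,  type = "response")))),
       by = variable]
# xlim covers both predictors, otherwise the point at x2 = 100 is cut off
plot(df$x1, df$y, xlim = range(df$x1, df$x2))
lines(DT[variable == "x1", .(unlist(new_pts), unlist(predicted))])
points(df$x2, df$y)
lines(DT[variable == "x2", .(unlist(new_pts), unlist(predicted))])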
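
Since the main reason given above for storing the models is that they can be examined later, here is a small sketch of what such a check might look like. It is only an assumption about what "investigating the models" could involve: it collects the coefficients and the residual deviance of each stored fit. The names `fits` and `diagnostics` are made up for this example.

```r
library(data.table)

df <- data.table(x1 = seq(1, 100, 10),
                 x2 = (1:10)^2,
                 y  = seq(1, 20, 2))
DT <- melt(df, measure.vars = c("x1", "x2"), value.name = "x")

# One stored glm per variable, kept in a list column
fits <- DT[, .(model = list(glm(y ~ x))), by = variable]

# Pull a few summary numbers out of every stored model
diagnostics <- fits[, .(intercept = coef(model[[1]])[[1]],
                        slope     = coef(model[[1]])[[2]],
                        deviance  = deviance(model[[1]])),
                    by = variable]
print(diagnostics)
```

Because the fits are kept, further checks such as `summary(fits$model[[1]])` or residual plots can be run without refitting anything.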