且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

逻辑回归预测的置信区间

更新时间:2022-03-03 09:24:41

通常的方法是在线性预测变量的尺度上计算置信区间,在此情况下,事情将更加正常(高斯),然后应用链接功能,将线性预测变量量表的置信区间映射到响应量表.

The usual way is to compute a confidence interval on the scale of the linear predictor, where things will be more normal (Gaussian) and then apply the inverse of the link function to map the confidence interval from the linear predictor scale to the response scale.

要做到这一点,您需要做两件事;

To do this you need two things;

  1. type = "link"调用predict(),然后
  2. se.fit = TRUE调用predict().
  1. call predict() with type = "link", and
  2. call predict() with se.fit = TRUE.

第一个生成线性预测变量范围内的预测,第二个返回预测的标准误差.用伪代码

The first produces predictions on the scale of the linear predictor, the second returns the standard errors of the predictions. In pseudo code

## foo <- mtcars[,c("mpg","vs")]; names(foo) <- c("x","y") ## Working example data
mod <- glm(y ~ x, data = foo, family = binomial)
preddata <- with(foo, data.frame(x = seq(min(x), max(x), length = 100)))
preds <- predict(mod, newdata = preddata, type = "link", se.fit = TRUE)

然后

preds是包含fitse.fit组件的列表.

preds is then a list with components fit and se.fit.

那么线性预测变量的置信区间为

The confidence interval on the linear predictor is then

critval <- 1.96 ## approx 95% CI
upr <- preds$fit + (critval * preds$se.fit)
lwr <- preds$fit - (critval * preds$se.fit)
fit <- preds$fit

critval是根据需要从 t z (正态)分布中选择的(我现在确切地忘记了哪种类型的GLM以及具有哪些属性是)与所需的覆盖范围. 1.96是给出95%覆盖率的高斯分布的值:

critval is chosen from a t or z (normal) distribution as required (I forget exactly now which to use for which type of GLM and what the properties are) with the coverage required. The 1.96 is the value of the Gaussian distribution giving 95% coverage:

> qnorm(0.975) ## 0.975 as this is upper tail, 2.5% also in lower tail
[1] 1.959964

现在,对于fituprlwr,我们需要将链接函数的逆函数应用于它们.

Now for fit, upr and lwr we need to apply the inverse of the link function to them.

fit2 <- mod$family$linkinv(fit)
upr2 <- mod$family$linkinv(upr)
lwr2 <- mod$family$linkinv(lwr)

现在您可以绘制所有三个数据.

Now you can plot all three and the data.

preddata$lwr <- lwr2 
preddata$upr <- upr2 
ggplot(data=foo, mapping=aes(x=x,y=y)) + geom_point() +         
   stat_smooth(method="glm", method.args=list(family=binomial)) + 
   geom_line(data=preddata, mapping=aes(x=x, y=upr), col="red") + 
   geom_line(data=preddata, mapping=aes(x=x, y=lwr), col="red")