且构网

R caret Naive Bayes accuracy is null

Updated: 2022-10-19 19:00:48

I have a dataset that I want to train with both SVM and Naïve Bayes. SVM works, but Naïve Bayes does not. The source code follows:

library(tools)
library(caret)
library(doMC)
library(mlbench)
library(magrittr)

CORES <- 5 #Optional
registerDoMC(CORES) #Optional

load("chat/rdas/2gram-entidades-erro.Rda")

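set.seed(10)
split <- 0.60

maFinal$resposta <- as.factor(maFinal$resposta)
# trainIndex is not defined anywhere in the post; a stratified split such as
# the following is assumed:
trainIndex <- createDataPartition(maFinal$resposta, p = split, list = FALSE)
data_train <- as.data.frame(unclass(maFinal[trainIndex, ]))
data_test <- maFinal[-trainIndex, ]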

treegram25NotNull <- train(
  x = subset(data_train, select = -c(resposta)),
  y = data_train$resposta,
  method = "nb",
  trControl = trainControl(method = "cv", number = 5,
                           savePredictions = TRUE, sampling = "up")
)

treegram25NotNull

The final accuracy is null, and train() emits these warnings:

Warning messages:
1: In nominalTrainWorkflow(x = x, y = y, wts = weights, info = trainInfo, :
  There were missing values in resampled performance measures.
2: In train.default(subset(data_train, select = -c(resposta)), data_train$resposta, :
  missing values found in aggregated results

Any help would be greatly appreciated, thanks.

The fix is really simple:

set.seed(10)
split <- 0.60
maFinal[] <- lapply(maFinal, as.factor)

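Currently all your variables, except for resposta, are numeric. However, each has at most about 12 distinct values, which means they should really be factor variables. Many of them are also highly unbalanced. So when the sample is split and resampled, some of these variables end up with only a single unique value in a subset, yet they are still treated as continuous. A constant "continuous" predictor has zero variance, which is most likely why the model fails on those resamples and the performance measures come out missing. Converting every column to a factor avoids this.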
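To check this on your own data before training, something like the following base R sketch may help. It uses a small made-up data frame called toy in place of maFinal (the column names f1 and f2 and all values are hypothetical), counts the distinct values per predictor, looks at the per-class variance, and then applies the same conversion as above.

```r
# Toy stand-in for maFinal (hypothetical data, not the asker's dataset):
# two count-like numeric predictors plus the class label `resposta`.
toy <- data.frame(
  f1 = c(0, 0, 0, 0, 1, 2, 1, 3),
  f2 = c(0, 1, 0, 2, 0, 0, 1, 0),
  resposta = c("a", "a", "a", "a", "b", "b", "b", "b")
)

predictors <- setdiff(names(toy), "resposta")

# Number of distinct values per predictor; a small count suggests the
# column is really categorical.
n_distinct <- sapply(toy[predictors], function(col) length(unique(col)))

# Per-class variance of each numeric predictor (rows = classes,
# columns = predictors). A zero means the predictor is constant within
# that class, which is the problematic case described above.
class_var <- sapply(toy[predictors],
                    function(col) tapply(col, toy$resposta, var))

# Same conversion as in the answer; assigning into `toy[]` keeps the
# data.frame structure instead of replacing it with a plain list.
toy[] <- lapply(toy, as.factor)
all_factors <- all(sapply(toy, is.factor))

print(n_distinct)
print(class_var)
```

Here f1 is constant within class "a", so its variance there is zero. On the real data, the same check could be run on each training fold rather than on the full data set, since the problem only shows up after splitting.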