且构网

K-nearest neighbors: cross-validating the accuracy score and confusion matrix

Updated: 2023-12-03 23:45:10

I think your model does not get trained properly: with leave-one-out (LOO), every test fold contains a single sample, so the model only has to guess one value per fold, and it doesn't get it right. May I suggest switching to KFold or StratifiedKFold? LOO also has the disadvantage that it fits one model per sample, so for large samples it becomes extremely time-consuming. Here is what happened when I implemented StratifiedKFold with 3 splits on your X data. I randomly filled y with 0 and 1 instead of using A and B, and I did not transpose the data, so it has 12 rows:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
import pandas as pd

# Raw string so the backslash in the Windows path is not read as an escape sequence
csv = r'C:\df_low_X.csv'
df = pd.read_csv(csv, header=None)
print(df)

# All columns except the last are features; the last column holds the 0/1 label
X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values

clf = KNeighborsClassifier()
# 3 folds; each test fold keeps roughly the same class ratio as y
kf = StratifiedKFold(n_splits=3)

ac = []  # accuracy of each fold
cm = []  # confusion matrix of each fold

for train_index, test_index in kf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    print(X_train, X_test)  # inspect the rows in this split
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    ac.append(accuracy_score(y_test, y_pred))
    cm.append(confusion_matrix(y_test, y_pred))
print(ac)
print(cm)
```

```
# ac
[0.25, 0.75, 0.5]

# cm
[array([[1, 1],
       [2, 0]], dtype=int64),
 array([[1, 1],
       [0, 2]], dtype=int64),
 array([[0, 2],
       [0, 2]], dtype=int64)]
```