Updated: 2023-12-02 17:59:04
It is not really necessary (let alone efficient) to go to the other extreme and train instance by instance; what you are looking for is actually called incremental or online learning, and it is available in scikit-learn's SGDClassifier for linear SVM and logistic regression, which indeed contains a partial_fit method.
Here is a quick example with dummy data:
import numpy as np
from sklearn import linear_model

# initial mini-batch of training data
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array([1, 1, 2, 2])

clf = linear_model.SGDClassifier(max_iter=1000, tol=1e-3)
# the first call to partial_fit must declare all classes
clf.partial_fit(X, Y, classes=np.unique(Y))

# a second mini-batch arriving later; classes can now be omitted
X_new = np.array([[-1, -1], [2, 0], [0, 1], [1, 1]])
Y_new = np.array([1, 1, 2, 1])
clf.partial_fit(X_new, Y_new)
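After either partial_fit call the model is already usable; here is a minimal sanity check on the same toy data (the query points and the fixed random_state are assumptions added for illustration):

```python
import numpy as np
from sklearn import linear_model

# refit the same toy model, seeded so the sketch is reproducible
clf = linear_model.SGDClassifier(max_iter=1000, tol=1e-3, random_state=42)
X = np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]])
Y = np.array([1, 1, 2, 2])
clf.partial_fit(X, Y, classes=np.unique(Y))

# the model can be queried between incremental updates
print(clf.classes_)                       # classes declared in the first call
print(clf.predict([[-3, -2], [3, 2]]))    # labels for two unseen points
```

Being able to call predict between partial fits is the whole point of the online setting: you can interleave scoring and training as data streams in.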
The default values for the loss and penalty arguments ('hinge' and 'l2' respectively) are those of a LinearSVC, so the above code essentially fits incrementally a linear SVM classifier with L2 regularization; these settings can of course be changed - check the docs for more details.
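Spelling those defaults out explicitly makes the equivalence visible and shows where to change the settings (a sketch; alpha=1e-4 is scikit-learn's documented default regularization strength):

```python
import numpy as np
from sklearn import linear_model

# equivalent to SGDClassifier(max_iter=1000, tol=1e-3):
# hinge loss + L2 penalty is the LinearSVC-style objective
clf = linear_model.SGDClassifier(loss='hinge', penalty='l2', alpha=1e-4,
                                 max_iter=1000, tol=1e-3)
clf.partial_fit(np.array([[-1, -1], [-2, -1], [1, 1], [2, 1]]),
                np.array([1, 1, 2, 2]),
                classes=np.array([1, 2]))
# tweak loss / penalty / alpha here to get a different model,
# e.g. a different regularizer or regularization strength
```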
It is necessary to include the classes argument in the first call, which should contain all the existing classes in your problem (even though some of them might not be present in some of the partial fits); it can be omitted in subsequent calls of partial_fit - again, see the linked documentation for more details.
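To illustrate that requirement, here is a small sketch (the three-class data is an assumption for illustration) where the first mini-batch does not contain every class, yet classes must still list all of them:

```python
import numpy as np
from sklearn import linear_model

clf = linear_model.SGDClassifier(max_iter=1000, tol=1e-3)
all_classes = np.array([1, 2, 3])  # every class the problem can contain

# first mini-batch only holds labels 1 and 2,
# but `classes` must still declare all three up front
clf.partial_fit(np.array([[-1, -1], [2, 1]]), np.array([1, 2]),
                classes=all_classes)

# a later batch introduces class 3; `classes` is now omitted
clf.partial_fit(np.array([[5, 5]]), np.array([3]))
print(clf.classes_)  # all three classes were known from the first call
```

Omitting a class from that first call would make later batches containing it raise an error, since the internal weight matrix is sized from the declared classes.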