Using SMOTE and GridSearchCV in Scikit-learn

Updated: 2022-11-28 22:48:51

Yes, it can be done, but you need to use the imblearn Pipeline.

You see, imblearn has its own Pipeline that handles the samplers correctly. I described this in more detail in an answer to a similar question.

When predict() is called on an imblearn.Pipeline object, it skips the sampling step and passes the data unchanged to the next transformer. You can confirm this by looking at the source code:

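# Excerpt from imblearn's Pipeline.predict() in older releases (fit_sample was
# later renamed fit_resample). Each intermediate step `transform` is checked in turn:
if hasattr(transform, "fit_sample"):
    # Samplers are skipped at prediction time
    pass
else:
    # Ordinary transformers are applied as usual
    Xt = transform.transform(Xt)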

So for this to work correctly, you need the following:

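from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

model = Pipeline([
        ('sampling', SMOTE()),
        ('classification', LogisticRegression())
    ])

# params keys must name the pipeline step, e.g. {'classification__C': [0.1, 1, 10]}
grid = GridSearchCV(model, params, ...)
grid.fit(X, y)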

Fill in the details as necessary, and the pipeline will take care of the rest.