且构网

分享程序员开发的那些事...
且构网 - 分享程序员编程开发的那些事

如何在Sklearn管道中执行Onehotencoding

更新时间:2022-02-12 08:35:43

OneHotEncoder不支持字符串功能,并且使用[(d, OneHotEncoder()) for d in dummies]会将其应用于所有假人列.使用LabelBinarizer代替:

OneHotEncoder doesn't support string features, and with [(d, OneHotEncoder()) for d in dummies] you are applying it to all dummies columns. Use LabelBinarizer instead:

mapper = DataFrameMapper(
    [(d, LabelBinarizer()) for d in dummies]
)

另一种选择是将LabelEncoder与第二个OneHotEncoder步骤一起使用.

An alternative would be to use the LabelEncoder with a second OneHotEncoder step.

mapper = DataFrameMapper(
    [(d, LabelEncoder()) for d in dummies]
)

lm = PMMLPipeline([("mapper", mapper),
                   ("onehot", OneHotEncoder()),
                   ("regressor", LinearRegression())])