且构网

Why scale features in SVM?

Updated: 2023-02-18 12:08:45

The real reason to scale features for an SVM is that the classifier is not invariant to affine transformations of its input. In other words, if you multiply one feature by 1000, the solution the SVM returns will be completely different. This has almost nothing to do with the underlying optimization technique: solvers are affected by badly scaled features, but they should still converge to the global optimum.
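One way to see this, sketched for the hard-margin linear case only. The symbols w, b, x_i, y_i and the scale factor s are introduced here for illustration and do not appear in the original answer:

```latex
% Hard-margin linear SVM on the original features x_i, labels y_i = \pm 1
\min_{w,\,b}\ \tfrac{1}{2}\sum_{k} w_k^2
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 \quad \forall i

% Multiply feature j by s > 0: x'_{ij} = s\,x_{ij}.
% Writing the new weights as w'_j = w_j / s (and w'_k = w_k for k \ne j)
% keeps w'^\top x'_i = w^\top x_i, so the constraints are unchanged,
% but in terms of w the objective becomes
\min_{w,\,b}\ \tfrac{1}{2}\Bigl(\sum_{k \ne j} w_k^2 + \frac{w_j^2}{s^2}\Bigr)
\quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 \quad \forall i
```

With s = 1000, the weight on feature j is penalized a million times less than the others, so the optimizer separates the classes along that feature whenever it can.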

Consider an example: you have men and women, each encoded by sex and height (two features). Assume a very simple data set:

0 -> man, 1 -> woman

╔═════╦════════╗
║ sex ║ height ║
╠═════╬════════╣
║  1  ║  150   ║
╠═════╬════════╣
║  1  ║  160   ║
╠═════╬════════╣
║  1  ║  170   ║
╠═════╬════════╣
║  0  ║  180   ║
╠═════╬════════╣
║  0  ║  190   ║
╠═════╬════════╣
║  0  ║  200   ║
╚═════╩════════╝

Now let us do something silly: train an SVM to predict the person's sex from these two features, so we are trying to learn f(x, y) = x (ignoring the second argument).

It is easy to see that, for this data, the largest-margin classifier will "cut" the plane nearly horizontally (with sex on the horizontal axis and height on the vertical axis), somewhere around height 175. So when we get a new sample "1 178" (a woman 178 cm tall), it is classified as a man.

However, if we scale everything down to [0, 1], we get something like:

╔═════╦════════╗
║ sex ║ height ║
╠═════╬════════╣
║  1  ║  0.0   ║
╠═════╬════════╣
║  1  ║  0.2   ║
╠═════╬════════╣
║  1  ║  0.4   ║
╠═════╬════════╣
║  0  ║  0.6   ║
╠═════╬════════╣
║  0  ║  0.8   ║
╠═════╬════════╣
║  0  ║  1.0   ║
╚═════╩════════╝

Now the largest-margin classifier "cuts" the plane nearly vertically (as expected), so the new sample "1 178", which is scaled to about "1 0.56", is classified as a woman (correct!).
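The comparison can be reproduced with a minimal sketch in plain Python. It does not call an SVM library (in practice one would use something like scikit-learn). It relies on a geometric shortcut instead: for this toy data the hard-margin separator is the perpendicular bisector of the closest pair of opposite-class points, and the code asserts that this holds. The helper names (`max_margin_line`, `decision`, `predict_sex`, `min_max_scaler`) are hypothetical, and the new sample is written as (1, 178), following the 1 -> woman encoding.

```python
from itertools import product

# Toy data from the tables above: (sex, height_cm); sex 1 = woman, 0 = man.
WOMEN = [(1, 150), (1, 160), (1, 170)]
MEN = [(0, 180), (0, 190), (0, 200)]
NEW_SAMPLE = (1, 178)  # a woman, 178 cm tall


def decision(w, b, x):
    """Signed score: positive means 'woman' (sex 1), negative means 'man'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def max_margin_line(pos, neg):
    """Return (w, b) for the bisector of the closest opposite-class pair.

    Shortcut, not a general SVM solver: it matches the hard-margin SVM only
    when all points end up on or outside the margin, which is asserted here.
    """
    p, n = min(product(pos, neg),
               key=lambda pair: sum((a - c) ** 2 for a, c in zip(*pair)))
    diff = [a - c for a, c in zip(p, n)]
    sq_dist = sum(d * d for d in diff)
    w = [2 * d / sq_dist for d in diff]  # scaled so f(p) = +1 and f(n) = -1
    midpoint = [(a + c) / 2 for a, c in zip(p, n)]
    b = -sum(wi * mi for wi, mi in zip(w, midpoint))
    assert all(decision(w, b, x) >= 1 - 1e-9 for x in pos)
    assert all(decision(w, b, x) <= -1 + 1e-9 for x in neg)
    return w, b


def predict_sex(w, b, x):
    """Map the signed score back to the 0/1 sex encoding."""
    return 1 if decision(w, b, x) > 0 else 0


def min_max_scaler(rows):
    """Build a function mapping each column of `rows` onto [0, 1].

    Assumes no column is constant (otherwise hi - lo would be zero).
    """
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return lambda x: tuple((v - lo) / (hi - lo)
                           for v, lo, hi in zip(x, lows, highs))


# Unscaled features: height dominates the distance computation.
w_raw, b_raw = max_margin_line(WOMEN, MEN)
raw_prediction = predict_sex(w_raw, b_raw, NEW_SAMPLE)

# Scaled features: fit the scaler on the training rows, reuse it on the sample.
scale = min_max_scaler(WOMEN + MEN)
w_scaled, b_scaled = max_margin_line([scale(x) for x in WOMEN],
                                     [scale(x) for x in MEN])
scaled_prediction = predict_sex(w_scaled, b_scaled, scale(NEW_SAMPLE))

print("unscaled prediction:", raw_prediction, "weights:", w_raw)
print("scaled prediction:  ", scaled_prediction, "weights:", w_scaled)
```

Under these assumptions the unscaled model labels the sample 0 (man) and the scaled model labels it 1 (woman). The printed weights show which feature each separator leans on: height before scaling, sex after.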

So in general, scaling ensures that a feature does not end up as the main predictor just because its values are large.