
Feature selection on a Keras model

Updated: 2023-12-01 22:41:40


I assume your Keras model is some kind of neural network. With NNs in general it is hard to see which input features are relevant and which are not. The reason is that each input feature has multiple coefficients linked to it, each corresponding to one node of the first hidden layer. Adding more hidden layers makes it even harder to determine how big an impact an input feature has on the final prediction.


On the other hand, for linear models it is very straightforward: each feature x_i has a corresponding weight/coefficient w_i, and its magnitude directly determines how big an impact that feature has on the prediction (assuming, of course, that the features are scaled).
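A minimal sketch of this point, using a synthetic dataset (the data and coefficients below are illustrative assumptions, not from the original answer): after scaling, the absolute value of each fitted coefficient reflects how strongly the corresponding feature drives the prediction.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# y depends strongly on feature 0, weakly on feature 1, not at all on feature 2
y = 5.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)

# The largest |w_i| points at the most influential feature
print(np.abs(model.coef_))
```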


The RFE estimator (recursive feature elimination) assumes that your prediction model has an attribute coef_ (linear models) or feature_importances_ (tree models) whose length equals the number of input features and which represents their relevance (in absolute terms).
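As a hedged sketch of how this looks with scikit-learn's RFE (the dataset sizes and n_features_to_select value here are arbitrary choices for illustration):

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: 10 features, only 3 actually informative
X, y = make_regression(n_samples=200, n_features=10,
                       n_informative=3, random_state=0)

# RFE repeatedly fits the estimator and drops the feature with the
# smallest |coef_| until n_features_to_select remain
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3).fit(X, y)

print(rfe.support_)   # boolean mask of the kept features
print(rfe.ranking_)   # rank 1 = selected; higher = eliminated earlier
```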

My suggestion:

  1. Feature selection: (Option a) Run RFE on any linear / tree model to reduce the number of features to some desired number n_features_to_select. (Option b) Use regularized linear models such as lasso / elastic net that enforce sparsity. The drawback here is that you cannot directly set the actual number of selected features. (Option c) Use any other feature selection technique from here.
  2. Neural network: Use only the features selected in (1) for your neural network.
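The two steps above can be sketched end to end, here using option (b): a Lasso model zeroes out the coefficients of irrelevant features, and only the surviving columns are passed on to the network. The dataset and the alpha value are illustrative assumptions, and the Keras part is shown only as a commented outline since the network architecture depends on your problem.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Synthetic data: 20 features, only 5 informative
X, y = make_regression(n_samples=300, n_features=20,
                       n_informative=5, random_state=0)
X = StandardScaler().fit_transform(X)

# Step 1 (option b): Lasso drives irrelevant coefficients to zero;
# SelectFromModel keeps only the features with nonzero coef_
selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
X_selected = selector.transform(X)
print(X_selected.shape)

# Step 2 (sketch): train the Keras network on the reduced input, e.g.
# model = keras.Sequential([
#     keras.layers.Dense(32, activation="relu",
#                        input_shape=(X_selected.shape[1],)),
#     keras.layers.Dense(1),
# ])
```

Note that with Lasso you cannot fix the number of kept features directly; you would tune alpha (larger alpha keeps fewer features) until the count is acceptable.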