Updated: 2023-12-02 21:47:58
Assuming you are trying to do image classification, these should be the steps for fine-tuning a model:
The original classification layer "loss3/classifier" outputs predictions for 1000 classes (its num_output is set to 1000). You'll need to replace it with a new layer with the appropriate num_output. Replacing the classification layer:
Change num_output to the number of output classes you are trying to predict.
GoogLeNet has three classification layers, and all of them need this change: "loss1/classifier", "loss2/classifier" and "loss3/classifier".
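As a rough sketch of the replacement described above, a modified "loss3/classifier" might look like the fragment below. The new layer name loss3/classifier_ft and the class count 5 are hypothetical placeholders. The bottom and top blob names are assumed to match the reference BVLC GoogLeNet train_val.prototxt, so keep whatever your own file has. The layer is renamed because Caffe copies weights from the caffemodel by layer name: a new name leaves this layer to be initialized from its fillers instead of the old 1000-way weights, whose shape would no longer match. The top blob name is left unchanged so the downstream loss and accuracy layers need no edits.

```prototxt
layer {
  name: "loss3/classifier_ft"   # hypothetical new name; must differ from "loss3/classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"        # assumed from the reference GoogLeNet; keep your file's value
  top: "loss3/classifier"       # unchanged, so later layers still find this blob
  param { lr_mult: 1 decay_mult: 1 }   # weights
  param { lr_mult: 2 decay_mult: 0 }   # bias
  inner_product_param {
    num_output: 5               # hypothetical: set to your number of classes
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" value: 0 }
  }
}
```

The same two changes (new name, new num_output) would apply to "loss1/classifier" and "loss2/classifier".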
You need to make a new training dataset with the new labels you want to fine-tune on. See, for example, this post on how to make an lmdb dataset.
When fine-tuning a model, you can train all of the model's weights, or fix some weights (usually the filters of the lower layers, those closest to the input) and train only the weights of the top-most layers. The choice is up to you and usually depends on the amount of training data available: the more examples you have, the more weights you can afford to fine-tune.
Each layer that holds trainable parameters has param { lr_mult: XX }. This coefficient determines how susceptible these weights are to SGD updates. Setting param { lr_mult: 0 } means you fix the weights of this layer, and they will not change during training.
Edit your train_val.prototxt accordingly.
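To illustrate the lr_mult setting described above, here is a minimal sketch of a frozen convolution layer. The layer name and its convolution parameters are assumed to match the first convolution of the reference BVLC GoogLeNet train_val.prototxt; substitute the layers you actually want to fix. The fillers are omitted for brevity, since the weights come from the caffemodel.

```prototxt
layer {
  name: "conv1/7x7_s2"      # assumed layer name; name unchanged so pretrained weights are loaded
  type: "Convolution"
  bottom: "data"
  top: "conv1/7x7_s2"
  param { lr_mult: 0 }      # weights: frozen, never updated by SGD
  param { lr_mult: 0 }      # bias: frozen as well
  convolution_param {
    num_output: 64
    pad: 3
    kernel_size: 7
    stride: 2
  }
}
```

A layer with two param blocks uses the first for its weights and the second for its bias, so both need lr_mult: 0 for the layer to stay fully fixed.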
Run caffe train, but supply it with the caffemodel weights as the initial weights:
~$ $CAFFE_ROOT/build/tools/caffe train -solver /path/to/solver.prototxt -weights /path/to/orig_googlenet_weights.caffemodel