How to balance a dataset in Keras using fit_generator()?

Updated: 2023-02-10 08:22:13

If you don't want to change your data creation process, you can pass class_weight to fit_generator(). Set class_weight with a dictionary and tune the values while watching the results. For instance, suppose class_weight is not used and you have 50 examples for class0 and 100 examples for class1. The loss function then weights every example equally, so class1, with twice as many examples, dominates the loss and the model tends to neglect class0. But when you set:

class_weight = {0: 2, 1: 1}

This means the loss function now gives class 0 twice the weight. Misclassifying an example from the underrepresented class therefore costs twice as much as before, which helps the model cope with the imbalanced data.
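To see why the 2:1 weighting helps, here is a small sketch in plain Python. It is not how Keras computes its loss internally; it assumes, purely for illustration, that every example incurs the same per-sample loss of 1.0, and the helper name class_contributions is made up for this example.

```python
def class_contributions(counts, class_weight, per_sample_loss=1.0):
    """Return each class's total share of the summed, weighted loss.

    Assumption for illustration: every example has the same per-sample loss.
    """
    return {
        label: n * class_weight.get(label, 1.0) * per_sample_loss
        for label, n in counts.items()
    }


counts = {0: 50, 1: 100}  # example counts from the answer above

unweighted = class_contributions(counts, {0: 1, 1: 1})
weighted = class_contributions(counts, {0: 2, 1: 1})

print(unweighted)  # class 1 contributes twice as much as class 0
print(weighted)    # both classes now contribute equally
```

With equal weights, class 1 accounts for two thirds of the total loss; with {0: 2, 1: 1} the two classes contribute the same amount.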

The 'balanced' setting, which derives the weights automatically from the class frequencies, is a scikit-learn convention (for example sklearn.utils.class_weight.compute_class_weight). As far as I know, Keras expects class_weight to be a dictionary, so compute the weights first and pass the resulting dictionary. My suggestion, though, is to create a dictionary like class_weight = {0: a1, 1: a2} and try different values for a1 and a2, so you can see the difference for yourself.
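As a starting point for a1 and a2, the weights can be derived from the label counts. The sketch below uses the formula n_samples / (n_classes * count), which is the 'balanced' heuristic used by scikit-learn, written in plain Python so it has no dependencies. The function name balanced_class_weight and the label list are assumptions made for this example.

```python
from collections import Counter


def balanced_class_weight(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {
        label: n_samples / (n_classes * count)
        for label, count in sorted(counts.items())
    }


# Hypothetical labels matching the 50 / 100 split described above.
labels = [0] * 50 + [1] * 100
class_weight = balanced_class_weight(labels)
print(class_weight)
```

The resulting dictionary keeps the same 2:1 ratio as {0: 2, 1: 1} and could be passed as the class_weight argument of fit_generator(), then adjusted by hand from there.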

Alternatively, instead of class_weight you can use undersampling on the imbalanced data. Look into bootstrapping (resampling) methods for that.
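A minimal sketch of random undersampling, assuming the labels fit in memory as a list: it keeps a random subset of each class, sized to the smallest class, and returns the indices to feed to the generator. It samples without replacement, so it is plain undersampling rather than bootstrapping in the strict sense; the name undersample_indices and the fixed seed are choices made for this example.

```python
import random
from collections import defaultdict


def undersample_indices(labels, seed=0):
    """Return sorted indices with every class cut down to the smallest class size."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    target = min(len(idxs) for idxs in by_class.values())
    kept = []
    for idxs in by_class.values():
        # Sample without replacement from each class.
        kept.extend(rng.sample(idxs, target))
    return sorted(kept)


# Hypothetical labels matching the 50 / 100 split described above.
labels = [0] * 50 + [1] * 100
kept = undersample_indices(labels)
kept_labels = [labels[i] for i in kept]
print(len(kept), kept_labels.count(0), kept_labels.count(1))
```

The trade-off is that half of the class 1 examples are discarded, which is why class_weight is often tried first when data is scarce.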
