且构网

How do I balance a dataset when using fit_generator() in Keras?

Updated: 2023-02-10 08:22:13

If you don't want to change your data creation process, you can pass class_weight to fit_generator(). Set class_weight with a dictionary and tune its values while observing the results. For instance, suppose you have 50 examples of class 0 and 100 examples of class 1. Without class_weight, the loss function weights every example equally, so class 1 dominates the loss and the model becomes biased toward it at the expense of class 0. But when you set:

class_weight = {0: 2, 1: 1}

the loss function gives each class 0 example twice the weight of a class 1 example. Misclassifying an example from the underrepresented class is therefore penalized twice as heavily as before, so both classes contribute equally to the loss in total (50 × 2 = 100 × 1), and the model can cope with the imbalanced data.
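
To make the arithmetic concrete, here is a plain-Python sketch (no Keras required) of what a class weight does: each example's loss is multiplied by the weight of its true class. weighted_bce and total_weight_per_class are hypothetical helper names, and the loss is a simplified mean binary cross-entropy, not Keras' exact internal implementation.

```python
import math

def weighted_bce(y_true, y_pred, class_weight=None):
    """Mean binary cross-entropy in which every example's loss is multiplied
    by the weight of its true class (weight 1.0 for all when class_weight is None)."""
    eps = 1e-7  # clip predictions so log() never receives 0
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        loss = -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
        weight = 1.0 if class_weight is None else class_weight[t]
        total += weight * loss
    return total / len(y_true)

def total_weight_per_class(labels, class_weight):
    """Sum of the weights each class contributes to the loss."""
    totals = {}
    for y in labels:
        totals[y] = totals.get(y, 0.0) + class_weight[y]
    return totals

labels = [0] * 50 + [1] * 100
print(total_weight_per_class(labels, {0: 1, 1: 1}))  # {0: 50.0, 1: 100.0}
print(total_weight_per_class(labels, {0: 2, 1: 1}))  # {0: 100.0, 1: 100.0}

# A misclassified class 0 example (predicted probability of class 1 = 0.8)
# costs twice as much once class_weight = {0: 2, 1: 1} is applied.
print(weighted_bce([0], [0.8]))
print(weighted_bce([0], [0.8], {0: 2, 1: 1}))
```

In Keras the same dictionary is simply passed as model.fit_generator(..., class_weight={0: 2, 1: 1}). Since TensorFlow 2.1, fit_generator() is deprecated and model.fit() accepts generators along with the same class_weight argument.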

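Keras expects class_weight to be a dictionary. If you want the weights set automatically, the 'balanced' heuristic (for example, scikit-learn's compute_class_weight with class_weight='balanced') derives them from the class frequencies. But my suggestion is to create a dictionary like class_weight = {0: a1, 1: a2} and try different values for a1 and a2, so you can see the difference yourself.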
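
For reference, the 'balanced' heuristic as documented by scikit-learn sets each class weight to n_samples / (n_classes * count_c). The stdlib-only sketch below reproduces that formula; balanced_class_weight is a hypothetical helper name, and its output is a plain dict of the kind Keras accepts as class_weight.

```python
from collections import Counter

def balanced_class_weight(labels):
    """Class weights following the 'balanced' formula:
    weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * n) for c, n in sorted(counts.items())}

labels = [0] * 50 + [1] * 100
print(balanced_class_weight(labels))  # {0: 1.5, 1: 0.75}
```

For the 50/100 example this yields the same 2:1 ratio as {0: 2, 1: 1}; only the overall scale of the loss differs. It can serve as a starting point before hand-tuning a1 and a2.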

Alternatively, instead of using class_weight, you can resample the data itself, for example by undersampling the majority class. Bootstrapping methods are worth looking into for that purpose.
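
A minimal random-undersampling sketch for use with fit_generator(), assuming the features x and labels y fit in memory as indexable sequences. undersample_indices and balanced_batches are hypothetical helper names, not Keras API, and the batches are plain Python lists that you would convert to NumPy arrays before handing them to Keras.

```python
import random

def undersample_indices(labels, seed=None):
    """Random undersampling: keep every example of the smallest class and an
    equally sized random subset (drawn without replacement) of every larger class."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    rng.shuffle(keep)
    return keep

def balanced_batches(x, y, batch_size, seed=None):
    """Endless generator of (x_batch, y_batch) tuples, the form fit_generator() consumes.

    A new undersampled subset is drawn each epoch, so different majority-class
    examples are seen over time. Batches are plain lists here; convert them to
    NumPy arrays before yielding when feeding Keras.
    """
    epoch = 0
    while True:
        epoch_seed = None if seed is None else seed + epoch
        keep = undersample_indices(y, seed=epoch_seed)
        for start in range(0, len(keep), batch_size):
            batch = keep[start:start + batch_size]
            yield [x[i] for i in batch], [y[i] for i in batch]
        epoch += 1

x = list(range(150))      # stand-in features for illustration
y = [0] * 50 + [1] * 100  # 50 examples of class 0, 100 of class 1
gen = balanced_batches(x, y, batch_size=32, seed=0)
x_batch, y_batch = next(gen)
```

Each epoch here covers 2 × 50 = 100 examples, so steps_per_epoch would be ceil(100 / 32) = 4. A bootstrap-style variant would draw with replacement (rng.choices(idx, k=n) instead of rng.sample), which also makes it possible to oversample the minority class.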