Learning with Noisy Labels via Self-Reweighting from Class Centroids


Although deep neural networks have proven effective in many applications, they are data hungry, and training deep models often requires laboriously labeled data. When the labeled data contain erroneous labels, model performance often degrades. A common solution is to assign each sample a dynamic weight during optimization and to adjust that weight according to the sample's loss. However, such weights are usually unreliable because the losses are computed against corrupted labels. Thus, this scheme might impede the discriminative ability of neural networks trained on noisy data. To address this issue, we propose a novel reweighting method, dubbed self-reweighting from class centroids (SRCC), which assigns sample weights based on the similarities between the samples and our online-learned class centroids. Because we exploit statistical class centers in the image feature space to reweight data samples during learning, our method is robust to noise caused by corrupted labels. In addition, even after the noisy data are reweighted, the decision boundaries might still be distorted. Thus, we leverage mixed inputs, generated by linearly interpolating two random images and their labels, to further regularize the boundaries. We employ the learned class centroids to evaluate the confidence of the generated mixed data by measuring feature similarities. During network optimization, the class centroids are updated as more discriminative feature representations of the original images are learned. In doing so, SRCC generates more robust weighting coefficients for noisy and mixed data, which in turn facilitates feature representation learning. Extensive experiments on both synthetic and real-world image recognition tasks demonstrate that SRCC outperforms state-of-the-art methods for learning with noisy data.
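
The centroid-based weighting idea described above can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the centroid update rule, or how the confidence of mixed data is computed, so the cosine similarity, the moving-average update, the label-weighted confidence, and all function names below are assumptions made for illustration, not the authors' implementation. For brevity, the toy example also interpolates feature vectors directly, whereas the method interpolates images and then measures similarity in feature space.

```python
import math

def cosine(u, v, eps=1e-12):
    """Cosine similarity between two feature vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def update_centroids(centroids, feats, labels, momentum=0.9):
    """Move each class centroid toward the batch mean of that class.
    The exponential moving average is an assumed update rule."""
    updated = [list(c) for c in centroids]
    for cls in set(labels):
        members = [f for f, y in zip(feats, labels) if y == cls]
        mean = [sum(col) / len(members) for col in zip(*members)]
        updated[cls] = [momentum * old + (1.0 - momentum) * new
                        for old, new in zip(centroids[cls], mean)]
    return updated

def centroid_weights(feats, labels, centroids):
    """Weight each sample by its similarity to the centroid of its
    (possibly noisy) label, clipped to [0, 1]."""
    return [min(1.0, max(0.0, cosine(f, centroids[y])))
            for f, y in zip(feats, labels)]

def mixup(x_a, y_a, x_b, y_b, lam):
    """Linearly interpolate two inputs and their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y

# Toy 2-D features: two clean samples per class, plus a final sample that
# lies near class 0 but carries the (wrong) label 1.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.05]]
labels = [0, 0, 1, 1, 1]

# momentum=0.0 makes the centroids equal to the per-class batch means.
centroids = update_centroids([[0.0, 0.0], [0.0, 0.0]], feats, labels, momentum=0.0)
weights = centroid_weights(feats, labels, centroids)

# Confidence of a mixed sample: similarity to each centroid, weighted by the
# mixed label (an assumed scoring rule).
mixed_x, mixed_y = mixup(feats[0], [1.0, 0.0], feats[2], [0.0, 1.0], lam=0.7)
mixed_conf = sum(p * max(0.0, cosine(mixed_x, c))
                 for p, c in zip(mixed_y, centroids))
```

In this toy setting the mislabeled sample receives the smallest weight because it lies far from the centroid of its assigned class, even though that centroid is itself slightly pulled toward the mislabeled point.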

IEEE Transactions on Neural Networks and Learning Systems (TNNLS)