Abstract:
Modern neural networks can produce highly confident predictions for inputs drawn from outside the training distribution, posing a risk to deployed machine learning systems. Detecting out-of-distribution (OOD) inputs is therefore a central issue in the safe deployment of models in the real world. Energy-based detection methods compute a sample's energy score directly from the feature vector extracted by the model, so reliance on insignificant features can degrade detection performance. To alleviate this problem, a loss function based on sparse regularization is proposed for fine-tuning a pre-trained classification model: it increases the sparsity of in-distribution feature representations while preserving the model's classification ability during learning. This yields lower energy scores for in-distribution samples and a larger score gap between in-distribution and out-of-distribution samples, thereby improving detection performance. Furthermore, the method introduces no external auxiliary dataset, avoiding effects caused by correlations between samples. Experimental results on CIFAR-10 and CIFAR-100 show that the method reduces the average FPR95 across the six OOD test datasets by 15.02% and 15.41%, respectively.
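To make the idea concrete, the sketch below combines the standard energy score (negative temperature-scaled logsumexp over logits, as in energy-based OOD detection) with a cross-entropy objective plus an L1 sparsity penalty on the feature vector. This is a minimal illustration under stated assumptions: the abstract does not give the exact loss, so the L1 form, the weight `lambda_sparse`, and the function names here are illustrative, not the paper's formulation.

```python
# Minimal sketch (assumptions: standard energy score on logits; L1 penalty
# on penultimate-layer features; names and the exact loss form are
# illustrative, not necessarily the paper's formulation).
import torch
import torch.nn.functional as F


def energy_score(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Energy score E(x) = -T * logsumexp(f(x)/T); per the abstract,
    in-distribution samples should receive lower energy after fine-tuning."""
    return -temperature * torch.logsumexp(logits / temperature, dim=1)


def sparse_finetune_loss(features: torch.Tensor,
                         logits: torch.Tensor,
                         labels: torch.Tensor,
                         lambda_sparse: float = 1e-4) -> torch.Tensor:
    """Cross-entropy preserves classification ability; the L1 term on
    in-distribution features encourages sparsity, which (per the abstract)
    widens the ID/OOD energy-score gap."""
    ce = F.cross_entropy(logits, labels)
    sparsity = features.abs().mean()  # L1 sparsity penalty on features
    return ce + lambda_sparse * sparsity
```

In use, `features` would be the penultimate-layer activations and `logits` the classifier outputs for an in-distribution training batch; only in-distribution data is needed, consistent with the abstract's claim that no auxiliary OOD dataset is introduced.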