Citation: LI Shixin, WEN Yimin. An online kernel learning method for label noise based on kNCN[J]. Journal of Guilin University of Electronic Technology, xxxx, x(x): 1-9. DOI: 10.3969/1673-808X.2023118

An online kernel learning method for label noise based on kNCN

  • Abstract: Kernel methods have been developed to handle nonlinear classification problems in online classification. To keep the number of support vectors from growing without bound as the data stream arrives, budget maintenance algorithms have been proposed in recent years. However, the classification performance of existing fixed-budget kernel classification algorithms is severely degraded by label noise. To address this problem, an online kernel learning method for label noise based on kNCN is proposed. When the buffer reaches the budget size, the method uses the kNCN principle to find the k nearest centroid neighbors of each support vector in the buffer, and computes the local label inconsistency between them to construct a deletion candidate set and an anchor set. It then builds a trial-and-error model for each instance in the deletion candidate set and evaluates each model's classification accuracy on the anchor set, thereby determining which support vector should be removed from the buffer and maintaining a fixed budget. Experimental results on synthetic and real-world datasets show that applying this method to the fixed-budget Perceptron and Passive-Aggressive algorithms effectively improves classification performance under label noise, and its overall ranking across six datasets is better than that of the other compared algorithms.
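The kNCN neighbor search and the local label inconsistency measure mentioned in the abstract can be illustrated with a minimal sketch. This follows the standard greedy kNCN definition (at each step, select the point whose inclusion brings the centroid of the selected neighbors closest to the query); the function names and the disagreement-fraction form of the inconsistency score are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def k_nearest_centroid_neighbors(x, X, k):
    """Return indices of the k nearest centroid neighbors (kNCN) of x in X.

    Greedy selection: the first neighbor is the ordinary nearest neighbor;
    each subsequent neighbor is the point whose inclusion makes the centroid
    of all selected neighbors closest to x.
    """
    selected = []
    remaining = list(range(X.shape[0]))
    for _ in range(min(k, len(remaining))):
        best_i, best_d = None, np.inf
        for i in remaining:
            # Centroid of already-selected neighbors plus candidate i.
            centroid = X[selected + [i]].mean(axis=0)
            d = np.linalg.norm(centroid - x)
            if d < best_d:
                best_i, best_d = i, d
        selected.append(best_i)
        remaining.remove(best_i)
    return selected

def local_label_inconsistency(idx, labels, neighbor_idx):
    """Fraction of kNCN neighbors whose label disagrees with instance idx."""
    disagree = sum(labels[j] != labels[idx] for j in neighbor_idx)
    return disagree / len(neighbor_idx)
```

A support vector whose neighborhood is dominated by the opposite label (inconsistency close to 1) is a likely label-noise instance and hence a candidate for deletion when the budget is reached.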
