Citation: HE Qian, HUANG Huan, LIU Yang, JIANG Bingcheng, SHEN Pu. Fault prediction method for electromechanical equipment big data with few labels[J]. Journal of Guilin University of Electronic Technology, 2020, 40(4): 292-299.

Fault prediction method for electromechanical equipment big data with few labels

Abstract: Fault prediction is a core technology for the operation and maintenance of electromechanical equipment. Traditional fault prediction based on machine-learning classification algorithms requires a sufficient number of labeled samples, which makes it poorly suited to intelligent electromechanical devices that are deployed widely and quickly. To address this, a semi-supervised learning algorithm is proposed that improves the gradient boosting decision tree with an isolation forest. The isolation forest algorithm evaluates the unlabeled data and infers their labels based on what it has learned from a small number of labeled samples. The gradient boosting decision tree algorithm then trains a fault prediction model on the newly labeled data set, which reduces the impact of missing labels on prediction accuracy. To process massive data, the algorithm is parallelized on the Spark big data platform. Experimental results show that the proposed method improves classification accuracy on both a public operational data set and a real electromechanical equipment data set, and that it adapts well to few labels and has good parallel performance.
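The two-stage idea in the abstract can be illustrated with a small single-machine sketch. It is not the authors' implementation: the abstract does not say how labels are inferred from isolation forest scores, so the midpoint-threshold rule below is an assumption, as are the function names `make_demo_data` and `pseudo_label_and_train` and the synthetic data. scikit-learn stands in for the Spark implementation described in the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, IsolationForest


def make_demo_data(seed=0, n_normal=400, n_fault=30, n_lab_normal=10, n_lab_fault=5):
    """Synthetic 2-D data (hypothetical): a dense normal cluster and sparse, distant faults."""
    rng = np.random.default_rng(seed)
    normal = rng.normal(0.0, 1.0, size=(n_normal, 2))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n_fault)
    radius = rng.uniform(10.0, 30.0, size=n_fault)
    fault = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])

    # Only a handful of samples keep their labels (0 = normal, 1 = fault).
    X_lab = np.vstack([normal[:n_lab_normal], fault[:n_lab_fault]])
    y_lab = np.array([0] * n_lab_normal + [1] * n_lab_fault)
    X_unl = np.vstack([normal[n_lab_normal:], fault[n_lab_fault:]])
    y_unl_true = np.array([0] * (n_normal - n_lab_normal) + [1] * (n_fault - n_lab_fault))
    return X_lab, y_lab, X_unl, y_unl_true


def pseudo_label_and_train(X_lab, y_lab, X_unl, seed=0):
    """Infer labels for unlabeled rows with an isolation forest, then fit a GBDT."""
    X_all = np.vstack([X_lab, X_unl])
    forest = IsolationForest(n_estimators=100, random_state=seed).fit(X_all)

    # score_samples returns lower values for more anomalous samples.
    s_lab = forest.score_samples(X_lab)
    s_unl = forest.score_samples(X_unl)

    # Assumed rule (not from the paper): place the threshold halfway between the
    # mean scores of the labeled normal and labeled fault samples, and treat
    # anything scoring below it as a fault.
    threshold = 0.5 * (s_lab[y_lab == 0].mean() + s_lab[y_lab == 1].mean())
    y_pseudo = (s_unl < threshold).astype(int)

    # Train the classifier on the true labels plus the inferred ones.
    y_all = np.concatenate([y_lab, y_pseudo])
    model = GradientBoostingClassifier(random_state=seed).fit(X_all, y_all)
    return model, y_pseudo


if __name__ == "__main__":
    X_lab, y_lab, X_unl, y_unl_true = make_demo_data()
    model, y_pseudo = pseudo_label_and_train(X_lab, y_lab, X_unl)
    print("share of true faults pseudo-labeled as faults:", y_pseudo[y_unl_true == 1].mean())
    print("prediction near the normal cluster:", model.predict(np.array([[0.0, 0.0]]))[0])
```

The threshold rule only makes sense when faults are rarer and more isolated than normal samples, which is the setting an isolation forest is designed for. On Spark, the second stage would map to a distributed gradient-boosted tree learner rather than `GradientBoostingClassifier`.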

     
