Abstract:
Fault prediction is a core technology for the operation and maintenance of electromechanical equipment. Traditional fault prediction methods based on machine learning classification algorithms require sufficient labeled samples, and therefore no longer meet the requirements of intelligent electromechanical devices that are deployed widely and rapidly. This paper proposes a semi-supervised learning algorithm: an improved gradient boosting decision tree algorithm based on the isolation forest. The isolation forest algorithm evaluates and infers the labels of the unlabeled data from what is learned on a small set of labeled samples. The gradient boosting decision tree algorithm then trains a model on the data set with few labels and predicts faults, which reduces the influence of missing labels on prediction accuracy. To process massive data, the algorithms are parallelized on Spark. Experimental results on public and real-world data sets show that the proposed method improves classification accuracy, adapts well to data with few labels, and has good parallel performance.
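The two-stage idea described above can be illustrated with a minimal single-machine sketch. It is not the paper's implementation: it uses scikit-learn rather than Spark, and it assumes that the isolation forest is fitted on the labeled normal samples only and that the cut-off score is the midpoint between the mean scores of the two labeled classes. The helper names (`infer_labels`, `train_fault_model`, `make_demo_data`) and the synthetic data are hypothetical and serve only as illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, IsolationForest


def infer_labels(X_labeled, y_labeled, X_unlabeled, seed=0):
    """Pseudo-label unlabeled rows: 0 = normal, 1 = fault (hypothetical helper)."""
    normal = X_labeled[y_labeled == 0]
    fault = X_labeled[y_labeled == 1]
    # Assumption: the forest is fitted on the labeled normal samples only.
    forest = IsolationForest(n_estimators=100, random_state=seed).fit(normal)
    # score_samples is lower for more anomalous points. Assumption: the cut-off
    # is the midpoint between the mean scores of the two labeled classes.
    threshold = 0.5 * (forest.score_samples(normal).mean()
                       + forest.score_samples(fault).mean())
    return (forest.score_samples(X_unlabeled) < threshold).astype(int)


def train_fault_model(X_labeled, y_labeled, X_unlabeled, seed=0):
    """Fit a GBDT on the labeled rows plus the pseudo-labeled rows."""
    pseudo = infer_labels(X_labeled, y_labeled, X_unlabeled, seed)
    X_all = np.vstack([X_labeled, X_unlabeled])
    y_all = np.concatenate([y_labeled, pseudo])
    model = GradientBoostingClassifier(random_state=seed).fit(X_all, y_all)
    return model, pseudo


def make_demo_data(seed=0):
    """Synthetic, well-separated data for illustration only (not from the paper)."""
    rng = np.random.default_rng(seed)
    X_lab = np.vstack([rng.normal(0, 1, (40, 4)), rng.normal(8, 1, (10, 4))])
    y_lab = np.array([0] * 40 + [1] * 10)
    X_unl = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(8, 1, (50, 4))])
    y_unl = np.array([0] * 200 + [1] * 50)  # held back, used only for checking
    return X_lab, y_lab, X_unl, y_unl


X_lab, y_lab, X_unl, y_unl = make_demo_data()
model, pseudo = train_fault_model(X_lab, y_lab, X_unl)
print("pseudo-label agreement:", (pseudo == y_unl).mean())
```

In this sketch the labeled samples influence the result twice: they calibrate the isolation forest threshold, and they are part of the training set of the gradient boosting model. Other ways of combining the two stages are possible, and the abstract does not specify which one the proposed method uses.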