Abstract:
Forecasting disease incidence is an important component of epidemic risk assessment. This paper studies the application of time series analysis and machine learning methods to predicting the incidence trend of infectious diseases. Based on monthly data on notifiable infectious diseases in China from 2012 to 2022, prediction models of epidemic incidence were established using a traditional time series method (the SARIMA model), machine learning methods (SVR and a BP neural network), and their combinations (SARIMA-SVR and SARIMA-BPNN), and their performance was compared. For predicting the incidence of infectious diseases, the mean absolute percentage error (MAPE) of the combined SARIMA-SVR model was reduced by 6.85%, 7.48%, and 6.97% compared with the single SARIMA, SVR, and BP neural network models, respectively, and that of the combined SARIMA-BPNN model was reduced by 6.36%, 6.99%, and 6.48%. Similarly, for Class A, B, and C infectious diseases, the prediction accuracy of the combined models also improved to some extent relative to the single models. These findings indicate that the combined SARIMA-SVR and SARIMA-BPNN models have an advantage over single models in predicting epidemic data.
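The comparison above rests on MAPE, so a minimal sketch of how that metric is usually computed may help. It assumes the standard definition, the mean of |observed - predicted| / |observed| expressed in percent. The function name `mape` and the sample values are hypothetical illustrations, not the paper's code or data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent.

    Assumes the standard definition; the paper's exact formula is not
    given in the abstract.
    """
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    if not actual:
        raise ValueError("series must not be empty")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an observed value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical monthly case counts, for illustration only (not from the paper).
observed = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
score = mape(observed, forecast)
print(f"MAPE = {score:.2f}%")
```

A lower MAPE means the forecasts lie closer to the observed counts in relative terms, which is the sense in which the combined models are reported to outperform the single ones. Because the metric divides by the observed value, months with zero reported cases have to be handled separately.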