Abstract:
To address the insufficient global feature information, insufficient feature fusion, and low detection efficiency of the YOLOv5 network model in complex environments, a YOLOv5 traffic sign detection algorithm incorporating a self-attention mechanism is proposed. In the feature extraction part of the backbone network, the Swin-Transformer module, which is based on the self-attention mechanism, is combined with the C3 module, which reduces the model's computational cost, to increase information interaction between feature maps and obtain multi-scale image features. The feature processing part of the model uses the Vision Transformer model and the Swin-Transformer module to fuse the feature maps, which captures the global feature information of the image under detection and improves the detection accuracy of the model. Finally, the original feature map concatenation is replaced with a weighted concatenation, so that important traffic sign features are detected preferentially, which improves the detection efficiency of the model. In tests on the TT100K dataset, the final mean average precision (mAP) reached 83.51%, which is 2.50 percentage points higher than that of the original YOLOv5 network model, and the single-image detection speed improved by 0.037 s over the original model. The experimental results show that the YOLOv5 model integrating the self-attention mechanism effectively improves global feature extraction ability, detection accuracy, and detection efficiency in traffic sign detection.
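
The weighted concatenation step can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so this example assumes one non-negative scalar weight per input feature map, normalized so the weights sum to roughly one. The function name `weighted_concat`, the `eps` parameter, and the nested-list data layout are hypothetical and chosen only for illustration; in a trained model the weights would be learnable parameters inside a tensor framework.

```python
def weighted_concat(feature_maps, weights, eps=1e-4):
    """Scale each feature map by a normalized weight, then concatenate channels.

    feature_maps: list of maps; each map is a list of channels,
                  and each channel is a flat list of floats.
    weights:      one scalar per feature map (assumed scheme, not from the paper).
    """
    if len(feature_maps) != len(weights):
        raise ValueError("need exactly one weight per feature map")

    # Clip negative weights to zero so no branch is subtracted.
    clipped = [max(w, 0.0) for w in weights]

    # Normalize so the weights sum to (approximately) one.
    total = sum(clipped) + eps
    normalized = [w / total for w in clipped]

    # Scale every channel of each map, then append it along the channel axis.
    fused = []
    for fmap, w in zip(feature_maps, normalized):
        for channel in fmap:
            fused.append([w * v for v in channel])
    return fused, normalized


if __name__ == "__main__":
    shallow = [[1.0, 2.0]]             # 1 channel, 2 spatial positions
    deep = [[3.0, 4.0], [5.0, 6.0]]    # 2 channels, 2 spatial positions
    fused, normalized = weighted_concat([shallow, deep], [3.0, 1.0])
    print(len(fused), normalized)
```

Under this assumption, a branch with a larger weight contributes proportionally larger activations to the concatenated output, which is one plausible way for the important traffic sign features to be emphasized.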