Enhancing YOLOv5 with Swin Transformer and Multi-Scale Attention for Improved Helmet Detection in Power Grid Construction Sites

Authors: Jindong He, Tianming Zhuang, Jianwen Min, Botao Jiang, Chaosheng Feng, Zhen Qin
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 115-126
Keywords: safety helmet wearing, target detection, vision Transformer, multi-scale representations

Abstract

With the progress of computer vision technology, helmet-wearing detection has become increasingly important for site safety, particularly in complex environments such as power grid construction sites. The task remains challenging, however, because crowded scenes lead to incorrect and missed detections. To address this, we introduce a novel approach that refines the YOLOv5 network with the Swin Transformer. This method addresses the limitations of YOLO's convolutional architecture, which struggles with long-range dependencies and dense target detection. Our hybrid strategy combines the Transformer's ability to capture global dependencies with YOLO's processing speed, yielding a robust, real-time detection system for power grid environments. Additionally, we propose a novel Multi-Scale Convolutional Attention (MSCA) module, which overcomes the single-scale focus of existing attention mechanisms. By integrating attention across multiple scales, the MSCA module captures the semantic richness of features at different sizes, improving the model's long-range contextual understanding and fine-grained semantic awareness. Extensive experiments on the safety helmet wearing detection and VOC2028-SafeHelmet datasets validate the effectiveness and generalization of our method, which achieves superior performance.
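The abstract does not give the MSCA module's internals, so as a rough, hypothetical illustration of the general multi-scale attention idea (not the authors' actual MSCA design), the sketch below aggregates context at several receptive-field sizes with average pooling and uses the fused context as a sigmoid gate over the input feature map. The pooling window sizes and the fusion-by-averaging choice are assumptions for illustration only:

```python
import numpy as np

def avg_pool2d(x, k):
    """Naive stride-1 average pooling with 'same' edge padding.

    x: feature map of shape (C, H, W); k: odd window size.
    """
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    C, H, W = x.shape
    out = np.empty_like(x, dtype=float)
    for i in range(H):
        for j in range(W):
            out[:, i, j] = xp[:, i:i + k, j:j + k].mean(axis=(1, 2))
    return out

def multi_scale_attention(x, scales=(3, 5, 7)):
    """Gate the input with context fused from several scales.

    Each window size captures a different receptive field; averaging
    the pooled maps is one simple (assumed) way to fuse them.
    """
    ctx = sum(avg_pool2d(x, k) for k in scales) / len(scales)
    attn = 1.0 / (1.0 + np.exp(-ctx))  # sigmoid gate in (0, 1)
    return x * attn

features = np.random.rand(4, 16, 16)      # (channels, height, width)
gated = multi_scale_attention(features)   # same shape, attention-weighted
```

A real implementation would replace the pooling branches with learned depthwise convolutions of different kernel sizes and learn the fusion weights, but the gating structure is the same.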