Local-Semantic Attentive Bidirectional Bottleneck Network with Residual Feature Augmentation for Real-time Semantic Segmentation

Authors: ManYuan Gui, Jinlai Zhang, Yonghen Hu, Sheng Wu, Du Xu, Bo Ouyang, Shaosheng Fan, and Zhenzhen Jin
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 1757-1774
Keywords: Real-time semantic segmentation, local-semantic attention, residual feature augmentation, bidirectional bottleneck network, computational efficiency

Abstract

Real-time semantic segmentation is critical for applications such as autonomous driving, where the core challenge lies in achieving high segmentation accuracy while maintaining efficient inference. This paper proposes LSA-BiNet, a bidirectional bottleneck network based on local-semantic attention and residual feature augmentation. The framework has three key innovations: (1) the Local Receptive Field Attention (LRFA) module achieves high-order feature interactions with first-order computational complexity through region-wise soft-weight computation and channel gating; (2) the Spatial Variance Fusion Module (SVFM) collaboratively models local and non-local features via low-frequency variance modulation and local detail enhancement; (3) the Residual Cross-level Attention Decoder (RCAD) enables precise pixel-level prediction using cross-level feature projection, dual gating mechanisms, and residual attention weighting. Extensive experiments on the Cityscapes and CamVid benchmarks demonstrate that LSA-BiNet achieves state-of-the-art (SOTA) mean Intersection-over-Union (mIoU) of 72.74% and 68.53%, respectively, without ImageNet pretraining, while maintaining low computational complexity (8.81 GFLOPs) and real-time inference speeds (51.08 FPS on Cityscapes, 79.62 FPS on CamVid). Ablation studies confirm significant contributions of each module, establishing LSA-BiNet’s superiority over contemporary SOTA models.
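
For readers unfamiliar with the mIoU metric quoted above, the following is a minimal sketch of how it is commonly computed from a confusion matrix. It is not the authors' evaluation code. The function names, the toy labels, and the ignore label 255 (a common convention for unlabeled pixels) are illustrative assumptions.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Count (true class, predicted class) pairs over flat label sequences.

    ignore_index=255 is an assumed convention for unlabeled pixels.
    """
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels do not contribute to the score
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                         # true c, predicted otherwise
        fp = sum(matrix[r][c] for r in range(n)) - tp    # predicted c, true otherwise
        denom = tp + fp + fn
        if denom == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes; the last pixel is unlabeled and skipped.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 255]
score = mean_iou(confusion_matrix(pred, target, num_classes=3))
print(f"mIoU = {100 * score:.2f}%")
```

In this toy case the per-class IoUs are 1/2, 2/3, and 1, so the mean is 13/18. Benchmark scores such as those reported above are typically obtained by accumulating one confusion matrix over the whole evaluation set before averaging.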