HSAN: A Side Adapter Network with Hybrid Compression and Local Enhancement Attention
Authors:
Yankui Fu, Fang Yang, and Qingxuan Shi
Conference:
ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages:
3044-3058
Keywords:
Open-Vocabulary Semantic Segmentation, Attention Mechanism, Feature Fusion, CLIP
Abstract
Significant progress has been made in open-vocabulary semantic segmentation, particularly in recognizing and segmenting unseen categories by leveraging Contrastive Language-Image Pre-training (CLIP). Among existing methods, the Side Adapter Network (SAN) stands out as an effective approach with strong performance. However, we identify that SAN struggles to capture fine-grained local features in complex scenes and high-resolution images. It also incurs high computational costs and fails to effectively fuse the features generated by its internal modules with those extracted by CLIP, which limits segmentation accuracy. To address these issues, we propose HSAN, which introduces the Hybrid Compression and Local Enhancement Attention (HCLEA) mechanism to reduce dimensionality for lower computational complexity while using additional convolutional neural networks to preserve and enhance local features. Furthermore, we design an Adaptive Feature Fusion Block (AFFB) that dynamically adjusts fusion weights based on input features, achieving better global-local feature fusion and fully leveraging CLIP's generalization ability. Extensive experiments on benchmark datasets demonstrate that HSAN achieves higher accuracy and faster inference than SAN and other state-of-the-art methods.
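The abstract states only that AFFB adjusts fusion weights based on input features; it gives no formulation. As a rough illustration of what input-dependent fusion can look like in general, the sketch below blends two feature vectors with a single sigmoid gate computed from their concatenation. The function name `gated_fusion`, the parameters `gate_weights` and `gate_bias`, and the single-scalar gate are all assumptions made for this example and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping any real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(side_feat, clip_feat, gate_weights, gate_bias=0.0):
    """Blend two equal-length feature vectors with one input-dependent gate.

    Hypothetical sketch, not the paper's AFFB: the gate is a sigmoid over a
    linear score of the concatenated inputs, so the mixing weight changes
    with the features themselves.
    """
    if len(side_feat) != len(clip_feat):
        raise ValueError("feature vectors must have the same length")
    concat = list(side_feat) + list(clip_feat)
    if len(gate_weights) != len(concat):
        raise ValueError("gate_weights must match the concatenated length")

    # Linear score over both inputs, squashed to a weight in (0, 1).
    score = sum(w * x for w, x in zip(gate_weights, concat)) + gate_bias
    gate = sigmoid(score)

    # Convex combination: gate weights the side branch, (1 - gate) the CLIP branch.
    fused = [gate * s + (1.0 - gate) * c for s, c in zip(side_feat, clip_feat)]
    return fused, gate
```

With all gate weights and the bias set to zero, the gate is 0.5 and the result is the plain average of the two inputs; learned, non-zero weights would shift the blend toward one branch or the other depending on the input.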
BibTeX Citation:
@inproceedings{ICIC2025,
author = {Yankui Fu and Fang Yang and Qingxuan Shi},
title = {HSAN: A Side Adapter Network with Hybrid Compression and Local Enhancement Attention},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3044--3058},
}