FusionCLIP-AD: Hierarchical Global-Local Adaptation with Learnable Embeddings for Robust Medical Image Anomaly Detection
Authors:
Hongwei Li and S. Kevin Zhou
Conference:
ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages:
2796-2809
Keywords:
Vision-Language Model, Few-Shot Anomaly Detection, Medical Image Analysis
Abstract:
Although CLIP-based few-shot learning has shown promise in anomaly detection, it still exhibits notable limitations in medical imaging applications: fixed prompt mechanisms are difficult to adapt finely to domain differences, and the lack of collaborative modeling between local and global features results in a loss of holistic information. This paper proposes a novel hierarchical adaptation framework with two components: (1) integration of global and local features to effectively capture both fine details and comprehensive information in medical images, and (2) multilevel learnable anomaly prompts dynamically constructed in the embedding space. By learning fused features and prompts across different layers, the model flexibly and accurately addresses complex scenarios in medical imaging. Experimental results demonstrate that the proposed method significantly enhances CLIP’s few-shot learning performance in medical image anomaly detection tasks. Our method achieves state-of-the-art performance on LiverCT with 85.55 AUROC under the 4-shot setting, surpassing prior methods such as MVFA (81.18).
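The two ideas named in the abstract, global-local feature fusion and prompts that live directly in the embedding space, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fuse`, `anomaly_score`), the mean-pooling of local features, the blending weight `alpha`, and the `temperature` value are all assumptions chosen only to make the idea concrete, and the prompt vectors here are fixed numbers standing in for embeddings that would be learned during training.

```python
import math

def l2_normalize(v):
    # Scale a non-zero vector to unit length.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def fuse(global_feat, local_feats, alpha=0.5):
    # Hypothetical fusion: mean-pool the local (patch) features,
    # then blend the result with the global feature.
    dim = len(global_feat)
    pooled = [sum(p[i] for p in local_feats) / len(local_feats) for i in range(dim)]
    return [alpha * g + (1 - alpha) * p for g, p in zip(global_feat, pooled)]

def anomaly_score(fused, normal_prompt, abnormal_prompt, temperature=0.07):
    # Two-way softmax over cosine similarities to a "normal" and an
    # "abnormal" prompt embedding; returns the abnormal probability.
    s_n = cosine(fused, normal_prompt) / temperature
    s_a = cosine(fused, abnormal_prompt) / temperature
    m = max(s_n, s_a)  # subtract the max for numerical stability
    e_n, e_a = math.exp(s_n - m), math.exp(s_a - m)
    return e_a / (e_n + e_a)

# Toy 2-D example with made-up feature and prompt vectors.
fused = fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
score = anomaly_score(fused, normal_prompt=[1.0, 0.0], abnormal_prompt=[0.0, 1.0])
```

In this toy case the fused feature lies closer to the normal prompt, so the score falls below 0.5. In a hierarchical variant, one such score could be computed per encoder layer and the per-layer scores combined, though how the paper combines them is not stated in the abstract.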
BibTeX Citation:
@inproceedings{ICIC2025,
author = {Hongwei Li and S. Kevin Zhou},
title = {FusionCLIP-AD: Hierarchical Global-Local Adaptation with Learnable Embeddings for Robust Medical Image Anomaly Detection},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2796--2809},
}