CAP: Contextual Enhancement and Adaptive Prompting Network for Zero-Shot Composed Image Retrieval

Authors: Dian Chen, Bo Li, Ying Qin, Qingwen Li, Hong Li, and Shikui Wei
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 3586-3597
Keywords: Composed image retrieval, Zero-shot, Contrastive learning

Abstract

This paper focuses on the zero-shot Composed Image Retrieval (ZS-CIR) task, which requires only unlabeled images or image-title pairs for model training. Previous work has used textual inversion networks to form queries by combining fixed "a photo of" templates with pseudo-words projected from reference image features into the text embedding space. However, fixed prompt templates offer limited performance improvement and can hinder the learning of instance-specific contextual information in open-domain tasks. To address these issues, we propose a zero-shot composed image retrieval framework based on contextual enhancement and adaptive prompting (CAP), which consists of a Contextual Enhancement Module (CEM) and an Adaptive Prompting Module (APM). CEM introduces learnable prompts re-parameterized by a bi-directional LSTM, and APM decouples the retrieval instances and maps the different features to the corresponding prompt parameters. The two modules cooperate to construct the optimal prompts adapted to the retrieval instances. Extensive qualitative and quantitative experiments on three datasets show that our model generalizes well and achieves better performance than state-of-the-art methods.
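
As a rough illustration of the query construction the abstract contrasts, the sketch below splices a pseudo-word (a reference image feature projected into the text embedding space) into either a fixed template or a set of learnable prompt vectors. It is not the authors' implementation: the dimensions, the random projection weights, and the helper names `project` and `build_query` are all hypothetical, and the BiLSTM re-parameterization (CEM) and the instance-dependent prompt mapping (APM) are deliberately left out.

```python
import random

def project(feature, weights):
    """Map an image feature (length d_img) to a pseudo-word embedding (length d_txt).

    `weights` is a d_img x d_txt matrix; a plain linear map is an assumption here.
    """
    return [sum(f * w for f, w in zip(feature, column)) for column in zip(*weights)]

def build_query(prompt_tokens, pseudo_word, modifier_tokens):
    """Concatenate prompt embeddings, the pseudo-word, and the modification text."""
    return prompt_tokens + [pseudo_word] + modifier_tokens

random.seed(0)
d_img, d_txt = 6, 4  # hypothetical feature sizes

# Hypothetical projection weights and reference image feature.
weights = [[random.uniform(-1, 1) for _ in range(d_txt)] for _ in range(d_img)]
image_feature = [random.uniform(-1, 1) for _ in range(d_img)]
pseudo_word = project(image_feature, weights)

# Stand-in embeddings for the modification text (two tokens).
modifier = [[0.1] * d_txt, [0.2] * d_txt]

# Fixed template: frozen embeddings for the three tokens "a", "photo", "of".
fixed_template = [[0.0] * d_txt for _ in range(3)]
fixed_query = build_query(fixed_template, pseudo_word, modifier)

# Learnable prompt: free vectors that training would update (only initialized here).
learnable_prompt = [[random.gauss(0, 0.02) for _ in range(d_txt)] for _ in range(3)]
adaptive_query = build_query(learnable_prompt, pseudo_word, modifier)

print(len(fixed_query), len(adaptive_query))
```

Both queries have the same layout; the difference is only whether the prompt positions hold frozen template embeddings or trainable vectors, which is the degree of freedom the abstract says fixed templates lack.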