Coarse-to-Fine Scene Graph Similarity Reasoning for Image-text Retrieval
Authors:
Chengsong Sun, Qingyun Liu, Yuankun Liu, Boyan Liu, Xiang Yuan, Bingce Wang, Tong Mo, and Weiping Li
Conference:
ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages:
3571-3584
Keywords:
Image-text retrieval, Multi-modal similarity reasoning, Scene graph, Contrastive learning
Abstract
Image-text retrieval is a crucial task that aims to find counterparts across modalities. Scene-graph-based image-text retrieval methods leverage object and predicate features to reason about cross-modal similarity, thereby increasing retrieval accuracy. However, existing scene-graph-based methods simply fuse the similarity calculations of features at every granularity in a single network, which brings only a slight improvement in retrieval performance: the features of the scene graph are not effectively utilized. Therefore, this paper proposes a Coarse-to-Fine Scene Graph Similarity Reasoning (CFSGR) method that conducts coarse-grained and fine-grained cross-modal similarity reasoning separately. CFSGR includes two networks: a coarse-grained similarity reasoning network for graphs and a fine-grained similarity reasoning network for objects and predicates. Moreover, CFSGR performs local and global alignment for each feature, ensuring that the similarities at every granularity of the visual and textual scene graphs are fully exploited. Evaluation and an ablation study on Flickr30K demonstrate the superiority of CFSGR over state-of-the-art (SOTA) image-text retrieval methods, with CFSGR achieving competitive results with an Rsum of 506. The source code is available at https://github.com/okeike/CFSGR.
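The abstract describes a two-branch design: a coarse, graph-level similarity and a fine, object/predicate-level similarity that are reasoned about separately and then combined. The snippet below is a minimal illustrative sketch of that general idea, not the authors' implementation: the pooling of graph embeddings, the max-alignment of nodes, the fusion weight alpha, and all function names and tensor shapes are assumptions made only for illustration.

```python
import torch
import torch.nn.functional as F


def coarse_similarity(img_graph_emb, txt_graph_emb):
    """Coarse-grained score: cosine similarity between pooled visual and
    textual scene-graph embeddings (each of shape [d])."""
    return F.cosine_similarity(img_graph_emb, txt_graph_emb, dim=0)


def fine_similarity(img_nodes, txt_nodes):
    """Fine-grained score: align each textual object/predicate node to its
    best-matching visual node and average the matches.
    img_nodes: [N_v, d], txt_nodes: [N_t, d]."""
    img_n = F.normalize(img_nodes, dim=1)
    txt_n = F.normalize(txt_nodes, dim=1)
    sim = txt_n @ img_n.t()              # [N_t, N_v] pairwise cosine similarities
    return sim.max(dim=1).values.mean()  # best visual match per textual node


def coarse_to_fine_score(img_graph_emb, txt_graph_emb,
                         img_nodes, txt_nodes, alpha=0.5):
    """Fuse the coarse (graph-level) and fine (object/predicate-level)
    similarities; alpha is a hypothetical balancing weight."""
    return alpha * coarse_similarity(img_graph_emb, txt_graph_emb) \
        + (1 - alpha) * fine_similarity(img_nodes, txt_nodes)


if __name__ == "__main__":
    d = 256
    img_graph, txt_graph = torch.randn(d), torch.randn(d)
    img_nodes, txt_nodes = torch.randn(12, d), torch.randn(8, d)
    score = coarse_to_fine_score(img_graph, txt_graph, img_nodes, txt_nodes)
    print(score.item())
```

In an actual retrieval setting, such a fused score would be computed for every image-text pair and trained with a contrastive objective, as the keywords suggest; the details of CFSGR's alignment and training are given in the paper itself.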
BibTeX Citation:
@inproceedings{ICIC2025,
author = {Chengsong Sun and Qingyun Liu and Yuankun Liu and Boyan Liu and Xiang Yuan and Bingce Wang and Tong Mo and Weiping Li},
title = {Coarse-to-Fine Scene Graph Similarity Reasoning for Image-text Retrieval},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July 26--29},
year = {2025},
address = {Ningbo, China},
pages = {3571--3584},
}