Cultural Heritage Assistant: A Lightweight Retrieval Augmented Generation Method Enhanced Vision-Language Model for Cultural Heritage
Authors:
Shiyu Wang, Huanda Lu, Haibiao Yao, Zhiyu Wu, Chen Hu, Xiangjie Xie, and Xin Yu
Conference:
ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages:
3617-3630
Keywords:
VLM, RAG, Cultural Heritage Assistant
Abstract
External knowledge makes Vision-Language Models (VLMs) more versatile. However, traditional methods often fail to address the nuanced challenges of cultural heritage, particularly when dealing with unfamiliar or complex artifact-related queries. This limitation is evident in VLMs, which struggle to generate responses without exposure to domain knowledge. Frequent retraining to accommodate new artifacts or knowledge domains is computationally expensive and impractical. To overcome these limitations, we propose Cultural Heritage Assistant, a lightweight Retrieval-Augmented Generation (RAG) method designed to enhance small-scale VLMs. Our approach integrates visual and textual retrieval modules to augment the input context, enabling the model to generate professional and accurate responses to cultural heritage queries. Experimental results on the constructed Hemudu Artifacts Visual Question-Answering dataset demonstrate the effectiveness of this approach. The method offers a solution for preserving and disseminating cultural heritage, bridging the gap between advanced VLM capabilities and domain-specific expertise.
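The abstract's idea of combining visual and textual retrieval to augment the input context can be illustrated with a minimal sketch. This is not the authors' implementation: the knowledge base entries, the three-dimensional toy vectors (stand-ins for real image and text encoder outputs), and the names cosine, top_k, and build_augmented_prompt are all hypothetical, chosen only to show the general shape of a retrieval-augmented prompt.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, entries, key, k=1):
    """Return the k entries whose vector under `key` is most similar to the query."""
    ranked = sorted(entries, key=lambda e: cosine(query_vec, e[key]), reverse=True)
    return ranked[:k]

# Hypothetical toy knowledge base; a real system would store encoder embeddings.
KNOWLEDGE_BASE = [
    {"name": "artifact A", "note": "Placeholder description of artifact A.",
     "image_vec": [1.0, 0.0, 0.0], "text_vec": [0.9, 0.1, 0.0]},
    {"name": "artifact B", "note": "Placeholder description of artifact B.",
     "image_vec": [0.0, 1.0, 0.0], "text_vec": [0.1, 0.9, 0.0]},
]

def build_augmented_prompt(question, image_vec, question_vec, k=1):
    """Retrieve by image and by question, then prepend the merged hits as context."""
    visual_hits = top_k(image_vec, KNOWLEDGE_BASE, "image_vec", k)
    textual_hits = top_k(question_vec, KNOWLEDGE_BASE, "text_vec", k)
    seen, context = set(), []
    for entry in visual_hits + textual_hits:  # merge, dropping duplicates
        if entry["name"] not in seen:
            seen.add(entry["name"])
            context.append(f"{entry['name']}: {entry['note']}")
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

# Assumed query embeddings for one image and one question.
prompt = build_augmented_prompt("What is this object?", [0.9, 0.1, 0.0], [0.8, 0.2, 0.0])
print(prompt)
```

In this sketch the augmented prompt would then be passed to the VLM together with the image; because only the prompt changes, new artifacts can be covered by adding knowledge base entries rather than by retraining.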
BibTeX Citation:
@inproceedings{ICIC2025,
author = {Shiyu Wang and Huanda Lu and Haibiao Yao and Zhiyu Wu and Chen Hu and Xiangjie Xie and Xin Yu},
title = {Cultural Heritage Assistant: A Lightweight Retrieval Augmented Generation Method Enhanced Vision-Language Model for Cultural Heritage},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {3617--3630},
}