Unlocking CLIP for Generalized Deepfake Detection with Dynamic Mixture-of-Adapters
Authors:
Jialong Liu, Guanghui Li, and Chenglong Dai
Conference:
ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages:
119-135
Keywords:
Deepfake Detection, Dynamic Mixture-of-Adapters, Contrastive Language–Image Pre-training (CLIP)
Abstract
The rapid development of deepfake technology has raised significant security and ethical concerns, requiring robust and generalizable detection methods. In this work, we propose a novel framework for deepfake detection that leverages the power of large-scale pre-trained vision-language models, specifically the Contrastive Language–Image Pre-training (CLIP) model. Our approach fine-tunes the CLIP image encoder for deepfake detection by introducing a Dynamic Mixture-of-Adapters (MoA) architecture, which consists of multiple lightweight, domain-specific adapter modules that are dynamically activated based on the input image. To further improve cross-domain performance, we introduce three auxiliary regularization terms for fine-tuning: attention alignment and similarity regularization, which enforce consistency in feature extraction, and cached domain regularization, which preserves domain-specific prototypes. The proposed framework effectively balances domain-specific adaptation and generalization, addressing critical challenges in generalized deepfake detection. Extensive experiments on benchmark datasets, including FaceForensics++, CelebDF, DFDC, DFD, and DiFF, show that our method performs well in both in-domain and cross-domain deepfake detection tasks.
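To make the architecture described above concrete, the following is a minimal sketch of a dynamic mixture-of-adapters layer. It is an illustrative assumption, not the authors' implementation: the bottleneck dimensions, gating network, and residual combination are hypothetical choices consistent with the abstract's description (lightweight adapters, weighted dynamically per input), written in NumPy for self-containedness rather than atop the actual CLIP encoder.

```python
import numpy as np

rng = np.random.default_rng(0)


class Adapter:
    """Lightweight bottleneck adapter: down-project, ReLU, up-project.

    Bottleneck size is a hypothetical choice; the paper's adapters may differ.
    """

    def __init__(self, dim, bottleneck):
        self.down = rng.standard_normal((dim, bottleneck)) * 0.02
        self.up = rng.standard_normal((bottleneck, dim)) * 0.02

    def __call__(self, x):
        return np.maximum(x @ self.down, 0.0) @ self.up


class DynamicMoA:
    """Mixture of adapters with an input-conditioned softmax gate.

    Each input token/feature vector gets its own mixing weights, so
    different (domain-specific) adapters dominate for different inputs.
    """

    def __init__(self, dim, bottleneck, n_adapters):
        self.adapters = [Adapter(dim, bottleneck) for _ in range(n_adapters)]
        self.gate = rng.standard_normal((dim, n_adapters)) * 0.02

    def __call__(self, x):
        # Softmax over adapter logits, computed per input row.
        logits = x @ self.gate
        logits -= logits.max(axis=-1, keepdims=True)
        w = np.exp(logits)
        w /= w.sum(axis=-1, keepdims=True)
        # Weighted sum of adapter outputs, added residually so the
        # frozen backbone features pass through unchanged at init.
        mix = sum(w[:, i:i + 1] * a(x) for i, a in enumerate(self.adapters))
        return x + mix


moa = DynamicMoA(dim=8, bottleneck=2, n_adapters=3)
x = rng.standard_normal((4, 8))
y = moa(x)
print(y.shape)  # (4, 8): output keeps the encoder feature dimension
```

In a CLIP-style setup such a layer would typically be inserted after frozen transformer blocks, with only the adapters and gate trained; the residual form means the pre-trained representation is perturbed, not replaced.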
BibTeX Citation:
@inproceedings{ICIC2025,
  author    = {Jialong Liu and Guanghui Li and Chenglong Dai},
  title     = {Unlocking CLIP for Generalized Deepfake Detection with Dynamic Mixture-of-Adapters},
  booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
  month     = jul,
  year      = {2025},
  address   = {Ningbo, China},
  pages     = {119--135},
  note      = {Poster Volume I},
  doi       = {10.65286/icic.v21i1.53285}
}