Unlocking CLIP for Generalized Deepfake Detection with Dynamic Mixture-of-Adapters

Authors: Jialong Liu, Guanghui Li, and Chenglong Dai
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 119-135
Keywords: Deepfake Detection, Dynamic Mixture-of-Adapters, Contrastive Language–Image Pre-training (CLIP)

Abstract

The rapid development of deepfakes has raised significant security and ethical concerns, requiring robust and generalizable detection methods. In this work, we propose a novel framework for deepfake detection that leverages the power of large-scale pre-trained vision-language models, specifically the Contrastive Language–Image Pre-training (CLIP) model. Our approach fine-tunes the CLIP image encoder for deepfake detection by introducing a Dynamic Mixture-of-Adapters (MoA) architecture, which consists of multiple lightweight, domain-specific adapter modules that are dynamically activated based on the input image. To further improve cross-domain performance, we introduce three auxiliary regularization terms for fine-tuning: attention alignment and similarity regularization, which enforce consistency in feature extraction, and cached domain regularization, which preserves domain-specific prototypes. The proposed framework effectively balances domain-specific adaptation and generalization, addressing critical challenges in generalized deepfake detection. Extensive experiments on benchmark datasets, including FaceForensics++, CelebDF, DFDC, DFD, and DiFF, show that our method performs well in both in-domain and cross-domain deepfake detection tasks.
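The core mechanism described above, lightweight bottleneck adapters whose mixing weights are computed per input, can be illustrated with a minimal sketch. This is a hedged illustration, not the authors' implementation: the bottleneck size, the linear softmax router, and the NumPy stand-in for a CLIP image feature are all assumptions made for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

D, R, N = 8, 2, 3  # feature dim, adapter bottleneck dim, number of adapters

# Each adapter is a lightweight bottleneck: down-project, ReLU, up-project.
W_down = [rng.normal(scale=0.1, size=(D, R)) for _ in range(N)]
W_up = [rng.normal(scale=0.1, size=(R, D)) for _ in range(N)]

# Router (hypothetical design): maps the input feature to one gate per adapter.
W_gate = rng.normal(scale=0.1, size=(D, N))


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def moa_forward(x):
    """Dynamic mixture-of-adapters: gate weights depend on the input x,
    so different inputs activate different domain-specific adapters."""
    gates = softmax(x @ W_gate)  # (N,) distribution over adapters
    delta = sum(
        g * (np.maximum(x @ Wd, 0.0) @ Wu)  # gated adapter outputs
        for g, Wd, Wu in zip(gates, W_down, W_up)
    )
    return x + delta, gates  # residual connection keeps the frozen backbone feature


x = rng.normal(size=D)  # stand-in for a CLIP image-encoder feature
y, gates = moa_forward(x)
```

Because the gates are input-conditioned, the mixture adapts per image rather than using one fixed adapter combination, which is what lets the framework balance domain-specific adaptation against generalization.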