MLVP-Net: Deepfake Detection Based on Multi-Level Visual Perception

Authors: Kai Li, Shaochen Jiang, Liejun Wang, Songlin Li, Chao Liu, and Sijia He
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 453-467
Keywords: Deepfake Detection, EfficientNet, State Space Model, Differential Convolution, Content-Adaptive Attention

Abstract

The negative impacts of Deepfake technology have attracted widespread attention in the multimedia forensics community. Because existing datasets lack sufficient diversity, models tend to over-rely on forgery-specific features, resulting in poor generalization. To address this issue, we propose a multi-level visual perception network (MLVP-Net), which explores local, spatial, and semantic consistency from different perspectives to improve detection accuracy. Specifically, we first introduce a Multi-scale Spatial Perception Module (MSPM) that effectively captures both long-range and local information through parallel cascaded Hybrid State Space (HSS) blocks and multi-kernel convolution operations. Then, we present a Detail Feature Enhancement Module (DFEM), which employs multiple differential convolutions for multi-directional perception, enabling the model to sense and weight details from different directions. Finally, we propose a Content-Adaptive Attention Module (CAAM), which enriches contextual information by fusing multi-level features while guiding the model to focus on more useful information through combined channel and spatial attention mechanisms. Extensive experiments demonstrate that our MLVP-Net significantly outperforms all comparison methods on five Deepfake detection benchmark datasets.
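As a rough illustration of the kind of combined channel and spatial attention the CAAM builds on, the sketch below shows a generic CBAM-style block in PyTorch. This is an assumption-based example for orientation only, not the authors' implementation; the class name, reduction ratio, and kernel size are all hypothetical choices.

```python
import torch
import torch.nn as nn


class ChannelSpatialAttention(nn.Module):
    """Generic channel-then-spatial attention block (CBAM-style sketch).

    Illustrative only; not the paper's CAAM. Hyperparameters are assumptions.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: pool spatial dims, produce per-channel weights.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over stacked channel-pooled maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = x.mean(dim=(2, 3))                       # (B, C)
        mx = x.amax(dim=(2, 3))                        # (B, C)
        ch = torch.sigmoid(self.channel_mlp(avg) + self.channel_mlp(mx))
        x = x * ch.view(b, c, 1, 1)
        # Spatial attention from per-pixel channel statistics.
        avg_map = x.mean(dim=1, keepdim=True)          # (B, 1, H, W)
        max_map = x.amax(dim=1, keepdim=True)          # (B, 1, H, W)
        sp = torch.sigmoid(self.spatial_conv(torch.cat([avg_map, max_map], dim=1)))
        return x * sp


if __name__ == "__main__":
    feat = torch.randn(2, 64, 56, 56)   # dummy feature map
    attn = ChannelSpatialAttention(64)
    print(attn(feat).shape)             # torch.Size([2, 64, 56, 56])
```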