Hierarchical Incongruity-Aware Fusion Network with Adaptive Refinement for Multi-modal Sarcasm Detection

Authors: Fang Wang, Lei Chen, and Hao Pan
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 3664-3679
Keywords: Multi-modal Sarcasm Detection, Multi-modal Fusion, Hierarchical Attention

Abstract

Multi-modal sarcasm detection (MSD) aims to identify sarcastic sentiment conveyed through textual and visual modalities. The key challenge lies in capturing the underlying incongruity across modalities. However, many existing studies rely on shallow feature fusion strategies, resulting in limited interaction between textual and visual features. Moreover, they often overlook localized inconsistencies in sarcasm, leading to insufficient representation of fine-grained sarcastic cues. To address these challenges, we propose a hierarchical incongruity-aware fusion network with semantic adaptive refinement (HIAF). Specifically, we first introduce a hierarchical fusion module that progressively captures multi-level incongruity through iterative transformer layers, guided by a cross-modal locality-constrained attention mechanism. Second, we design a semantic adaptive refinement module that dynamically integrates unimodal and cross-modal features based on their contextual contributions. Experiments show that HIAF consistently outperforms strong baselines, validating its capability to capture multi-modal incongruity.
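
As a rough illustration of what "dynamically integrates unimodal and cross-modal features based on their contextual contributions" could mean, the sketch below weights three feature vectors by softmax-normalized relevance scores and sums them. This is not the HIAF refinement module, whose details the abstract does not give. The function names, the `score_vector` parameter (a stand-in for learned scoring weights), and the toy feature values are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(features, score_vector):
    """Fuse feature vectors using softmax-weighted averaging.

    features: dict mapping a (hypothetical) source name to a list of floats;
              all vectors must share the length of score_vector.
    score_vector: hypothetical scoring weights; in a trained model these
                  would be learned, here they are fixed by hand.
    Returns (weights_by_name, fused_vector).
    """
    names = list(features)
    # One scalar relevance score per source: dot product with score_vector.
    scores = [
        sum(f * w for f, w in zip(features[name], score_vector))
        for name in names
    ]
    weights = softmax(scores)
    dim = len(score_vector)
    # Weighted sum of the source vectors, dimension by dimension.
    fused = [
        sum(weights[i] * features[name][d] for i, name in enumerate(names))
        for d in range(dim)
    ]
    return dict(zip(names, weights)), fused


# Toy inputs (made-up values, not from the paper).
features = {
    "text": [0.2, 0.1, 0.4],
    "image": [0.5, 0.3, 0.1],
    "cross_modal": [0.9, 0.8, 0.7],
}
score_vector = [1.0, 1.0, 1.0]

weights, fused = adaptive_fuse(features, score_vector)
print(weights)
print(fused)
```

With these toy inputs the cross-modal vector receives the largest weight because it has the highest score. A learned module would presumably compute the scores from context rather than from a fixed vector.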