HyperMoCL: Emotion Recognition via Multimodal Representation Learning and Multi-Level Hypergraph Contrastive Learning

Authors: Shuoxin Liu, Guocheng An, and Xiaolong Wang
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 846-863
Keywords: multimodal representation learning, multi-level hypergraph contrastive learning, conversation emotion recognition.

Abstract

Multimodal Emotion Recognition in Conversation (MERC) aims to identify the emotional states of speakers by integrating linguistic, audio, and visual information from dialogues. The core challenge of MERC lies in effectively fusing multimodal information and extracting key features. In recent years, hypergraph-based methods have been explored that construct hypergraphs directly from the features output by unimodal encoders. However, due to the heterogeneity across modalities and the propagation of noise and redundant information within the hypergraphs, the modeling of inter-modal relationships often becomes inaccurate. Furthermore, existing approaches that employ node-level hypergraph contrastive learning overlook global structural information, resulting in insufficient modeling of global features. To address these limitations, we propose HyperMoCL, which integrates multimodal representation learning and multi-level hypergraph contrastive learning. First, HyperMoCL obtains higher-quality modal features through multimodal representation learning for hypergraph construction. Subsequently, a multi-level hypergraph contrastive learning framework is employed to comprehensively capture the structural features of the hypergraph, thereby enhancing feature discriminability and model robustness. Experimental results on two widely used datasets, IEMOCAP and MELD, demonstrate that our method outperforms previous state-of-the-art approaches.
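The abstract does not spell out the loss formulation, but the phrase "multi-level hypergraph contrastive learning" suggests combining a node-level and a graph-level contrastive term over two augmented hypergraph views. The following is a minimal illustrative sketch, not the paper's implementation: the function names (`info_nce`, `multi_level_loss`), the mean-pooling readout, and the weighting parameters `alpha` and `tau` are all assumptions for demonstration purposes.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.5):
    """InfoNCE between two aligned embedding sets; row i of z1 and
    row i of z2 form the positive pair, all other rows are negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def multi_level_loss(h1, h2, alpha=0.5, tau=0.5):
    """Hypothetical multi-level contrastive objective over two
    hypergraph views. h1, h2: (B, N, D) node embeddings for
    B dialogues x N utterance nodes x D feature dims.
    Combines a node-level term with a graph-level term."""
    B, N, D = h1.shape
    # Node level: contrast corresponding utterance nodes across views.
    node_loss = info_nce(h1.reshape(B * N, D), h2.reshape(B * N, D), tau)
    # Graph level: mean-pool each dialogue to a single readout vector,
    # then contrast corresponding dialogue-level summaries across views.
    g1, g2 = h1.mean(dim=1), h2.mean(dim=1)
    graph_loss = info_nce(g1, g2, tau)
    return alpha * node_loss + (1 - alpha) * graph_loss

# Usage with random stand-in features (8 dialogues, 20 utterances, 128 dims)
h1, h2 = torch.randn(8, 20, 128), torch.randn(8, 20, 128)
loss = multi_level_loss(h1, h2)
```

Under this reading, the node-level term sharpens per-utterance discriminability while the graph-level term captures the global structural information that the abstract says node-only contrastive methods miss; how HyperMoCL actually builds its views and readout is detailed in the full paper.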