A Deep Learning Model for 3D Skeleton-Based Temporal Hand Gesture Localization

Authors: Jingtao Chen, Jieyu Zhao, and Kedi Shen
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 2656-2669
Keywords: Temporal action localization, hand gesture, skeleton, multi-view.

Abstract

Temporal action localization has recently garnered widespread attention. Most current studies focus on 2D temporal action localization for the full human body, with little research on the more complex case of hand gestures. 2D-based temporal hand gesture localization suffers from notable drawbacks, such as motion ambiguity and difficulty handling complex gestures. To address these issues, we design a 3D skeleton-based temporal action localization model that processes 3D hand skeleton sequences and has a stronger capacity to learn and represent complex hand gestures. The model consists of a backbone network, a temporal action localization module, and a classification head. The GCN-based backbone network extracts features from the skeleton sequence. The temporal action localization module fuses multi-scale temporal features through a pyramid structure and then performs action localization. The classification head predicts the action category. Additionally, we design a loss function to guide the model's learning. We validate our model on a self-constructed 3D hand skeleton dataset, and the results show that it performs well on temporal hand gesture localization.
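The three-module pipeline described above (GCN backbone, pyramid-based temporal localization module, classification head) can be sketched in PyTorch. This is a minimal illustration of the overall structure only, not the paper's implementation: the layer sizes, the learnable adjacency, the summation-based pyramid fusion, and the per-frame start/end regression head are all our assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GCNBackbone(nn.Module):
    """Graph convolution over hand joints, then a temporal convolution."""

    def __init__(self, in_channels, hidden, num_joints):
        super().__init__()
        # Learnable joint adjacency (hypothetical; the paper's graph
        # construction for the hand skeleton is not specified).
        self.adj = nn.Parameter(torch.eye(num_joints))
        self.spatial = nn.Linear(in_channels, hidden)
        self.temporal = nn.Conv1d(hidden * num_joints, hidden,
                                  kernel_size=3, padding=1)

    def forward(self, x):
        # x: (batch, time, joints, channels) -- a 3D hand skeleton sequence
        b, t, j, c = x.shape
        a = torch.softmax(self.adj, dim=-1)
        x = torch.einsum("ij,btjc->btic", a, x)      # propagate over the graph
        x = torch.relu(self.spatial(x))              # (b, t, j, hidden)
        x = x.reshape(b, t, -1).transpose(1, 2)      # (b, hidden*j, t)
        return torch.relu(self.temporal(x))          # (b, hidden, t)


class TemporalPyramid(nn.Module):
    """Build multi-scale temporal features and fuse them at full resolution."""

    def __init__(self, hidden, levels=3):
        super().__init__()
        self.downs = nn.ModuleList(
            nn.Conv1d(hidden, hidden, kernel_size=3, stride=2, padding=1)
            for _ in range(levels))

    def forward(self, x):
        # x: (b, hidden, t)
        t = x.shape[-1]
        feats = [x]
        for down in self.downs:
            x = torch.relu(down(x))                  # halve temporal length
            feats.append(F.interpolate(x, size=t, mode="linear",
                                       align_corners=False))
        return torch.stack(feats).sum(0)             # fuse scales by summation


class GestureLocalizer(nn.Module):
    """Backbone + pyramid + per-frame localization and classification heads."""

    def __init__(self, in_channels=3, hidden=64, num_joints=21, num_classes=10):
        super().__init__()
        self.backbone = GCNBackbone(in_channels, hidden, num_joints)
        self.pyramid = TemporalPyramid(hidden)
        self.loc_head = nn.Conv1d(hidden, 2, 1)      # start/end offsets per frame
        self.cls_head = nn.Conv1d(hidden, num_classes, 1)

    def forward(self, skel):
        feat = self.pyramid(self.backbone(skel))
        return self.loc_head(feat), self.cls_head(feat)
```

A forward pass on a batch of sequences of shape `(batch, time, 21 joints, 3 coords)` yields per-frame boundary offsets of shape `(batch, 2, time)` and class logits of shape `(batch, num_classes, time)`; a joint localization-plus-classification loss, as the abstract mentions, would supervise both outputs.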