Vision-Based Pedestrian Gesture Recognition System Using Spatiotemporal Features

Authors: Md Aminul Islam; Xiaohui Cui
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 2067-2083
Keywords: Autonomous driving · Vulnerable Road User · Spatiotemporal features · Long Short-Term Memory (LSTM)

Abstract

Pedestrians, classified as Vulnerable Road Users (VRUs) due to their lack of protective equipment, face high risks in traffic collisions. While crossing roads, VRUs frequently use communicative gestures, such as raising a hand with the palm outward to signal drivers to stop. Although human drivers can intuitively interpret these gestures, autonomous and driver-assistance systems still lack robust dynamic gesture interpretation, contributing to safety-critical failures. Despite progress in human-vehicle interaction for autonomous driving, prior research has largely emphasized traffic police signal recognition, pedestrian trajectory prediction, or movement-based intent analysis, neglecting VRUs' explicit gesture-based interactions. To address this gap, we present a systematic taxonomy of pedestrian gesture behaviors grounded in real-world observations, along with a custom dataset. We propose a robust recognition framework that combines spatial feature extraction and deep learning to interpret these gestures. Our method leverages geometric relationships among body keypoints to model spatial patterns, while temporal dynamics are captured by a Long Short-Term Memory (LSTM) network. This architecture processes sequential geometric features to identify the distinctive spatiotemporal characteristics of pedestrian gestures. Experiments on our custom dataset and the public CTPG dataset demonstrate a recognition accuracy of 95.18% with near real-time inference, surpassing existing vision-based approaches to VRU gesture recognition.
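To make the pipeline described in the abstract concrete, the following is a minimal sketch of the general architecture: per-frame geometric features derived from body keypoints are fed as a sequence into an LSTM classifier. All specifics here (17 COCO-style keypoints, hip-centered normalization as the geometric encoding, hidden size, number of classes, sequence length) are illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions -- none of these values come from the paper.
NUM_KEYPOINTS = 17   # e.g., COCO-style body keypoints per frame
FEATURE_DIM = 34     # flattened (x, y) coordinates after normalization
NUM_CLASSES = 5      # number of pedestrian gesture categories
SEQ_LEN = 30         # frames per gesture clip

def geometric_features(keypoints: torch.Tensor) -> torch.Tensor:
    """Toy spatial encoding: center keypoints on the mean body point so
    features are translation-invariant, then flatten per frame.
    keypoints: (batch, seq_len, num_keypoints, 2)
    returns:   (batch, seq_len, FEATURE_DIM)
    """
    body_center = keypoints.mean(dim=2, keepdim=True)  # crude body center
    centered = keypoints - body_center
    return centered.flatten(start_dim=2)

class GestureLSTM(nn.Module):
    """LSTM over per-frame geometric features; classifies the whole clip."""
    def __init__(self, feature_dim=FEATURE_DIM, hidden=128,
                 num_classes=NUM_CLASSES):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, feats):              # feats: (B, T, F)
        _, (h_n, _) = self.lstm(feats)     # h_n: (1, B, hidden)
        return self.head(h_n[-1])          # logits: (B, num_classes)

# Usage with random stand-in data in place of real pose sequences.
kps = torch.randn(4, SEQ_LEN, NUM_KEYPOINTS, 2)
model = GestureLSTM()
logits = model(geometric_features(kps))
print(logits.shape)  # torch.Size([4, 5])
```

In this sketch, the final LSTM hidden state summarizes the temporal evolution of the spatial features; richer geometric encodings (pairwise distances, joint angles) could replace the simple centering step without changing the overall structure.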