Hybrid Point-Pillar-Transformer Network for 3D Small Object Detection in Autonomous Driving
Authors:
Rongjie Wang and Shuo Yang
Conference:
ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages:
2098-2108
Keywords:
3D Object Detection · Multi-modal transformer feature fusion · Pillar Features.
Abstract
With the increasing demand for object detection accuracy in scenarios such as intelligent transportation and autonomous driving, methods that utilize point cloud and voxel features to achieve multi-modal feature fusion have become increasingly common in 3D object detection. However, existing methods often rely on inefficient linear fusion strategies during multi-modal feature fusion, which fail to adequately capture the dependencies between multi-source features and lead to insufficient feature integration. Additionally, during feature extraction, limitations in network architecture result in a lack of interaction between shallow and deep features, causing the loss of fine-grained feature information, which particularly affects the detection of small objects. To address these issues, we propose the Hybrid Point-Pillar-Transformer Network (HPP-TNet), a two-stage object detection framework that integrates point and pillar features. Specifically, we design a fine-grained pillar feature extraction module (CFPEM), which effectively alleviates the feature loss caused by voxel downsampling through shallow-deep feature interaction and a lightweight attention design. Next, we develop a transformer-based multi-scale feature fusion module (TMFFM), which dynamically establishes cross-modal associations through a multi-head attention mechanism, enhancing context-aware features and fully realizing multi-source feature fusion. Experiments on the KITTI dataset demonstrate that our proposed algorithm achieves competitive detection performance compared to several state-of-the-art methods, particularly in the Cyclist and Pedestrian categories. Our code will be open-sourced soon.
BibTeX Citation:
@inproceedings{ICIC2025,
author = {Rongjie Wang and Shuo Yang},
title = {Hybrid Point-Pillar-Transformer Network for 3D Small Object Detection in Autonomous Driving},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2098-2108},
note = {Poster Volume II}
}