PAEN: Efficient Pillar-based 3D Object Detector Based on Attention and Dilated Convolution

Authors: Jia Wen, Guanghao Zhang, Kelun Tian, Qi Zhang, and Kejun Ren
Conference: ICIC 2024 Posters, Tianjin, China, August 5-8, 2024
Pages: 28-39
Keywords: 3D object detection, LiDAR, Pillar detector, Attention module, Dilated convolution.

Abstract

Pillar-based 3D object detectors perform scene perception quickly and efficiently, meeting the basic real-time detection needs of the perception module in autonomous driving. In this paper, we propose a Pillar Sequence Attention Encoder and a Dilated Expansion Convolution Network. The former addresses the coarse encoding and limited encoded information of the pillar encoding stage, while the latter addresses the insufficient receptive field of the backbone network. Specifically, the Pillar Sequence Attention Encoder uses the Pillar Sequence Attention (PSA) module to capture attention information among points in the local region of a pillar, and uses a Pillar Feature Soft Aggregation (PFSA) module to finely aggregate information from the points within the pillar. The Dilated Expansion Convolution Network uses dilated convolutions to capture both sparse and dense feature information over wide receptive fields. We conducted experiments on the KITTI dataset to validate the performance of our model and the effectiveness of the proposed modules. Experiments show that our method achieves a mean average precision (mAP) of 81.48% for the car category, surpassing the baseline model by 3.12%, while the inference time increases by only about 10 ms.
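
To illustrate why dilated convolutions widen the receptive field without adding layers, the following minimal sketch compares a stack of ordinary 3x3 convolutions with a stack that uses increasing dilation rates. The layer configurations and the function names `effective_kernel` and `receptive_field` are assumptions chosen for illustration; they are not taken from the paper and do not reproduce the Dilated Expansion Convolution Network.

```python
# Illustrative sketch only: the layer settings below are hypothetical and
# are NOT the configuration used in the paper.

def effective_kernel(kernel_size, dilation):
    """Span covered by one dilated kernel along a single axis."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers):
    """Receptive field (one axis) of a stack of convolution layers.

    layers: list of (kernel_size, dilation, stride) tuples, input to output.
    """
    rf = 1    # a single input cell sees only itself
    jump = 1  # distance in input cells between adjacent outputs so far
    for kernel_size, dilation, stride in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Three ordinary 3x3 convolutions (dilation 1, stride 1).
dense_stack = [(3, 1, 1), (3, 1, 1), (3, 1, 1)]

# Same depth and kernel size, with assumed dilation rates 1, 2, 4.
dilated_stack = [(3, 1, 1), (3, 2, 1), (3, 4, 1)]

print("dense receptive field:  ", receptive_field(dense_stack))    # 7
print("dilated receptive field:", receptive_field(dilated_stack))  # 15
```

With the same number of layers and parameters per layer, the dilated stack in this example covers 15 cells per axis instead of 7, which is the general effect the abstract refers to when it mentions insufficient receptive fields in the backbone.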