
Abstract — Accurate object detection is critical for full-scale robotic autonomy. In this paper, we propose KC3D-RCNN, a novel two-stage predictor-refiner network that efficiently detects, classifies, and estimates the pose of objects using both 2D images and 3D LiDAR point cloud data. The first stage follows a K-Means-based region proposal approach and generates bounding boxes for the observed objects. We use a continuous 3D loss function and the K-Means clustering algorithm to filter and reduce the number of bounding boxes. The 3D loss function scores each proposal by lifting the LiDAR returns to a continuous function in a Reproducing Kernel Hilbert Space. We also show how this function can be used for both classification and pose estimation. Adopting this loss function makes our network especially adept at handling objects that are far away or have sparse LiDAR returns, such as pedestrians and bicyclists. The second stage updates the bounding boxes to refine pose estimation and classification. On the KITTI dataset, our results outperform state-of-the-art detection architectures, and ablation studies show that the network continues to perform well as we gradually make the inputs harder by increasing data sparsity. All implementations and trained weights will be available at
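
As an illustration of the two ideas named in the abstract, the following is a minimal sketch and not the KC3D-RCNN implementation. It assumes a Gaussian kernel, a hypothetical template point set per object class, and a normalized RKHS inner product as the proposal score; the function names (`rkhs_inner_product`, `proposal_score`, `kmeans_filter`) and the `lengthscale` parameter are illustrative choices, not taken from the paper.

```python
import numpy as np


def rkhs_inner_product(points_a, points_b, lengthscale=0.5):
    """Inner product of two point clouds lifted to an RKHS.

    Each cloud is represented as f(.) = sum_i k(., x_i) with a Gaussian
    kernel k (an assumption), so <f_a, f_b> = sum_i sum_j k(a_i, b_j).
    """
    diff = points_a[:, None, :] - points_b[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return float(np.exp(-sq_dist / (2.0 * lengthscale ** 2)).sum())


def proposal_score(points_in_box, template_points, lengthscale=0.5):
    """Cosine similarity in the RKHS, in [0, 1] (hypothetical scoring rule)."""
    cross = rkhs_inner_product(points_in_box, template_points, lengthscale)
    norm_a = rkhs_inner_product(points_in_box, points_in_box, lengthscale)
    norm_b = rkhs_inner_product(template_points, template_points, lengthscale)
    return cross / np.sqrt(norm_a * norm_b)


def kmeans_filter(centers, scores, k, iters=20, seed=0):
    """Cluster proposal centers with plain Lloyd's K-Means and keep the
    index of the best-scoring proposal in each non-empty cluster."""
    centers = np.asarray(centers, dtype=float)
    scores = np.asarray(scores, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = centers[rng.choice(len(centers), size=k, replace=False)]

    def assign(cents):
        dist = np.linalg.norm(centers[:, None, :] - cents[None, :, :], axis=-1)
        return dist.argmin(axis=1)

    for _ in range(iters):
        labels = assign(centroids)
        for j in range(k):
            members = centers[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    labels = assign(centroids)

    keep = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size > 0:
            keep.append(int(idx[scores[idx].argmax()]))
    return sorted(keep)
```

Because the score sums kernel evaluations over all point pairs rather than requiring point-to-point correspondences, it stays well defined for boxes containing only a handful of returns, which is the property the abstract highlights for distant or sparse objects. In this sketch, K-Means reduces many overlapping proposals to at most `k` survivors.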
