Monocular Visual Object 3D Localization in Road Scenes

Yizhou Wang, Yen-Ting Huang, Jenq-Neng Hwang   July 15, 2019  

This is a paper published at ACM Multimedia 2019 (Long Oral). [PDF Available Here]

Problems to Solve

  • Accurately localize objects in 3D from videos captured by a camera mounted on an autonomous vehicle.
  • Adaptively estimate the ground plane in each frame for more robust 3D object localization.
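
To illustrate why a per-frame ground plane helps with localization, here is a minimal sketch of one common way to use it: intersect the camera ray through an object's ground-contact pixel with the estimated plane. This is a generic geometric construction, not necessarily the exact procedure in the paper. The function name `localize_on_ground`, the intrinsics `K`, and the plane parameters are hypothetical values chosen only for illustration.

```python
import numpy as np

def localize_on_ground(pixel, K, plane_normal, plane_offset):
    """Intersect the camera ray through `pixel` with the plane n.X + d = 0.

    All quantities are expressed in the camera coordinate frame.
    """
    u, v = pixel
    # Direction of the viewing ray for this pixel (z component equals 1).
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    denom = plane_normal @ ray
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the ground plane")
    scale = -plane_offset / denom
    if scale <= 0:
        raise ValueError("intersection lies behind the camera")
    return scale * ray

# Hypothetical pinhole intrinsics (assumed values, not taken from the paper).
K = np.array([[700.0, 0.0, 600.0],
              [0.0, 700.0, 180.0],
              [0.0, 0.0, 1.0]])

# Assumed flat road 1.65 m below the camera, with the camera y-axis pointing down,
# so ground points satisfy y = 1.65, i.e. n = (0, 1, 0) and d = -1.65.
normal = np.array([0.0, 1.0, 0.0])
offset = -1.65

# Bottom-center pixel of a hypothetical detection.
point = localize_on_ground((600.0, 250.0), K, normal, offset)
```

Because the recovered position scales with the plane parameters, a small error in the assumed plane (for example on a slope or during vehicle pitch) shifts the result noticeably for distant objects, which is the motivation for estimating the plane adaptively per frame.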

Approach

  • Monocular depth estimation or other 3D sensors to obtain depth information.
  • Object depth histogram analysis or 3D point cloud clustering for object depth initialization.
  • Adaptive ground plane estimation taking advantage of sparse and dense ground features.
  • Tracklet smoothing using the results from multi-object tracking.
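
Two of the components above can be sketched briefly. The code below is a minimal illustration under stated assumptions, not the authors' implementation: `init_object_depth` picks an initial object depth from the most populated histogram bin of the depth values inside an object mask, and `smooth_tracklet` uses a plain centered moving average as a stand-in for tracklet smoothing. The function names, bin width, depth range, and window size are all hypothetical choices.

```python
import numpy as np

def init_object_depth(depth_map, mask, bin_width=0.5, max_depth=80.0):
    """Return the median depth within the most populated histogram bin of the mask."""
    depths = depth_map[mask]
    depths = depths[(depths > 0) & (depths < max_depth)]
    if depths.size == 0:
        return None
    bins = np.arange(0.0, max_depth + bin_width, bin_width)
    counts, edges = np.histogram(depths, bins=bins)
    peak = int(np.argmax(counts))
    in_peak = depths[(depths >= edges[peak]) & (depths <= edges[peak + 1])]
    return float(np.median(in_peak))

def smooth_tracklet(positions, window=5):
    """Centered moving average over a sequence of 3D positions (one row per frame)."""
    positions = np.asarray(positions, dtype=float)
    half = window // 2
    smoothed = np.empty_like(positions)
    for t in range(len(positions)):
        lo, hi = max(0, t - half), min(len(positions), t + half + 1)
        smoothed[t] = positions[lo:hi].mean(axis=0)
    return smoothed

# Toy depth map: background at 30 m, a 3x3 object at 12.2 m.
depth = np.full((6, 6), 30.0)
depth[2:5, 2:5] = 12.2

# A loose mask that also covers some background pixels.
mask = np.zeros((6, 6), dtype=bool)
mask[1:5, 1:5] = True

depth0 = init_object_depth(depth, mask)

# Toy tracklet with one noisy depth value in the third frame.
track = [[0, 0, 10], [0, 0, 11], [0, 0, 15], [0, 0, 13], [0, 0, 14]]
smoothed = smooth_tracklet(track, window=3)
```

Taking the histogram peak rather than the mean keeps background pixels that leak into the mask from pulling the initial depth away from the object.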

Quantitative Results

Localization error and time complexity for pedestrian localization on the KITTI dataset.

[Image: pedestrian results]

Localization error for vehicle localization on the KITTI dataset.

[Image: vehicle results]

Ground plane estimation results.

[Image: ground plane estimation results]

Qualitative Results

Example results for pedestrian and vehicle 3D localization.

[Image: pedestrian results]

[Image: pedestrian results]

Please refer to our paper published at ACM Multimedia 2019:

  title={Monocular Visual Object 3D Localization in Road Scenes},
  author={Wang, Yizhou and Huang, Yen-Ting and Hwang, Jenq-Neng},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia},
