Train YOLOv2 with KITTI dataset

Yizhou Wang, Zhichao Lei   July 29, 2018  

GitHub repository:

The KITTI dataset contains many real-world computer vision benchmarks for autonomous driving, covering tasks such as stereo, optical flow, visual odometry, 3D object detection, and 3D tracking. YOLOv2 is a popular technique for real-time object detection, with pre-trained weights available for many common image datasets. However, YOLOv2 does not perform well on the KITTI object dataset out of the box. In this post, I explain how to train YOLOv2 on the KITTI object dataset and show some test results using our trained weights.

Prepare KITTI dataset

We used KITTI object 2D data for training YOLO and KITTI raw data for testing. Some of the test results are recorded in the demo video above.

Download data and labels

Download the KITTI object 2D left color images of the object data set (12 GB); you will need to submit your email address to receive the download link. Also download the training labels of the object data set (5 MB). Unzip them into your chosen directories <data_dir> and <label_dir>.

Convert KITTI labels to YOLO labels

To simplify the labels, we combined the 9 original KITTI classes into 6 classes:


Refer to Zhichao's script for the conversion.
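As a sketch of the conversion (the class grouping below is illustrative, not necessarily the exact mapping used in Zhichao's script), each KITTI box given in pixel corners (x1, y1, x2, y2) becomes a YOLO box (class, cx, cy, w, h) normalized by the image size:

```python
# Illustrative KITTI -> YOLO label conversion.
# The 9 KITTI types are merged into 6 classes here as an example grouping;
# DontCare regions are dropped. Adjust CLASS_MAP to match your own mapping.
CLASS_MAP = {
    "Car": 0, "Van": 0,                    # merged into "Car"
    "Truck": 1,
    "Pedestrian": 2, "Person_sitting": 2,  # merged into "Pedestrian"
    "Cyclist": 3,
    "Tram": 4,
    "Misc": 5,
}

def kitti_to_yolo(label_text, img_w, img_h):
    """Convert the text of one KITTI label file to YOLO label lines.

    KITTI format per line: type truncated occluded alpha x1 y1 x2 y2 ...
    YOLO format per line:  class cx cy w h   (all normalized to [0, 1])
    """
    yolo_lines = []
    for line in label_text.strip().splitlines():
        fields = line.split()
        cls = fields[0]
        if cls not in CLASS_MAP:        # skip DontCare and unknown types
            continue
        x1, y1, x2, y2 = map(float, fields[4:8])
        cx = (x1 + x2) / 2.0 / img_w    # box center, normalized
        cy = (y1 + y2) / 2.0 / img_h
        w = (x2 - x1) / img_w           # box size, normalized
        h = (y2 - y1) / img_h
        yolo_lines.append(f"{CLASS_MAP[cls]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return yolo_lines
```

Running this over every file in <label_dir> (with the matching image sizes from <data_dir>) produces one YOLO .txt label per image.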

Why is KITTI difficult to train on YOLO?

Many people have tried to train YOLOv2 on the KITTI dataset but often get very poor performance. Below is a typical result of YOLOv2 detection without any modification, trained on 3 classes of the KITTI dataset.

Why does YOLOv2 perform poorly on KITTI when it works well on other datasets? Reviewing the basic properties of KITTI, we find that the images are very wide: \(1224 \times 370\), whereas the default input shape of YOLOv2 is \(416 \times 416\). After this kind of resizing, the bounding box of an object becomes very thin, which likely causes the poor performance. Moreover, object sizes in KITTI vary widely, and some objects are simply too small to be detected.
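A quick back-of-the-envelope check, using the image and network sizes quoted above, shows how badly a box gets distorted:

```python
# Distortion from squashing a 1224x370 KITTI frame into YOLOv2's 416x416 input.
KITTI_W, KITTI_H = 1224, 370
NET_W, NET_H = 416, 416

sx = NET_W / KITTI_W   # horizontal scale, ~0.34
sy = NET_H / KITTI_H   # vertical scale,   ~1.12

# A roughly square 50x50 box (e.g. a nearby car) after resizing:
box_w, box_h = 50 * sx, 50 * sy
print(f"scale: x={sx:.2f}, y={sy:.2f}")
print(f"a 50x50 box becomes {box_w:.1f}x{box_h:.1f} "
      f"(aspect ratio distorted by {sy / sx:.1f}x)")
```

Every box ends up stretched tall and thin by a factor of roughly 3.3, far from the anchor shapes YOLOv2 was tuned for.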

Configuration settings

There are two ways of configuration:

  1. Change the input shape of YOLOv2 model and disable random resizing.
  2. Modify the resizing code in YOLOv2 source code.

Change the input shape

Open the configuration file yolov2-voc.cfg and change the following parameters:

# Training
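The exact values depend on your setup; one plausible choice (a sketch, not necessarily the original file's exact numbers) keeps the network input close to KITTI's wide aspect ratio and disables multi-scale training. Note that darknet requires input dimensions to be multiples of 32:

```ini
batch=64
subdivisions=8
# wide input close to KITTI's 1224x370 (both multiples of 32)
width=1248
height=384

# in the [region] section: turn off random multi-scale resizing
random=0
```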



Also, remember to change the filters in the last convolutional layer so that \(\texttt{filters} = (\texttt{classes} + 5) \times \texttt{num}\). With 6 classes and the default \(\texttt{num} = 5\) anchor boxes, this gives \(\texttt{filters} = (6 + 5) \times 5 = 55\).

# last convolutional layer
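A sketch of the relevant section (layer parameters other than filters, classes, and num follow the stock yolov2-voc.cfg; values here assume 6 classes and 5 anchors):

```ini
[convolutional]
size=1
stride=1
pad=1
filters=55        # (classes + 5) * num = (6 + 5) * 5
activation=linear

[region]
classes=6
num=5
```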

You can also tune other parameters, such as learning_rate, object_scale, and thresh, to obtain even better results.

Our configuration file kitti6-yolov2.cfg for KITTI with 6 classes can be found HERE.

Modify the resizing code

Another way (refer to this post) is to directly modify the resizing code in detector.c, changing Line 69 and Line 79 to the following:

args.w = dim * 3;                        /* Line 69: widen the input to 3x */
resize_network(nets + i, dim * 3, dim);  /* Line 79: resize with a 3:1 aspect ratio */

Here, the factor 3 approximates the typical aspect ratio of KITTI images (\(1224 / 370 \approx 3.3\)).

Evaluation on KITTI

mAP results on KITTI using the original YOLOv2 (with square input resizing):

| Benchmark  | Easy   | Moderate | Hard   |
|------------|--------|----------|--------|
| Car        | 45.32% | 28.42%   | 12.97% |
| Pedestrian | 18.34% | 13.90%   | 9.81%  |
| Cyclist    | 8.71%  | 5.40%    | 3.02%  |

mAP results on KITTI using our modified YOLOv2 (without square input resizing):

| Benchmark  | Easy   | Moderate | Hard   |
|------------|--------|----------|--------|
| Car        | 88.17% | 78.70%   | 69.45% |
| Pedestrian | 60.44% | 43.69%   | 43.06% |
| Cyclist    | 55.00% | 39.29%   | 32.58% |

Test on KITTI image sequences

I wrote several new functions in darknet that test YOLO performance on an image sequence. The file names of the image sequence should be listed in a text file <namelist.txt>.

Test an image sequence: testseq

./darknet detector testseq cfg/ cfg/kitti.cfg <weights_file> <namelist.txt> 

Test an image sequence and save the detection results: twseq

./darknet detector twseq cfg/ cfg/kitti.cfg <weights_file> <namelist.txt> -thresh 0.5 -show 1

I also trained some models using YOLOv3 and Faster R-CNN. Their performance and comparisons on KITTI are posted in the following post:

deep-learning  object-detection  kitti  yolo