

Weizmann Cars ViewPoint
A Benchmark For Continuous Viewpoint Estimation



Below we present the Weizmann Cars ViewPoint (WCVP) benchmark. The benchmark consists of a set of images along with a protocol for evaluating continuous viewpoint estimation.

Download the dataset (revised version* November 7 2012): [zip]

The data

The WCVP dataset consists of 1530 images of cars, composed of 22 sets. Each set consists of approximately 70 images and shows a different car model. This data was first used in our [ICCV11] paper. The benchmark was introduced in the [IMAVIS] journal paper. The pictures were taken in an unconstrained outdoor setting using a hand-held camera. There are significant illumination changes, and in some images the car is cropped or occluded. Many images in our dataset also include cars in the background. These less restricted conditions make for a more challenging detection task.

[Figure: multiview images]

Denote the set of images by $\{x_i\}_{i\in I}$, where $I = \{(1,1),(1,2),\ldots,(1,65),\ldots,(22,56)\}$ is a set of index pairs: the first element is a car index and the second an image index. Each image is associated with an annotation $y_i = (BB_i,vp_i)$. Here $BB_i = \left(mincol_i,minrow_i,maxcol_i,maxrow_i\right)$ is a 4-vector giving the coordinates of the car's bounding box in the image, and $vp_i = (\theta_i,\phi_i)$ is a 2-vector giving the azimuth and elevation of the camera, respectively. Azimuth angles are in the range $[-180^{\circ},180^{\circ}]$, where $0^{\circ}$ corresponds to a back view of the car and $90^{\circ}$ to a right-facing car. Elevation angles are in the range $[-90^{\circ},90^{\circ}]$. Each image $x_i$ has a single annotation $y_i$, which always refers to the largest car in the image, that is, the car whose bounding box has the largest area.
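The "largest car" rule can be illustrated with a short Python sketch. The names `BBox`, `bbox_area` and `largest_box` are hypothetical and not part of the dataset. The area is computed here as width times height from the box corners, which is an assumption: the benchmark does not say whether the coordinates are inclusive pixel indices.

```python
from typing import List, Tuple

# Hypothetical type: (mincol, minrow, maxcol, maxrow), as in BB_i above.
BBox = Tuple[float, float, float, float]


def bbox_area(bb: BBox) -> float:
    """Area of a bounding box, assuming width * height without a +1 offset."""
    mincol, minrow, maxcol, maxrow = bb
    # Clamp at zero so a degenerate box never yields a negative area.
    return max(0.0, maxcol - mincol) * max(0.0, maxrow - minrow)


def largest_box(boxes: List[BBox]) -> BBox:
    """Return the box with the largest area, i.e. the car the annotation refers to."""
    return max(boxes, key=bbox_area)
```

For example, given a foreground car and a smaller background car, `largest_box` returns the foreground box.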

High-resolution images The original high-resolution images, which were used as input to Bundler, can be found here [zip]. The images in the WCVP dataset are downscaled versions of these images, which were further radially corrected by PMVS2.

Initial version An initial version of this dataset contained 9 images with radial distortion artifacts in the car_015 folder. We thank Damien Teney for spotting them! These images were removed from the dataset on November 7, 2012. The experiments reported in our ICCV11 and IMAVIS publications included these images. They can be found here [zip].

The estimation task

The estimation task is to generate a set $\{\hat{vp}_i\}_{i\in I} = \{(\hat{\theta}_i,\hat{\phi}_i)\}_{i\in I}$ of viewpoint predictions. We partition the 22 car models into three sets $S_1,S_2,S_3$ of sizes 7, 7 and 8 respectively, and generate a corresponding partition of the images $I = I_1 \cup I_2 \cup I_3$. We iterate over the three subsets, using each in turn as the test set and the other two for training. Thus, to generate a pose estimate $\hat{vp}_i$ for some $i \in I_p$, we allow the use of all pairs $\{(x_j,y_j)\}_{j\in I_q,\, q \neq p}$. The sets $\{S_i\}_{i=1}^3$ are given by
\[ S_1 = \{ 19, 7, 17, 15, 5, 4, 3\} \]
\[ S_2 = \{ 9, 13, 11, 6, 22, 8, 18\} \]
\[ S_3 = \{ 21, 14, 1, 20, 2, 10, 16, 12 \} \]
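The three-fold protocol above can be sketched in Python as follows. The car indices are copied from $S_1,S_2,S_3$; the names `SPLITS` and `folds` are hypothetical helpers introduced for illustration, and mapping car indices to image files is left out because it depends on how the archive is unpacked.

```python
from typing import Dict, Iterator, List, Tuple

# Car-model indices of the three sets S_1, S_2, S_3 given above.
SPLITS: Dict[int, List[int]] = {
    1: [19, 7, 17, 15, 5, 4, 3],
    2: [9, 13, 11, 6, 22, 8, 18],
    3: [21, 14, 1, 20, 2, 10, 16, 12],
}


def folds() -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yield (p, train_cars, test_cars): S_p is the test set, the other two sets train."""
    for p, test_cars in SPLITS.items():
        train_cars = sorted(
            car for q, cars in SPLITS.items() if q != p for car in cars
        )
        yield p, train_cars, sorted(test_cars)


if __name__ == "__main__":
    for p, train_cars, test_cars in folds():
        print(f"fold {p}: train on {len(train_cars)} cars, test on {len(test_cars)} cars")
```

Because the split is by car model rather than by image, no car seen at test time contributes images to the corresponding training set.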

Evaluation

We score an estimate by generating a vector of azimuth errors and a vector of elevation errors:
\begin{equation} err_i^{\theta} = \min \{ (\hat{\theta}_i - \theta_i) \bmod 360,\ (\theta_i - \hat{\theta}_i) \bmod 360 \} \end{equation}
\begin{equation} err_i^{\phi} = |\hat{\phi}_i - \phi_i| \end{equation}
We summarize each of these error vectors with three statistics: the median, the mean and the standard deviation. The median provides a measure that damps the effect of flipped estimates, which are quite common. As a more detailed summary of the errors, we also generate 46-bin histograms of these vectors.
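As an illustration, the error measures and summary statistics can be computed with the Python sketch below. All function names are hypothetical. Two choices are assumptions, since the protocol text does not specify them: the standard deviation is the population one (`statistics.pstdev`), and the histogram uses equal-width bins over $[0, upper]$ with `upper` set by the caller.

```python
import statistics
from typing import Dict, List, Sequence


def azimuth_error(pred_deg: float, true_deg: float) -> float:
    """Smaller of the two circular differences, in degrees (between 0 and 180)."""
    return min((pred_deg - true_deg) % 360.0, (true_deg - pred_deg) % 360.0)


def elevation_error(pred_deg: float, true_deg: float) -> float:
    """Absolute elevation difference in degrees."""
    return abs(pred_deg - true_deg)


def summarize(errors: Sequence[float]) -> Dict[str, float]:
    """Median, mean and standard deviation of an error vector."""
    return {
        "median": statistics.median(errors),
        "mean": statistics.mean(errors),
        # Assumption: population standard deviation.
        "std": statistics.pstdev(errors),
    }


def histogram(errors: Sequence[float], n_bins: int = 46, upper: float = 180.0) -> List[int]:
    """Counts per bin. Assumption: equal-width bins over [0, upper]."""
    width = upper / n_bins
    counts = [0] * n_bins
    for e in errors:
        # Errors equal to `upper` fall into the last bin.
        counts[min(int(e // width), n_bins - 1)] += 1
    return counts
```

For instance, a prediction of $170^{\circ}$ against a ground truth of $-170^{\circ}$ gives an azimuth error of $20^{\circ}$, not $340^{\circ}$, because the difference is taken around the circle.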

Citing the dataset

If you use this dataset in your research, please consider citing our paper:
@article{GGABS12,
title={Viewpoint-Aware Object Detection and Continuous Pose Estimation},
author={Glasner, D. and Galun, M. and Alpert, S. and Basri, R. and Shakhnarovich, G.},
journal={Image and Vision Computing},
year={2012},
ee= {http://dx.doi.org/10.1016/j.imavis.2012.09.006}
}

