
View Sets for 3-D Object Detection and Recognition

English title View Sets for 3-D Object Detection and Recognition
Applicant Lepetit Vincent
Number 135308
Funding scheme Project funding
Research institution Laboratoire de vision par ordinateur EPFL - IC - ISIM - CVLAB
Institution of higher education EPF Lausanne - EPFL
Main discipline Information Technology
Start/End 01.09.2011 - 31.08.2013
Approved amount 109'327.00

Keywords (2)

Computer Vision; Object Recognition

Lay Summary (English)

Lay summary

This project studies the properties of images and how these properties can be exploited to make a computer understand images. In particular, we have already shown that looking for a sparse representation of images yields image features that can be used efficiently for image understanding. We apply these results to object recognition in natural images and to the segmentation of veins and neurons in biomedical images, but the applications also include image denoising and object tracking.

Last update: 21.02.2013


Learning Separable Filters
Rigamonti Roberto and Sironi Amos and Lepetit Vincent and Fua Pascal (2013), Learning Separable Filters, in Conference on Computer Vision and Pattern Recognition, Portland, USA.
Accurate and Efficient Linear Structure Segmentation by Leveraging Ad Hoc Features with Learned Filters
R. Rigamonti and V. Lepetit (2012), Accurate and Efficient Linear Structure Segmentation by Leveraging Ad Hoc Features with Learned Filters, in International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), 189-197.
Supervised Feature Learning for Curvilinear Structure Segmentation
Becker Carlos Joaquin, Rigamonti Roberto, Lepetit Vincent, Fua Pascal, Supervised Feature Learning for Curvilinear Structure Segmentation, in 16th International Conference on Medical Image Computing And Computer Assisted Intervention, Nagoya, Japan.

Associated projects

Number Title Start Funding scheme
124728 View Sets for 3-D Object Detection and Recognition 01.09.2009 Project funding


Object recognition is one of the most fundamental issues Computer Vision has to address if it is ever to truly emulate Human Vision and find a wide field of application. In the past few years, remarkable progress has been made in that direction, thanks in no small measure to novel approaches to recognizing image patches [Lowe04,Miko04b,Winder07,Hua07]. In the first phase of the project that we now seek to continue, we showed that formulating this recognition task as a classification problem leads to reliable real-time implementations that perform well even when perspective or lighting changes drastically [Lepetit06a,Ozuysal06a,Pilet06a,Ozuysal07]. In a second phase, we demonstrated that training our classifiers could also be sped up massively by formulating new patch descriptors, thus allowing both training and recognition to be performed online [Calonder08].

However, in our early approach, as in many others, patch recognition relies on ad hoc image features that are computed from image gradients, with SIFT [Lowe04] being the best-known example. Our own features, which we refer to as Ferns [Ozuysal10], are an exception to this rule but can also be understood as measuring the contrast between neighboring pixels. The study of mammalian brains offers ample evidence that such features play an important role in human and animal vision, and some cell arrangements in the visual cortex appear to behave as Gabor filters. However, other cell arrangements in the same cortical areas appear to implement very different features [Olshausen05], which raises the possibility that one must go beyond simple gradient-based features to truly emulate the brain and improve the performance of our recognition algorithms.

We therefore recently developed and evaluated a method to learn image features instead of designing them by hand. Examples of learned features are shown in Figure 1. While some of them correspond to contour detectors, others are indeed more complex.
By pooling the features extracted from an image, an operation similar to the one performed by the local histograms in SIFT and by the complex cells in the visual cortex, we obtain a representation of the image that is compact and more discriminative than the original image itself. This representation is well adapted to object recognition and outperforms the state of the art on CIFAR-10, a recent and very challenging benchmark.

However, our method is still limited to a short pipeline, while there is evidence that by extracting features from the image representation itself, one can obtain a better representation, in terms of both compactness and discriminative power [Hinton06a]. We therefore now want to use our current method as a building block and stack several instances of it to obtain a more discriminative image representation, and thus improve performance. We believe this approach to be promising because a layered architecture closely matches what is known about the visual cortex and can represent very complex transformations.

This will result in a principled way to produce robust image features, with many potential applications. First, the new features will be more complex than those that have been used so far, which seems to be a requirement for efficient object recognition [Mutch06,Fidler10,Zeller10]. Manually designing such complex features is very difficult, and our approach offers a promising way of overcoming this difficulty. Second, it will become easy to develop features for images acquired using sensors other than ordinary cameras, such as infrared cameras and lidars, for which existing features are clearly suboptimal.

Finally, we will apply our image representation to online learning. We have evaluated our current representation on object category recognition only. This task requires a large set of training images and a slow training stage to build a classifier that is applied to the image representations.
This is not practical in some important scenarios, for example a robotic system that has to continuously learn new objects online. We want to use the image representations of the objects to be learned as templates, and detect these objects by matching their templates against the representation of the incoming images. It would then become virtually instantaneous to learn new incoming objects by simply adding new templates to the database.

In short, we intend to develop a robust representation of images appropriate for recognition purposes that should be useful for many Computer Vision algorithms. Furthermore, because it will be partially inspired by our current knowledge of the first layers of the visual cortex, this line of research could help us unravel some of the mysteries of the human visual system.
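
To make the pooling and template-matching ideas above concrete, here is a minimal sketch in Python with NumPy. It is not the project's implementation. Everything specific in it is an assumption made for illustration: the random filters stand in for learned ones, absolute-value rectification and max pooling are only one possible choice of operations, cosine similarity is an assumed matching score, and the names `extract_features`, `pool`, `describe` and `TemplateDatabase` are hypothetical.

```python
import numpy as np


def extract_features(image, filters):
    """Correlate the image with each filter ('valid' region only) and rectify."""
    k = filters.shape[1]
    # All k x k patches of the image, shape (H-k+1, W-k+1, k, k).
    patches = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    # One response map per filter, shape (n_filters, H-k+1, W-k+1).
    responses = np.einsum("ijkl,fkl->fij", patches, filters)
    return np.abs(responses)  # assumed rectification


def pool(maps, cell):
    """Max-pool each response map over non-overlapping cell x cell blocks."""
    n, h, w = maps.shape
    h, w = h - h % cell, w - w % cell  # drop the ragged border
    blocks = maps[:, :h, :w].reshape(n, h // cell, cell, w // cell, cell)
    return blocks.max(axis=(2, 4))


def describe(image, filters, cell=4):
    """Pooled, L2-normalised feature vector for one image."""
    v = pool(extract_features(image, filters), cell).ravel()
    return v / (np.linalg.norm(v) + 1e-12)


class TemplateDatabase:
    """Stores one pooled representation per known object (hypothetical API)."""

    def __init__(self, filters, cell=4):
        self.filters, self.cell = filters, cell
        self.labels, self.templates = [], []

    def add(self, label, image):
        # "Learning" a new object is just storing its representation.
        self.labels.append(label)
        self.templates.append(describe(image, self.filters, self.cell))

    def match(self, image):
        # Cosine similarity between the query and every stored template.
        query = describe(image, self.filters, self.cell)
        scores = np.stack(self.templates) @ query
        best = int(np.argmax(scores))
        return self.labels[best], float(scores[best])


# Usage with synthetic data: random images stand in for object views.
rng = np.random.default_rng(0)
filters = rng.standard_normal((8, 5, 5))  # stand-in for learned filters
db = TemplateDatabase(filters)
mug, shoe = rng.random((32, 32)), rng.random((32, 32))
db.add("mug", mug)
db.add("shoe", shoe)
label, score = db.match(mug + 0.01 * rng.standard_normal((32, 32)))
print(label, score)
```

In this sketch, adding an object costs one feature extraction and no classifier retraining, which is the property the online-learning scenario relies on. A real system would also need to search over image positions and scales rather than compare whole images.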