View Sets for 3-D Object Detection and Recognition

English title View Sets for 3-D Object Detection and Recognition
Applicant Lepetit Vincent
Number 124728
Funding scheme Project funding (Div. I-III)
Research institution Laboratoire de vision par ordinateur EPFL - IC - ISIM - CVLAB
Institution of higher education EPF Lausanne - EPFL
Main discipline Information Technology
Start/End 01.09.2009 - 31.08.2011
Approved amount CHF 99'471.00

Keywords (7)

Object Detection; Pose Estimation; Computer Vision; Object Recognition; Deep Belief Networks; 3D Pose; Machine Learning

Lay Summary (English)

Lay summary
Object recognition is one of the most fundamental issues Computer Vision has to address if it is ever to truly emulate human vision and to find a wide field of application. Recently, remarkable progress has been made in that direction thanks to novel approaches to recognizing image patches. However, these methods rely on ad hoc image features, usually computed from image gradients. The study of mammalian brains offers ample evidence that such features play an important role in human and animal vision, and some cell arrangements in the visual cortex appear to behave as Gabor filters. However, other cell arrangements in the same cortical areas appear to implement very different features (Olshausen 2005), which raises the possibility that one must go beyond gradient-based features to truly emulate the brain and improve the performance of our recognition algorithms.
In this continuation of our earlier work, we will investigate this possibility with a view to improving the performance of current recognition algorithms: instead of designing image features by hand, as is currently the norm, we will use statistical learning techniques to learn them from image data so as to maximize recognition rates. To this end, we will consider Deep Belief Networks (DBNs) (Hinton 2006) made of linear filters whose output is fed to a sigmoid-like function. We believe this approach to be promising because a layered architecture closely matches what is known about the visual cortex and can represent very complex transformations. DBNs, however, suffer from some limitations: there is currently no provision for robustness to perspective distortion, changing illumination, or the small shifts in feature location that they induce. Since such robustness is critical to the success of our approach, we will train the layers to produce similar outputs for inputs that differ only in their viewing conditions.
This will result in a principled way to produce robust image features, with many potential applications. First, the new features will serve the same purpose as SIFT or FERN features, but with superior performance, thanks to being optimized for recognition. Second, it will become easy to develop features for images acquired using sensors other than ordinary cameras, such as infrared cameras and lidars, for which existing features are clearly suboptimal.
Last update: 21.02.2013

Publications

Publication
Is Sparsity Really Relevant for Image Classification?
Rigamonti Roberto, Brown Matthew, Lepetit Vincent (2011), Is Sparsity Really Relevant for Image Classification?, in IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
BRIEF: Computing a Local Binary Descriptor Very Fast
Calonder Michael, Lepetit Vincent, Ozuysal Mustafa, Trzcinski Tomasz, Fua Pascal, BRIEF: Computing a Local Binary Descriptor Very Fast, in IEEE Transactions on Pattern Analysis and Machine Intelligence.
On the Relevance of Sparsity for Image Classification
Rigamonti Roberto, Lepetit Vincent, Gonzalez German, Turetken Engin, Benmansour Fethallah, Brown Matthew, Fua Pascal, On the Relevance of Sparsity for Image Classification.

Scientific events

Active participation

Title | Date | Place
Microsoft Computer Vision School | 29.07.2011 | Moscow, Russia
IEEE Conference on Computer Vision and Pattern Recognition | 20.06.2011 | Colorado Springs, USA


Associated projects

Number | Title | Start | Funding scheme
116195 | View Sets for 3-D Object Detection and Recognition | 01.07.2007 | Project funding (Div. I-III)
135308 | View Sets for 3-D Object Detection and Recognition | 01.09.2011 | Project funding (Div. I-III)

Abstract

Object recognition is one of the most fundamental issues Computer Vision has to address if it is ever to truly emulate human vision and to find a wide field of application. In the past few years, remarkable progress has been made in that direction, thanks in no small measure to novel approaches to recognizing image patches [Lowe04, Miko04b, Winder07, Hua07]. In the first phase of the project we now seek to continue, we showed that formulating this recognition task as a classification problem leads to reliable real-time implementations that perform well even when perspective or lighting change drastically [Lepetit06a, Ozuysal06a, Pilet06a, Ozuysal07]. In a second phase, we demonstrated that training our classifiers could also be massively sped up by formulating new patch descriptors, thus allowing both training and recognition to be performed online [Calonder08].
However, in our approach as well as in the others we know of, patch recognition relies on ad hoc image features computed from image gradients, SIFT [Lowe04] being the best-known example. Our own features, which we refer to as FERNS [Ozuysal07], are an exception to this rule but can also be understood as measuring the contrast between neighboring pixels. The study of mammalian brains offers ample evidence that such features play an important role in human and animal vision, and some cell arrangements in the visual cortex appear to behave as Gabor filters. However, other cell arrangements in the same cortical areas appear to perform a very different function [Olshausen05], which raises the possibility that one must go beyond gradient-based features to truly emulate the brain and improve the performance of our recognition algorithms.
In the continuation of our work that we now propose, we will investigate this possibility with a view to improving the performance of current recognition algorithms.
More specifically, instead of designing image features by hand, as is currently the norm, we will use statistical learning techniques to learn them from image data so as to maximize recognition rates. Since the set of all features that can be derived from an image patch is enormous, we will restrict ourselves to Deep Belief Networks (DBNs) [Hinton06a] made of linear filters whose output is fed to a sigmoid-like function. We believe this approach to be promising because a layered architecture closely matches what is known about the visual cortex and can represent very complex transformations. Furthermore, efficient DBN optimization algorithms have been developed [Hinton06a]. DBNs, however, suffer from some limitations: because they have so far been used mostly for character recognition, there is currently no provision for robustness to perspective distortion, changing illumination, or the small shifts in feature location that they induce. Since such robustness is critical to the success of our approach, we will build it into the DBNs by training layers to produce similar outputs for inputs that differ only in their viewing conditions.
This will result in a principled way to produce robust image features, with many potential applications. First, the new features will serve the same purpose as SIFT or FERN features, but with superior performance, thanks to being optimized for recognition. Second, it will become easy to develop features for images acquired using sensors other than ordinary cameras, such as infrared cameras and lidars, for which existing features are clearly suboptimal. Finally, it is becoming increasingly clear that sophisticated object recognition requires less local and more complex image features than those that have been used so far [Mutch06, Fidler07].
However, manually designing such complex features is very difficult, and our approach offers a promising way of overcoming this difficulty.
In short, we intend to develop a robust representation of images appropriate for recognition purposes that should be useful for many Computer Vision algorithms. Furthermore, because it will be partially inspired by our current knowledge of the first layers of the visual cortex, this line of research could help us unravel some of the mysteries of the human visual system.
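The two ingredients described in the abstract, a layer of linear filters whose outputs go through a sigmoid and training that pushes the layer toward similar outputs for two views of the same patch, can be illustrated with the minimal sketch below. The function names (`layer_forward`, `invariance_penalty`, `invariance_step`), the squared-distance penalty, and the learning rate are hypothetical choices made for illustration, not the project's actual training procedure, and the layer is updated by plain gradient descent rather than by the DBN training algorithms of [Hinton06a]. Note that this penalty on its own is minimized by a layer whose output is constant, so in practice it would have to be combined with another objective that keeps the features informative.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(weights, biases, x):
    """One layer: each unit applies a linear filter (dot product with the input), then a sigmoid."""
    return [sigmoid(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
            for w, b in zip(weights, biases)]


def invariance_penalty(weights, biases, view_a, view_b):
    """Hypothetical loss: squared distance between the layer outputs for two views of one patch."""
    ya = layer_forward(weights, biases, view_a)
    yb = layer_forward(weights, biases, view_b)
    return sum((a - b) ** 2 for a, b in zip(ya, yb))


def invariance_step(weights, biases, view_a, view_b, lr=0.05):
    """One gradient-descent step on invariance_penalty; returns updated (weights, biases)."""
    ya = layer_forward(weights, biases, view_a)
    yb = layer_forward(weights, biases, view_b)
    new_weights, new_biases = [], []
    for w, b, a, bb in zip(weights, biases, ya, yb):
        diff = 2.0 * (a - bb)   # derivative of (a - bb)**2 with respect to (a - bb)
        sa = a * (1.0 - a)      # sigmoid derivative for view_a
        sb = bb * (1.0 - bb)    # sigmoid derivative for view_b
        grad_w = [diff * (sa * xa - sb * xb) for xa, xb in zip(view_a, view_b)]
        grad_b = diff * (sa - sb)
        new_weights.append([w_k - lr * g for w_k, g in zip(w, grad_w)])
        new_biases.append(b - lr * grad_b)
    return new_weights, new_biases


if __name__ == "__main__":
    rng = random.Random(0)
    n_inputs, n_units = 16, 8  # e.g. a flattened 4x4 patch and 8 filters (arbitrary sizes)
    weights = [[rng.gauss(0.0, 0.5) for _ in range(n_inputs)] for _ in range(n_units)]
    biases = [0.0] * n_units
    view_a = [rng.random() for _ in range(n_inputs)]
    # Crude stand-in for a change in viewing conditions: a cyclic shift of the flattened patch.
    view_b = view_a[1:] + view_a[:1]
    print("penalty before:", invariance_penalty(weights, biases, view_a, view_b))
    weights, biases = invariance_step(weights, biases, view_a, view_b)
    print("penalty after one step:", invariance_penalty(weights, biases, view_a, view_b))
```

In a stacked network, one plausible reading of "training layers to produce similar outputs" is to apply such a penalty to the output of each layer in turn, feeding each trained layer's outputs to the next; this too is an assumption rather than a description of the project's method.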