Project

UNSUPERVISED LEARNING OF 3D MODELS FOR OBJECT DETECTION AND CATEGORIZATION

English title UNSUPERVISED LEARNING OF 3D MODELS FOR OBJECT DETECTION AND CATEGORIZATION
Applicant Favaro Paolo
Number 149227
Funding scheme Project funding (Div. I-III)
Research institution Institut für Informatik Universität Bern
Institution of higher education University of Berne - BE
Main discipline Information Technology
Start/End 01.01.2014 - 31.12.2016
Approved amount 169'500.00

All Disciplines (2)

Discipline
Information Technology
Mathematics

Keywords (4)

convex optimization; categorization; object detection; 3D reconstruction

Lay Summary (translated from Italian)

Lead
We propose new methods for localizing and categorizing objects in images, based on the 3D shape of object categories. Our approach introduces new techniques for learning 3D models directly from data and new methods for using these models for localization and categorization.
Lay summary

Object localization and categorization are fundamental problems in computer vision and the first steps needed to build intelligent systems that can interact with our environment. Unfortunately, these problems remain extremely complex because of the high dimensionality of the data and its highly nonlinear dependence on factors such as viewpoint, illumination, intra-class variability, similarity to the background, occlusions and noise.

Drawing inspiration from recent approaches, we propose to study new methods for localization and categorization centred on the 3D shape of objects. The 3D shape of an object gives a more accurate representation than a set of 2D views, which makes it possible to handle changes of viewpoint, illumination and occlusions more correctly. The originality of our approach lies in how we learn the 3D models from data and how we propose to use them for localization and categorization. One of the main contributions is that we propose to learn the models directly from a collection of images. This requires finding correspondences between parts of objects in the same class (but with different pose and texture). We propose both new local filters (features) and new one-shot learning techniques. The 3D models, together with a texture map extracted for each category, are then used for localization and categorization in new images.


Last update: 11.12.2013

Responsible applicant and co-applicants

Publications

Publication
Understanding Degeneracies and Ambiguities in Attribute Transfer
Szabó Attila, Hu Qiyang, Portenier Tiziano, Zwicker Matthias, Favaro Paolo (2018), Understanding Degeneracies and Ambiguities in Attribute Transfer, in ECCV, Springer International Publishing/IEEE/CVF, Munich, Germany.
Disentangling Factors of Variation by Mixing Them
Zwicker Matthias, Hu Qiyang, Szabo Attila, Portenier Tiziano, Favaro Paolo (2018), Disentangling Factors of Variation by Mixing Them, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE/CVF, Salt Lake City, UT.
Building the View Graph of a Category by Exploiting Image Realism
Szabo Attila, Vedaldi Andrea, Favaro Paolo (2015), Building the View Graph of a Category by Exploiting Image Realism, in IEEE International Conference on Computer Vision Workshop (ICCVW), IEEE ICCV, Santiago, Chile.

Collaboration

Group / person | Country | Types of collaboration
Andrea Vedaldi, Visual Geometry Group, University of Oxford | Great Britain and Northern Ireland (Europe) | in-depth/constructive exchanges on approaches, methods or results; publication

Scientific events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
European Conference on Computer Vision | Poster | Understanding Degeneracies and Ambiguities in Attribute Transfer | 08.09.2018 | Munich, Germany | Szabo Attila; Favaro Paolo
Conference on Computer Vision and Pattern Recognition | Poster | Disentangling Factors of Variation by Mixing Them | 19.06.2018 | Salt Lake City, United States of America | Favaro Paolo; Szabo Attila
International Conference on Learning Representations | Poster | Challenges in Disentangling Independent Factors of Variation | 30.04.2018 | Vancouver, Canada | Szabo Attila
5th International Workshop on 3D Representation and Recognition (3dRR-15) | Talk given at a conference | Building the View Graph of a Category by Exploiting Image Realism | 17.12.2015 | Santiago, Chile | Szabo Attila


Associated projects

Number | Title | Start | Funding scheme
156253 | Sketch-Based Image Synthesis | 01.02.2015 | Project funding (Div. I-III)
188690 | Unsupervised Learning of Interactions from Real Data | 01.09.2020 | Project funding (Div. I-III)
169622 | Analysis and Design of Self-Supervised Learning Methods | 01.04.2017 | Project funding (Div. I-III)

Abstract

Object detection and categorization are fundamental problems in computer vision and the first steps towards building intelligent systems capable of interacting with our environment. Unfortunately, these problems are extremely complex because visual data is high-dimensional and depends in a highly nonlinear way on nuisance factors such as viewpoint, scale, illumination, intra-class object variability, clutter, occlusions and noise.

Inspired by recent approaches, we propose to investigate a novel detection and categorization approach focused on the 3D shape of objects. The novelty in our approach lies in how we learn the 3D model from data and how we use it for detection and categorization. During training we propose to build a 3D model of an object directly from images of a category. At runtime, we use 3D models at two levels: locally, to select valid 2D landmarks in images and, globally, to certify that the collection of selected 2D landmarks is the projection of a known 3D geometry.

To deal with intra-class variability, a key ingredient in our approach is the design of feature descriptors invariant to texture variations. These descriptors are used to establish landmark correspondences both during training and testing. Textural information is only used at runtime for the category selection.
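
As an illustration of the global check described in the abstract (certifying that a set of selected 2D landmarks is the projection of a known 3D geometry), the following is a minimal sketch and not the project's actual method. It assumes an affine camera model, known landmark correspondences, and at least five model points that are not all coplanar; the function names, the cube model, the camera values and the tolerance are all hypothetical.

```python
import numpy as np


def affine_reprojection_error(points_3d, landmarks_2d):
    """RMS error after fitting the best affine camera by least squares.

    points_3d:    (N, 3) model landmarks, N >= 5 and not all coplanar
    landmarks_2d: (N, 2) detected image landmarks, in the same order
    """
    n = points_3d.shape[0]
    homogeneous = np.hstack([points_3d, np.ones((n, 1))])  # (N, 4)
    # Solve homogeneous @ camera ~= landmarks_2d for the (4, 2) camera matrix.
    camera, *_ = np.linalg.lstsq(homogeneous, landmarks_2d, rcond=None)
    residual = homogeneous @ camera - landmarks_2d
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))


def is_projection_of_model(points_3d, landmarks_2d, tol=1e-2):
    """Accept the landmark set if the model explains it within `tol` (assumed threshold)."""
    return affine_reprojection_error(points_3d, landmarks_2d) < tol


# Hypothetical 3D model: the 8 corners of a unit cube.
cube = np.array(
    [[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)], dtype=float
)

# Hypothetical affine camera: 2x3 linear part plus a 2D offset.
linear = np.array([[0.9, 0.1, -0.3], [-0.2, 1.1, 0.4]])
offset = np.array([2.0, -1.0])
consistent = cube @ linear.T + offset

# Move one landmark far from where the model predicts it.
inconsistent = consistent.copy()
inconsistent[3] += np.array([5.0, 5.0])

print(is_projection_of_model(cube, consistent))
print(is_projection_of_model(cube, inconsistent))
```

With only four non-coplanar points an affine camera fits any 2D configuration exactly, so the check is informative only with five or more landmarks. A perspective camera or a robust estimator could be substituted under the same idea.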