Project


Interactive Cognitive Systems, Indoor Scene Recognition for Intelligent Systems

Applicant Caputo Barbara
Number 146411
Funding scheme Project funding (Div. I-III)
Research institution IDIAP Institut de Recherche
Institution of higher education Idiap Research Institute - IDIAP
Main discipline Information Technology
Start/End 01.04.2013 - 31.12.2013
Approved amount 45'120.00

Keywords (6)

Kernel Methods; Cue Integration; Robot Vision; Computer Vision; Machine Learning; Place recognition

Lay Summary (translated from Italian)

Lead
Humans usually refer to rooms and artificial environments in terms of their visual appearance (the corridor), the activities they usually perform in them (the fitness room) and the objects they contain (the bedroom). The main goal of this project is to develop a system able to visually learn such semantic concepts.
Lay summary

Humans usually refer to the environments in which they operate in terms of their visual appearance (the corridor), the activities usually carried out in that room (the fitness room) and the objects they contain (the bedroom). The nouns used to describe these environments constitute a semantic representation of their properties and their meaning. The aim of this project is to develop algorithms capable of automatically learning these semantic meanings, taking digital images as input. The final goal is to use these algorithms in autonomous systems such as robots, making them increasingly able to collaborate with us and assist us in our daily lives.


Last update: 28.03.2013

Responsible applicant and co-applicants

Employees

Name Institute

Publications

Publication
Indoor Scene Recognition using Task and Saliency-driven Feature Pooling
Fornoni Marco and Caputo Barbara (2013), Indoor Scene Recognition using Task and Saliency-driven Feature Pooling, in British Machine Vision Conference.
Multiclass Latent Locally Linear Support Vector Machines
Fornoni Marco and Caputo Barbara and Orabona Francesco (2013), Multiclass Latent Locally Linear Support Vector Machines, in Asian Conference on Machine Learning.

Collaboration

Group / person Country
Types of collaboration
Toyota Technological Institute United States of America (North America)
- Exchange of personnel

Scientific events

Active participation

Title Type of contribution Title of article or contribution Date Place Persons involved
Asian Conference on Machine Learning Talk given at a conference Multiclass Latent Locally Linear Support Vector Machines 11.11.2013 Canberra, Australia Fornoni Marco


Associated projects

Number Title Start Funding scheme
131187 Situated Vision to Perceive Object Shape and Affordances 01.11.2011 Project funding (Div. I-III)
132619 Interactive Cognitive Systems (ICS) 01.10.2010 Project funding (Div. I-III)

Abstract

Humans usually refer to rooms and artificial environments in terms of their visual appearance (the corridor), the activities they usually perform in them (the fitness room) and the objects they contain (the bedroom). The nouns used to represent these environments epitomize their semantics and allow humans to use abstract representations while easily communicating their spatial concepts. In order to simplify the interaction between humans and artificial agents, and to enable high-level reasoning using abstract spatial concepts, the human representation of space should also be understood and reproduced by artificial agents. For example, a robot's definition of "office" should point to the same set of places that a human would recognize as such. The main goal of this project is to develop a system able to visually learn such semantic concepts and to take advantage of them also in working scenarios which differ from the original ones.

Taking inspiration from biological models of human perception, we identify two main components for the representation of indoor scenes: (1) a description of the global appearance of the image in terms of image features, and (2) a description of the local landmarks present in some regions of the image. From a computational point of view, the two representations could be regarded as a global appearance description of the scene, for example by means of statistics of visual features, and as a statistical representation of the co-occurrence of local concepts and scene categories. The design and integration of computational models of these two perceptual components, suitable for indoor place categorization, constitutes the core of our research.

This project is a two-year renewal of the project ICS Interactive Cognitive Systems, track 'Indoor Scene Recognition for Intelligent Systems', that started in 2011. The fund covers the salary of Mr. Marco Fornoni, a PhD student at EPFL. During the first two years of the project, we cast the problem as one of learning from multiple cues. Our contributions have been a principled online Multi Kernel Learning algorithm able to optimally combine multiple features while providing theoretical guarantees on the expected performance, and a global feature representation encoding both task-driven and data-driven spatial information. The combination of these two contributions has led us to obtain the state of the art in the field, as measured on reference benchmark databases.

The objective of this proposal is to move forward on this path and to address three main open issues: (1) with respect to the feature representation, we will develop methods for encoding local landmark information; rather than building object-based representations, we will take advantage of the Image-to-Class paradigm proposed in the NBNN classifier [34], casting it effectively into the Multi Kernel Learning framework; (2) with respect to cue integration, we will move from a global to a local Multi Kernel Learning formulation, so as to be able to assign different weights to different samples over the multiple cues; (3) lastly, we will develop domain adaptation methods able to cope with different features and classifiers in the source and target tasks, so as to enable intelligent systems to learn semantic spatial concepts from images collected on the Web and use them in their own situated settings.
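To make the cue-integration idea above more concrete, the sketch below fuses two hypothetical visual cues through a fixed convex combination of per-cue kernels and trains a single SVM on the combined kernel. This is only an illustration of the general principle, not the project's online Multi Kernel Learning algorithm (which learns the cue weights during training); all data, weights, dimensions, and the helper `combined_kernel` are placeholders introduced for the example.

```python
# Minimal sketch of cue integration via a convex combination of kernels.
# Each visual cue (e.g. global appearance, local landmarks) yields its own
# kernel; the cues are fused by a weighted sum before training one SVM.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

def combined_kernel(cues_a, cues_b, weights, gamma=1.0):
    """Weighted sum of per-cue RBF kernels between two sets of images."""
    K = np.zeros((len(cues_a[0]), len(cues_b[0])))
    for X_a, X_b, w in zip(cues_a, cues_b, weights):
        K += w * rbf_kernel(X_a, X_b, gamma=gamma)
    return K

# Hypothetical data: 2 cues, 100 training and 20 test images,
# 64-dimensional descriptors per cue, 5 indoor scene categories.
rng = np.random.default_rng(0)
train_cues = [rng.normal(size=(100, 64)) for _ in range(2)]
test_cues = [rng.normal(size=(20, 64)) for _ in range(2)]
y_train = rng.integers(0, 5, size=100)
weights = [0.6, 0.4]  # fixed here; learned from data in true MKL

K_train = combined_kernel(train_cues, train_cues, weights)
K_test = combined_kernel(test_cues, train_cues, weights)

clf = SVC(kernel="precomputed").fit(K_train, y_train)
predictions = clf.predict(K_test)
```

In a genuine Multi Kernel Learning formulation the weights on the per-cue kernels are optimized jointly with the classifier, and the local variant discussed in the proposal would further let those weights vary per sample rather than being shared globally.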