
Deep Structured Representation Learning for Visual Recognition

English title Deep Structured Representation Learning for Visual Recognition
Applicant Salzmann Mathieu
Number 175581
Funding scheme Project funding (Div. I-III)
Research institution Laboratoire de vision par ordinateur EPFL - IC - ISIM - CVLAB
Institution of higher education EPF Lausanne - EPFL
Main discipline Information Technology
Start/End 01.09.2018 - 31.08.2022
Approved amount 334'033.00

Keywords (5)

Visual recognition; Deep learning; Structured image representations; Convolutional neural networks; Computer vision

Lay Summary (translated from French)

Lead
In recent years, thanks to the development of huge image and video databases and of artificial neural networks, enormous progress has been made in the field of visual recognition. This is largely due to the fact that artificial neural networks simultaneously learn a representation of the input image and a classifier for that image. However, although research has greatly advanced network architectures, the optimization of their parameters, and the form of the classifier, the type of representation they extract from images has remained essentially unchanged. This contrasts with efforts that predate neural networks, in which structured representations, in the form of histograms, covariance matrices, or subspaces, were studied.
Lay summary
The goal of this project is to enrich the repertoire of representations that artificial neural networks can exploit. In an ongoing project, we are studying the use of covariance matrices inside a network. Here, we will extend this research to other types of structured representations, such as histograms and subspaces. Moreover, we will develop techniques to automatically learn the architecture of these neural networks, which, in contrast to the current practice of defining the architecture by hand, will greatly facilitate their deployment for new tasks.
 
Last update: 06.12.2017

Associated projects

Number Title Start Funding scheme
165648 Second-Order Layers in Deep Networks for Visual Recognition 01.09.2017 Project funding (Div. I-III)

Abstract

Automatically recognizing and segmenting objects in images and videos is central to a wide variety of application domains, such as security and autonomous driving. Visual recognition has therefore been one of the fundamental goals of Computer Vision since its inception. In the past few years, with the increasing amount of images and videos available online, Deep Learning, and particularly Convolutional Neural Networks (CNNs), have delivered spectacular progress in this field. In contrast with earlier approaches that first extracted handcrafted features from the images and then trained a classifier on these features, CNNs learn the features and the classifier together.

However, while there have been great advances in terms of network architectures, classifier types and optimization strategies, the representations extracted by CNNs have remained largely unchanged. Each layer applies filters followed by a nonlinear transformation to the output of the previous one and, sometimes, a pooling operation. This is a much less diverse repertoire of image operations than those used previously, which include Bags of Visual Words and spatial pyramids, Region Covariance Descriptors, and, for image sets and videos, subspace-based descriptors. In the past, all these structured representations have proven highly effective at modeling a wide variety of problems. Furthermore, as shown in recent work, the limited diversity of CNN operations makes them unsuitable for a range of problems.

In other words, structured representations have essentially been discarded from current Deep Learning architectures, even though they proved their worth in earlier approaches. The goal of this project is therefore to develop new deep architectures that leverage the representational power of structured descriptors within an end-to-end learning formalism. To this end, we will pursue the following research directions:

- Learning histogram-based representations. We will develop architectures that jointly learn a codebook and a representation of the input image such that the latter relates to the elements of the former. In particular, we will study the use of generative models to learn interpretable codebooks.

- Learning subspace-based representations. We will develop architectures that model image sets as linear subspaces, encoded via an orthonormal matrix computed from the global representations of the individual images. We will study different ways to extract these subspaces, as well as different layer types to transform a subspace into another subspace or into a vector.

- Learning the structure of a network. A severe drawback of deep networks is that there is no principled way to automatically determine the best architecture for the problem at hand. Developing new layer types, as we propose, will only add to this problem. To overcome it, we will develop algorithms that learn the network structure, including the number and type of layers and the number of units per layer.

In an ongoing project, we have begun developing deep architectures that extract covariance-based representations. Here, we aim to expand this to other structured representations, which will require us to introduce entirely different architectures. The purpose of this proposal is therefore to continue and expand our previous research direction and to encompass it in a single project on the more general topic of learning deep structured representations for visual recognition.

In short, our project will enrich the repertoire of representations that Deep Learning can exploit, while at the same time providing automated design tools to incorporate them into a working architecture. We are therefore confident that it will lead to significant advances in visual recognition and motivate other researchers, in this and other fields, to go beyond standard Deep Learning architectures.
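To make the histogram-based direction concrete, the following is a minimal NumPy sketch (not the project's actual architecture) of the core idea: given local features and a learnable codebook, a differentiable soft assignment of features to codewords yields a histogram that gradients can flow through, so the codebook could be trained jointly with the feature extractor. The function name and the temperature parameter `beta` are illustrative assumptions.

```python
import numpy as np

def soft_histogram(features, codebook, beta=1.0):
    """Differentiable histogram of local features over a codebook.

    features: (n, d) local descriptors; codebook: (k, d) codewords.
    Returns a (k,) nonnegative vector summing to 1.
    """
    # Squared Euclidean distance from every feature to every codeword: (n, k)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Soft assignment: row-wise softmax over codewords (beta is an
    # assumed temperature; beta -> inf recovers hard vector quantization)
    logits = -beta * d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    assign = np.exp(logits)
    assign /= assign.sum(axis=1, keepdims=True)
    # Histogram = average assignment mass received by each codeword
    return assign.mean(axis=0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 8))   # 100 local features of dimension 8
cb = rng.normal(size=(16, 8))       # codebook of 16 codewords
hist = soft_histogram(feats, cb)
```

Because every step is smooth in both `features` and `codebook`, such a layer fits the end-to-end formalism the proposal targets, unlike classical hard-assignment Bags of Visual Words.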
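For the subspace-based direction, one standard way (a sketch, not necessarily the method the project will adopt) to encode an image set as a linear subspace is to stack the per-image feature vectors and take the dominant right singular vectors, which form the orthonormal matrix the abstract mentions. The function name and the fixed subspace dimension are illustrative assumptions.

```python
import numpy as np

def image_set_subspace(features, dim):
    """Encode an image set as an orthonormal basis of its dominant subspace.

    features: (n, d) matrix whose rows are the global representations of
    the n images in the set. Returns a (d, dim) matrix with orthonormal
    columns spanning the top-`dim` directions of variation.
    """
    X = features - features.mean(axis=0)          # center the set
    # Thin SVD: rows of vt are orthonormal directions in feature space,
    # ordered by singular value (i.e. by captured variance)
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return vt[:dim].T                              # (d, dim)

rng = np.random.default_rng(1)
set_feats = rng.normal(size=(30, 16))  # a set of 30 images, 16-D features
U = image_set_subspace(set_feats, dim=4)
```

A layer built this way outputs a point on a Grassmann manifold rather than a flat vector, which is why the proposal also calls for new layer types that map one subspace to another or back to a vector.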
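The covariance-based representations from the ongoing companion project (165648, "Second-Order Layers in Deep Networks") can likewise be illustrated with a minimal sketch: pool local features into a covariance matrix, regularize it to be positive definite, and map it off the SPD manifold with a matrix logarithm. The epsilon regularizer and function name are assumptions for this illustration.

```python
import numpy as np

def covariance_pool(features, eps=1e-5):
    """Second-order pooling of local features.

    features: (n, d) local descriptors. Returns the (d, d) matrix
    logarithm of the regularized covariance, a symmetric matrix that
    can be flattened and fed to ordinary fully connected layers.
    """
    X = features - features.mean(axis=0)
    d = X.shape[1]
    # Sample covariance, shifted by eps*I so it is positive definite
    cov = X.T @ X / (len(X) - 1) + eps * np.eye(d)
    # Matrix log via eigendecomposition (valid since cov is symmetric PD)
    w, v = np.linalg.eigh(cov)
    return (v * np.log(w)) @ v.T

rng = np.random.default_rng(2)
local_feats = rng.normal(size=(50, 16))
log_cov = covariance_pool(local_feats)
```

The eigendecomposition makes the layer differentiable in principle, which is what allows second-order layers to sit inside an end-to-end trained network instead of being a fixed post-hoc descriptor.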