Project

Computational Reduction for Training and Inference (CORTI)

English title Computational Reduction for Training and Inference
Applicant Fleuret François
Number 188758
Funding scheme Project funding (Div. I-III)
Research institution IDIAP Institut de Recherche
Institution of higher education Idiap Research Institute - IDIAP
Main discipline Information Technology
Start/End 01.03.2020 - 28.02.2022
Approved amount 243'919.00

Keywords (3)

Pattern recognition; Machine learning; Deep learning

Lay Summary

Lead
In just a few years, large artificial neural networks have become the most effective method for extracting semantic information from high-dimensional signals. They are successfully applied to problems as diverse as object and person recognition, speech recognition, written-language analysis, and image generation, and they are at the core of many technologies we use daily. These techniques prove extremely effective when the amount of data used to train them and the available computing power are both extremely large. This need for resources restricts the use of these techniques to large industrial groups, indirectly induces an ecological impact, and ultimately prevents tackling even more ambitious problems.
Lay summary
The CORTI project is part of a research program we have been conducting for several years, which aims to reduce the computational cost of machine-learning methods in general, and of deep learning in particular.

On the one hand, we will continue developing methods that rely on sampling sub-parts of the signal: to accelerate inference by concentrating computation on the informative portions, and, during training, by re-sampling the examples to minimize the gradient-estimation error for a given computational budget (a minimal sketch of this idea follows below). In particular, we will investigate hierarchical sampling models to process very high-dimensional signals such as very high-resolution microscopy images.
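The following is a minimal sketch, in PyTorch, of what re-sampling training examples under a computational budget could look like: per-example losses serve as sampling scores, and the sampled losses are re-weighted so that the gradient estimate stays unbiased. The scoring criterion, the use of cross-entropy, and the helper name importance_sampled_step are illustrative assumptions, not the project's actual algorithm.

    # Illustrative sketch only: re-sampling examples to cut the cost of a gradient step.
    # Assumption: per-example loss magnitudes are used as sampling scores.
    import torch
    import torch.nn.functional as F

    def importance_sampled_step(model, optimizer, x, y, budget):
        n = x.shape[0]

        # Cheap scoring pass over the whole batch, without gradients.
        with torch.no_grad():
            scores = F.cross_entropy(model(x), y, reduction="none")

        probs = scores / scores.sum()                      # sampling distribution p_i
        idx = torch.multinomial(probs, budget, replacement=True)

        # 1 / (n * p_i) weights keep the gradient estimate unbiased.
        weights = 1.0 / (n * probs[idx])

        optimizer.zero_grad()
        losses = F.cross_entropy(model(x[idx]), y[idx], reduction="none")
        (weights * losses).mean().backward()               # expectation equals the full-batch mean loss
        optimizer.step()

In practice such a step only back-propagates through the sampled subset, which is where the computational saving comes from; the extra scoring pass is assumed to be cheap relative to a full backward pass.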

On the other hand, we will study progressive training techniques that avoid fully re-optimizing a complex model, and instead progressively add model sub-parts that refine the representation (one possible reading of this idea is sketched below). This will be formulated within an information-theoretic framework, in which the successive representations aim to preserve the information content while making it easier to exploit geometrically.
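One way to read "progressively adding model sub-parts without re-optimizing the whole model" is greedy block-wise training, where each new block is fitted with a local head while earlier blocks stay frozen. The sketch below, in PyTorch, illustrates only this general idea; the block structure, the local cross-entropy objective, and the helper name grow_model are assumptions, not the method the project proposes.

    # Illustrative sketch: each new block is trained with its own local head
    # while earlier blocks stay frozen, so the full model is never re-optimized.
    import torch
    import torch.nn.functional as F

    def grow_model(blocks, new_block, head, data_loader, steps=1000, lr=1e-3):
        for b in blocks:
            b.requires_grad_(False)          # previous representation is kept fixed

        params = list(new_block.parameters()) + list(head.parameters())
        optimizer = torch.optim.Adam(params, lr=lr)

        it = iter(data_loader)
        for _ in range(steps):
            try:
                x, y = next(it)
            except StopIteration:
                it = iter(data_loader)
                x, y = next(it)

            with torch.no_grad():            # features from the frozen prefix
                for b in blocks:
                    x = b(x)

            optimizer.zero_grad()
            loss = F.cross_entropy(head(new_block(x)), y)
            loss.backward()                  # gradients flow only through the new part
            optimizer.step()

        blocks.append(new_block)
        return blocks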

Last update: 27.04.2020

Associated projects

Number Title Start Funding scheme
169112 Importance sampling for large-scale unsupervised learning (ISUL) 01.03.2017 Project funding (Div. I-III)

Abstract

This project is a follow-up to the ISUL project, to fund the fourth year of two ongoing PhD theses and to open a new sub-project investigating a very promising topic that emerged from the research we have conducted, but is too rich to be tackled within the two theses already under way.

The ISUL project aimed at developing novel machine-learning algorithms to address two fundamental issues with modern techniques: their need for very large data corpora and for heavy computation. We have developed a series of methods that transfer structures from an existing network to facilitate the training of a new one, on a different task for which few data examples are available. Our approaches rely on mimicking the behavior of the existing network not only point-wise, but also in terms of local changes. In parallel, we have developed techniques that reduce the computational cost of training and inference by relying heavily on sampling to approximate dense weighted averaging.

We structure this new proposal in three sub-projects.

The first sub-project will continue our work on transfer learning, first by improving the optimization itself, as we observed that the complexity of the underlying optimization problem is key. Additionally, we will consider using deep generative models to produce synthetic data capturing the joint distribution of the signal components. Their use can be seen as a Monte-Carlo generalization, to an arbitrary order, of our approaches based on first-order derivatives.

The second sub-project will extend our line of research on sampling for gradient descent and inference. We have recently investigated the use of sampling during inference, and shown that end-to-end gradient-based learning can be generalized to such a context. Our current algorithm relies on sampling an image at a fixed scale to reject poorly informative parts (a sketch of this single-scale idea is given below), and does not take into account that different scales may lead to different statistics; this is what we plan to address next. From there, we envision a generalization to sampling the model itself, looking jointly at parts of the model and parts of the signal, and sampling along both axes simultaneously. This can be seen as a data-driven adaptive dropout that modulates the computation required for a given level of accuracy.

Finally, the third sub-project will initiate a new line of work whose objective is to combine model selection and training into a unified forward generation of a model, avoiding both the costly back-propagation of the gradient and a grid search over meta-parameters. The key motivation behind this new direction is the view of a deep model as a progressive refinement of an internal representation, combined with information-theoretic methods that provide criteria to assess whether a change occurring at a given level of an architecture is beneficial to the overall task at hand. Our objective is to leverage these tools and explicitly reformulate the training of a model as the progressive design of a topological deformation of the feature space in low dimension, so as to avoid back-propagation and gradient descent.
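As an illustration of the second sub-project's starting point, the sketch below shows single-scale inference that samples a few informative patches instead of processing the full high-resolution image: an attention map is computed on a downscaled copy, patch locations are drawn from it, and only those patches are fed to the feature network. The attention network, patch size, downscaling factor, and feature averaging are assumptions made for this example, not the project's actual model.

    # Illustrative sketch: sample a few informative patches instead of
    # processing the full high-resolution image.
    import torch
    import torch.nn.functional as F

    def sampled_inference(attention_net, feature_net, classifier, image,
                          n_patches=8, patch=64):
        # image is (C, H, W); attention_net is assumed to output one score
        # per spatial location of its (downscaled) input.
        c, h, w = image.shape
        with torch.no_grad():
            # Attention map computed on a 4x-downscaled copy of the image.
            small = F.interpolate(image[None], scale_factor=0.25,
                                  mode="bilinear", align_corners=False)
            attn = attention_net(small).flatten()
            probs = torch.softmax(attn, dim=0)

            # Sample patch centers proportionally to the attention map.
            idx = torch.multinomial(probs, n_patches, replacement=True)
            grid_w = small.shape[-1]
            ys = (idx // grid_w) * 4            # map back to full resolution
            xs = (idx % grid_w) * 4

            feats = []
            for yc, xc in zip(ys.tolist(), xs.tolist()):
                y0 = max(0, min(h - patch, yc - patch // 2))
                x0 = max(0, min(w - patch, xc - patch // 2))
                feats.append(feature_net(image[None, :, y0:y0 + patch, x0:x0 + patch]))

            # Aggregate the sampled-patch features and classify.
            return classifier(torch.stack(feats).mean(dim=0))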