Project


Analysis and Design of Self-Supervised Learning Methods

English title Analysis and Design of Self-Supervised Learning Methods
Applicant Favaro Paolo
Number 169622
Funding scheme Project funding (Div. I-III)
Research institution Institut für Informatik Universität Bern
Institution of higher education University of Berne - BE
Main discipline Information Technology
Start/End 01.04.2017 - 31.07.2020
Approved amount 176'463.00

All Disciplines (2)

Discipline
Information Technology
Mathematics

Keywords (5)

transfer learning; unsupervised learning; self-supervised learning; neural network unfolding; neural networks

Lay Summary (Italian)

Lead
Methods to Exploit Large Quantities of Data without Manual Annotation
Lay summary

The main goal in many scientific fields is the ability to make predictions. Given some piece of information (text, an image, a sound), we would like to be able to say what will come next. For example, when we play a sport we would like to be able to predict well where the ball will go when we throw or kick it. Our ability to predict depends on the model we build from experience in our brain and on the little new information we are given (e.g., our position with respect to the ball or the configuration of our body). We can also improve by trying again and again, so that we can link our actions to the predictions we make, a process that in machine learning is called training.

The same considerations apply to machines. When can one say that a machine has learned a good model of the data? When it can make good predictions about the data itself. A methodology that has worked very well over the last decade and more is supervised training. One defines a large dataset made of (measurement, label) pairs. Training then consists in building a predictor that takes a measurement and returns a label. In this project we take a different point of view: we take a data sample and split it into two parts (measurements, labels), so that one part of the sample serves as a measurement that we use to predict the other part, the label. An interesting aspect of this formulation is that we can use supervised learning to solve this task. Another interesting aspect is that, in order to predict one part of the data from the other, the machine must learn the structure of the data and how its parts are related to one another. We will exploit this aspect to learn semantic attributes of the data.

Last update: 01.02.2017

Lay Summary (English)

Lead
Methods to Exploit Large Quantities of Data without Human Annotation
Lay summary

The main concern in many scientific fields is the ability to make predictions. Given some piece of information (text, an image, a sound) we would like to be able to say what will come next. For example, when we play a sport we would like to be able to predict well where the ball will go when we throw it or kick it. Our ability to predict depends on the model that we build from experience in our brain and from the little new information that we are given (e.g., our position with respect to the ball and the configuration of our body). We can also improve by trying over and over, so that we can link our actions to the predictions we make, a process that in machine learning is called training. 

The same considerations apply to machines. When can one say that a machine has learned a good model about data? When it can make good predictions about the data itself. A framework that has worked very well in the last decade and more is that of supervised learning. One needs to build a large set of data made of pairs (measurement, label). Training then corresponds to building a predictor that takes the measurement and returns the label. In this project we consider a little twist: we take a data sample and split it into (measurements, label), so that one part of the sample works as a measurement that we use to predict the other part of the sample, the label. An interesting aspect of this formulation is that we can use supervised learning tools to solve the task. Another interesting aspect is that in order to predict one part of the sample with the other the machine needs to understand the structure of data and how the two parts are related. We plan to exploit this aspect to learn semantic attributes of data.

Last update: 01.02.2017

Responsible applicant and co-applicants

Employees

Name Institute

Publications

Publication
Learning to Have an Ear for Face Super-Resolution
Meishvili Givi, Jenni Simon, Favaro Paolo (2020), Learning to Have an Ear for Face Super-Resolution, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE/CVF, Seattle, WA, USA.
Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics
Jenni Simon, Jin Hailin, Favaro Paolo (2020), Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics, in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE/CVF, Seattle, WA, USA.
Video Representation Learning by Recognizing Temporal Transformations
Jenni Simon, Meishvili Givi, Favaro Paolo (2020), Video Representation Learning by Recognizing Temporal Transformations, in ECCV, Glasgow, UK; arXiv preprint arXiv:2007.10730.
On Stabilizing Generative Adversarial Training With Noise
Jenni Simon, Favaro Paolo (2019), On Stabilizing Generative Adversarial Training With Noise, in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE/CVF, Long Beach, CA, USA.
Deep Bilevel Learning
Jenni Simon, Favaro Paolo (2018), Deep Bilevel Learning, in ECCV, Springer International Publishing, Munich, Germany.
Self-Supervised Feature Learning by Learning to Spot Artifacts
Jenni Simon, Favaro Paolo (2018), Self-Supervised Feature Learning by Learning to Spot Artifacts, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE/CVF, Salt Lake City, UT, USA.

Collaboration

Group / person Country
Types of collaboration
Adobe United States of America (North America)
- Publication

Scientific events

Active participation

Title Type of contribution Title of article or contribution Date Place Persons involved
ECCV 2020 Workshop: Self Supervised Learning: What is Next? Individual talk Perspectives on Unsupervised Representation Learning 28.08.2020 Glasgow, Great Britain and Northern Ireland Favaro Paolo;
European Conference on Computer Vision Poster Video Representation Learning by Recognizing Temporal Transformations 24.08.2020 Glasgow, Great Britain and Northern Ireland Jenni Simon; Favaro Paolo;
IEEE Conference on Computer Vision and Pattern Recognition 2020 Talk given at a conference Steering Self-Supervised Feature Learning Beyond Local Pixel Statistics 16.06.2020 Seattle, United States of America Jenni Simon;
Seminar at the Department of Informatics, University of Edinburgh Individual talk Beyond Supervised Learning 12.02.2020 Edinburgh, Great Britain and Northern Ireland Favaro Paolo;
Seminar at the Department of Information Technology and Electrical Engineering, ETH Individual talk Towards Unsupervised Learning 07.03.2019 Zurich, Switzerland Favaro Paolo;
Seminar at the Department of Electrical Engineering, University of Oxford Individual talk Towards Unsupervised Learning 23.11.2018 Oxford, Great Britain and Northern Ireland Favaro Paolo;
Seminar at the Department of Electrical Engineering, University of Padova Individual talk Beyond Unsupervised Learning 28.09.2018 Padova, Italy Favaro Paolo;
European Conference on Computer Vision Poster Deep Bilevel Learning 10.09.2018 Munich, Germany Jenni Simon; Favaro Paolo;
The Rank Prize Funds Individual talk Beyond Supervised Learning 22.08.2018 Grasmere, Great Britain and Northern Ireland Favaro Paolo;
International Computer Vision Summer School (ICVSS) 2018 Individual talk Beyond Supervised Learning 09.07.2018 Scicli, Italy Favaro Paolo;
Seminar at the Institute of Informatics, ETH Individual talk Beyond Supervised Learning 02.07.2018 Zurich, Switzerland Favaro Paolo;
PAISS 2018 Poster Self-Supervised Feature Learning by Learning to Spot Artifacts 02.07.2018 Grenoble, France Jenni Simon;
2nd Workshop in Beyond Supervised Learning at CVPR 2018 Talk given at a conference Unsupervised Learning and Knowledge Transfer 22.06.2018 Salt Lake City, United States of America Favaro Paolo;
IEEE Conference on Computer Vision and Pattern Recognition 2018 Talk given at a conference Self-Supervised Feature Learning by Learning to Spot Artifacts 19.06.2018 Salt Lake City, United States of America Jenni Simon;
IEEE Conference on Computer Vision and Pattern Recognition Poster On Stabilizing Generative Adversarial Training with Noise 16.06.2018 Long Beach, CA, United States of America Jenni Simon; Favaro Paolo;
Seminar at Novartis, Basel Individual talk Beyond Supervised Learning 23.02.2018 Basel, Switzerland Favaro Paolo;
Seminar at the Italian Institute of Technology Individual talk Beyond Supervised Learning 15.01.2018 Genova, Italy Favaro Paolo;


Awards

Title Year
PAISS best poster award 2018

Associated projects

Number Title Start Funding scheme
149227 UNSUPERVISED LEARNING OF 3D MODELS FOR OBJECT DETECTION AND CATEGORIZATION 01.01.2014 Project funding (Div. I-III)
188690 Unsupervised Learning of Interactions from Real Data 01.09.2020 Project funding (Div. I-III)

Abstract

Since the breakthrough performance of AlexNet on the Large Scale Visual Recognition Challenge (ILSVRC) in 2012, neural networks (NNs) have been used in virtually every area of computer vision. This breakthrough was made possible by the development of better learning methodologies for neural networks and by access to better hardware (GPUs in particular). Another important factor, however, was the availability of very large annotated datasets. These datasets were built through crowdsourcing internet services, such as those offered by Datatang and Amazon, which allow researchers to collect image annotations by distributing the task to workers worldwide. Unfortunately, such services also come with a number of limitations. The main one is their high cost, even though crowdsourcing is a very efficient and effective way to distribute work; costs rose further when Amazon's Mechanical Turk increased its fees from 10% to 40% in 2015. Moreover, depending on the complexity of the task, the manual preparation of annotations is not only expensive but also time-consuming, and it may require expertise that is not always available in sufficient supply (e.g., the annotation of medical imagery). In contrast, large quantities of unlabeled images and videos are readily available.
These limitations motivate us to look at unsupervised learning. Our main challenge is to define meaningful learning tasks without using manually assigned labels. To this end, in this project we propose to investigate a recent unsupervised learning paradigm called self-supervised learning. In self-supervised learning, instead of training a system to solve a given task, one trains it to solve a (related) pretext task, where labeling comes for free from the data. For example, given the frames of a video, one can use half of them to predict the other half; the labels are then taken directly from the data "for free". We therefore look at re-casting unsupervised learning as a supervised learning method in which the labels are extracted directly from the data. We formalize the set of all freely available labels as those that can be extracted directly from the data itself. We decompose the data (grayscale intensities or RGB values at all pixels and their corresponding 2D coordinates) into two disjoint sets: one that forms the new labels and one that forms the new data. How the original data is decomposed will depend on the nature of the data (e.g., the imaging modality) and of the task. Our objective is to achieve performance comparable to that of supervised methods on tasks such as classification and localization.
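The decomposition the abstract describes can be sketched in a few lines. The following toy example (an illustration only, not the project's actual method; the data and all names are made up) splits each sample into a "measurement" half and a "label" half and then applies an ordinary supervised tool, least-squares regression, to the resulting pretext task:

```python
import numpy as np

# Toy "sequences": noisy linear ramps of length 8, one slope per sample.
rng = np.random.default_rng(0)
slopes = rng.uniform(-1.0, 1.0, size=(200, 1))
t = np.arange(8)
sequences = slopes * t + 0.01 * rng.standard_normal((200, 8))

# Self-supervised split: decompose each sample into two disjoint parts.
X = sequences[:, :4]   # "measurement": the first half of each sequence
Y = sequences[:, 4:]   # "label": the second half, extracted for free

# A supervised tool (least squares) now solves the pretext task:
# predict the second half of a sequence from its first half.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

pred = X @ W
mse = float(np.mean((pred - Y) ** 2))
print(f"pretext-task MSE: {mse:.4f}")
```

To succeed at this pretext task, the predictor must capture how the two halves of a sample are related (here, the shared slope), which is exactly the structure one hopes the learned model will internalize.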