Object Detection with Active Sample Harvesting (DASH)

English title: Object Detection with Active Sample Harvesting (DASH)
Applicant: Fleuret François
Number: 140912
Funding scheme: Project funding (Div. I-III)
Research institution: IDIAP Institut de Recherche
Institution of higher education: Idiap Research Institute - IDIAP
Main discipline: Information Technology
Start/End: 01.10.2012 - 30.09.2016
Approved amount: 226'391.00

Keywords (4)

Topical Web-crawler; Machine learning; Active learning; Unsupervised learning

Lay Summary (original in French)

Lead
The goal of this project is to simplify the production of the examples required by statistical learning techniques. Our research will follow two main directions. The first is to build models that determine whether a source of training examples is informative, and to modulate the use of the different sources accordingly. The second is to generalize active learning methods to algorithms that rely on a detailed labeling of the data.
Lay summary
Statistical learning techniques have become fundamental components of signal processing for applications such as object detection and recognition, or automatic speech processing.

The main weakness of these methods is their need for large sets of examples. We propose two complementary approaches to make this data easier to produce.

The first, intended for unsupervised methods, will decompose very large training sets into subgroups and build predictors to assess which of these subgroups contain the most informative examples. We can then use these predictors to bias usage toward the "best" subsets.

Our second approach will extend active learning methods. While most existing methods only determine which example should be labeled, we will study how to choose not only the example to label but also the kind of information to attach to it.
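
A minimal sketch of this joint choice, assuming each candidate query pairs an example with an annotation level and comes with an estimated information gain. The level names, the costs, and the gain values are hypothetical placeholders, and the gain-per-cost criterion follows the ratio of information to labeling cost mentioned in the abstract; this is an illustration, not the project's actual procedure.

```python
from dataclasses import dataclass

# Hypothetical annotation levels with assumed relative labeling costs.
LABEL_COSTS = {"presence": 1.0, "bounding_box": 5.0, "part_locations": 20.0}


@dataclass(frozen=True)
class Query:
    sample_id: str        # which example to label
    level: str            # which kind of information to request
    expected_gain: float  # estimated information gain (assumed to be given)


def best_query(candidates):
    """Return the (example, level) query with the highest gain-per-cost ratio."""
    return max(candidates, key=lambda q: q.expected_gain / LABEL_COSTS[q.level])


candidates = [
    Query("img_001", "presence", 0.25),
    Query("img_001", "part_locations", 4.0),
    Query("img_002", "presence", 0.5),
    Query("img_002", "bounding_box", 3.0),
]
chosen = best_query(candidates)
print(chosen.sample_id, chosen.level)
```

In practice the hard part is estimating `expected_gain` for each pair; the sketch treats it as an input so that only the selection rule is shown.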

This research will make it possible to extend the use of modern machine learning methods to other application domains at a reduced cost.
Last update: 08.01.2013

Publications

Publication
Importance Sampling Tree for Large-scale Empirical Expectation
Canévet Olivier, Jose Cijo, Fleuret François (2016), Importance Sampling Tree for Large-scale Empirical Expectation, in Proceedings of the International Conference on Machine Learning, New-York, MIT Press, New-York.
Large Scale Hard Sample Mining with Monte Carlo Tree Search
Canévet Olivier, Fleuret François (2016), Large Scale Hard Sample Mining with Monte Carlo Tree Search, in Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, Las-Vegas, USA, IEEE, New-York.
Efficient Sample Mining for Object Detection
Canévet Olivier, Fleuret François (2014), Efficient Sample Mining for Object Detection, in Proceedings of the Asian Conference on Machine Learning, Nha Trang, Vietnam, MIT Press, Cambridge.
Sample Distillation for Object Detection and Image Classification
Canévet Olivier, Lefakis Leonidas, Fleuret François (2014), Sample Distillation for Object Detection and Image Classification, in Proceedings of the Asian Conference on Machine Learning, Nha Trang, Vietnam, MIT Press, Cambridge.

Scientific events

Active participation

Event | Type of contribution | Title of article or contribution | Date | Place | Persons involved
IEEE International Conference on Computer Vision and Pattern Recognition | Poster | Large Scale Hard Sample Mining with Monte Carlo Tree Search | 26.06.2016 | Las-Vegas, United States of America | Canévet Olivier
International Conference on Machine Learning | Talk given at a conference | Importance Sampling Tree for Large-scale Empirical Expectation | 19.06.2016 | New-York, United States of America | Canévet Olivier; Fleuret François
Asian Conference on Machine Learning | Talk given at a conference | Sample Distillation for Object Detection and Image Classification | 26.11.2014 | Nha Trang, Vietnam | Canévet Olivier
Asian Conference on Machine Learning | Talk given at a conference | Efficient Sample Mining for Object Detection | 26.11.2014 | Nha Trang, Vietnam | Canévet Olivier


Associated projects

Number | Title | Start | Funding scheme
126920 | Training Embedded Visions Systems | 01.10.2009 | Project funding (Div. I-III)
147693 | Tracking in the Wild | 01.01.2014 | Sinergia
124822 | Very Large Sets of Heuristics for Scene Interpretation (VELASH) | 01.09.2009 | Project funding (Div. I-III)
164022 | Platform for Reproducible Acquisition, Processing, and Sharing of Dynamic, Multi-Modal Data | 01.07.2016 | R'EQUIP

Abstract

Practical deployment of Machine Learning techniques relies on the existence of large training data sets, which exhibit all the difficulties to be met in practice, and are labeled by human experts.

While critical, the production of training data remains a poorly studied subject. Efficient labeling has been tackled with active learning, to concentrate the human effort on the examples which truly influence the learning. The production of unlabeled data, however, has not been studied in itself.

The objective of this project is to address these two tasks, and to develop novel, efficient, and mathematically sound procedures to produce very large quantities of labeled data to train part-based object detectors.

The first part of this proposal focuses on the extension of active learning to the particular situation of object detection with part-based models. We will define multiple levels of information, spanning from the mere presence of an object in an image to the locations of its individual parts. From there, instead of simply identifying subsets of samples whose labels are likely to be informative, the procedure we envision will select pairs of samples and levels of labeling, so that the ratio of information to labeling cost is maximized.

The second part will address the production of unlabeled data. We introduce the idea of "data harvesters", web-crawling daemons built upon goal-planning algorithms. These harvesters will model the relation between information attached to the web source of images (web site, textual context of images, date and time, GPS coordinates, camera type, etc.) and their usefulness for training. From this model, harvesters will implement goal-planning strategies to properly balance exploitation and exploration, trying both to localize good sources of data for training and to download data from the ones already identified.
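
The exploration/exploitation balance described for the harvesters can be illustrated with a standard multi-armed bandit rule. The sketch below uses UCB1, which is a stand-in chosen for brevity and not the goal-planning algorithm of the proposal; the source names and their usefulness probabilities are hypothetical and serve only to simulate feedback on whether a downloaded image helps training.

```python
import math
import random


def ucb1_harvest(sources, usefulness, rounds, seed=0):
    """Choose one source per round with UCB1 and return download counts.

    sources:    list of source names (hypothetical).
    usefulness: dict mapping each source to the probability that a downloaded
                image is useful; hidden from the rule, used only to simulate.
    """
    rng = random.Random(seed)
    counts = {s: 0 for s in sources}    # downloads per source
    totals = {s: 0.0 for s in sources}  # accumulated usefulness per source
    for t in range(1, rounds + 1):
        untried = [s for s in sources if counts[s] == 0]
        if untried:
            # Exploration: try every source once before comparing them.
            choice = untried[0]
        else:
            # Mean usefulness (exploitation) plus a bonus that shrinks as a
            # source is sampled more often (exploration).
            choice = max(
                sources,
                key=lambda s: totals[s] / counts[s]
                + math.sqrt(2.0 * math.log(t) / counts[s]),
            )
        reward = 1.0 if rng.random() < usefulness[choice] else 0.0
        counts[choice] += 1
        totals[choice] += reward
    return counts


sources = ["site_a", "site_b", "site_c"]
usefulness = {"site_a": 0.8, "site_b": 0.3, "site_c": 0.1}
counts = ucb1_harvest(sources, usefulness, rounds=2000)
print(counts)
```

A harvester as described in the abstract would additionally predict usefulness from source attributes (web site, textual context, camera type, and so on) rather than treating each source as an independent arm.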