
G3E: Geometric Generative Gaze Estimation model

English title G3E: Geometric Generative Gaze Estimation model
Applicant Odobez Jean-Marc
Number 153085
Funding scheme Project funding (Div. I-III)
Research institution IDIAP Institut de Recherche
Institution of higher education Idiap Research Institute - IDIAP
Main discipline Information Technology
Start/End 01.04.2014 - 31.01.2015
Approved amount 50'789.00

Keywords (7)

HRI; graphical model; unsupervised adaptation; focus of attention; gaze tracking; human interactions; HCI

Lay Summary (translated from French)

Lead
As a display of attention and interest, gaze is a fundamental cue for understanding people's activities and behaviors, and therefore plays an important role in many fields such as psychology and human-computer or human-robot interfaces. Many highly accurate gaze estimation systems have been developed, but they often rely on specialized and expensive hardware; a solution based on low-cost hardware is needed. Moreover, to minimize intrusion and tolerate user movements, cameras with wide fields of view are preferable, but they bring the challenge of low eye-image resolution. From a methodological viewpoint, two main approaches exist: geometric methods, which rely on an explicit 3D model of the eye, and appearance-based approaches, which directly learn a mapping between the eye image and the gaze parameters.
Lay summary

Building on our previous work, we propose the G3E project to investigate a new head-pose independent gaze estimation method that combines the advantages of the two approaches mentioned above. It relies on a probabilistic process that models the generation of head-pose independent eye images, obtained thanks to the use of RGB-D cameras. By using an explicit geometric model, we handle head orientation and gaze in a unified framework that allows reasoning in 3D space and extrapolating to gaze directions not observed in the training data. At the same time, by modeling the semantic regions of the eye separately, we decouple the geometric aspect of gaze from the ambient illumination conditions, while avoiding the critical eye-feature detection and tracking problem (of the iris in particular) faced by geometric methods.
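
As an illustration of this unified geometric treatment (the notation below is ours and purely illustrative, not taken from the project documents), the 3D line of sight can be written as the head rotation applied to an eye-in-head direction parameterized by a yaw angle θ and a pitch angle φ:

g = R_head · [ sin θ cos φ, −sin φ, −cos θ cos φ ]ᵀ

so that head pose and gaze share a single 3D frame, and gaze directions absent from the training data can still be reached by varying θ and φ.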

The project will study different modeling options for addressing the problem, in particular different inference methods for estimating the parameters from low-resolution images, as well as unsupervised and adaptive approaches relying on various appropriate priors (skin color, multimodal interactions, etc.).

The project thus addresses a fundamental component of the perception of human-human and human-robot interactions, and will therefore make an important contribution to the field of social signal processing, which aims to develop models that allow machines to understand communication processes and social behavior; more generally, it will open new perspectives for the non-intrusive observation of human activities in various settings.

Last update: 28.03.2014

Responsible applicant and co-applicants

Employees

Name Institute

Publications

Publication
Deciphering the Silent Participant: On the Use of Audio-Visual Cues for the Classification of Listener Categories in Group Discussions
Oertel Catharine, Funes Mora Kenneth, Gustafson Joakim, Odobez Jean-Marc (2015), Deciphering the Silent Participant: On the Use of Audio-Visual Cues for the Classification of Listener Categories in Group Discussions, in Proceedings of the 2015 ACM on International Conference on Multimodal Interaction (ICMI), Seattle (USA).
Gaze Estimation in the 3D Space Using RGB-D sensors - Towards Head-Pose And User Invariance
Funes Mora Kenneth, Odobez Jean-Marc (2015), Gaze Estimation in the 3D Space Using RGB-D sensors - Towards Head-Pose And User Invariance, in International Journal of Computer Vision, 1-23.
Geometric Generative Gaze Estimation (G3E) for Remote RGB-D Cameras
Funes Mora Kenneth, Odobez Jean-Marc (2014), Geometric Generative Gaze Estimation (G3E) for Remote RGB-D Cameras, in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, Columbus, OH.
Tracking and Automatic Gaze Coding from RGB-D Cameras
Funes Mora Kenneth, Odobez Jean-Marc (2014), Tracking and Automatic Gaze Coding from RGB-D Cameras, in CVPR Workshop: When vision meets cognition, Columbus, OH.
Who will get the grant?: A multimodal corpus for the analysis of conversational behaviours in group interviews
Oertel Catharine, Funes Mora Kenneth, Sheikhi Samira, Odobez Jean-Marc, Gustafson Joakim (2014), Who will get the grant?: A multimodal corpus for the analysis of conversational behaviours in group interviews, in UM3I '14 Proceedings of the 2014 workshop on Understanding and Modeling Multiparty, Multimodal Interactions, Istanbul.

Collaboration

Group / person Country
Types of collaboration
Dept. of Speech, Music and Hearing, KTH - Royal Institute of Technology Sweden (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Research Infrastructure

Scientific events

Active participation

Title Type of contribution Title of article or contribution Date Place Persons involved
7th International Symposium on Attention in Cognitive Systems Talk given at a conference Attention recognition: from contextual analysis of head poses to 3D gaze tracking using remote RGB-D sensors 01.12.2014 CITEC, Bielefeld, Germany Odobez Jean-Marc;
2014 IEEE Conference on Computer Vision and Pattern Recognition Poster Geometric Generative Gaze Estimation (G3E) for Remote RGB-D Cameras 23.06.2014 Columbus, OH, United States of America Funes Mora Kenneth;
Technicolor Scientific Day Talk given at a conference 3D gaze tracking using remote RGB-D sensors 23.04.2014 Rennes, France Odobez Jean-Marc;
KTH Speech and Hearing Group Seminar Individual talk 3D head pose and gaze tracking using remote RGB-D sensors 04.04.2014 KTH, Stockholm, Sweden Funes Mora Kenneth;
Eye Tracking Research and Applications (ETRA) Poster EYEDIAP: a database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras 01.04.2014 Safety Harbor, Florida, United States of America Funes Mora Kenneth;


Communication with the public

Communication Title Media Place Year
Video/Film Gaze Coding in Natural Dyadic Interactions International 2015
New media (web, blogs, podcasts, news feeds etc.) Geometric Generative Gaze Estimation model from RGB-D sensors website International 2015

Awards

Title Year
Idiap PhD Student Research Award: An award to recognize the PhD Student who has made outstanding contributions to publications, collaboration, project involvement, communication skills, and autonomy (out of around 35 PhD students) 2014

Use-inspired outputs

Associated projects

Number Title Start Funding scheme
130152 Robust face tracking, feature extraction and multimodal fusion for audio-visual speech recognition and visual attention modeling in complex environment 01.04.2010 Project funding (Div. I-III)

Abstract

As a display of attention and interest, gaze is a fundamental cue in understanding people's activities, behaviors, and state of mind, and plays an important role in many research fields like psychology, Human-Robot Interaction (HRI) or Human-Computer Interfaces (HCI). For these reasons, many computer vision based gaze estimation methods have been proposed. Some achieve high accuracy but require expensive or specialized hardware. A solution based on consumer hardware is needed, and to minimize intrusion and accommodate user movement, remote cameras with a wide enough field of view are preferred, although they lead to the challenge of low-resolution imaging. From a methodological viewpoint, two main approaches exist. Geometric ones, based on explicit eye geometry models, can be very accurate but rely on high-resolution images to fit and track the local features (glints, pupil center, ...) used to estimate the geometric parameters. On the other side, appearance-based methods, which learn a direct mapping between the eye image and the gaze parameters, avoid feature tracking but often need test data close to the training set in terms of user, gaze space, and illumination conditions.

Leveraging our previous work, we propose the current G3E project to investigate a novel head-pose independent gaze estimation method that takes advantage of both the appearance and the geometric methods. It relies on an appearance-based probabilistic generative process that models the generation of head-pose independent eye images recovered thanks to the use of consumer RGB-D cameras. By using an explicit geometric gaze model, we will handle head pose and gaze direction in a unified framework, allowing 3D space reasoning and extrapolation to gaze directions not seen in the training data. On the other hand, by modeling the generation of semantic regions (eyelids, cornea, sclera), we will decouple the gazing process and user geometry from the ambient conditions (color appearance), while avoiding the critical local feature (cornea/iris) fitting and tracking problems of standard geometric methods.

The project will study different modeling options to address the problem (details of the geometric model, eye coupling, image stabilization), and in particular several inference schemes to achieve the difficult learning from low-resolution images. In a second thread, we will investigate the Bayesian properties of the model to address unsupervised learning and adaptation to the user (eye geometry), the session, and the interpretation, by leveraging relevant priors (eye color palettes, multimodal communication patterns like eye contact or gaze aversion, and human attitudes in dynamic human-human interaction settings).

The project will thus address a fundamental component of human-human or human-computer communication perception, and will make an important contribution to the field of social signal processing, which aims at the development of computational models for machine understanding of communicative and social behavior, as well as to HRI and HCI applications such as multimodal information desk design.
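
The following is a minimal, hypothetical sketch of this generative idea (our own toy code, not the project's implementation; the image size, eyeball and iris radii, color palettes and function names are assumptions made for illustration). Gaze angles place an iris on a simple eyeball model to produce a semantic label map (eyelid/skin, sclera, iris) over a head-pose-rectified, low-resolution eye crop, and the observed pixel colors are then scored under per-region Gaussian color models, which is what decouples the gaze geometry from the ambient color conditions.

import numpy as np

H, W = 24, 36          # low-resolution eye crop size (assumed)
EYE_RADIUS = 12.0      # projected eyeball radius in pixels (assumed)
IRIS_RADIUS = 5.0      # projected iris radius in pixels (assumed)

def semantic_map(yaw, pitch, openness=0.8):
    """Label each pixel as 0=eyelid/skin, 1=sclera, 2=iris for given gaze angles."""
    ys, xs = np.mgrid[0:H, 0:W]
    cx, cy = W / 2.0, H / 2.0
    # The iris center moves over an orthographically projected eyeball sphere.
    ix = cx + EYE_RADIUS * np.sin(yaw) * np.cos(pitch)
    iy = cy - EYE_RADIUS * np.sin(pitch)
    labels = np.ones((H, W), dtype=int)                             # sclera by default
    labels[(xs - ix) ** 2 + (ys - iy) ** 2 < IRIS_RADIUS ** 2] = 2  # iris disc
    labels[np.abs(ys - cy) > openness * H / 2.0] = 0                # crude eyelid/skin band
    return labels

def log_likelihood(image, yaw, pitch, palettes):
    """Sum of per-pixel Gaussian log-densities under the per-region color models."""
    labels = semantic_map(yaw, pitch)
    ll = 0.0
    for region, (mean, var) in palettes.items():
        pix = image[labels == region]   # pixels assigned to this semantic region
        ll += -0.5 * np.sum((pix - mean) ** 2 / var + np.log(2 * np.pi * var))
    return ll

# Toy usage: score one candidate gaze direction against a synthetic observation.
palettes = {0: (np.array([0.70, 0.50, 0.40]), 0.02),   # eyelid/skin color model (assumed)
            1: (np.array([0.90, 0.90, 0.90]), 0.01),   # sclera color model (assumed)
            2: (np.array([0.20, 0.15, 0.10]), 0.01)}   # iris color model (assumed)
observed = np.random.default_rng(0).uniform(0.0, 1.0, size=(H, W, 3))
print(log_likelihood(observed, yaw=0.2, pitch=-0.1, palettes=palettes))

In the full model, such a likelihood would be embedded in a Bayesian framework so that the eye geometry and the color palettes can be inferred or adapted per user and per session, as described above.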