MULTI: Multimodal Interaction and Multimedia Data Mining

English title: MULTI: Multimodal Interaction and Multimedia Data Mining
Applicant: Bourlard Hervé
Number: 122062
Funding scheme: Project funding (Div. I-III)
Research institution: IDIAP Institut de Recherche
Institution of higher education: Idiap Research Institute - IDIAP
Main discipline: Information Technology
Start/End: 01.10.2008 - 30.09.2011
Approved amount: 1'196'956.00

Keywords

automatic speech and audio processing; computer vision; indexing and retrieval of multimedia information; advanced human-computer interaction; biometric authentication; machine learning; audio; speech and language processing

Lay Summary (English)

The goal of the MULTI project is to carry out fundamental research in several related fields of multimodal interaction and multimedia data mining. This covers recognition and interpretation of spoken, written and gestural language; indexing and retrieval of complex multimedia information; advanced biometric user authentication techniques (including speaker and face verification); and advanced machine learning and data fusion algorithms. By “multimedia information” we refer here to all static and dynamic images (photos, animations and video), sounds (speech recordings, music, and general audio information), and moving or static text, as well as the sharing of this information (e.g., through social networks). More recently, MULTI also started focusing on the extraction, modeling and understanding of “social signals” (nonverbal signals).
More specifically, the research areas addressed by MULTI cover six general research themes: (1) machine learning, (2) speech and audio processing, (3) computer vision, (4) information retrieval, (5) biometric authentication, and (6) multimodal interaction. Specific research activities covered by the current funding period include: Bottom-up recognition of speech sounds; Cross-modal weakly supervised learning; Social Network Analysis for multimedia indexing problems; Multi-View Face Detection; Modeling Social Media; Detection and description of unexpected words in ASR; Activity modeling from multiple sensors; Joint Bi-Modal Person Authentication; Social Signal Processing; Robust 3D Head Tracking and Head Gesture Recognition.
Whenever possible and appropriate, all MULTI research projects work within the framework of common IDIAP tools (common task, common databases and common software), which are related to “human-to-human communication modeling and understanding”.
Indeed, researchers in multimodal processing have recently started focusing, as a particularly challenging application, on the modeling and understanding of human-to-human communication and on creating novel technologies to augment the ways in which people communicate with each other, not only through computers but also with computer assistance in the background. We believe that meeting processing provides an ideal, unified framework to investigate most of the multimodal interaction and multimedia data mining technologies mentioned above.
Last update: 21.02.2013

Publications

Venkatesh BS, Marcel S, An Alternative Scanning Strategy to Detect Faces, IEEE ICASSP 2010.
Venkatesh BS, Marcel S, Fast Bounding Box Estimation based Face Detection, Springer ECCV 2010.
Roy A, Magimai-Doss M, Marcel S, Phoneme Recognition using Boosted Binary Features, IEEE ICASSP 2010.

Associated projects

Number | Title | Start | Funding scheme
132620 | Human activity and interactivity modeling (HAI) | 01.10.2010 | Project funding (Div. I-III)
132619 | Interactive Cognitive Systems (ICS) | 01.10.2010 | Project funding (Div. I-III)
128771 | DM3: Distributed MultiModal Media server, a low cost large capacity high throughput data storage system | 01.03.2010 | R'EQUIP
113615 | MULTI: Multimodal Interaction and Multimedia Data Mining | 01.10.2006 | Project funding (Div. I-III)
144281 | Adaptive Multilingual Speech Processing | 01.10.2012 | Project funding (Div. I-III)
