Project

PHASER: Parsimonious Hierarchical Automatic Speech Recognition

Applicant Bourlard Hervé
Number 153507
Funding scheme Project funding (Div. I-III)
Research institution IDIAP Institut de Recherche
Institution of higher education Idiap Research Institute - IDIAP
Main discipline Information Technology
Start/End 01.06.2014 - 31.05.2016
Approved amount CHF 307'029.00

Keywords (4)

Automatic Speech Recognition (ASR); Deep Architectures; Hierarchical Sparse Coding; Hierarchical Posterior-based ASR

Lay Summary (French)

Lead
PHASER takes a new perspective on the Automatic Speech Recognition (ASR) problem, formulating it as a hierarchical sparse recovery problem. The approach sought should be able to model temporal properties, exploit model parsimony and hierarchical decoding structures, and integrate the phonetic and lexical constraints currently modeled through the predefined HMM topology.
Lay summary
PHASER approaches automatic speech recognition from the perspective of hierarchical sparse representation. More specifically, the project exploits and integrates recent developments in posterior-based recognition systems, in hybrid HMM/ANN systems combining Hidden Markov Models (HMMs) and Artificial Neural Networks (ANNs), including deep multi-layer neural networks (DNNs), in compressive sensing and sparse modeling, and in hierarchical sparse coding. The resulting model should be able to capture the temporal properties of the speech signal while exploiting the sparse and hierarchical properties that underlie speech recognition systems, both automatic and human. One of the starting points of the project is the recent discovery of important theoretical links between statistical HMMs and hierarchical sparse models.

 

Last update: 27.06.2014

Publications

Publication
Binary Sparse Coding of Convolutive Mixtures for Sound Localization and Separation via Spatialization
Asaei Afsaneh, Taghizadeh Mohammad J., Haghighatshoar Saeid, Raj Bhiksha, Bourlard Hervé, Cevher Volkan (2016), Binary Sparse Coding of Convolutive Mixtures for Sound Localization and Separation via Spatialization, in IEEE Transactions on Signal Processing, 64(3), 567-579.
Computational Methods for Underdetermined Convolutive Speech Localization and Separation via Model-based Sparse Component Analysis
Asaei Afsaneh, Bourlard Hervé, Taghizadeh Mohammad J., Cevher Volkan (2016), Computational Methods for Underdetermined Convolutive Speech Localization and Separation via Model-based Sparse Component Analysis, in Speech Communication, 76, 201-217.
Exploiting Low-dimensional Structures to Enhance DNN based Acoustic Modeling in Speech Recognition
Dighe Pranay, Luyet Gil, Asaei Afsaneh, Bourlard Hervé (2016), Exploiting Low-dimensional Structures to Enhance DNN based Acoustic Modeling in Speech Recognition, in The 41st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Shanghai, China.
Low-Rank Representation of Nearest Neighbor Phone Posterior Probabilities to Enhance DNN Acoustic Modeling
Luyet Gil, Dighe Pranay, Asaei Afsaneh, Bourlard Hervé (2016), Low-Rank Representation of Nearest Neighbor Phone Posterior Probabilities to Enhance DNN Acoustic Modeling, in INTERSPEECH, San Francisco, USA.
PAoS Markers: Trajectory Analysis of Selective Phonological Posteriors for Assessment of Progressive Apraxia of Speech
Asaei Afsaneh, Cernak Milos, Laganaro Marina (2016), PAoS Markers: Trajectory Analysis of Selective Phonological Posteriors for Assessment of Progressive Apraxia of Speech, in The 7th Workshop on Speech and Language Processing for Assistive Technologies (SLPAT), San Francisco, USA.
Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures
Asaei Afsaneh, Luyet Gil, Cernak Milos, Bourlard Hervé (2016), Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures, in INTERSPEECH, San Francisco, USA.
Sound Pattern Matching for Automatic Prosodic Event Detection
Cernak Milos, Asaei Afsaneh, Honnet Pierre-Edouard, Garner Philip N., Bourlard Hervé (2016), Sound Pattern Matching for Automatic Prosodic Event Detection, in INTERSPEECH, San Francisco, USA.
Sparse Modeling of Neural Network Posterior Probabilities for Exemplar-based Speech Recognition
Dighe Pranay, Asaei Afsaneh, Bourlard Hervé (2016), Sparse Modeling of Neural Network Posterior Probabilities for Exemplar-based Speech Recognition, in Speech Communication, 76, 230-244.
Subspace Detection of DNN Posterior Probabilities via Sparse Representation for Query by Example Spoken Term Detection
Ram Dhananjay, Asaei Afsaneh, Bourlard Hervé (2016), Subspace Detection of DNN Posterior Probabilities via Sparse Representation for Query by Example Spoken Term Detection, in INTERSPEECH, San Francisco, USA.
Dictionary Learning for Sparse Representation of Neural Network Exemplars in Speech Recognition
Dighe Pranay, Asaei Afsaneh, Bourlard Hervé (2015), Dictionary Learning for Sparse Representation of Neural Network Exemplars in Speech Recognition, in Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS), Cambridge, UK.
Novel GCC-PHAT Model in Diffuse Sound Field for Microphone Array Pairwise Distance Based Calibration
Velasco Jose, Taghizadeh Mohammad J., Asaei Afsaneh, Bourlard Hervé, Martin-Arguedas Carlos J., Macias-Guarasa Javier, Pizarro Daniel (2015), Novel GCC-PHAT Model in Diffuse Sound Field for Microphone Array Pairwise Distance Based Calibration, in The 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia.
On Application Of Non-Negative Matrix Factorization for Ad Hoc Microphone Array Calibration from Incomplete Noisy Distances
Asaei Afsaneh, Mohammadiha Nasser, Taghizadeh Mohammad J., Doclo Simon, Bourlard Hervé (2015), On Application Of Non-Negative Matrix Factorization for Ad Hoc Microphone Array Calibration from Incomplete Noisy Distances, in The 40th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia.
On Compressibility of Neural Network Phonological Features for Low Bit Rate Speech Coding
Asaei Afsaneh, Cernak Milos, Bourlard Hervé (2015), On Compressibility of Neural Network Phonological Features for Low Bit Rate Speech Coding, in INTERSPEECH, Dresden, Germany.
Sparse Modeling of Neural Network Posterior Probabilities for Exemplar-Based Speech Recognition
Dighe Pranay, Asaei Afsaneh, Bourlard Hervé (2015), Sparse Modeling of Neural Network Posterior Probabilities for Exemplar-Based Speech Recognition, in Workshop on Signal Processing with Adaptive Sparse Structured Representations (SPARS), Cambridge, UK.
Sparse Modeling of Posterior Exemplars for Keyword Detection
Ram Dhananjay, Asaei Afsaneh, Dighe Pranay, Bourlard Hervé (2015), Sparse Modeling of Posterior Exemplars for Keyword Detection, in INTERSPEECH, Dresden, Germany.
Spatial Sound Localization via Multipath Euclidean Distance Matrix Recovery
Taghizadeh Mohammad J., Asaei Afsaneh, Haghighatshoar Saeid, Garner Philip N., Bourlard Hervé (2015), Spatial Sound Localization via Multipath Euclidean Distance Matrix Recovery, in IEEE Journal of Selected Topics in Signal Processing, 9, 802-814.
Posterior-based Sparse Representation for Automatic Speech Recognition
Bahaadini Sara, Asaei Afsaneh, Imseng David, Bourlard Hervé (2014), Posterior-based Sparse Representation for Automatic Speech Recognition, in INTERSPEECH, Singapore.
Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding
Cernak Milos, Lazaridis Alexandros, Asaei Afsaneh, Garner Philip N., Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding, in IEEE/ACM Transactions on Audio, Speech and Language Processing, T-ASL-0569.
On Structured Sparsity of Phonological Posteriors for Linguistic Parsing
Cernak Milos, Asaei Afsaneh, Bourlard Hervé, On Structured Sparsity of Phonological Posteriors for Linguistic Parsing, in Speech Communication, SPECOM_201.
TDOA Matrices: Algebraic Properties and their Application to Robust Denoising with Missing Data
Velasco Jose, Pizarro Daniel, Macias-Guarasa Javier, Asaei Afsaneh, TDOA Matrices: Algebraic Properties and their Application to Robust Denoising with Missing Data, in IEEE Transactions on Signal Processing, (99), T-SP-20128.

Collaboration

Group / person (country) and types of collaboration
International Computer Science Institute, University of California (United States of America, North America)
- in-depth/constructive exchanges on approaches, methods or results
Laboratory for Information and Inference Systems at EPFL (Switzerland, Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
Compressive sensing group at the University of Edinburgh, Scotland (Great Britain and Northern Ireland, Europe)
- in-depth/constructive exchanges on approaches, methods or results
Machine learning for signal processing group at Carnegie Mellon University, Pittsburgh (United States of America, North America)
- Publication

Awards

IEEE Senior Member (2016)

Associated projects

Number | Title | Start | Funding scheme
169398 | PHASER-QUAD: Parsimonious Hierarchical Automatic Speech Recognition and Query Detection | 01.10.2016 | Project funding (Div. I-III)
175589 | Sparse and hierarchical Structures for Speech Modeling (SHISSM) | 01.03.2018 | Project funding (Div. I-III)

Abstract

PHASER takes a new perspective on the Automatic Speech Recognition (ASR) problem, formulating it as a hierarchical sparse recovery problem. More specifically, the project will exploit and integrate, in a principled way, recent developments in posterior-based ASR systems, hybrid HMM/ANN systems combining Hidden Markov Models (HMMs) and Artificial Neural Networks (ANNs), Deep Neural Networks (DNNs), compressive sensing, sparse modeling, and hierarchical sparse coding for ASR. The approach sought should be able to model temporal properties, exploit model parsimony and hierarchical decoding structures, and integrate the phonetic and lexical constraints currently modeled through the predefined HMM topology.

Besides further research and development in these areas, one of the key pivots of the present proposal is the recently identified strong relationship between statistical HMM techniques (with HMM states as latent variables) and the compressive sensing formalism, in which the atoms of the compressive sensing dictionary can be estimated automatically from the posterior distributions of HMM states.

Building on and further developing our state-of-the-art ASR tools (many of them available as Idiap open-source software or integrated into other open-source systems such as Kaldi), the resulting systems will be evaluated in different contexts, including isolated-word recognition, continuous speech recognition, and multilingual recognition.
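The link between posterior distributions and dictionary atoms described above can be illustrated with a small sketch. It assumes that the atoms are posterior vectors (exemplars) taken from training frames and labelled by their dominant class. It also assumes that a test posterior is coded by a non-negative lasso solved with the iterative soft-thresholding algorithm (ISTA). The toy 3-class data and the names `reconstruct` and `nonneg_lasso_ista` are hypothetical. This is not the project's actual pipeline, and the published systems may use different dictionaries and solvers.

```python
"""Toy sparse coding of a posterior vector over a dictionary of posterior exemplars.

Hypothetical illustration only: the dictionary, the test vector and the choice
of a non-negative lasso solved by ISTA are assumptions for this sketch, not the
PHASER project's actual pipeline.
"""
from typing import List

Vector = List[float]


def reconstruct(atoms: List[Vector], coeffs: Vector) -> Vector:
    """Return D x, where each atom is one column of the dictionary D."""
    out = [0.0] * len(atoms[0])
    for atom, c in zip(atoms, coeffs):
        for i, a in enumerate(atom):
            out[i] += c * a
    return out


def nonneg_lasso_ista(atoms: List[Vector], y: Vector,
                      lam: float = 0.01, iters: int = 2000) -> Vector:
    """Minimise 0.5*||y - D x||^2 + lam*||x||_1 subject to x >= 0 with ISTA.

    The step size 1/L uses L = ||D||_F^2, an upper bound on the largest
    eigenvalue of D^T D, which is enough to guarantee convergence.
    """
    lipschitz = sum(a * a for atom in atoms for a in atom)
    x = [0.0] * len(atoms)
    for _ in range(iters):
        # Residual D x - y, computed once per sweep from the current iterate.
        resid = [r - t for r, t in zip(reconstruct(atoms, x), y)]
        for k, atom in enumerate(atoms):
            grad_k = sum(a * r for a, r in zip(atom, resid))  # (D^T (D x - y))_k
            # Gradient step, then the prox of lam*||.||_1 plus the constraint x >= 0.
            x[k] = max(0.0, x[k] - (grad_k + lam) / lipschitz)
    return x


# Hypothetical exemplars: 3-class posterior vectors, labelled by dominant class.
ATOMS = [
    [0.90, 0.05, 0.05], [0.80, 0.10, 0.10],  # class 0
    [0.05, 0.90, 0.05], [0.10, 0.80, 0.10],  # class 1
    [0.05, 0.05, 0.90], [0.10, 0.10, 0.80],  # class 2
]
LABELS = [0, 0, 1, 1, 2, 2]

test_posterior = [0.70, 0.20, 0.10]
alpha = nonneg_lasso_ista(ATOMS, test_posterior)
projected = reconstruct(ATOMS, alpha)
class_mass = [sum(a for a, lab in zip(alpha, LABELS) if lab == c) for c in range(3)]

print("coefficients:", [round(a, 3) for a in alpha])
print("class mass:", [round(m, 3) for m in class_mass])
print("reconstructed posterior:", [round(p, 3) for p in projected])
```

Summing the coefficients per class gives a crude class score. The reconstruction D x stays within the cone spanned by the exemplars. ISTA is used here only to keep the sketch dependency-free; dedicated sparse solvers converge much faster.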