Phoneme; Grapheme; Automatic Speech Recognition; Hidden Markov Models; Kullback-Leibler Divergence
Rasipuram Ramya, Bell Peter, Magimai.-Doss Mathew (2013), Grapheme and Multilingual Posterior Features for Under-Resourced Speech Recognition: A Study on Scottish Gaelic, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2013, Vancouver, Canada, IEEE.
Rasipuram Ramya, Magimai.-Doss Mathew (2013), Improving Grapheme-based ASR by Probabilistic Lexical Modeling Approach, in Proceedings of Interspeech 2013, Lyon, France, ISCA.
Rasipuram Ramya, Magimai.-Doss Mathew (2013), KL-HMM and Probabilistic Lexical Modeling, Idiap Research Report Idiap-RR-04-2013, Martigny, Switzerland.
Rasipuram Ramya, Magimai.-Doss Mathew (2013), Probabilistic Lexical Modeling and Grapheme-based Automatic Speech Recognition, Idiap Research Report Idiap-RR-15-2013, Martigny, Switzerland.
Rasipuram Ramya, Magimai.-Doss Mathew (2012), Acoustic Data-driven Grapheme-to-Phoneme Conversion Using KL-HMM, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2012, Kyoto, Japan, IEEE.
Rasipuram Ramya, Magimai.-Doss Mathew (2012), Combining Acoustic Data Driven G2P and Letter-to-Sound Rules for Under Resource Lexicon Generation, in Proceedings of Interspeech 2012, Portland, USA, ISCA.
Imseng David, Rasipuram Ramya, Magimai.-Doss Mathew (2011), Fast and Flexible Kullback-Leibler Divergence Based Acoustic Modeling for Non-native Speech Recognition, in Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) 2011, Hawaii, USA, IEEE.
Magimai.-Doss Mathew, Rasipuram Ramya, Aradilla Guillermo (2011), Grapheme-based Automatic Speech Recognition Using KL-HMM, in Proceedings of Interspeech 2011, Florence, Italy, ISCA.
Rasipuram Ramya, Magimai.-Doss Mathew (2011), Improving Articulatory Feature and Phoneme Recognition Using Multitask Learning, in Proceedings of the International Conference on Artificial Neural Networks (ICANN) 2011, Espoo, Finland.
Rasipuram Ramya, Magimai.-Doss Mathew (2011), Integrating Articulatory Features Using Kullback-Leibler Divergence Based Acoustic Model for Phoneme Recognition, in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2011, Prague, Czech Republic, IEEE.
Rasipuram Ramya, Magimai.-Doss Mathew (2011), Multitask Learning to Improve Articulatory Feature Estimation and Phoneme Recognition, Idiap Research Report Idiap-RR-21-2011, Martigny, Switzerland.
Current state-of-the-art automatic speech recognition (ASR) systems commonly use hidden Markov models (HMMs), where phonemes (phones) are assumed to be the intermediate subword units. Given the high (speaker and contextual) variability of these elementary units, state-of-the-art systems have to rely on complex statistical modelling (multidimensional Gaussian mixture models with a large number of mixture components). Such systems also require some minimum phonetic expertise, since every word to be recognized has to be explicitly modelled as a Markov model capturing its official phonetic transcription (usually found in a dictionary), as well as its pronunciation variants. In spite of its relative success, this approach remains quite cumbersome when dealing with (unavoidable) new words or when deploying new languages.

Given this, there has always been an interest in directly using the grapheme (orthographic) transcription of the word, without explicit phonetic modelling. However, while this limits the variability at the word representation level, the link to the acoustic waveform becomes weaker (depending on the language), since standard acoustic features characterize phonemes. Most recent attempts were based on mapping the orthography of words onto HMM states using phonetic information, or on extending conventional HMM-based ASR systems by improving context-dependent modelling for grapheme units.

The goal of the present project is to exploit new statistical models recently developed at Idiap that are potentially better suited to deal with the grapheme representation of lexicon words, and to exploit in a principled way both the grapheme representation and phoneme information.
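To make the contrast between the two word representations concrete: a phoneme-based lexicon must list each word's pronunciation (and its variants) explicitly, whereas a grapheme-based lexicon can be generated mechanically from the orthography. The following sketch is purely illustrative; the entries and phone symbols are hypothetical, not taken from any particular system:

```python
# Phoneme-based lexicon: requires a pronunciation dictionary and phonetic
# expertise; pronunciation variants must be listed explicitly.
phoneme_lexicon = {
    "speech":    ["s", "p", "iy", "ch"],
    "speech(2)": ["s", "p", "ih", "ch"],  # hypothetical variant entry
}

# Grapheme-based lexicon: derived directly from the orthography,
# with no expert knowledge needed -- new words come for free.
def grapheme_entry(word):
    """Return the subword-unit sequence for a word: simply its letters."""
    return list(word.lower())

grapheme_lexicon = {w: grapheme_entry(w) for w in ["speech", "recognition"]}
```

The price of this simplicity is that, depending on how regular the language's orthography is, the grapheme units map less directly onto the acoustics.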
This will be done by extending a novel acoustic modelling approach referred to as KL-HMM (Kullback-Leibler divergence based HMM), which has recently been shown to be much simpler and more flexible, while yielding state-of-the-art performance (in phoneme-based ASR systems) and opening up multiple opportunities for further development and research. In a KL-HMM system, acoustic features are replaced by posterior probability distributions over elementary units (e.g. phonemes), and HMM states are modelled as multinomial distributions in that posterior space. We believe this can be generalized to grapheme-based systems. Also, while working in posterior probability spaces, it is much easier to combine multiple pieces of evidence coming from multiple sources of information.

The present project proposal is thus particularly well suited as a PhD project since it will allow:
1. Building upon a strong PhD thesis (1) and extending a new and very promising approach towards flexible speech recognition systems.
2. Investigating further its generalization properties towards new types of models based on grapheme word representation.

(1) Guillermo Aradilla, "Acoustic Models for Posterior Features in Speech Recognition", PhD Thesis, No. 4164, École Polytechnique Fédérale de Lausanne, 2008.
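The core local score in a KL-HMM compares a state's trained multinomial distribution with an incoming posterior feature vector using the Kullback-Leibler divergence. A minimal sketch of that computation follows; the three-unit posterior space, the example distributions, and the function name are our own illustrative choices, not taken from any Idiap implementation:

```python
import math

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two discrete distributions given as sequences
    of probabilities; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical 3-unit posterior space (e.g. three phoneme classes).
state_multinomial = [0.7, 0.2, 0.1]  # trained HMM state distribution
posterior_feature = [0.6, 0.3, 0.1]  # frame-level posteriors from a classifier

# Local score: small when the frame's posteriors match the state's
# multinomial, large when they do not.
score = kl_divergence(state_multinomial, posterior_feature)
```

Because both the features and the state models live in the same posterior space, combining evidence from several classifiers (e.g. phoneme- and grapheme-based ones) reduces to combining probability distributions, which is what makes the framework attractive for the multi-source experiments described above.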