Grapheme; Phoneme; Hidden Markov Models; Kullback-Leibler Divergence; Automatic Speech Recognition
Rasipuram Ramya, Magimai.-Doss Mathew (2016), Articulatory feature based continuous speech recognition using probabilistic lexical modeling, in Computer Speech & Language
, 36, 233-259.
Rasipuram Ramya, Razavi Marzieh, Magimai.-Doss Mathew (2015), Integrated pronunciation learning for automatic speech recognition using probabilistic lexical modeling, in ICASSP 2015 - 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, South Brisbane, Queensland, Australia, IEEE.
Rasipuram Ramya, Magimai.-Doss Mathew (2015), Acoustic and Lexical Resource Constrained ASR using Language-Independent Acoustic Model and Language-Dependent Probabilistic Lexical Model, in Speech Communication
, 68, 23-40.
Rasipuram Ramya (2014), Grapheme-based Automatic Speech Recognition using Probabilistic Lexical Modeling
, EPFL, Lausanne, Switzerland.
Magimai.-Doss Mathew, Rasipuram Ramya (2014), On Learning Grapheme-to-Phoneme Relationships through the Acoustic Speech Signal, in The Phonetician
, 109-110, 6-23.
Razavi Marzieh, Rasipuram Ramya, Magimai.-Doss Mathew (2014), On Modeling Context-Dependent Clustered States: Comparing HMM/GMM, Hybrid HMM/ANN and KL-HMM Approaches, in Proceedings of ICASSP
, Florence, Italy, IEEE.
Rasipuram Ramya, Razavi Marzieh, Magimai.-Doss Mathew (2013), Probabilistic Lexical Modeling and Unsupervised Training for Zero-Resourced ASR, in Proceedings of IEEE International Workshop on Automatic Speech Recognition and Understanding (ASRU)
, Olomouc, Czech Republic, IEEE.
Current state-of-the-art automatic speech recognition (ASR) systems commonly use hidden Markov models (HMMs), in which phonemes (phones) are assumed to be the intermediate subword units. Given the high speaker and contextual variability of these elementary units, state-of-the-art systems have to rely on complex statistical modelling (multidimensional Gaussian mixture models with a large number of mixture components). Such systems also require some minimum phonetic expertise, since every word to be recognized has to be explicitly modelled as a Markov model capturing its official phonetic transcription (usually taken from a dictionary) as well as its pronunciation variants. In spite of its relative success, this approach remains quite cumbersome when dealing with (unavoidable) new words or when deploying new languages. Given this, there has always been an interest in directly using the grapheme (orthographic) transcription of the word, without explicit phonetic modelling.
However, while this limits the variability at the word representation level, the link between the acoustic waveform and the subword units becomes weaker (depending on the language), as the standard acoustic features characterize phonemes. Most recent attempts were based on mapping the orthography of the words onto HMM states using phonetic information, or on extending conventional HMM-based ASR systems by improving context-dependent modelling for grapheme units.
The goal of the present project is to exploit new statistical models, recently developed at Idiap, that are potentially better suited to the grapheme representation of the lexicon words, and to exploit both the grapheme representation and phoneme information in a principled way.
This will be done by extending a novel acoustic modelling approach referred to as KL-HMM (Kullback-Leibler divergence based HMM), which has recently been shown to be much simpler and more flexible while yielding state-of-the-art performance (on phoneme-based ASR systems), and which opens up multiple opportunities for further development and research. In a KL-HMM system, acoustic features are replaced by posterior probability distributions over elementary units (e.g. phonemes), and HMM states are modelled by multinomial distributions in that posterior space. We believe this can be generalized to grapheme-based systems.
Also, when working in posterior probability spaces, it is much easier to combine evidence from multiple sources of information. The present project proposal is thus particularly well suited as a PhD project, since it will allow:
1. Building upon a strong PhD thesis (1) and extending a new and very promising approach towards flexible speech recognition systems.
2. Investigating further its generalization properties towards new types of models based on grapheme word representation.
(1) Guillermo Aradilla, "Acoustic Models for Posterior Features in Speech Recognition", PhD Thesis, No. 4164, École Polytechnique Fédérale de Lausanne, 2008.
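The KL-HMM idea described in the abstract (posterior distributions as features, one multinomial distribution per HMM state) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the project: the function names, the three-class toy posteriors, the choice of divergence variants, and the simplified left-to-right alignment with no transition costs. It only shows how a Kullback-Leibler divergence can serve as the local score between a state distribution and a frame's posterior vector.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of the same dimension."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same dimension")
    # eps guards against log(0) when q has zero entries (an assumption of
    # this sketch, not a detail taken from the project description).
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )


def local_score(state_dist, posterior, variant="kl"):
    """Divergence between a state's multinomial and a posterior feature."""
    if variant == "kl":
        return kl_divergence(state_dist, posterior)
    if variant == "rkl":  # reverse direction
        return kl_divergence(posterior, state_dist)
    if variant == "skl":  # symmetric average of both directions
        return 0.5 * (
            kl_divergence(state_dist, posterior)
            + kl_divergence(posterior, state_dist)
        )
    raise ValueError(f"unknown variant: {variant}")


def alignment_cost(states, frames, variant="kl"):
    """Minimum accumulated divergence of a left-to-right alignment.

    Each frame is assigned to one state; the path starts in the first
    state, ends in the last, and may only stay or advance by one state.
    Transition costs are ignored in this simplified sketch.
    """
    inf = float("inf")
    n = len(states)
    prev = [inf] * n
    prev[0] = local_score(states[0], frames[0], variant)
    for frame in frames[1:]:
        cur = [inf] * n
        for j in range(n):
            best_prev = min(prev[j], prev[j - 1] if j > 0 else inf)
            if best_prev < inf:
                cur[j] = best_prev + local_score(states[j], frame, variant)
        prev = cur
    return prev[-1]


# Hypothetical state distributions over three phoneme classes.
STATE_A = [0.90, 0.05, 0.05]
STATE_B = [0.05, 0.90, 0.05]
STATE_C = [0.05, 0.05, 0.90]

# Two hypothetical word models and a made-up posterior sequence.
word_ab = [STATE_A, STATE_B]
word_ac = [STATE_A, STATE_C]
frames = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]

cost_ab = alignment_cost(word_ab, frames)
cost_ac = alignment_cost(word_ac, frames)
best_word = "ab" if cost_ab < cost_ac else "ac"
```

In this toy setting the word model with the lower accumulated divergence is chosen. Under the same assumptions, a grapheme-based variant would keep phoneme posteriors as features and let each grapheme state hold its own distribution over phoneme classes.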