Rule-Based Language Model for Speech Recognition

Applicant: Pfister Beat
Number: 112133
Funding scheme: Project funding (Div. I-III)
Research institution: Institut für Technische Informatik und Kommunikationsnetze ETH Zürich
Institution of higher education: ETH Zurich - ETHZ
Main discipline: German and English languages and literature
Start/End: 01.04.2006 - 30.09.2008
Approved amount: 269'625.00

All Disciplines (2)

German and English languages and literature
Information Technology

Keywords (5)

linguistic knowledge; syntactic analysis; parsing; natural language processing; continuous speech recognition

Lay Summary (English)

Nowadays, state-of-the-art speech recognizers use a statistical language model, usually some kind of N-gram model, which can easily be integrated into the HMM-based recognizer. In many applications such a language model is very effective, i.e., it improves the recognition rate considerably. This is particularly true for languages with a fairly fixed word order and a relatively simple morphology, such as English, which has only a small number of forms per word.

For languages like German, with a rich morphology, relatively free word order, and innumerable compound words, the N-gram approach to language modeling has serious deficiencies: it cannot properly handle linguistic phenomena such as non-local dependencies, agreement, and word composition.

To compensate for these weaknesses, we propose applying, in addition to N-grams, a rule-based language model (consisting roughly of a lexicon, a grammar, and a parser) that checks the syntactic correctness of word sequences. The idea is to decrease the word error rate by favoring syntactically correct phrases over incorrect ones. Deciding on syntactic correctness is a very difficult problem, however. Recent progress in computational linguistics, mainly in grammar theories and formalisms, suggests that this is feasible.
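
The idea of favoring syntactically correct word sequences can be illustrated with a minimal sketch. It is not the project's actual system: the toy lexicon, the function names is_syntactically_correct and rescore, the bonus value, and the hypothesis scores are all assumptions made up for this example. A real rule-based model would use a full lexicon, grammar, and parser instead of the three-word pattern check shown here.

```python
# Toy lexicon: word -> (category, number). All entries are illustrative assumptions.
LEXICON = {
    "the": ("DET", None),
    "dog": ("N", "sg"),
    "dogs": ("N", "pl"),
    "barks": ("V", "sg"),
    "bark": ("V", "pl"),
}


def is_syntactically_correct(words):
    """Accept only DET N V sequences whose noun and verb agree in number."""
    if len(words) != 3 or any(w not in LEXICON for w in words):
        return False
    (c1, _), (c2, noun_num), (c3, verb_num) = (LEXICON[w] for w in words)
    return (c1, c2, c3) == ("DET", "N", "V") and noun_num == verb_num


def rescore(nbest, bonus=2.0):
    """Add a fixed bonus to every hypothesis the toy grammar accepts; sort best first."""
    rescored = [
        (score + (bonus if is_syntactically_correct(text.split()) else 0.0), text)
        for text, score in nbest
    ]
    return sorted(rescored, reverse=True)


# Hypothetical N-best list: (word sequence, recognizer log score incl. N-gram).
nbest = [
    ("the dog bark", -10.0),
    ("the dog barks", -11.0),
    ("the dogs barks", -11.5),
]
best_score, best_text = rescore(nbest)[0]
print(best_text)  # prints "the dog barks"
```

In this sketch the recognizer alone would pick "the dog bark", while the agreement check moves the grammatical hypothesis to the top. The fixed bonus is only one possible way to combine the two knowledge sources.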

Last update: 21.02.2013

Responsible applicant and co-applicants


Associated projects

Number Title Start Funding scheme
104078 Rule-Based Language Model for Speech Recognition 01.10.2004 Project funding (Div. I-III)