Publication
Neural text normalization with adapted decoding and POS features
Type of publication
Peer-reviewed
Form of publication
Original article (peer-reviewed)
Author
Ruzsics T., Lusetti M., Göhring A., Samardžić T., Stark E.
Project
What’s Up, Switzerland? Language, Individuals and Ideologies in mobile messaging.
Journal
Natural Language Engineering
Volume (Issue)
25(5)
Page(s)
585 - 605
Title of proceedings
Natural Language Engineering
DOI
10.1017/s1351324919000391
Open Access
URL
https://www.zora.uzh.ch/id/eprint/177181/1/Neural_Text_normalization.pdf
Type of Open Access
Repository (Green Open Access)
Abstract
Text normalization is the task of mapping noncanonical language, typical of speech transcription and computer-mediated communication, to a standardized written form. This task is especially important for languages such as Swiss German, with strong regional variation and no written standard. In this paper, we propose a novel solution for normalizing Swiss German WhatsApp messages using the encoder–decoder neural machine translation (NMT) framework. We enhance the performance of a plain character-level NMT model by integrating a word-level language model and linguistic features in the form of part-of-speech (POS) tags. The two components address two specific issues: the former is intended to improve the fluency of the predicted sequences, whereas the latter aims at resolving cases of word-level ambiguity. Our systematic comparison shows that our proposed solution improves over a plain NMT system and also over a comparable character-level statistical machine translation system, considered the state of the art in this task until recently. We perform a thorough analysis of the compared systems’ output, showing that our two components indeed produce the intended, complementary improvements.