Publication

Neural text normalization with adapted decoding and POS features

Type of publication Peer-reviewed
Form of publication Original article (peer-reviewed)
Author Ruzsics T., Lusetti M., Göhring A., Samardžić T., Stark E.
Project What’s Up, Switzerland? Language, Individuals and Ideologies in mobile messaging.

Original article (peer-reviewed)

Journal Natural Language Engineering
Volume (Issue) 25(5)
Page(s) 585 - 605
Title of proceedings Natural Language Engineering
DOI 10.1017/s1351324919000391

Open Access

URL https://www.zora.uzh.ch/id/eprint/177181/1/Neural_Text_normalization.pdf
Type of Open Access Repository (Green Open Access)

Abstract

Text normalization is the task of mapping noncanonical language, typical of speech transcription and computer-mediated communication, to a standardized written form. This task is especially important for languages such as Swiss German, which has strong regional variation and no written standard. In this paper, we propose a novel solution for normalizing Swiss German WhatsApp messages using the encoder–decoder neural machine translation (NMT) framework. We enhance the performance of a plain character-level NMT model by integrating a word-level language model and linguistic features in the form of part-of-speech (POS) tags. The two components address two specific issues: the former is intended to improve the fluency of the predicted sequences, whereas the latter aims at resolving cases of word-level ambiguity. Our systematic comparison shows that the proposed solution improves over a plain NMT system and also over a comparable character-level statistical machine translation system, which was considered the state of the art in this task until recently. We perform a thorough analysis of the compared systems’ output, showing that our two components indeed produce the intended, complementary improvements.