Project

An Integer Linear Programming Approach to Text-to-Text Generation

Applicant: Seretan Violeta
Number: 131512
Funding scheme: Fellowships for advanced researchers
Research institution: School of Informatics, University of Edinburgh
Institution of higher education: Institution outside Switzerland - IACH
Main discipline: Focus on German and English studies
Start/End: 01.08.2010 - 31.07.2011

All disciplines (2)

Discipline
Focus on German and English studies
Computer science

Keywords (6)

computational linguistics; text-to-text generation; synchronous grammars; machine translation; human language technology; linear programming

Lay Summary (English)

Lead
Lay summary
The aim of the project is to investigate whether Integer Linear Programming (ILP), a mathematical technique used to solve optimisation problems in various fields (such as profit maximisation in economics), can be successfully applied in the field of Computational Linguistics to the task of text-to-text generation. This task consists in transforming a given text so that it obeys predefined criteria: typically, a sentence is shortened (for instance, to fit the screen of a small device), made simpler (to be understood by a broader audience), or changed so that it adheres to a particular style or fits a specific purpose (to be used as a headline or as a caption).

The underlying assumption in text-to-text generation is that producing the desired outcome does not require a conceptual representation of the text, as traditional approaches do. Producing conceptual representations at a large scale is not feasible with current methodologies. Instead, the task can arguably be performed by harnessing existing resources and techniques (specific dictionaries, text collections, linguistic tools, and algorithms designed for text processing).

The specific aim of the project is to formulate the text transformation problem as a linear program that will be used to find, amongst all possible variants of the text, the one that is most likely to satisfy the predefined requirements. Rather than targeting a specific application, the project will provide a general framework that can be customised according to the goals of specific applications. Our work relates to similar problems that are being tackled in the field of statistical machine translation (more precisely, in the decoding stage) and will make use of the same mechanism, synchronous grammars, that is used in cutting-edge research on syntax-based statistical machine translation.
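
As a purely illustrative sketch of what such a formulation might look like, consider sentence compression by word deletion. This is an assumed example and not the project's own model: the binary variable $x_i$ states whether word $i$ is kept, $s_i$ is a hypothetical importance score, $K$ is a hypothetical length limit, and $h(i)$ denotes the syntactic head of a modifier $i$.

```latex
\begin{align*}
\max_{x} \quad & \sum_{i=1}^{n} s_i \, x_i \\
\text{s.t.} \quad & \sum_{i=1}^{n} x_i \le K, \\
& x_i \le x_{h(i)} \quad \text{for every modifier } i \text{ with head } h(i), \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, n.
\end{align*}
```

The objective rewards keeping important words, the first constraint bounds the output length, and the second keeps a modifier only if its head is also kept. Each feasible assignment of $x$ corresponds to one candidate variant of the sentence.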

Our work will establish whether ILP is as successful when applied to text-to-text generation as it has been found to be for other Computational Linguistics tasks (such as semantic role labelling, summarization, and coreference resolution). If so, it will provide a framework in which the constraints on the text output can be formulated easily and the optimisation problem can be solved efficiently using off-the-shelf software. Thus, it will enable researchers to focus on the actual problem and will eliminate the need for them to engineer their own solution search algorithm. Text-to-text generation is an emerging research topic in the area of language technology, which plays an increasingly important role in today's information society.
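
The separation between stating constraints and searching for a solution can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the sentence, the scores, the constraints, and the helper names `solve_binary_program` and `row` are invented, and exhaustive enumeration stands in for the off-the-shelf ILP solver that a real system would call.

```python
from itertools import product


def solve_binary_program(scores, constraints):
    """Maximise sum(scores[i] * x[i]) over x in {0, 1}^n.

    Each constraint is a pair (coeffs, bound) meaning
    sum(coeffs[i] * x[i]) <= bound. Exhaustive search is used here
    only as a stand-in for a real ILP solver.
    """
    best_x, best_value = None, float("-inf")
    for x in product((0, 1), repeat=len(scores)):
        feasible = all(
            sum(a * xi for a, xi in zip(coeffs, x)) <= bound
            for coeffs, bound in constraints
        )
        if not feasible:
            continue
        value = sum(s * xi for s, xi in zip(scores, x))
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


# Hypothetical input sentence and invented importance scores.
words = ["the", "old", "committee", "finally", "approved", "the", "budget"]
scores = [0.2, 0.3, 1.0, 0.2, 1.2, 0.2, 0.9]
n = len(words)


def row(entries):
    """Build a dense coefficient row from a sparse {index: coefficient} dict."""
    coeffs = [0] * n
    for index, coefficient in entries.items():
        coeffs[index] = coefficient
    return coeffs


constraints = [
    (row({i: 1 for i in range(n)}), 4),  # keep at most 4 words
    (row({0: 1, 2: -1}), 0),   # first "the" only if "committee" is kept
    (row({1: 1, 2: -1}), 0),   # "old" only if "committee" is kept
    (row({3: 1, 4: -1}), 0),   # "finally" only if "approved" is kept
    (row({5: 1, 6: -1}), 0),   # second "the" only if "budget" is kept
    (row({6: 1, 5: -1}), 0),   # "budget" only together with its determiner
    (row({4: -1}), -1),        # the main verb "approved" must be kept
]

best_x, best_value = solve_binary_program(scores, constraints)
compressed = [word for word, keep in zip(words, best_x) if keep]
print(" ".join(compressed))  # → committee approved the budget
```

Changing the application goal only means editing the `constraints` list or the `scores`; the search routine stays untouched, which is the division of labour the paragraph above describes.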

Last update: 21.02.2013

Responsible applicant and co-applicants

Publications

Publication
Syntactic Concordancing and Multi-Word Expression Detection
Violeta Seretan, Eric Wehrli (2013), Syntactic Concordancing and Multi-Word Expression Detection, in International Journal of Data Mining, Modelling and Management, 5(2), 158-181.
Acquisition of Syntactic Simplification Rules for French
Violeta Seretan (2012), Acquisition of Syntactic Simplification Rules for French, in Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), Istanbul, Turkey.
A Collocation-Driven Approach to Text Summarization
Violeta Seretan (2011), A Collocation-Driven Approach to Text Summarization, in Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles, Montpellier, France, ATALA, Montpellier.
FipsCoView: On-line Visualisation of Collocations Extracted from Multilingual Parallel Corpora
Violeta Seretan, Eric Wehrli (2011), FipsCoView: On-line Visualisation of Collocations Extracted from Multilingual Parallel Corpora, in Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World, Portland, Oregon, USA, ACL, Portland, Oregon, USA.
Acquisition of Syntactic Text Simplification Rules for French
Violeta Seretan, Acquisition of Syntactic Text Simplification Rules for French.
Text-to-text Generation Methods
Violeta Seretan, Text-to-text Generation Methods.

Scientific events

Active participation

Title; Type of contribution; Title of article or contribution; Date; Place; Persons involved


Abstract

This research project in Computational Linguistics focuses on an emerging research area, text-to-text generation, which departs from traditional 'concept-to-text' approaches to text generation by capitalising on the available large-scale language resources (text corpora, lexical thesauri, parsers) in order to overcome the limited availability of rich conceptual information. In this context, the project aims to study the applicability of Integer Linear Programming, an optimisation technique that is increasingly used in natural language processing, to transforming an input text so that it obeys a set of specific constraints, depending on the desired front-end application. For instance, in machine translation the (often imperfect) output text may be changed so that it becomes more grammatical or more fluent. Novel machine learning techniques combined with synchronous parsing will be used to learn transformation rules automatically in a data-driven fashion, rather than defining these rules manually. The project will be supported by a renowned expert in the field, Dr Mirella Lapata of the University of Edinburgh, and will give me the opportunity to train in statistical methods, which are currently underrepresented in my work context at the University of Geneva but are indispensable for a competitive computational linguistics research profile.