Lead


Lay summary
The aim of the project is to investigate whether Integer Linear Programming (ILP), a mathematical technique used to solve optimisation problems in various fields (such as profit maximisation in economics), can be successfully applied in Computational Linguistics to the task of text-to-text generation. This task consists in transforming a given text so that it meets predefined criteria: typically, a sentence is shortened (for instance, to fit the screen of a small device), simplified (to be understood by a broader audience), or changed so that it adheres to a particular style or fits a specific purpose (to be used as a headline or as a caption).

The underlying assumption in text-to-text generation is that, to produce the desired outcome, it is not necessary to resort to a conceptual representation of the text, as traditional approaches do. Producing conceptual representations on a large scale is not feasible with current methodologies. Instead, the task can arguably be performed by harnessing existing resources and techniques (specialised dictionaries, text collections, linguistic tools, and algorithms designed for text processing).

The specific aim of the project is to formulate the text transformation problem as a linear program that will be used to find, amongst all possible variants of the text, the one most likely to satisfy the predefined requirements. Rather than targeting a specific application, the project will provide a general framework that can be customised according to the goals of specific applications. Our work relates to similar problems that are being tackled in the field of statistical machine translation (more precisely, in the decoding stage) and will make use of the same mechanism, synchronous grammars, that is used in cutting-edge research on syntax-based statistical machine translation.
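To give a flavour of what such a formulation looks like, the sketch below shortens a sentence by choosing which words to keep. It is only an illustration under loud assumptions: the words, importance scores, dependency links and length limit are all invented, the function name `compress` is hypothetical, and the problem is solved by trying every 0/1 assignment rather than by the off-the-shelf ILP solver a real system would use. It also covers word deletion only, not the synchronous grammars mentioned above.

```python
from itertools import product

# Hypothetical toy data (not from the project): each word has an invented
# importance score and an optional "head" word it depends on.
words = ["the", "old", "cat", "sat", "quietly"]
scores = [1.0, 0.5, 3.0, 3.0, 0.8]
heads = [2, 2, 3, None, 3]  # index of the head word, None for the root
MAX_WORDS = 3


def compress(words, scores, heads, max_words):
    """Pick the highest-scoring subset of words that satisfies the constraints.

    Each word i gets a 0/1 decision variable x[i] (1 = keep the word).
    Objective:  maximise sum(scores[i] * x[i])
    Constraint: sum(x) <= max_words              (length limit)
    Constraint: x[i] <= x[heads[i]]              (a kept word needs its head)
    Exhaustive enumeration stands in for an ILP solver here.
    """
    best_value, best_choice = float("-inf"), None
    for x in product((0, 1), repeat=len(words)):
        if sum(x) > max_words:
            continue  # violates the length limit
        if any(x[i] and not x[h] for i, h in enumerate(heads) if h is not None):
            continue  # a word was kept without its head
        value = sum(s * xi for s, xi in zip(scores, x))
        if value > best_value:
            best_value, best_choice = value, x
    return [w for w, xi in zip(words, best_choice) if xi]


print(" ".join(compress(words, scores, heads, MAX_WORDS)))
```

Changing the length limit or adding a further constraint changes the output without touching the search procedure, which is the kind of customisation the framework is meant to make easy.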

Our work will establish whether ILP is as successful when applied to text-to-text generation as it has proved to be for other Computational Linguistics tasks (such as semantic role labelling, summarisation, and coreference resolution). If so, it will provide a framework in which the constraints on the text output can be formulated easily and the optimisation problem can be solved efficiently using off-the-shelf software. Thus, it will enable researchers to focus on the actual problem and will eliminate the need for them to engineer their own search algorithms. Text-to-text generation is an emerging research topic in the area of language technology, which plays an increasingly important role in today's information society.