A Collocation-Driven Approach to Text Summarization

Type of publication Peer-reviewed
Publication form Proceedings (peer-reviewed)
Project An Integer Linear Programming Approach to Text-to-Text Generation
Proceedings (peer-reviewed)

Title of proceedings Actes de la 18e conférence sur le Traitement Automatique des Langues Naturelles
Place Montpellier, France

We present a novel approach to extractive summarization (the task of producing an abstract for an input document by selecting a subset of its original sentences) that relies on domain-specific collocation information automatically acquired from a development corpus. A syntax-based collocation extractor is used first to infer a content template and then to match this template against the document to be summarized. The approach has been applied to generate simplified versions of English Wikipedia articles, as part of a larger project on automatically generating Simple English Wikipedia articles from their standard counterparts. An evaluation of the system has yet to be performed; nonetheless, preliminary results obtained in summarizing Wikipedia articles on cities already indicate the potential of our collocation-driven method to select relevant sentences.
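
The two steps named in the abstract (inferring a content template from a development corpus, then matching it against a new document) can be illustrated with a minimal Python sketch. It is not the authors' system: it assumes that word pairs co-occurring within a small window are an acceptable stand-in for syntax-based collocations, and the names `STOPWORDS`, `pairs`, `infer_template`, and `select_sentences`, the thresholds, and the toy sentences are all hypothetical.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stopword list; a real system would use a parser.
STOPWORDS = {"the", "is", "of", "a", "an", "has", "in", "and", "it", "was"}

def pairs(sentence, window=3):
    """Ordered word pairs co-occurring within `window` content words.
    A crude substitute for syntactically extracted collocations."""
    toks = [t for t in re.findall(r"[a-z]+", sentence.lower())
            if t not in STOPWORDS]
    found = set()
    for i, first in enumerate(toks):
        for second in toks[i + 1:i + 1 + window]:
            found.add((first, second))
    return found

def infer_template(dev_sentences, min_count=2):
    """Keep pairs seen in at least `min_count` development sentences."""
    counts = Counter()
    for sentence in dev_sentences:
        counts.update(pairs(sentence))
    return {pair for pair, n in counts.items() if n >= min_count}

def select_sentences(doc_sentences, template, k=2):
    """Return up to k sentences that match the template, in document order."""
    scored = [(len(pairs(s) & template), i) for i, s in enumerate(doc_sentences)]
    best = sorted(scored, key=lambda x: (-x[0], x[1]))[:k]
    keep = sorted(i for score, i in best if score > 0)
    return [doc_sentences[i] for i in keep]

# Toy development corpus and input document (invented for illustration).
dev = [
    "Paris is the capital city of France.",
    "Berlin is the capital city of Germany.",
    "The city has a population of two million.",
    "The city has a population of fifty thousand.",
]
doc = [
    "Lyon is a city in France.",
    "It is the capital city of the region.",
    "Its cuisine is famous.",
    "The city has a population of about 500000.",
]

template = infer_template(dev)
summary = select_sentences(doc, template)
print(summary)
```

In this sketch the template is simply the set of pairs that recur across development sentences, and a sentence's score is the number of template pairs it contains; the selection criterion of the actual system may well differ.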