Publication

Innovations in Parallel Corpus Search Tools

Type of publication Peer-reviewed
Form of publication Proceedings (peer-reviewed)
Publication date 2014
Author Volk Martin, Graën Johannes, Callegaro Elena
Project Large-scale Annotation and Alignment of Parallel Corpora for the Investigation of Linguistic Variation

Proceedings (peer-reviewed)

Title of proceedings Proceedings of the Ninth International Conference on Language Resources and Evaluation
Place Reykjavik

Open Access

URL http://www.zora.uzh.ch/97282/
Type of Open Access Repository (Green Open Access)

Abstract

Recent years have seen an increased interest in and availability of parallel corpora. Large corpora from international organizations (e.g. European Union, United Nations, European Patent Office), or from multilingual Internet sites (e.g. OpenSubtitles) are now easily available and are used for statistical machine translation but also for online search by different user groups. This paper gives an overview of different usages and different types of search systems. In the past, parallel corpus search systems were based on sentence-aligned corpora. We argue that automatic word alignment allows for major innovations in searching parallel corpora. Some online query systems already employ word alignment for sorting translation variants, but none supports the full query functionality that has been developed for parallel treebanks. We propose to develop such a system for efficiently searching large parallel corpora with a powerful query language.