Project

Multilingual and Contextual Information Retrieval

English title: Multilingual and Contextual Information Retrieval
Applicant: Savoy Jacques
Number: 113273
Funding scheme: Project funding (Div. I-III)
Research institution: Institut d'informatique, Université de Neuchâtel
Institution of higher education: University of Neuchatel - NE
Main discipline: Information Technology
Start/End: 01.01.2007 - 31.03.2010
Approved amount: 298'777.00

Keywords (7)

Information retrieval (IR); multilingual IR (MLIR); contextual retrieval; cross-lingual IR (CLIR); web search; dedicated IR; digital library

Lay Summary (English)

This research proposal has three main objectives. First, we want to design, implement and evaluate information retrieval (IR) systems that work with various East European languages (non-English monolingual IR). More specifically, in this part we design and evaluate linguistic tools for new and less frequently spoken languages, such as Hungarian, Polish, Czech and Turkish. We also translate a short query from one language into another (most likely English, the lingua franca) before accessing information written in the various other languages.

Second, we undertake a more elaborate investigation of contextual IR systems used to retrieve information in a specific domain (e.g., biomedicine, law, enterprise, weblogs), instead of evaluating IR systems on newspaper test collections. In this part of the project we investigate the most appropriate response to user information needs, which range from “classical” document searches to newer requests such as known-item searches (“where is the last e-mail sent to Paul?”), the pros and cons of a given argument, or searches for an expert in a given domain based on e-mails or other enterprise intranet document repositories. Specific user requirements could also be taken into account by identifying document length (varying from a short bibliographic notice to a long novel), the level of information needed (whole document, paragraph, single sentence or short summary), and the degree of editorial control (from newspaper articles to e-mails or weblogs). In this second part we also investigate and evaluate the impact of orthographic and vocabulary variations, as well as the influence of extra-document information (e.g., document contexts, temporal information, links between documents within web or legal corpora).

Third, we integrate the above two research objectives into a common task: performing searches in a multilingual collection, starting with relatively well-edited web pages (e.g., information made available by European governments in the EuroGOV corpus), or even less structured and less “polished” web pages (e.g., weblogs written in at least three different languages) or enterprise e-mails.
Last update: 21.02.2013

Associated projects

Number  Title                                                   Start       Funding scheme
129535  Multilingual and Domain-Specific Information Retrieval  01.03.2011  Project funding (Div. I-III)
66742   Multilingual information retrieval                      01.07.2002  Project funding (Div. I-III)
124389  Opinionated and Polarity IR                             01.07.2009  Project funding (Div. I-III)