Retrieval Effectiveness Study with Farsi Language

Type of publication Peer-reviewed
Publication form Proceedings (peer-reviewed)
Publication date 2012
Author Akasereth Mitra, Savoy Jacques
Project Multilingual and Domain-Specific Information Retrieval

Proceedings (peer-reviewed)

Title of proceedings CORIA 2012
Place Bordeaux


Using Farsi as the underlying language and a test collection of 166,774 documents and 100 topics, this experiment evaluates the retrieval effectiveness of different IR models combined with a light stemmer and a plural stemmer, as well as n-gram and trunc-n indexing strategies. The impact of stoplist removal is also evaluated. According to the results obtained, the DFR-I(ne)C2 model performs best. The proposed light and plural stemmers improve retrieval performance compared to a non-stemming approach. The trunc-4 and trunc-5 indexing strategies also have a positive impact on performance, while 3-grams and trunc-3 have the most negative impact on the results. The results reveal that, for Farsi, stoplist removal plays an important role in improving retrieval performance. A query-by-query analysis of the results shows that extreme results could be avoided by adding extra controls and rules, based on Farsi morphology, to the stemming algorithms.
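
To make the two language-independent indexing strategies named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that trunc-n keeps the first n characters of each word and that n-grams are overlapping character sequences of length n generated within each word; the function names `trunc_n`, `char_ngrams`, and `index_terms` are hypothetical, and tokenization is reduced to whitespace splitting.

```python
from typing import List


def trunc_n(token: str, n: int) -> str:
    """Assumed trunc-n: keep only the first n characters of the word."""
    return token[:n]


def char_ngrams(token: str, n: int) -> List[str]:
    """Assumed n-gram indexing: overlapping character n-grams within one word.

    Words no longer than n are kept whole (an assumption of this sketch).
    """
    if len(token) <= n:
        return [token]
    return [token[i:i + n] for i in range(len(token) - n + 1)]


def index_terms(text: str, strategy: str, n: int) -> List[str]:
    """Turn whitespace-separated words into indexing terms.

    strategy is "trunc" or "ngram"; both names are illustrative only.
    """
    terms: List[str] = []
    for token in text.split():
        if strategy == "trunc":
            terms.append(trunc_n(token, n))
        elif strategy == "ngram":
            terms.extend(char_ngrams(token, n))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return terms


if __name__ == "__main__":
    print(index_terms("retrieval effectiveness", "trunc", 4))
    print(index_terms("retrieval", "ngram", 3))
```

Python slices strings by Unicode code point, so the same functions apply to words written in Farsi script, although a real indexer would also need to normalize characters such as the zero-width non-joiner, which this sketch ignores. The example also suggests why short units may conflate unrelated words: with n = 3, many different words share the same prefix or the same 3-grams.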