OGER++: hybrid multi-type entity recognition

Type of publication Peer-reviewed
Publication form Original article (peer-reviewed)
Author Furrer Lenz, Jancso Anna, Colic Nicola, Rinaldi Fabio
Project MelanoBase

Original article (peer-reviewed)

Journal Journal of Cheminformatics
Publisher BioMed Central
Volume (Issue) 11(1)
Page(s) 7 - 7
Title of proceedings Journal of Cheminformatics
DOI 10.1186/s13321-018-0326-3

Open Access

Type of Open Access Publisher (Gold Open Access)


Background: We present a text-mining tool for recognizing biomedical entities in scientific literature. OGER++ is a hybrid system for named entity recognition and concept recognition (linking), which combines a dictionary-based annotator with a corpus-based disambiguation component. The annotator uses an efficient look-up strategy combined with a normalization method for matching spelling variants. The disambiguation classifier is implemented as a feed-forward neural network which acts as a postfilter to the previous step.

Results: We evaluated the system in terms of processing speed and annotation quality. In the speed benchmarks, the OGER++ web service processes 9.7 abstracts or 0.9 full-text documents per second. On the CRAFT corpus, we achieved 71.4% and 56.7% F1 for named entity recognition and concept recognition, respectively.

Conclusions: Combining knowledge-based and data-driven components allows creating a system with competitive performance in biomedical text mining.
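
The two-stage design described in the abstract (dictionary look-up over normalized text, followed by a postfilter) can be illustrated with a minimal sketch. This is not the OGER++ implementation: the tokenization rule, the toy dictionary, the placeholder identifiers (`EX:0001`, `EX:0002`) and all function names are assumptions made for illustration, and the hand-written `keep` rule merely stands in for the trained feed-forward classifier.

```python
import re
from typing import Callable, Dict, List, Tuple

Entry = Tuple[str, str]               # (entity type, concept ID)
Hit = Tuple[int, int, str, str, str]  # (start, end, surface form, type, ID)

# Assumed normalization: lowercase, and split at punctuation, whitespace
# and letter/digit boundaries, so "IL-2", "il2" and "IL 2" all become
# the token sequence ("il", "2").
TOKEN = re.compile(r"[A-Za-z]+|[0-9]+")


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Return (normalized token, start offset, end offset) triples."""
    return [(m.group().lower(), m.start(), m.end()) for m in TOKEN.finditer(text)]


def build_index(terms: Dict[str, Entry]) -> Dict[Tuple[str, ...], List[Entry]]:
    """Index dictionary terms by their normalized token sequence."""
    index: Dict[Tuple[str, ...], List[Entry]] = {}
    for term, entry in terms.items():
        key = tuple(tok for tok, _, _ in tokenize(term))
        index.setdefault(key, []).append(entry)
    return index


def annotate(text: str, index: Dict[Tuple[str, ...], List[Entry]],
             max_len: int = 4) -> List[Hit]:
    """Look up every token n-gram (up to max_len tokens) in the index."""
    tokens = tokenize(text)
    hits: List[Hit] = []
    for i in range(len(tokens)):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tok for tok, _, _ in tokens[i:i + n])
            for etype, cid in index.get(key, []):
                start, end = tokens[i][1], tokens[i + n - 1][2]
                hits.append((start, end, text[start:end], etype, cid))
    return hits


def postfilter(hits: List[Hit], keep: Callable[[Hit], bool]) -> List[Hit]:
    """Drop candidate annotations rejected by the decision function."""
    return [h for h in hits if keep(h)]


# Hypothetical dictionary; "AND" plays the role of an ambiguous entry
# that collides with a common English word.
index = build_index({"IL-2": ("gene", "EX:0001"), "AND": ("gene", "EX:0002")})
text = "IL-2 and il2 both bind the IL 2 receptor."

candidates = annotate(text, index)
# Stand-in for the classifier: reject matches that are common words.
filtered = postfilter(candidates, lambda h: h[2].lower() not in {"and"})
for hit in filtered:
    print(hit)
```

The look-up stage deliberately over-generates (here it also tags the conjunction "and"), and the postfilter removes the spurious candidate. That division of labour is the point of the sketch; a real system would replace the lambda with a model trained on an annotated corpus.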