Text Mining; Bioinformatics; Biocuration; Protein functions
Mottin Luc, Gobeill Julien, Pasche Emilie, Michel Pierre-André, Cusin Isabelle, Gaudet Pascale, Ruch Patrick (2016), neXtA5: accelerating annotation of articles via automated approaches in neXtProt, in Database: the journal of biological databases and curation, 2016, 1-9.
Gobeill Julien, Gaudinat Arnaud, Pasche Emilie, Vishnyakova Dina, Gaudet Pascale, Bairoch Amos, Ruch Patrick (2015), Deep Question Answering for protein annotation, in Database: the journal of biological databases and curation, 2015, 1-9.
Mottin Luc, Triage by Ranking to Support the Curation of Protein Interactions, in Database.
We aim to develop new methods that fully integrate text mining with biocuration tools. Biocurators commonly use text mining tools, but usually as more or less optional add-ons (search engines, named-entity recognizers, etc.) to the end-user workflow. neXtpresso intends to build an integrated solution that will support protein annotators in finding data that are (1) supported by experimental evidence; (2) specific; (3) non-redundant; and (4) of high confidence. Access to these data will be prioritized according to a flexible annotation model directly derived from the neXtProt database, which comprehensively covers proteomics-related entities such as proteins, cells, variants, diseases, anatomy, and the Gene Ontology axes (biological processes, molecular functions and subcellular locations). The ranking algorithms will be designed as a multimodal protein-centric search task in which users will have the distinctive option of excluding any facts they choose. This original exclusion function will make it possible to account for well-known and/or already curated relationships, including contradictory statements. Finally, the integration of the resulting novelty-tracking platform into the CALIPHO annotation solution will be comprehensively evaluated.