Project


Semi-automated semantic enrichment of biomedical literature

Applicant: Rinaldi Fabio
Number: 130558
Funding scheme: Projects
Research institution: Institut für Computerlinguistik Universität Zürich
Institution of higher education: Universität Zürich - ZH
Main discipline: Other languages
Start/End: 01.08.2010 - 31.07.2014
Approved amount: 279'484.00

All Disciplines (3)

Discipline
Other languages
Computer science
Molecular biology

Keywords (8)

text mining; molecular biology; syntactic parsing; semi-automated semantic annotation; information extraction; natural language processing; knowledge management; literature curation

Lay Summary (English)

Lead
Lay summary
Understanding the complex mechanisms that support life, and that are responsible for health or disease at the cellular level, promises to unveil new technologies that will benefit the health of every citizen and support the growth of novel industries. The potential of text mining applications in the life sciences is widely acknowledged. Scientists and companies are looking for solutions that allow the automated processing of scientific literature. Text mining tools aim to support the process of gaining knowledge from the literature by supporting the search for relevant articles, the semi-automated extraction of relevant passages, and the transformation of the information from textual format into a semantic representation.
The SASEBio project focuses on the study and development of supporting tools for the semi-automated enrichment of biomedical literature and related documentation, in particular their potential role in the professional literature curation process. Curation is the activity performed by professionals who are paid to read the literature in search of particular items of information and to store that information in public databases, which biologists can access later.
The project deals with entities and relationships that form complex knowledge graphs, which are gaining relevance as a means to initiate, guide and confirm research in the biosciences.
The project builds on the latest results in Natural Language Processing (combining statistical approaches with human language expertise), in particular information extraction, including biomedical entity and relation detection. While fully unsupervised extraction of information from the literature is still unrealistic, text mining tools are already reliable enough to provide hints to curators and so speed up their work.
The results of the project are of academic and industrial relevance for the knowledge management process.
The pharmaceutical industry is facing an ever-growing complexity of the knowledge space, e.g. genes, proteins and their relationships, as described in the literature. Various research groups maintain legacy systems containing valuable data which cannot be accessed because there is no shared semantic layer. The collaboration between the NITAS/TMS group at Novartis and the OntoGene group at the University of Zurich offers an opportunity to contribute to the solution of the data federation problem. Combining text mining and annotation tools, with the support of a novel web-based text mining application, will accelerate the generation or enrichment of biomedical data.
Last update: 21.02.2013

Publications

Publication
Assisted curation of experimental methods in RegulonDB (2014), in Proceedings of BioCuration 2014, Toronto, Canada.
Assisted curation of regulatory interactions and growth conditions of OxyR in E. coli K-12 (2014), in Database: The Journal of Biological Databases and Curation, bau049.
BioC Implementations in Go, Perl, Python and Ruby (2014), in Database: The Journal of Biological Databases and Curation, bau059.
BioC Interoperability Track Overview (2014), in Database: The Journal of Biological Databases and Curation, bau049.
Collection-Wide Extraction of Protein-Protein Interactions (2014), in Proceedings of The Sixth International Symposium on Semantic Mining in Biomedicine (SMBM), Aveiro, Portugal.
Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction (2014), in Proceedings of LREC 2014, Reykjavik, Iceland.
Assisted curation of growth conditions that affect gene expression in E. coli K-12 (2013), in Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, Bethesda, Maryland, 1, 1.
Assisted editing in the biomedical domain: motivation and challenges (2013), in Proceedings of DocEng 2013, Florence, Italy, September 10-12, 2013.
BioC: a minimalist approach to interoperability for biomedical text processing (2013), in Database: The Journal of Biological Databases and Curation, bat064.
Digital Curation Experiments for RegulonDB (2013), in Proceedings of the BioCuration conference, 2013, Cambridge, UK.
How preferred are preferred terms? (2013), in Proceedings of the eLex 2013 conference.
ODIN: a customizable literature curation tool (2013), in Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, Bethesda, Maryland, 1, 1.
OntoGene: CTD entity and action term recognition (2013), in Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, Bethesda, Maryland, 1, 1.
PyBioC: a python implementation of the BioC core (2013), in Proceedings of the Fourth BioCreative Challenge Evaluation Workshop, Bethesda, Maryland, 1, 1.
The OntoGene literature mining web service (2013), in EMBnet.journal, 19(Suppl B), 32-35.
Using the OntoGene pipeline for the triage task of BioCreative 2012 (2013), in Database: The Journal of Biological Databases and Curation, Oxford Journals, bas053.
UZH in the BioNLP 2013 GENIA Shared Task (2013), in Proceedings of the BioNLP workshop, ACL 2013, Sofia, Bulgaria.
Change of Biomedical Domain Terminology Over Time (2012), in Proceedings of the 5th Baltic Conference on Human Language Technologies, Tartu, Estonia.
Dependency parsing for interaction detection in pharmacogenomics (2012), in Proceedings of LREC 2012: The Eighth International Conference on Language Resources and Evaluation.
Notes about the OntoGene pipeline (2012), in AAAI-2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, Arlington, Virginia, USA.
ODIN: Advanced Text Mining in Support of the Curation Process (2012), in Pacific Symposium on Biocomputing (PSB).
Proceedings of the Fifth International Symposium for Semantic Mining in Biomedicine (SMBM) (2012).
Ranking of CTD articles and interactions using the OntoGene pipeline (2012), in Proceedings of the 2012 BioCreative workshop.
Ranking relations between diseases, drugs and genes for a curation task (2012), in Journal of Biomedical Semantics, 3(Suppl 3), 5.
Relation Mining Experiments in the Pharmacogenomics Domain (2012), in Journal of Biomedical Informatics, 45(5), 851-861.
Term evolution: use of biomedical terminologies (2012), in AAAI-2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text, Arlington, Virginia, USA.
The OntoGene system: an advanced information extraction application for biological literature (2012), in EMBnet.journal, 18(Suppl B), 47-49.
Using biomedical databases as knowledge sources for large-scale text mining (2012), in E-LKR workshop, SEPLN 2012, Castellon de la Plana, Spain.
Using ODIN for a PharmGKB re-validation experiment (2012), in Database: The Journal of Biological Databases and Curation, bas021.
Using syntax features and document discourse for relation extraction on PharmGKB and CTD (2012), in SMBM 2012, Zurich.
A data-driven approach to alternations based on protein-protein interactions (2011), in 3rd Congreso Internacional de Lingüística de Corpus.
An incremental model for the coreference resolution task of BioNLP 2011 (2011), in Proceedings of the BioNLP11 shared task.
Assessment of NER solutions against the first and second CALBC Silver Standard Corpus (2011), in Journal of Biomedical Semantics, 2(Suppl 5), 11.
BioCreative III Interactive Task: an Overview (2011), in BMC Bioinformatics, special issue on BioCreative III, S4.
Detection of interaction articles and experimental methods in biomedical literature (2011), in BMC Bioinformatics, special issue on BioCreative III, S13.
Mining complex Drug/Gene/Disease relations in PubMed (2011), in Pacific Symposium on Biocomputing.
Ranking Interactions for a Curation Task (2011), in 10th International Conference on Machine Learning and Applications and Workshops, 2, 2.
SASEBio: Semi-Automated Semantic Enrichment of the Biomedical Literature (2011), in 1st International SystemsX.ch Conference on Systems Biology.
Terminological resources for Text Mining over Biomedical Scientific Literature (2011), in Journal of Artificial Intelligence in Medicine, 52(2), 107-114.
The Gene Normalization Task in BioCreative III (2011), in BMC Bioinformatics, special issue on BioCreative III, S2.
The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text (2011), in BMC Bioinformatics, special issue on BioCreative III, S3.
Towards mature use of semantic resources for biomedical analyses (2011), in Journal of Biomedical Semantics, 2(Suppl 5), 1.
Assessment of NER solutions against the first and second CALBC Silver Standard Corpus (2010), in Semantic Mining in Medicine, EBI, Cambridge, 2010.
ODIN: An Advanced Interface for the Curation of Biomedical Literature (2010), in The Conference of the International Society for Biocuration 2010.
OntoGene (Team 65): preliminary analysis of participation in BioCreative III (2010), in Proceedings of BioCreative III workshop.
OntoGene in BioCreative II.5 (2010), in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(3), 472-480.
Proceedings of the Fourth International Symposium for Semantic Mining in Biomedicine (SMBM) (2010).
OntoGene at CALBC II and Some Thoughts on the Need of Document-Wide Harmonization, in Proceedings of the CALBC II workshop.
OntoPDF: using a text mining pipeline to generate enriched pdf versions of scientific papers, in Proceedings of The Sixth International Symposium on Semantic Mining in Biomedicine (SMBM), Aveiro, Portugal.
OntoRest: Text Mining Web Services in BioC Format, in Proceedings of The Sixth International Symposium on Semantic Mining in Biomedicine (SMBM), Aveiro, Portugal.
Proceedings of the 5th International Symposium on Languages in Biology and Medicine (LBM 2013).
The OntoGene literature mining web service, in BMC Bioinformatics.

Collaboration

Group / Person | Country
Types of collaboration
RegulonDB group, Center for Genomic Sciences, UNAM, Mexico | Mexico (North America)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Research Infrastructure
- Exchange of personnel
Data Science Group (Pharma Research and Early Development Informatics) at Hoffmann-La Roche | Switzerland (Europe)
- in-depth/constructive exchanges on approaches, methods or results
- Publication
- Industry/business/other use-inspired collaboration

Scientific events

Active participation

Title | Type of contribution | Title of article or contribution | Date | Place | Persons involved
LREC 2014 | Talk given at a conference | Using Large Biomedical Databases as Gold Annotations for Automatic Relation Extraction | 26.05.2014 | Reykjavik, Iceland | Ellendorff Tilia
BioCuration 2014 | Talk given at a conference | Assisted curation of experimental methods in RegulonDB | 07.04.2014 | Toronto, Canada | Rinaldi Fabio
NETTAB workshop on Semantic, Social, and Mobile Applications for Bioinformatics and Biomedical Laboratories | Talk given at a conference | The OntoGene literature mining web service | 16.10.2013 | Venice, Italy | Rinaldi Fabio
Fourth BioCreative Challenge Evaluation | Talk given at a conference | OntoGene: CTD entity and action term recognition | 07.10.2013 | Washington, United States of America | Rinaldi Fabio
Fourth BioCreative Challenge Evaluation | Talk given at a conference | Assisted curation of growth conditions that affect gene expression in E. coli K-12 | 07.10.2013 | Bethesda, Maryland, United States of America | Rinaldi Fabio
Fourth BioCreative Challenge Evaluation | Talk given at a conference | PyBioC: a python implementation of the BioC core | 07.10.2013 | Bethesda, Maryland, United States of America | Rinaldi Fabio
DocEng 2013 | Talk given at a conference | Assisted Editing in the Biomedical Domain: Motivation and Challenges | 10.09.2013 | Florence, Italy | Rinaldi Fabio
BioCuration 2013 | Talk given at a conference | Digital Curation Experiments for RegulonDB | 10.04.2013 | Cambridge, Great Britain and Northern Ireland | Rinaldi Fabio
NETTAB workshop on Integrated Bio-Search | Talk given at a conference | The OntoGene system: an advanced information extraction application for biological literature | 14.11.2012 | Como, Italy | Rinaldi Fabio
AAAI-2012 Fall Symposium on Information Retrieval and Knowledge Discovery in Biomedical Text | Talk given at a conference | Notes about the OntoGene pipeline | 02.11.2012 | Arlington, Virginia, United States of America | Rinaldi Fabio
E-LKR workshop, SEPLN 2012 | Talk given at a conference | Using biomedical databases as knowledge sources for large-scale text mining | 07.09.2012 | Castellon de la Plana, Spain | Rinaldi Fabio
SMBM 2012 | Talk given at a conference | Using syntax features and document discourse for relation extraction on PharmGKB and CTD | 03.09.2012 | Zurich, Switzerland | Schneider Gerold
The eighth international conference on Language Resources and Evaluation | Talk given at a conference | Dependency parsing for interaction detection in pharmacogenomics | 21.05.2012 | Istanbul, Turkey | Rinaldi Fabio
BioCreative-2012 Workshop | Talk given at a conference | Ranking of CTD articles and interactions using the OntoGene pipeline | 16.04.2012 | Washington, D.C., United States of America | Rinaldi Fabio
The Pacific Symposium on Biocomputing (PSB) 2012 | Poster | ODIN: Advanced Text Mining in Support of the Curation Process | 03.01.2012 | Hawaii, United States of America | Rinaldi Fabio
The Tenth International Conference on Machine Learning and Applications | Talk given at a conference | Ranking interactions for a curation task | 21.12.2011 | Hawaii, United States of America | Rinaldi Fabio
1st International SystemsX.ch Conference on Systems Biology | Poster | SASEBio: Semi-Automated Semantic Enrichment of the Biomedical Literature | 24.10.2011 | Basel, Switzerland | Rinaldi Fabio
3rd Congreso Internacional de Lingüística de Corpus | Talk given at a conference | A data-driven approach to alternations based on protein-protein interactions | 07.04.2011 | Valencia, Spain | Schneider Gerold
CALBC II workshop | Talk given at a conference | OntoGene at CALBC II and Some Thoughts on the Need of Document-Wide Harmonization | 16.03.2011 | EBI, Cambridge, Great Britain and Northern Ireland | Clematide Simon; Rinaldi Fabio
Mining the Pharmacogenomics Literature, Pacific Symposium on Biocomputing | Talk given at a conference | Mining complex Drug/Gene/Disease relations in PubMed | 03.01.2011 | Hawaii, United States of America | Rinaldi Fabio
Biocuration 2010 | Poster | ODIN: An Advanced Interface for the Curation of Biomedical Literature | 11.10.2010 | Tokyo, Japan | Rinaldi Fabio
BioCreative III workshop | Talk given at a conference | OntoGene (Team 65): preliminary analysis of participation in BioCreative III | 13.09.2010 | Bethesda, Maryland, United States of America | Rinaldi Fabio


Self-organised

Title | Date | Place
Fourth International Symposium on Semantic Mining in Biomedicine (SMBM 2010) | 11.10.2010 | European Bioinformatics Institute, Hinxton, Cambridgeshire, Great Britain and Northern Ireland

Associated projects

Number | Title | Start | Funding scheme
162758 | MelanoBase | 01.03.2016 | not available
125509 | A comparative study of syntactic parsers and their applications in biomedical text mining | 01.07.2009 | not available
118396 | Detection of biological interactions from biomedical literature | 01.04.2008 | Project funding (Div. I-III)

Abstract

SASEBio: Semi-Automated Semantic Enrichment of Biomedical Literature

The OntoGene group at the University of Zurich has developed efficient techniques for text mining in the molecular biology domain. One of their core interests in recent years has been the detection of mentions of protein-protein interactions. Using the IntAct database as a gold standard, they have developed techniques for the identification of information relevant to the process of curation, such as the experimental methods used by the authors [1], the organisms that host the experiment and contribute the interacting proteins [2], the proteins themselves [3], and their interactions [4].

The effectiveness of their approach has been validated by participation in numerous shared evaluations, such as BioCreative II [5], the BioNLP event extraction task [6], and BioCreative II.5 [forthcoming]. Recently, in collaboration with the NITAS group at Novartis, they have developed an interesting prototype of an environment supporting the process of semi-automated semantic enrichment of the literature. The environment allows an expert user to efficiently revise annotations suggested by the system, or to add new annotations where the system missed an entity or an interaction. The system is also capable of reusing the annotations added by the expert in subsequent applications, using a process of incremental learning.

The SASEBio project aims at consolidating the existing text mining activities of the OntoGene group by further improving their relation extraction techniques and applying them to new areas, within the context of the literature curation process. New types of interactions, such as drug/disease interactions (of particular interest to their industrial partner), will be considered, along with incremental improvements to their existing techniques for protein-protein interaction detection (of potential interest to the IntAct group at EBI).

As in the past, their techniques will be subject to community-based evaluation through participation in shared text mining challenges. Additionally, the project offers an opportunity to turn the existing semi-automated annotation prototype into a fully fledged system which can then be employed by the target user groups. Intensive collaborations with both NITAS and EBI will be sought at all stages of development, in particular to guarantee continuous feedback on the effective usability of the proposed tools.

References
[1] Thomas Kappeler, Simon Clematide, Kaarel Kaljurand, Gerold Schneider, Fabio Rinaldi. Towards Automatic Detection of Experimental Methods from Biomedical Literature. Third International Symposium on Semantic Mining in Biomedicine (SMBM 2008).
[2] Thomas Kappeler, Kaarel Kaljurand, Fabio Rinaldi. TX Task: Automatic Detection of Focus Organisms in Biomedical Publications. BioNLP workshop, NAACL/HLT, Boulder, Colorado, 2009.
[3] Kaarel Kaljurand, Fabio Rinaldi, Thomas Kappeler, Gerold Schneider. Using existing biomedical resources to detect and ground terms in biomedical literature. Artificial Intelligence in Medicine, Verona, July 2009.
[4] Gerold Schneider, Kaarel Kaljurand, Thomas Kappeler, Fabio Rinaldi. Detecting protein-protein interactions in biomedical texts using a parser and linguistic resources. CICLING 2009.
[5] Fabio Rinaldi, Thomas Kappeler, Kaarel Kaljurand, Gerold Schneider, Manfred Klenner, Simon Clematide, Michael Hess, Jean-Marc von Allmen, Pierre Parisot, Martin Romacker, Therese Vachon. OntoGene in BioCreative II. Genome Biology, 2008, 9:S13.
[6] Kaarel Kaljurand, Gerold Schneider and Fabio Rinaldi. A dependency-based approach to the BioNLP 2009 Shared Task. BioNLP workshop, NAACL/HLT, Boulder, Colorado, 2009.