First, the project will focus on developing tools that can organise genomic data and infer comparable biological elements from it, such as genes that are similar across different species. Because these tools will draw on several types of genomic data, they will make it possible to analyse more species, which is important for understanding the processes that drive the evolution of species. The second focus will be on developing new machine learning algorithms that can identify which of the tens of thousands of genes in a genome show the most interesting characteristics. Studying these genes in depth with modelling methods will then help explain how they interact and evolve.
Identifying the genes that are key to an organism's development allows scientists to determine which genes are linked to functions essential to the organism's survival. In medicine, for example, it is vital to know whether a gene identified in a model organism such as the mouse has the same function in humans. Answering questions of this kind requires complex computational methods and high-quality data. As a result, such studies are currently restricted to a small number of intensively studied organisms and ignore the enormous quantity of lower-quality data now being generated.
The project squarely addresses the challenges of Big Data, since it deals with the volume, heterogeneity and quality of genomic data in biology. Its implications also extend beyond this single discipline: approaches for managing and comparing data are essential in other fields, such as language analysis. Moreover, machine learning is a key component of the computational sciences.