function; arthropods; genomics; biodiversity; evolution; ecology
Lukyanchikova Varvara, Nuriddinov Miroslav, Belokopytova Polina, Taskina Alena, Liang Jiangtao, Reijnders Maarten J. M. F., Ruzzante Livio, Feron Romain, Waterhouse Robert M., Wu Yang, Mao Chunhong, Tu Zhijian, Sharakhov Igor V., Fishman Veniamin (2022), Anopheles mosquitoes reveal new principles of 3D genome organization in insects, in Nature Communications, 13(1), 1960.
Reijnders Maarten J. M. F., Waterhouse Robert M. (2022), CrowdGO: Machine learning and semantic similarity guided consensus Gene Ontology annotation, in PLOS Computational Biology, 18(5), e1010075.
Boyd Bret M., Nguyen Nam-Phuong, Allen Julie M., Waterhouse Robert M., Vo Kyle B., Sweet Andrew D., Clayton Dale H., Bush Sarah E., Shapiro Michael D., Johnson Kevin P. (2022), Long-distance dispersal of pigeons and doves generated new ecological opportunities for host-switching and adaptive radiation by their parasites, in Proceedings of the Royal Society B: Biological Sciences, 289(1970), 1.
Feron Romain, Waterhouse Robert M. (2022), Exploring new genomic territories with emerging model insects, in Current Opinion in Insect Science, 100902.
Feron Romain, Waterhouse Robert M. (2022), Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes, in GigaScience, 11, giac006.
Ruzzante Livio, Feron Romain, Reijnders Maarten J. M. F., Thiébaut Antonin, Waterhouse Robert M. (2021), Functional constraints on insect immune system components govern their evolutionary trajectories, in Molecular Biology and Evolution, 1.
Supporting data for "Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes"
Author: Feron, Romain; Waterhouse, Robert
Publication date: 05.01.2022
Persistent Identifier (PID): 10.5524/100974
Repository: GigaDB
Abstract
Ambitious initiatives to coordinate genome sequencing of Earth’s biodiversity mean that the accumulation of genomic data is growing rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. To guide forthcoming genome generation efforts and promote efficient prioritisation of resources, it is thus essential to define and monitor the taxonomic coverage and quality of the data.
Here we present an automated analysis workflow that surveys genome assemblies from the United States National Center for Biotechnology Information (NCBI), assesses their completeness using the relevant Benchmarking Universal Single-Copy Orthologue (BUSCO) datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, examine how key assembly metrics relate to gene content completeness, and compare results from using different BUSCO lineage datasets.
These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments of the species coverage and data quality of available genome assemblies, and that can guide prioritisation for ongoing and future sampling, sequencing, and genome generation initiatives.
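The workflow described in this abstract collates BUSCO completeness scores alongside assembly metrics such as contiguity. As a minimal sketch, assuming BUSCO's one-line summary notation (for example `C:98.4%[S:97.9%,D:0.5%],F:0.8%,M:0.8%,n:1013`), the Python below parses that string and computes an N50 from a list of contig or scaffold lengths, the two kinds of values one would pair per assembly when relating assembly metrics to gene content completeness. The names `parse_busco_summary`, `BuscoSummary` and `n50` are hypothetical helpers for illustration; they are not part of BUSCO or of the published workflow.

```python
import re
from dataclasses import dataclass
from typing import Iterable

# Hypothetical helpers for illustration; not part of BUSCO or the published workflow.
# Matches BUSCO's one-line summary, e.g.
# "C:98.4%[S:97.9%,D:0.5%],F:0.8%,M:0.8%,n:1013"
# (assumed format; any text after n:<count> is ignored).
BUSCO_SUMMARY = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


@dataclass
class BuscoSummary:
    complete: float    # C = single-copy (S) + duplicated (D), in percent
    single: float
    duplicated: float
    fragmented: float
    missing: float
    n_markers: int     # number of markers in the lineage dataset used


def parse_busco_summary(line: str) -> BuscoSummary:
    """Extract BUSCO percentages from a one-line summary string."""
    m = BUSCO_SUMMARY.search(line)
    if m is None:
        raise ValueError(f"no BUSCO summary found in {line!r}")
    return BuscoSummary(
        complete=float(m["C"]),
        single=float(m["S"]),
        duplicated=float(m["D"]),
        fragmented=float(m["F"]),
        missing=float(m["M"]),
        n_markers=int(m["n"]),
    )


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences at least L long together
    contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("assembly has no sequence")
    running = 0
    for length in ordered:
        running += length
        # Stop at the first (longest-first) sequence that pushes the
        # cumulative length to half of the total or beyond.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for positive lengths")


summary = parse_busco_summary("C:98.4%[S:97.9%,D:0.5%],F:0.8%,M:0.8%,n:1013")
print(summary.complete, summary.n_markers)  # 98.4 1013
print(n50([100, 80, 50, 30, 20]))           # 80
```

Here `n50([100, 80, 50, 30, 20])` returns 80 because the two longest sequences (180 of 280 bases) are the first to cover at least half of the total length. Note that the same assembly can yield different BUSCO scores with different lineage datasets, so the `n_markers` field is worth keeping alongside the percentages.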
The comparative approach has been a cornerstone of biological research for centuries. By observing the similarities and differences in species morphologies, life histories, and ecologies, we have refined our understanding of species relationships and learned about their biology and evolution. The ability to obtain genetic blueprints in the form of whole-genome sequences from many different species means that our observations now reach ever-higher resolution, extending to thousands of genes and billions of nucleotides. Comparisons that identify genetic and genomic similarities and differences have disentangled complex species relationships and advanced our understanding of biological function and evolutionary processes at molecular, organismal, and ecosystem levels. The diversity of organisms on Earth today represents many varied evolutionary solutions to life’s challenges, and their common ancestry means that the comparative approach offers the opportunity to reconstruct the history of events that produced the genomic and biological diversity observable today. This research project therefore seeks to address two questions: (i) which kinds of genomic changes or innovations lead to, or facilitate, which kinds of life-history adaptations and species diversifications; and (ii) what rules or constraints govern how genomic evolution proceeds so that it maintains viability while allowing, or even promoting, the generation of biological diversity?
State-of-the-art studies characterising the genomic basis of animal diversity show that complex evolutionary dynamics of gene repertoire gain and loss likely provide the substrate from which diversity can emerge. Early radiations were accompanied by gene family innovations, conservation, and losses that likely underlie key functional shifts in multicellularity, development, homeostasis, and immunity.
In younger lineages too, genetic and genomic changes can be associated with adaptations and diversifications, such as enhanced immunity and trade-offs amongst the vision, smell, and echolocation abilities of bats. As the most successful group of terrestrial animals, arthropods demonstrate a vast array of evolutionary adaptations to exploit ecological niches across many different ecosystems. This diversity makes arthropods fascinating to study, both as models for understanding fundamental biological processes and because of their impacts on humans and their critical roles in maintaining a healthy planet. Together with accumulating arthropod genomics resources, this makes them ideal for investigating how conservation or divergence, and gains or losses, of functional genomic elements give rise to the splendour of animal biology.
This project therefore aims to (i) apply computational comparative approaches to infer the evolutionary histories of genomic elements and quantify genomic innovations across arthropods; (ii) apply machine learning and text mining methods to enhance gene- and organismal-level functional characterisations; (iii) use these quantifications and characterisations to learn the rules governing how genomic evolution proceeds and how it generates functional biological diversity; and (iv) apply the same challenges (infections) to multiple Drosophila species to test the functional implications of genomic innovations for gene expression in a system-level biological response, namely immunity.
By addressing these aims, this project will establish a new set of quantifiable evolutionary features with which the field can characterise the evolutionary histories of genomic elements and genomic innovations, in order to explore how these changes relate to observable phenotypic differences.
The research will advance the field on two fronts: methodologically, in comparative genomics, and conceptually, by providing insights into how evolution works and how biodiversity is generated and maintained. The results will demonstrate the importance, and highlight the benefits, of applying genomic technologies at increasingly complex scales to understand the diversity of life on Earth.
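Aim (i) involves quantifying genomic innovations across arthropods. One crude way to flag candidate lineage-specific gene families, sketched here under the assumption that orthology has already been inferred and summarised as a presence/absence mapping from orthogroup to species, is to keep orthogroups found only within a focal clade and in at least a chosen fraction of its species. This is a simple filter, not a phylogenetic reconstruction of gains and losses; the function `clade_restricted_families`, the `min_fraction` threshold, and the toy orthogroup IDs and species codes are all hypothetical.

```python
from typing import Dict, List, Set


# Hypothetical helper for illustration; a presence/absence filter, not a
# phylogenetic gain/loss reconstruction.
def clade_restricted_families(
    presence: Dict[str, Set[str]],
    clade: Set[str],
    min_fraction: float = 0.5,
) -> List[str]:
    """Return orthogroups present only in `clade` species and in at least
    `min_fraction` of them (a crude proxy for lineage-specific families)."""
    if not clade:
        raise ValueError("clade must contain at least one species")
    restricted = []
    for orthogroup, species in presence.items():
        # Any occurrence outside the clade disqualifies the orthogroup.
        if not species <= clade:
            continue
        # Require presence in at least `min_fraction` of clade members;
        # values below 1.0 tolerate some losses or missing annotations.
        if species and len(species) / len(clade) >= min_fraction:
            restricted.append(orthogroup)
    return sorted(restricted)


# Toy presence/absence data (hypothetical orthogroup IDs and species codes).
presence = {
    "OG1": {"Agam", "Aaeg", "Dmel"},  # shared with a fly: not restricted
    "OG2": {"Agam", "Aaeg"},          # in both mosquitoes only
    "OG3": {"Agam"},                  # in one of the two mosquitoes
    "OG4": {"Dmel"},                  # outside the focal clade
}
mosquitoes = {"Agam", "Aaeg"}

print(clade_restricted_families(presence, mosquitoes))                    # ['OG2', 'OG3']
print(clade_restricted_families(presence, mosquitoes, min_fraction=1.0))  # ['OG2']
```

Raising `min_fraction` trades sensitivity for confidence: at 1.0 only families present in every clade member are kept, whereas lower values admit families that may have been lost, or simply missed, in some genomes, which is one reason assembly completeness matters for such counts.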