The goal is to develop a fast and accurate algorithm for joint alignment and tree inference in the frequentist framework, to implement it in a user-friendly software package, and to make it applicable to large genomic and metagenomic datasets with thousands of sequences. We will connect two of our recent successful methods, currently implemented in independent packages: CodonPhyML, for fast maximum likelihood phylogeny inference for protein-coding genes, and ProGraphMSA, for fast probabilistic graph-based phylogeny-aware alignment. To circumvent the computational difficulties, we will use the Poisson indel process, a linear-time modification of the classical indel model. High-performance computing techniques, in particular parallelization, will be used to optimize the implementation for memory usage and speed.
The new method will support phylogenetic analyses of genomic data with thousands of sequences from microbial pathogens or of antibody data from infected donors. Judging from our current collaborations with industry, we expect the new method to be in high demand not only in academic projects but also in the pharmaceutical and biotech industries.