Back to overview

Progressive Multiple Sequence Alignment With The Poisson Indel Process

Type of publication Not peer-reviewed
Publikationsform Original article (non peer-reviewed)
Author Maiolo Massimo, Zhang Xiaolei, Gil Manuel, Anisimova Maria,
Project Fast joint estimation of alignment and phylogeny from genomic sequences in a frequentist framework
Show all

Original article (non peer-reviewed)

Journal bioRxiv
Publisher Cold Spring Harbor Laboratory
Page(s) 123513 - 123513
Title of proceedings bioRxiv

Open Access

Type of Open Access Repository (Green Open Access)


Sequence alignment is crucial in genomics studies. However, optimal multiple sequence alignment (MSA) is NP-hard, so that exact algorithms become impractical for more than a few sequences. Thus, modern MSA methods employ progressive heuristics, breaking the problem into a series of pairwise alignments guided by a phylogeny. Changes between homologous characters are typically modelled by a Markov substitution model. In contrast, the dynamics of indels are not modelled explicitly, because the computation of the marginal likelihood under such models has exponential time complexity in the number of taxa. Recently, the classical indel model TKF91 was modified to describe indel evolution on a phylogenetic tree via a Poisson process, termed PIP. PIP allows to compute the joint marginal probability of an MSA and tree in linear time. Results: We present a new dynamic programming algorithm to align two MSAs –represented by the underlying homology paths– by maximum likelihood under PIP in polynomial time, and apply it progressively along a guide tree. Our method is the first polynomial time progressive MSA method using a rigorous mathematical formulation of an evolutionary indel process.We have corroborated the correctness of our method by simulation, and compared it with competitive methods on an illustrative real dataset. Availability: Our new algorithm is implemented in C++ as standalone program. The source code is freely available on GitHub at