graphical models; haplotype inference; HIV-1; virus evolution; next-generation sequencing; genome assembly; antiretroviral therapy; ultra-deep sequencing; drug resistance; Bayesian statistics
(2016), A Comprehensive Analysis of Primer IDs to Study Heterogeneous HIV-1 Populations, in Journal of Molecular Biology, 428(1), 238-250.
(2015), A framework for inferring fitness landscapes of patient-derived viruses using quasispecies theory, in Genetics, 199(1), 191-203.
(2014), Challenges in RNA virus bioinformatics, in Bioinformatics (Oxford, England), 30(13), 1793-1799.
(2014), Viral quasispecies assembly via maximal clique enumeration, in PLoS Computational Biology, 10(3), 1003515-1003515.
(2013), HIV Haplotype Inference Using a Propagating Dirichlet Process Mixture Model, in IEEE/ACM Transactions on Computational Biology and Bioinformatics, 11(1), 182-191.
(2013), Next-Generation Sequencing of HIV-1 RNA Genomes: Determination of Error Rates and Minimizing Artificial Recombination, in PLoS ONE, 8(9), e74249-e74249.
, Full-length haplotype reconstruction to infer the structure of heterogeneous virus populations, in Nucleic Acids Research
Genetic diversity is a hallmark of pathogen populations, associated with disease progression, immune escape, and drug resistance. We propose to use next-generation sequencing (NGS) for the analysis of viral populations within infected patients to improve viral diagnostics and treatment decisions by reconstructing the entire population structure, including low-frequency mutants and co-occurring mutations.
Based on the achievements in the predecessor project (CR32I2_127017), we will develop experimental protocols and computational methods for the analysis of NGS data obtained from intra-patient HIV-1 populations. Our goal is to infer the haplotype sequences and their frequencies over the full length of the 9.2 kb HIV genome. Since the limited read length of current NGS platforms turned out to be the main bottleneck in this endeavor, we will develop two major extensions of our software tools ‘PredictHaplo’ and ‘QuasiRecomb’ to address this limitation. First, we will use Illumina’s paired-end option to generate read pairs of 2 x 250 bp length covering the HIV genome. These data are informative about long-range phasing of single-nucleotide variants (SNVs) and will be integrated into probabilistic global haplotype reconstruction using either soft constraints on SNV linkage (PredictHaplo) or silent delete states for the insert (QuasiRecomb). Second, we will explore Pacific Biosciences’ PacBio RS technology as an alternative long-read sequencing platform with an average read length of 1,500 bp. Using these improved experimental and computational tools, we will analyze and interpret genetic diversity in the context of HIV-1 drug resistance.
Specifically, 100 samples from 40 patients will be analyzed for pre- versus post-treatment changes of viral populations, for low-frequency drug-resistant variants and their role in treatment failure, for linkage among drug resistance mutations and evolutionary escape pathways, for recombinants, and for viral phenotypes such as drug resistance and co-receptor usage. Our full-length haplotype approach provides, for the first time, a complete picture of the virus population and will yield new insights into drug resistance development.