Genetic mutations in HIV have been found to affect all aspects of the disease, including its overall progression, clinical outcomes, treatment efficacy, and the failure of treatments that were previously effective. Given the importance of these mutations in HIV progression and treatment, it is critical to develop a better understanding of HIV genetic diversity (GD). The introduction of low-cost, high-throughput DNA sequencing has dramatically improved our ability to characterize HIV GD. Many HIV genomic experiments have used either short or long sequencing reads to quantify HIV GD, but both types of reads have critical limitations. To address these limitations, this study proposes to develop and evaluate a new, combined sequencing approach that uses both short and long reads to characterize HIV GD.
In Aim 1, the study will develop short-read and long-read HIV sequence analysis workflows for the Galaxy platform. Galaxy is an open-source, web-based platform for biomedical data analysis that is used by tens of thousands of scientists worldwide. These workflows will provide an important resource to the DC CFAR and the broader HIV community for analyzing their sequence data. In Aim 2, the study will build on the workflows developed in Aim 1 to develop and evaluate a combined short- and long-read sequence analysis workflow in Galaxy for characterizing HIV genetic diversity. Using both experimental and computational metrics, it will then compare the performance of the combined approach with that of short-read-only and long-read-only approaches.