I develop a number of mathematical and statistical models for the study of genomic variation within and between species. Such variants affect an organism's susceptibility to genetic diseases and are also responsible for speciation events. In particular, my work focuses on the following questions: (1) How does DNA that causes genomic variation proliferate through the genome of a species? (2) For members of the same species, how can we leverage a priori information (i.e., relatedness and sparsity) to improve predictions of genomic variants?
I address these questions in the context of noisy and low-quality data. In Chapter 1, I begin with a review of models of DNA proliferation and of methods for detecting these genomic changes. In Chapter 2, I answer the first question by developing a model that describes non-actively replicating repetitive elements in an organism's genome. Although they comprise a majority of many eukaryotic genomes, these elements are often ignored by the models reviewed in Chapter 1. I answer the second question in Chapter 3 by developing a general optimization framework to detect genomic rearrangements in related individuals subject to different sequencing assumptions. In the context of limited and noisy data, this work is, to my knowledge, one of the few methods that simultaneously predicts variants in a group of individuals instead of post-processing this information. Chapter 4 describes some of the convergence properties of the methods introduced in Chapter 3, and Chapter 5 summarizes this work and outlines future projects.