Why DNA changes occur where they do, the patterns they create, and how to identify mutation hotspots.
Genetic mutations provide the raw material for evolution, are responsible for heritable disease, and drive the development of cancer. We are interested in the processes that generate new mutations because understanding them can provide important new insights into the biology of the genome: its replication, packaging and transcription. It also provides a means of disentangling patterns of mutation from those that result from selection. Patterns of selection can be incredibly useful in identifying functionally important regions of the genome. The signal of past selection is potentially a Rosetta stone for the interpretation of contemporary human genetic variation, discriminating those changes that cause disease from those without consequence. However, unlike the inscriptions on the Rosetta stone, the key patterns of mutation and selection are written over the top of each other and confound one another. By identifying the mutational patterns, we aim to subtract them from the convoluted signal to reveal the patterns imposed by selection.
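As a rough illustration of what subtracting a mutational pattern can mean in practice, the sketch below compares the number of substitutions observed in a region with the number expected from mutation alone. It is a minimal sketch, not our analysis pipeline: the function names, the two sequence-context classes and all numeric values are hypothetical, and it assumes that per-context substitution rates have already been estimated from putatively neutral sequence.

```python
from collections import Counter


def expected_substitutions(context_counts, neutral_rates):
    """Expected substitutions in a region if mutation alone shaped it.

    context_counts: number of sites in each sequence-context class.
    neutral_rates: per-site substitution rate for each class, assumed to
    have been estimated from putatively neutral sequence.
    """
    return sum(n * neutral_rates[ctx] for ctx, n in context_counts.items())


def observed_over_expected(observed, context_counts, neutral_rates):
    """Ratio below 1 suggests constraint; near 1 is consistent with mutation alone."""
    expected = expected_substitutions(context_counts, neutral_rates)
    if expected == 0:
        raise ValueError("no substitutions expected; ratio is undefined")
    return observed / expected


# Hypothetical values, chosen only so that CpG sites mutate faster.
neutral_rates = {"CpG": 0.10, "non-CpG": 0.01}
region = Counter({"CpG": 20, "non-CpG": 980})

expected = expected_substitutions(region, neutral_rates)
ratio = observed_over_expected(6, region, neutral_rates)
print(f"expected={expected:.1f} ratio={ratio:.2f}")
```

Conditioning the expectation on sequence context is the point of the exercise: a region that is simply poor in fast-mutating sites would otherwise look constrained when it is not.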
Our work has shown that mutation rate can vary at a very fine scale and can correlate specifically with DNA function (Taylor et al. 2006; Reijns et al. 2015) and packaging (Figure 1). If not accounted for, the resulting patterns can readily be misinterpreted as evidence for selection (Taylor et al. 2008), confounding our interpretation of the genome. In demonstrating one of the mechanistic processes that lead to local mutation rate fluctuation, we have also confirmed the identity of the leading and lagging strand DNA polymerases and demonstrated that the polymerase-alpha generated DNA primers of lagging strand synthesis are retained at a low frequency in the fully replicated genome (Reijns et al. 2015).
Clusters of nucleotide changes and complex mutations encompassing multiple sites are of particular interest. They are generated by non-homologous recombination, gene conversion, error-prone repair and probably other processes that have not yet been discovered. Like fine-scale rate variation, these clustered changes can be misinterpreted as a signature of selection. Yet the imperfect copying and pasting of DNA sequence could be a major force in genome evolution, enabling complex multi-site changes to occur in a single mutational step.
The underlying principle of comparative genomics is to use the signal of past selection as an assay for biological function in genomic sequence. Natural selection can only act on genetic variation that manifests as phenotypic differences between individual organisms of a population. It is a stringent filter: even a 0.001% reduction in reproductive success will lead to a polymorphism being reliably removed from most mammalian populations (Piganeau and Eyre-Walker 2003). We apply advanced evolutionary models and population genetic principles to understand the processes of selection that have shaped genomic sequences, allowing us to detect and interpret the encoded functions.
Evaluating the evolutionary trends of like-annotated regions (Figure) en masse across the genome is a strategy that lends itself well to integration with a wide diversity of functional genomic data. Because the approach can be applied to compare any meaningfully constructed sets of genome sequence, this aspect of the work is an ideal interface for collaboration with research groups based in the laboratory or clinic.
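To make the idea of evaluating like-annotated regions en masse concrete, the sketch below pools substitution counts across all regions that share an annotation and expresses each pooled rate relative to a reference class. This is an illustrative sketch only: the annotation labels, the counts and the choice of ancestral repeats as a putatively neutral reference are assumptions made for the example, and the function names are hypothetical.

```python
from collections import defaultdict


def pooled_rates(regions):
    """Pool substitutions and aligned sites by annotation class.

    regions: iterable of (annotation, substitutions, aligned_sites).
    Returns substitutions per aligned site for each annotation.
    """
    totals = defaultdict(lambda: [0, 0])
    for annotation, substitutions, sites in regions:
        totals[annotation][0] += substitutions
        totals[annotation][1] += sites
    return {a: subs / sites for a, (subs, sites) in totals.items() if sites > 0}


def relative_to_reference(rates, reference):
    """Express each class's rate as a fraction of the reference class's rate."""
    reference_rate = rates[reference]
    return {a: rate / reference_rate for a, rate in rates.items()}


# Hypothetical counts for three annotation classes.
regions = [
    ("exon", 12, 1000),
    ("exon", 8, 1000),
    ("enhancer", 30, 1000),
    ("ancestral_repeat", 50, 1000),
    ("ancestral_repeat", 50, 1000),
]

rates = pooled_rates(regions)
relative = relative_to_reference(rates, "ancestral_repeat")
for annotation, value in sorted(relative.items()):
    print(f"{annotation}: {value:.2f}")
```

Pooling before dividing, rather than averaging per-region rates, keeps short regions with few aligned sites from dominating the estimate. Any set of regions defined by functional genomic data can be passed in the same way.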