Reviewer #3 (Public Review):
Sasani et al. develop and implement a new method for mutator allele discovery in the BXD mouse population. This new "IHD" method has several notable strengths, including the ability to aggregate de novo mutations across individuals to reduce data sparsity and to combine mutation frequencies across multiple nucleotide contexts into a single estimate. These advantages may make the IHD method better suited than conventional QTL or association mapping to mutator discovery under certain scenarios. Overall, the theoretical premise of the IHD method is judged to be both strong and innovative, and careful simulation studies benchmark its power.
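To make the aggregation idea concrete, here is a minimal Python sketch of an IHD-style scan statistic. It assumes (an assumption for illustration, not something stated in this review) that at each marker the de novo mutation counts of all strains sharing a haplotype are pooled into one spectrum and that the two pooled spectra are compared with a cosine distance. The function names `pooled_spectrum_distance` and `scan_markers` and the toy counts are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def pooled_spectrum_distance(counts: np.ndarray, genotypes: np.ndarray) -> float:
    """Cosine distance between pooled mutation spectra of B- and D-haplotype strains.

    counts:    (n_strains, n_mutation_types) de novo mutation counts per strain.
    genotypes: (n_strains,) array with 0 = B haplotype, 1 = D haplotype at one marker.
    Assumes both haplotypes are present at the marker.
    """
    # Pool counts across all strains in each haplotype group to reduce sparsity.
    b_spectrum = counts[genotypes == 0].sum(axis=0)
    d_spectrum = counts[genotypes == 1].sum(axis=0)
    # Cosine similarity is scale-invariant, so pooled counts and
    # normalized mutation fractions give the same result.
    similarity = b_spectrum @ d_spectrum / (
        np.linalg.norm(b_spectrum) * np.linalg.norm(d_spectrum)
    )
    return float(1.0 - similarity)


def scan_markers(counts: np.ndarray, genotype_matrix: np.ndarray) -> np.ndarray:
    """Apply the distance at every marker; genotype_matrix is (n_markers, n_strains)."""
    return np.array([pooled_spectrum_distance(counts, g) for g in genotype_matrix])


# Toy data (invented): 4 strains x 3 mutation types (e.g., C>A, C>T, A>G).
counts = np.array([[10, 5, 5], [12, 4, 4], [30, 5, 5], [28, 6, 6]])
genotype_matrix = np.array([[0, 0, 1, 1],   # marker 1: D strains are C>A-enriched
                            [0, 1, 0, 1]])  # marker 2: haplotype unrelated to spectrum
distances = scan_markers(counts, genotype_matrix)
```

Pooling before comparing is what lets sparse per-strain counts contribute to a single, less noisy spectrum per haplotype group; in the toy data, only the first marker separates the C>A-enriched strains.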
The authors then apply their method to the BXD mouse recombinant inbred mapping population. As proof of principle, they first successfully re-identify a known mutator locus in this population on chr4. Next, to assess possible genetic interactions involving this known mutator, Sasani et al. condition on the chr4 mutator genotype and repeat the IHD scan. This strategy led them to identify a second locus on chr6 that interacts epistatically with the chr4 locus; mice with "D" alleles at both loci exhibit a significantly increased burden of C>A de novo mutations, even though mice with the D allele at the chr6 locus alone show no appreciable increase in the C>A mutation fraction. This exciting discovery not only adds to the catalog of known mutator alleles, but also reveals key aspects of mutator biology. Notably, this finding reinforces the hypothesis that segregating variants in genes associated with DNA repair influence germline mutation spectra. Further, Sasani et al.'s findings suggest that some mutators may lie dormant until recombined onto a permissive genetic background. This discovery could have intriguing implications for the evolution of mutators in natural populations.
Despite a high level of overall enthusiasm for this work, some weaknesses are identified in the IHD method, in the approach for nominating candidate genes within the newly identified chr6 locus, and in the authors' conclusions.
Under simulated scenarios, the authors' new IHD method is not appreciably more powerful than conventional QTL mapping methods. While this does not diminish the rigor or novelty of the authors' findings, it does temper enthusiasm for the IHD method's potential to uncover new mutators in other populations or datasets. Further, adapting this methodology to other datasets, including human trios or multigenerational families, will require some modification, which could present a barrier to broader community uptake. Notably, BXD mice are (mostly) inbred, justifying the authors' consideration of just two genotype states at each locus, but this decision prevents out-of-the-box application to outbred populations and human genomic datasets. Lastly, some details of the IHD method are not clearly spelled out in the paper. In particular, it is unclear whether differences in relatedness among BXD strains arising from the breeding epoch structure are fully accounted for in the permutations. The method's name (inter-haplotype distance) is also somewhat misleading, as it seems to imply that de novo mutations are aggregated at the scale of sub-chromosomal haplotype blocks rather than across the whole genome.
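The permutation concern can be illustrated with a short Python sketch. One possible way to respect the epoch structure (an assumption, not necessarily what the authors did) is to shuffle mutation spectra only among strains from the same breeding epoch, keeping genotypes fixed, and to take a genome-wide threshold from the maximum statistic across markers. The epoch labels, the cosine-distance statistic, and all function names below are hypothetical.

```python
import numpy as np


def within_epoch_permutation(epochs: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Index permutation that only exchanges strains from the same breeding epoch."""
    perm = np.arange(len(epochs))
    for epoch in np.unique(epochs):
        idx = np.flatnonzero(epochs == epoch)
        perm[idx] = rng.permutation(idx)  # reorder strains within this epoch only
    return perm


def group_distance(counts: np.ndarray, genotypes: np.ndarray) -> float:
    """Cosine distance between pooled spectra of 0- and 1-genotype strains (assumed statistic)."""
    b = counts[genotypes == 0].sum(axis=0)
    d = counts[genotypes == 1].sum(axis=0)
    return float(1.0 - b @ d / (np.linalg.norm(b) * np.linalg.norm(d)))


def stratified_threshold(counts, genotype_matrix, epochs, n_perm=1000, alpha=0.05, seed=0):
    """Genome-wide (1 - alpha) threshold from the max statistic over all markers."""
    rng = np.random.default_rng(seed)
    max_stats = np.empty(n_perm)
    for i in range(n_perm):
        # Shuffle whole spectra among strains of the same epoch; genotypes stay fixed.
        shuffled = counts[within_epoch_permutation(epochs, rng)]
        max_stats[i] = max(group_distance(shuffled, g) for g in genotype_matrix)
    return float(np.quantile(max_stats, 1 - alpha))
```

Strains that are the only member of their epoch are never moved, so a restricted permutation like this yields a more conservative null than shuffling freely across all strains.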
Nominating candidates within the chr6 mutator locus requires an approach for defining a credible interval and for excluding or including specific genes within that interval as candidates. Sasani et al. delimit their focal window to 5 Mb on either side of the SNP with the most extreme P-value in their IHD scan. This strategy suffers from several weaknesses. First, no justification is given for using a 10 Mb window, as opposed to, e.g., a 5 Mb window or a window delimited by a specific drop in P-value from the peak, which renders the approach rather ad hoc. Second, within their focal 10 Mb window, the authors prioritize genes with annotated functions in DNA repair that harbor protein-coding variants between the B6 and D2 founder strains. While the logic for focusing on known DNA repair genes is sensible, this locus also houses an appreciable number of genes that are not functionally annotated but could conceivably perform relevant biological roles. These genes should not be excluded outright, especially if they are expressed in the germline. Further, the vast majority of functional SNPs are non-coding (including the likely causal variant at the chr4 mutator previously identified in the BXD population). Thus, the authors' decision to focus most heavily on coding variants is not well justified. Sasani et al. devote considerable speculation in the manuscript to the likely identity of the causal variant, ultimately favoring the conclusion that it is a predicted deleterious missense variant in Mbd4. However, had the authors used a 5 Mb window centered on the peak IHD SNP rather than a 10 Mb window, Mbd4 would have been excluded. Further, the accuracy of SNP functional prediction is modest [e.g., PMID 28511696], and excluding the missense variant in Ogg1 because of its benign prediction is potentially premature, especially given the wealth of functional data implicating Ogg1 in C>A mutations in house mice. Finally, the DNA repair gene closest to the peak IHD SNP is Rad18, which the authors largely exclude as a candidate.
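As an illustration of a P-value-drop interval, the Python sketch below extends an interval outward from the peak marker until the -log10(P) score falls more than a chosen amount below the peak. It assumes markers on a single chromosome sorted by position; the function name `drop_interval`, the drop size, and the toy positions and scores are all hypothetical, and a suitable drop would need to be calibrated (e.g., by simulation) rather than taken from this example.

```python
import numpy as np


def drop_interval(positions_mb, neg_log10_p, drop=1.5):
    """Return (start, end) positions of markers within `drop` units of the peak score.

    Assumes markers are on one chromosome and sorted by position.
    """
    positions = np.asarray(positions_mb, dtype=float)
    scores = np.asarray(neg_log10_p, dtype=float)
    peak = int(np.argmax(scores))
    cutoff = scores[peak] - drop
    # Walk left from the peak while markers stay above the cutoff.
    left = peak
    while left > 0 and scores[left - 1] >= cutoff:
        left -= 1
    # Walk right from the peak while markers stay above the cutoff.
    right = peak
    while right < len(scores) - 1 and scores[right + 1] >= cutoff:
        right += 1
    return float(positions[left]), float(positions[right])


# Toy scan (invented values, not the study's data).
interval = drop_interval([110, 112, 114, 116, 118, 120, 122],
                         [1.0, 2.0, 4.5, 5.0, 4.0, 3.0, 1.0], drop=1.5)
```

Here the peak score is 5.0, so markers scoring at least 3.5 are retained and the interval spans 114 to 118 Mb. Unlike a fixed-width window, the interval's width adapts to how sharply the signal decays around the peak.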
Additionally, some claims in the paper are not well supported by the authors' data. For example, in the Discussion, the authors assert that "multiple mutator alleles have spontaneously arisen during the evolutionary history of inbred laboratory mice" and that "... mutational pressure can cause mutation rates to rise in just a few generations of relaxed selection in captivity". However, these statements are undercut by data in this paper and the authors' prior publication demonstrating that a number of candidate variants are segregating in natural mouse populations. These variants almost certainly did not emerge de novo in laboratory colonies but were inherited from wild mouse ancestors. Further, the wild mouse population genomic dataset used by the authors falls far short of comprehensively sampling wild mouse diversity; variants in laboratory populations could derive from unsampled wild populations.
Finally, the implications of discovering a mutator whose expression is potentially conditional on the genotype at a second locus are not raised in the Discussion. While not a weakness per se, this omission is perceived as a missed opportunity to emphasize what, to this reviewer, is one of the most exciting impacts of this work. The potential background dependence of mutator expression could partially shelter it from the action of selection, allowing the allele to persist in populations. This finding bears on theoretical models of mutation rate evolution and may have important implications for efforts to map additional mutator loci. It seems unfortunate not to elevate these points.