Reviewer #2 (Public Review):
Summary and Strengths:
In this manuscript, the authors set out to determine the degree to which genetic variation among yeast strains and species influences gene expression during a large, genome-wide change in gene expression. While many studies have examined genetic influences on static expression levels, much less attention has been paid to dynamic responses. We know that different environmental conditions trigger different cellular states that engage different gene expression programs. We also know that DNA variation can have different effects in these cellular states. What we know much less about is how genetic influences shape the transition from one transcriptional state to the other.
The authors addressed this gap with a comprehensive genomics study. First, they quantified "allele specific expression" (ASE) in several diploid hybrids among strains of the yeast Saccharomyces cerevisiae as well as hybrids between S. cerevisiae and two sister species. RNA sequencing in such hybrids can distinguish RNA molecules produced from the two parental genomes. When there are cis-acting variants influencing a given gene, their effects become detectable as a difference between the expression levels of the two parental alleles.
The main innovation of this work is that the authors profiled ASE along a time series during a shift from fermentative yeast growth to respiration. During this shift, gene expression changes substantially. Using their time-resolved ASE profiling strategy, the authors were able to track when and how genetic differences influence these changes. This experimental design is a major strength of the paper. It is strengthened further by the inclusion of several hybrids and by dense temporal sampling. Overall, the authors succeeded in their goal of quantifying dynamic ASE.
Second, the authors used high-throughput reporter assays to study the effects of individual DNA variants in several hundred cis-regulatory elements (CREs). Interestingly, the CREs they studied were able to capture ASE dynamics at least in part, even though the reporter system was integrated into a common locus whose chromatin state probably differs from that at the native genes. This use of a complementary genomics approach is another major strength of this paper.
One highlight from the high-throughput assays is that many cis-regulatory elements contained multiple causal variants. Another thought-provoking result is that causal variants were neither more likely to occur at conserved nucleotides nor more likely to cause severe disruption of transcription factor binding sites than other variants. This result is somewhat counterintuitive given the well-established ability of conservation to mark functionally important nucleotides. As the authors state, this absence of evidence may be due to the fact that only a handful of causal variants were found, limiting the statistical power to detect more subtle differences in conservation or transcription factor binding sites. On the other hand, the results clearly show that there is no simple code for determining causal variants from available annotations. As the authors state, this is in line with earlier observations that much of cis-regulatory DNA variation could be evolutionarily neutral, perhaps because the effects it has on most genes are not large enough to matter for fitness. These two results are additional strengths of the paper.
Together, the paper contains an impressive amount of work. I greatly enjoyed the complementary use of ASE and reporter assays. The experiments seem to have been executed well, and are described succinctly and clearly. The paper is an interesting overview of the effects of cis-regulatory variants on dynamic gene expression change. Its main impact on the field lies in the clear demonstration that dynamic ASE exists, as well as its quantification.
Weaknesses:
First, the results in the first half of the paper are not overly surprising. They boil down to "genetic variation does influence expression dynamics". This is not unexpected, given that genetic variation has been shown to influence just about any cellular process studied so far. As such, the paper essentially confirms the existence of a phenomenon whose existence was not really in doubt. Fortunately, the work on causal variants in the second half of the paper does provide additional insight.
Second, the results are somewhat descriptive. This is not uncommon for genomics work, but does leave the reader wondering how exactly a given variant may alter gene expression dynamics, especially if it neither occurs at a conserved site nor drastically changes transcription factor binding. I do understand that a deep dive into individual causal variants is outside of the already impressive scope of this paper. I nevertheless hope that one impact of this work will be future mechanistic studies of some of these variants.
Third, the statistical model to infer ASE strikes me as suboptimal (line 420). As I understand the Methods section, allelic read counts are transformed to an allele frequency. This frequency is assumed to be 0.5 in the absence of ASE. ASE is then modeled as deviation from 0.5, using a linear model. This last point seems problematic for several reasons. For one, frequencies can only range from 0 to 1, whereas a basic linear model would be allowed to infer frequencies outside of this range. It is not clear to me that this model can properly capture the bounded nature of these data. In addition, RNA-seq data are count-based, and transforming to an allele frequency loses information about the accuracy of each measurement. Specifically, genes with few reads provide less power because their counts are subject to more stochastic noise. Finally, the choice of weighting observations simply by the raw read counts (line 422) seems ad hoc and should be justified. More broadly, the authors could have opted for more established, count-based analysis strategies for ASE data, such as binomial tests or more advanced frameworks (e.g. beta-binomial tests as in https://www.biorxiv.org/content/10.1101/699074v2 ).
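To illustrate why count-based tests retain information that an allele frequency discards, here is a minimal sketch of an exact binomial test and a beta-binomial test against a null allele frequency of 0.5. This is not the authors' method and not the implementation from the preprint linked above. The function names and the overdispersion value `rho=0.05` are assumptions chosen only for illustration.

```python
from math import exp, lgamma, log

def log_choose(n, k):
    """Log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def binom_logpmf(k, n, p=0.5):
    """Log-probability of k reads from allele A out of n under a binomial null."""
    return log_choose(n, k) + k * log(p) + (n - k) * log(1 - p)

def betabinom_logpmf(k, n, rho=0.05):
    """Beta-binomial null with mean 0.5; rho in (0, 1) is an assumed
    overdispersion (rho = 1 / (a + b + 1)), not an estimate from the data."""
    a = b = (1.0 / rho - 1.0) / 2.0
    return log_choose(n, k) + log_beta(k + a, n - k + b) - log_beta(a, b)

def two_sided_p(k, n, logpmf):
    """Exact two-sided p-value: total probability of all outcomes that are
    no more likely than the observed count."""
    log_obs = logpmf(k, n)
    total = 0.0
    for i in range(n + 1):
        lp = logpmf(i, n)
        if lp <= log_obs + 1e-9:  # small tolerance for floating-point ties
            total += exp(lp)
    return min(1.0, total)

# Two hypothetical genes with the same allele frequency (0.7) but different depth.
p_low_depth = two_sided_p(7, 10, binom_logpmf)       # about 0.344
p_high_depth = two_sided_p(700, 1000, binom_logpmf)  # far smaller
p_overdispersed = two_sided_p(700, 1000, betabinom_logpmf)
```

Both hypothetical genes would enter a frequency-based model as the same value, 0.7, yet the counts support ASE far more strongly at high depth. The beta-binomial variant is more conservative than the binomial test because it allows for extra-binomial variation between measurements.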
Fourth, there is only one biological replicate per hybrid, creating the risk that this one observation of the given time course may not be biologically representative. This also raises questions about how the linear model (see above) was fit without replicate data.
My final comments (these are not weaknesses but more discussion points) are about the analyses relating the number of sequence differences at a given gene to its strength of ASE (starting at line 120). The authors report significant associations, in line with previous studies. However, it is worth pointing out that this analysis makes an implicit assumption that there are multiple causal variants with effects in the same direction such that adding each variant would increase the ASE difference. The analyses cannot account for the case of multiple causal variants with effects in opposite directions. In this case, even a large number of variants could result in no net ASE. The authors' observation that the association between the number of variants and ASE is strongest for the most closely related strain pair (line 139) may be explained by this scenario. If there are many causal variants that cancel each other, having fewer variants in closely related strains reduces the opportunity for such cancellation. Given these considerations, it is actually somewhat surprising that there is any association between the number of variants at a gene and its ASE.
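The cancellation argument can be made concrete with a small sketch. It assumes, purely for illustration, that each causal variant shifts allelic expression by the same magnitude and that effects combine additively. The function names are hypothetical.

```python
from itertools import product

def net_effect(signed_effects):
    """Net allelic difference when variant effects combine additively."""
    return sum(signed_effects)

def possible_net_effects(n_variants, effect=1.0):
    """All distinct net effects when each of n variants contributes
    +effect or -effect (assumed equal magnitudes)."""
    return sorted({sum(signs) * effect
                   for signs in product((+1, -1), repeat=n_variants)})

# Six variants acting in the same direction versus six that cancel.
same_direction = net_effect([1, 1, 1, 1, 1, 1])
balanced = net_effect([1, 1, 1, -1, -1, -1])

# The number of variants bounds the net effect but does not determine it.
few_variants = possible_net_effects(2)
many_variants = possible_net_effects(4)
```

Under these assumptions, a gene with many variants can show anything from no net ASE to a large net ASE, so variant count alone is only a loose predictor of ASE strength.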
Along similar lines, the authors' point (line 226 and end of the Discussion) that inter-species chimeras should lie between the two parental species unless there are epistatic interactions misses the possibility that there could be multiple causal variants with effects in different directions. Additive combinations of these may well create phenotypes more extreme than the parents. For example, say the distal promoter of a given gene has accumulated five variants that each increase expression by the same amount x, and the proximal promoter has accumulated four variants that each decrease expression by the same amount x. The net difference between species would be an increase of one x. A chimera that only has the five distal variants would show a difference of 5x without needing to invoke epistasis.
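The arithmetic of this hypothetical example can be written out as a short sketch. The variable names are illustrative, x is set to 1 in arbitrary units, and effects are assumed to be strictly additive.

```python
X = 1.0  # effect size x per variant, arbitrary units

# Hypothetical promoter segments from the example above.
distal_variants = [+X] * 5    # five variants, each increasing expression by x
proximal_variants = [-X] * 4  # four variants, each decreasing expression by x

def expression_shift(*segments):
    """Total shift in expression under a purely additive model (no epistasis)."""
    return sum(sum(segment) for segment in segments)

# Full divergence between the species: both segments differ.
species_difference = expression_shift(distal_variants, proximal_variants)

# Chimeras that carry only one diverged segment.
chimera_distal_only = expression_shift(distal_variants)
chimera_proximal_only = expression_shift(proximal_variants)
```

Here `species_difference` is 1x, while the two chimeras give 5x and -4x. Both chimeras fall outside the range spanned by the parents even though no interaction term is included.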