214 Matching Annotations
  1. Nov 2022
    1. Overall, these patterns support the hypothesis that molecular model shifts are indicative of evolutionary shifts in life-history θ(t).

      I'm not necessarily suggesting you do this, since this paper already represents an impressive effort, but I wonder whether the use of phylogenetic path analysis (https://doi.org/10.7717%2Fpeerj.4718) might be informative here?

      Within each inferred molecular model shift, you could test alternative causal models depicting the relationships between rates of molecular evolution and these natural history traits, as you might anticipate that many of these covary, themselves being causally related to one another. Inference of shifts in causal relationships among natural history traits in each clade that experienced shifts in rates of molecular evolution could then provide some intuition as to why!

    2. Figure 1.

      How feasible would it be to infer these deviations from equilibrium base frequencies from simulated data? For instance, simulating trees/sequences under constant substitution rates (e.g. using AliSim - https://doi.org/10.1093/molbev/msac092) but under a model of mass (selective) extinction and subsequent early burst of diversification (e.g. TreeSim - https://doi.org/10.1093/sysbio/syr029)?

      I ask because I wonder whether simulating under the hypothesized diversification model and constant rates of molecular evolution could provide an estimate of the false-positive rate with which Janus might infer these shifts and whether false positives tend to be concentrated around the time of the simulated mass extinction?

      I realize this is probably a big ask though - the paper is already super impressive! I think this analysis could just be particularly compelling evidence if in fact this concentration of events is greater than expected.

    3. Fifteen molecular model shifts are identified on total-clades estimated to have originated near the K–Pg boundary [43-46] (Figure 1, Supplementary Figure 1, 7a-d).

      I'm curious - If you were to simulate sequences/trees under a model of mass-extinction and early burst but constant substitution rates, how often (if at all!) would Janus infer substitution rate shifts near the extinction event?

      Could the extinction-driven bottleneck and heightened species turnover in the subsequent radiation amplify apparent differences in nucleotide composition even with molecular rates remaining constant? If so, could it be that alternative diversification dynamics/models impact the inference of substitution rate shifts?

    4. Our analyses reveal well-supported shifts in estimated equilibrium base frequencies across exons, introns, untranslated regions (UTRs), and mitochondrial genomes. Remarkably, model shifts are mostly constrained to previously hypothesized clade originations associated with the K–Pg boundary.

      Is there any chance that this pattern could, at least in part be attributed to the relative paucity of branches in the tree prior to the K-Pg boundary?

      In other words, could the identification of molecular rate shifts by Janus be impacted by "sample size" differences of branches prior to and following mass extinctions as in this example?

  2. Sep 2022
    1. (A, B) The contribution of TEs and CDS to genome size variation across Heliconiinae, respectively.

      These two relationships should probably be assessed in a phylogenetic context, accounting for shared evolutionary history and non-independence among species (e.g. using phylogenetic least-squares regression - pgls). There is clear phylogenetic structuring in these two plots. For instance, Dione/Agraulis appear to have lower TE content for their genome sizes (shallower slope and intercept), whereas Erato seems to accumulate TEs more rapidly for their corresponding genome size as compared to other clades. Raw points and the original regression slope certainly provide taxonomic intuition, but to summarize the relationship across the entire clade, a regression slope as inferred by PGLS (for instance) should be reported.

    2. We illustrate how dense genomic sampling improves our resolution of gene-phenotype links, and our understanding of how genomes evolve.

      Really exciting work, and an immense amount of work - congratulations! This was a fun read!

      It seems that towards the conclusion of the paper, you outline how your data does not appear to be consistent with one of the prevailing hypothesized mechanisms for the digestion of pollen in Heliconius. I can't help but wonder if, using the data you present here, you might be able to point towards individual gene-families/orthogroups that are consistent with the evolution of this key-innovation?

    3. bar plots show total gene counts partitioned according to their orthology profiles, from Nymphalids to lineage-restricted and clade-specific genes.

      What does the distribution of mean per-species gene-count look like across all orthologs? Is there a small handful of very large orthogroups that are observed in all species and comprise the majority of the gene-counts shown here?

    4. From each OG in-paralogs were removed (custom python script RemoveInParalogFromTree.py available at https://francicco@bitbucket.org/ebablab/custum-scripts.git). If the procedure generated a single-copy OG (nscOGs) it was analysed by contrasting evolutionary pressures between Eueides and Heliconius species. The signature of selection (aBSREL) and relaxation (RELAX (114)) were performed as implemented in HyPhy.

      Could removal of in-paralogs influence/impact the signatures of selection that you recover using aBSREL and RELAX? It might be worth assessing whether these results are robust to the inclusion of in-paralogs.

      For instance, neofunctionalization of in-paralogs following duplication is likely to manifest signatures of positive selection (Fernández et al., 2020: https://doi.org/10.1093/molbev/msaa110, Wang et al., 2015: https://doi.org/10.1007/s11103-015-0285-2). Alternatively, pseudogenization of in-paralogs is expected to lead to a relaxation of selection on the duplicated gene-copy (e.g. Emerling 2018: https://doi.org/10.1016/j.ympev.2017.09.016; Calderoni et al., 2016: https://doi.org/10.1038/hdy.2016.59).

    5. All sequences from each of the four clades were realigned separately and a number of tests implemented in HyPhy (overall ω; SLAC; aBSREL; RELAX). In particular, the sign of diversifying positive selection (aBSREL) was detected by scanning all internal branches of the whole cocoonases phylogeny, correcting for multiple tests using a final P-value threshold of 0.05.

      If I'm reading this correctly, it seems that for each analysis, all branches were tested for signatures of selection (with the exception of SLAC which assumes selection is constant across the entire tree)? Although RELAX and aBSREL can be used to test each branch, each are significantly more powerful when used in a hypothesis-testing framework, assigning a subset of branches to the "background" (control) and the remainder to the "foreground" (treatment), and estimating parameters within each test-set. The per-branch tests are generally considered to be most useful in exploratory use-cases.

      I think this use-case scenario is ideally suited to the hypothesis testing framework of these two models. Specifically, to test the hypothesis that selection on cocoonases intensified (or was relaxed) in H. aoede as compared to the rest of the Heliconius butterflies, you could treat H. aoede as the foreground, and the rest of Heliconius as the background for both RELAX and aBSREL. It's unclear whether cocoonases in Eueides or other species within Heliconiini should be experiencing the same/similar selection regime as in H. aoede, so I would suggest excluding those species from this specific analysis.

    6. We next explored the relationship between transposable elements (TEs) and genome size, and their effect on gene architecture. We found that larger genomes tend to have a distribution of intron length skewed towards longer introns (fig. S16a), with a positive correlation between median intron length and total TE content (fig. S22; Pearson’s ρ=0.72; R2=0.51). Long introns also accumulate significantly more TEs then expected by their size (fig. S23; Wilcoxon rank-sum test P value = 2.13×10-13), with the effect of changing gene structure more than gene density (fig. S22 and fig. S24).

      It's difficult to contextualize these results without access to the supplementary materials (at the time I'm reading this), but it's unclear to me why the choice was made to subset introns into those that are short and long (as in Fig. 4A). I'm particularly wondering about the choice to divide introns as short/long at the median, which doesn't appear to sort them according to a natural break in the length-distribution. Dividing introns in this way, and then summing for each species and each 'TE size-class' runs the possible risk of exaggerating the observed difference in intron length and TE content.

      Additionally, the relationship between TE content and total intron length (regardless of how these traits are defined) should be analyzed in a phylogenetic context (e.g. using phylogenetic least-squares regression - pgls).

    7. Hemocyanins are also among several GFs with evidence of divergent selection regimes (ω) between Eueides and Heliconius, alongside Trypsins, Protein kinases, P450s, Sugar transporters, Ion and ABC transporters (Fig. 5B).

      What was the estimated ω for H. aoede at these gene-families as compared to those distributions estimated for Eueides and Heliconius? If involved in the evolution of pollen-feeding, I think that we would expect ω to be more similar between H. aoede and Eueides than between H. aoede and Heliconius?

    8. Curiously, a contraction in the Hemocyanin superfamily was only observed in H. aoede, the only Heliconius species not to feed on pollen in our data, marking hexamerins as a potential mechanistic link to the divergent strategies for nitrogen storage in pollen-feeding Heliconius (Fig. 5A).

      Interesting - this seems like a perfect complement for the analysis of Cocoonases later on! Would be very interesting to apply RELAX/aBSREL in the hypthesis-testing framework (i.e. foreground/background - discussed in later comment) to this gene family to test the hypothesis that this gene-family contraction in H. aoede is paired with an increased intensity of positive selection as compared to other Heliconius butterflies.

      What sre the members of the Hemocyanin superfamily that contracted within H. aoede? Are they the same (or do they include) the two Hemocyanins that were found to have previously undergone expansions?

    9. optimizing parameters (Romain Derelle, personal communication) for a more reliable list of single-copy orthologous groups (scOGs).

      How might this approach to identify orthologs, which is tailored to the identification of single-copy orthologs, impact downstream inferences about patterns of gene-family evolution? Put another way, if the approach described here is more prone to recovering scOGs (or low-copy number OGs), then it seems likely that fewer multi-copy number (particularly large/high-copy number) OGs will be identified as a result. If so, the rate estimates of gene-family expansion/contraction using CAFE seem likely to be downwardly/upwardly biased respectively, as the majority of gene-families under consideration are particularly conserved with respect to copy-number.

      I think this possibility would be worth exploring/addressing. For instance, if the implementation of broccoli is relaxed so as to be less tailored to the recovery of scOGs, does the shape of the gene-family expansion/contraction distribution fundamentally change?

    10. Ancestral state reconstruction of genome size was assessed using The maximum likelihood method implemented in the R package phytools (68).

      Under what model specifically was ancestral state reconstruction (ASR) conducted? The ML method in phytools can conduct ASR under three models (Brownian Motion, Ornstein–Uhlenbeck, and Early Burst). If not already done so, I might suggest fitting these alternative models of trait evolution prior to conducting ASR under the best-fit model.

      Given the taxonomic scale, it might also be worth considering including fitting some multi-regime BM/OU models as implemented in OUwie: https://doi.org/10.1111/j.1558-5646.2012.01619.x.