  1. Sep 2020
    1. Reviewer #1:

      In this manuscript, Nayler et al present a new protocol to generate cerebellar organoids that they differentiate from human iPSCs. Using this system and single-cell sequencing, the authors show that most major cerebellar cell types develop in these organoids. They also find that the micro-environment of the developing organoids changes growth dynamics and cellular differentiation, which motivated the authors to suggest that this organoid approach may be a good model for studying human cerebellar development and disease. The strength, and indeed the motivation, of this manuscript is the description of a novel model system with which to study multiple human cerebellar subtypes in an ex vivo system. In general, this work is a timely addition to several other recent studies on the transcriptomics of mouse cerebellar development, transcriptomics of human cerebellar development, and the use of hPSC derived Purkinje cells grown in co-cultures with mouse granule cells. The data in this manuscript are strong and likely of broad interest to the neuroscience community. However, below I outline several concerns that, if addressed, would help improve the clarity, readability, and impact of the manuscript:

      Comments:

      1) In the title, the authors state "...cerebellar organoids shows recapitulation of cerebellar development". Development in what? human? model systems? Some specificity will be needed in this title, especially since recent work from the Millen group has unveiled some specific differences between mouse and human cerebellar development.

      2) In the Abstract, the authors state "However, this was at the expense of reproducibility." What do you mean? There are issues with reproducibility? If yes, the authors need to provide a thorough discussion about this, as this issue would be essential for researchers to know about if they were to adopt this approach.

      3) Also in the Abstract, the authors state "...conditions, representing a more biologically relevant..." More biologically relevant than what? What about the counter argument that studying the cerebellum would be "more biologically relevant" in vivo in an animal model?

      4) In the Introduction, the authors state "Specifically, abnormal cerebellar development is an emerging theme contributing to many brain disorders (Sathyanesan et al., 2019)." Do you mean to many non-motor brain disorders?

      5) A couple of times in the Introduction the authors use the Manto et al. 2012 reference. This is in fact a very large online book consisting of several dozen chapters. Rather than using such a broad sweep approach, I would highly recommend using the primary original references for such key statements. It's also slightly misleading since Manto himself did not have any involvement in these developmental studies.

      6) The authors state that "Current models have mainly focussed on the differentiation of hPSC-derived Purkinje cells through co-culture with mouse cerebellar progenitors." Okay, but what is your argument against such methodology? Some context and motivation for this statement should be provided.

      7) In the Introduction, the authors frame their case by stating "As a proof of principle,..." But, what is this method proof of principle for?

      8) In the Introduction, the authors state "...we show perturbation analysis of the organoids..." Please state what the perturbation was and what problem it was used to test.

      9) Based on the Introduction of the paper, it is very hard to see what motivates this work. Also, related, why focus on the basement membrane? What led to this? The authors need to provide a much stronger rationale for the study upfront, and in particular for the specific concepts that they tackle using their new approach.

      10) The authors state that induction of GBX2 was observed at the expense of the anterior marker OTX2. Apologies if I have missed it, but what was the experiment that shows directly in your organoids that OTX2 was initially high and then lowered due to GBX2?

      11) The authors state "...EBs to MG treatment, we encapsulated these at three different timepoints during differentiation..." What was the justification for picking these timepoints?

      12) The authors state "Overall, the relative effect of MG encapsulation resulted in distinct responses in the various cerebellar populations..." So, what does it mean that each cell type has a different response? Please expand on this.

      13) The authors state "using the murine cerebellum as a close developmental blueprint, most signatures indicate a mixture of mid-late embryonic temporal maturity, suggesting that the cerebellar organoids recapitulate developmental stages of the normally developing cerebellum. An exception to this was overlap of human GCs with murine GCs of postnatal maturity, suggesting that this cell type was more mature than its counterparts." AND "human PCs clustered more closely to murine progenitors and astroglia, suggesting that by day 90 organoid-derived PCs were still developmentally immature, compared with murine PCs. In further support of this, we did not detect appreciable levels of SHH."

      The sentences in this statement raise several questions. First, PCs normally develop before the GCs. Thus, the finding that PCs in the organoids are less mature than GCs is surprising and may even be concerning, as it suggests that the organoids do not fully (or reliably) replicate the temporal order of normal cell development that is so characteristic of cerebellar development. Second, the relationship between PC SHH secretion and the responding GCs is now well established and has been shown to be an important, if not essential, mechanism for GC proliferation in vivo. It is therefore surprising that GCs form and proliferate in the organoid without proper SHH signaling. What may be the mechanism for this? The authors need to account for this issue and provide a discussion to address all of these points as well. Moreover, the authors should discuss how the maturation state of PCs in the organoids is different between this paper and the recently published Buchholz et al paper (2020 - DOI: 10.1073/pnas.2000102117).

      14) The authors argue about the cell structure and expression of cell markers in the organoids. However, based on what is shown, it is not clear how robust these features are in the organoids. The authors need to provide additional images of the organoids at much higher magnification in order to properly demonstrate cell structure and identity. In this regard, based on their argument, it would be important for authors to show the bipolar morphology of Calbindin-positive cells and excessive neural outgrowth at the periphery of the organoids (currently referred to in text as "data not shown"). Finally, it would be interesting to see whether different cell types are intermingling or spatially segregated in the organoids. That is, what does the cellular organization in the organoid actually look like?

      15) Along the same lines as above, it seems to me that the authors should present more details about the anatomical architecture of the organoids. One of the major arguments raised by the authors is that the organoids recapitulate many features of normal cerebellar development. Of course, the organoids likely don't show all the intricacies of in vivo cerebellar development, but given that the 3-dimensional assembly of the cerebellum is essential for all aspects of cellular and circuit formation, one needs to fully appreciate exactly what aspects of the cerebellum the organoid is able to reflect. Only then can one predict its full utility towards studying different aspects of development or disease.

      16) There are several cases in which the authors state "data not shown". In every one of these cases the data seem essential to me and should be presented in full.

      17) The authors use the fact that the cell types from the human organoids cluster with mouse cerebellar cell types as an argument that the human organoids have a good representation of the cerebellar cell types. But, the authors also go on to state that the human organoids are advantageous over model organisms because they may better model human genetic background. These two statements are contradictory, especially given the previous issue raised about the organoids not reproducing the temporal sequence of cellular development. Do the authors have additional data to support their statements about the biological relevance of their xeno-free conditions? For example, did they find any human specific genes or developmental pathways? The statements presented by the authors create a circular argument that needs to be revised and/or supported by additional data. What would help is a much deeper comparison between the organoids, human cerebellar development, and mouse cerebellar development.

      18) What is the fold-change of RNA expression in figure 1 based on? What is the statistical test actually testing? What is the control that this fold-change is compared to?

      19) On the issue of statistics, the section describing the statistics in the methods is rather brief. It would help tremendously if the authors expanded this section by describing which test goes with which experiment. Some justification for the use of the different statistical tests would be very useful as well.

      20) The authors use a lot of abbreviations. Some of these abbreviations hinder the readability of the text, which would be especially problematic for an audience not as closely acquainted with these terms. It may help to limit the use of abbreviations to cell-types and gene names. For example, Matrigel and embryoid bodies do not have to be abbreviated.

      21) The size of the text in all figures is too small, including gene names, axis labels, and legends.

      22) In the Discussion, the authors state "...this includes proximally located territories in which adjacent signalling is required for cerebellar maturation and development." I am not sure whether enough direct evidence is presented to make this conclusion. As commented on before, additional anatomy should be presented, and based on those data, inter-cellular signaling could then be examined with more confidence. Otherwise, the authors would have to tone down and/or revise this conclusion.

      23) The authors conclude that "hiPSC-derived organoid models offer unprecedented opportunities to model brain development and disorders and for therapeutic development..." I agree, but as a general comment, I found it very hard to know what exactly the authors are comparing in this paper. It appears that the comparisons are mainly to mouse development, although it seems that a more thorough and direct side by side comparison should be made. I suppose some kind of detailed developmental timeline-based model is warranted.

    2. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Based on several recent molecular studies, the strength of the current manuscript is the establishment of an organoid approach that could potentially add to our knowledge of normal and abnormal cerebellar development by providing a flexible technique with which to resolve cellular mechanisms. However, there was overall agreement that while the approach has promise, the data presented are lacking in terms of a concrete comparison to known milestones in cerebellar development (in animal models or human). Moreover, given the technical nature of the manuscript, it was deemed necessary that a more complete characterization of the organoid "anatomy" would be required in order to convince the reader of the claims. There was also a concern that the quantitative aspects and interpretation of the scRNA-seq experiments, particularly the characterization of the clusters obtained and the analysis performed to compare the human organoid data to the mouse developmental data, could have been carried out with greater depth.

    1. Reviewer #3:

      In this paper the authors show for the first time that optogenetic activation of the subthalamic nucleus (STN) is aversive and can drive avoidance behavior. This effect may be mediated by polysynaptic activation of the lateral habenula, which they show is activated following optogenetic activation of the STN. They propose that the STN may excite glutamatergic neurons in the ventral pallidum that in turn project to and excite the lateral habenula. The authors do mention that other pathways may mediate the aversive effects but no other pathways are tested.

      Overall this paper presents a simple and clear demonstration that optogenetic activation of the subthalamic nucleus is aversive. It may be that this effect involves activation of the ventral pallidum and the lateral habenula, but the evidence provided to support this possibility is weak and currently not compelling.

      Major issues:

      -While it has not to my knowledge been reported that activation of the STN can drive aversive responses, there are a number of lines of evidence suggesting that this should be the case. None of these are mentioned in the paper, and they should be discussed. First, the STN is part of the indirect pathway in the basal ganglia. Previous work has shown through optogenetic and other methods that the indirect pathway striatal neurons in the dorsal and ventral striatum can drive aversive responses and are involved in aversive learning (for a critical review that discusses this literature see Soares-Cunha et al., 2016). In line with this, recordings of the indirect pathway have also shown that this pathway is preferentially involved in processing aversive information. For example, STN neurons are activated by nociceptive information and are needed for appropriate behavioural responses to nociceptive stimuli (Pautrat et al., 2018). STN neurons are also activated by aversive stimuli and by negative reward prediction error (Breysse et al., 2015). The paper needs to discuss its findings in the context of this and other previous work (these references are just examples and not an exhaustive list) that supports the role of the indirect pathway in processing aversive information.

      -Another topic that should be discussed is the heterogeneity of the STN. The authors themselves mention that the STN is composed of distinct spatio-molecular domains. This may well be relevant as rabies tracing from the EP neurons that project to the habenula and from the glutamatergic neurons in the ventral pallidum has revealed that they receive the majority of their input from the parasubthalamic nucleus and not from the core of the STN (Stephenson-Jones et al., 2016, Stephenson-Jones et al., 2020, Tooley et al., 2018). This raises the possibility that the aversive responses from the STN are primarily driven by neurons in the pSTN. The authors could test this point by restricting their ChR2 expression to one or the other region of the STN. At the moment all example images show that expression is in both the STN and pSTN. This possibility should be discussed.

      -The authors mention that they perform selective activation for the STN-VP pathway by stimulating the STN terminals in the VP. It is not clear that this will selectively activate this pathway. If the STN neurons that project to the VP also project to other areas then these will likely also be activated due to back propagating action potentials driven by the ChR2 stimulation. More work needs to be done to determine if the VP is really the pathway that mediates the aversive effect. Additional work including multi-colour retrograde tracing, selective inactivation of the VP projection while stimulating the STN or stimulating the STN fibers in the VP while inactivating the STN cell bodies would be needed to really determine if the VP is important for mediating the aversive effect. This may be beyond the scope of what the authors want to do but would be needed to support a claim that their evidence "provide strong support for a STN-VP-LHb is a pathway for aversion".

      -The title should not include the word "encoded", as no experiments performed in this paper looked at any aspect of coding in the STN.

    2. Reviewer #2:

      In this manuscript Serra et al. demonstrate that stimulation of subthalamic nucleus (STN) neurons can drive place avoidance and delayed (presumably bisynaptic) excitation of lateral habenula (LHb) neurons. They also show that STN inputs to the ventral pallidum (VP) can drive place avoidance and excitation of VP neurons. While the potential role of an STN-VP-LHb pathway in driving aversion and avoidance is intriguing, the manuscript leaves many open questions regarding the nature of STN's role in mediating aversion, as well as the circuit mechanisms governing STN-induced avoidance.

      Major Comments:

      1) STN in aversion: The manuscript addresses the role of the STN in mediating "aversion" in a very limited manner, despite the framing of the title ("Aversion encoded in the subthalamic nucleus"). Based on the title I expected data showing that STN activity is correlated with the aversiveness of stimuli, or data showing that STN activity is required for aversion processing. Instead the authors show that STN stimulation can drive avoidance, which does not necessarily mean that this activity drives "aversion" per se. Data showing that STN represents the aversiveness of stimuli or that activity here is necessary for avoidance or other responses to aversive stimuli would strengthen the point. Currently the evidence for the statement made by the title is weak.

      2) Claims about the role of the STN->VP->LHb pathway in the abstract and elsewhere in the text: The authors demonstrate that activation of STN terminals in VP recapitulates their RTPP avoidance effects, but they do not directly demonstrate that these effects are mediated by downstream VP->LHb connectivity. They show that activation of STN terminals in VP results in excitation of VP units, but it remains unknown whether STN neurons specifically target/activate VP neurons that project to LHb, and/or whether they target VP glutamate neurons specifically (the primary cell type in the VP->LHb pathway that mediates aversion). The current data set does not demonstrate either that a) STN-induced activity changes in LHb are predominantly mediated by VP (as opposed to EP or GP or other connections), or that b) avoidance elicited by STN->VP activation is mediated by LHb activity. Therefore, statements throughout the manuscript about the STN-VP-LHb circuit are not supported.

      3) Statistical analysis: The authors provide comprehensive statistical information for their behavioral experiments, but not for the electrophysiology. It appears that individual neurons were treated as independent measurements even when they were recorded from the same subject, though in some cases it is not clear how many mice were recorded from (e.g. 1G, 5D, 5E). If multiple measurements were taken from the same subjects, then this should be taken into account in the statistical analysis (such as by including subject as a random effect in an ANOVA or linear mixed model).
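
      The concern in comment 3 can be illustrated with a small sketch. It does not fit the mixed model the reviewer suggests; as a simpler stand-in, it collapses neuron-level values to one mean per mouse, so that the mouse rather than the neuron becomes the unit of analysis. The data, the mouse labels, and the function names below are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: (mouse_id, evoked firing-rate change in Hz), one entry per neuron.
recordings = [
    ("mouse_1", 4.0), ("mouse_1", 5.0), ("mouse_1", 6.0),
    ("mouse_2", 1.0), ("mouse_2", 2.0),
    ("mouse_3", 3.0),
]


def per_subject_means(records):
    """Collapse neuron-level values to a single mean per subject."""
    by_subject = defaultdict(list)
    for subject, value in records:
        by_subject[subject].append(value)
    return {subject: mean(values) for subject, values in by_subject.items()}


def standard_error(values):
    """Standard error of the mean; requires at least two values."""
    return stdev(values) / sqrt(len(values))


# Treating every neuron as independent inflates n and shrinks the error estimate.
neuron_values = [value for _, value in recordings]
# Treating each mouse as one observation respects the nesting of neurons in mice.
subject_means = per_subject_means(recordings)

print("n (neurons as units):", len(neuron_values))
print("n (mice as units):", len(subject_means))
print("SEM (neurons as units):", round(standard_error(neuron_values), 3))
print("SEM (mice as units):", round(standard_error(list(subject_means.values())), 3))
```

      Per-subject averaging discards within-mouse variability and weights mice equally regardless of how many neurons each contributed; a linear mixed model with subject as a random effect, as the reviewer proposes, keeps the neuron-level data while still accounting for the nesting.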

    3. Reviewer #1:

      In this study, Serra et al. attempted to study the circuit responsible for aversive behavior in mice. They had previously observed that subthalamic nucleus (STN) excitation induced aversive jumping behavior. The authors proposed that the indirect projection to the lateral habenula (LHb) via the ventral pallidum (VP) could be involved. They used Pitx2-Cre mice for STN-specific gene expression and used the real-time place preference paradigm (RT-PP) and the elevated plus maze (EPM) as a means to study aversive behavior. Overall, the findings in this study are potentially important as they describe a previously unknown role of the STN, and its downstream targets, in aversive behavior. However, the authors have not convincingly demonstrated the pathway involved. The evidence so far is rather circumstantial and the arguments made were based entirely on gain-of-function experiments using ChR2. As outlined below, there are a number of significant concerns that need to be addressed.

      Major:

      1) The authors should demonstrate the effectiveness and specificity of Pitx2-Cre in driving ChR2 expression. What was the cellular expression pattern within the STN? Did the authors observe ChR2 expression in 100% of STN neurons? Did it label any non-neuronal cells? Did the neighboring regions also express ChR2? According to Papathanou et al., that is likely to be the case. The authors should provide a more rigorous histological examination. Otherwise, a more in-depth discussion is needed to address how these concerns would confound the interpretation of results.

      2) It is interesting that Pitx2/ChR2-eYFP mice avoided STN-photostimulation by spending less time in the light-paired compartment. It should be discussed why the compartment where the STN is stimulated is not completely avoided.

      3) It is unclear if the mice jumped in this study, as the authors had previously observed. Were there any other movement-related behavioral changes?

      4) In Figure 1B, it seems like the entry into Compartments A and B of Pitx2/ChR2-eYFP mice on Day 5 and 6 is not very different. However, in Figure 1C, the representative heatmap shows a difference. In contrast, in Figure 1B, it seems like the entry into Compartments A and B of Pitx2/ChR2-eYFP mice on Day 9 is equal, whereas in Figure 1C, the representative heatmap shows substantial entries. It would be helpful to have an explanation for the discrepancies.

      5) Assuming that the STN excitation duration is 10 seconds upon entry to the "Light" compartment, do the mice remain in the "Light" compartment? If the mice are only stimulated at the entry point of the "Light" compartment, do they just remain there and avoid exiting (as a means to avoid reentry)? As 10 seconds is a long time period for the mice to move around, is the stimulation continued if they then switch to the neutral compartment before the end of the 10-second stimulation period?

      6) The point of the first EPM experiment, with 10 minutes of stimulation of STN neurons or their terminals, is not exactly clear. That is a very long time period; it is very likely that plasticity was induced with such a paradigm, which would confound the study.

      7) The STN-VP slice experiment does not really address any of the circuit questions they proposed to answer. The STN-VP connection is already known. It would be more interesting if the authors showed the specific connection between the STN and the glutamatergic VP neurons, which they speculate to be the downstream target of the STN. This is an important point because of the complex cellular composition within the VP.

      8) It would be important to show that direct optogenetic stimulation of glutamatergic neurons within the VP produced the same phenotype. At the very least, the authors should locally infuse glutamatergic blockers into the VP to examine if the effects with STN stimulation can in fact be blocked.

      9) Both Figures 1 and 5 show a rather low density of STN fibers in the VP and they are restricted to about one-third of the VP. The involvement of the STN-VP circuit in mediating the observed behavior is less than convincing. On the other hand, there are no investigations of whether direct connections to other known targets are involved in the aversive response.

      10) All optogenetic interrogations were based on ChR2 stimulation. As antidromic spikes can propagate to other collateral branches in other synaptic targets of STN neurons (i.e., the GP, EP, and/or SNr), orthogonal approaches are needed to decisively show that the STN-VP circuit is involved.

      11) What is the latency of STN-driven spiking in LHb? The latency in the peri-stimulus time histogram Figure 4 looks too short to be a polysynaptic event. It also does not match up with that stated in the text (i.e., 10 ms, line 212). This is not a trivial matter as synaptic delays can provide important clues for whether mono- vs polysynaptic events are involved.

      12) In Figure 5E, DNQX and APV did not completely block the evoked currents. A more rigorous examination is needed if multiple neurotransmitters were released.

      13) As anxiety and aversive behaviors are often dichotomous between males and females, the authors should comment on whether there were any sex differences observed.

      14) Some of the sample sizes are very small (only 3-5).

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      While the demonstration that stimulation of subthalamic nucleus (STN) neurons produces avoidance is potentially interesting, the circuit basis of this effect was not well established. Specifically, the proposed functional connection of STN with lateral habenula through ventral pallidum was not clearly demonstrated and the STN stimulation findings on their own represent a more minor advance.

    1. Reviewer #2:

      The study provides evidence that an aphid effector Mp64 and a Phytophthora capsici effector CRN83_152 can both interact with the SIZ1 E3 SUMO-ligase. The authors further show that overexpression of Mp64 in Arabidopsis can enhance susceptibility to aphids and that a loss-of-function mutation in Arabidopsis SIZ1 or silencing of SIZ1 in N. benthamiana plants leads to increased resistance to aphids and P. capsici. On siz1 plants the aphids show altered feeding patterns on phloem, suggestive of increased phloem resistance. While the finding is potentially interesting, the experiments are preliminary and the main conclusions are not supported by the data.

      Specific comments:

      The suggestion that SIZ1 is a virulence target is an overstatement. Preferable would be knockouts of effector genes in the aphid or oomycete, but even with transgenic overexpression approaches, there are no direct data that the biological function of the effectors requires SIZ1. For example, is SIZ1 required for the enhanced susceptibility to aphid infestation seen when Mp64 is overexpressed? Or does overexpression of SIZ1 enhance Mp64-mediated susceptibility?

      What do the effectors do to SIZ1? Do they alter SUMO-ligase activity? Or are perhaps the effectors SUMOylated by SIZ1, changing effector activity?

      While stable transgenic Mp64 overexpressing lines in Arabidopsis showed increased susceptibility to aphids, transient overexpression of Mp64 in N. benthamiana plants did not affect P. capsici susceptibility. The authors conclude that while the aphid and P. capsici effectors both target SIZ1, their activities are distinct. However, not only is it difficult to compare transient expression experiments in N. benthamiana with stable transgenic Arabidopsis plants, but without knowing whether Mp64 has the same effects on SIZ1 in both systems, to claim a difference in activities remains speculative.

      The authors emphasize that the increased resistance to aphids and P. capsici in siz1 mutants or SIZ1 silenced plants are independent of SA. This seems to contradict the evidence from the NahG experiments. In Fig. 5B, the effects of siz1 are suppressed by NahG, indicating that the resistance seen in siz1 plants is completely dependent on SA. In Fig 5A, the effects of siz1 are not completely suppressed by NahG, but greatly attenuated. It has been shown before that SIZ1 acts only partly through SNC1, and the results from the double mutant analyses might simply indicate redundancy, also for the combinations with eds1 and pad4 mutants.

      How do NahG or Mp64 overexpression affect aphid phloem ingestion? Is it the opposite of the behavior on siz1 mutants?

    2. Reviewer #1:

      In this manuscript, the authors suggest that SIZ1, an E3 SUMO ligase, is the target of both an aphid effector (Mp64 from M. persicae) and an oomycete effector (CRN83_152 from Phytophthora capsici), based on interaction between SIZ1 and the two effectors in yeast, co-IP from plant cells and colocalization in the nucleus of plant cells. To support their proposal, the authors investigate the effects of SIZ1 inactivation on resistance to aphids and oomycetes in Arabidopsis and N. benthamiana. Surprisingly, resistance is enhanced, which would suggest that the two effectors increase SIZ1 activity.

      Unfortunately, not only do we not learn how the effectors might alter SIZ1 activity, there is also no formal demonstration that the effects of the effectors are mediated by SIZ1, such as investigating the effects of Mp64 overexpression in a siz1 mutant. We note, however, that even this experiment might not be entirely conclusive, since SIZ1 is known to regulate many processes, including immunity. Specifically, siz1 mutants present an autoimmune phenotype, and general activation of immunity might be sufficient to attenuate the enhanced aphid susceptibility seen in Mp64 overexpressers.

      To demonstrate unambiguously that SIZ1 is a bona fide target of Mp64 and CRN83_152 would require assays that demonstrate either enhanced SIZ1 accumulation or altered SIZ1 activity in the presence of Mp64 and CRN83_152.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Detlef Weigel (Max Planck Institute for Developmental Biology) served as the Reviewing Editor.

      Summary:

      A major tenet of plant pathogen effector biology has been that effectors from very different pathogens converge on a small number of host targets with central roles in plant immunity. The current work reports that effectors from two very different pathogens, an insect and an oomycete, interact with the same plant protein, SIZ1, previously shown to have a role in plant immunity. Unfortunately, apart from some technical concerns regarding the strength of the data that the effectors and SIZ1 interact in plants, a major limitation of the work is that it is not demonstrated that the effectors alter SIZ1 activity in a meaningful way, nor that SIZ1 is specifically required for action of the effectors.

    1. Reviewer #3:

      In their paper, Liutkute et al., use an elegant combination of force profile analysis (FPA) and photo-electron transfer (PET) experiments to probe the co-translational folding pathway of the N-terminal domain of the protein HemK. Over the past decades, it became increasingly clear that co-translational folding pursues different routes than those found in solution. Despite the fact that many proteins fold and unfolded many times during their lifespan after being released from the ribosome, the question of whether and how proteins fold during the process of translation is not only fundamental but also extremely difficult to access experimentally. Here, Liutkute et al. present a synergistic combination of two largely different methods to answer this question. By stalling a nascent polypeptide chain at different sequence positions and measuring the amount of full-length relative to arrested protein in a gel assay, the authors identified a sequential folding path in which the order of helix formation of the 5-helix NTD of HemK follows the order from N- to C-terminus. The authors interpret these results using the foldon concept from the Englander lab. Though the FPA is a rather qualitative experimental tool that measures the amount of molecules that crossed a certain force threshold, the analysis is striking. These experiments were complemented by PET-FCS experiments that were used to quantify the kinetic rates of conformational fluctuations of ribosome-stalled states of the protein. The conclusions drawn by the authors are that conformational fluctuations slow-down the further a protein is away from the ribosome exit tunnel. In my opinion, the work is a substantial step towards understanding the process of co-translational folding. The experiments are beautiful, well described, and the results are of clear interest to a broad readership.

      1) I would like to emphasize that care has to be taken when deducing the order of events from single time-point experiments such as FPA. The speed of translation compared to the folding speed is an important factor that eventually dictates the order in which certain structural elements will form. I admit, however, that the formation of helices, at least in solution, typically exceeds translation speeds by far, thus indicating that the identified intermediates will also form under conditions of continuous translation. Nevertheless, it would have been interesting if the authors could provide data or relevant publications about the folding speed of the HemK-NTD.

      2) The PET-FCS is indeed very appealing, however, I had some problems in understanding the actual procedure that was used for fitting. On p. 25, it is mentioned that the diffusion and triplet component based on the empirical fit with eq. 1 were subtracted from the data. Equation 1 would rather indicate that a separation of the dynamic components requires a division of the data by the relevant diffusion and triplet terms.

      3) I would call eq. 1 'empirical' rather than 'analytical'.

      4) On p. 25, the authors explain that the dynamic components of the FCS-curves were fitted using a sum of terms, one for each species. It would be more informative if the authors provided the actual equations that were used for fitting. I would have guessed that the authors derive expressions for the correlation functions of the individual models, e.g., using the approach of Gopich & Szabo (see Eq. 1 in Gopich et al. (2009) J Chem Phys, 131, 095102), but the approach described in the methods sounds different.

      5) I was surprised that the two-step model can even provide negative, i.e., rising, amplitudes, which is very unusual for autocorrelation functions. This feature implies that the kinetic models have amplitudes that are decoupled from the actual kinetic rates. It would be great if the authors could clarify this point.

      6) I find the calculation of free energy barriers a bit overstretched given the complexity of the system. First, the pre-exponential factor of the Eyring equation (eq. 2) is only adequate for gas-phase reactions, particularly when assuming a transmission coefficient of 1. The appropriate counterpart is the Kramers equation. Clearly, the problem of defining the pre-exponential factor for folding reactions also remains with the Kramers expression. However, a large body of work has been dedicated to this problem over the past 20 years. A value of 1 μs⁻¹ seems to be a good guess (see e.g. Schuler & Eaton (2008) Curr Opin Struct Biol, 18, 16). Clearly, there is no way to decide whether conformational fluctuations slow down due to a change in the free energy barrier or due to a change in the pre-exponential factor.
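
      To illustrate how strongly a calculated barrier depends on the assumed pre-exponential factor, the following minimal sketch converts one rate into a barrier height via ΔG‡ = RT ln(k0/k). The rate of 10^5 s^-1 and the temperature of 298 K are illustrative assumptions, not values from the manuscript, and the function name is hypothetical. The two prefactors compared are the Eyring value k_B·T/h and the roughly 1 μs⁻¹ estimate mentioned in point 6.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s
R = 8.314462618      # molar gas constant, J/(mol*K)


def barrier_kj_per_mol(rate, prefactor, temperature=298.0):
    """Barrier height (kJ/mol) implied by k = k0 * exp(-dG / (R*T))."""
    return R * temperature * math.log(prefactor / rate) / 1000.0


temperature = 298.0                   # assumed temperature, K
rate = 1e5                            # hypothetical observed rate, s^-1
eyring_k0 = K_B * temperature / H     # Eyring prefactor k_B*T/h, s^-1
folding_k0 = 1e6                      # ~1 per microsecond, s^-1

dg_eyring = barrier_kj_per_mol(rate, eyring_k0, temperature)
dg_folding = barrier_kj_per_mol(rate, folding_k0, temperature)

print(f"Eyring prefactor:  {dg_eyring:.1f} kJ/mol")
print(f"1 us^-1 prefactor: {dg_folding:.1f} kJ/mol")
```

      The gap between the two estimates equals RT ln(k0_Eyring/k0_folding) and is therefore independent of the measured rate, which is why the choice of prefactor shifts every reported barrier by the same amount.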

    2. Reviewer #2:

      Liutkute and coworkers use a combination of arrest peptide assays and fluorescence correlation spectroscopy to investigate the folding of the HemK N-terminal domain. Previous work from the same group has shown that the domain rapidly forms compact structures co-translationally while still partially within the ribosome exit tunnel, limited by the rate of elongation. Data from the arrest peptide assay presented here suggest that, surprisingly, stably folded structures form as soon as the first of five helices in the domain has moved past the tunnel constriction. Several additional apparent folding events occur at longer chain lengths, suggesting discrete events of structure formation within the tunnel and near its vestibule. Experiments with a destabilized mutant (4xA) indicate that some of the folding events are dependent on formation of the hydrophobic core of the domain, suggesting that they depend on tertiary structure formation. PET-FCS experiments with HemK nascent chains reveal two interconverting states, compact (C) and dynamic (D). Both states are populated similarly regardless of chain length. However, the barrier between these states increases when the domain emerges from the ribosome. These experiments indicate a destabilizing effect of the ribosome on the nascent chain. Taken together, the experiments support earlier work that proposed a sequential co-translational folding mechanism for the HemK NTD, and provide rate constants for the dynamics at the earliest stages of nascent chain folding.

      The experiments appear very carefully designed and executed, and the data is of high quality. The PET-FCS measurements in particular provide valuable quantitative information about early nascent chain folding and should be of broad interest. While the results from arrest peptide experiments are intriguing, I have concerns about their interpretation, detailed below.

      Main point:

      The arrest peptide data is interpreted entirely in terms of a pulling force on the nascent chain, generated by folding. The conclusion that formation of just one (peak I) or two (peak II) alpha-helices inside the tunnel generates substantial mechanical forces is surprising, particularly given the presumed mechanism of arrest release mediated by force. How would a force be generated by a single alpha helix? It is easier to rationalize that forces acting on the arrest peptide are generated by stable tertiary structures. However, in that case, the 4xA mutant should show much lower arrest release in the region where full folding of the domain is expected (regions VII and VIII in Fig. 1), because the mutant is largely unfolded (see Holtkamp et al., Science 2015). This effect is not observed. Together, these considerations make me wonder whether alternative explanations for the observed release rates can be ruled out. For instance, could sequence-specific effects that are not related to folding of HemK, such as local interactions of the nascent peptide with the tunnel, cause the observed changes in arrest release rates? Alternatively, could local structure formation (of an alpha helix) in the tunnel cause arrest release that is not mediated by a pulling force?

      At a minimum, the authors should discuss how they envision single alpha helices to generate the forces necessary to accelerate arrest release (which have been estimated in the literature, e.g. in Goldman et al, Science 2015, and Kemp et al., PNAS 2020).

      In addition, two control experiments should be carried out: (1) An experiment demonstrating that a bona fide unstructured protein yields more or less constant arrest release rates over a range of nascent chain lengths. Perhaps a construct starting at residue 73 of HemK could serve as a control. (2) An experiment with previously characterized folded domains (e.g. some of the spectrin constructs from Kemp et al., PNAS 2020; or some of the constructs from Farías-Rico et al., PNAS 2018) to establish the fraction of full-length protein (f_FL) obtained with stably folded domains under the experimental conditions used in the present manuscript. How do the f_FL values for the HemK NTD compare to fully folded proteins under the conditions used here?

    3. Reviewer #1:

      This study by Liutkute et al. investigates the co-translational folding of a small alpha-helical domain from HemK. The study continues earlier studies by Rodnina and colleagues that showed using FRET and other measurements that HemK begins folding inside the ribosome exit tunnel and occurs sequentially as individual alpha-helical segments are able to be accommodated in the exit tunnel vestibule. Folding completes just outside the ribosome when the entire HemK domain is exposed. The current work extends these earlier studies using biochemical assays of "force" on the nascent chain and spectroscopic assays of intramolecular dynamics with an N-terminal fluorescent probe.

      The force assays illustrate that tension is seen as individual alpha helices move beyond the exit tunnel constriction, and at other previously documented steps of folding in the vestibule. These intra-ribosomal events are not impacted by a mutation that disrupts packing of the hydrophobic core. The fluorescence quenching dynamics show that the N-terminus is more dynamic inside the exit tunnel prior to folding and not dynamic after folding outside the tunnel. A detailed kinetic model of the fluorescence correlation data is provided to help explain the observations.

      Overall, the study provides a finer resolution view of the sequential co-translational folding of HemK. Although the broad concepts from the earlier studies are not changed by the current work, the study introduces analytical tools based on fluorescence quenching and FCS that may be useful to study the co-translational folding of other proteins.

      My primary suggestion is that the authors should be more explicit about what is being measured in the "force" sensor assay. SecM stalling relies on a specific secondary structure of the stalling sequence that causes an altered P site geometry that is unfavourable for peptide bond formation. Stalling will not occur if this altered geometry cannot be stabilized. Thus, what the authors refer to as 'force' is actually a constraint applied to the nascent chain to prevent SecM secondary structure formation. Thus, the folding is not generating force so much as constraining the nascent chain as a consequence of the ribosome exit tunnel geometry. It is a subtle, but I feel important, distinction to explain the assay. The reason is that such a constraint can actually be due to reasons other than folding. For example, an interaction between the nascent chain and the exit tunnel (or other proteins) could similarly constrain the nascent chain.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife.

    1. Reviewer #3:

      Summary:

      The authors report a between-subjects, double-blind psychopharmacological study on explore/exploit behavior in healthy human subjects. The authors used propranolol to block norepinephrine (NE), and amisulpride to block dopamine (DA), and compared to a group taking placebo. Using a 3-armed bandit task, coupled with computational modelling and pharmacological manipulation, the authors show that "tabula rasa" (or random exploration) is reduced when NE is blocked. This interpretation was supported by behavioral effects whereby subjects taking propranolol were significantly more consistent than other groups when facing identical choices, and chose the low-value option more often than the other groups. Blocking DA did not appear to affect any parameters. The computational model showed that the E-greedy parameter, which computes the proportion of time an entity makes a random selection, was most affected by the NE blockade. In addition, the modelling shows that some directed exploration (exploring lesser-known options) was also at play.
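
      As a minimal sketch of what an ε-greedy parameter does in a 3-armed bandit: with probability ε the agent ignores all value information and picks uniformly at random, and otherwise it picks the option with the highest expected value. The function name, the expected values, and ε = 0.3 below are illustrative assumptions and are not taken from the manuscript's model.

```python
import random


def epsilon_greedy_choice(expected_values, epsilon, rng=random):
    """Pick uniformly at random with probability epsilon, else pick the best option."""
    if rng.random() < epsilon:
        return rng.randrange(len(expected_values))
    return max(range(len(expected_values)), key=lambda i: expected_values[i])


rng = random.Random(0)          # seeded for reproducibility
values = [8.0, 6.0, 3.0]        # hypothetical high-, medium-, and low-value bandits
choices = [epsilon_greedy_choice(values, 0.3, rng) for _ in range(10000)]

# The low-value bandit (index 2) is only ever chosen on random trials,
# so its choice share approximates epsilon / 3.
low_value_share = choices.count(2) / len(choices)
print(f"low-value bandit chosen on {low_value_share:.1%} of trials")
```

      Under this rule, lowering ε reduces both choices of the low-value bandit and inconsistency across identical trials, which matches the two behavioral signatures described in the summary.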

      General comments:

      The manuscript is well-written and the results are compelling. The findings are important to researchers particularly interested in the cognitive effects of catecholamines, and/or the explore/exploit dilemma. The results may not be that interesting to a broader readership.

      Criticisms:

      1) I do not really like the use of the term "tabula rasa" exploration, over "random" exploration. Using the term random exploration is just simpler, and clearer. The particular problem for me is that "tabula rasa" has the connotation that both the current "tabula rasa" choice and all future choices will not take into account information obtained before that choice. Random exploration is a better term because it is easy and intuitive to see that random choices can be sprinkled in with choices based on previous information, whereas tabula rasa implies wiping previous information away from that point forward. As best I can tell, previous related work has not termed the random exploration associated with the E-greedy parameter "tabula rasa". One consideration I am wrestling with is that apparently there is another parameter in one or more of the models that reflects random exploration (line 618, inverse temperature). This may be why the authors opted to call the E-greedy parameter something else. At the very least, I would like a better explanation of the choice of term (tabula rasa) as well as a thorough explanation of the difference between tabula rasa and random exploration. I recommend changing the term used as well, but am amenable to accepting an argument for keeping it.

      2) Line 162: "Reported findings were corrected for IQ (WASI)". How? It seems WASI was included as a covariate in the repeated-measures ANOVA, but it's not clear from the results reported in lines 170-185 exactly what factors went into the ANOVA. I recognize that in higher-impact journals a full description of the factors and levels of statistical tests is often considered a tedious waste of space, but I feel that holds only in cases where the structure of the test is obvious. In my opinion, that is not the case here.

      3) Line 209-210: "the probability of choosing bandits with a lower expected value (here the low-value bandit, Fig 1e) will be higher. We investigated whether such behavioural signatures were increased in the long horizon condition (i.e. when exploration is useful), and we found a significant main effect of horizon (F(1, 54)=4.069, p=.049, η2=.07; Figure 3c)." Isn't this just evidence of general exploration, not specifically tabula rasa exploration? How does this test rule out, for example, directed exploration?

    2. Reviewer #2:

      In this study, Dubois and colleagues claim that noradrenaline promotes tabula-rasa exploration in decision making, using a novel paradigm involving short- and long-horizon conditions to elicit exploitation and exploration, respectively. The work tests different computational models and examines in particular supposedly less costly forms of exploration, that is 1) tabula-rasa exploration, in which prior information is ignored and the same probability is assigned to all available options, and 2) novelty exploration, in which information processing is biased toward choices that have not been encountered previously. They provide evidence that both of these processes coexist with more demanding exploration strategies. In addition, using a double-blind, placebo-controlled drug study, they provide support for a role of noradrenaline in tabula-rasa exploration.

      This work extends previous work from the same group that aimed at solving the important question related to decision making and the neuromodulatory influences on these processes. The overall approach and the results are clearly presented. The extensive model comparison is particularly interesting to better approach this difficult question. The results are interesting and bring novel insights about the processes at play during exploration and the influence of neurotransmitters on these processes.

      1) Noradrenaline influence on tabula-rasa exploration:

      The authors claim that "Phasic noradrenaline is thought to act as a reset button, rendering an agent agnostic to all previously accumulated information, a de facto signature of tabula-rasa exploration." It might be interesting to discuss the results in terms of a potential impact of noradrenaline onto the subjective value of the choices. For instance, Rogers et al. (Psychopharmacology, 2004) suggest that propranolol affects the processing of possible losses in decision-making paradigms, and might also reduce the discrimination between the different levels of possible gains (Rogers et al. 2004). In another study, Sokol-Hessner et al. (Psychol Sci., 2015) also report a loss aversion reduction after propranolol administration. These effects might also change prior information and reset behavioral adaptation to look for new opportunities. In this latter study the authors also report a lack of effect of propranolol onto choice consistency, contrary to what the present study reports. I was also wondering how this new result about the effect of propranolol on decision making relates to previous findings from the same group (Hauser et al. 2019) where they described noradrenaline influence on information gathering and the urgency to decide. Finally, according to the network reset hypothesis, it has been indeed suggested that a change in the environment might enhance information gathering at the expense of prior expectations to produce an adaptive behavioral output. Perhaps the authors might avoid using the term 'agnostic', this might instead reflect a reduced influence of 'top-down' prior information, related to changes in subjective value of the different choices.

      2) Model selection:

      One strength of the paper is that the authors compared several computational models. The model selection is presented in Figure 4, and in Figure 4 - Figure supplement 1 the authors provide additional information on how the winning model, which accounted best for the largest number of subjects, compares with two other models, namely the UCB model (with novelty and greedy parameters) and the hybrid model (with novelty and greedy parameters). It would be useful for the reader to get a better sense of the number of subjects whose results favored each model (i.e. a more exhaustive picture). One could use the same table as the one presented in Appendix Table 2, with the respective number of subjects for which each model achieved the best performance. In fact, as shown in Figure 4, the winning model does not look very different (at least visually) from other models such as UCB (with novelty and greedy parameters) or hybrid (with novelty parameter or novelty and greedy parameters) models. As such, I am wondering whether the conclusion about the 𝜖-greedy parameter would hold true if other models with similar performance were tested, e.g. the UCB model (with novelty and greedy parameters) or the hybrid model (with novelty and greedy parameters).

      3) The authors used propranolol (40 mg), a non-selective β-adrenoceptor antagonist, to reduce noradrenaline functioning. Previous studies have shown that it significantly decreases heart rate (e.g. Rogers et al., 2004). How might that relate to the reported results? In terms of NA influence and given the distribution of β receptors, could the authors be more explicit about the relation of their work to the potential mechanisms (e.g. Goldman-Rakic et al. J Neurosci. 1990 or Waterhouse et al., Journal of Pharmacology and Experimental, 1982)?

      4) Could the authors clarify whether the PANAS questionnaire was administered to the participants before or after the drug treatment, so that readers can tell whether this group difference was a pre-existing difference between groups or a consequence of the drug administration? It would indeed be interesting to have a measure of the drug effect on these parameters.

      5) The authors claim that: "Although tabula-rasa exploration can comprise influences of attentional lapses or impulsive motor responses, the difference between horizon conditions cancels them out". I would suggest tempering this claim, as the effect might be more enduring in the long-horizon condition. The authors might also want to look at RT variability in addition to RT means, which did not differ between groups.

    3. Reviewer #1:

      Dubois and colleagues investigate how two modes of exploration - tabula-rasa and novelty-seeking - contribute to human choice behavior. They found that subjects used both tabula-rasa and novelty-seeking heuristics when the task conditions were in favor of exploration. Specifically, participants could, and had to, make more responses in the long-horizon condition, which favored exploration, compared to the short-horizon condition, which favored exploitation. Then the authors provide evidence that blockade of norepinephrine beta receptors leads to decreased tabula-rasa exploration and increased choice consistency, whereas blockade of D2/D3 dopamine receptors had little effect. Novelty seeking was not affected by catecholaminergic drugs.

      The paper provides evidence on exploration-exploitation trade-offs from two different points of view. On the one hand, it addresses computational aspects of exploration by investigating how computationally intense forms of exploration might be supplemented by the usage of heuristic strategies. For doing so, the authors propose a novel task allowing them to disentangle these strategies and quantitatively assess their usage. On the other hand, the findings presented in the paper shed some novel light on neuropharmacological mechanisms underlying explorations. Some interpretations seem to go beyond the data and information is missing in the description of the results and the computational approaches used. In general though, the manuscript conveys the impression of a well-designed and carefully conducted study.

      Major points:

      General

      1) It is one thing to come up with computational terms and model-based quantities correlating with behavior but a different one to show their psychological meaning. Did the trials with tabula-rasa exploration or novelty exploration differ in terms of response times from the other types of responses? Did participants report that they indeed intended to explore in the tabula-rasa exploration trials?

      2) On a related note, how do the authors distinguish random (tabula-rasa) exploration from making a mistake? From how the task was designed, choosing the low value option appears to receive a more natural interpretation as a mistake rather than as exploration because this option was clearly dominated by the other options and remained so within and across trials.

      3) Previous research of the authors (Hauser et al., 2017, 2018, 2019) has associated beta receptor blockade with enhanced metacognition, decreased information gathering/increased commitment to an early decision (Hauser et al., 2018, JNS) and an arousal (i.e., reward)-induced boost of processing stimuli. Of course, it is possible that norepinephrine plays multiple roles, but it appears not exactly parsimonious to imbue it with a different role for each task tested. Are there some commonalities across these effects that could be explained with some common function(s)?

      4) Throughout, the paper implies that a beta blocker provides information about the function of norepinephrine in general. However, blocking beta receptors leaves synaptic norepinephrine to act on alpha receptors; accordingly, beta-blockers can be viewed as partial alpha agonists. Given that the function of these receptor families differs, more care should be taken when describing the nature of the intervention, labeling the groups and interpreting the effects.

      Introduction:

      5) As mentioned above, the paper investigates not only computational aspects of exploration but also the underlying neuropharmacological correlates. However, the introduction focuses mostly on different computational algorithms (which is in itself very helpful for the understanding of the paper!) while the neuropharmacological basis of explorative behavior is only briefly introduced. In the same regard, while some insights were given in the Discussion, it would be interesting to have a rationale for using amisulpride and propranolol already in the introduction.

      6) Relatedly, the introduction focuses on tabula-rasa and novelty strategies based on the argument that these are more computationally efficient. The authors may also want to motivate this with the perspective of neural constraints/brain process. Specifically, they argue that it may be computationally demanding to process the expected value (mean) and variance of choice options. However, computational efficiency has been put forward as an argument for why mean-variance-like signals are coded in the brain, particularly with multi-outcome options where expected utilities are difficult to compute (D'Acremont and Bossaerts, 2008). Thus, the computational efficiency argument at the moment seems insufficiently motivated.

      Materials and Methods:

      7) Successful performance of the task is based on the ability to discriminate between different reward types and select the one with the higher value. From the experimental design description, one can see that in order to do so, the subjects needed to distinguish between different apple sizes. In this regard, a question arises: how large was the difference between two adjacent apple sizes? Was it large enough so that after a visual inspection, the participant could easily understand that the apple size = 7 was less rewarding than the apple size = 8? Finally, since the task requires visual inspection of reward stimuli, was the subject vision somehow tested and did it differ between groups?

      8) The point of heuristics from a psychological perspective is that they dispense with the need to use full-blown algorithmic calculations. However, in the present models, the heuristics are only added on top of these calculations and the winning model includes Thompson exploration. Stand-alone heuristic models would do the term more justice and one wonders how well a model would fare that includes only tabula rasa exploration and novelty exploration.

      9) The simulations provide a nice intuition for understanding choice proportions from different models/strategies (Figures 1e and 1f). However, it would be helpful to provide simulated results for long and short horizons separately. Do the models make different predictions for the two horizons? Additionally, it would be helpful to also show the results from other models (i.e. the proportion of low value bandit chosen by novelty agent). These can be added in the supplement.

      10) One of the best-known effects of propranolol is to reduce heart rate. Did the authors measure heart rate and can they control for the possibility that peripheral effects of the drug explain the findings (and what was the reason for not collecting pupil diameter data, contrary to the previous research of the authors)?

      11) The long horizon condition appears to confound exploration with higher effort demands and longer delays to reward, at least in the early responses. If the authors cannot control for these they should mention them as limitations.

      12) Not only choice rules but also value functions seem to differ between Thompson and UCB (lines 583 and 593). This raises the question how well pharmacological effects on choice rules can be distinguished from effects on valuation and how confident we are that the observed effects indeed arise from changes in choice rules.
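
      The distinction raised in point 12 can be made concrete with textbook forms of the two strategies, sketched below. This is an illustration under stated assumptions, not the manuscript's implementation: here UCB changes the valuation (mean plus an uncertainty bonus) and passes it to a softmax choice rule, whereas Thompson sampling draws one sample per option and picks the largest. All function names and parameter values are hypothetical.

```python
import math
import random


def ucb_values(means, sds, gamma):
    """UCB-style valuation: add an uncertainty bonus to each posterior mean."""
    return [m + gamma * s for m, s in zip(means, sds)]


def softmax_choice(values, inverse_temperature, rng):
    """Softmax choice rule applied to whatever values it is given."""
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp(inverse_temperature * (v - top)) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]


def thompson_choice(means, sds, rng):
    """Thompson-style rule: sample each option's value once, pick the largest."""
    samples = [rng.gauss(m, s) for m, s in zip(means, sds)]
    return max(range(len(samples)), key=lambda i: samples[i])


rng = random.Random(1)
means, sds = [6.0, 5.0, 5.0], [0.5, 0.5, 2.0]   # hypothetical posterior summaries

ucb = ucb_values(means, sds, gamma=1.0)          # the uncertain third option gains most
ucb_pick = softmax_choice(ucb, inverse_temperature=2.0, rng=rng)
thompson_pick = thompson_choice(means, sds, rng)
print(ucb, ucb_pick, thompson_pick)
```

      In this form, a drug effect on `gamma` would act on valuation while an effect on `inverse_temperature` would act on the choice rule, which is the separation the reviewer asks the authors to justify.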

      Discussion:

      13) Line 410: The statement that memory is not at play in the present task because all information is always visible on the screen seems too strong. At least some exploration-relevant information, such as the overall distribution of outcomes across all options, is not presented and may be remembered differently by the different groups.

      D'Acremont M, Bossaerts P. Neurobiological studies of risk assessment: a comparison of expected utility and mean-variance approaches. Cogn Affect Behav Neurosci. 2008;8(4):363-374. doi:10.3758/CABN.8.4.363

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

    1. Reviewer #3:

      This is a comprehensive meta-analysis of empirical literature on sex differences in mammalian trait variability. The authors nicely articulate competing hypotheses: "estrus-mediated variability" (which predicts higher trait variability in females because they exhibit cyclic reproductive [estrous] hormone secretion that occurs over multi-day timescales) vs. "male variability hypothesis" (which predicts higher trait variability in males because they are the heterogametic sex). Several prior meta-analyses related to this have not provided support for the estrus-mediated variability hypothesis. The analysis performed here differs significantly from prior work in that the subjects were 27,147 mice from the International Mouse Phenotyping Consortium, which generated over 2x10^6 data points. Unlike other meta-analyses, the subjects of this analysis were therefore more systematically evaluated (9 WT strains across 11 labs). A total of 218 continuous traits were evaluated, grouped into 9 functional trait groups. Some traits were biased towards males and others towards females. There was no consistent pattern of greater variability in either sex. The results support a straightforward conclusion that neither hypothesis adequately explains patterns of trait variability. The discussion is a restrained defense of the practice of including females (please clarify that monitoring of estrous cycles was not performed in these studies, so the females are classified as "unstaged"); consequently, females can be included in research studies without a default assumption that they are likely to introduce more variability than males. The authors also apply their data on widespread differences in trait-specific lnCVR values to the potential for phenotypic response to selection due to rapidly changing environmental events. The discussion is well written, and each of its sections is meaningful. The web-based tool is a very helpful contribution. The discussion of statistical implications of the work (e.g., equalizing power and Type I consequences of unequal variance) is of significance to research on mammalian biology.
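
      For readers unfamiliar with lnCVR, the sketch below shows one commonly used definition: the natural log of the ratio of the two groups' coefficients of variation, with a small-sample bias correction. Placing males in the numerator (so that positive values mean greater male variability) is an assumption made here for illustration, the manuscript's exact estimator may differ, and the trait values are invented.

```python
import math
import statistics


def ln_cvr(males, females):
    """Log coefficient-of-variation ratio (male/female) with small-sample correction."""
    cv_m = statistics.stdev(males) / statistics.mean(males)
    cv_f = statistics.stdev(females) / statistics.mean(females)
    correction = 1 / (2 * (len(males) - 1)) - 1 / (2 * (len(females) - 1))
    return math.log(cv_m / cv_f) + correction


# Hypothetical trait values with equal means but different spread.
males = [20.0, 25.0, 30.0, 35.0, 40.0]
females = [28.0, 29.0, 30.0, 31.0, 32.0]

print(f"lnCVR = {ln_cvr(males, females):.3f}")   # positive: males more variable
```

      Because the statistic is scaled by each group's mean, it compares relative rather than absolute variability, which matters for traits such as body mass where the sexes differ in mean.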

      1) The present work adds important new information to a growing literature (see for example Smarr BL, Rowland NE, Zucker I. Male and female mice show equal variability in food intake across 4-day spans that encompass estrous cycles. PLoS One. 2019 Jul 15;14(7):e0218935) indicating that incorporation of unstaged female rodents in biomedical research does not increase variability compared to that generated by males; importantly, it also specifies several circumstances in which specific traits are more variable in one sex than the other.

      2) The statement on line 41-42 is a strong overgeneralization and should be tempered and/or clarified: "However, scientists in (bio-)medical fields have not traditionally regarded sex as a biological factor of intrinsic interest (2-7)." This is an overstatement. The study of sex differences and sexual differentiation in mammals (a class of animals of most direct relevance to biomedical research) has a long history, complete with dedicated journals (e.g. Biology of Sex Differences), learned societies, etc. Such an enduring interest in sex among biologists only makes the present work more interesting and important. This critique may be addressed with a more clear definition of "(bio-)medical", here, and throughout the manuscript.

      3) Colloquialisms such as "This is an important step, but we can go much further" (line 50) are vague and difficult for this reader to endorse as written; we recommend deletion.

      4) In the Introduction, the authors delineate competing hypotheses: "estrus-mediated variability" vs. "male variability hypothesis". In their elaboration of the former hypothesis, the authors should clarify that the historical concern regarding decreased power and increased variability in females compared to males specifically regarded the inclusion of females that were not synchronized (or "staged") so as to be tested/treated on the same day/phase of the estrous cycle. Data from these so-called 'randomly cycling' females were predicted to be more variable than data from males. "Staged" females were presumed to be less variable, and the interventions and costs associated with the presumed need for staging are viewed as onerous. But a growing literature, including the important new results from the present study, argues that there is no empirical support for the contention that females generally are more variable than males across many traits.

      5) Methods: the data analysis pipeline is clear and rigorous. It should be stated that the data used come from unstaged females.

    2. Reviewer #2:

      Summary:

      There are significant methodological and interpretative concerns with this article. The analysis overstretches and does not consider its potential weaknesses. It needs to refocus on the primary question of whether there is a pattern in the impact of sex on the variance for these traits. The analysis then needs to go deeper and remove other sources of variance that could be confounding the findings.

      Major comments:

      Methodology

      1) The methodology is not clear.

      2) Meta-analysis is used when you don't have access to the raw data - why not use mixed effect regression models?

      3) The variance summary metric is calculated for an institute and strain for data collected in multiple batches, with potential baseline shifts as the data is collected across many years. This isn't a representative metric of variability for a sex as there are multiple sources of variance impacting this metric.

      4) Figure 3b and code: It is very rare for a fixed effect analysis to be justifiable. Why assume that there is no variation between the different traits when testing effect of sex? Normally you would explore sources of heterogeneity by meta regression rather than just assume it is sex differences.

      5) "A previous study found that the heterogametic sex was more variable in body size". If this holds, would not traits that are correlated with body weight also demonstrate the same finding?

      6) "minimum of 2 different institutes" is a very low N. Why would this give meaningful analysis? What was the minimum amount of data for a strain*centre for a trait to be included?

      7) Consider the recent discussions on phenotypic plasticity and the phenotypic interaction with the environment (https://www.nature.com/articles/s41583-020-0313-3). This suggests a fixed effect model is not appropriate. The results and approach need discussing in this context.
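
      The contrast drawn in points 4 and 7 between a fixed-effect and a random-effects analysis can be illustrated with a minimal sketch. It pools a handful of per-trait effect sizes (for example, log ratios of male to female variability) both ways and reports Cochran's Q and I² as measures of heterogeneity. The effect sizes and sampling variances below are hypothetical and are not taken from the manuscript, and the DerSimonian-Laird estimator is only one common choice for the between-trait variance.

```python
import math

def pool(effects, variances):
    """Pool effect sizes with fixed-effect and DerSimonian-Laird random-effects weights."""
    w = [1.0 / v for v in variances]                 # inverse-variance weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect estimate
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)                    # between-trait variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0    # share of variation due to heterogeneity
    w_re = [1.0 / (v + tau2) for v in variances]     # random-effects weights
    random_est = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed, "se_fixed": math.sqrt(1.0 / sw),
        "random": random_est, "se_random": math.sqrt(1.0 / sum(w_re)),
        "Q": q, "tau2": tau2, "I2": i2,
    }

# Hypothetical per-trait log variability ratios and their sampling variances
result = pool([0.10, -0.05, 0.30, -0.20], [0.01, 0.02, 0.01, 0.015])
for key, value in result.items():
    print(f"{key}: {value:.4f}")
```

      When the traits genuinely differ, the estimated between-trait variance is positive and the random-effects standard error is wider than the fixed-effect one, which is the practical consequence of the assumption questioned above.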

      Conclusion:

      1) It isn't made clear that this analysis is trying to assess the role of sex across strains and institutes.

      2) There are no discussions of the potential weakness of the analysis.

      3) Figure 3a

      • Why is there no discussion of measures of heterogeneity within the meta-analysis at the population level?

      • Should the differences in classification as male or female biased within a functional group not be assessed by a Fisher's exact test, with the p-value adjusted for multiple testing, before you state that an area has a difference?

      4) Concern regarding "Notably most SD trait means also show the greater difference in trait variance" - this seems to be an eyeball assessment rather than a statistical analysis.

      5) I have concerns on relating these results to power

      • These estimates are from an analysis across strains, batches and institutes looking at global behaviour in the traits. This absolute variance measure would be very different to that seen in a lab within a classic parallel group design study with one strain.

      • They advocate a factorial design but suggest the powering of the sexes independently. This feeds into the misconception that to study both sexes you have to double your sample size.

      6) The authors report that this analysis on mean differences was in accordance with previous studies. Not really. The differences will arise from the different approaches taken and highlight how this summary metric is losing sensitivity. The authors relate many of these changes to differences in body size. However, the earlier published analysis adjusted for body weight.

      7) Why would the "difference in variability impact on the potential of each sex to respond to changes in specific environments"?
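
      The test suggested under point 3 can be sketched as follows, assuming each functional group is compared against all other groups combined in a 2x2 table of male-biased versus female-biased trait counts. The counts are hypothetical, and the Holm step-down procedure is used here only as one example of a multiple-testing adjustment.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme (no more probable) than the observed one
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)

def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Hypothetical counts of (male-biased, female-biased) traits per functional group
groups = {"immunology": (4, 16), "morphology": (15, 5), "behaviour": (10, 10)}
total_m = sum(m for m, _ in groups.values())
total_f = sum(f for _, f in groups.values())
pvals = [fisher_exact_two_sided(m, f, total_m - m, total_f - f) for m, f in groups.values()]
for name, p, p_adj in zip(groups, pvals, holm_adjust(pvals)):
    print(f"{name}: p = {p:.4f}, Holm-adjusted p = {p_adj:.4f}")
```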

    3. Reviewer #1:

      This study looks at whether there are sex differences in the variability of traits in mice, via a meta-analysis of published datasets. The analyses show that females typically show greater variability in traits categorised as immunological, while males show greater variability in morphological traits. Traits related to the eye were also more variable in females. These findings are interpreted in light of evolutionary theory about greater between-individual variability in males, and greater within-individual variability in female mammals due to estrus. A handy online tool is provided to allow researchers to consider possible sex-specific variability in traits at the experimental design phase.

      I enjoyed the paper and thought the question and conclusions were interesting. The figures are great. I am not an expert in meta-analyses, so my comments mostly relate to the hypotheses and discussion of the results.

      1) The paper jumps about quite a bit between talking about sex differences relevant to mammals only and those that might apply to animals more generally. For example, the Introduction begins with reference to biomedical research (mammals) and the estrus hypothesis (mammals) but then introduces the "male variability" hypothesis by stating that "males are often the heterogametic sex". Given that the subject of your study is the mouse, I think it would be more logical to restrict the Introduction to mammals (i.e. explain the two hypotheses with respect to mammals). You could then include a section in the Discussion on if/why we might expect the same trends in other animals (see below also).

      2) I feel that the rationale behind the two hypotheses (female estrus and male variability) could be explained better in the Introduction. i.e. WHY estrus might produce higher variability in females and WHY stronger sexual selection or male heterogamety might produce greater male variability. A few extra sentences on each would probably be enough. At the same time, I think it would be worth clarifying a priori the extent to which these hypotheses are expected to apply to different traits. Some predictions are given only in the Discussion (e.g. estrus expected to mostly affect immune response and physiology).

      3) The Discussion on eco-evolutionary implications (line 184) would be greatly strengthened if it included at least one specific example of how sex-specific differences in trait variability might affect the evolutionary trajectory of a population. At present, one very general hypothetical is given, but I did not find it easy to follow (disease/climate change kills more of one sex than the other --> sex ratio of the population is skewed (temporarily?) --> mating system is "influenced" --> "downstream effects on population dynamics"). It is also stated that "modelling sex difference in trait variability could lead to different conclusions compared to existing models (cf 44)". The cited study there is on Eurasian sparrowhawks. I'm not familiar with this sparrowhawk study, but perhaps it is a suitable one to highlight in more detail as a clear example? What sort of different conclusions would be expected? It's fantastic that your paper is aiming to speak to a broad range of biologists, but I think that greater clarity in this section is needed to make ecologists and evolutionary biologists really take notice.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All reviewers agreed that the topic of the study was an interesting one, and that the issue of sex differences in trait variability is relevant to good experimental design. As you'll see below, however, Reviewer #2 felt that the current analytical treatment of this mouse dataset is not appropriate to the question. Of particular concern is that sources of variability other than sex were not adequately considered.

    1. Reviewer #3:

      Saderi and colleagues study the effects of arousal and task engagement on sound responses in the (primary) auditory cortex and inferior colliculus of ferrets. Arousal is measured by pupillometry, task engagement by contrasting an auditory detection task with passive sound exposure, and effects are quantitatively dissociated using a general linear model and multiple regression. The authors find that the sound responses of about half of the recorded neurons are modulated by task engagement and/or by arousal, with IC neurons most frequently modulated by arousal and AC neurons modulated by both factors. Increased arousal was associated with enhanced sound responses. In AC, task engagement was associated both with enhanced and suppressed sound responses. In IC, task engagement was associated with suppressed sound responses.

      Major comments:

      1) Some of the main conclusions of the results from AC are not novel. Using a different experimental approach, the study of Knyazeva et al., 2020, Front Neurosci. 14: 306 already suggested that the discharges of many neurons in AC are affected by arousal, that task effects can disappear if effects of arousal have been accounted for, and that there is no systematic difference in response modulation between neurons tuned, or not tuned, to task-relevant sounds. Dissociations of the effects of different non-auditory factors on sound responses in AC have also been described by Zhou et al., 2014, Nat Neurosci. 17:841-850 and by Carcea et al., 2017, Nat Comm. 8:14412.

      2) The study is based on a relatively small number of neurons and behavioral sessions, potentially reducing the strength of the statistical inference, e.g., that IC was more strongly affected by arousal than AC. It appears that data from about 20 behavioral sessions entered analyses. This estimate is based on the information that 1-3 behavioral blocks were tested during individual sessions (line 611) and that Figure 1F shows the results of about 36 active-passive comparisons in four animals. This indicates that, on average, about 10 neurons were simultaneously recorded in individual sessions. Therefore these neurons were statistically more dependent than neurons recorded in different sessions. This needs to be considered for potentially global effects such as arousal and task engagement. The authors should include this information, together with the number of trials in active and passive blocks and whether the responses to different TORCs were averaged.

      3) The authors did not distinguish single unit and multiunit data. This difference should be considered in detail because it could affect the interpretation of whether there are units that are affected both by arousal and task engagement.

      4) The authors should include a statement that the results on the effects of task engagement may not apply to all types of auditory tasks. This is highly important because the authors used an auditory detection task, which is a task that may not require AC at all.

    2. Reviewer #2:

      Main Review:

      Saderi and her colleagues have performed a cool study that attempts to determine whether and how two behavior-related variables - arousal and task engagement - differently influence activity in two stages of the auditory neuraxis, IC and A1. They define arousal as pupil diameter and task-engagement as a binary variable determined by the experimental block design. They find that although these two parameters often co-vary, they sometimes do not. They find that IC was more influenced by arousal and A1 was modulated by both arousal and engagement. One of their main findings is that previous reports of task-engagement effects may in fact be attributed to arousal state.

      This is a nice quantification of neural activity and behavior. My major concerns are all thematically linked and they stem from the use of a continuous readout of arousal (i.e. pupil diameter) but a binary readout of task-engagement (i.e. the block the animal is in at any moment). Relatedly, I am interested in knowing whether neural effects can be accounted for by the animals from which they were recorded (and from that particular animal's behavior). I expect that my enthusiasm for this paper will not be diminished in any way regardless of any changes that come out of the deeper analyses outlined below. Also, I do not intend that responses to these concerns will require any new experimentation.

      Major concerns:

      1) Can task engagement be explained more rigorously as a continuous rather than binary variable? In my experience training and testing animals on appetitive behaviors, task engagement can wax and wane within a single block, across an experimental recording session, or across days of behavioral testing. Such changes in engagement can be inferred, for example, as strings of (seemingly) easy trials in which the animal does not answer correctly. The authors should attempt to quantify through behavioral analysis (running lapse rate, lick latency, etc) whether and how task engagement may be changing within and across task blocks. Alternatively, the authors could clearly explain that their binary encoding of engagement has limitations and may not actually describe the animal's engagement at any given moment.

      2) Can a continuous readout of task engagement better explain neural activity? For many neurons, task-engagement does not provide unique predictive information, yet for others it does (e.g. Fig. 3C). If task engagement can be modeled as a continuous rather than binary variable, is it still true that "some apparent effects of task engagement should in fact be attributed to fluctuations in arousal" (Abstract)? In general, I worry that the current analysis is effectively a floor on task-related modulations since it assumes constant engagement throughout a task block.

      3) Can neural heterogeneity be attributed to animal-to-animal behavioral variability? Even if task engagement does not vary within a task block for any one animal, it may indeed vary across animals. In theory, the actual task engagement of some animals might more closely mirror the block design that the experimenters are imposing, and some animals may simply have a higher level of engagement than others. This could mean that some results that are currently attributed to population-level heterogeneity (e.g. some A1 neurons do this, while others do that) might actually be attributed to animal-to-animal heterogeneity as opposed to distinct neural populations. For example, the authors state that for a subset of neurons, persistent task-like activity after a block change can be accounted for by pupil, whereas for other neurons this effect cannot (Fig. 7, line 452). The authors should confirm that key findings are consistent across animals and not related to degrees of task engagement (see point #1). If the findings are not consistent across animals but can be explained by each animal's unique behavior, this would also be really cool.
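
      One way to obtain the continuous engagement measure raised in points 1 and 2 is a running hit rate over recent trials, sketched below. The trial outcomes and the window length are hypothetical, and this is only one of several behavioral readouts that could serve (lick latency would be another). The resulting values could stand in for the binary block indicator as a regressor alongside pupil diameter.

```python
def running_hit_rate(outcomes, window=5):
    """Trailing-window mean of binary trial outcomes (1 = correct, 0 = lapse)."""
    rates = []
    for i in range(len(outcomes)):
        start = max(0, i - window + 1)       # shorter window at the start of the block
        chunk = outcomes[start:i + 1]
        rates.append(sum(chunk) / len(chunk))
    return rates

# Hypothetical block: engaged at first, then a string of missed easy trials
outcomes = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 0]
for trial, rate in enumerate(running_hit_rate(outcomes, window=4), start=1):
    print(f"trial {trial:2d}: engagement index = {rate:.2f}")
```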

    3. Reviewer #1:

      This study distinguishes effects of generalized arousal and specific task engagement on the activity of neurons in the inferior colliculus and auditory cortex of ferrets as they engaged in a tone detection task, while monitoring arousal via pupillometry. The authors found that arousal effects were more prominent in IC, while arousal and engagement effects were equally likely in A1. Task engagement was correlated with increased arousal. They propose that there is a hierarchy such that generalized arousal enhances activity in the midbrain, and task engagement effects are more prominent in cortex. I have no major concerns, but two points to consider:

      I would like to know how the model would perform if task engagement were modeled as a continuous regressor.

      The authors state that they separated single units and stable multi units from the electrode signal, but I do not see where these data are separately reported.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Saderi and her colleagues have attempted to determine whether and how two behavior-related variables - arousal and task engagement - differently influence activity in two stages of the auditory neuraxis, IC and A1. They define arousal as pupil diameter and task-engagement as a binary variable determined by the experimental block design. They find that although these two parameters often co-vary, they sometimes do not. They find that IC was more influenced by arousal and A1 was modulated by both arousal and engagement. One of their main findings is that previous reports of task-engagement effects may in fact be attributed to arousal state.

    1. Reviewer #3:

      The role of the histone chaperone Hira during the formation of the paternal pronucleus has been well documented both in mouse (Lin et al, 2014; Inoue and Zhang, 2014; Nashun et al, 2015) and in Drosophila (Loppin et al, 2005). The histone chaperone Hira is known to act in a protein complex with Ubn and Cabin 1 (Tang et al, 2012). The authors built on their previous findings (Lin et al, 2014) and assessed the effect of the Ubn and Cabin 1 oocyte deletion during fertilisation. Not surprisingly, the observed phenotypes more or less recapitulated the observation made using Hira deletion. In this sense, the findings are not novel. It has also been previously shown that deletion of Hira leads to the removal of the whole complex (Nashun et al, 2015).

      The authors add some potentially interesting observations using 1PN (aberrant) human zygotes. Although the observed lack of Hira complex components in these zygotes could be interesting, the causality is not established.

      Beyond the statements above, there are major issues that would need to be addressed:

      1) Validation and characterisation of the ko/kd models: Ubn1 knockdown using morpholinos: Fig S1C - lots of protein remains present in the nucleus, Hira Zp3Cre driven oocyte specific knockout - how much Hira protein is left in the zygote?

      2) H3.3 staining to document the deletion of the complex: Fig S1E - it is not obvious what the authors are trying to say here. How is the H3.3 signal quantified? Only the paternal signal should be affected by the KO? The same is true for Fig 2D - no signal is obvious even in the control.

      3) Presence of Cabin1 in the zygote - pre-extraction needs to be carried out (Fig 2C)

      4) Fig S2: overexpression of Hira: is there a significant difference between the Hira signal in control (het) and KO zygotes? It does not appear so, which undermines the whole knockout study. The same is true for the quantification of H3.3. What should the quantification of the GFP signal demonstrate?

      5) The authors say in the main text that they developed a conditional KO for Hira, but they have not verified the Hira deletion after Cre expression (by IF or PCR).

      6) "Data not shown" in the text. The authors say that their new hiraF/F, zp3 females are sterile but they don't show it.

      7) The authors never show anti-ubn1, cabin1 staining on HiraKO.

      8) Language: the text needs editing. There are a number of statements that are wrong: Hira (or any other component of the complex) does not incorporate into chromatin - the complex associates with chromatin to incorporate histones (there are several other examples of similar statements).

    2. Reviewer #2:

      A high proportion of in vitro fertilized eggs yield zygotes with 1 pronucleus (1PN) instead of the normal 2PN. The authors previously showed that maternal Hira is important for H3.3 deposition on the male pronucleus; and that the loss of Hira leads to a high proportion of 1PN zygotes.

      In this manuscript, the increase in 1PN zygotes after fertilization was confirmed following deletion of Hira in mouse oocytes. The effect could be rescued upon microinjection of Hira RNA. The authors also depleted the other Hira protein complex subunits, Cabin1 and ubinuclein-1. The 1PN phenotype was again seen. Human 1PN zygotes were finally shown to lack HIRA on the abnormal pronucleus.

      This is an interesting observation that is definitely worth the investigation. The lack of HIRA components on the aberrant pronucleus in 1PN human zygotes is an important find. However, because the authors had already shown that the loss of Hira correlates with a high proportion of 1PN in mice, the experiments (though respectable) provide limited novelty as is.

      Main concern:

      • Unless there are reasons to believe that there are Hira-independent Cabin1 and ubinuclein-1 functions in oocytes, their depletion only serves to confirm the role of Hira and its relation to the 1PN phenotype. The rescue experiment and human data are important, but again serve as confirmation of the role of HIRA without further mechanistic insights.

      Perhaps novelty could come through a deeper exploration of Hira levels in oocytes and what differentiates 'poor quality' oocytes that lead to 1PN from normal ones. For example, do maternal Hira RNA and protein levels increase with maturation? Are HIRA levels lower in poor quality oocytes? Is there a step in the IVF procedure that affects Hira levels and/or changes on the paternal chromatin?

    3. Reviewer #1:

      This study establishes the role of additional members of the Histone chaperone HIRA complex in male pronucleus formation in mouse. Genetic inactivation of maternal Ubn1 and Cabin1 affects histone deposition following protamine removal on the fertilizing sperm nucleus in a way similar to maternal Hira mutants. However, the study does not provide new insights about the way these factors function or cooperate during paternal chromatin assembly. Analysis of aberrant human zygotes revealed a correlation between the lack of male pronucleus and the absence of maternal HIRA. Although the data are generally convincing, the manuscript does not sufficiently acknowledge earlier work. Notably, the rescue experiment which is presented as a "proof of principle" for future human therapy is not entirely original.

      Substantive concerns:

      1) The authors present the (partial) rescue experiment of maternal Hira KO (oocyte injection of Hira mRNA) as an original experiment that serves as a proof of principle for future therapy. However, a very similar rescue experiment of Hira KD oocytes was successfully performed by Inoue & Zhang, NSMB, 2014, a work that is not cited in the manuscript.

      2) The authors used PLA to detect interactions between the Hira complex proteins in mouse zygotes. However, it is not clear from the images in Fig. 1C how the specific interactions are actually appreciated. The foci seem to be everywhere and not particularly in the male pronucleus shown in the insets.

      3) The occurrence of 1 PN human zygotes is intriguing but the origin of this defect is unknown. It could reflect a more general problem than the sole lack of Hira expression. In this context, overcoming male pronuclear formation by re-expression of Hira seems to represent a hazardous therapeutic strategy.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All three reviewers agree on the fact that the study, although interesting, does not appear sufficiently novel regarding the already established role of Hira complex in sperm chromatin remodeling in mouse and other animals. In addition, although the reviewers were intrigued by the observation that 1PN human zygotes lack HIRA, the origin and timing of this defective expression are not established. The reviewers share the feeling that these experiments do not really bring novel insights about the regulation of HIRA levels in mammalian oocytes.

    1. Reviewer #3:

      The manuscript by Vera et al. reports on cohesin-dockerin interaction studies of cellulosomal subunits using mainly single-molecule FRET, but also molecular dynamics simulations and NMR measurements. The authors study a range of cohesin-dockerin pairs and discover a varying distribution of two alternative binding modes that apparently follows a built-in cohesin-dockerin code. Finally, the authors show that prolyl isomerase activity can regulate kinetics towards equilibrium/steady state as well as distribution of the binding modes. The results are important for understanding the mechanistic basis of cellulosome function.

      In my opinion, this is an important paper, which provides new interesting insight into cellulosome function. The single-molecule FRET and molecular dynamics parts of the study are well designed, the corresponding experiments are thoroughly performed, and data are carefully analysed. The manuscript is also very well written. However, there are several issues that need to be addressed:

      1) The authors claim to have uncovered a built-in cohesin-dockerin code. However, the principles of the code remain elusive. For example, what is the relationship between the Pro66 cis/trans conformation and the binding mode? What needs to be known to predict the dockerin binding mode? This point should be elaborated in the manuscript.

      2) The conclusion that prolyl isomerase activity is able to change the distribution of binding modes requires more consideration and/or research. First, it seems from Figure 6A that the expected steady-state B1 fraction of c1C-CcCel5A and c1C-CcCel5A+prolyl isomerase could be the same within error ranges. Second, it is unlikely that the enzyme will change the equilibrium ratio of Pro66 cis/trans conformation that is controlled by thermodynamics. Therefore, the prolyl isomerase activity may only be relevant in case of slow re-equilibration kinetics.

      3) NMR measurements were performed in order to check if the dockerin's Leu65 - Pro66 peptide bond is in the cis conformation in the cohesin-dockerin complex. The authors found very similar dockerin chemical shifts in the absence or presence of 1.3 equivalents of cohesin suggesting that the binding does not significantly alter the conformation. However, this is an indirect measurement, although NMR also allows direct determination of Pro cis/trans conformation (based on 13C chemical shifts and NOE patterns, e.g. see https://doi.org/10.1107/S1744309110005890). The authors should check if direct determination of the cis conformation is possible in their case. Also, peak doubling in the 15N-1H HSQC spectrum should be checked, which is an indication of Pro cis/trans equilibria.

      4) Furthermore, a direct measurement of the Pro66 cis/trans ratio for two cohesin-dockerin pairs that show distinct B1/B2 preferences would be useful to clarify the role of Pro66.
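
      The thermodynamic argument in point 2 can be made concrete with a minimal two-state sketch of trans/cis interconversion. It assumes that a catalyst such as a prolyl isomerase scales the forward and reverse rate constants by the same factor; the rate constants, starting fraction and acceleration factor are hypothetical and chosen only for illustration.

```python
import math

def cis_fraction(t, k_tc, k_ct, f0):
    """Cis fraction at time t for trans <-> cis with rates k_tc (trans->cis) and k_ct (cis->trans)."""
    f_eq = k_tc / (k_tc + k_ct)                      # equilibrium fraction, set by the rate ratio
    return f_eq + (f0 - f_eq) * math.exp(-(k_tc + k_ct) * t)

k_tc, k_ct, f0 = 0.002, 0.008, 0.8                   # hypothetical rates (1/s) and initial cis fraction
speedup = 100.0                                      # assumed catalytic acceleration of both directions

for t in (0.0, 10.0, 100.0, 1000.0):
    slow = cis_fraction(t, k_tc, k_ct, f0)
    fast = cis_fraction(t, speedup * k_tc, speedup * k_ct, f0)
    print(f"t = {t:7.1f} s  uncatalysed = {slow:.3f}  catalysed = {fast:.3f}")
```

      Both curves relax to the same equilibrium fraction because the ratio of the rate constants is unchanged; only the time needed to reach it differs. Under these assumptions, a difference between samples with and without the enzyme would be expected only at time points where the uncatalysed sample has not yet re-equilibrated.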

    2. Reviewer #2:

      By analyzing the formation of a series of dockerin-cohesin complexes from the cellulosome of two species of the Clostridium bacteria using smFRET experiments and other techniques, the authors conclude that the overall equilibrium between the two binding modes of the complex can be allosterically regulated by the enzymatic isomerization of dockerin's proline 66, which is part of a structural clasp between the N and C terminus of the protein. They speculate that a mechanism of enzymatically or environmentally driven clasp de/stabilization may be present in other dockerin-cohesin complexes, as well, and may provide the cellulosome with the required plasticity to carry out its function.

      In large part the work is clearly written and the claims seem to be supported by the data provided; however, there are a few issues that the authors should address:

      1) The computer simulations presented in the manuscript are not described very clearly. For example, on page 19 regarding the FoldX MC method, the authors identify two variables to describe the binding: an "axis" Z and a rotation angle phi. An axis, however, is defined by three coordinates, while the authors always associate a single number with Z. The reader has to guess that the axis is the axis of symmetry of the two binding modes and that Z is only an offset along the axis. Similarly, in eq. (4) the authors associate the sum over the conformations indexed by i with an average (first line, page 20), but in reality that sum and the others that appear in the argument of the logarithm of equation 4 are an estimate of the partition function of the system.

      2) The computer simulations of the complex do not seem to add significant information to the overall message of the manuscript: the rigid-body coarse-grained approach does not allow allosteric effects to be distinguished, as the authors already admit, while the FoldX approach provides only very large errors. Most probably, given the presence of well-defined crystallographic structures for some of the complexes, simple free-energy estimation techniques (e.g. metadynamics, steered MD etc.) based on classical atomistic molecular dynamics simulations (with limited homology modelling for the mutants) would have provided more accurate results. The authors should explain why they did not consider this approach.

      3) The data about the time dependency of the FRET signal in C. cellulolyticum are a bit worrying. The authors should dissect them more carefully, possibly adding additional control experiments to exclude artifacts (whose possible presence is also admitted by the authors in the caption of Fig.6 figure supplement 3). Then, if the process is confirmed, they should really try and identify the underlying process in a more precise way.

      4) Fig 5C and Fig 5F show two different curves for the same data. Similarly Figure 6 figure supplement 4 C shows two different histograms for the same complex. If this is the result of repeated experiments, the authors should make an effort and report histograms with error bars. Visual comparison of histograms which have a large intrinsic variability may be misleading.

      5) A picture showing a model of the molecular structure of the dyes attached to the molecular structure of the proteins would be very useful to understand the relative size of the objects.
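
      The distinction drawn in point 1 between an average and a partition-function estimate can be sketched as follows. The exact form of the manuscript's equation 4 is not reproduced here; the sketch assumes that each binding mode is sampled with the same number of conformations, that each conformation i has an energy E_i, and that the free-energy difference is taken as -kT times the log ratio of the two Boltzmann sums. All energies are hypothetical.

```python
import math

KT = 0.593  # approximate kT in kcal/mol near 298 K

def log_partition(energies, kT=KT):
    """log of sum_i exp(-E_i / kT), evaluated with the log-sum-exp trick for stability."""
    scaled = [-e / kT for e in energies]
    m = max(scaled)
    return m + math.log(sum(math.exp(s - m) for s in scaled))

def delta_g(energies_b1, energies_b2, kT=KT):
    """Free-energy difference G(B1) - G(B2) from Boltzmann sums over sampled conformations."""
    return -kT * (log_partition(energies_b1, kT) - log_partition(energies_b2, kT))

# Hypothetical energies (kcal/mol) of sampled conformations in each binding mode
b1 = [-10.2, -9.8, -9.5, -7.0]
b2 = [-9.9, -9.7, -9.6, -9.4]

mean_diff = sum(b1) / len(b1) - sum(b2) / len(b2)
print(f"difference of mean energies: {mean_diff:.2f} kcal/mol")
print(f"free-energy difference:      {delta_g(b1, b2):.2f} kcal/mol")
```

      The Boltzmann sum is dominated by the lowest-energy conformations, so the free-energy difference generally differs from a difference of plain averages, which is why the wording matters.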

    3. Reviewer #1:

      Vera et al. report the detection of binding and quantification of populations of two different orientations of assembly of dockerin and cohesin, which define structural organization and plasticity of bacterial cellulosome multi-enzyme complexes. The authors apply smFRET spectroscopy in in-vitro experiments carried out on isolated, modified domains. Vera et al. find uneven distributions of populations of the protein in the two modes of binding. Vera et al. investigate the molecular origins of the observed bias by studying homologous sequences obtained from various organisms, by mutagenesis and by domain-swap experiments. The authors complement experimental studies by Monte Carlo and molecular dynamics simulations. The authors conclude that they have identified a cohesin-dockerin "code" of binding and a novel allosteric control mechanism involving cis/trans isomerization of a C-terminal proline residue in dockerin.

      Structural plasticity of the cellulosome induced by variable assembly of the cohesin-dockerin adapter, facilitated by rotational symmetry of the two-helical binding interface, is an interesting biological phenomenon. The dual binding mode is already reported in the literature (refs. 23, 24, Wojciechowsky et al. Sci Rep 2018, 8:5051), somewhat limiting the novelty of findings. But forces and mechanisms that drive the orientations are not yet understood. The authors successfully developed a smFRET assay that can distinguish the two binding modes of the cohesin-dockerin interaction and that can measure the respective populations in vitro. Their homology, mutagenesis and domain-swap experiments show that specific interactions within the binding interface are not responsible for modulation of orientation. Instead, they show that interactions of a C-terminal proline can modulate binding. However, the relevance of the findings for the in-vivo situation appears unclear.

      I have the following concerns:

      1) The authors' smFRET assay clearly distinguishes the two binding modes B1 and B2. A key element of their work, which goes beyond state of the art, is the quantification of populations estimated from integrals of smFRET histograms and PDA. Their FRET analysis presumes that photophysics or quantum yields of donor/acceptor fluorophores are independent of the binding orientation. But the protein micro-environment at the positions of the labels close to the binding interface may change between the B1 and B2 orientations, which may modulate photophysics and thus FRET. This would, in turn, lead to errors in estimation of populations. The authors could test for such effects by measuring fluorescence of donor-only and acceptor-only constructs in B1 and B2 orientations.

      2) From their study of homologous sequences, mutagenesis experiments and swap of helix 1 and 2 of dockerin, the authors provide a solid body of data that shows that specific interactions within the binding interface are not responsible for the swap of binding mode. Instead, their results show that interactions of a C-terminal proline can modulate binding through an elusive mechanism. Proline mutagenesis experiments and enzymatic cis/trans isomerization show significant effects. But the relevance of a prolyl isomerase for the modulation of the dockerin-cohesin interaction in vivo remains speculation. The conclusion calls for additional experiments where, e.g., changes in catalytic activity of cellulosomes are measured upon application of a prolyl isomerase. Alternatively, the packing of enzyme subunits in the dense cellulosome may be responsible for alternate binding. Such protein-protein interactions may also modulate a proline interaction.

      3) An allosteric mechanism of the proline interaction in modulating binding, as proposed in this work, is not sufficiently supported by the data presented. The flexibility of the C-terminal tail of dockerin, which hosts the proline, and its close proximity to the cohesin binding interface, evident in structures (please provide PDB IDs in Fig. 1), may allow a direct interaction of the proline with cohesin.

      4) The impact of the intrachain proline/tyrosine interaction on binding, however, identified by the authors, is very interesting. This finding calls for further investigations on mechanistic details. Here high-resolution techniques, like NMR, which can provide atomic details of protein structure and dynamics, are desirable. Such experiments could help to identify potential allosteric effects on the conformation and thus on binding.

      5) Having said that, the authors state (in the abstract and introduction) that they performed NMR experiments in their study. But no NMR data are shown or discussed in this manuscript.

      6) If the C-terminal proline was a biologically relevant switch that modulates binding, this residue should be conserved. Have the authors checked conservation of the C-terminal proline in homologous sequences?

      7) The authors conclude that they have identified a cohesin-dockerin "code". The word "code" in this context is unclear to me. What do the authors mean by "code"?

      8) The authors conducted and analysed a set of kinetic experiments. But these experiments are not described in sufficient detail in the results and methods sections.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers find your work of interest and acknowledge your development of an elegant smFRET assay that can detect and quantify populations of cohesin-dockerin binding orientations. They further acknowledge your interesting finding of a role of the molecular clasp in modulating binding orientation involving a terminal proline. The reviewers find, however, that your conclusions of an enzymatic and allosteric control mechanism present in the cellulosome are not sufficiently supported by the data presented. The study lacks molecular-level information required to identify allosteric effects, which could, for example, be obtained using NMR spectroscopy, which is lacking in the present work. The proposed Monte Carlo approach and coarse-grained computer simulation do not provide sufficient molecular details and dynamic information to obtain mechanistic insight. There are further issues with the kinetic experiments. Some reported quantities are within error and controls are required to exclude artefacts.

    1. Reviewer #3:

      In this manuscript by Hansen et al., the authors describe three low (3.0 to 4.0 Å) resolution crystal structures of Ca2+-ATPase from Listeria, a gram positive bacterium. Two are crystal structures of wild type protein with BeF3- and AlF4- in the absence of Ca2+, thus, likely to represent the E2P ground state and E2~P transition state. The third one is a structure of a G4 mutant, in which 4 Gly residues are inserted into the A-domain-M1 linker, with BeF3- and Ca2+ present in crystallisation, designed to capture the E2P[Ca2+] state. The authors state, however, that the three structures are virtually the same and that the E2·BeF3- crystal structure represents a state just prior to ("primed for") dephosphorylation. They also propose that proton counter transport "mechanism" is different from that of SERCA.

      As Listeria Ca2+-ATPase has been studied by a single molecule FRET, its crystal structures will certainly contribute to our understanding of ion pumping. Furthermore, different from SERCA, Listeria Ca2+-ATPase transports only one Ca2+ per ATP hydrolysed. Therefore, how site I is managed is an interesting topic, although let's not forget the same 1:1 stoichiometry is observed with plasma membrane Ca2+-ATPase (PMCA), for which an EM structure appeared in 2018 (ref. 9). The authors indeed find that the Arg795 side chain extends into binding site I. This part is solid and a more elaborate (and interesting) discussion could be made than what is currently described.

      Another solid finding is that the two E2·BeF3- crystal structures are similar to the E2·AlF4- crystal structure, although how similar is unclear as a structural superimposition reporting an RMSD is not provided and the presented figure makes it difficult to judge directly; the structures are viewed from almost one direction, which makes it infeasible to discern the differences in M1 and M2 and in the horizontal rotation of the A-domain. Two or three structures are superimposed, but with cylinders and again viewed from only one direction. As the authors designate that the structures represent H+ occluded states, it is important to clearly show that the extracellular gate is really closed to H+ (and not only to Ca2+). For completeness, they should also examine the effect of crystal packing on the A-domain position.

      With regard to the point that the E2·BeF3- structure is "primed for dephosphorylation", only Fig. 2 is shown, in which differences appear to be the path of the TGES loop and the orientation of the Glu167/183 side chain. Their atomic models show that there is plenty of space for the Glu167 sidechain to take an orientation similar to that of Glu183 in SERCA. The authors should, however, provide an omit annealed Fo-Fc map for the Glu167 side chain and explain why that is the preferred and only orientation. If a Glu side chain is free to move, it could adopt a different orientation in less than a nanosecond. If it does, then the difference in the orientation of the Glu side chain does not sufficiently explain "the rapid dephosphorylation observed in single-molecule studies". The authors place further emphasis on proton occlusion and countertransport. However, this part of the manuscript is more speculative and, as detailed later, should at least be moved entirely to the Discussion section.

      As mentioned, the authors place a larger emphasis on proton countertransport. Here a number of issues show up. First of all, I think they have frequently used the term "occlusion" improperly. From my understanding, occlusion of a site (or ion) means that the site (or ion) is inaccessible from either side of the membrane. This means more than closure of the gates, as the two gates have to stay closed for a substantial length of time (i.e. locked). It is experimentally well established with SERCA that Ca2+ ions are occluded in E1P species. It can be shown that the lumenal gate is closed for Ca2+ in the E2 state. However, that does not necessarily mean that the gate for H+ is also closed. As far as this reviewer knows, nobody has actually demonstrated that H+ is occluded, even in the E2 state of SERCA.

      Furthermore, the authors presume that protons enter the binding sites through a different pathway from that used for Ca2+ release, citing ref 26. However, if they do, can closure of the gate for Ca2+ really mean closure of the gate for H+? This seems a contradictory statement, as the authors designate the E2·BeF3- state in Listeria Ca2+-ATPase as a proton occluded state (p.12). Apparent closure of the gate for Ca2+ on the extracellular side in a crystal structure seems insufficient for such a statement. One must keep in mind that a crystal structure merely provides a possible conformation in that particular state. It may not, however, represent the most populated conformation for that state. It is equally plausible that the E2·BeF3- complex takes a closed conformation for only a small fraction of the time. At this resolution it is simply not possible to determine if H+ occupies the binding site in the crystal structure. Furthermore, although it may be possible to show the gate is closed for Ca2+, it would be very difficult to show the gate is closed for H+. Thus, more experimental evidence is required to support that the structure represents a H+ occluded state.

      The authors write in the Abstract "Structures with BeF3- mimicking a phosphoenzyme state reveal a closed state, which is intermediate of the outward-open E2P and the proton-occluded E2-P* conformations known for SERCA". In essence this statement is fine, although what "closed" means is still unclear to me. In Figure 1, the authors state that "LMCA1 structures adopt proton-occluded E2 states". This statement is a bit misleading, because, in E2·BeF3-, the lumenal (extracellular) gate can in fact be opened and closed, at least with SERCA. As the authors recognize (p.14), the BeF3- complex of SERCA can be crystallised in two conformations, one with the lumenal gate closed (with thapsigargin) and the other with the gate open; yet, they write "In SERCA, the calcium-free BeF3- complex adopts an outward-open E2P state,..." (p.8). This is for lumenal (extracellular) Ca2+, not for H+. Further evidence is required to establish that the extracellular gate of LMCA1 is fixed in a closed position for H+ in E2·BeF3-. Again more experimental evidence is required to support that E2·BeF3- is a H+ occluded state.

      The authors write that "SERCA has two proposed proton pathways: a luminal entry pathway [26] and a C-terminal cytosolic release pathway [27]" (p. 9). One has to be careful here, as the luminal entry pathway has not been experimentally confirmed in SERCA. The authors write that "The luminal proton pathway has been mapped to a narrow water channel …" [26]. But since the pathway is not confirmed in SERCA, I don't think it can be used to justify that the corresponding part of LMCA1 is mainly hydrophobic and that protons cannot enter through this pathway.

      The description on the exit pathway for H+ also needs clarification. They describe (p. 10; first line) "In SERCA it consists of a hydrated cavity...[27]. ... M7 in LMCA1 further blocks the pathway ... and LMCA1 therefore does not appear to have a C-terminal cytosolic pathway either" and rationalize that "This may explain why no distinct proton pathways are required in LMCA1". I think it should be made clearer that this is a proposal rather than an established fact.

      As H+ release takes place in the E2 to E1 transition, the authors state that the E2·BeF3- structure of LMCA1 is different from that of SERCA. However, I don't think they can confidently make such statements without E1 and E2 structures of LMCA1. Furthermore, these descriptions (discussion) should not be in the "Results" section. As they conclude that LMCA1 uses the Ca2+ release pathway, which is assumed to be the same as that in SERCA (even though no Ca2+ release pathway is visualised in their crystal structures), for H+ entry, why does SERCA not use the same pathway? I think experimental evidence is required for a proposal that H+ binds to E309 from the cytoplasmic side.

    2. Reviewer #2:

      The manuscript by Hansen et al. presents three new structures of LMCA1, Ca2+-ATPase 1 from Listeria monocytogenes. They determined structures with BeF and AlF, and a Gly4 linker form of LMCA1 in complex with BeF. This latter structure is at 3 Å resolution and was very challenging. The other two structures are at low 4 Å resolution. These structures are a follow up to an excellent single-molecule fluorescence study of the same enzyme. The structures support the main conclusion of that work that LMCA1 more rapidly progresses through the dephosphorylation step of the reaction cycle. The manuscript is well written, the structures and findings are interesting and make a significant contribution, and the work seems ideally suited for this journal. There are no substantive concerns with the manuscript. Overall the R factors are high for the structures, particularly the 3 Å resolution structure for which they should be lower. However, the authors offer a reasonable explanation for this in the supplemental information provided.

    3. Reviewer #1:

      Structural comparison is an important tool for understanding how proteins function at the molecular level. The mechanistic premise of obtaining LMCA1 structures from the gram-positive bacterium Listeria monocytogenes was to understand how Ca2+ pumps have different Ca2+ stoichiometries to the mammalian SERCA and how they are proton coupled differently. Per molecule of ATP hydrolyzed, SERCA exports two Ca2+ ions in exchange for 2 or 3 protons, whereas LMCA1 exports a single Ca2+ and perhaps 1 proton in return.

      The paper describes two intermediate states of LMCA1 and, from my understanding, a mechanism is proposed based on structural differences in ionisable groups at the Ca2+ binding site, in particular the positioning of Arginine 795 that in SERCA is a glutamate. Since a previous crystal structure of LMCA1 was determined, the new mechanistic insights rely heavily on the details achieved by the improved resolution. While this is technically an important achievement, just the assignment of side-chains in the current structures is not sufficient to reach the mechanistic conclusions reached and, as such, the current paper is unfortunately too preliminary. Proton-coupling pathways are mechanistically difficult to detangle and require extensive experimentation, such as ITC, mutagenesis and transport measurements as well as computational approaches. Indeed, ion or proton coupling pathways that alter energetics are rarely just the result of differences in a few residues. For example, glucose (GLUT) transporters are passive sugar transporters, whilst the bacterial counterparts are proton coupled. The proton coupling in the bacterial proteins is due to a single aspartic acid residue in TM1. Whilst one can convert the bacterial sugar transporters to be no longer proton coupled by the mutagenesis of this TM1 residue to asparagine, you cannot make GLUT transporters proton coupled by mutating the corresponding asparagine residue to aspartic acid.

      One would have liked the authors to biochemically demonstrate how they could evolve LMCA1 to function similar to SERCA. This would have broader implications in our understanding of how biological systems can evolve substrate coupling and energetics.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      We all agreed that the LMCA1 complex structures are an important step forward for providing a structural framework for piecing together an ion pumping model to follow on from the previous smFRET studies. Nonetheless, two of the reviewers think that the mechanistic conclusions reached - based solely on crystal structures - require further validation. In particular, further experimental work (and likely computational) is required to i) confirm the hitherto designated crystallographic "states" and ii) to begin to clarify how LMCA1 and SERCA have different Ca2+:H+ stoichiometries as there are other, plausible models.

    1. Reviewer #3:

      Bolze and colleagues describe a new database of mitochondrial variation that consists of a greater number of samples than existing databases. To overcome some of the limitations of existing databases, they use the same sequencing pipeline for all samples, do not select for any particular phenotypes, and report both heteroplasmic and homoplasmic calls. They demonstrate the utility of their database by defining intervals of invariable regions, which may indicate mutational constraint and could aid in interpreting candidate variants in disease patients. The authors also calculate the filtering allele frequency for LHON variants and suggest that the allele frequencies for many LHON variants in their database and UKB are too high for the variants to be considered pathogenic and that they should be reclassified. The main limitations of this database, as stated by the authors, are the lack of diverse haplogroups and the relatively low depth of coverage considering the variable heteroplasmy of the mitochondria. The technical aspects of the data aggregation and database are solid, and the scientific analyses are sound. I have only a few comments that would strengthen the paper.

      1) There is no discussion of how to distinguish heteroplasmy from sequencing errors. While some filtering was done akin to germline variant filtering (particularly that calls at positions with fewer than 10 reads were removed), this could still result in a ~1/11 variant being called as heteroplasmic (at 9%). The spike in Figure 3F (final panel) around 90% ARF could suggest that something like this could be happening (homoplasmic variants with sequencing errors reverting to another base). Was there a minimum heteroplasmy level used for this analysis? Perhaps showing these plots filtered to a minimum of 2, 5, etc of the same alternate allele would reveal a sensible cutoff that could then be used for the whole paper.

      2) Line 484: This is the only mention of NUMTs in the paper, but the complications that can arise from them are not detailed by the authors. Considering the mitochondrial coverage, how confident are the authors that their low heteroplasmic calls are not false positives resulting from NUMTs?

      3) Along the same lines, the authors use HaplotypeCaller, which is a standard tool for germline variation but not optimized for mitochondrial calling. Was this run in haploid or diploid mode? It would be useful to state the limitations of using this tool to call mitochondrial variants as it is designed for diploids.

      4) The suggestion that "all protein-coding genes in the mitochondrial genome were highly intolerant to LoF variants" is certainly plausible, but not definitive from the current data. While 0 LoFs are observed, how many would be expected? If these genes are small (which they must be since they are on a very small chromosome), the number of expected variants based on a mutational model (akin to [Samocha et al., 2014]) would likely be <1, and thus 0 would not necessarily be remarkable. Given that, you may not be quite powered to do this at a per-gene level, but pooling all the genes may provide enough power to make a broader statement. The same goes for the % of bases invariable analysis (Figure 5) - it would be good to make this more quantitative, perhaps comparing these proportions to autosomes, or within each other (are the tRNA and rRNA ones significantly different from the protein-coding? Would it be possible to split protein-coding by synonymous, missense, LoF?).

      5) "Indeed, we found that no haplogroup markers -- even those from haplogroups not represented in our dataset -- were mapped to these highly constrained regions" - is this not circular? Markers that delineate haplogroups are found as homoplasmic calls that were used to determine the constrained regions, so it stands to reason that these would not be found in them, no? But perhaps I'm missing something.
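
      The arithmetic behind comments 1 and 4 above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the candidate minimum alternate-read thresholds, and the use of a Poisson model for the number of observed LoF variants are all assumptions introduced here, and the expected counts are hypothetical placeholders rather than values from the manuscript.

```python
import math


def alt_read_fraction(alt_reads: int, depth: int) -> float:
    """Fraction of reads at one position that support the alternate allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def passes_filter(alt_reads: int, depth: int,
                  min_depth: int = 10, min_alt_reads: int = 1) -> bool:
    """Keep a call only if depth and alternate-read support are both sufficient.

    min_depth=10 mirrors the depth filter described in comment 1;
    min_alt_reads is the hypothetical extra cutoff the reviewer suggests exploring.
    """
    return depth >= min_depth and alt_reads >= min_alt_reads


def prob_zero_observed(expected: float) -> float:
    """Poisson probability of observing zero variants when `expected` are expected.

    The Poisson model is an assumption made for illustration only.
    """
    if expected < 0:
        raise ValueError("expected count must be non-negative")
    return math.exp(-expected)


if __name__ == "__main__":
    # Comment 1: a single alternate read at depth 11 passes a depth-only filter.
    print(f"1/11 alternate reads = {alt_read_fraction(1, 11):.1%}")
    for cutoff in (1, 2, 5):
        print(f"min_alt_reads={cutoff}: kept={passes_filter(1, 11, min_alt_reads=cutoff)}")

    # Comment 4: hypothetical expected LoF counts, per gene versus pooled.
    for expected in (0.5, 3.0):
        print(f"expected={expected}: P(0 observed)={prob_zero_observed(expected):.3f}")
```

      Under these assumptions, observing zero LoF variants in a gene with well under one expected variant is unremarkable, whereas pooling genes raises the expected count and makes an observed zero more informative.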

    2. Reviewer #2:

      The authors present a resource of human mtDNA variants and heteroplasmies from 195,983 individuals, and scoring 14,324 mutations. The resource is of value. It may be possible to criticize the European ancestry-heavy data set, and the American specificity of it, but the authors fully acknowledge and disclose this in their manuscript, and make the data available to others to continue the work. Other high depth human papers are out there (Wei 2019 reference) and others, but the data is often not available due to patient confidentiality issues as in Wei 2019. Having this dataset available is of great intrinsic value.

      I only have a few comments that would require looking into the data for a few small things, or changing the writing of the manuscript.

      Comments:

      1) My biggest concern is that the authors use a read-aligning method where they take in all calls where there was at least 1 read mapping to mtDNA. The logic seems to be that they do not want to discard reads that may "mis-map" to the NuMTS, but this leads to another, potentially larger problem of potentially including NuMTS as heteroplasmic variants (See PMID: 23972387). For instance, the recent claim of paternal mtDNA transmission appears to be the result of a complex NuMT that was able to amplify in the strategies used in the original study (PMID: 32269217). More details on how the authors exclude the possibility of NuMTS incorporation are needed, especially in light of the 1+ alignment parameters used.

      2) Line 340 - 357 - regarding LHON. The problem with choosing LHON for this analysis is that it has a complicated clinical manifestation, which may not support the handling of the 14484T>C allele in the manner presented. First, the 8:1 male to female ratio in becoming afflicted (with homoplasmic LHON), the fact that many people with the homoplasmic allele will not become afflicted, and the fact that it can onset late in life (after having children) all could contribute to its allele being more represented in a random sampling of the population.

      The authors are correct that the allele on its own may not be pathogenic in specific haplogroup backgrounds (Howell 2003 reference), or may require co-expression with secondary "affector" mtDNA mutations (e.g. PMID: 25342614 - alleles including 3397A>G, 3497C>T, 3571C>T, 3745G>A, and other "helper" mutations in MitoMap). However, the paper needs a bit more on the 14484 conclusion due to all of these issues. Perhaps finding linkage (or lack thereof) to these helper alleles would strengthen this section sufficiently.

      3) Lines 206 - 207. How did the authors handle AGG / AGA codons? In 2010 a lab published evidence that AGA and AGG may not be true stop codons, but are simply not coded in the human mtDNA genome (PMID: 20075246). While this finding remains not universally accepted, it does explain the lack of an AGA/AGG-binding translational termination factor in the mitochondria. It is possible that the authors are in a position to comment on the behaviour of AGA or AGG codons, relevant to their section on PCG-truncating mutations.

      4) The work, especially the discussion of the control region, overlaps a bit more with Wei et al. 2019 than the manuscript lets on. A bit more direct openness about this overlap and the similar findings should be introduced into the manuscript, within the discussion.

    3. Reviewer #1:

      Bolze et al. report their effort to sequence the mitochondrial genomes of ~200,000 individuals. The authors generated a large, unified database that can be used for the investigation of mitochondrial mutations and the prediction of pathogenic alleles. Importantly, it addresses key limitations of other currently available sources, namely: it is not biased for mitochondrial diseases, all analyses were done in the same lab and using the same bioinformatics tools, and heteroplasmic alleles are reported. The authors then use their source to draw conclusions on the nature of mitochondrial mutations, their distribution across the mt-genome, and to challenge previously annotated pathogenic mutations, specifically for LHON disease.

      For example, figure 3A, which is one of the main take home messages from the paper, reflects hardly any "interesting" alleles. The vast majority of the >14,000 discovered variants cannot be seen on the plot. Unfortunately, many of the plots display the same data in similar and unnecessary formats, making the figures dense and confusing. Examples include figure 3F (mean and max ARF distribution) and figure 5A, B & C.

      Another, and more concerning, issue is the quality of heteroplasmic variants. The authors mention very briefly in the Methods section what was done to consider NUMTS - nuclear copies of mtDNA - that may be mutated and thus bias SNV calling. From their short description, it seems like NUMTS could be a source of errors. Furthermore, Figure 2E shows that the vast majority of individuals had ≤1 heteroplasmic variation. This observation cannot be reconciled with the basis underlying current methods to infer cellular lineages based on heteroplasmy in a cellular population (PMID: 30827679).

      These issues are particularly critical when using the data to draw conclusions on the pathogenesis of mutations, which is the focus of the last part of the manuscript. When considering the effect of m.14484T>C mutation on LHON disease, the authors argue that this mutation should be reclassified as non-pathogenic as it satisfies the "Benign Strong 1" criteria. Given the above limitations, this is certainly too strong of a conclusion. Stronger evidence for this claim is required, especially since all subjects carrying this mutation are from the same haplogroup.

      Lastly, to assess the probability that m.14484T>C is indeed non-pathogenic, the authors use previously published estimates of the "maximum credible population allele frequency". Despite the abundance of papers that estimate these parameters, the authors provide only one number, with no error or range estimates, and show that the frequency of m.14484T>C is higher than expected. It is important to understand what is the certainty of this claim, and ideally to reflect it as a range around the dashed lines in Figure 6.
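
      One way to reflect the requested uncertainty is to propagate plausible ranges of the inputs into a range for the threshold, which could then be drawn as a band around the dashed lines. The Python sketch below is only illustrative. It assumes a simplified form of the threshold (prevalence multiplied by the maximum allelic contribution and divided by penetrance); the review does not state which formula or inputs the authors used, and every number in the example is a hypothetical placeholder, not an estimate from the manuscript or the literature.

```python
from itertools import product
from typing import Iterable, Tuple


def max_credible_af(prevalence: float, allelic_contribution: float,
                    penetrance: float) -> float:
    """Simplified maximum credible allele frequency (assumed form, see lead-in)."""
    if not 0 < penetrance <= 1:
        raise ValueError("penetrance must be in (0, 1]")
    if not 0 <= allelic_contribution <= 1:
        raise ValueError("allelic_contribution must be in [0, 1]")
    return prevalence * allelic_contribution / penetrance


def threshold_range(prevalences: Iterable[float],
                    contributions: Iterable[float],
                    penetrances: Iterable[float]) -> Tuple[float, float]:
    """Lowest and highest threshold over every combination of candidate inputs."""
    values = [
        max_credible_af(p, c, k)
        for p, c, k in product(prevalences, contributions, penetrances)
    ]
    return min(values), max(values)


if __name__ == "__main__":
    # Hypothetical input ranges, chosen only to show the mechanics.
    low, high = threshold_range(
        prevalences=[1e-5, 1e-4],
        contributions=[0.1, 0.5],
        penetrances=[0.1, 0.5],
    )
    print(f"threshold range: {low:.2e} to {high:.2e}")
```

      If the observed allele frequency exceeds even the upper end of such a range, the claim is robust to the choice of inputs; if it falls inside the range, the conclusion depends on which published estimate is used.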

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      Bolze and colleagues describe a new database of mitochondrial variation that consists of a greater number of samples than existing databases. To overcome some of the limitations of existing databases, they use the same sequencing pipeline for all samples, do not select for any particular phenotypes, and report both heteroplasmic and homoplasmic calls. They demonstrate the utility of their database by defining intervals of invariable regions, which may indicate mutational constraint and could aid in interpreting candidate variants in disease patients. The authors also calculate the filtering allele frequency for LHON variants and suggest that the allele frequencies for many LHON variants in their database and UKB are too high for the variants to be considered pathogenic and that they should be reclassified. The main limitations of this database, as stated by the authors, are the lack of diverse haplogroups and the relatively low depth of coverage considering the variable heteroplasmy of the mitochondria.

      While the database is indeed unique and will likely be very valuable for the community, on the whole, the computational analyses are in several places superficial, in some cases even flawed and overall not as well presented as they could be.

    1. Reviewer #3:

      General assessment:

      This manuscript examines publicly available genomes of a number of Enterobacteriaceae species, and makes statements regarding their evolution, geographical distribution and antimicrobial resistance. While repurposing existing data can add value, such analyses must be carefully done and inferences only made after assessment and consideration of the potential limitations and biases of such data. Currently, the rationale and methods for performing the analyses outlined in this manuscript are not sufficient to support the conclusions. Following critical evaluation of the metadata associated with the genomes, and more robust analyses, useful insights may be obtained.

      Numbered summary of substantive concerns:

      1) More justification for examination of these particular bacterial species is required. For example, only 59 M. morganii genomes were included; given these small numbers, how big is the clinical problem, and is a global analysis really possible?

      2) There is no description of inclusion / exclusion criteria for these genomes. It is clear that most genomes were derived from the United States; a full description of the selection process will provide a greater understanding of potential bias, which could affect the results and conclusions reached.

      3) A number of outbreaks are stated to have been observed, but there is no robust evidence presented to support such identification, other than presumably clustering in the phylogenetic trees. More generally, without proper evaluation of the metadata associated with the genomes, there is a large risk that any observations (regarding similarity or clustering, or higher prevalence of resistance determinants, etc) are merely due to the nature of the genome collection rather than true biological or epidemiological relatedness. A critical evaluation of the representativeness of the genome collection is required.

      4) Various qualitative statements on differences between species or clades are made, such as the relative richness of resistomes, but (in addition to the issue described in the previous point) such statements require the use of appropriate statistical tests. Definitions are required for terms such as "closely related", "comparable" resistome diversity etc.

      5) The analyses performed are currently not sufficient to underpin many of the statements made in this manuscript regarding the evolution and transmission of these bacteria. For example, the trees presented in the figures appear to be cladograms, therefore the branch lengths are meaningless. Branch lengths are important in this context. Also, the phylogeography was evaluated by mapping genome origins physically onto a map, but there are more sophisticated approaches for this (e.g. phylogenetic diffusion models), though such analyses may regardless be heavily biased by the nature of the genome collection.

    2. Reviewer #2:

      This manuscript presents a species-by-species analysis of the presence and distribution of antimicrobial resistance (AMR) genes in less frequently isolated Enterobacteriaceae species, using the genomes and metadata registered in the PATRIC database. It is valuable, but most analyses are descriptive rather than quantitative, and the sentences describing the results are not easy to read. A phylogenetic tree and a heatmap indicating the presence of AMR genes are presented for each species, but it is hard to understand what the main message of each figure is and what characterizes each species compared with the others. The current manuscript will be useful to AMR researchers as a dictionary indicating the presence of specific AMR genes in each species.

      -Each figure should have a legend so that readers can see at a glance which color indicates what. Information on geographical region should be clearly indicated in the figure, in particular when it is mentioned in the main text. Also, what do the different colors in the strain names in the tree mean?

      -The Methods section is too brief and lacks sufficient explanation. For example, what criterion was used to judge the presence of an antimicrobial resistance gene?

      -The list of detected AMR genes at the top should be clearly categorized using different colors and headers (e.g., "ESBL", "AmpC" etc)

      -L126: what is the "outbreak"? I cannot identify it in the figure or tell how it was defined.

      -Examples of explanations that are descriptive rather than quantitative are L135 "richer resistome" and L136 "common". Why do the authors not present the specific numbers and percentages?

      Throughout the text, the authors do not conduct any statistical tests to judge the significance of the differences they mention.
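
      As an illustration of the kind of test that could back a statement such as "richer resistome", the sketch below runs a two-sided permutation test on the difference in mean number of AMR genes per genome between two groups. It is a minimal example under stated assumptions, not the authors' method: the function name `permutation_test`, the clade labels, and the per-genome counts are all hypothetical, and a real analysis would also need to account for the sampling biases raised by the other reviewers.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the observed absolute difference in means and an estimated
    p-value (with an add-one correction so it is never exactly zero).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign genomes to the two groups, keeping group sizes.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical AMR gene counts per genome (illustrative only,
# not taken from the manuscript under review).
clade_a = [12, 15, 11, 14, 13, 16, 12, 15]
clade_b = [5, 7, 6, 4, 8, 6, 5, 7]

observed_diff, p_value = permutation_test(clade_a, clade_b)
print(f"difference in mean AMR gene count = {observed_diff:.2f}, p = {p_value:.4f}")
```

      A permutation test is used here only because it makes no distributional assumptions about gene counts; a rank-based test would be an equally reasonable choice.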

    3. Reviewer #1:

      Sekyere and Reta present a comprehensive descriptive characterization of the epidemiology, phylogeographical distribution and antibiotic resistance profiles of six species of Enterobacteriaceae. Using a total of 2377 publicly available genomes, the authors show many multidrug resistant clones that are distributed worldwide. This study potentially provides important insight into a group of clinically relevant bacteria that remain poorly characterized compared to their more well-known relatives. Below are my comments.

      Major comments:

      1) The entire study is basically a descriptive enumeration of the resistance characteristics of six different bacterial species based on genome sequences, with numerous references to "less" or "more" or synonyms of these words (a few examples are line 140 "richer resistome diversity", line 157 "lesser resistome abundance and diversity", line 163 "richly endowed", line 215 "fewer resistome diversity and abundance", line 217 "sparse", lines 218 and 221 "virtually absent", line 222 "substantial abundance", line 244 "richest abundance of resistomes"). The lack of statistical analyses to compare lineages/clusters of the same species and between species and determine significant differences among them is problematic. Throughout the text, there is no reference to specific numerical values (e.g., p values) when making these comparisons.

      2) Similar to my comment above are the references to "short (or close) evolutionary distance" (for examples, lines 131, 208, 228, 265, 432, 439). How was evolutionary distance measured - number of SNPs, phylogenetic distance, average nucleotide identity? This "closeness" or "shortness" should be explicitly stated in terms of number, for example number of SNPs.

      3) The Methods section needs more details. I have listed my specific comments on methods below.

      3 a) Lines 504-511: How many genomes were initially downloaded? Were these genomes complete or in draft stages? How were these filtered and the final 2377 genomes selected? What were the criteria for selecting the 2377 genomes - number of contigs, size of genomes, assembly quality, available metadata, etc - or did the authors use programs that check genome quality such as CheckM? Line 510, "filtered to remove poor genome sequences": how is "poor" defined here?

      3 b) Line 517: How were the 1000 genes used for phylogenetic reconstruction selected?

      3 c) Lines 522-525: Simply drawing the distribution of subspecies and species on a map does not constitute a phylogeographical analysis. There are many biases that can influence the geographic distribution of microbes, most notably the sampling scheme used (for example, more samples from a single country or from a specific host/environment/setting), the composition of the database being used (NCBI and PATRIC in this study) and the collection of more strains of a single species and fewer strains in other species. The current study, similar to many others, has these biases, and they were in fact mentioned in the Results section. How do the authors address these biases?

      3 d) Lines 526-531 Resistome analyses: The current study is basically a summary of the information from the NCBI Pathogen Detection database. The authors need to briefly describe how resistance genes were identified in the genomes from this database. Since the entire study and all figures focus on the ARGs, authors need to show the reliability and confidence on how these were identified.

      4) Results, lines 187-188: Citation for "local and international outbreaks" needed. How did the authors come up with the inference that lines 183-186 represent outbreaks? Analyses of outbreaks require information on dates of sampling, which are lacking from this dataset. Hence, to make inferences that such topologies in the tree represent outbreaks is quite a stretch. I suggest that the authors either carry out temporal analyses of their data to be able to say that there were outbreaks or remove suggestions of the occurrence of outbreaks.

      5) Discussion, lines 447-457: I agree that both vertical and horizontal modes of evolution of resistant bacteria are important mechanisms in the spread of resistance in many pathogens and there are numerous previous studies that have reported this. However, the study did not carry out any specific analyses on HGT and vertical evolution, hence to say that "both phenomena are being observed" (lines 455-456) is misleading.

      6) Discussion or Conclusion: The authors mentioned that a limitation in their study is that the genomes they downloaded were those available only up to January 2020. I think there are a few more important limitations and caveats that need to be discussed (for example, see comment 3.c above)
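
      To illustrate comment 2, one way to state "closeness" as a number is a pairwise SNP count over an alignment of equal-length sequences. The sketch below is a minimal illustration under stated assumptions: the function names, isolate names, and toy sequences are hypothetical, ambiguous or gapped positions are simply skipped, and no particular threshold for "closely related" is implied.

```python
from itertools import combinations

# Characters treated as missing data and excluded from the SNP count.
AMBIGUOUS = set("N-?")


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in AMBIGUOUS and b not in AMBIGUOUS
    )


def pairwise_snp_distances(alignment):
    """Return {(name_1, name_2): SNP count} for every pair of isolates."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Toy alignment (hypothetical isolates and sequences, for illustration only).
alignment = {
    "isolate_1": "ACGTACGTAC",
    "isolate_2": "ACGTACGTAT",
    "isolate_3": "ACGAACNTTT",
}

distances = pairwise_snp_distances(alignment)
for (x, y), d in distances.items():
    print(f"{x} vs {y}: {d} SNPs")
```

      Reporting such counts (or phylogenetic distances, or average nucleotide identity) alongside each claim of "short evolutionary distance" would make the statements verifiable.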

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on medRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers agreed that the topic is interesting in principle, i.e. tracking antibiotic resistance globally in less well-studied but nonetheless clinically important bacterial species. However, the reviewers also had several major concerns, with the main concerns being:

      1) Overall lack of rigor in the analysis. This is due in large part to a lack of precision in the methods, e.g. differences in diversity are not statistically supported, lengths of evolutionary distance are not defined, the definition of a resistance gene is unclear, how an outbreak is defined is unclear.

      2) The paper does not address biases in sample collection. Since the data were taken from a central repository, there are many different studies included, each with their own biases. It is important to address these biases when comparing datasets from different groups and from different geographical locations.

      3) There is insufficient evidence to make claims about horizontal gene transfer.

      The individual reviews provide more details on each of these points.

    1. Reviewer #3:

      In this paper, Barnett and colleagues used network-based, data-driven analyses to characterize how the default mode network (DMN) and the Medial Temporal Network (MTN) interact with the hippocampus. First, the authors confirmed previous findings that the MTN is a distinct network from the DMN. Second, the authors identified three subnetworks of the DMN that differ from each other based on their connectivity profiles. They further investigated cross-network and intra-network dynamics during rest and also the representational similarity of patterns within these networks during a memory retrieval task. Finally, they used meta-analytic analyses to develop hypotheses about the specific cognitive functions of the MTN and DMN subnetworks.

      Major comments:

      1) One noteworthy aspect of this paper is that the networks identified by the current investigation do not map on perfectly to a previous framework outlined by the senior author (the AT-PM framework; Ranganath and Ritchey, 2012). I think that readers of this work will be very curious to hear about this update, and I think that the similarities and differences between the AT-PM framework and the current findings should be made crystal clear. For example, perhaps a schematic could be used to visually depict the similarities and differences.

      2) In addition to this suggested visualization, I think that memory scholars who are familiar with the AT-PM framework will be curious to know how these results can update the current thinking of how different brain networks organize memories and perform different types of cognitive functions. The meta-analysis partially does this, but one is left wondering about how this changes or updates the field's understanding of how specific types of memory (e.g. object versus scene memory as in Maass et al., Brain, 2019) are supported.

      3) The authors state in the methods, "This sample size is comparable to the cohort sample sizes from the seminal Power et al., study investigating functional brain organization." I think a bit more can be said about the effect sizes reported in the previous literature (which might be inflated due to publication bias), and the power to detect such effect sizes (or smaller) here.

      4) I found the results reported in the section "Regions within the same community represent similar kinds of information during a memory task" difficult to follow. Moreover, I was not sure what this analysis provides beyond the resting state analyses. This paper would be strengthened if these analyses were linked to behavioral performance on the memory retrieval task.

      5) I was surprised to see that the Anterior Hippocampus was more highly correlated (numerically) to the DMN (Supplementary Table 1) and the MP and PM sub-networks (Figure 4) compared to the MTN network. Is this difference statistically significant, and, if so, do the authors think that this difference is meaningful?

      6) Tau spreading models have been demonstrated to follow patterns of functional connectivity (Franzmeier et al., Nature Comms, 2020). The authors may wish to comment on the relevance of these findings to different patterns of tau accumulation in different types of dementia.

    2. Reviewer #2:

      Overall, I thought that the topic addressed and approaches used were interesting and in particular I appreciated the motivation of relating data-driven analyses of resting state data to existing theoretical frameworks and task-based data. As described below, I believe the manuscript could be strengthened with additional comparison to past work as well as addressing a potential methodological issue.

      1) As noted by the authors, past work has used data-driven approaches on resting state data to subdivide the default mode network. The manuscript would be strengthened by highlighting the similarities/differences of the current work with such past work. In terms of revealing subnetworks, is it believed that some aspects of the data acquisition/delineation methods employed here are preferable? MTL signal dropout was mentioned in the discussion, but was this a major motivating factor? Might there be any way of quantifying or tabulating the differences between the proposed subdivisions here and other efforts in order to help bridge the current findings to past work and to assess how and why the current results might differ?

      2) The motivation to link data-driven network clustering approaches (e.g. the MTN and DMN subnetworks found here) with more hypothesis-driven approaches (e.g. the PM/AT framework) is a key strength of the study, although the findings and conclusions drawn about the relationship were a little difficult to fully understand. For example, how functionally distinct are the MTN and the PM/AT DMN subnetworks given that the PM/AT framework highlights the distinct contributions of subregions of the MTN (e.g. PHC/PRC)? Is it thought that there is a distinction between PM/AT pathways that spans DMN and MTN but is not captured here or do the findings suggest that a better distinction in terms of understanding hippocampal-based memory in the brain is between DMN subregions and MTN? Relatedly, might it be possible that the DMN subnetworks' connectivity with the hippocampus is mediated by MTL subregions? More generally, this comment is intended to probe the authors as to whether they believe that the data-driven and hypothesis-driven approaches are reconcilable or if they are arguing that the data-driven approach is preferable.

      3) To what degree might the spatial proximity of the ROIs influence the results of the various analyses? In particular, I wonder if the analyses done using pattern similarity might be influenced by partial non-independence of adjacent ROIs. That is, adjacent ROIs might have correlated pattern similarity due to smoothing and other sources of voxelwise spatial non-independence, and so insofar as there are more nearby ROIs within networks than across networks, it might influence the observed results. Similar concerns might be applicable to the Participation analysis, but seem less obvious.

    3. Reviewer #1:

      This paper characterizes resting state functional connectivity across the brain and within memory networks, evaluates whether similar networks arise in a memory-guided decision-making task, and collects descriptions of the function of these networks in prior imaging studies. The authors find that the DMN and a Medial Temporal Network (MTN) can be differentiated, and that there are three subnetworks within the DMN that interact differently with different parts of the hippocampus and that have been ascribed different kinds of functions in prior imaging studies.

      The paper provides a systematic overview and re-examination of multiple approaches that have been used before to characterize networks across the brain and those focused on memory systems. My overall sense is the paper will be very useful to the cognitive neuroscience / memory communities but does not present a substantial theoretical advance. I am also concerned about the interpretation of the memory task connectivity data, as described below.

      Major comments:

      -It seems possible to me that the trial-by-trial RSA analyses run on the task data are picking up on basically the same signal as the functional connectivity resting state analyses. If the authors ran the RSA analyses TR by TR on the resting state data, would that pick up the same structure? Similarly, would the functional connectivity analyses on the task data explain the same variance as the RSA? Univariate signals can drive RSA effects, so careful analyses would need to be done to demonstrate that these methods are picking up on different aspects of the interactions between these regions. Relatedly, if the authors have access to a non-memory task dataset, perhaps it could be useful to show that the results are different in that case.

      -The results are displayed on surfaces, but I think (but am not sure) that all the analyses were done in the volume. Given the interest in the hippocampus and its connectivity, it would be very useful to see results displayed in the volume in addition to (or replacing) the surfaces.

      -By eye, the MP network as shown in Fig 2 looks much less coherent than the other two. It is difficult to see much cluster structure there at all. I am therefore unsure how confident to feel in the existence of this as a distinct network.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All reviewers felt that this work represents a useful contribution to the literature, relating different perspectives on the nature of interactions between brain areas and how these interactions may support memory, but that it does not offer a substantial theoretical advance beyond prior work. The reviewers also raise some methodological concerns that the authors may wish to consider.

    1. Reviewer #3:

      In this work, Yao and colleagues described transcriptome profiling of human plasma from healthy individuals by TGIRT-seq. TGIRT is a thermostable group II intron reverse transcriptase that offers improved fidelity, processivity and strand-displacement activity, as compared to standard retroviral RT, so that it can read through highly structured regions. Similar analysis was performed previously (ref. 20), but this study incorporated several improvements in library preparation including optimization of template switching condition and modified adapters to reduce primer dimer and introduce UMI. In their analysis, the authors detected a variety of structural RNA biotypes, as well as reads from protein-coding mRNAs, although the latter is in low abundance. Compared to SMART-Seq, TGIRT-seq also achieved more uniform read coverage across gene bodies. One novel aspect of this study is the peak analysis of TGIRT-seq reads, which revealed ~900 peaks over background. The authors found that these peaks frequently overlap with RBP binding sites, while others tend to have stable predicted secondary structures, which explains why these regions are protected from degradation in plasma. Overall, this study provided a robust dataset and expanded picture of RNA biotypes one can detect in human plasma. This is valuable because the findings may have implications in biomarker identification in disease contexts. On the other hand, the manuscript, in the current form, is relatively descriptive, and can be improved with a clearer message of specific knowledge that can be extracted from the data.

      Specific points:

      1) Several aspects of the bioinformatics analysis can be clarified in more detail. For example, it is unclear how sequencing errors in UMIs affect the de-duplication procedure. This is important for the peak analysis, so it should be explained clearly. Also, it is not described how exon junction reads (when mapped to the genome) are handled in peak calling, although the authors did perform complementary analysis by mapping reads to the reference transcriptome.

      2) Overall, the authors provided convincing data that TGIRT-seq has advantages in detecting a wide range of RNA biotypes, especially structured RNAs, compared to other protocols, but these data are more confirmatory, rather than completely new findings (e.g., compared to ref. 20).

      3) The peak analysis is more novel. The authors observed that 50% of peaks in long RNAs overlap with eCLIP peaks. However, there is no statistical analysis to show whether this overlap is significant or simply due to the pervasive distribution of eCLIP peaks. In fact, it was reported by the original authors that eCLIP peaks cover 20% of the transcriptome. Similarly, the authors found that a high proportion of remaining peaks can fold into stable secondary structures, but this claim is not backed up by statistics either.

      4) Ranking of RBPs depends on the total number of RBP binding sites detected by eCLIP, which is determined by CLIP library complexity and sequencing depth. This issue should be at least discussed.

      5) Enrichment of RBP binding sites and structured RNA in TGIRT-seq data is certainly consistent with one's expectation. However, the paper can be greatly improved if the authors can make a clearer case of what is new that can be learned, as compared to eCLIP data or other related techniques that purify and sequence RNA fragments crosslinked to proteins. What is the additional, independent evidence to show the predicted secondary structures are real?

      6) The authors should probably discuss how alignment errors can potentially affect detection of repetitive regions.

      7) Many figures are IGV screenshots, which can be difficult to follow. Some of them can probably be summarized to deliver the message better.
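
      To make point 3 concrete, the sketch below shows one simple way to ask whether an observed overlap exceeds what pervasive eCLIP coverage alone would produce: a one-sided binomial test against a background overlap probability. This is only an illustration under stated assumptions. The background of 0.20 is taken from the transcriptome coverage figure cited above, the peak counts are hypothetical (the manuscript's exact counts are not restated here), and treating each peak as an independent draw is a simplification; shuffling peak positions within expressed transcripts would be a more careful null model.

```python
from math import exp, lgamma, log


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), computed in log space."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 <= k <= n:
        raise ValueError("k must satisfy 0 <= k <= n")
    total = 0.0
    for i in range(k, n + 1):
        # log of C(n, i) * p^i * (1 - p)^(n - i), using log-gamma for stability
        log_pmf = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1 - p)
        )
        total += exp(log_pmf)
    return min(total, 1.0)  # guard against rounding slightly above 1


# Hypothetical counts for illustration: 100 peaks, 50 of which overlap
# an eCLIP site, tested against a 20% background overlap probability.
n_peaks = 100
n_overlapping = 50
background = 0.20

p_value = binomial_upper_tail(n_overlapping, n_peaks, background)
print(f"P(overlap >= {n_overlapping} of {n_peaks} | background {background}) = {p_value:.3g}")
```

      The same function could be applied to the fraction of peaks with stable predicted secondary structures, provided a suitable background rate is estimated from matched control regions.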

    2. Reviewer #2:

      Yao et al used thermostable group II intron reverse transcriptase sequencing (TGIRT-seq) to study apheresis plasma samples. The first interesting discovery is that they had identified a number of mRNA reads with putative binding sites of RNA-binding proteins. A second interesting discovery from this work is the detection of full-length excised intron RNAs.

      I have the following comments:

      1) One doubt that I have is how representative is apheresis plasma when compared with plasma that one obtains through routine centrifugation of blood. The authors have reported the comparison of apheresis plasma versus a single male plasma in a previous publication. I think that to address this important question, a much increased number of samples would be necessary.

      2) For the important conclusion of the presence of binding sites of RNA-binding proteins in a proportion of apheresis plasma mRNA molecules, the authors need to explore whether there is any systematic difference in terms of mapping quality (i.e. mapping quality scores in alignment results) between RBP binding sites and non-RBP binding sites, so that any artifacts of peaks caused by the alignment issues occurring in RNA-seq analysis could be revealed and solved subsequently. Furthermore, it would be prudent to perform immunoprecipitation experiments to confirm this conclusion in at least a proportion of the mRNA.

      3) In Fig. 2D, one can observe that there are clearly more RNA reads in TGIRT-seq located in the 1st exon of ACTB, compared with SMART-seq. Is there any explanation? Will this signal be called as a peak (a potential RBP binding site) in the peak calling analysis (MACS2)? Is ACTB supposed to be bound by a certain RBP?

      4) For Fig 2A, the comparison of RNA yield and RNA size profile among different protocols would be more informative if the authors also added the results of TGIRT-seq.

      5) As shown in Figure 4 C (the track of RBP binding sites), RBP binding sites seem quite pervasive in some gene regions. How many RBP binding sites from public eCLIP-seq results are used for overlapping peaks present in TGIRT-seq of plasma RNA? What percentage of plasma RNA reads have fallen within RBP binding sites? Are those peaks present in TGIRT-seq significantly enriched in RBP binding regions?

      6) Since a considerable portion of TGIRT-seq reads are related to simple repeats, one possible reason is the high abundance of endogenous repeat-related RNA species in plasma. Nonetheless, have the authors studied whether the ligation steps in TGIRT-seq have any biases (e.g. GC content) when analyzing human reference RNAs and spike ins (page 4, paragraph 2)?

      7) As described in the Figure 2 legend, there are 0.25 million deduplicated TGIRT-seq reads assigned to protein-coding gene transcripts, far fewer than the 2.18 million reads for SMART-seq. The authors need to discuss whether the current TGIRT-seq protocol would cause potential dropouts in mRNA analysis, compared with SMART-seq.

      8) While scientifically thought-provoking, the practical implication of the current work is still unclear. The authors have suggested that their work might have applications for biomarker development. Is it possible to provide one experimental example in the manuscript?

    3. Reviewer #1:

      The Lambowitz group has developed thermostable group II intron reverse transcriptases (TGIRTs) that strand switch and also have trans-lesion activity to provide a much wider view of RNA species analyzed by massively parallel RNA sequencing. In this manuscript they use several improvements to their methodology to identify RNA biotypes in human plasma pooled from several healthy individuals. Additionally, they implicate binding by proteins (RBPs) and nuclease-resistant structures to explain a fraction of the RNAs observed in plasma. Generally I find the study fascinating and argue that the collection of plasma RNAs described is an important tool for those interested in extracellular RNAs. I think the possibility that RNPs are protecting RNA fragments in circulation is exciting and fits with elegant studies of insects and plants where RNAs are protected by this mechanism and are transmitted between species.

      I have one major comment for the authors to consider. In my view the use of pooled plasma samples prevented the important opportunity to provide a glimpse on human variation in plasma RNA biotypes. This significantly limits the use of this information to begin addressing RNA biotypes as biomarkers. While I realize that data from multiple individuals represents a significant undertaking and may be beyond the scope of this manuscript, I urge the authors to do two things: (1) downplay the significance of the current study on the development of biomarkers in the current manuscript (e.g., in the abstract and discussion - e.g., "The ability of TGIRT-seq to simultaneously profile a wide variety of RNA biotypes in human plasma, including structured RNAs that are intractable to retroviral RTs, may be advantageous for identifying optimal combinations of coding and non-coding RNA biomarkers for human diseases."). (2) Carry out an analysis in multiple individuals - including racially diverse individuals - very important information will come of this - similar to C. Burge's important study in Nature ~2008 where it was clear that there is important individual variation in alternative splicing decisions - very likely genetically determined. This second suggestion could be added here or constitute a future manuscript.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Timothy Nilsen (Case Western Reserve University) served as the Reviewing Editor.

    1. Reviewer #3:

      In this manuscript Brown et al characterized fatty acid taste discrimination in Drosophila melanogaster. Fat taste is relatively poorly understood, but has critical implications for feeding and obesity research; thus, studies that advance our understanding of the molecular and physiological underpinning of this modality are important. The finding that Ir56d neurons enable organisms to discriminate between short, medium and long chain fatty acids but not to differentiate between types of medium chain fatty acids is certainly novel and interesting. It is also surprising but fascinating that this receptor is only required for the detection of medium fatty acids. The manuscript is well written and the figures presented in a clear and thoughtful manner. These findings lay out ground for future exciting work to investigate how sweet taste and fatty acid taste perception are selectively modulated by the brain since these gustatory neurons overlap and whether such discrimination is altered depending on the state of hunger.

      Strengths:

      1) Despite the overlapping nature of taste neurons in this case, i.e., Ir56d neurons being co-expressed with Gr64f - those that broadly label the sweet GRNs and the fact that Ir56d neurons are responsive to both sucrose and fatty acids; mutation in Ir56d results in loss of taste for hexanoic acid, but not sucrose. Authors use this taste discrimination to their advantage in combination with a robust aversive taste memory assay to address the question of differential fatty acid taste perception.

      2) Authors rule out the potential involvement of olfaction in modulating taste perception.

      3) Use of CRISPR-Cas9 to generate Ir56dGAL4 flies, implying accurate and targeted genome editing, provides validation of the results obtained when Ir56d-expressing neurons are silenced. Additionally, use of the fly gustatory system for in-vivo Ca2+ imaging strengthens and corroborates the results at the physiological level, especially the rescue experiments.

      Overall (minor) comments and questions:

      1) Are there differences in taste discrimination between male and female flies?

      2) Individual data points should be shown whenever possible for all figures (except PER because that would make it impossible to interpret).

      3) Can the authors discuss how discriminating between different fatty acids types may be adaptive? Are they found in different food sources, some of which are "good" and some "bad"? Is there evidence from other organisms about this type of molecular discrimination in fatty acid taste?

    2. Reviewer #2:

      In the present paper, Brown et al. study the ability of Drosophila melanogaster to discriminate between Fatty Acids (FAs) of different lengths. Using a combination of behavioral experiments, molecular biology and in vivo calcium imaging, the authors show that a subset of Ir56d expressing neurons are able to differentiate FAs. However, the Ir56d receptor is only necessary for the detection of medium-length FAs but not for short or long ones. The paper explores in detail the role of the Ir56d receptor as an FA detector, a role described by the authors in a previous paper (Tauber et al 2017).

      Major concerns:

      I consider that the experiments are properly done, as is the statistical analysis; however, the gain in knowledge is very limited. So far, the authors can prove that flies can discriminate FAs of different lengths, with Ir56d being the receptor detecting medium-length FAs, a result that expands the knowledge gained in Tauber et al 2017. In figure 3, the authors show that silencing Ir56d neurons using tetanus toxin expression dramatically reduces PER to medium-length fatty acids, but not to short or long ones, pointing to a different set of neurons involved in their detection. However, the in vivo calcium imaging experiments show that Ir56d neurons also respond to short- and long- FAs. In this regard, I disagree with the statement in the abstract: "Characterization of hexanoic acid-sensitive Ionotropic receptor 56d (Ir56d) neurons reveals broad responsive to short-, medium-, and long- chain fatty acids, suggesting selectivity is unlikely to occur through activation of distinct sensory neuron populations." In fact, I consider that selectivity would come from the activation of different subsets of gustatory neurons. It seems that Ir56d neurons could be a subset of the neurons that generally respond to FAs, providing the specificity for medium-length FAs. Other neurons, in addition to the Ir56d ones, might be responding to short- and long- FAs in an Ir56d independent manner.

      I consider the authors should explore in depth how short- and long- FAs are actually detected, whether it depends on other Ionotropic Receptors (Ir25a and Ir76b might be involved (Ahn et al. 2017)) and which subset of gustatory neurons are actually responding to these compounds, considering they do not require Ir56d nor Ir56d neurons.

    3. Reviewer #1:

      This paper investigates fatty acid taste in flies and asks the broad question of whether flies can discriminate different compounds within a single taste modality. The authors' main finding is that flies can discriminate between long, medium, and short chain fatty acids using a previously established aversive memory taste paradigm. When they delve into the cellular and molecular basis of fatty acid detection they find that IR56d neurons respond to all three classes of fatty acids, but are required only for the behavioural responses to medium chain molecules. Similarly, CRISPR/Cas9 deletion of the IR56d receptor reveals that it too is required only for medium-chain fatty acid responses. Thus, different fatty acid classes presumably activate distinct, but partially overlapping subsets of appetitive taste neurons. In general, I think the paper is potentially interesting (see comment 1 below) and the data mostly support the conclusions. However, a lack of attention to detail makes some of the data hard to interpret (see minor comments).

      Concerns:

      1) The ability of flies to discriminate between different fatty acid classes is presented as the interesting finding, since, as the authors point out, discrimination between compounds within a taste modality is generally not thought to occur. On the surface I agree that this is interesting. However, in the authors' set up of the main question (line 101), they raise an important issue: "Is it possible that flies are capable of differentiating between tastants of the same modality, or is discrimination within a modality exclusively dependent on concentration?" This should be rephrased to replace "concentration" with "intensity" since not all tastants at the same concentration have the same intensity, and from a behavioural perspective it is intensity that matters. Given that, the authors don't do anything to demonstrate that their discrimination task does not depend on intensity, aside from the fact that 1% solutions of all the FA seem to give similar PER. They need to show more explicitly that this task is truly showing identity-based discrimination.

      2) The second broad concern I have is over the nature of short and long chain fatty acid detection. Interpreting the discrimination results would be greatly aided if we knew what other neurons mediate the PER to these molecules. Is it the non-IR56d population of Gr64f neurons?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers find fatty acid taste discrimination potentially interesting and agree that the experiments are performed to a high standard. One major concern is whether discrimination is based on intensity rather than quality. A second limitation is that the mechanism of FA detection is not greatly advanced beyond the authors' previous work: the cellular mechanisms for long and short chain FA detection remain unclear. The reviewers agreed that if the major concerns of Reviewer 1 were addressed, this manuscript would provide a broader understanding of fatty acid discrimination.

    1. Reviewer #2:

      In this paper the authors use a genomics approach to tackle the question of how the combined transcriptional response to two signals compares to the responses to the two treatments individually. They treat MCF-7 cells with TGF-beta and retinoic acid, and find that the combined response at the level of gene expression (RNA-seq) and chromatin accessibility (ATAC-seq) may encompass additivity, multiplicativity but also a wide range of other intermediate or more extreme behaviours.

      The work is conceptually very interesting, and the manuscript text and figures were extremely clear and a pleasure to read. We suggest that the following major points be addressed to clarify the assumptions and limitations of the study.

      The authors treat the cells for 72 h. This is a very long time, over which secondary effects may dominate the results. The choice of this time point should, at the very least, be justified and discussed. For example, previous studies that quantitatively characterized distinct temporal dynamics in SMAD signaling after TGF-beta treatment showed a transient, dose-dependent SMAD response in the first 4 h after TGF-beta treatment, with a strong early peak in the nuclear/cytoplasmic ratio of SMAD2/4 (Clarke & Liu, 2008; Schmierer et al, 2008; Zi et al, 2011; Zi et al, 2012; Strasen et al., 2018). In addition, TGF-beta signaling has been suggested to depend on cell density and cell cycle stage (Zieba et al, 2012), which may also affect the results. It would also be helpful to have a quantitative measure of the corresponding nuclear TF levels at the selected 72 h time point (e.g., for the main affected TFs, such as pSMAD2 and RARA).

      MCF7 cells were treated with three different doses of TGF-beta (1.25, 5, and 10 ng/mL) and RA (50, 200 and 400 nM). As the selected doses appear to be higher than those used in previous studies, the authors should comment on their choice. The authors state that "We defined a master set of 1,398 upregulated genes by selecting the set of genes that were differentially expressed in any dose of the combination treatment (log FC ≥ 0.5 and padj ≤ 0.05) and that had increased expression in each dose of each individual signal." It is unclear how this gene set relates to the top-right Venn diagram in Fig 1B, in which only 303 genes are shown as being upregulated in all three treatments and the total according to the numbers in the diagram is >1,398.

      Fig 1B shows that a large proportion of genes were differentially expressed in response to both signals, but not to either of the signals individually. Their responses are presumably more non-additive than the responses of genes upregulated in response to all three treatments. Restricting analysis to the latter group therefore introduces a bias for certain modes of combinatorial regulation. The justification for this choice should be discussed.

      The authors suggest a bimodal distribution for the observed c values, with peaks at 0 and 1. The authors write that "Our simulated c value distributions bear a moderate resemblance to our observed c value distributions". This conclusion is central to the paper's claim that "Gene regulation gravitates towards either addition or multiplication when combining the effects of two signals" (title) and that "the combined responses exhibited a range of behaviors, but clearly favored both additive and multiplicative combined transcriptional responses" (abstract). However, the additional peak at c=1 is not obvious from the data in Fig. 1E. Stronger evidence (i.e. statistical analysis of the observed distributions) would be needed to demonstrate overrepresentation of c values ~1. Alternatively, the title and abstract could be revised to better reflect the strength of the findings.

      The authors frame the work on the basis of simple models of gene regulation by pairs of transcription factors that predict either addition or multiplication. However, they are activating two signalling pathways that could interact also at the level of signal transduction (and need not be directly regulating the genes in question, as noted in point 1). How justifiable is it to make inferences about the nature of combinatorial transcriptional regulation from this kind of experimental set up? These issues should be made more clear from the beginning, and should be taken into account when interpreting the data.

      Related to the point above, the authors use chromatin accessibility as a proxy for TF binding. However, this does not need to be the case, especially if the accessibility data are considered quantitatively. For example, TFs may bind and recruit remodeling factors that affect accessibility differentially across the genome, obscuring the relationship between TF binding and accessibility. This is especially pertinent at longer time scales after perturbation. We suggest presenting the data on accessibility as just that, instead of presenting it as data that directly reports on TF binding. The relationship to TF binding can and should still be explored in the analyses, but with clarification for how accessibility data is limited in this case.

      The following are instances where accessibility data is described as directly reporting on TF binding that we recommend revising (the list is not exhaustive):

      -the title of section two

      -Fig.2E

      -the link between models of TF control and the relationship between peaks and expression, such as the reference to the thermodynamic model at the end of section 3

      -remove the implicit assumption between cooperativity of TF binding and super-additive peaks in section 3 and section 4. This may help explain more naturally the lack of dual-motif finding in section 4

    2. Reviewer #1:

      Cells perform many types of computations to respond to external signals at the transcriptional regulatory level. Often, regulatory sequences read out the concentration of input transcription factors and combine that information to dictate the level of transcriptional output. Yet, for most genes, the quantitative rules for how regulatory regions integrate multiple inputs remain unclear.

      Sanford et al. studied how two signals are interpreted by downstream genes using quantitative tools such as RNA-seq and ATAC-seq. The authors propose two phenomenological models to understand combinatorial regulation. Specifically, a model in which output gene expression in the presence of two different input signals is the sum of the gene activity in the presence of each signal alone (additive), and an alternate model where the output of the two signals is the product of the output driven by each individual signal (multiplicative).
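
      To make the two phenomenological models concrete, the following is a minimal sketch, not the authors' implementation. It assumes expression values measured in the control (x0), under each signal alone (xa, xb) and under the combination (xab); the function names and the particular interpolation used for the c value (c = 0 for a purely additive response, c = 1 for a purely multiplicative one) are illustrative assumptions chosen to match the verbal description in these reviews.

```python
def additive_prediction(x0, xa, xb):
    """Combined expression if the changes caused by each signal simply add."""
    return x0 + (xa - x0) + (xb - x0)


def multiplicative_prediction(x0, xa, xb):
    """Combined expression if the fold-changes caused by each signal multiply."""
    return x0 * (xa / x0) * (xb / x0)


def c_value(x0, xa, xb, xab):
    """Place the observed combined response on a scale where
    0 = additive prediction and 1 = multiplicative prediction.

    Assumed parameterization (hypothetical, for illustration):
        xab = additive + c * (xa - x0) * (xb - x0) / x0
    """
    add = additive_prediction(x0, xa, xb)
    mult = multiplicative_prediction(x0, xa, xb)
    if mult == add:
        # Happens when either signal alone has no effect; c is undefined.
        raise ValueError("additive and multiplicative predictions coincide")
    return (xab - add) / (mult - add)


# Hypothetical expression values: control 10, signal A 20, signal B 30.
print(additive_prediction(10, 20, 30))        # additive expectation
print(multiplicative_prediction(10, 20, 30))  # multiplicative expectation
print(c_value(10, 20, 30, 50))                # observed response halfway between
```

      Under this assumed parameterization, values of c below 0 or above 1 would correspond to the sub-additive and super-multiplicative behaviours that the reviews also mention.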

      The authors performed a genome-wide analysis of thousands of genes and found that most genes responding to either TGF-β or retinoic acid behave in either an additive or multiplicative fashion. The authors further asked whether these additive/multiplicative behaviors can be explained by the accessibility of DNA regulatory regions reported by ATAC-seq. The result reveals that DNA accessibility is mostly additive. However, they also find that multiplicative gene expression is correlated with super-additive accessibility.

      This work provides a platform to quantitatively assess combinatorial transcriptional regulation both at the level of DNA accessibility and transcriptional output. Indeed, one of the exciting aspects of the work is the attempt to use the quantitative values of DNA accessibility reported by ATAC-seq to constrain possible biophysical models of transcriptional regulation. We foresee that this work will set the stage for a better understanding of the molecular relation between transcription factor binding and the gene activity resulting from this binding, in general, and for dissecting the molecular mechanisms of combinatorial regulation, in particular.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      In this work the authors used a genomic approach to investigate the way cells interpret two combined signals versus two individual signals. The authors used RNA-seq to examine the gene expression outputs from thousands of genes in response to two signal inputs, TGF-beta and retinoic acid, either individually or in combination. The authors found that when cells were stimulated with both signals, most genes exhibited additive or multiplicative responses. The authors further used paired chromatin accessibility by ATAC-seq to relate such responses to putative transcription factor binding patterns in these genes. Surprisingly, ATAC-seq revealed that most genes prefer addition to combine two signals as chromatin accessibility is largely additive, although some super-additive accessibility may correspond to multiplicative gene expression.

      This work provides a platform to quantitatively assess combinatorial transcriptional regulation both at the level of DNA accessibility and transcriptional output. Although the concept of additive vs. multiplicative transcriptional response is phenomenological, it may be used to clarify and constrain certain biophysical models of transcriptional regulation and set the stage for a better understanding of the molecular relation between combinatorial transcription factor binding and corresponding gene activity.

      While the work is written in clear and concise language, there are places that require further clarification and better presentation.

    1. Reviewer #2:

      The authors investigated the joint influences of visual evidence strength and action (un)certainty on the formation of perceptual decisions, and used MEEG to track the associated cascade of visual-motor processing using a relatively complex set of analyses. This manuscript addresses a general question that has already attracted (but also continues to attract) considerable interest. One of the main advances of this specific work (in addition to the advanced MEEG analyses) is the explicit manipulation of action certainty in addition to evidence strength. My enthusiasm for this work, however, remains somewhat limited in light of the following aspects.

      1) The article is set up from the perspective of adjudicating between strictly "serial models" of perceptual decisions, in which decisions are reached about what is viewed before turning to the appropriate action, versus more "continuous models" in which potential action plans are formed while evidence accumulation is still taking place. Is there not already ample evidence for the latter scenario (e.g., the work of Tobias Donner, Floris de Lange, Ian Could, and others)? Moreover, the authors currently provide only a single reference for the serial model, which dates back to 1966. Thus, the temporal overlap between visual evidence accumulation and action planning is, in itself, neither very surprising nor new; and yet it appears to be a central component of the article's pitch.

      2) While the manipulation of action (un)certainty provides an interesting extension of the popular random-dot-motion task, the nature and rationale of this manipulation remain insufficiently clear. Do participants view multiple patches of equal coherent motion and arbitrarily decide which to respond to? If so, does this not confound action uncertainty with evidence (i.e., more patches with motion may give more evidence)? And should this not make participants faster, rather than slower? Are they slower simply because they are asked to make a "fresh" response? At a minimum the authors should more clearly explain this manipulation, starting in the Results section. In this, the authors should clarify exactly how visual signals and action certainty are independent in their design, or (as I suspect) acknowledge that the current manipulation confounds action certainty with the availability, collective strength, and/or spatial region of the visual evidence (which may each in turn affect neural signals throughout the brain).

      3) It would help to first show the (basic) effects of sensory and action certainty on time-frequency activity in several brain areas (at least visual and motor), for example by showing power modulations for each of the certainty levels, together with a contrast plot of high vs low certainty. This would help understand the data, before turning to the more complex analyses. Such a plot may reveal, for example, decreased alpha activity in posterior sites with higher action uncertainty, simply as a result of more visual stimulation. If so, this may be problematic for the more complex analyses of transfer entropy. It could also help justify the current focus on beta and gamma (but not, for example, alpha) and to help understand the distinction between modulations in beta and gamma.

      4) I am surprised the authors find a gamma decrease rather than an increase. Does gamma not usually increase with motor preparation (e.g., Donner et al. Current Biology 2009) and visual attention (e.g., Fries et al., Science, 2001; Siegel et al., Neuron, 2008)?

      5) Given that both certainty manipulations affected RT, are all neural correlates of these certainty manipulations not "confounded" with differences in RT?

      6) Do the two uncertainty factors (sensory and action certainty) interact? This information appears missing from the analysis of the behavioural data. Also, if these two factors interact, it would be sensible to also explore this in the modelling and MEEG analyses.

    2. Reviewer #1:

      This study uses combined EEG/MEG to characterise the neural dynamics of the visuomotor decision process by separately manipulating its perceptual- and action-related components. Subjects monitored 4 simultaneous random dot stimuli to detect changes from incoherent to coherent motion, and indicated detection with a finger press. Perceptual and action uncertainty were manipulated by varying the motion coherence of the stimuli, and number of motor response options (1 vs. 3), respectively.

      Authors identify activity in the beta and gamma bands correlating with decision-related trajectories predicted by an accumulation-to-bound model. They reveal distributed networks in both frequency bands that show a negative relationship with the predicted patterns (i.e., desynchronization after onset of coherent motion). Several interesting findings stand out: 1) beta activity follows a gradual progression from posterior to anterior regions, a finding further supported by a connectivity analysis assessing the direction of information flow. 2) The accumulating signals across the identified regions overlap in time, which is taken as evidence for a continuous flow of information along the visual-to-motor pathway. 3) regions where (beta) activity flow is modulated by perceptual (as opposed to action) uncertainty show earlier responses to perceptual evidence, and are more likely to drive the information flow to downstream areas.

      This is overall a well-written, clearly structured paper on an ever-relevant topic. Authors use elegant, rigorous statistical methodology, and their characterisation of beta activity provides some important insight into the global neural dynamics of decision making, in particular the temporal properties of decision-related signals across the perception-to-action processing pipeline. I do however have a couple of points of concern regarding parts of the results (in particular those involving gamma activity) and their interpretation:

      1) Gamma band activity is seen to exhibit a negative relationship with the predicted accumulating signal, with a gradual desynchronisation upon the onset of perceptual evidence (coherent motion). I found this surprising, as several previous studies looking at decision-related activity have shown increases in gamma activity with perceptual evidence (Polania et al. 2014 Neuron, Donner et al 2009 Curr. Biol., Wilming et al. 2020 biorxiv). Is it possible that with the broad gamma range investigated here (31-90Hz) and the spectral smoothing involved, the negative relationship might be at least partly driven by activity in the lower ranges, i.e., qualitatively closer to task/motor-related beta desynchronisation? It would be interesting to see if the significant negative correlation is maintained with a slightly narrower gamma range (e.g., >35Hz or >40Hz). Either way, I think it's important for these results to be discussed in relation to the literature mentioned above.

      2) Regarding the interpretation of the beta-gamma relationship, authors seem to place the results in the context of feedforward/feedback information dynamics (or at least they make several references to the literature throughout the manuscript). I am not sure if I understand or agree with this interpretation - if anything, doesn't the temporal progression of decision-related information for gamma and beta observed here (e.g., Fig. 5b) go against the current understanding of their roles in feedforward and feedback information flow, respectively? Some clarification on this point would be very useful.

      3) While the timing of beta/gamma decision-related accumulation is summarised in Figs. 4/5, I think it would be informative to also include (either in the main figures or as supplement) the actual trial-averaged traces, highlighting the overall timing differences between activity in the two bands (from Fig. 4), as well as the progression across the anterior-posterior axis (shown in Fig. 5).

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      While we found the topic very relevant - especially the role of large-scale beta dynamics in visuomotor processing - and the approach used interesting, our overall enthusiasm was limited by concerns regarding novelty, design and interpretation. Critically, it remains unclear whether we are dealing with narrowband oscillations here, especially regarding the reported gamma band results, but also in terms of separating different oscillatory contributions in the alpha/beta frequency ranges. Since everything that follows hinges on this assertion, one would have to first establish a separation of these different spectral contributions in order to attribute particular dynamics to particular bands.

    1. Reviewer #3:

      This report examines the mechanisms by which the KSHV KaposinB (KapB) protein causes disassembly of processing bodies (PBs) in HUVECs. Convincing data are presented showing that mDia1 and ROCK, factors downstream of RhoA, are necessary for PB disassembly in HUVECs. Data suggesting cofilin enhances KapB PB disassembly are less convincing. Over-expression of actinin-1 or directly activating actomyosin contraction favored PB disassembly, implicating mechano-responsive signaling components. Analysis of YAP, a mechano-responsive transcription factor, showed that levels were elevated in cells expressing KapB and that its knockdown rescued PB formation in KapB-expressing cells. Expression of constitutively active YAP promoted PB disassembly, similar to KapB, although it did not reproduce the stabilization of ARE-containing mRNAs seen in KapB-expressing cells. Interestingly, subjecting cells to shear stress or increasing the stiffness of the matrix on which they grow, both thought to activate YAP, recapitulated the PB disassembly phenotype seen in cells expressing KapB, and knockdown of YAP abolished this.

      These are interesting and exciting results that further illuminate the mechanisms by which a viral protein perturbs PB function. Perhaps even more exciting is the finding that mechano-sensitive signaling pathways can influence PB formation (and perhaps function). The data are of high quality and support the major conclusions of the study. However, a couple of items that raised questions for me could be addressed. First, there is some question as to whether the impact shown is a general effect on PBs as a whole or just on the HEDLS marker that is used exclusively in the study. Showing that another PB marker (or two) behaves similarly would support the conclusion that the effect is general. Perhaps this could be done for a few key conditions, such as shear stress and expression of constitutively active YAP. The authors conclude, based on a TEAD-Luc reporter assay, that YAP transcriptional activity is not induced even though it appears to be up significantly compared to controls (Fig S5A, left panel). Could they elaborate on how they arrived at this conclusion? The argument that levels of phospho-YAP are not increased in KapB-expressing cells is not supported by the data. While the ratio may not be different, the total amount of phospho-YAP is clearly elevated, as are total YAP levels. Throughout the manuscript, can the authors comment on the impact, if any, of knockdowns on cell viability and morphology?

    2. Reviewer #2:

      The authors demonstrate that disappearance of P-bodies from cells expressing a KSHV protein, KapB, requires factors regulating actin contractility, mechanosensation and YAP - but does not require the transcriptional regulatory activity of YAP. The function of P-bodies has long been contentious, and the endogenous mechanisms regulating P-body assembly vs. disassembly are still being elucidated. Many studies of P-body dynamics have relied on treatment with sodium arsenite, global translational inhibition, etc. This study therefore has the potential to add significantly to our understanding of P-body disassembly mechanisms and improve our understanding of the role of these ribonucleoprotein granules in cells. Several points of data presentation and interpretation may benefit from clarification.

      1) The introduction and discussion present P-bodies as sites of decay of ARE-containing mRNAs, a long-accepted model of P-body function. However, building on well-established observations from the Izaurralde lab that RNA decay is uncoupled from P-body formation, recent work by Parker, Singer, and Chao utilizing single-molecule imaging of 5' end decay provided clear support for cytosolic localization of RNA decay events, with no decay occurring inside P-bodies, strongly supporting a storage/translational repression role for P-bodies rather than a role in decay. The authors then attempt to provide a complex explanation of the observation that constitutively active YAP decouples P-body disassembly from ARE mRNA stability, rather than considering this result in the context of alternative P-body models.

      2) It is unclear why, in Fig. 1B (middle panel), there is a large, statistically significant increase in P-bodies per cell in vector-expressing cells - which do not express KapB - treated with shDia1-1 over shNT - but not with shDia1-2. Is this due to the more efficient silencing of mDia1 expression by shDia1-1, and does mDia1 have a KapB-independent effect on P-bodies? Or does this suggest off-target shRNA effects?

      3) It appears throughout the manuscript that there is always far more dispersion in P-body numbers in experimental (either shRNA or inhibitor-treated) cells than in control cells, though this may be an artefact of the fold-change calculation in which the authors normalize control cells to 1.0 and present no estimate of variance. Especially for experiments in which p values are close to the cutoff for significance, meaningful analysis of variance in all measurements is important and presentation of the raw data pre-normalization may be helpful.

      4) In Figure 4A, are the KapB expressing cells larger than the vector-expressing cells, or is a higher magnification used? The nuclei appear nearly double in diameter. In the immunofluorescence experiment, no other control marker is imaged to support the assertion that YAP signal is selectively increased by KapB expression. No image quantitation is performed to support the assertion that "nuclear:cytoplasmic YAP was not markedly increased". Quantitation across multiple fields of view (and discussion of how many cells were utilized in the image analysis) rather than presentation of a single image would address these concerns. The authors' observation that the fraction of phosphorylated YAP, as measured by Western blotting in Fig. 4B, decreases in KapB expressing cells appears incongruent with the stated lack of change in cytoplasmic:nuclear YAP in KapB vs. vector expressing cells (Fig. 4A).

      5) While I appreciate that the authors have utilized the luciferase assay in multiple studies, direct measurement of the luciferase reporter mRNA stabilities should be performed to differentiate between changes in stability of the ARE mRNA vs. selective translational repression of the ARE mRNA in this specific experimental context.

      6) "Comparison of the transcriptomic data from HUVECs subjected to shear stress from Vozzi et al (2018) (Accession: GEO, GSE45225) to entries in the ARE-mRNA database (Bakheet, Hitti, and Khabar 2017) showed a 20% enrichment in the proportion of genes that contained AREs in those transcripts that were upregulated by shear stress." This comparison (1) lacks any measure that this enrichment is significant, and (2) relies on a single steady-state microarray measurement, and therefore does not accurately report on RNA decay rates/permit conclusions about RNA dynamics.

      7) It is impossible for the reviewer to assess "unpublished data" on autophagy cited in the discussion.
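
      Regarding point 6, one common way to attach a significance measure to an enrichment of ARE-containing genes is a one-sided hypergeometric test. The following is a minimal sketch under stated assumptions: the function name and all counts are hypothetical placeholders, not values from the manuscript or the cited datasets, and the test treats genes as independent draws from a fixed background.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, total_are, n_up, n_up_are):
    """One-sided hypergeometric p-value: probability of seeing at least
    n_up_are ARE-containing genes among n_up upregulated genes, when the
    background holds total_are ARE-containing genes out of total_genes.
    """
    max_k = min(total_are, n_up)
    denom = comb(total_genes, n_up)
    # Sum the upper tail P(X = k) for k = n_up_are .. max_k.
    tail = sum(
        comb(total_are, k) * comb(total_genes - total_are, n_up - k)
        for k in range(n_up_are, max_k + 1)
    )
    return tail / denom


# Hypothetical counts, for illustration only.
p = hypergeom_enrichment_p(total_genes=20000, total_are=4000,
                           n_up=500, n_up_are=120)
print(f"one-sided enrichment p-value: {p:.3g}")
```

      Such a test addresses only the first part of point 6; it cannot substitute for direct measurements of RNA decay rates.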

    3. Reviewer #1:

      In this manuscript the authors show that the oncogenic transcription factor YAP is an important factor in the signaling pathway from the Kaposi Sarcoma virus protein KapB via the host cell GTPase RhoA down to the disassembly of processing bodies (PBs). This is in principle an interesting finding. However, the connection between KapB and PB-disruption, between YAP and the Rho pathway, Kaposi KapB and the Rho pathway, as well as the connection between Kaposi virus infection and YAP (and Rho) have been described before. Therefore, this connection alone does not come as a surprise. New mechanistic insight into how exactly YAP contributes to PB disruption is unfortunately missing.

      1) It is somewhat contradictory that the last author was first author of a 2015 paper in which they did not observe a significant PB rescue with ROCK inhibitor, leading them to conclude that contractility and PB disruption are independent events downstream of RhoA activity. In the current manuscript they revise this and convince the reader that PB disruption involves contractility (which is also more in line with earlier work (Takahashi et al., 2011)).

      2) The fact that contractility leads to YAP activation is known, but the authors now convincingly show that this does not happen in parallel, but that PB disruption depends on YAP activation. Therefore, the most interesting aspect is that RNAi-mediated removal of YAP leads to suppression of P-body disruption. This finding places YAP as an essential intermediate between contractility and PB-disruption. This reviewer really likes this finding but requests that the authors follow this path a little further and add to the mechanism.

      i) Is it based on a protein-DNA interaction of YAP, i.e. does YAP need to act as a transcription factor to induce PB dissolution? And what transcripts would then be induced and be required for PB disruption or dispersal? Could it be something like DICER RISC (Chaulk et al., 2014)? The authors indicate that they consider this first option less likely, but no experimental proof is provided.

      ii) The effect of YAP on PBs might be based on a protein-RNA interaction or

      iii) It might depend on a protein-protein interaction between YAP and an unidentified partner?

      iv) Finally, one could ask if PB dispersal is connected to an induction of autophagy?

    1. Author Response

      Summary:

      A strength of the work was that the mathematical modeling of re-replication captured variability in origin firing and supported a mechanism that might explain copy number variation observed in many eukaryotes. However, concern was expressed regarding the influence of assumptions made in developing the model on the outcomes and the moderate correlations between simulations and experimental data. Further explanation of the questions being investigated, the validity and nature of assumptions that were used to develop the simulations, and details explaining how these assumptions were built into the modeling were considered important. Some attempt to align the modeling outcomes with known re-replication hotspots would also improve the study. Some of the parameters used for modeling were concerning, including the use of a 16C ploidy cutoff without adequate justification. Reviewers also made suggestions for improving the experimental validation tests. Reviewers also noted places in the manuscript that require additional clarification. Overall, some concerns were raised regarding the experimental methods, and the impact of the insights gained.

      We would like to thank eLife for this Preprint Review service.

      In this manuscript, we present for the first time a model of DNA rereplication, which permits us to analyse how the process evolves at the single-cell level, across a complete genome, over time. This analysis revealed a pronounced heterogeneity at the single cell level, resulting in increased copies of different genomic loci in different cells, and highlighted rereplication as a powerful mechanism for genome plasticity within an evolving population. We would like to thank the reviewers for their critical appraisal of our work and the editor for his summary of the reviews. The points raised were overall easy to address, and we have done so in a revised version of the manuscript, where we have also clarified points which were unclear to the reviewers. Importantly, we have clarified that:

      i) there are currently no available methods for studying rereplication dynamics experimentally at the single cell level across the genome, and it is exactly this analysis that our manuscript offers;

      ii) model assumptions were either standard and previously validated experimentally for DNA replication or subjected to sensitivity analysis with key findings shown to be robust to model assumptions;

      iii) there was no arbitrary cut-off point in the rereplication process, which was analysed over time, an advantage of our approach. Data were depicted early in the process (2C) and late in the process (16C) but findings were robust across the process;

      iv) fission yeast cells can be experimentally induced to rereplicate to different extents (from 2C to 16C or even 32C) and our model permits us to capture the process as it evolves at any ploidy;

      v) correlations between experimental and simulated data were highly significant and robust to model assumptions.

      We would like to thank the reviewers for their comments, which we believe have helped us improve our manuscript and clarify points of possible misunderstanding. A point-by-point response follows.

      Reviewer #1:

      The authors develop and analyse a mathematical model of DNA rereplication in situations, where re-firing of origins during replication is not suppressed. Using the experimentally measured position and relative strength of origins in yeast, the authors simulate DNA copy number profiles in individual cells. They show that the developed model can mostly recapitulate the experimentally measured DNA copy number profile along the genome, but that the simulated profiles are highly variable. The fact that increasing copy number of an origin will facilitate its preferential amplification essentially constitutes a self-reinforcing feedback loop and might be the mechanism that leads to overamplification of some genomic regions. In addition different regions compete for a limiting factor, and thereby repress each others' over-amplification. While the model generates some interesting hypotheses it is unclear in the current version of the manuscript, to what extent they arise from specific model assumptions. The authors do not clearly formulate the scientific questions asked, they do not discuss the model assumptions and their validity and they do not adequately describe how model results depend on those assumptions. Taken together, the scientific process is insufficiently documented in this manuscript, making it difficult to judge whether the conclusions are actually supported by the data.

      The manuscript has been modified to further clarify the underlying questions and model assumptions. We would like to point out that the model was presented in detail in the supplementary material of the original manuscript, which included all model assumptions. In addition, model parameters used for the base-case model were systematically varied, the outcome was presented in a separate paragraph (“Sensitivity Analysis” in Results), and findings were shown to be robust to model assumptions. These points are presented in detail below.

      1) It is not clear what questions the authors want to address with their model. Do they want to understand how the experimentally observed copy number differences between regions arise? The introduction should elaborate more on the open questions in the field and explain why they should be addressed with a mathematical model.

      With this work our goal is to elucidate the fundamental mechanisms and properties underlying DNA re-replication. Specifically, we aim to investigate how re-replication evolves over time along the genome, and how it may lead to different numbers of copies of different loci at the single-cell level and result in genetic heterogeneity within a population. Given the large number of origins along the genome and the stochasticity of origin firing (Demczuk et al., 2012; Kaykov and Nurse, 2015; Patel et al., 2006), it is unclear how re-replication would evolve along the genome in each individual cell in a re-replicating population and how local properties and genome-wide effects would shape its progression and the resulting increases in the number of copies of specific loci. As no experimental method exists that can analyze DNA re-replication at the single-cell level over time along the genome, we designed a mathematical model that is able to track the firing and refiring of origins and the evolution of the resulting forks along a complete genome over time, and in this way capture the complex stochastic hybrid dynamics of DNA re-replication. Since existing methods to analyze DNA re-replication in vivo only provide static, population-level snapshots (Kiang et al., 2010; Menzel et al., 2020; Mickle et al., 2007), we believe that our in silico model, which is the first modeling framework of DNA re-replication, is an important contribution to the field.

      In the revised version of our manuscript, we have modified the introduction to explain these points in more detail.

      2) One of the main messages of the paper is that the amplification profiles are highly variable across single cells, because that was found in the described simulations. This behavior does however likely depend on specific choices that were made in the simulations, e.g. that the probabilities of the origin state transitions are exponentially distributed. These assumptions should at least be discussed, or better experimentally validated.

      Modeling choices and assumptions are presented in detail in the Supplementary material of the manuscript, and were made to accurately capture the dynamics of origin firing, which is known to be stochastic, as established by many studies in fission yeast (Bechhoefer and Rhind, 2012; Patel et al., 2006; Rhind et al., 2010) and the continuous movement of forks along the DNA. Specifically, the choice of the exponential distribution used for assigning a firing time to each origin has already been discussed and validated in our previous work on normal DNA replication (Lygeros et al., 2008). Indeed, as shown in Figure 2 of (Lygeros et al., 2008), our model was able to accurately reconstruct experimental data derived by single molecule DNA combing experiments (Patel et al., 2006).

      The use of the exponential distribution for transition firing times is standard in stochastic processes in general, including what are known as Piecewise Deterministic Markov Processes (PDMP), the class where the models considered in the paper belong. There are good mathematical reasons for this, for example the "memoryless" property that makes the resulting stochastic process Markov, a basic requirement for the model to be well-posed [M. H. A. Davis, "Markov models and optimization", Monographs on Statistics and Applied Probability, vol. 49, Chapman & Hall, London, 1993]. Practically, assuming an exponential distribution can be quite general, because the rate (the probability with which a transition "fires" per unit time) is allowed to depend on the state of the system, both the discrete state (in our case, the state of individual origins) and the continuous state (in our case, the progress of individual replication forks). It can be shown that one can exploit this dependence to write seemingly more general processes (that at first sight do not have exponential firing times) as PDMP (with exponential firing times) by appropriately defining a state for the system [M. H. A. Davis, "Piecewise-Deterministic Markov Processes: A General Class of Non-Diffusion Stochastic Models", Journal of the Royal Statistical Society. Series B (Methodological), Vol. 46, No. 3 (1984), pp. 353-388]. In the manuscript this feature is exploited in what we call the LF model, where the rate of the exponential firing time of each origin (probability of firing per unit time) depends on the state of the system (specifically, the number of PreR origins), as discussed in the section on Sensitivity Analysis. We have further clarified these in the revised manuscript.
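
      The role of exponential firing times with a state-dependent rate can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the units (events per minute) and, in particular, the form in which the rate shrinks with the number of competing origins are assumptions made purely for illustration.

```python
import random


def sample_firing_time(rate, rng):
    """Draw an exponentially distributed waiting time for a given rate (events per minute)."""
    return rng.expovariate(rate)


def next_firing(origin_rates, rng):
    """Return (origin_index, waiting_time) for the first origin to fire.

    Every origin with a positive rate gets its own exponential clock;
    the origin whose clock rings first is the one that fires.
    """
    draws = [(sample_firing_time(rate, rng), index)
             for index, rate in enumerate(origin_rates) if rate > 0]
    waiting_time, index = min(draws)
    return index, waiting_time


def limiting_factor_rates(base_rates, n_competing):
    """Hypothetical state-dependent rates: each base rate is divided by the
    number of competing origins (an assumed form, not taken from the manuscript)."""
    return [rate / max(1, n_competing) for rate in base_rates]
```

      In such a scheme the rates would be recomputed from the current state after every firing event, which is how a state-dependent rate can coexist with exponential (memoryless) waiting times.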

      3) The authors aim at testing their prediction that rereplication is highly variable across cells. To this end they use the LacO/LacI system to estimate locus copy number. The locus intensity is indeed highly variable across cells. However, the DAPI quantification suggests that only a subset of cells actually undergo rereplication under the experimental conditions used (Fig. 4C). Therefore the analysis should at least be limited to those cells. It would be even better if a second locus could be labelled in another color to show that rereplication of two loci is anti-correlated as predicted by the model.

      Under the experimental conditions employed (ectopic expression of a mutant version of the licensing factor Cdc18, stably integrated in the genome under a regulatable promoter), the vast majority of cells undergo rereplication but to relatively low levels, resulting in cells with a DNA content of 2C-8C. Though the DNA content of several cells indeed appears similar to the DNA content of normal G2 phase cells, the vast majority (>90%) of cells undergo rereplication, as manifested by the appearance of DNA damage and, eventually, loss of viability. We have chosen this experimental set-up (medium levels of rereplication) as it allows induction of rereplication in practically all cells in the population, without the abnormal nuclear and cellular morphology which accompanies a pronounced increase in DNA content (i.e. 16C) and which would make single-cell imaging more prone to artifacts. Fission yeast cells can be induced to undergo rereplication to various extents, by regulated expression of different versions of Cdc18 to different levels and/or co-expression of Cdt1. We have now explained this more extensively in the revised manuscript and thank the reviewer for identifying a point which may not have been clear in the first version of the manuscript.

      Concerning the possibility of studying two loci at the same time, we have indeed tried to tag a second region with TetR/TetO; however, the signal-to-noise ratio, and thus reproducible detection of the TetR focus, was suboptimal under rereplication conditions. We therefore did not proceed further with this approach.

      4) What does "signal ratio" in Fig. 2 mean? And why are the peaks much higher in the simulations? Would the signal ratio between simulation and experiment correspond better, if an earlier time point in the simulation was selected?

      The definition of signal ratios is given in Results: DNA re-replication at the population level: “Specifically, we computed in silico mean amplification profiles across the genome, referred to as signal ratios in (Kiang et al., 2010), by averaging the number of copies for each origin location and normalizing it to the genome mean in 100 simulations. In these profiles, peaks above 1 correspond to highly re-replicated regions, and valleys below 1 correspond to regions that are under-replicated with respect to the mean.”
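
      As an illustration of this definition, the computation could be sketched as follows. The function name and the input layout (one list of per-origin copy numbers per simulation) are assumptions, as is the order of operations (average across simulations first, then divide by the genome-wide mean); the manuscript's own code may differ.

```python
def signal_ratios(copy_numbers):
    """Compute a mean amplification profile normalized to the genome mean.

    copy_numbers: list of simulations, each a list with the number of
    copies at every origin location (same length in every simulation).
    Returns one ratio per origin location; values above 1 mark regions
    amplified more than the genome average, values below 1 mark regions
    amplified less.
    """
    n_sims = len(copy_numbers)
    n_loci = len(copy_numbers[0])
    # Average the copy number of each location across all simulations.
    mean_per_locus = [sum(sim[j] for sim in copy_numbers) / n_sims
                      for j in range(n_loci)]
    # Normalize to the mean over all locations (the genome mean).
    genome_mean = sum(mean_per_locus) / n_loci
    return [value / genome_mean for value in mean_per_locus]
```

      By construction the ratios average to 1 across the genome, so peaks and valleys are read relative to that baseline.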

      Indeed, as observed by the reviewer, simulated peaks appear overall sharper and higher than experimental peaks. This is expected, since simulated data show the actual number of copies generated, while experimental data are subject to background noise and represent averages of 3 probes and 2 independent experiments. We have clarified this in the Results.

      Last, we chose to compare in silico and experimental profiles at a similar ploidy. Plotting in silico profiles of an earlier timepoint would indeed lead to visually more similar patterns in terms of peak intensity, but we believe this could be misleading for the readers.

      5) From line 248 onwards, the authors compare different assumptions for polymerase speed and conclude that "0.5 kb/min is closer to experimental observations". It is unclear, however, which experimental observations they refer to and what was observed there. The same question arises when they compare the LF and UF models (line 275-277).

      We have now clarified this point. Experimental observations show that under high levels of rereplication, DNA content reaches 16C four to six hours following accumulation of Cdc18 (Nishitani et al., 2000). Estimates for 0.5 kb/min and the LF model are therefore closer to experimental observations.

      6) I find the description of cis- and trans-effects rather confusing. The authors should rather explain what happens in the model. Neighboring strong origins can amplify a weak origin and origins compete for factors. In line 475-476 for example, it should be clarified that the assumption of the LF model could lead to trans-effects, instead of presenting this as a general model prediction.

      In the manuscript, we initially present what we observe in the Results section and then proceed to provide possible explanations in Discussion. We quote from the Discussion: “Such in trans negative regulation of distant origins could be explained by competition for the same limiting factor: high-level amplification of a given locus recruits high levels of the limiting factor, indirectly inhibiting firing of other genomic regions.” and “[…] in cis elements contribute to amplified copy numbers not only directly by passive re-replication, but also implicitly through increasing the firing activity of their neighbors”. To our understanding, these sentences are in complete agreement with the reviewer’s suggestions. Nonetheless, and to make this even more clear, we have modified the Discussion in our revised manuscript.

      7) Throughout the manuscript, a clear distinction should be made between the firing activity of one origin molecule and the cumulative activity of multiple copies of an origin. For example, it should be clarified in line 435 that the cumulative activity of weak origins might increase if they are close to a strong origin, because they get amplified, instead of just writing "increased firing activity of weak origins".

      We have clarified this point in the revised manuscript.

      8) One of the major conclusions of the manuscript is that rereplication is robust on the population level. It is not clear to me what the authors mean by that. The average amplification levels are probably determined by the origin efficiencies that are put into the model. What would robustness mean in this context?

      As the reviewer points out, origin efficiencies are among the important input parameters of the model. Since the model is stochastic, however, origin efficiencies do not directly determine the amplification levels at the single-cell level. For example, in Figure 3A and Supplementary Figure S4, we show the outcome of 4 random simulations with identical underlying parameters, where it is clear that re-replication can lead to markedly different single-cell amplification levels. Indeed, genome-wide analysis across 100 simulations (Supplementary Figure S5) indicated that at the onset of re-replication, amplification levels are highly unpredictable (again, despite the fact that the input parameters are identical).

      On the contrary, when analyzing amplification profiles at a population level (averaging across sets of 100 simulations), the most highly amplified regions appear to be highly reproducible. We agree with the reviewer that these population level profiles are strongly affected by the origin efficiencies, but they are not determined solely by them. For example, low efficiency origins can be highly amplified, or highly efficient origins can be suppressed (see discussion on in cis and in trans effects) depending on their neighborhood and system-wide effects, and the extent of these effects depends on the fork speed. Sensitivity analysis with respect to different model assumptions, or model parameters (see Results, section Sensitivity Analysis and Supplementary Figure S3) indicated that amplification profiles might appear sharper or flatter, but overall amplification hotspots were highly robust.

      To summarize, in our conclusions (Discussion, section Emerging properties of re-replication) we highlight these properties (stochasticity vs. robustness) and elaborate further on how they emerge during the course of re-replication (onset vs. high re-replication) or depending on the level of analysis (single-cell vs. population level).

      9) It would be helpful if, in Fig. 2 also the origins and their respective efficiencies could be shown to understand to what extent the signal ratio reflects these efficiencies.

      We thank the reviewer for the useful suggestion, which we have incorporated in the revised manuscript.

      10) The methods section should provide more detail.

      We would like to point out that the Supplementary Material, including a full mathematical description of the model, is available on bioRxiv (https://www.biorxiv.org/content/10.1101/2020.03.30.016576v1.supplementary-material), where it was also available at the time of the preprint review, and has also been uploaded as a separate document on our GitHub page: https://github.com/rapsoman/DNA_Rereplication

      Reviewer #2:

      Here, Rapsomaniki et al have modeled the process of DNA re-replication. The in silico analysis is an extension of their previous work describing normal DNA replication (Lygeros et al 2008). The authors show that there is a large amount of heterogeneity at the single cell level but when these heterogeneous signals are averaged across a population, the signal is robust. The authors support this with simulations and with experimental data, both at the single cell level and at the population level.

      1) It is a bit concerning that simulations were carried out to a ploidy level of 16C. Has it been observed that the DNA content in any given cell can rise to 16 times the initial amount? Figure 3 (simulations) shows that certain chromosomal regions can reach 30x and 160x copies for 2C and 16C. However, Figure 4 (experiment) suggests that copy numbers should only be slightly more in re-replicating conditions, compared to normal replicating conditions. Additionally, in Figure 2, the simulated data seems to be consistently noisier than the experimental data. Taken together, this may suggest that the assumptions in the model do not adequately recapitulate the biological system.

      Fission yeast cells undergo robust rereplication, and reach a ploidy up to 32C - see for example (Kiang et al., 2010; Mickle et al., 2007; Nishitani et al., 2000). 16C is therefore a usual ploidy for rereplicating fission yeast cells, observed under many experimental conditions. In addition, by manipulating the licensing factors over-expressed, different levels of ploidy can be experimentally achieved, ranging from 2C (the normal ploidy of a G2 cell, but with uneven replication) to 32C. In Figure 4, we have employed a truncated form of Cdc18 (d55P6-cdc18 (Baum et al., 1998)), which induces medium-level re-replication, as confirmed by FACS analysis in Supplementary Figure S6A. Under these conditions, the vast majority of the cells (>90%) undergo re-replication, albeit at medium to low levels. We have opted to use this strain to avoid artifacts due to disrupted nuclear morphology under high levels of re-replication. We have now clarified this point in the revised manuscript. We would like to point out that in silico analysis is not carried out at 16C only but across different ploidies – it is actually a strength of our approach that we can follow the rereplication process as it evolves, at any ploidy, and we have shown that our conclusions are robust throughout. We show plots at the beginning of the process (2C) and towards the end (16C), at the single-cell and at the population level, to facilitate comparison.

      Last, as also discussed in our response to reviewer 1, simulated data appear sharper, with higher peak values than experimental data (Figure 2). This is expected, since simulated data show the actual number of copies generated, while experimental data are subject to background noise and represent averages of 3 neighboring microarray probes and 2 independent experiments. We have clarified this in the revised manuscript.

      2) This work currently is agnostic to the genes and sequences within the simulated genomes. The authors suggest that DNA re-replication can result in gene duplications. It might strengthen the manuscript if the authors are able to show that re-replication hotspots coincide with gene duplication events in S pombe. It should be relatively straightforward to overlap the hotspots found in this analysis with known gene duplication events in the literature.

      We agree with the reviewer that comparing our predictions with known gene duplication events in S. pombe would be of interest. Unfortunately, to our knowledge, no such dataset for fission yeast exists in the literature. The most comprehensive datasets are the ones from (Kiang et al., 2010; Mickle et al., 2007), which analyse rereplicating cells, and which we have already exploited in our paper. We would like to point out that this manuscript aims to show how rereplication evolves genome-wide. Whether the additional copies generated can lead to gene duplication events is beyond the scope of the present manuscript.

      3) The authors have nicely demonstrated that cis activation can be driven by the physical proximity of origins. The authors go on to describe trans suppression in which the activation of one origin suppresses the activation of a different origin. I would argue that this observation is simply the result of randomness in the model and stopping the simulations at fixed points.

      One of the two origins will randomly re-replicate first and simply outpace the other. Stopping the simulations at 16C will simply prevent the lagging origin from catching up the first origin. There does not seem to be an inhibitory mechanism that acts between two origins.

      This can be explained by the following equation: X + Y = constant, where X is the amount of origin 1 and Y is the amount of origin 2.

      It is also possible that the two origins could start re-replicating at the same time. This would result in the data points observed for cluster 2 (Figure 6 BC).

      We thank the reviewer for the positive comments. Indeed, as we elaborate in our Discussion, we believe that the mechanism behind the observed in trans effects is the competition for a factor that exists in a rate-limiting quantity (see also reply to point 6, reviewer 1 above), which is essentially the constant in his/her equation. Though less pronounced, such in-trans effects are also possible in the UF model, and could be due to the total DNA increase being dominated by certain origins, as suggested by the reviewer. We do not suggest anywhere in the manuscript that this inhibition is direct, but rather clearly state that it is an indirect effect.

      Reviewer #3:

      This manuscript by Rapsomaniki et al uses mathematical modeling to study the properties of DNA re-replication. They develop a model that shows some consistency with experimental data from S. pombe, and use it to conclude that re-replication is heterogeneous at the single-cell level.

      The simulations have only moderate correlations with experimental data (0.5-0.6). Indeed, simulations and actual data (Figure 2) appear quite different. Despite the statistical significance of the overlap, the limited correspondence brings into question the usefulness of the model compared to directly generating new experimental data.

      We would like to point out that the overlap between experimental and simulated data is highly significant. Firstly, the Spearman correlation coefficient between simulated and experimental genome-wide profiles is highly statistically significant (p values ranging from 7.3×10⁻¹² to 3.6×10⁻⁴¹ for the three fission yeast chromosomes). Furthermore, 100,000 repetitions of random peak assignment resulted in only one case where 10 out of 22 peaks overlapped (median 2 out of 22 peaks overlapping), while comparing simulated and experimental data resulted in 14 out of 22 peaks overlapping. Simulations appear sharper than experimental data; this is however expected, as simulated data correspond to the actual number of copies generated, while experimental data are subject to background noise, have a signal-to-noise ratio that is limited by the experimental method employed and represent averages of 3 probes and 2 independent experiments (see Kiang et al., 2010 and also above). We have modified the manuscript to clarify this point. The reviewer suggests that the model is of limited use, because one could trivially generate new experimental data. We would like to point out that existing methods to analyze DNA re-replication in vivo only provide static, population-level snapshots (Kiang et al., 2010; Menzel et al., 2020; Mickle et al., 2007). To date no experimental method can generate single-cell, whole-genome, time-course measurements in re-replicating cells. Our model aims to fill this gap, and for this reason we believe in its usefulness.
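
      The logic of a random peak assignment test can be sketched as below. This is only an illustration under stated assumptions: the response does not specify how random peaks were placed or how overlap was scored, so the uniform placement along a genome of given length, the distance tolerance, and all function and parameter names here are hypothetical.

```python
import random


def count_overlaps(peaks_a, peaks_b, tolerance):
    """Count the peaks in peaks_a lying within `tolerance` of any peak in peaks_b."""
    return sum(any(abs(a - b) <= tolerance for b in peaks_b) for a in peaks_a)


def empirical_p_value(observed_overlap, reference_peaks, n_peaks,
                      genome_length, tolerance, n_repeats, rng):
    """Estimate how often randomly placed peaks overlap the reference peaks
    at least as well as the observed overlap.

    Random peaks are drawn uniformly along the genome (an assumption).
    One is added to numerator and denominator so the estimate is never zero.
    """
    hits = 0
    for _ in range(n_repeats):
        random_peaks = [rng.uniform(0, genome_length) for _ in range(n_peaks)]
        if count_overlaps(random_peaks, reference_peaks, tolerance) >= observed_overlap:
            hits += 1
    return (hits + 1) / (n_repeats + 1)
```

      In a test of this kind, the observed overlap (here, 14 of 22 peaks) is compared with the distribution of overlaps obtained from the random repetitions.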

      Heterogeneity among single cells, which appears to be one of the main messages of this paper, is not necessarily a surprising finding, and may even arise from the nature of the simulation being stochastic and defined at the level of single origins. They validate this prediction experimentally at a single locus, providing little novel insight.

      We would like to point out that it is the nature of replication in fission yeast which is stochastic, as experimentally shown (Patel et al., 2006), and defined at the level of single origins, and this is captured by the simulations. Heterogeneity amongst single rereplicating cells has not been previously shown or suggested in any organism, at least to the best of our knowledge. It is in our opinion a highly interesting observation, as it provides a powerful mechanism for generating a plethora of different genotypes within a population, from which phenotypic traits could be selected.

      Overall, the insights here are limited and would need to await experimental validation and further empirical data. Given that experimental measurements of re-replication are now feasible genome-wide, the value of these simulations is limited.

      Again, the reviewer seems unaware that no experimental method currently exists for analysing the dynamics of re-replication at a single-cell level genome-wide. We also feel obliged to point out that modeling and in silico analysis is in our opinion of great value for analysing complex biological processes, even when experimental methods are available. Though we are sure this is not what the reviewer really meant, his/her comment appears derogative to a complete field.

      Fork speed is assumed based on limited data and assumptions regarding re-replication fork speed without empirical data.

      As clearly stated in our manuscript (Results, section Modeling DNA re-replication across a complete genome), many studies have estimated fork speed in yeasts in normal DNA replication, with plausible values ranging from 0.5 kb/min to 3 kb/min (Duzdevich et al., 2015; Heichinger et al., 2006; Raghuraman et al., 2001; Sekedat et al., 2010; Yabuki et al., 2002). In our model, we set the base-case value as the lowest estimate (0.5 kb/min), but also explored the model’s sensitivity to this parameter by simulating the model for higher values (1 and 3 kb/min). This analysis indicated that estimates for 0.5 kb/min were closer to biological reality, an unsurprising finding given that fork speed is expected to be slower in re-replication than in normal replication.

      Overall, the comments of reviewer 3 appear in our eyes more derogative than constructive and provide little specific criticism.

      References

      Baum, B., Nishitani, H., Yanow, S., and Nurse, P. (1998). Cdc18 transcription and proteolysis couple S phase to passage through mitosis. The EMBO Journal 17, 5689–5698.

      Bechhoefer, J., and Rhind, N. (2012). Replication timing and its emergence from stochastic processes. Trends in Genetics 28, 374–381.

      Duzdevich, D., Warner, M.D., Ticau, S., Ivica, N.A., Bell, S.P., and Greene, E.C. (2015). The dynamics of eukaryotic replication initiation: origin specificity, licensing, and firing at the single-molecule level. Mol. Cell 58, 483–494.

      Heichinger, C., Penkett, C.J., Bähler, J., and Nurse, P. (2006). Genome-wide characterization of fission yeast DNA replication origins. The EMBO Journal 25, 5171–5179.

      Kiang, L., Heichinger, C., Watt, S., Bähler, J., and Nurse, P. (2010). Specific replication origins promote DNA amplification in fission yeast. Journal of Cell Science 123, 3047–3051.

      Lygeros, J., Koutroumpas, K., Dimopoulos, S., Legouras, I., Kouretas, P., Heichinger, C., Nurse, P., and Lygerou, Z. (2008). Stochastic hybrid modeling of DNA replication across a complete genome. Proceedings of the National Academy of Sciences 105, 12295–12300.

      Menzel, J., Tatman, P., and Black, J.C. (2020). Isolation and analysis of rereplicated DNA by Rerep-Seq. Nucleic Acids Res 48, e58–e58.

      Mickle, K.L., Oliva, A., Huberman, J.A., and Leatherwood, J. (2007). Checkpoint effects and telomere amplification during DNA re-replication in fission yeast. BMC Molecular Biology 8, 119.

      Nishitani, H., Lygerou, Z., Nishimoto, T., and Nurse, P. (2000). The Cdt1 protein is required to license DNA for replication in fission yeast. Nature 404, 625–628.

      Patel, P.K., Arcangioli, B., Baker, S.P., Bensimon, A., and Rhind, N. (2006). DNA Replication Origins Fire Stochastically in Fission Yeast. Mol. Biol. Cell 17, 308–316.

      Raghuraman, M.K., Winzeler, E.A., Collingwood, D., Hunt, S., Wodicka, L., Conway, A., Lockhart, D.J., Davis, R.W., Brewer, B.J., and Fangman, W.L. (2001). Replication Dynamics of the Yeast Genome. Science 294, 115–121.

      Rhind, N., Yang, S.C.-H., and Bechhoefer, J. (2010). Reconciling stochastic origin firing with defined replication timing. Chromosome Res 18, 35–43.

      Sekedat, M.D., Fenyö, D., Rogers, R.S., Tackett, A.J., Aitchison, J.D., and Chait, B.T. (2010). GINS motion reveals replication fork progression is remarkably uniform throughout the yeast genome. Molecular Systems Biology 6, 353.

      Yabuki, N., Terashima, H., and Kitada, K. (2002). Mapping of early firing origins on a replication profile of budding yeast. Genes to Cells 7, 781–789.

    2. Reviewer #3:

      This manuscript by Rapsomaniki et al uses mathematical modeling to study the properties of DNA re-replication. They develop a model that shows some consistency with experimental data from S. pombe, and use it to conclude that re-replication is heterogeneous at the single-cell level.

      The simulations have only moderate correlations with experimental data (0.5-0.6). Indeed, simulations and actual data (Figure 2) appear quite different. Despite the statistical significance of the overlap, the limited correspondence brings into question the usefulness of the model compared to directly generating new experimental data.

      Heterogeneity among single cells, which appears to be one of the main messages of this paper, is not necessarily a surprising finding, and may even arise from the nature of the simulation being stochastic and defined at the level of single origins. They validate this prediction experimentally at a single locus, providing little novel insight.

      Overall, the insights here are limited and would need to await experimental validation and further empirical data. Given that experimental measurements of re-replication are now feasible genome-wide, the value of these simulations is limited.

      Fork speed is assumed based on limited data, and the assumptions regarding re-replication fork speed are made without empirical support.

    3. Reviewer #2:

      Here, Rapsomaniki et al have modeled the process of DNA re-replication. The in silico analysis is an extension of their previous work describing normal DNA replication (Lygeros et al 2008). The authors show that there is a large amount of heterogeneity at the single cell level but when these heterogeneous signals are averaged across a population, the signal is robust. The authors support this with simulations and with experimental data, both at the single cell level and at the population level.

      1) It is a bit concerning that simulations were carried out to a ploidy level of 16C. Has it been observed that the DNA content in any given cell can rise to 16 times the initial amount? Figure 3 (simulations) shows that certain chromosomal regions can reach 30x and 160x copies for 2C and 16C. However, Figure 4 (experiment) suggests that copy numbers should only be slightly more in re-replicating conditions, compared to normal replicating conditions. Additionally, in Figure 2, the simulated data seems to be consistently noisier than the experimental data. Taken together, this may suggest that the assumptions in the model do not adequately recapitulate the biological system.

      2) This work currently is agnostic to the genes and sequences within the simulated genomes. The authors suggest that DNA re-replication can result in gene duplications. It might strengthen the manuscript if the authors are able to show that re-replication hotspots coincide with gene duplication events in S. pombe. It should be relatively straightforward to overlap the hotspots found in this analysis with known gene duplication events in the literature.

      3) The authors have nicely demonstrated that cis activation can be driven by the physical proximity of origins. The authors go on to describe trans suppression in which the activation of one origin suppresses the activation of a different origin. I would argue that this observation is simply the result of randomness in the model and stopping the simulations at fixed points.

      One of the two origins will randomly re-replicate first and simply outpace the other. Stopping the simulations at 16C will simply prevent the lagging origin from catching up with the first origin. There does not seem to be an inhibitory mechanism that acts between two origins.

      This can be explained by the following equation: X + Y = constant, where X is the amount of origin 1 and Y is the amount of origin 2.

      It is also possible that the two origins could start re-replicating at the same time. This would result in the data points observed for cluster 2 (Figure 6 BC).
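      The argument in point 3 can be illustrated with a minimal sketch. It is not the authors' model: it assumes a toy scheme in which two origins start with one copy each, every existing copy is equally likely to fire next, and the run stops at a fixed total copy number (the function name, the stopping value of 16 and the number of runs are all illustrative assumptions). Because X + Y is fixed at the stopping point, the two copy numbers are perfectly anti-correlated even though no inhibitory interaction is encoded.

```python
import random


def simulate_two_origins(total_copies=16, seed=None):
    """Toy run: each existing copy is equally likely to fire next.

    Stops as soon as the combined copy number reaches `total_copies`.
    No inhibition between the two origins is modelled.
    """
    rng = random.Random(seed)
    x, y = 1, 1  # one copy of each origin at the start
    while x + y < total_copies:
        # The origin with more copies is proportionally more likely to fire.
        if rng.random() < x / (x + y):
            x += 1
        else:
            y += 1
    return x, y


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / (vx * vy) ** 0.5


# 1000 independent "cells", all stopped at the same total copy number.
runs = [simulate_two_origins(16, seed=i) for i in range(1000)]
xs, ys = zip(*runs)
r = pearson(xs, ys)
print(f"correlation between the two origins: {r:.3f}")
```

      In this sketch the anti-correlation follows from the stopping rule alone, which is the alternative explanation raised above. Telling it apart from genuine trans suppression would require comparing runs stopped at a fixed time rather than at a fixed total.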

    4. Reviewer #1:

      The authors develop and analyse a mathematical model of DNA rereplication in situations where re-firing of origins during replication is not suppressed. Using the experimentally measured position and relative strength of origins in yeast, the authors simulate DNA copy number profiles in individual cells. They show that the developed model can mostly recapitulate the experimentally measured DNA copy number profile along the genome, but that the simulated profiles are highly variable. The fact that increasing copy number of an origin will facilitate its preferential amplification essentially constitutes a self-reinforcing feedback loop and might be the mechanism that leads to overamplification of some genomic regions. In addition, different regions compete for a limiting factor, and thereby repress each other's over-amplification. While the model generates some interesting hypotheses, it is unclear in the current version of the manuscript to what extent they arise from specific model assumptions. The authors do not clearly formulate the scientific questions asked, they do not discuss the model assumptions and their validity, and they do not adequately describe how model results depend on those assumptions. Taken together, the scientific process is insufficiently documented in this manuscript, making it difficult to judge whether the conclusions are actually supported by the data.

      1) It is not clear what questions the authors want to address with their model. Do they want to understand how the experimentally observed copy number differences between regions arise? The introduction should elaborate more on the open questions in the field and explain why they should be addressed with a mathematical model.

      2) One of the main messages of the paper is that the amplification profiles are highly variable across single cells, because that was found in the described simulations. This behavior does however likely depend on specific choices that were made in the simulations, e.g. that the probabilities of the origin state transitions are exponentially distributed. These assumptions should at least be discussed, or better experimentally validated.

      3) The authors aim at testing their prediction that rereplication is highly variable across cells. To this end they use the LacO/LacI system to estimate locus copy number. The locus intensity is indeed highly variable across cells. However, the DAPI quantification suggests that only a subset of cells actually undergo rereplication under the experimental conditions used (Fig. 4C). Therefore the analysis should at least be limited to those cells. It would be even better if a second locus could be labelled in another color to show that rereplication of two loci is anti-correlated as predicted by the model.

      4) What does "signal ratio" in Fig. 2 mean? And why are the peaks much higher in the simulations? Would the signal ratio between simulation and experiment correspond better, if an earlier time point in the simulation was selected?

      5) From line 248 onwards, the authors compare different assumptions for polymerase speed and conclude that "0.5 kb/min is closer to experimental observations". It is unclear, however, which experimental observations they refer to and what was observed there. The same question arises when they compare the LF and UF models (line 275-277).

      6) I find the description of cis- and trans-effects rather confusing. The authors should rather explain what happens in the model. Neighboring strong origins can amplify a weak origin and origins compete for factors. In line 475-476 for example, it should be clarified that the assumption of the LF model could lead to trans-effects, instead of presenting this as a general model prediction.

      7) Throughout the manuscript, a clear distinction should be made between the firing activity of one origin molecule and the cumulative activity of multiple copies of an origin. For example, it should be clarified in line 435 that the cumulative activity of weak origins might increase if they are close to a strong origin, because they get amplified, instead of just writing "increased firing activity of weak origins".

      8) One of the major conclusions of the manuscript is that rereplication is robust on the population level. It is not clear to me what the authors mean by that. The average amplification levels are probably determined by the origin efficiencies that are put into the model. What would robustness mean in this context?

      9) It would be helpful if the origins and their respective efficiencies could also be shown in Fig. 2, to understand to what extent the signal ratio reflects these efficiencies.

      10) The methods section should provide more detail.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Tim Formosa (University of Utah School of Medicine) served as the Reviewing Editor.

      Summary:

      A strength of the work was that the mathematical modeling of re-replication captured variability in origin firing and supported a mechanism that might explain copy number variation observed in many eukaryotes. However, concern was expressed regarding the influence of assumptions made in developing the model on the outcomes and the moderate correlations between simulations and experimental data. Further explanation of the questions being investigated, the validity and nature of assumptions that were used to develop the simulations, and details explaining how these assumptions were built into the modeling were considered important. Some attempt to align the modeling outcomes with known re-replication hotspots would also improve the study. Some of the parameters used for modeling were concerning, including the use of a 16C ploidy cutoff without adequate justification. Reviewers also made suggestions for improving the experimental validation tests. Reviewers also noted places in the manuscript that require additional clarification. Overall, some concerns were raised regarding the experimental methods, and the impact of the insights gained.

  2. Aug 2020
    1. Reviewer #3:

      This paper by Thacker et al describes the use of lung-on-a-chip microfluidic devices to study early interactions during acute M. tuberculosis infection under conditions chosen to mimic the alveolar environment in vivo. The authors use time-lapse microscopy to study host-Mtb interactions in macrophages and alveolar epithelial cells, the role of the Mtb Type VII secretion system and the impact of surfactant on Mtb infection. This study suggests that organ-on-a-chip systems might be able to reproduce host-microbe physiology during infection, which is difficult to reproduce ex vivo using single cells, air-liquid interface, organoids or organ explants. This is an exciting approach which has the potential to expand the ability to study host-pathogen interactions, but there are some limitations that dampen my enthusiasm.

      Major concerns:

      While I recognize that it is challenging to use live cell imaging with colocalization markers, much of the data of the paper, such as comparisons between AECs and macrophages, or mutant Mtb strain vs WT, or role of surfactant, rests on the ability to determine the precise localization of bacteria. However, neither AECs nor macrophages are specifically identified with high enough resolution to give confidence that the Mtb are associated with those cells specifically, and more importantly, that the bacteria are growing intracellularly rather than extracellularly. The authors show multiple bacterial microcolonies that grow in size over time, but whether these are inside or outside cells, and whether the cells are AECs or macrophages isn't overtly specified. Many of the images are of such low resolution that only tiny dots of bacteria are observed. To the authors' credit, the quantitative and statistical analysis is very rigorous; however, better evidence for the issues raised above would increase confidence in the results. This point is highlighted in detail by the following:

      Lines 60-63: "Inoculation of the LoC with between 200 and 800 Mtb bacilli led to infection of both macrophages (white boxes in Fig. 1M, P, zooms in Fig. 1O, R) and AECs (yellow boxes in Fig. 1M, P, zooms in Fig. 1N, Q) under both NS (Fig.1M-O) and DS (Fig. 1P-R) conditions." Identification of GFP-expressing macrophages can be assumed based on their expression of GFP (though the cells themselves aren't colocalized) on images but the same cannot be said of AECs. The yellow boxes could represent AECs or spaces on the chip with no cells at all. Furthermore, the 2D images shown in Figure 1 do not necessarily represent infected cells, and the possibility of visualization of Mtb outside the cells should be considered. Thus, higher resolution images, with clear colocalization and z-stacks, would increase the confidence in the results.

      The data arguing for attenuation of Esx-1 mutant Mtb in AECs and macrophages is not strong, and the authors do not actually make a direct statistical comparison between appropriate groups (i.e. AEC NS WT vs Esx-1, or Mac NS WT vs Esx-1). For example, it appears that the mean/median growth rate of WT Mtb in macs is ~0.25 hr⁻¹, which appears roughly the same for Esx-1 mutant Mtb in the same cells. There may be a difference under DS conditions, but since the comparisons aren't made directly it is impossible to know.
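      One way such a direct comparison could be made is sketched below. It is only an illustration under stated assumptions: the two lists are hypothetical placeholder growth rates, not values from the manuscript, and a two-sided permutation test on the difference in means is one reasonable choice among several (the function name and parameters are likewise illustrative).

```python
import random


def permutation_test(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value with the usual +1 correction so that
    the estimate is never exactly zero.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign group labels at random
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical placeholder growth rates for one matched condition
# (e.g. the same cell type and surfactant state); NOT data from the paper.
wt_rates = [0.26, 0.24, 0.27, 0.25, 0.23, 0.28, 0.25, 0.26]
esx1_rates = [0.25, 0.23, 0.26, 0.24, 0.25, 0.27, 0.24, 0.25]

p_value = permutation_test(wt_rates, esx1_rates)
print(f"estimated p-value, WT vs Esx-1: {p_value:.3f}")
```

      The same function would be applied separately to each matched pair of groups, with a correction for multiple comparisons if several pairs are tested.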

    2. Reviewer #2:

      The manuscript by Thacker et al, entitled "A lung-on-chip model reveals an essential role for alveolar epithelial cells in controlling bacterial growth during early tuberculosis" is an interesting study describing a new in vitro model to determine the early events of Mycobacterium tuberculosis infection. This model is important and novel; however, this study is descriptive and some of the findings (e.g., attenuated growth of M. tuberculosis after exposure to surfactant in macrophages and alveolar epithelial cells, as well as changes on the M. tuberculosis cell wall after exposure to surfactant, or that exposure to surfactant does not alter the extracellular viability of M. tuberculosis) have been reported by others using other in vitro models. The use of the ESX-1 attenuated mutant is not clear in this study, as well as the concept that exposure to surfactant may change the attenuation of this strain. The composition of mouse surfactant and human surfactant is also quite different, thus extrapolation of results needs to be done with caution.

      Major concerns:

      1) Results provided in Figures 1, 2 and Fig. 3 supplement 1 are confusing, and readers need to guess what they are looking at, especially in Figure 1 M-R. As this is an important model, it would be appropriate to have detailed and better images showing well-defined cells, and to quantify the findings in Tables (e.g. number of alveolar epithelial cells type I and II, number of macrophages, numbers of endothelial cells, bacteria per cell, etc.). In Fig. 3 supplement 1 one needs to guess what is intracellular or extracellular within the studied system.

      2) The definition of Normal surfactant (NS) vs. Deficient surfactant (DS) is confusing as used. Alveolar epithelial cells type II (AT-IIs) become type I (AT-I) over time in in vitro cultures (in 5 to 7 days) and thus, these stop secreting surfactant. Authors found that after 6-11 passages AT-IIs stopped producing surfactant but also lost their cellular characteristics as well as the expected characteristics of AT-Is. This needs to be further studied in detail to ensure that this cell is not an artifact produced by multi-passaging in vitro. Authors need to use several AT-II and AT-I markers to be certain that the DS cell monolayers indeed still are ATs. Surfactant protein C, although used as a marker for AT-IIs, is a soluble protein that has been shown to interact with many cells within a cellular system. A correlation between SFTPC and AQP5 expression over time is also necessary, as it would indicate the differentiation of AT-IIs to AT-Is, a key feature of the role of AT-IIs as progenitors of AT-Is.

      3) Authors did not consider that M. tuberculosis can form micro-colonies on the cell surface of alveolar epithelial cells and thus, the intracellular growth that they are reporting could be extracellular growth. Did the authors treat the system with an antibiotic after infection to kill extracellular M. tuberculosis bacilli attached to the alveolar epithelial cell surface? In addition, the concept of M. tuberculosis micro-colonies growing inside cells needs to be better explained. Are these bacterial clumps? How do the authors distinguish the ones that are not growing from the ones that are dead?

      4) If I understand the described method well, the staining of Curosurf (poractant alfa) is not as such. Authors used a commercial labeled phosphatidylcholine (PC) added into the Curosurf. This labeled PC may or may not interact with Curosurf components, but what is obvious is that it makes micelles. What is quantified is the interaction of the labeled PC with M. tuberculosis. Moreover, the artificial addition of this phospholipid (at 10%) is changing the original composition of Curosurf, and this may have physiological implications. Authors need to confirm if the PC added was indeed DPPC. Authors also need to come up with a better way to demonstrate that Curosurf components are opsonizing M. tuberculosis bacilli. In addition, why did the authors use 1% Curosurf for their experiments? Is there a dose titration effect? Why did the authors not use Survanta or Infasurf or mouse surfactant?

      5) The in vivo simulation of infection uses growth rates randomly chosen from the kernel density estimations for the respective populations. In this graph, it is very important to discern the bacteria with high growth rates from the bacteria with low and intermediate growth rates (at the 99th, 75th, 50th, 25th and 1st percentiles) and assess how these are projected to behave in vivo. As presented, it is not very informative about the impact of NS ATs vs. DS ATs on M. tuberculosis infectivity in this model system.

      6) Similar alterations on the M. tuberculosis cell wall and release of cell wall components to the milieu when exposed to physiological concentrations of human lung surfactant have been already described. The same is applicable to the slower replication rate in ATs (and intracellular killing in macrophages) after M. tuberculosis exposure to human lung surfactant. Although these are two different systems, authors need to contrast their findings with these reported ones in their discussion. In addition, it is not clear how many times this was performed. Statistics are mentioned in the figure legends, but there are no stats in the figure.
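      The percentile breakdown requested in point 5 could look roughly like the sketch below. It is an assumption-laden illustration rather than the authors' simulation: the growth rates are hypothetical placeholders, the kernel density estimate is a plain Gaussian KDE sampled by hand with an arbitrary bandwidth, and the projection assumes simple exponential growth over an arbitrary 72-hour window.

```python
import math
import random


def sample_kde(data, bandwidth, n, rng):
    """Draw n samples from a Gaussian KDE of `data`.

    Sampling a Gaussian KDE is equivalent to picking a data point at
    random and adding Gaussian noise with the kernel bandwidth.
    """
    return [rng.choice(data) + rng.gauss(0.0, bandwidth) for _ in range(n)]


def percentile(sorted_values, p):
    """Nearest-rank percentile (p in 0..100) of an already sorted list."""
    rank = math.ceil(p / 100 * len(sorted_values)) - 1
    return sorted_values[max(0, min(len(sorted_values) - 1, rank))]


# Hypothetical placeholder growth rates (per hour); NOT data from the paper.
measured_rates = [0.010, 0.015, 0.020, 0.022, 0.025, 0.030, 0.034, 0.040]

rng = random.Random(0)
samples = sorted(sample_kde(measured_rates, bandwidth=0.003, n=10_000, rng=rng))

hours = 72  # arbitrary projection window, chosen for illustration
projection = {}
for p in (1, 25, 50, 75, 99):
    rate = percentile(samples, p)
    # Fold change in bacterial number under simple exponential growth.
    projection[p] = math.exp(rate * hours)

for p, fold in projection.items():
    print(f"{p:>2}th percentile: projected fold change {fold:.1f}")
```

      Running the same breakdown for the NS and DS populations side by side would show whether the two conditions differ mainly in their fast-growing tail or across the whole distribution.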

    3. Reviewer #1:

      1) What quality control is done for each experiment to determine the ratio of type I and type II AECs in each chip set up for each experiment? This is of particular importance because the authors do not show any images where they stain for both type I and type II AECs in the same chip. Do the authors have images stained for both types of cells to illustrate the composition of each chip? After figure 1, what staining is done to confirm the DS cells decrease proSPC expression for each experiment?

      2) The authors focus on the difference in surfactant gene expression in the newly isolated AECs (NS) versus in vitro passaged AECs (DS), but they also observe that aqp5 is downregulated. In fact, the data supports that the cells are just de-differentiating during passage in culture, which will have multiple effects on the cells, not just surfactant production. This should be commented on and discussed. After loss of those markers, how do the authors confirm they still have type I and type II AECs in their cultures? Is there microscopy data with other markers that are retained in the AECs? The add back experiments with Curosurf support that surfactant can contribute to bacterial control, but this imparts only a partial complementation and the evidence for de-differentiation implies other pathways at play.

      3) One of the biggest concerns is that the authors never stain for type I or type II AECs after infection and make the conclusion that the bacteria are within type II cells based on the absence of macrophage staining. However, the bacteria may not even be in a cell, or the AECs could be dying during infection. On a related note, there is no data presented that shows that type I cells are not infected in the lung on chip system with Mtb.

      4) The authors state that their data with the Esx1 mutant "demonstrates that ESX-1 secretion is necessary for rapid intracellular growth in the absence of surfactant, consistent with the hypothesis that surfactant may attenuate Mtb growth by depleting ESX-1 components on the bacterial cell surface". This seems like quite a jump in interpretation of the data since the Esx1 mutant is likely attenuated for many reasons, and this attenuation is dominant to any effect that surfactant is having. The authors also show that PDIM levels are not different in the presence or absence of surfactant, and this is an Esx1 dependent lipid.

      5) What is the purpose for including the icl1/icl2 mutant? This experiment is not included in the data quantification.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      This paper by Thacker et al. describes the use of lung-on-a-chip microfluidic devices to study early interactions during M. tuberculosis infection under conditions meant to mimic the alveolar environment in vivo. The authors use time-lapse microscopy to study host cell-Mtb interactions in macrophages and alveolar epithelial cells and the impact of surfactant on Mtb infection. This study suggests that organ-on-a-chip systems might be able to reproduce elements of host-microbe physiology during infection, which is difficult to reproduce ex vivo using single cells, air-liquid interface, organoids or organ explants.

      This is an exciting approach which has the potential to expand the ability to study host-pathogen interactions. However, the reviewers all agree that the manuscript requires a major revision and additional data. Specifically, the manuscript requires improvement in the cell identification/classification, co-localization of Mtb with epithelial cells and macrophages, and distinction between intracellular and extracellular growth in order for the authors to provide convincing data to support their interpretations and conclusions.

      While the reviewers recognize that it is challenging to use live cell imaging in this system, much of the data of the paper, such as comparisons between infection of AECs and macrophages, rests on the ability to determine the precise localization of bacteria. However, neither AECs nor macrophages are specifically identified with high enough resolution to give confidence that the Mtb are associated with those cells specifically, and more importantly, that the bacteria are growing intracellularly rather than extracellularly. Many of the images are of such low resolution that only tiny dots of bacteria are observed.

      In addition, the findings of attenuated growth of Mtb after exposure to surfactant in macrophages and alveolar epithelial cells, changes in the Mtb cell wall after exposure to surfactant, and the finding that exposure to surfactant does not alter the extracellular viability of M. tuberculosis have been reported by others using other in vitro models and should be discussed in the manuscript.

    1. Reviewer #3:

      The study by Taverna et al. uses NGN2-induction in human, chimpanzee, and bonobo pluripotent stem cells to attempt to decouple the process of neuronal maturation from the cell cycle in order to study species-specific differences in neuronal maturation. Using single cell RNA sequencing, analysis of neuronal morphology, and electrophysiological recordings, the study argues that neuronal maturation is delayed in human compared to chimpanzee and bonobo among a heterogeneous class of sensory neurons and that this delay is cell-intrinsic. However, the current data are incompletely analyzed and do not provide strong support for this conclusion.

      Major comments:

      The dramatic differences in cell type composition of the induced neurons across species, revealed by single cell sequencing in Figure 2A, pose significant problems for the interpretation of the rest of the results. Specifically, if the chimpanzee cells are biased to making different sensory neuron cell types than the human cells, then differences in maturation rates between cell types rather than between species could drive the results. The authors must take into account the influence of cell type, individual, and species in order to support their claims of species differences.

      First, the number of individuals (only one chimpanzee individual) used for single-cell analysis is inadequate. There could be individual differences in timing and neuronal composition between lines that are independent of species and are not accounted for. At least 3-5 individuals per species should be used to enable statistical analysis of species differences. Ideally, the same lines should be used for single-cell analysis and morphological/physiological analyses. Staining for the cluster markers discovered from the current single cell analysis could also be applied to the remaining individuals to understand whether induced neurons have a similar composition across all the individuals from the three species.

      If the single chimpanzee individual shown in the single cell data is really representative of the three chimpanzee lines used elsewhere in the manuscript, the dramatic differences in neuronal types across species must be taken into account in subsequent analyses. For example, gene expression in Figure 3 could be analyzed on a cluster by cluster basis rather than grouping all neuronal clusters together. As shown, the differences across species could just be due to cell-type specific differences (for example, cluster 4 appears to be made up of entirely chimpanzee neurons while cluster 5 has more equal species representation). For physiology and morphology experiments, post hoc marker staining could ensure that neurons of the same type are compared across species, or if not registered to individual cells, it could still reveal the similarities and differences in composition between plates.

      Does NGN2 induction make a valid cell type? The authors should compare their expression data to previous work utilizing NGN2 induction (Zhang et al 2013) as well as to data from mouse and human tissue samples. It would be helpful to clarify whether the differences with previous work (i.e. induction of sensory neurons compared to cortical neurons) are due to incomplete characterization previously or to a different outcome here. And most importantly, it would be helpful to more clearly identify the endogenous cell types modeled in this data, perhaps by integration with primary sensory and cortical neurons single cell datasets.

      Do the BRN2 and CUX1-positive cells show co-expression with other cortical markers, like FOXG1 and EMX2, to support the statement that some of these cells may be cortical, or are these genes also expressed in some sensory neurons, or are these simply cells of mixed identity that lack in vivo counterparts?

      Please provide more detail about the NGN2 expression system as utilized across species.

      For each species, was the corresponding NGN2 gene used? If so, are there sequence differences between species that could influence differentiation?

      Is the time course of NGN2 expression the same across species?

      What are the dynamics of NGN2 induction in this system compared to normal differentiation - does persistent NGN2 expression after differentiation ultimately keep neurons in a more immature state?

      Does the NGN2 system entirely de-couple differentiation from cell cycle as the authors claim or do a few cell cycles still occur post-induction, and does this number differ between species? The focus in the introduction on cognition and the role of cortical differences between humans and non-human primates is puzzling in light of the claim that most of the neurons generated in this study are sensory neurons. If the authors' conclusions are valid, then it seems that this finding should be framed differently. Are there known species differences in sensory neurons? Do these results suggest that delayed maturation is a more general phenomenon and not restricted to brain regions involved in cognition?

      The following sentence in the discussion attempts to address this point: "Of note, sensory neurons are interesting from an evolutionary point of view, as the development and evolution of working memory in humans is linked to a higher integration of sensory functions in the human prefrontal cortex." However, this statement and the references cited instead support the view that species differences might be found in the prefrontal cortex rather than in sensory neurons.

    2. Reviewer #2:

      This is a well written MS comparing the rate/tempo of maturation of chimpanzee, bonobo and human neurons. The work is well done and easy to follow. The core finding is that human neurons, developed in vitro via a well-established directed differentiation protocol, mature more slowly than the NHP neurons.

      Several groups have previously used both in vivo and in vitro models (similar to the one used here) to define cross-species maturation features. These earlier studies have shown that indeed human cells develop more slowly than other species (like mice or Chimpanzees). The authors recognize this work in their introduction. While the finding of slower human neuron maturation is not completely novel, the current work furthers these earlier studies by adding additional characterization of electrophysiological and molecular properties of the neurons made. It also highlights an underappreciated presence of sensory neurons in these cultures.

      Things to consider:

      1) Definitive characterization of the neurons produced by Ngn2 overexpression. Prior work defined the neurons mostly as pyramidal, of cortical origin. Here, the authors claim both mixed identity (very probable) and the presence of large numbers of sensory neurons. One is left wondering whether this is a slightly different differentiation protocol, whether the interpretation of the data is different, or whether variability is high. If the authors classify the single cell RNA data from prior studies with this same protocol, would they still conclude that these are sensory neurons? If the authors could prove that the protocol produces bona fide sensory neurons, that would be an advance for the field. That may require direct comparison to endogenous sensory neurons (beyond a small number of markers) and classification based on electrophysiological properties (which the authors do have). Are these sensory neurons based on physiology?

      2) Could one use the system to point at mechanisms that may mediate the observed differences in maturation rates? This would move the field forward in a powerful way.

    3. Reviewer #1:

      The results are somewhat underdeveloped and there are several aspects of the study that can be improved by deeper analyses:

      1) The rigor of the experiments and statistical analysis is not clear. Although the use of several lines of iPSCs from each species is a strength, there are no details of how many batches of differentiation/induction were done or how many replicates were used for analysis. This is especially important for structural and functional analysis that can vary between lines and batches.

      2) The identity of the induced neurons as sensory neurons is interesting but is based solely on gene expression (scRNAseq). It would be more compelling if the authors would show other characteristics that identified this population of neurons. It is possible that some neurons express these sensory neuron genes, but do not express the proteins and/or do not differentiate into functional sensory neurons.

      3) The proportions of cells in each cluster of the scRNAseq would be informative to 1) identify changes as the neurons mature and compare between species, and 2) identify differences between species, as the authors state (page 9) that the same populations were found in different proportions.

      4) Given the valuable time course scRNA seq data, the analysis of neuron maturation over time is somewhat limited. More sophisticated analysis of gene expression changes/coexpression would strengthen the overall impact of the data.

      5) Similarly, the discussion is superficial and focused on consequences but not causes of differences in neuron maturation time. The discussion does not build on the rich and extensive transcriptomic data to provide any mechanistic hypotheses of the causes of the differences.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      This manuscript is in revision at eLife.

      Summary:

      The manuscript by Schörnig presents an elegant comparison of structural and functional maturation of cortical neurons from different primate species that is of broad interest to researchers interested in evolutionary neuroscience and those who are interested in the unique qualities of the human cortex. The authors use an induced neuron approach to generate cortical-like neurons from iPSCs from different species and compare the structure, function and gene expression of the different neurons over time in culture. This strategy bypasses development and provides much more heterogeneous cultures for analysis. While the results are largely descriptive, they provide very interesting resource data providing insight into both primate neural development and human-specific attributes.

    1. Reviewer #3:

      This very interesting manuscript further describes the receptive field structure of ON-OFF retinal direction selective ganglion cells. The authors demonstrate that spot light stimuli flashed at positions that do not correspond with dendritic processes of the recorded DSGC evoke strong excitatory responses that are most powerful on the preferred side of the (moving bar determined) receptive field. The authors go on to show that small light stimuli flashed in the dendritically sampled area of visual space are also non-uniform, and maximal on the preferred side. The authors' data are in line with previous reports of a nondirectional zone at the periphery of the dendritic tree of DSGCs. The experimental approaches taken by the authors seem sound. I was concerned by the obviously different kinetics of the flash response recorded under control and GABAA/nAChR antagonists in Figure 1 D. Is this a consistent finding? What are the authors' thoughts on the unusual shape of the current in Figure 1 D (lower, red trace)? As indicated in the discussion, the authors have not investigated the mechanisms underlying this asymmetry, other than dismissing structural determinants (dendritic tree asymmetry, investigation of existing EM volume). This to my mind is a vital component missing from the manuscript. The authors, however, do go on to describe, using elegant light stimulus patterns and modelling, some of the potential emergent properties of this behaviour. In this reviewer's mind, I am left puzzled and wanting to understand the cellular basis of the behaviour the authors have identified.

    2. Reviewer #2:

      In this research, Ding and colleagues present evidence that the excitatory input to OO DS RGCs from bipolar cells is strongly asymmetric, with strong inputs occurring on the side opposite from the SAC inhibition. They performed careful studies to show that this was not due to spatial asymmetry in the DSGC morphology nor to ribbon synapse density. Using 'interrupted motion' stimuli, which are effectively local directional stimuli, they show that this asymmetry leads to a non-directional response on one side of the cell's RF. Last, they create a model to show that such firing patterns could be used to improve localization of edge position under the specific conditions of an edge emerging from behind an occlusion.

      The work showing the asymmetry appeared careful, thorough, and well-done. The second half of the paper dealing with the functional consequences of this asymmetry left me with a few questions:

      1) Throughout the paper, several experiments showed no changes when a mix of receptor antagonists was added to exclude SAC inhibition as the origin of these effects. But I did not find a positive control, showing that these antagonists had the desired effect. Later, in Figures 5CD, the remaining effect after application of these antagonists was cited as evidence that the excitatory asymmetry was responsible for the effect; that interpretation is only valid if the drugs truly kill all SAC input to the DSGC. What if the drugs were not 100% effective? Relatedly, in the experiments in 5CD, the measured responses all decrease with the antagonists, an effect that seems surprising and is not explained. Connecting the asymmetry in excitation to the interrupted motion is central to this paper, so it should have strong support.

      2) The measured functional results appear quite similar to results in Kuhn & Gollisch 2019, which is not cited in that context. That paper found that DSGCs responded to local contrast, not just motion, much like the results here, and suggested that oppositely tuned cells could be subtracted to eliminate this contaminating contrast signal or added to isolate the contrast signal. Here, the authors suggest a very similar use for these signals, albeit with a decoder of position and a focus on motion rather than contrast changes. (See line 528, where the authors suggest that this position-direction hypothesis is new. See also line 537: or could not be salient, if there's any kind of downstream opponent subtraction, as in primate MT.)

      3) The interrupted motion stimuli are more complex than standard motion stimuli, but it's not clear how ethological or naturalistic they really are. In particular, the occluder was the same contrast as the rest of the background, which seems like a very specific kind of occluded motion, and it's not clear how this would generalize when the occluder is the same or opposite contrast of the moving edge. Moreover, the existence of directed motion in these stimuli leads the authors to emphasize the motion on the 'preferred side', rather than just non-directional contrast changes, which seem as though they would also induce responses.

      4) The modeling/decoding aspect of this paper seems pretty speculative. It doesn't seem as though these cells are known to be involved in any kind of position encoding. The fact that they transmit information about contrast changes means they can enhance position-decoding, but many other RGCs could also (better?) serve this purpose. The optic-flow-field arrangement of these cells in the retina suggests just the opposite - that they appear likely to be used for optic flow detection, in which positional information is less relevant than the field structure.

      5) Last, I kept wondering how this offset excitatory input made the DSGCs look very similar to a classical Barlow-Levick model (though with DS inhibition). I believe a classical BL model would have many of the properties shown here, including the sensitivity to occluded ND motion on its 'preferred side'. Is there an advantage in the BL model formulation to having disjoint excitatory and inhibitory spatial inputs, rather than a broad excitatory field that overlaps with the delayed inhibition? If so, would such an advantage explain why this asymmetry might exist in these DSGCs, even with DS inhibition from the SACs? I guess I'm asking whether there is an advantage for general motion detection, rather than proposing a new role for these cells in localizing specific types of motion stimuli.

    3. Reviewer #1:

      This paper describes a new finding about stimulus encoding in On-Off directionally selective ganglion cells. It is well established that these cells have spatially displaced inhibitory input from starburst amacrine cells, and that the spatial offset of inhibitory input contributes to the cells' selectivity for direction of motion. The work in this paper shows that the cells also have spatially offset excitatory input, and that this input can give rise to a non-directional response. Several functional roles are suggested for the non-directional response. I felt that the evidence for the non-directional response was strong, but that the connection to visual function was too preliminary.

      Functional importance:

      The paper emphasizes the possible functional importance of the non-directional motion signal; this is a focus of the discussion, and is highlighted in both the abstract and introduction. I found this part of the paper less complete and convincing than the experimentally-driven results. Several issues contribute to this. One is that the contribution to identifying the position of a moving object is fairly modest. Another is that the impact of the non-directional component on other stimulus properties - e.g. the accuracy with which motion direction is encoded - is not explored. A third is that the position of a moving object is almost certainly encoded by multiple ganglion cell types, and hence the modest improvement in position encoding in the DS cell population may make even less contribution when the entire ganglion cell population is considered. A complete investigation of coding in the ganglion cell population is clearly too much, but a more balanced and complete consideration of the benefits and drawbacks of the mechanism described would strengthen the paper considerably.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The reviewers were in broad agreement that the findings were interesting and that the experiments were well executed and clear. The main concern is that the paper does not provide either a definitive mechanistic insight into why excitatory input is asymmetric, or a definitive functional argument about the importance of this asymmetry.

    1. Reviewer #3:

      Summary:

      Gene drives are alleles that bias their inheritance to spread through a population. Engineered gene drives could potentially be used to spread genes that prevent malaria transmission in mosquitoes. In this study, the authors develop a proof-of-principle of effector components that would be part of a proposed integral gene drive. Such drives are different from standard gene drives by separating the Cas9 and effector components at different loci, with each one having biased inheritance, a useful strategy if the Cas9 has a substantial fitness cost (though it remains unclear if this is the case). They can also more easily target conserved sites of important genes compared to a standard drive, though this is not unique to the integral gene drive strategy. The Cas9 and effector components would be expressed from natural promoters, with introns and translation skipping utilized so that the original gene works properly and so gRNAs can be expressed within the intron. The authors showed that the effector component of such a drive performed as expected, and that both effectors and the target gene were expressed. Overall, the manuscript is a mostly sound technical demonstration of the effector component of an integral gene drive.

      Review:

      1) It's unclear how exactly resistance alleles would be dealt with in the authors' strategy. While an integral gene drive could target an essential gene so that resistance alleles are nonviable, that doesn't seem to be the strategy here, since the authors needed to target a gene with a promoter that would be a good match for their effector. The need for both an essential gene and a suitable promoter in one package may thus limit the use of the integral gene drive strategy. Higher fitness costs associated with disruption of the gene may partially ameliorate this issue, but this was not confirmed in the current study (transgenic strains had lower fitness, but was this due to the drive, the effector, or the reduced expression of the host target gene?).

      2) The authors removed their marker genes by surrounding them with LoxP sites and crossing their lines to Cre. This was justified since the authors believed that the presence of the marker would interfere with expression of the target gene, causing fitness issues. However, the authors found no sign of fitness reduction based on anecdotal (?) observations. Were these observations actually quantified, in which case they should be supplemental material? It could be particularly interesting in light of the fact that even without the marker, the transgenic strains suffered fitness effects. It would be nice if the decision to remove the marker was better justified in this section, based on the next section where it was found that the marker interfered with effector expression. Perhaps even combining or reversing the order of the sections would be appropriate (for example, consider first saying that the marker interferes with expression, then mention how this was expected and the marker could be removed, solving the problem).

      3) Based on figure 3D-E, it appears that the target host gene has reduced expression even after the marker is removed. This is quite important for future considerations, yet seems to be glossed over. For example, if a target is chosen that can effectively help remove resistance alleles due to fitness costs from disrupting the target gene, this means that the gene drive will also suffer fitness costs.

      4) The fitness analysis examining fecundity and hatch rates is not very informative. While similar fitness effects among the transgenic strains lend some weak evidence that inbreeding may account for the fitness reduction, variability between individuals certainly does not (after all, wild-type individuals were also highly variable). Also, if the Cre line has a different background than G3, wouldn't all the lines have received some of this background from prior crosses? Perhaps this could be the answer. It would nonetheless have been better for the authors to outcross the lines before inbreeding them, with similar inbreeding for the wild-type control, before doing this experiment. Because of the issues with this experiment, I'd suggest that it be conducted again with better controls or moved to the supplement.

      5) It's hard to believe that no end-joining took place, even though the last sentence of the results indicates that no end-joining was detected. Did the authors not sequence any progeny with the drive, to look for end-joining products formed from maternally deposited Cas9? Other studies with vasa-Cas9 in Anopheles saw this phenomenon occur at a high rate. For end-joining products formed as an alternative to HDR, was it 21 individuals that were sequenced (nine with Aper1 and twelve from the full AP2 sequencing)?

    2. Reviewer #2:

      Hoermann et al. present a new gene engineering concept for disease vector mosquitoes, whereby endogenous mosquito genes are hijacked to express a heterologous effector peptide intended to render mosquitoes resistant to human pathogens. In addition, a synthetic intron added within the effector-coding sequence will express gRNAs for the CRISPR-Cas9 system, recognizing the transgene's own wild-type insertion locus. In the presence of a source of Cas9, the effector gene is thus able to home into a wild-type chromosome, triggering a gene drive effect that can increase the frequency of the modification in the mosquito population. A fluorescent marker, also cloned within the intron, is used at early steps to track the transgene, but is subsequently removed by Cre/lox excision to restore host gene + effector expression and to result in minimal genetic modification.

      This is an extremely elegant procedure and a remarkable technical achievement, especially in such a difficult species as Anopheles gambiae. The choice of midgut-specific promoters to express anti-malaria effectors makes sense to target early stages of development of parasites, before they had a chance to amplify in the mosquito. Using endogenous regulatory sequences without a need for promoter cloning alleviates the tedious work of individual promoter characterization. The molecular designs are well described, and the results likely to have a large future impact in the development of vector control tools, notwithstanding some weakness in assessing the antiparasitic effect of Scorpine in the transgenic mosquitoes (see below). I agree that this type of transgene should facilitate semi-field or field testing of candidate anti-parasitic effectors, before any true gene drive intervention is envisaged.

      Major Comments:

      P. falciparum transmission blocking assays - Fig. 5:

      I have several questions about figure 5.

      -Are mosquitoes with 0 parasites taken into account in the calculation of the mean and median? This should be explained in the legend or in Exp procedures.

      -Several replicates have been pooled to generate the figure, for each transgenic strain. Is this legitimate? i.e. were the mean oocyst number and prevalence, reflecting the quality of each ookinete culture, similar enough between replicates to allow pooling? If not, it would be more legitimate to show the result of a single representative replicate. Please provide a table with the raw parasite counts of the separate replicates in a supplemental file so that readers can better judge these results. I note that panel C is very useful.

      -I find the bar graph hard to interpret. The median M is represented either as a stroke inside some bars, or overlapping the x axis when M=0. The size of the bar doesn't represent the mean, m. Does it represent a confidence interval? This must be explained in the legend. Maybe a dot plot where each dot represents the parasite counts of one mosquito would better represent these results?

      -From my point of view, mosquito numbers in some of these infections may be too low to yield solid results. Especially in the ScoG-AP2 experiment: 37 mosquitoes in the G3 control with a prevalence of 51% means that only 19 mosquitoes across R=2 replicates contained parasites. This low number is associated with a risk of atypical outliers in the parasite counts, even if the statistical tests presented here show good significance. In the panel C analysis of these values, we see from the size of the squares that the replicate that had the highest statistical significance also had the smallest number of mosquitoes. The replicate with a larger N has only one *. For the Aper1-Sco line, N is large and the statistical significance is high (although panel C shows that one of the 4 replicates showed no difference) but I'm still somewhat unconvinced of the effect of scorpine in this line: the mean only drops from 10 to 6 parasites, prevalence drops from 37 to 21%. Combining this moderate effect with the facts that (1) some replicates sometimes show no Scorpine effect, and (2) the Sco-CP line, which has a comparably high level of scorpine expression according to Suppl. Fig. 3, shows the exact opposite, i.e. a pro-parasitic effect, makes me doubt the antiparasitic effect of scorpine.

      In the case of the ScoG-AP2 line, scorpine expression is only 1/10 to 1/8 of the expression in the other two lines, but seems to have a similar effect as in the highest (Aper1) expressing line: one possibility is that fusion to GFP stabilizes Scorpine so that lower expression results in higher activity, but a milder effect would have been logical if scorpine had a dose-dependent effect.

      One caveat of these experiments is that the genetic background of the control mosquitoes (G3) is not exactly the same as the transgenics (G3 x KIL). There is a possibility that the KIL background contributed some alleles conferring elevated Plasmodium resistance (or the opposite in the case of Sco-CP). I would find the results more trustworthy if a control of equivalent genetic background could have been generated for each transgenic strain (in the process of homozygous line selection, the homozygous WT siblings could have been retained to serve as specific controls, though I know how demanding this work would have been...).

      Another caveat is that we don't know the precise kinetics (e.g. between 0-36h post blood meal) of the scorpine protein midgut concentration in each transgenic line, and we don't know at what time point after the blood meal parasites would be most susceptible to killing by scorpine (probably between 3 and 24h, time after which they transform into protected cysts). Taken together, the scorpine data are not highly conclusive and there remains much uncertainty about the efficacy of transgenically expressed Scorpine as an anti-Plasmodium molecule. I'm not requesting additional experiments (though future long term assessments of these transgenic lines with new isogenic controls would be very interesting), but I invite the authors to tone down their claims about scorpine's potential effectiveness as an antimalarial effector in vivo. This does not decrease the importance of this work, of which scorpine is only one aspect. A candidate molecule had to be chosen for these proof-of-principle experiments. Scorpine may not have been a very lucky choice, but its moderate (or opposite) effect should be seen as an interesting result in itself. The way is now open to test other possible candidates.

    3. Reviewer #1:

      This is a compelling demonstration of a number of important steps that take population replacement gene drive for malaria control closer to reality. I have no major concerns and think the manuscript shows the authors have made substantial progress in a) taking Integral Gene Drive (which is a recent idea from senior author Windbichler) into mosquitoes, b) successfully removing marker genes to make the whole system more effective, c) demonstrating that the approach works to express a molecule to reduce parasite infection rates in the lab while also making it possible to test these effector molecules in natural settings without risk of accidental drive release, and d) also showing that drive is successful. My comments are only minor and I think the study is high impact.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This paper demonstrates a number of important steps necessary for implementing the recently proposed "integral gene drive" strategy. In this approach, endogenous mosquito genes are hijacked to express a heterologous effector peptide intended to render mosquitoes resistant to human pathogens. Such drives differ from standard gene drives by separating the Cas9 and effector components at different loci, with each one having biased inheritance. This could be useful if the Cas9 has a substantial fitness cost and could also more easily target conserved sites of important genes compared to a standard drive. While it remains to be seen how effective this approach will be in practice, the paper provides valuable insights into how such gene drives could work in mosquitoes.

    1. Reviewer #1:

      The manuscript by Wuertz-Kozak et al explores the relationship between early life stress and bone parameters in mice and humans. In mouse studies, micro CT and qPCR analyses were done, while humans with depression and a history of childhood neglect had bone turnover markers and DXA scans done. Increased CTX levels were noted in both mice with early life stress and in certain groups of humans with depression. These investigators recommend that early life stress be further assessed as a risk factor for human bone disease.

      1) Although the authors acknowledge the limitations of controlling and even assessing accurately the kind of impacts (e.g., nutritional, activity-related, body weight changes, age when stress inflicted etc) that may operate during childhood stress and neglect, the human model is very problematic because of this heterogeneity. There do not appear to be good parallels between the mouse model and the human cohort.

      2) Bone cell proliferation and differentiation are proposed to be affected in the mouse model. Proliferation can be directly measured in many ways and should be formally tested. Similarly, the stage of osteoblast differentiation can be easily assessed by PCR with well-validated gene markers of early vs late differentiation. The hypothesis proposed in line 140 can be directly tested.

      3) What is the significance of the increased innervation that is reported in Figure 1 and the reduced neuronal receptor expression in the next figure? It would make sense that more nerve growth would lead to greater receptor expression. Is it also unexpected that NGF2 levels are so low when there is increased nerve innervation to the bone in MSUS mice?

      4) The authors propose a 'catabolic shift' in bone in the MSUS mice. There are a few unusual things that have been reported in this matter. Most researchers would not consider osteoprotegerin a matrix gene (line 159). Furthermore, changes in osteocalcin, osteopontin and sclerostin mRNA would not be the most sensitive markers for the proposed catabolic shift. The proteins encoded by these genes are in the matrix but they are the products of osteoblasts and osteocytes and the bone formation marker P1NP per the authors is unchanged in the mice. It is the CTX that is elevated and perhaps more sensitive gene markers for a catabolic shift would be RANK-ligand, mCSF and perhaps osteoclastic genes.

      5) The Descriptive Result for the Human Study (line 172-184) is very difficult to follow. Many more key demographic, biochemical, and clinical characteristics of the human study populations need to be provided. The paper uses a wide age range of patients (18-65 years). Therefore some of the subjects will have gone through menopause and others may not yet have reached peak BMD. This introduces a great deal of heterogeneity into the population being studied.

      6) What was the exposure and duration of the use of SSRIs in the population? These medications are implicated in reduced BMD and increased fracture rates in some studies.

      7) DXA results: (a) What site in the hip DXA is "H" or "collum femoris"? (b) One would have suspected that the total hip BMD and femoral neck BMD would have aligned with the results for the greater trochanter BMD, as shown in Table 1. Yet the 3 sites in the hip do not align. This suggests a weak relationship. (c) Lines 179-181, it seems that only ~33 subjects were included in the DXA studies. Given the heterogeneity of the population being studied in key parameters - age, sex etc - this would be an extremely small number to break into 4 groups as in Table 1, run statistical testing on, and report out on BMD results. This is a very under-powered study. BMD varies with age, sex, ethnicity, body size. Such characteristics need to be controlled to tease out an effect of childhood trauma and depression on bone.

      8) Micro CT data in the MSUS mice are driven by effects on body weight, and these data do not support a direct effect of postnatal stress on the bone itself.

      9) The human cohort needs to be better defined and described. It likely should not cover such a wide age range (18-65 years). Drug therapies for depression and their duration should be specified to compare the groups. A thorough medical assessment needs to be done on these subjects with screening labs and a basic screening medical history and physical examination. Many disorders known to affect bone could be missed (e.g., menopause, liver or kidney disease, etc). Alcohol consumption needs to be explored and clearly reported as well as the amount of smoking since both habits affect bone parameters.

    2. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Reviewer #3:

      The authors examine the robustness of coupling of distinct oscillatory circuits of different frequencies across a range of temperatures. The two circuits have different means of generating oscillations and could therefore, potentially, be impacted to different degrees by temperature perturbations. Across all temperatures tested the two distinct rhythms increased their frequency but remained coordinated. The coordination was in the form of the previously-described integer-coupling where the cycle period of the slow rhythm was an integer multiple of that of the fast one. This is due to the fact that the slower rhythm was most likely to start at a given phase within the faster oscillation cycle. The temperature robustness of this coupling is an interesting and important result and the description and analysis are both well done.

      Major comments:

      The main finding of the paper is that a previously-described integer-coupling between two rhythms remains more or less intact across temperature variations. It is a nice descriptive finding, but rather disappointing in that there is so much more that could have been done rather easily that would have given much more depth to this finding. Most obviously, because it is known that the source of the coupling is the inhibitory synapse from the pyloric pacemaker to the gastric mill half-center, it is quite important to know how the strength of this synapse affects the interaction at different temperatures. That is, to expand what Bartos et al 1999 did across a range of temperatures. Short of that, it would have been nice at least to perturb the cycle period of the pyloric rhythm and see whether the interaction would remain robust across temperature despite changes in cycle period.

      While the study convinces the reader that integer coupling between pyloric and evoked gastric rhythms is robust to temperature changes, it does not attempt to explore the origin of this robustness, e.g. by using different methods to activate the gastric rhythm or testing if integer coupling is present with spontaneous gastric rhythms.

    2. Reviewer #2:

      In the present paper, Powell and colleagues investigated how coupled oscillatory circuits maintain their coordination over a wide range of temperatures. To do so they used the stomatogastric system of the crab Cancer borealis that contains the fast (1 Hz) pyloric network and the slow (0.1 Hz) gastric mill network. The two generated rhythms are coordinated such that there is an integer number of pyloric cycles per gastric cycle. Both rhythms exhibit temperature-induced frequency changes, but their coordination is well maintained even at high temperature. Therefore, this study shows that the relative coordination between rhythmic circuits can be maintained as temperature changes, thus ensuring appropriate physiological functions even under global perturbations.

      This study, which uses a fantastic model for investigating neural networks in general, addresses an important physiological question. However, I have a few concerns that could probably be clarified with some additional explanations in the text:

      -While the intrinsic temperature sensitivity of the pyloric rhythm has been nicely investigated in some previous excellent publications (most by the authors), that of the gastric rhythm is less well known. Stadele has shown that increasing the temperature leads to a breakdown of the gastric rhythm that can be rescued by modulatory afferents. What do we know about the temperature sensitivity of the afferent neurons that are stimulated to trigger the gastric rhythm here? Is it possible that what is observed also includes an effect of the temperature changes on these neurons (MCN1 function, for example), or that the gastric temperature sensitivity described here in fact reflects that of the afferents?

      -All experiments were performed in conditions in which the gastric rhythm is triggered by stimulation of the two dorsal posterior esophageal nerves (dpons), which contain axons of modulatory afferent neurons. However, stimulating these nerves also modulates the pyloric network, which is also a target of those afferents (as stated in the text, lines 583-584). Isn't this a bias in the experiments and their interpretation? Also, because, as schematically represented in Fig 1, the pyloric pacemaker neuron AB has direct connections with the Int1 gastric neuron, which is itself connected to the LG gastric neuron, the simplest interpretation of the experiments would be that this connection is preserved and remains efficient even at high temperature. Is this ultimately one of the conclusions of the paper?

      -In the same vein, the sensitivity of the gastric rhythm to temperature changes has been studied here with the pyloric network, which is itself intrinsically sensitive to temperature changes, still active (Fig 3 and related text). What do we know about the intrinsic temperature sensitivity of the gastric rhythm when it is elicited by dpons stimulation but isolated from the pyloric network (AB neuron killed, for example)?

      -The data presented here show that coordination between the PD and LG neurons is preserved after a temperature increase, but that this is not the case between the PD and DG neurons, which show no phase-coupling at high temperature (Fig 6). The PD neurons are used here as an indicator of the pyloric rhythm, while the LG neurons indicate the gastric rhythm. What, then, would the authors conclude if the DG neuron had been used as the gastric rhythm indicator? How do you reconcile these findings?

    3. Reviewer #1:

      Powell and colleagues measured the robustness of coordination between pyloric and gastric rhythms in in vitro preparations of Cancer borealis exposed to temperature variations (7-23 °C). Using extracellular recordings, they first show that spontaneous rhythms are not stable, likely as a result of multiple physiological processes that are difficult to monitor. They therefore used bouts of activity reproducibly evoked by stimulation of a neuromodulatory pathway. As expected, cold temperatures slowed the rhythms and warm temperatures accelerated them in a similar manner. Despite this variation in rhythm frequency across temperatures, coordination between pyloric and gastric rhythms was stable. This suggested that the activity of rhythmogenic neurons is coordinated across temperatures. Powell and colleagues also found that the Lateral Gastric motor neuron (LG) was phase-locked with the Pyloric Dilator neuron (PD), suggesting they may be involved in coordination robustness.

      The originality of the study is that the authors focused on the coordination of the pyloric (1 Hz) and gastric (0.1 Hz) networks. A large quantity of raw data is beautifully illustrated. The data analysis is sophisticated and convincingly supports the interpretations of the authors. The text is exquisitely written in a clear style and is pleasant to read. In my view, the study contains the first experiments of a potentially exceptionally interesting study, once more mechanistic insights are added. To further strengthen the relevance of the study, I would suggest pursuing one of the three options below to further uncover the mechanisms underlying the effects described.

      1) Could the authors design causality-based experiments to identify which neuron is responsible for the coordination of the rhythms at different temperatures? There are many interconnected neurons in Figure 1C. Even if LG is phase-locked to PD, is it possible that another neuron drives both PD and LG? If PD controls LG, would it be relevant for the authors to reversibly switch off PD (e.g. with tonic hyperpolarisation) and observe the effect on gastric rhythm frequency at various temperatures?

      2) Could the authors use pharmacological tools to identify whether distinct neuromodulatory substances influence coordination robustness over specific ranges of temperature but not over others? It seems that Stadele et al. 2015 PLoS Biol 13(9):e1002265 used a different way to evoke the rhythm, and their gastric rhythm crashed at a lower temperature (13 °C) than in the present study (27 °C). Do the authors think that the different stimulation approaches used in the two studies could involve different neuromodulatory substances, which would result in different robustness profiles?

      3) Do the same intrinsic properties or synaptic connections underlie coordination robustness across temperatures? Modeling suggests that different conductances are involved in a temperature-dependent manner (Alonso and Marder 2020 Elife 9:e55470.2020). Is it possible for the authors to experimentally deactivate specific conductances using dynamic clamp in LG or PD or with pharmacological tools and determine whether this would reversibly disrupt the coordination between pyloric and gastric networks in some specific temperature ranges but not in others?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This study addresses an important question about the physiology of coupled oscillatory neuronal networks operating under a wide range of temperatures. The stomatogastric system of the crab Cancer borealis contains the fast (~1 Hz) pyloric network and the slow (~0.1 Hz) gastric mill network. The two generated rhythms are coordinated so that there is a given number of pyloric cycles per gastric cycle. Powell and colleagues show that upon stimulation of a neuromodulatory pathway, these coupled oscillatory circuits exhibit reproducible bouts of activity and maintain their coordination, and that this coordination is maintained over a wide range of temperatures, thus ensuring appropriate physiological functions even under global perturbations. The authors show that the Lateral Gastric motor neuron (LG) is phase-locked with the Pyloric Dilator neuron (PD), suggesting these neurons may be involved in coordination robustness.
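
      As an illustration of what "a given number of pyloric cycles per gastric cycle" means quantitatively, the sketch below scores how close the ratio of gastric to pyloric cycle periods lies to an integer. It is a minimal example under stated assumptions, not the analysis used in the manuscript: the function names and the period values are hypothetical. For ratios with no coupling (fractional part spread uniformly), the mean deviation would be expected to sit near 0.25, while values near 0 point to integer coupling.

```python
def integer_deviation(ratio):
    """Distance from a period ratio to the nearest integer, in [0, 0.5]."""
    return abs(ratio - round(ratio))


def mean_integer_deviation(gastric_periods_s, pyloric_periods_s):
    """Average integer deviation over paired gastric/pyloric cycle periods (seconds)."""
    ratios = [g / p for g, p in zip(gastric_periods_s, pyloric_periods_s)]
    return sum(integer_deviation(r) for r in ratios) / len(ratios)


# Hypothetical periods (seconds), e.g. one pair per temperature step.
gastric = [10.0, 6.0, 3.0]
pyloric = [1.0, 0.5, 0.25]

# Ratios are 10, 12 and 12 pyloric cycles per gastric cycle.
coupled_score = mean_integer_deviation(gastric, pyloric)

# A ratio of 10.5 is as far from an integer as a ratio can be.
uncoupled_score = mean_integer_deviation([10.5], [1.0])
```

      Note that in this toy data the ratio itself changes from 10 to 12 while the score stays at zero: integer coupling can be preserved even when both periods shorten with temperature.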

    1. Author Response

      Summary:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation, and of the mode of action of and resistance to antibiotics, have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported in X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al present a landmark work where these limits are pushed even further, reporting a ribosome cryo-EM reconstruction with an overall resolution of 2 Å, and even better than that in the best areas of the map. The achieved resolution is impressive and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these could be better developed. Instead, the usage of map-to-model Fourier shell correlation (already known in the field) is stressed to estimate the resolution, but it is not clear what the advantage is here, as the values are the same when estimated from half-map FSCs. Therefore, it is suggested that the discussion about the model-to-map FSC be toned down considerably (or even removed), while adding more information about the new findings in the map, along the lines of the comments below.

      We thank the reviewers for their interest in this work, and for their helpful comments on the first version of the manuscript. We provide responses to the individual points below.

      Reviewer #1:

      This paper describes a 2 Å cryo-EM reconstruction of the E. coli 70S ribosome. This structure represents the highest resolution ribosome structure, by any method, available thus far and highlights interesting modifications that were not possible to see in previous structures. I'll let the ribosome experts comment on the relevance of these and focus my review on the cryo-EM technical parts. The paper is clearly written and the figures are informative and beautiful.

      The first author is particularly gratified that the figures were well received.

      Major comments:

      1) The authors make a big deal out of resolution assessment by model-to-map FSCs. It is unclear to me why they do this. First of all, model-to-map FSC is not a new resolution measure: it is in widespread use already. Second, it is unclear why the authors are so forceful in stating that it is better than the half-map FSC. They say "While map-to-model FSC carries intrinsic bias from the model's dependence on the map, in a high resolution context it does provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." What additional information does it provide? I would say it only provides true additional information if the atomic model comes from another experiment! In the way it is used here: by refining the model inside the very same map, there is a danger of increasing model-to-map FSC values through overfitting of the model (see also below). This danger is not recognized enough in the text (it is only hinted at in the sentence above), and overfitting is not measured explicitly for this case. Yes, half-map FSC measures self-consistency, but in practical terms (when done right!), this doesn't matter for the resolution estimate. The same is true for model-to-map FSCs: when done right they convey the right information, but the danger of self-consistency (through overfitting) also exists here. As the paper is mainly about the high-resolution ribosome structure, and no proper evaluation of the relative merits of half-map FSC versus model-to-map FSC is performed, I would suggest that the authors remove (or at least tone down considerably) their statements about resolution assessment from the manuscript.

      All three reviewers commented on our emphasis on using the map-to-model FSC criterion. We thank the reviewers for pointing out our motivation to discuss FSC metrics was not clear. We agree with the reviewers that the map-to-model FSC metric has been available for some time. However, in the ribosome field, the half-map FSC is still very commonly used as the sole resolution-dependent metric, including in recent literature that we cited (Nürenberg-Goloub, 2020; Tesina, 2020; Stojković, 2020; Pichkur, 2020; Halfon, 2019), as well as in a newer publication (Loveland, 2020, Nature, https://doi.org/10.1038/s41586-020-2447-x ). We mention some of the shortcomings of half-map FSC, which the third reviewer alludes to in their comment on “intense debate” in the field. While it is acknowledged as best practice to examine both maps and models, many visitors to the PDB likely will download only the model. Therefore we find it prudent to communicate confidence in the model resolution and not just the half-maps, particularly in this resolution regime. Again, this is not common in recent ribosome literature, which we will clarify in the Discussion. We will make changes throughout the manuscript to streamline and clarify our discussion of the two metrics, including an additional comparison to a newly released ribosome structure, as detailed below.

      When we discuss “additional information provided by map-to-model FSC,” we recognize that there may be semantic issues with the word “information” as map-to-model FSC depends on the same information content of the maps. However, the map-to-model FSC provides new information about the model quality to the reader. While half-map FSC tells us something about the best model one might achieve, new practical information lies in the authors’ handling of the model, which will vary among individuals (as discussed further below). Furthermore, model refinement procedures leverage well-defined chemical properties (i.e. bond lengths, angles, dihedrals, and steric restraints) that the map “knows” nothing about, which has value for keeping the realism in check. This is also why we originally included the sentence, “Sub-Ångstrom differences in nominal resolution as reported by half-map FSCs have significant bearing on chemical interactions at face value but may lack usefulness if map correlation with the final structural model is not to a similar resolution.” We will rewrite portions of this section for clarity.

      Comparisons to other recent high-resolution cryo-EM ribosome structures show discrepancies between the reported half-map FSC and the map-to-model FSC calculated by us (see beginning of section “High-resolution structural features of the 50S ribosomal subunit”), with the map-to-model FSC values corresponding to lower resolution. These structures report half-map FSCs only, which we could not replicate because the half-maps are unavailable, but we describe our calculation of map-to-model FSC with their deposited maps. We did not explicitly highlight the comparisons with their reported half-map FSC resolutions in the original manuscript, and we will include further discussion to more clearly communicate our point. We will also include another comparison to the newly released structure by Pichkur et al. (Pichkur, 2020), which became available during the review process and is the closest to our map resolution. The map-to-model FSC with their model and map yields 2.29 Å resolution, while a simple rigid-body fit of our model into their map without further adjustment yields 2.07 Å. This difference highlights the practical insufficiency of focusing only on half-map FSC and the value of our model as a reference for future work.

      2) To test for the presence of overfitting of their atomic models in the maps, the authors should shake up the atomic models and refine them in the first independently refined half-map. The FSC of that model versus that half-map (FSC_work) should be compared with the FSC of that very same model versus the second half-map (FSC_test). Deviations between the two would be an indication of overfitting. If that were to be observed, the weights on the stereochemical restraints should be tightened until the overfitting disappears. The same weighting scheme should then be used for the final model refinement against the sum of the half-maps.

      In lieu of what the reviewers have suggested, we think the additional map-to-model comparison of our model rigid-body docked into the 2.1 Å 50S map by Pichkur et al. provides reasonable evidence that our model suffers from minimal overfitting. Without any additional refinement of our model into their map, the map-to-model FSC resolution is 2.07 Å. We will include the new comparison in the revised manuscript.

      For model refinement, we used default parameters for phenix.real_space_refine, which internally optimizes weights for hundreds of different “chunks” during the refinement. This “black box” aspect does not give us facile control over the weighting scheme. However, we also note that the final model is not “fresh” out of Phenix; rather, the macromolecules have been meticulously reviewed and adjusted manually in Coot, with blurred maps to aid in accurate modeling for areas that are not as well connected/resolved. RSR in Coot was also required to “stitch” sections of the model together, since the models were refined in multiple focus-refined maps. Further, we think that for models that are ⅔ RNA, manually optimizing the Ramachandran restraints is unlikely to provide much new insight into RSR of this structure.
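
      For reference, the cross-validation test proposed in point 2 can be written out using the standard definition of the Fourier shell correlation. This is only a notational sketch: the symbols F_1, F_2, S_k, M, H_1 and H_2 are assumptions introduced here and do not come from the manuscript or the reviews.

```latex
% FSC between two maps with Fourier transforms F_1 and F_2,
% summed over all Fourier voxels s in the shell S_k of spatial frequency k
\mathrm{FSC}(k) =
  \frac{\sum_{\mathbf{s} \in S_k} F_1(\mathbf{s})\, F_2^{*}(\mathbf{s})}
       {\sqrt{\sum_{\mathbf{s} \in S_k} \lvert F_1(\mathbf{s}) \rvert^{2}
              \;\sum_{\mathbf{s} \in S_k} \lvert F_2(\mathbf{s}) \rvert^{2}}}

% M: map computed from the model refined against half-map H_1 only
% H_2: the second, independent half-map, never seen during refinement
\mathrm{FSC}_{\mathrm{work}}(k) = \mathrm{FSC}\big(M, H_1\big)(k),
\qquad
\mathrm{FSC}_{\mathrm{test}}(k) = \mathrm{FSC}\big(M, H_2\big)(k)
```

      Under these definitions, overfitting would show up as FSC_work lying clearly above FSC_test at high spatial frequencies, because the model has then fitted noise that is present in H_1 but absent from H_2.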

      3) Figure 1 -supplement 7: if radiation damage breaks the ribose rings, they should still be OK during early movie frames. This could be investigated by performing per-frame (or per-few-frames) reconstructions. The radiation damage argument would be a lot stronger if the density is present in early frames, yet disappears in the later ones. There will be a balance between dose-resolution and achievable spatial-resolution to see this of course. But it may be worth investigating.

      This is a great suggestion, and we have now carried out this analysis. We have performed the early-frame reconstruction and now have an alternative hypothesis that may make more sense. We will include the alternative hypothesis that we are likely seeing disorder due to conformational flexibility in the RNA backbone, rather than radiation damage, which seems unlikely given the features in the early-frame map. We will also update Figure 1–figure supplement 7 with new panels to aid this discussion.

      Reviewer #2:

      The manuscript by Watson et al. presents the structural analysis of a bacterial ribosome at high resolution. The achieved resolution is impressive and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these are missing or not well developed. Instead, the usage of map-to-model Fourier shell correlation (already known in the field) is stressed to estimate the resolution, but it is not clear what this actually brings here as the values are the same when estimated from half map FSCs. The structure visualizes chemical modifications of ribosomal RNA and amino acids and water molecules, which together are interesting and important. However, here one would expect a comparison with structures of previously analyzed bacterial ribosomes, e.g. E. coli and T. thermophilus, e.g. from the same group and from the work by Fischer et al., Nature 2015: how far are the sites conserved? How do the maps compare? Are the same features seen? It is surprising to see that the main chemical modifications are not discussed and shown (only summarized in the Suppl. Data). Pseudo-uridines are mentioned, but how were these identified? It should be mentioned here that due to their isomeric nature these can be discussed only from their typical hydrogen bond pattern. The paper discusses new sites with chemical modifications, but this could benefit from a more thorough discussion of existing biochemical data or from including new biochemical characterization. The structural role of these modifications is not much described. The side chain of IAS119 has no density, hence one should be careful in interpreting an isomerization of this residue, not sure whether the data allow the conclusions to be made. Similar for the mSAsp89 residue for which the density is uncertain, hence it is not clear whether the conclusions stay on a safe ground.

      We thank the reviewer for their interest in this work. We addressed our emphasis on the map-to-model FSC in response to reviewer #1.

      For the majority of rRNA modifications, we included the supplementary figure as a reference for comparison to the published 4YBB and 4Y4O maps and models. These modifications have been extensively described in the structural biology literature, including in the recent cryo-EM study of the 50S ribosomal subunit (Stojković, 2020) and warrant no detailed comment by us at this time. Instead, we focused on new features that were not previously observed, such as hypomodifications and new modifications. The new modifications are the isoAsp observed in uS11 and the thioamide modification in uL16.

      IAS119 modeling in uS11: We thoroughly analyzed Asn or isoAsp modeled at this residue, and will provide additional evidence that isoAsp is correctly modeled at residue 119. In the original maps, although the side chain density is weak, the backbone density is unequivocal. There is clear density for the extra methylene group (marked with an asterisk in Fig. 4A). We have now calculated a map of the 30S subunit using the first three frames in the image stacks, corresponding to a dose of ~3 electrons/Å². In this map, the side chain of isoAsp is more clearly visible (we will include a new figure panel with this density in the supplement). In addition to visual inspection, PHENIX provides a quantitative measure of the fit that also rules out Asn at this position. As we noted in the Methods, “Initial real-space refinement of the 30S subunit against the focused-refined map using PHENIX resulted in a single chiral volume inversion involving the backbone of N119 in ribosomal protein uS11, indicating that the L-amino acid was being forced into a D-amino acid chirality, as reported by phenix.real_space_refine.” Of the 10,564 chiral centers in the 30S subunit, only that for N119 stands out, having an energy residual nearly 2 orders of magnitude larger than the next highest deviation. This stereochemical problem was resolved by modeling isoAsp at this position. We will add these refinement details to the Methods.

      Furthermore, as we noted in the manuscript, isoAsp has been identified in E. coli uS11 by biochemical means (see David, 1999). We examined the phylogenetic conservation of the neighboring sequences in uS11, finding that the N is nearly universal in bacteria and organelles, and D is nearly universal in archaea and eukaryotes (Figure 4 and original Figure 4–figure supplement 1). Finally, even in lower-resolution maps of the archaeal and eukaryotic ribosomes, we find that isoAsp better fits the density, visually with respect to the backbone, and quantitatively based on correlations between RSR models and the density (original Figure 4–figure supplement 2). We therefore think we have been careful in interpreting the isoAsp in uS11, structurally, phylogenetically, and in light of available biochemical evidence. We also provided an in-depth analysis of the neighboring 16S/18S rRNA residues that are in intimate contact with the isoAsp119 region of uS11. See Figure 4B and Supplementary Table 2 and accompanying description.

      mSAsp89: Density for mSAsp89 has been seen previously in the X-ray crystal structure of the 70S ribosome (Noeske, 2015). Here, we also see density for mSAsp89 at lower contour levels. See Figure 1–figure supplement 5. We should have noted in the legend of this panel that we used a lower contour level for mSAsp89 and m7G527, to reveal the modifications. This will be added. Notably, at higher contours that still enclose the standard nucleobase and amino acid side chains, we do not see clear density for the mSAsp89 and m7G527 modifications, in Figure 1–figure supplement 6. In the section of the manuscript covering hypomodifications in RNA, we will clarify this point.

      Pseudouridines: We will clarify how pseudouridines are inferred in the main text. These can be inferred if a solvent molecule or other polar atom is within hydrogen-bonding distance of the N3 in pseudouridine (would be C5 in uridine). We will update Figure 1–figure supplement 5 to better show solvent molecules within hydrogen bonding distance of pseudouridine N3 atoms.
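
      The distance criterion described here can be illustrated with a short sketch that lists polar atoms lying within a hydrogen-bonding cutoff of a chosen base atom. It is a hedged example only: the function name, the 3.5 Å cutoff, the atom labels and the coordinates are all hypothetical, and it is not the procedure used to build the deposited model.

```python
import math


def polar_atoms_within(site_xyz, polar_atoms, cutoff=3.5):
    """Return (label, distance) pairs for polar atoms within `cutoff` Å of a site.

    site_xyz    -- (x, y, z) of the base atom being tested, in Å
    polar_atoms -- iterable of (label, (x, y, z)) for waters or other polar atoms
    cutoff      -- assumed hydrogen-bonding distance limit in Å (hypothetical value)
    """
    hits = []
    for label, xyz in polar_atoms:
        distance = math.dist(site_xyz, xyz)
        if distance <= cutoff:
            hits.append((label, distance))
    # Closest candidate first.
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical coordinates: one water 3.0 Å away, one 5.0 Å away.
site = (0.0, 0.0, 0.0)
neighbors = [
    ("HOH 101 O", (3.0, 0.0, 0.0)),
    ("HOH 102 O", (0.0, 4.0, 3.0)),
]
hits = polar_atoms_within(site, neighbors)
```

      With these made-up coordinates only the first water falls inside the cutoff, so it would be the single candidate hydrogen-bond partner reported for that site.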

      From a methodological point of view it would be interesting to discuss in more detail how this high resolution structure was obtained, what the specific aspects of high-resolution data collection were and which were the important parameters to refine the structure. Also, how were the thousands of water molecules validated? Regarding the discussion on electrostatic potentials, in contrast to what might be intuitive, the contribution of electron scattering is actually stronger at medium resolution, i.e. its effect does not need high resolution per se. The discussion on radiation damage is a hypothesis at this stage and should be done more carefully including processing of the data using less electron dose (see detailed points below). Taken together, this work describes some interesting findings, but some remain unclear in the discussion because for some no biochemical data are available yet. However, this analysis provides useful hints to design future experiments. Also, there are no developments of tools in this paper in contrast to what is stated.

      We will add some additional information to the Discussion and Methods. In terms of the water molecules, we have not gone through these one by one at this point. We actually do not claim to have introduced new tools, but we note that our water modeling spurred the incorporation of phenix.douse into the latest PHENIX releases. This will be more clearly stated, and we will acknowledge Pavel Afonine for helping us as he developed this functionality. (He indicated we should cite Liebschner, 2019.) Solvent modeling is ripe for future development, as we note in the Discussion.

      Although scattering is stronger at medium resolution, it is not absent at < 2 Å. See the recent atomic-resolution structures of ferritin for examples. In fact, we have now examined the 2.1 Å map deposited by Pichkur et al. (Pichkur, 2020), in which the thioamide is barely visible. The thioamide in the 2.2 Å map deposited by Stojković (Stojković, 2020) is not obviously visible. We will add panels showing this in the revised manuscript.

      We have now used the early frames to address the question of ribose damage and the carboxylate of IAS119 in uS11, as noted above.

      Reviewer #3:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation, and of the mode of action of and resistance to antibiotics, have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported in X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al present a landmark work where these limits are pushed even further, reporting a ribosome cryoEM reconstruction with features compatible with an overall resolution in the range of 2 Å, and better than that in the best areas of the map. With this level of detail, many fundamental aspects of translation and antibiotic interaction can be interpreted in physicochemical terms, greatly improving our understanding of this key component of bacterial cells. The manuscript is well presented, with clear evidence supporting the authors' claims and interpretations. Especially remarkable is the detailed and accurate handling of the reference list.

      We thank the reviewer for their interest in our work. In the revision, we will keep the references mostly as-is, but will add a few based on the revisions we need to make.

      Major concern:

      There is an intense debate within the cryoEM community regarding which is the best way to estimate the resolution of a cryoEM reconstruction. In this manuscript, the authors claim map-to-model FSC values could "in a high resolution context [...] provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." Regardless of the opinion of this reviewer about this specific point, if a map-to-model FSC is to be used as a claim of "high-resolution", a convincing test demonstrating the absence of overfitting in the refined model should be presented. Otherwise, map-to-model FSC values may be artificially high due to unrealistic deformation of the model. The authors should thus prove that their refined model is not overfitted.

      This was a concern of all the reviewers, which we addressed above. We think the comparisons to other recent structures, especially the 2.1 Å 50S map by Pichkur et al., make the case for using the map-to-model FSC criterion.

    2. Reviewer #3:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies of translation, and of the mode of action of and resistance to antibiotics, have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported in X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al present a landmark work where these limits are pushed even further, reporting a ribosome cryoEM reconstruction with features compatible with an overall resolution in the range of 2 Å, and better than that in the best areas of the map. With this level of detail, many fundamental aspects of translation and antibiotic interaction can be interpreted in physicochemical terms, greatly improving our understanding of this key component of bacterial cells. The manuscript is well presented, with clear evidence supporting the authors' claims and interpretations. Especially remarkable is the detailed and accurate handling of the reference list.

      Major concern:

      There is an intense debate within the cryoEM community regarding which is the best way to estimate the resolution of a cryoEM reconstruction. In this manuscript, the authors claim map-to-model FSC values could "in a high resolution context [...] provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." Regardless of the opinion of this reviewer about this specific point, if a map-to-model FSC is to be used as a claim of "high-resolution", a convincing test demonstrating the absence of overfitting in the refined model should be presented. Otherwise, map-to-model FSC values may be artificially high due to unrealistic deformation of the model. The authors should thus prove that their refined model is not overfitted.

    3. Reviewer #2:

      The manuscript by Watson et al. presents the structural analysis of a bacterial ribosome at high resolution. The achieved resolution is impressive and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these are missing or not well developed. Instead, the usage of map-to-model Fourier shell correlation (already known in the field) is stressed to estimate the resolution, but it is not clear what this actually brings here as the values are the same when estimated from half map FSCs. The structure visualizes chemical modifications of ribosomal RNA and amino acids and water molecules, which together are interesting and important. However, here one would expect a comparison with structures of previously analyzed bacterial ribosomes, e.g. E. coli and T. thermophilus, e.g. from the same group and from the work by Fischer et al., Nature 2015: how far are the sites conserved? How do the maps compare? Are the same features seen? It is surprising to see that the main chemical modifications are not discussed and shown (only summarized in the Suppl. Data). Pseudo-uridines are mentioned, but how were these identified? It should be mentioned here that due to their isomeric nature these can be discussed only from their typical hydrogen bond pattern. The paper discusses new sites with chemical modifications, but this could benefit from a more thorough discussion of existing biochemical data or from including new biochemical characterization. The structural role of these modifications is not much described. The side chain of IAS119 has no density, hence one should be careful in interpreting an isomerization of this residue, not sure whether the data allow the conclusions to be made. Similar for the mSAsp89 residue for which the density is uncertain, hence it is not clear whether the conclusions stay on a safe ground.

      From a methodological point of view it would be interesting to discuss in more detail how this high resolution structure was obtained, what the specific aspects of high-resolution data collection were and which were the important parameters to refine the structure. Also, how were the thousands of water molecules validated? Regarding the discussion on electrostatic potentials, in contrast to what might be intuitive, the contribution of electron scattering is actually stronger at medium resolution, i.e. its effect does not need high resolution per se. The discussion on radiation damage is a hypothesis at this stage and should be done more carefully including processing of the data using less electron dose (see detailed points below). Taken together, this work describes some interesting findings, but some remain unclear in the discussion because for some no biochemical data are available yet. However, this analysis provides useful hints to design future experiments. Also, there are no developments of tools in this paper in contrast to what is stated.

      Overall, this work appears to be promising, but it could benefit from clearer explanations, further comparisons with previous structures and clearer formulation of the conclusions drawn. There is indeed a significant level of novelty in this study.

    4. Reviewer #1:

      This paper describes a 2 Å cryo-EM reconstruction of the E. coli 70S ribosome. This structure represents the highest resolution ribosome structure, by any method, available thus far and highlights interesting modifications that were not possible to see in previous structures. I'll let the ribosome experts comment on the relevance of these and focus my review on the cryo-EM technical parts. The paper is clearly written and the figures are informative and beautiful.

      Major comments:

      1) The authors make a big deal out of resolution assessment by model-to-map FSCs. It is unclear to me why they do this. First of all, model-to-map FSC is not a new resolution measure: it is in widespread use already. Second, it is unclear why the authors are so forceful in stating that it is better than the half-map FSC. They say "While map-to-model FSC carries intrinsic bias from the model's dependence on the map, in a high resolution context it does provide additional information about the overall confidence with which to interpret the model, not captured in half-map FSCs." What additional information does it provide? I would say it only provides true additional information if the atomic model comes from another experiment! In the way it is used here: by refining the model inside the very same map, there is a danger of increasing model-to-map FSC values through overfitting of the model (see also below). This danger is not recognized enough in the text (it is only hinted at in the sentence above), and overfitting is not measured explicitly for this case. Yes, half-map FSC measures self-consistency, but in practical terms (when done right!), this doesn't matter for the resolution estimate. The same is true for model-to-map FSCs: when done right they convey the right information, but the danger of self-consistency (through overfitting) also exists here. As the paper is mainly about the high-resolution ribosome structure, and no proper evaluation of the relative merits of half-map FSC versus model-to-map FSC is performed, I would suggest that the authors remove (or at least tone down considerably) their statements about resolution assessment from the manuscript.

      2) To test for the presence of overfitting of their atomic models in the maps, the authors should shake up the atomic models and refine them in the first independently refined half-map. The FSC of that model versus that half-map (FSC_work) should be compared with the FSC of that very same model versus the second half-map (FSC_test). Deviations between the two would be an indication of overfitting. If that were to be observed, the weights on the stereochemical restraints should be tightened until the overfitting disappears. The same weighting scheme should then be used for the final model refinement against the sum of the half-maps.
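
      The work/test comparison described in point 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' or the reviewer's pipeline: it assumes NumPy and cubic volumes of equal shape, and the names `fsc` and `overfitting_gap` are hypothetical. The model map is assumed to have been simulated from the atomic model refined against half-map 1, a step that is not shown here.

```python
import numpy as np

def fsc(vol_a, vol_b):
    """Fourier shell correlation between two volumes of identical shape.

    Returns one correlation value per integer frequency shell, from shell 0
    (DC term) up to the Nyquist shell of the smallest axis.
    """
    if vol_a.shape != vol_b.shape:
        raise ValueError("volumes must have the same shape")
    fa = np.fft.fftn(vol_a)
    fb = np.fft.fftn(vol_b)
    # Integer frequency index along each axis, then the radial shell index.
    freqs = [np.fft.fftfreq(n) * n for n in vol_a.shape]
    grid = np.meshgrid(*freqs, indexing="ij")
    shell = np.rint(np.sqrt(sum(g ** 2 for g in grid))).astype(int)
    max_shell = min(vol_a.shape) // 2
    curve = np.zeros(max_shell + 1)
    for s in range(max_shell + 1):
        mask = shell == s
        num = np.sum(fa[mask] * np.conj(fb[mask])).real
        den = np.sqrt(np.sum(np.abs(fa[mask]) ** 2) * np.sum(np.abs(fb[mask]) ** 2))
        curve[s] = num / den if den > 0 else 0.0
    return curve

def overfitting_gap(model_map, half_map_1, half_map_2):
    """Per-shell FSC_work minus FSC_test.

    model_map is assumed (hypothetically) to come from a model refined
    against half_map_1 only; half_map_2 is the independent test map.
    """
    fsc_work = fsc(model_map, half_map_1)
    fsc_test = fsc(model_map, half_map_2)
    return fsc_work - fsc_test
```

      A gap that stays close to zero across shells is consistent with little overfitting, whereas a consistently positive gap at high-frequency shells would suggest that the model has fitted noise in the first half-map. What size of gap should count as acceptable is a judgment that this sketch does not settle.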

      3) Figure 1-supplement 7: if radiation damage breaks the ribose rings, they should still be intact during early movie frames. This could be investigated by performing per-frame (or per-few-frames) reconstructions. The radiation damage argument would be a lot stronger if the density is present in early frames, yet disappears in the later ones. There will be a balance between dose resolution and achievable spatial resolution to see this, of course. But it may be worth investigating.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The bacterial ribosome from E. coli has traditionally been a reference model in structural biology. Basic studies in translation and the mode of action and resistance to antibiotics have greatly benefited from the mechanistic framework derived from structural studies of this cellular machinery. Recently, electron cryo-microscopy has surpassed the resolution limits historically reported in X-ray crystallography studies of bacterial ribosomes. In the present manuscript, Watson et al present a landmark work where these limits are pushed even further, reporting a ribosome cryo-EM reconstruction with an overall resolution of 2Å, and even better than that in the best areas of the map. The achieved resolution is impressive and one thus expects major findings, methodological highlights and comparisons with previous structures. However, these could be better developed. Instead, the usage of map-to-model Fourier shell correlation (already known in the field) is stressed to estimate the resolution, but it is not clear what the advantage is here as the values are the same when estimated from half map FSCs. Therefore, it is suggested that the discussion about the model-to-map FSC is toned down considerably (or even removed), while adding more information about the new findings in the map, along the lines of the comments below.

    1. Reviewer #3:

      In this paper, the authors propose an automated method to parcellate subcortical structures of the brain given a set of manual delineations. One of its strongest points is the adoption of a Bayesian approach, combining priors from brain anatomy and MRI acquisition. These priors are then used to estimate the posterior probabilities per voxel, which after a series of operations provide a final subcortical parcellation. The paper appears technically sound and the proposed method potentially relevant, given the importance of having competent tools to find good subcortical brain delineations, especially in high resolution datasets.

      I have some possible concerns and suggestions that might increase the quality of the paper:

      -From Figure 4, it is clear that the estimated Dice coefficients decrease with age. As the authors rightly note, this is likely because the priors were built from 10 subjects with an average age of 24.4 years, so the highest predicted performance is obtained for subjects whose age range (18-40) lies around this average prior age. I know that the authors mention in the paper that they plan on modelling the effects of age in the priors in future work. However, I was wondering whether they could already address this question in the current work. Since the data used to test this age bias have already been manually delineated, what if the authors generate new priors from this set of delineations, including subjects of all ages, and test whether the predicted Dice coefficients still depend on age, in the same way as was done in Figure 4?

      -Automated methods are usually sensitive to the number of subjects used to build the parcellation, with results from a bigger training cohort being potentially more robust and generalizable. As said earlier, I think that one of the strongest points of the automated method presented in this paper is the adoption of a Bayesian approach, which usually works efficiently for small sample sizes and allows previous results to be updated when new data arrive. Still, I think it could be highly illustrative to show the performance of the current method as a function of the initial training size. From the same set of delineations of the 105 subjects used to test the age bias, what if the authors show the predicted performance when the priors are generated from training sets of varying size?

      -What is the value for the scale parameter delta that appears in the priors? Is that a free parameter? If so, do results change when this parameter varies?

    2. Reviewer #2:

      In the present manuscript, Bazin and colleagues describe an automated computational approach to segment 17 subcortical nuclei from individual 7T quantitative MRI derivations. To this end, they trained a Bayesian "Multi-contrast Anatomical Subcortical Structure Parcellation (MASSP)" algorithm. They validate the approach in a leave-one-out fashion trained on 9/10 high-resolution scans. They assess age-related bias and report that dilated Dice overlap, allowing 1 voxel of uncertainty, demonstrates very high accuracy of segmentations when compared to expert delineation.
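
      For readers unfamiliar with the overlap measures mentioned here, the following sketch contrasts plain Dice overlap with a one-voxel-tolerant variant. The exact definition of "dilated Dice" used in the manuscript is not given in this review, so the version below (a voxel counts as matched if it lies within one 6-connected step of the other segmentation) is an assumption, and the function names are hypothetical.

```python
from itertools import product

def dice(a, b):
    """Plain Dice overlap between two sets of (x, y, z) voxel coordinates."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))

def dilate(voxels):
    """Grow a voxel set by one step along each axis (6-connectivity, assumed)."""
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    grown = set(voxels)
    for x, y, z in voxels:
        for dx, dy, dz in offsets:
            grown.add((x + dx, y + dy, z + dz))
    return grown

def dilated_dice(a, b):
    """Dice-like overlap that tolerates one voxel of boundary uncertainty.

    Assumed definition: a voxel of one segmentation counts as matched when
    it falls inside the other segmentation dilated by one voxel.
    """
    if not a and not b:
        return 1.0
    matched = len(a & dilate(b)) + len(b & dilate(a))
    return matched / (len(a) + len(b))

# Two 4x4x4 cubes offset by one voxel along x.
cube = set(product(range(4), repeat=3))
shifted = {(x + 1, y, z) for x, y, z in cube}
```

      For these two cubes, plain Dice penalizes the one-voxel shift while the tolerant variant does not, which illustrates why the two measures can diverge for small structures.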

      This is straightforward work. It would certainly benefit from an additional step of out-of-center / out-of-cohort validation, but I have no serious concern that performance would be unsatisfactory. The most important limitation is acknowledged, which is the bias from anatomical variation through age or disease. The algorithm is shown to be affected by age and most certainly will be affected by contrast and size changes in neurodegenerative disorders.

      The authors certainly know their field and are a driving force in open 7T research of the basal ganglia.

    3. Reviewer #1:

      The main criticisms of the work fall under categories largely centered on how the method is evaluated, rather than fundamental concerns with the method itself.

      Major concerns:

      1) Relative effectiveness. While a critical advancement of this method is the ability to segment many more regions than previous subcortical atlases, there are still many regions that overlap with existing segmentation tools. Knowing how the reliability of this new approach compares to previous automatic segmentation methods is crucial in being able to know how to trust the overall reliability of the method. The authors should make a direct benchmark against previous methods where they have overlap.

      2) Aging analysis. The analysis of the aging effects on the segmentations seemed oddly out of place. It wasn't clear if this is being used to vet the effectiveness of the algorithm (i.e., its ability to pick up on patterns of age-related changes) or the limitations of the algorithm (i.e., the segmentation effectiveness decreases in populations with lower across-voxel contrast). What exactly is the goal with this analysis? Also, why is it limited to only a subset of the regions output from the algorithm?

      3) Clarity of the algorithm. Because of the difficulty of the parcellation problem, the algorithm being used is quite complex. The authors do a good job showing the output of each stage of the process (Figures 7 & 8), but it would substantially help general readers to have a schematic of the logic of the algorithm itself.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Timothy Verstynen (Carnegie Mellon University) served as the Reviewing Editor.

      Summary:

      In this study, Bazin and colleagues propose a novel segmentation algorithm for parcelling subcortical regions of the human brain that was developed from multiple MRI measures derived from the M2RAGEME sequence acquired on a 7T MRI system. The key advancement of this approach is a reliable segmentation of more subcortical areas (17 regions) in native space than what is possible with currently available methods. The authors validate their algorithm by comparing against age-related measures.

      This manuscript was reviewed by three experts in the field, who found that this method has strong potential to be a new "workhorse" tool in human neuroimaging that could substantially advance our ability to measure brain structures that are largely overlooked due to problems with segmentation. The main criticisms of the work are largely centered on how the method is evaluated & implemented, rather than fundamental concerns with the validity of the method itself.

    1. Reviewer #3:

      This paper presents a neural network based approach to predict the retinotopic organization of the human visual cortex from structural MRI data. The authors are promoting the use of non-Euclidean/geometric deep learning methods for this problem. They apply their technique to the HCP data and show some interesting results, which they claim demonstrates that functional organization in the visual system can be predicted at the individual level. For me, the paper has several substantial and important flaws.

      First, one of the most important contributions of the paper is the promotion of geometric deep learning. To me, the value of this framework has not been demonstrated with the experiments. In order to assess the additional boost afforded by geometric techniques, one would need to establish a baseline with a Euclidean model. Without this comparison, it is impossible to evaluate the value of this innovation.

      Second, in general, I did not find the quality of the individual-level predictions and the presented quantitative results convincing or impressive. In Figure 3, for example, I'd like to see the underlying sulcal geometry (of each subject) to assess the value of the presented "individualized" predictions. Also, the quality of the predictions, as the authors acknowledge, is significantly reduced in large parts of the cortex, including higher order areas. Importantly, though, it is not clear how much of the individual variability is truly captured in these predictions. For example, the error maps in Figure 6 for the "shuffled" and "constant" cases look very similar to the actual error maps. And quantitatively, the overall error values are very close for these cases. This suggests that the predicted retinotopic maps are not much better than a simple group average retinotopic map. One way to counter this concern would be to conduct a fingerprinting/identifiability experiment and demonstrate that the predicted maps are much closer to the observed/measured/estimated (ground truth) maps for the same individual than other individuals. Without such an analysis, it is impossible to assess how much of individual variation is captured.
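
      The fingerprinting/identifiability experiment suggested above could be sketched along the following lines. This is an illustrative outline only: maps are represented as flat lists of per-vertex values, the error measure is a plain mean absolute difference (which ignores the wrap-around of polar angle), and both function names are hypothetical.

```python
def mean_abs_error(map_a, map_b):
    """Mean absolute difference between two maps given as equal-length lists."""
    return sum(abs(x - y) for x, y in zip(map_a, map_b)) / len(map_a)

def identification_accuracy(predicted, measured):
    """Fraction of subjects whose predicted map is closest to their own measured map.

    predicted[i] and measured[i] are assumed to belong to the same subject.
    """
    hits = 0
    for i, pred in enumerate(predicted):
        errors = [mean_abs_error(pred, m) for m in measured]
        best = min(range(len(errors)), key=errors.__getitem__)
        hits += int(best == i)
    return hits / len(predicted)
```

      If every subject receives essentially the same prediction (for example, a group-average map), the accuracy falls to roughly chance level, which is the situation the reviewer is concerned about.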

      The proposed smooth L1 loss was not properly justified and seems inappropriate. The threshold of 1 seems arbitrary. In fact, the cyclical nature of polar angle should require a cyclical loss function. However, this is a minor concern.
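
      To make the concern about cyclical targets concrete, the sketch below compares a conventional smooth L1 loss with a variant applied to the wrapped angular difference. The piecewise form used here is the common definition of smooth L1 with threshold `beta`; whether it matches the manuscript's implementation is an assumption, and the function names are hypothetical.

```python
def smooth_l1(diff, beta=1.0):
    """Common smooth L1 form: quadratic below beta, linear above it."""
    a = abs(diff)
    if a < beta:
        return 0.5 * a * a / beta
    return a - 0.5 * beta

def angular_difference(pred_deg, target_deg):
    """Smallest signed difference between two angles, in [-180, 180) degrees."""
    return (pred_deg - target_deg + 180.0) % 360.0 - 180.0

def circular_smooth_l1(pred_deg, target_deg, beta=1.0):
    """Smooth L1 applied to the wrapped angular difference."""
    return smooth_l1(angular_difference(pred_deg, target_deg), beta)
```

      With a prediction of 359 degrees and a target of 1 degree, the plain loss treats the error as 358 degrees, whereas the wrapped version treats it as 2 degrees.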

      The need for dropout was also not demonstrated. Was there a concern of overfitting? Showing learning curves (for training and validation data) would help with that.

      Choosing the best model based on validation loss can be improved with a "deep ensemble" strategy.

      In the shuffling procedure, spatial correlation structure seems to have been destroyed. A better approach would be to randomly deform/rotate the structural image.

      Setting the structural data to zero at input and assessing test time performance makes no sense and provides no real value.

      I suggest that authors make their code available during peer review too. Otherwise, it is impossible to assess the reproducibility of their work.

      Finally, I believe 10 is too small for the test dataset. A widely accepted convention is to use at least 10% of the total dataset for testing. I would recommend using 20 or 30 subjects for testing.

    2. Reviewer #2:

      The authors use deep learning to map brain anatomy (cortical curvature and myelination) to retinotopic maps (eccentricity and polar angle) in individual subjects.

      My overall assessment of this work is that, although the idea is neat, the execution seems a bit rushed and lacks somewhat in depth of analysis.

      More specifically:

      1) This is my main concern: The evaluation of the method's ability to find fine-grained individual differences is somewhat anecdotal and not strongly backed by rigorous analyses.

      -The idiosyncratic differences shown in Fig4a are intriguing but they could also simply be explained by gross differences in the gyral patterns of these subjects.

      -The differences between the predictions of different subjects are much lower than the within-subject prediction errors.

      -The authors should make these evaluations more quantitative. For example, by delineating several visual areas in the empirical datasets and predicted maps (in a blinded manner) and checking to see if the sizes of the different visual areas are well predicted at an individual level. This could even be built up in the model as a classifier for different visual areas.

      -Using shuffled features as some sort of null is not appropriate in my opinion, as that breaks the statistics of the input. In fact, I am amazed that it has any predictive power at all, which it clearly does seeing that the prediction errors are similar to the empirical data (Fig 6). Why is that? Is it the case e.g. that the model learns the relation between where the edges of the visual areas mask are and the retinotopy map? What happens if you give the model a mask as input that is completely different (e.g. arbitrarily expanded or contracted)? My guess is that the predictions will be vastly different and distorted.

      2) It is really unclear what the approach achieves beyond finding the border between primary regions V1,V2, and V3.

      -The authors should consider delineating more areas in the empirical data and showing that their predictions cover the full 0-360 and 0-12deg range in both dimensions. This analysis would greatly inform the individual variations mentioned above.

      -One interesting suggestion by the authors is that dorsal areas in the IPS actually have bad empirical retinotopy data (indeed these areas might need specialised tasks, e.g. involving attentional components, i.e. attending to parts of the visual field [see Sereno et al.]). In fact the empirical data seem to predict that these regions cover a different hemifield in the shown test subjects, which is not what is expected. It would be interesting to see if the model proposed here does indeed predict, e.g. polar angle reversals in IPS1,2,3.. (I can see a hint of it in Fig3). To me, even without empirical data to compare to, this would be a strong suggestion that the authors may be capturing some genuine structure-function relations.

      3) Some discussion around the modelling/quantification is lacking:

      -Errors in the polar angles are really high (~30deg even in V1).

      -Related to a sub-point in comment (1): why does shuffling work? Can the authors show the actual predictions of the shuffled data (as opposed to the errors) - do they look like retinotopy maps?

      -Do we need deep learning? Previous work has shown simple relations between V1,V2,V3 and the geometry of the brain. Does this model actually capture more fine-grained features?

      -I would have set it up as a regression against x,y coords in the visual field rather than polar coords (which have obvious wrap-around problems). This will avoid the use of tricks like rotating the visual field before training, as the authors did.

      -The obvious deep-learning question: learning such a highly parameterised model based on 180*2 hemispheres sounds hard. What evidence is there that this is not overfitting?

      -The authors mention in the methods that the 3D coordinates were also used as features, but in their Fig2 it looks like the features are only curvature+myelin: which is it? Are 3D coords used as explicit features?
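
      The suggestion in point 3 to regress against x,y coordinates in the visual field, rather than polar coordinates, amounts to the following change of variables. The sketch assumes eccentricity in degrees of visual angle and polar angle in degrees measured counter-clockwise from the positive x axis; these conventions and the function names are assumptions made for illustration.

```python
import math

def polar_to_xy(eccentricity, angle_deg):
    """Convert (eccentricity, polar angle in degrees) to Cartesian (x, y)."""
    theta = math.radians(angle_deg)
    return eccentricity * math.cos(theta), eccentricity * math.sin(theta)

def xy_to_polar(x, y):
    """Convert Cartesian (x, y) back to (eccentricity, polar angle in [0, 360))."""
    eccentricity = math.hypot(x, y)
    angle_deg = math.degrees(math.atan2(y, x)) % 360.0
    return eccentricity, angle_deg
```

      In this representation, two targets at polar angles of 359 and 1 degrees with the same eccentricity are close neighbours, so an ordinary regression loss would not need special handling of the wrap-around.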

      4) Show the data:

      -It would be good to see the features going into these predictions and the relationship with the targets. Maybe even scatter plots of curvature/myelin vs polar angle?

      -Subjects shown (test set) have very noisy maps outside the early visual cortex. Where are they in the subject-distribution of variance explained? (Benson J Vis 2018).

    3. Reviewer #1:

      The manuscript by Ribeiro, Bollmann and Puckett uses machine learning to predict, across individuals, the retinotopic mapping from the cortical myelin and curvature maps. The authors use a sophisticated method (convolutional network on the graph, called here geometric deep learning) and show appealing predicted maps of individual retinotopy in V1. While the work is interesting, the quality of the results is disappointing and the positioning relative to the literature is imprecise.

      The authors claim that their model is "able to predict retinotopic organization far beyond early visual cortex, throughout the visual hierarchy". However the figures do not seem to support this claim: the qualitative figures do not show a clear structure in the higher-level regions.

      Figure 1 is appealing, however it should be compared to a simple average of all retinotopic maps. Likewise, the quantitative results in supplementary table 2 do not come with a comparison to the mean predictor (as with an R2 score), and it is not possible to judge whether these numbers are a good performance or not.
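
      A comparison to the mean predictor of the kind requested here can be expressed as an R2-style score in which the reference is the group-average map rather than a constant. The sketch below is one possible formulation, not a description of the authors' analysis: maps are flat lists of per-vertex values, squared differences ignore angular wrap-around, and the function name is hypothetical.

```python
def r2_against_group_mean(measured, predicted, group_mean):
    """R2-style skill score of a prediction relative to the group-average map.

    Returns 1.0 for a perfect prediction, 0.0 when the prediction is no
    better than the group-average map, and a negative value when it is worse.
    """
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_ref = sum((m - g) ** 2 for m, g in zip(measured, group_mean))
    if ss_ref == 0:
        raise ValueError("measured map equals the group mean; score undefined")
    return 1.0 - ss_res / ss_ref
```

      Reporting such a score per subject and per visual area would make it possible to judge whether the errors in supplementary table 2 reflect individual-level information beyond the average map.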

      Rather, figure 6 shows that the models trained on shuffled and constant data perform qualitatively and quantitatively well. The proposed model does perform slightly better, but the statistical and practical significance of this improvement is unclear. The manuscript makes no clear attempt at judging the statistical significance, and the small number of participants in the test set (10), makes it unlikely that significance would be attained. It would be beneficial to perform a complementary analysis on a larger cohort, for instance using the 3T HCP data, at the cost of lower-quality data.

      There have been many prior works that have shown the ability to predict functional organization from other mapping information. In this respect, the positioning of the present manuscript with regards to the literature is very unclear. The manuscript does acknowledge some prior work, including work using template warping, but claims that they have not "been able to capture the detailed idiosyncrasies seen in the actual measured maps of those individuals". However, no precise argument is brought forward: no quantitative measure can be compared with the prior publication, no comparison is performed. Also, individual task functional topography has been inferred from other information such as anatomical connectivity [Saygin 2012], resting-state activity [Tavor 2016], or movie watching [Eickenberg 2017]. A discussion of the relative accuracy, or pros and cons would have been interesting here.

      With this in mind, the title feels much too general: "Predicting brain function from anatomy using geometric deep learning"

      As a minor comment: controlling for the twin structure could be done in a more powerful way by isolating siblings in each of the train, validation, and test set so that there is one pair separated across sets.

      [Saygin 2012] Saygin, Zeynep M., et al. "Anatomical connectivity patterns predict face selectivity in the fusiform gyrus." Nature neuroscience (2012)

      [Tavor 2016] Tavor, I., et al. "Task-free MRI predicts individual differences in brain activity during task performance." Science 2016

      [Eickenberg 2017] Eickenberg, Michael, et al. "Seeing it all: Convolutional network layers map the function of the human visual system." NeuroImage (2017)

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Gaël Varoquaux (INRIA) served as the Reviewing Editor.

      Summary:

      The reviewers all expressed interest in the research agenda as well as the methods. However, it was felt that the results did not demonstrate a clear and sufficient improvement with regards to prior art. On the methodological side, the benefit of the deep-learning formulation was not clearly revealed. On the neuroscience side, the evidence that the method captures fine inter-individual differences was felt insufficient.

    1. Author Response

      Summary:

      This is an interesting and creative paper implicating a differential mechanism of intracellular trafficking and subsequent signaling that is triggered by different dynorphins binding to the kappa opioid receptor. In principle, if the authors could explain the molecular basis for this phenomenon, the story would be of tremendous impact in the fields of opioid receptor signaling and trafficking. The reviewers noted a number of concerns that would require significant further work and clarification to support the authors' conclusions.

      We are very happy that you and the reviewers found that the study could be of tremendous impact and describe the paper as “interesting and creative”, “novel and intriguing”, “fascinating and novel”, and feel that the study was “nicely conducted”. We appreciate the comments of the reviewers, and we are confident that we can address the comments as below.

      Reviewer #1:

      General assessment: In this manuscript the authors have assessed the different endocytic routes of KOR when activated by DynA or DynB. These are nicely conducted experiments that show interesting results, however the authors completely obviate the connection with their own work that highlights the different degradation mechanisms of these two peptides. As it stands it does not add to the field, and lacks a mechanistic explanation that could be explored given the authors’ expertise in these systems.

      We thank the reviewer for the positive comments. We are happy that the reviewer felt that the experiments are nicely conducted, and that the results are interesting. However, we respectfully but strongly disagree with the comments that our study does not add to the field.

      First, considering the extended and severe opioid epidemic, understanding the many ways in which the opioid peptide/receptor system is modulated is of high priority. Endogenous opioid peptides are highly relevant neuromodulators about which we know even less than opioid drugs. Why there are over 20 different endogenous opioid peptides but only three receptors has been a question that has been unanswered for decades. We show that two highly related endogenous opioids initially activate KOR to similar levels but subsequently diverge in trafficking and endosomal signaling. We feel that this is a clear advance in the field of opioids and GPCRs.

      Second, the idea that location-biased signaling can lead to different consequences for the same agonist is still a relatively new idea, and clearly a very important area of continuing research. Even for well-studied systems like the adrenergic receptor system, we know very little about the mechanisms or the relevance of differential signaling. Demonstrating that endogenous opioids take advantage of location bias to generate distinct signaling consequences is a clear indication that such differential trafficking and signaling is physiologically relevant. Considering that opioid receptor trafficking has been implicated in opioid signaling and tolerance (although again, the mechanisms are debated), showing that different endogenous opioids can regulate localization and trafficking of the same receptor is a key advance.

      Numbered summary of substantive concerns:

      1) The major conclusion of the study is that after endocytosis, DynA preferentially sorts KOR into the degradative pathway, while DynB sorts KOR into the recycling pathway and this has consequences in the duration of the active state of the receptor and its ability to signal. It is surprising that the authors do not investigate the connection between these results and previously published work that shows differences in the degradation of DynB vs DynA within endosomes. Indeed, the authors have previously shown that: i) ECE2 hydrolyzes DynB and not DynA (Mzhavia et al JBC 2003), ii) overexpression of ECE2 increases the rate of mu-opioid receptor recycling upon DynB stimulation (Gupta et al BJP 2015) and iii) inhibition of ECE2 decreases mu-opioid receptor recycling (Gupta et al BJP 2015). Considering this previous work, it is totally expected that the two ligands show distinct post-endocytic trafficking of KOR.

      The reviewer cites data that the surface recovery rates of a different GPCR (MOR) are regulated by ECE2, and that ECE2 differentially processes Dyn A and B, to argue that it is expected that the two ligands will direct KOR to different subcellular localizations. While our results certainly could be one logical outcome of previous data, we disagree that it is a foregone conclusion.

      Specific to the reviewer’s assessment of our previous work, we were never able to test DynA previously because traditional assays did not have the sensitivity to resolve DynA-mediated recycling or trafficking. This limitation precluded the key comparison, between DynA and DynB, necessary for addressing differences between these two physiologically relevant opioid peptides. Here we use advanced high-resolution imaging experiments to carefully address how DynA and DynB diverge in directing KOR trafficking and signaling.

      More generally, we have known for over a decade that the rates of GPCR recycling can be regulated by signaling pathways without changing sorting, endosomal localization, or fates (e.g., PMID: 16604070, PMID: 27226565, PMID: 25801029, PMID: 24003153). Further, many recent studies have highlighted that the details of how GPCRs are regulated and how that affects their function diverges considerably between different receptors, even though the gross signaling characteristics are nearly identical. Therefore, it is becoming increasingly clear that we cannot apply our understanding of one GPCR too broadly to argue that we expect all GPCRs are regulated in the same manner.

      We also appreciate the reviewer’s interest in the question of whether and how ECE2 regulates location-specific signaling, and we agree that it will be very exciting to study. This is particularly important since ECE2 is not ubiquitously expressed in every cell type in the brain and thus cells with no/low ECE2 expression should exhibit different profiles for recycling or location-based signaling by DynA and DynB compared to cells expressing moderate/high levels of ECE2.

      Nevertheless, we disagree with the reviewer’s assumption that there is an obvious correlation. ECE2 sensitivity for opioid peptides was estimated using purified peptides and enzymes, and there is no evidence that the selectivity persists in vivo. In fact, most of the previous studies measured simply the sensitivity to overexpressed ECE2. Even within these constraints, the correlation is not obvious or direct. For example, we have found that BAM22 and BAM18, two peptides that activate opioid receptors, show much lower recycling of KOR than DynB (Gupta, Gomes and Devi, INRC 2019, manuscript in preparation) even though all three are ECE2 substrates (PMID: 12560336). Therefore, it is unlikely that ECE substrate sensitivity is the only difference between these peptides.

      We will be happy to provide some insight on the question of ECE sensitivity and discuss possibilities, but we feel that a thorough characterization of how ECE regulates location-specific signaling, while interesting, is outside the scope of our study that demonstrates a physiological difference between two different endogenous opioids in neurons.

      Most importantly, we respectfully feel that following up and demonstrating a logical conclusion is a strength, and should not be viewed as a negative. Clearly differentiating and establishing predicted outcomes is a critical part of advancing biology. Acknowledging and supporting this is especially important in these times where there is a clear effort and an opportunity to make academic publishing open and fair.

      2) Similarly, the differences in ECE2 sensitivity can also explain the Nb39 results, with KOR activated by the ligand that is not hydrolysable (DynA) being able to remain in the active state (and signal) for longer than when activated with the hydrolyzable ligand (DynB).

      As described in the response to #1, we agree that it is possible that the trafficking and signaling differences we see could correlate with ECE2 substrate sensitivity. Again, we feel that the focus of the manuscript is on signaling differences between endogenous opioids, and not on how ECE inhibition regulates location-specific signaling.

      3) A simple experiment to address this obvious connection is to use an ECE2 inhibitor. One would expect that in the presence of this inhibitor DynB-activated KOR is retained intracellularly and remains active for longer.

      We agree that ECE inhibitors are important tools to manipulate recycling. As mentioned above, we can provide some insight towards the correlation of ECE sensitivity and trafficking and discuss possibilities, but an in-depth characterization of how ECE proteases regulate GPCR location-specific signaling is not the focus of our study.

      4) The authors state "this is the first example of different physiological agonists driving spatial localization and trafficking of a GPCR". In light of the above comment, previous work from Bunnett et al. has shown how peptides with different endocytic enzyme sensitivity can indeed localize GPCRs (e.g., somatostatin receptor) in different compartments and elicit distinct signals (Padilla et al J Cell Biol 2007; Roosterman et al PNAS 2007; Zhao et al JBC 2013, to name a few).

      We were quite taken aback by this comment. We take previously published work very seriously, and we try to be as fair as possible when we describe them. We will be happy to modify the sentence to match the current literature.

      We carefully searched through the papers the reviewer pointed out for an example where two physiological agonists drive different spatial localization and signaling of the same receptor. But we could not find one. Padilla et al., 2007, show that the recycling of CLR, activated by the ECE1-sensitive CGRP, is sensitive to ECE inhibition, but that the recycling of angiotensin receptor or bradykinin receptor, whose ligands are not sensitive to ECE, is not. Similarly, Roosterman et al., 2007, focus on how NK1 receptor recycling is sensitive to ECE1 inhibition. To the best of our knowledge, neither paper shows that spatial localization or location-biased signaling of a given GPCR is regulated differentially by two different endogenous agonists.

      The closest experiment we could find is in Fig 2, titled “Agonists induce endocytosis of SSTR2A in myenteric neurons”, in Zhao et al JBC 2013. This figure shows that, when cells exposed to SST14 or the pro-peptide SST28 for 1 hour at 4°C are followed at 37°C and fixed, SSTR labeling at the plasma membrane and cytoplasm is similar at 30 min, but diverges after that. As far as we could determine, receptor recycling, the precise endosomal distribution, and signaling were not tested in this manuscript.

      Therefore, we respectfully submit that the manuscripts the reviewer points to, which describe how the recycling of a receptor that binds an ECE-sensitive peptide is sensitive to ECE inhibition, should not be conflated with our careful analysis of whether different endogenous opioids can drive different spatial localization and signaling fates of the same opioid receptor.

      We would, however, be happy to modify the sentence to state the impact of our work more precisely and to discuss the details on SSTR trafficking in the revised manuscript. If the reviewer would point us to specific examples that show that subcellular localization and spatially restricted signaling of a given GPCR are regulated differentially by two different endogenous agonists, we will be more than happy to include a discussion of that work.

      5) Support for endosomal signalling falls a bit short. For example, if indeed KOR signals from endosomes, the authors should use an inhibitor of receptor internalization and assess Nb39 recruitment and KOR signalling.

      We agree this experiment will support the conclusion, and we will be happy to provide this data.

      Reviewer #2:

      This manuscript demonstrates that two highly similar endogenous opioid agonists can give distinct opioid receptor trafficking and signaling fates. There are two key observations that are novel and intriguing: 1) two opioid peptides that are derived from the same precursor can distinctly modulate Kappa Opioid receptor (KOR) trafficking into two distinct pathways; Dynorphin A causes KOR trafficking to the late endosomes/lysosomes pathway whereas Dynorphin B promotes rapid recycling; 2) Dynorphin A activates Gi proteins on the late endosomes/lysosomes which leads to Gi-mediated cAMP inhibition from these compartments.

      The idea that GPCRs can activate G proteins at the late endosome/lysosomal compartments is fascinating and novel; however, the data presented here do not fully support their model that Dynorphin A activates Gi proteins on the late endosomes/lysosomes.

      We are very happy that the reviewer found our study fascinating and novel. We thank the reviewer for the comments, and we can address them as follows.

      Main questions:

      1) There is a mismatch with the timing of receptor colocalization experiment (Fig 3B and C, 20 min Dynorphin A/B treatment) and the cAMP assay (Fig 3H, 5 min treatment). There needs to be direct evidence that KOR is localized on the late endosomes/lysosomes at 5 minutes post agonist stimulation, i.e. at the time that cAMP levels are measured. It is important to demonstrate that the sustained signaling inhibition by DynA comes from the late endosomes/lysosomes as opposed to early endosomes. A colocalization experiment with 5 min DynA stimulation followed by a 25min washout would be necessary to support their model.

      We agree that this is a good point, and we will be happy to perform the experiment suggested. In addition, we can also provide live cell imaging data, where we simultaneously localize the nanobody that recognizes active KOR with a lysosomal marker and KOR, to show that they colocalize after DynA treatment.

      2) What percentage of KORs are proteolytically degraded in the late endosomes/lysosomes at 20 min DynA stimulation?

      At 20 min, although some of the receptors reach the lysosome, it is unlikely that there is significant degradation. This is supported by our blots that show similar levels of KOR expression at 30 minutes, and loss of receptor levels at 2 hours. This is also roughly consistent with previous studies on GPCR degradation. We will include these details in the revised manuscript.

      3) Given that KOR trafficking to the late endosomes and lysosomes is mediated by ubiquitination (as shown here PMID: 18212250), does mutation of these ubiquitination sites (3 lysine residues on KOR C-terminus) block its trafficking and the sustained signaling from the late endosomes/lysosomes?

      The reviewer raises an interesting topic that has been a subject of considerable debate in the GPCR trafficking field. The mutation of the three lysine residues on the KOR C-terminus causes more residual KOR levels after 4 hours of DynA, suggesting that degradation/downregulation of KOR is reduced in these mutants, even though internalization is comparable. For some opioid receptors, although ubiquitination might be required for involution and entry into the intralumenal vesicles, lysosomal localization is arguably independent of ubiquitination. Ubiquitination and/or lysine residues that interact with Ub-transferases could also affect downstream signaling, especially in the endosomes, by some GPCRs. Therefore, we feel that interpretation of results from the lysine mutant receptors will not be straightforward. Nevertheless, we appreciate that this is an interesting point, and we will address this in the revised manuscript.

      4) Is there any evidence for Gi protein localization on the late endosome/lysosomes?

      This is another interesting point raised by the reviewer, as the majority of endosomal signaling data rely on Gs-coupled or Gq-coupled receptors. However, Gi-coupled GPCRs, such as the cannabinoid receptor or the related mu opioid receptor, can exist in the active conformation in endosomes (e.g., PMID: 18267983, PMID: 29754753), and internalization is required for sustained cAMP inhibition for the Class B S1P receptor (PMID: 24638168). These provide indirect evidence that Gi proteins might be present and active on endosomes.

      Unfortunately, directly testing whether Gi proteins are active on endosomes has been technically challenging, unlike with Gs proteins. The main limitation has been the lack of conformation-sensors for Gi proteins. We will be happy to discuss these points in the revised manuscript.

      5) Additional functional readouts would also be helpful to support their model of Gi-mediated inhibition of cAMP response from late endosomes/lysosomes and not the plasma membrane or early endosomes. Perhaps mTOR activation (as authors have suggested in their discussion) could be used as a read out to show differences between DynA and B-mediated signaling?

      We will be happy to test endosome-based mTOR signaling downstream of KOR to see if there is a difference between DynA and B. Since our data already suggest that the main impact might be on cAMP signaling, we will also discuss the implications to cAMP signaling.

      Reviewer #3:

      This is an interesting idea and creative paper implicating a differential mechanism of intracellular trafficking and subsequently signaling that is triggered by different dynorphins binding to the kappa opioid receptor. However, there are some questions for the authors:

      We thank the reviewer for the comments that the paper is interesting and creative, and for the critique. We are confident that we can fully address them as follows.

      1) My reading is that some dynorphins are extremely rapidly degraded in serum and with these experiments performed in 15% Horse/FCS there is concern that some of the differential results could be explained by differential degradation. One hypothesis could be a differential frequency of receptor activation over time of a fast recycling receptor population. Can the authors convince me that this difference in trafficking and subsequent signaling is an intrinsic property of the peptide and not an exhaustion of peptide (would be DynB) over the 30min assay?

      We agree this is an important point, and we apologize for not specifically addressing this point. For the trafficking experiments, we directly compared results from experiments done with and without protease inhibitors. We saw no difference between the two conditions, possibly because we were using short time points, high enough concentrations, and dialyzed serum. We agree that it will be important to include these data in the revised manuscript. The signaling experiments, which required longer incubations, were performed in the presence of protease inhibitors, consistent with previous studies.

      2) In Fig 2D, 2G and 2J, at what time after addition of peptides were these data obtained?

      For measuring individual recycling events (2D and G), cells were treated with agonist for 5 minutes at 37°C. Receptor clustering was visualized using TIRF microscopy, and then a recycling movie was recorded at 10 Hz for 1 minute in TIRF. For 2J, we measured 2 time points, 30 min and 120 min after agonist addition. We apologize for not stating these details in the figure, and will be happy to do so.

      3) In Fig 2F the divergence of internalized receptor only occurs from time 20-30 mins which was difficult for me to understand since DynA should result in lost surface receptor number. What confuses me is that in Fig2H the initial recycling induced by DynA17 is fast and slows down so I am wondering if a second hit is needed which feeds into my concern about peptide degradation in the media. Since released peptide would be pulsatile maybe in vivo DynA17 could act like DynB?

      We realize that a better explanation is needed for the recycling experiment performed in 2F. The cells were imaged for a period of 2 minutes to collect baseline SpH fluorescence, which corresponds to the steady-state amount of KOR on the cell surface. After this period, cells were imaged for 15 min after DynA or DynB was added. In this period, because internalization is the predominant factor affecting surface levels, we see a loss in fluorescence as the receptors are internalized and SpH is quenched in the relatively acidic compartments. Because KOR internalization rates are not dramatically different between DynA and B, we do not expect the fluorescence traces to be different. The agonist was then washed out at this time (t=17), and cells were imaged in media containing antagonist. Because there is very little agonist-induced internalization after this point, the fluorescence change depends predominantly on reappearance of receptors via recycling. Therefore, if the main difference between DynA and DynB is in KOR recycling, we expect to see a divergence only in the late points of the trace.

      We thank the reviewer for carefully viewing the traces in 2F and 2H. We understand the interpretation that there might be fast and slow components to DynA induced recycling. While it certainly is possible, we are not comfortable making a strong conclusion on that, based on the sensitivity of the assays used and the variability between cells.

      As mentioned in point #1, however, it is unlikely that this divergence in recycling is due to significant degradation of DynA. Nevertheless, it is an important point to discuss in light of the new data we provide, and we will be happy to explain this in detail.

      4) The assays seem to be done with a single concentration of peptide - 1µM. Do the authors have data to show that lower (or higher) concentrations than 1µM result in the same trafficking patterns, albeit to a lesser or greater extent? Also, for the cAMP inhibition, what concentration gives max inhibition? For a binding affinity of 0.01nM in the cells and with high expression, the 1µM concentration seems high.

      We used the 1µM dose based on careful dose-response measurements for cAMP signaling. Part of the dose-response data has been published (PMID: 32393639). We will be happy to provide the extended data, and also provide a dose-response for trafficking. It is possible that the dose is what helps us mitigate potential degradation of the peptides.

      5) In Fig 2H, 100% of receptors appear to be recycled after DynB; however, 25% of kappa colocalize in Rab7 in 3C, so do these Rab7 co-localized receptors recycle?

      It is certainly possible that some receptors from Rab7 endosomes can recycle. Current views are more aligned with overlapping populations of endosomes as labelled by biochemical markers, especially by trafficking components like Rabs. Therefore, our characterization likely describes a spread of receptor distributions across overlapping compartments. Moreover, the recycling of receptors in Fig 2H was quantitated using ELISA over 2 hours after agonist washout. The endosome colocalization in 3C was measured after 20 min of agonist treatment. As the reviewer would agree, it is difficult to directly compare data from these two experiments and draw definite conclusions.

      That said, we certainly did not mean to imply that all of DynB-activated KOR is recycled and that DynA-activated KOR is degraded. Current data on trafficking support a more dynamic and flexible model for receptor sorting, where a fraction of the receptors is recycled while a fraction is degraded from each endosome. Our results are consistent with this model. We feel that, because the receptor populations undergo many rounds of rapid iterative sorting as the endosome matures, a larger fraction is recycled back to the surface in the case of DynB at a steady state, while a larger fraction stays behind in the case of DynA. Importantly, this difference in steady state localization is enough to cause a difference in endosomal receptor activation and cAMP signaling, suggesting that small differences in steady state localization can cause relevant changes in signaling.

      We apologize for not making this important point clearer, and we will be happy to clarify this in the revised manuscript.

      6) Could some of the signaling differences be explained by continued activation of receptors as a consequence of peptide processing in the endocytosed vesicle as opposed to different vesicles? I guess the continued signaling could also direct subsequent trafficking and this could be tested with a membrane permeable antagonist.

      We thank the reviewer for raising this point. As we described in our response to Reviewer #1, peptide processing by ECE proteases could contribute to the differences, but the data suggest that this is not a direct correlation or the main explanation for the differences we observe. We will be happy to provide data to address this aspect.

      7) The impact statement "Co-released dynorphins, which signal similarly from the cell surface, can differentially localize GPCRs to specific subcellular compartments, and cause divergent receptor fates and distinct spatiotemporal patterns of signaling" could be misconstrued. If one of the pathways is dominant and blocks the other, then co-release may only have one signaling outcome. Have any dynorphin mix experiments been conducted? What might be anticipated?

      We agree that the question of whether one peptide is dominant is an interesting one in the context of the paper, and we thank the reviewer for pointing this out. Assay sensitivity has remained a long-standing problem when trying these mixed experiments in the endogenous opioid system. We will be happy to try a dynorphin mix experiment with our state-of-the-art imaging assays. We will also revise the sentence to reduce ambiguity.

      8) It looks like details for the ELISA measurements in the methods section were missing. Were the ELISA measurements done with untagged KOR or SpH-KOR? One might worry about the effects of the N-terminal SpH tag on KOR trafficking, and it would be nice if the fluorescence SpH-KOR data were supported by ELISA for untagged KOR. (At least some of the data is immunostaining of FLAG-KOR, which probably introduces only minimal perturbation)

      We apologize for not including the details of the ELISA experiments. The ELISA experiments were performed essentially as described previously (PMID: 24990314; PMID: 24847082). Briefly, CHO-KOR cells or SpH-KOR cells (2 × 10^5) were seeded in complete growth media into each well of a 24-well poly-lysine-coated plate. The following day, cells were washed once in PBS, placed on ice, and incubated with a 1:1000 dilution (in PBS containing 1% BSA) of either anti-Flag M1 mouse monoclonal antibody (for CHO-KOR cells) or anti-GFP rabbit polyclonal antibody (for SpH-KOR cells) for 1 h at 4°C. Cells were then gently washed twice with PBS and treated without or with 1mM peptides in either F-12 medium (for CHO-KOR cells) or F-12K medium (for SpH-KOR cells) containing protease inhibitor cocktail (Sigma) for 30 min at 37°C to induce receptor internalization. Cells were then washed and incubated in media without peptides for different time periods (5-120 min). Cells were chilled to 4°C and briefly fixed with paraformaldehyde for 3 min. Cells were then incubated with a 1:1000 dilution of either anti-mouse or anti-rabbit HRP-coupled secondary antibody. The substrate o-phenylenediamine (5 mg/10 ml in 0.15 M citrate buffer, pH 5, containing 20 µl of H2O2) was added to each well (100 µl) and the reaction was stopped after 10 min by addition of 50 µl 1N HCl. Absorbance at 490 nm was measured with a Bio-Rad ELISA reader. We will definitely correct this oversight and include these details in the revised manuscript.

      The reviewer’s concern about the tag is a valid one, and one that we are very careful about. We have used three different tags to label the receptor, all on the N-terminus to reduce potential interference. The ELISA measurements were done using FLAG-tagged and HA-tagged KOR. The trafficking experiments were done with FLAG-tagged and SpH-tagged KOR. The results are consistent between all these experiments, suggesting that the differences we observe are not due to tagging. We will clarify these details in the revised manuscript.

      9) Dynorphin A17 is a very sticky peptide and difficult to wash out. Since we don't have a dose response, it may require only very low doses to have full activation for cAMP inhibition. It would be nice to be able to discount this as a potential cause of prolonged activation after washout.

      The reviewer brings up a good point. DynA is less sticky in media or solutions containing 150mM NaCl, but we realize that this is a concern that should be addressed. In our case, we picked the doses we used based on dose-response curves that we have performed for cAMP signaling for these peptides. We realize that it is important to explain the choice of our concentrations better, and we will be happy to do so in the revised manuscript.

    2. Reviewer #3:

      This is an interesting idea and creative paper implicating a differential mechanism of intracellular trafficking and subsequently signaling that is triggered by different dynorphins binding to the kappa opioid receptor. However, there are some questions for the authors:

      1) My reading is that some dynorphins are extremely rapidly degraded in serum and with these experiments performed in 15% Horse/FCS there is concern that some of the differential results could be explained by differential degradation. One hypothesis could be a differential frequency of receptor activation over time of a fast recycling receptor population. Can the authors convince me that this difference in trafficking and subsequent signaling is an intrinsic property of the peptide and not an exhaustion of peptide (would be DynB) over the 30min assay?

      2) In Fig 2D, 2G and 2J, at what time after addition of peptides were these data obtained?

      3) In Fig 2F the divergence of internalized receptor only occurs from time 20-30 mins which was difficult for me to understand since DynA should result in lost surface receptor number. What confuses me is that in Fig2H the initial recycling induced by DynA17 is fast and slows down so I am wondering if a second hit is needed which feeds into my concern about peptide degradation in the media. Since released peptide would be pulsatile maybe in vivo DynA17 could act like DynB?

      4) The assays seem to be done with a single concentration of peptide - 1µM. Do the authors have data to show that lower (or higher) concentrations than 1µM result in the same trafficking patterns, albeit to a lesser or greater extent? Also, for the cAMP inhibition, what concentration gives max inhibition? For a binding affinity of 0.01nM in the cells and with high expression, the 1µM concentration seems high.

      5) In Fig 2H, 100% of receptors appear to be recycled after DynB; however, 25% of kappa colocalize in Rab7 in 3C, so do these Rab7 co-localized receptors recycle?

      6) Could some of the signaling differences be explained by continued activation of receptors as a consequence of peptide processing in the endocytosed vesicle as opposed to different vesicles? I guess the continued signaling could also direct subsequent trafficking and this could be tested with a membrane permeable antagonist.

      7) The impact statement "Co-released dynorphins, which signal similarly from the cell surface, can differentially localize GPCRs to specific subcellular compartments, and cause divergent receptor fates and distinct spatiotemporal patterns of signaling" could be misconstrued. If one of the pathways is dominant and blocks the other, then co-release may only have one signaling outcome. Have any dynorphin mix experiments been conducted? What might be anticipated?

      8) It looks like details for the ELISA measurements in the methods section were missing. Were the ELISA measurements done with untagged KOR or SpH-KOR? One might worry about the effects of the N-terminal SpH tag on KOR trafficking, and it would be nice if the fluorescence SpH-KOR data were supported by ELISA for untagged KOR. (At least some of the data is immunostaining of FLAG-KOR, which probably introduces only minimal perturbation)

      9) Dynorphin A17 is a very sticky peptide and difficult to wash out. Since we don't have a dose response, it may require only very low doses to have full activation for cAMP inhibition. It would be nice to be able to discount this as a potential cause of prolonged activation after washout.

    3. Reviewer #2:

      This manuscript demonstrates that two highly similar endogenous opioid agonists can give distinct opioid receptor trafficking and signaling fates. There are two key observations that are novel and intriguing: 1) two opioid peptides that are derived from the same precursor can distinctly modulate Kappa Opioid receptor (KOR) trafficking into two distinct pathways; Dynorphin A causes KOR trafficking to the late endosomes/lysosomes pathway whereas Dynorphin B promotes rapid recycling; 2) Dynorphin A activates Gi proteins on the late endosomes/lysosomes which leads to Gi-mediated cAMP inhibition from these compartments.

      The idea that GPCRs can activate G proteins at the late endosome/lysosomal compartments is fascinating and novel; however, the data presented here do not fully support their model that Dynorphin A activates Gi proteins on the late endosomes/lysosomes.

      Main questions:

      1) There is a mismatch with the timing of receptor colocalization experiment (Fig 3B and C, 20 min Dynorphin A/B treatment) and the cAMP assay (Fig 3H, 5 min treatment). There needs to be direct evidence that KOR is localized on the late endosomes/lysosomes at 5 minutes post agonist stimulation, i.e. at the time that cAMP levels are measured. It is important to demonstrate that the sustained signaling inhibition by DynA comes from the late endosomes/lysosomes as opposed to early endosomes. A colocalization experiment with 5 min DynA stimulation followed by a 25min washout would be necessary to support their model.

      2) What percentage of KORs are proteolytically degraded in the late endosomes/lysosomes at 20 min DynA stimulation?

      3) Given that KOR trafficking to the late endosomes and lysosomes is mediated by ubiquitination (as shown here PMID: 18212250), does mutation of these ubiquitination sites (3 lysine residues on KOR C-terminus) block its trafficking and the sustained signaling from the late endosomes/lysosomes?

      4) Is there any evidence for Gi protein localization on the late endosome/lysosomes?

      5) Additional functional readouts would also be helpful to support their model of Gi-mediated inhibition of cAMP response from late endosomes/lysosomes and not the plasma membrane or early endosomes. Perhaps mTOR activation (as authors have suggested in their discussion) could be used as a read out to show differences between DynA and B-mediated signaling?

    4. Reviewer #1:

      General assessment:

      In this manuscript the authors have assessed the different endocytic routes of KOR when activated by DynA or DynB. These are nicely conducted experiments that show interesting results, however the authors completely obviate the connection with their own work that highlights the different degradation mechanisms of these two peptides. As it stands it does not add to the field, and lacks a mechanistic explanation that could be explored given the authors’ expertise in these systems.

      Numbered summary of substantive concerns:

      1) The major conclusion of the study is that after endocytosis, DynA preferentially sorts KOR into the degradative pathway, while DynB sorts KOR into the recycling pathway and this has consequences in the duration of the active state of the receptor and its ability to signal. It is surprising that the authors do not investigate the connection between these results and previously published work that shows differences in the degradation of DynB vs DynA within endosomes. Indeed, the authors have previously shown that: i) ECE2 hydrolyzes DynB and not DynA (Mzhavia et al JBC 2003), ii) overexpression of ECE2 increases the rate of mu-opioid receptor recycling upon DynB stimulation (Gupta et al BJP 2015) and iii) inhibition of ECE2 decreases mu-opioid receptor recycling (Gupta et al BJP 2015). Considering this previous work, it is totally expected that the two ligands show distinct post-endocytic trafficking of KOR.

      2) Similarly, the differences in ECE2 sensitivity can also explain the Nb39 results, with KOR activated by the ligand that is not hydrolysable (DynA) being able to remain in the active state (and signal) for longer than when activated with the hydrolyzable ligand (DynB).

      3) A simple experiment to address this obvious connection is to use an ECE2 inhibitor. One would expect that in the presence of this inhibitor DynB-activated KOR is retained intracellularly and remains active for longer.

      4) The authors state "this is the first example of different physiological agonists driving spatial localization and trafficking of a GPCR". In light of the above comment, previous work from Bunnett et al. has shown how peptides with different endocytic enzyme sensitivity can indeed localize GPCRs (e.g., somatostatin receptor) in different compartments and elicit distinct signals (Padilla et al J Cell Biol 2007; Roosterman et al PNAS 2007; Zhao et al JBC 2013, to name a few).

      5) Support for endosomal signalling falls a bit short. For example, if indeed KOR signals from endosomes, the authors should use an inhibitor of receptor internalization and assess Nb39 recruitment and KOR signalling.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This is an interesting and creative paper implicating a differential mechanism of intracellular trafficking and subsequent signaling that is triggered by different dynorphins binding to the kappa opioid receptor. In principle, if the authors could explain the molecular basis for this phenomenon, the story would be of tremendous impact in the fields of opioid receptor signaling and trafficking. The reviewers noted a number of concerns that would require significant further work and clarification to support the authors' conclusions.

    1. Reviewer #3:

      In the manuscript "Kinetics of CDK4/6 inhibition determine different temporal locations of the restriction point", Kim et al. investigate the regulation of the Rb/E2F pathway by CDK4/6 and CDK2 and how mitogen and stress signalling differently regulate kinetics of CDK4/6 inhibition before irreversible cell-cycle entry. Research into restriction point regulation recently experienced a revival due to advanced single cell approaches and the presented study falls into this category as well. Utilizing CDK4, CDK2 and APC/C activity reporters, the authors investigate the position of the restriction point in response to external stimuli. Their main conclusions are i) that CDK4/6 activity alone initiates RB hyperphosphorylation and E2F activation, ii) that the CDK2-Rb feedback is the key signalling network controlling the restriction point, iii) that kinetics of CDK4/6 inhibition in response to mitogen removal and stress signalling explain previous observations in asynchronously cycling cells showing different locations of the restriction point, and iv) that CDK2 activity alone, without other mechanisms in S phase, determines the temporal location of the restriction point with respect to CDK4/6 inhibition and S-phase entry.

      I have major concerns with the presented work regarding the design of the study in relation to the question asked, the one-sided introduction and discussion, the imprecise wording of restriction point events, the tendency to overstate or generalize the conclusions drawn from the findings, the novelty of the results in relation to old restriction point studies using serum starvation and release regimes and the more recent studies from the Meyer, Spencer and Bakal labs focusing on asynchronously growing cells, and the fact that the results and interpretations are completely at odds with the recent Dowdy and Dyson studies, which are not mentioned at all in either the introduction or discussion. Finally, in my opinion the authors have not yet provided the experimental proof for one of their major claims, namely that CDK4/6 activity alone initiates RB hyperphosphorylation and E2F activation. My detailed criticisms are listed in the major and minor points below.

      Major points:

      1) The authors give the impression in the introduction that they will focus on probing the possibility of different temporal locations of the restriction point depending on the external stimuli (p3, l60ff). However, they only use mitogen withdrawal and NCS-induced DNA damage as "stimuli" but then claim that "we demonstrate that different extracellular environments cause different kinetics of CDK4/6 inhibition" (p10, l96ff). Certainly, these two treatments (in addition to direct CDK4 and CDK2 inhibition) are not sufficient for such a general statement, and in the context of their writing, NCS-induced DNA damage is a cell-intrinsic rather than an external stimulus/condition as claimed. Similarly, the authors derive from their NCS experiments general and overarching statements about restriction point regulation in response to stress. In fact, CDK4/6 is a target of several integrated stress pathways, e.g. UPR/PERK, which regulate the levels of cyclin D at the translational level (e.g. Brewer et al., PNAS 1999) and are independent of p21. The authors also claim to investigate whether other mechanisms in S phase are required to initiate the restriction point. To me this is another example of unclear wording and unfulfilled expectations, as the only factor analysed is the APC/C, which is inactivated at the entry of S phase. From the introduction, discussion and the mentioned literature it is unclear to me why the authors expect that a mechanism in S phase, hence after commitment to proliferation, would feed back on the restriction point during G1 phase of the same cell.

      2) The introduction and discussion are one-sided and completely omit recent findings of the Spencer lab (Min et al., PLOS Biology 2019) in relation to stress and, most importantly, the Dowdy (Narasimha et al., eLife 2014) and Dyson (Sanidas et al., Mol Cell 2019) studies, which are both at odds with a major claim of the presented work (see below).

      3) The authors claim throughout the paper that CDK4/6 is sufficient to hyperphosphorylate Rb, based on nuclei that can be stained by antibodies specific to 4 Rb phospho sites and on in situ extraction experiments that claim to dissociate hyperphosphorylated Rb from the DNA. This claim cannot be made, as their results are completely consistent with the alternative, namely that multiple Rb molecules within the same cell (nucleus) are mono-phosphorylated at the analysed sites, or at any of the 14 possible sites. This would be in agreement with the Dowdy and Dyson studies (Fig. 1 & Fig. 2). For the in situ extraction experiments investigating nuclear-bound Rb, there is no real data shown. Fig. 1J basically shows the segmentation strategy the authors employ and indicates that some cells have less nuclear Rb staining. There are no controls (e.g. before and after extraction) and no proof that the assay works in their hands, e.g. by treating the cells with CDK4/6 inhibitors and CDK2 inhibitors before the assay. The authors show in Fig. 1 that E2F1 is already induced hours before mitosis, yet cells only progress much later into S. However, it is just as likely that mono-phosphorylation of RB is sufficient to initiate E2F1 transcription; this could be easily tested using the published mutant cell lines expressing Rb variants with only one phosphosite.

      4) The authors claim that "However, previous studies showed that CDK2 inhibitors caused a loss of Rb phosphorylation and induced quiescence (Narasimha et al., 2014; Spencer et al., 2013)" (p5, l60). Reading these papers again, it appears to me that this is a wrong statement/interpretation. Narasimha et al. show in Figure 3 that only CDK4 inhibition, but not CDK2 inhibition, results in a complete loss of Rb phosphorylation. The latter treatment resulted in RB mono-phosphorylation (Fig. 3i) and did not induce quiescence as the authors claim here. Instead, such cells remained in G1 phase and did not make the transition into G0. Also, the claim that the Spencer et al. results are due to off-target effects of CDK2 inhibitors appears flawed, because the authors only detect those after a prolonged time (more than 9 hours), whereas Spencer et al. monitored the effect of such inhibitors on cells immediately after application. Hence, in my opinion this part, the corresponding data (Fig. S3), and the interpretations should be removed.

      5) In asynchronously treated cells CDK2 appears to be activated early after mitosis (Spencer et al., 2013), whereas in their experimental setup CDK2 and CDK4 activation are only assessed after mitogen starvation and release. I imagine from the timing that in asynchronously growing cells CDK2 activity will also be tightly coordinated with E2F transcription (Fig. 1D). Hence, a main foundation for their study may depend on the experimental setup used, and this should be clearly discussed. I also wonder what their results on the requirement of CDK4 for RB phosphorylation would be without the synchronization step.

    2. Reviewer #2:

      In this manuscript, Kim et al. investigate the events required for irreversible commitment to division by immortalized mammalian cells in culture. They do so by tracking single, live cells by video-microscopy using an assortment of fluorescent biosensors (augmented by fixed-cell immunofluorescence), and perturbing cell-cycle progression with cyclin-dependent kinase (CDK) inhibitors, DNA-damaging agents, or mitogen withdrawal. This is a complicated problem, which has resisted a comprehensive solution since the initial attempts to define a commitment or "Restriction" point (R point) in mammalian cells over 40 years ago. This study yields some intriguing results, and generally adds significant molecular detail to previous work on this problem by the PI and former colleagues in the Meyer lab. There are serious flaws, however, both conceptual and technical. Some of them are inherent in the approach, for example, the overreliance on small-molecule inhibitors that are not as selective as one would hope, and on live-cell biosensors that are neither as sensitive nor as specific (for individual CDKs) as they would need to be to justify some of the stronger mechanistic conclusions. Then there is the central take-home message (I think), which is based on the observation that mitogen withdrawal or DNA damaging agents have different windows of sensitivity during G1, such that the former needs to be applied earlier than the latter in order to prevent cell cycle entry. This leads to re-interpretation of the R point as a moving target, occurring at different points in the cell cycle depending on which perturbations cells encounter as they take the necessary steps to commence DNA replication. This makes little biological sense to me. The R point concept seems to lose much or all of its usefulness if it is not understood as a cellular state in which the irreversible commitment to division has been made, irrespective of what might befall an individual cell that has passed it. 
I think a more reasonable interpretation, of a superficially (at least) similar phenomenon, was put forth by Skotheim and colleagues, who found that the threshold level of CDK1/2 activity that predicted subsequent R-point passage was higher when all mitogens were withdrawn than when a single mitogenic signaling pathway was ablated, e.g. with a MEK inhibitor (Schwarz et al., 2018, ref 22). In this take, the R point per se is not mutable, but the strength of an antimitogenic signal can determine how quickly cells can put on the brakes before reaching it. I would urge the authors to avoid this phrasing, and aim for a bit more clarity in describing an admittedly complicated set of data. Below I list my major, specific concerns:

      1) Probably the biggest problem for the current study emerged from a paper by Rubin and colleagues (Guiley et al., 2019, ref. 26), which showed, quite convincingly, that the "CDK4/6 inhibitors" Palbociclib, ribociclib and abemaciclib, used throughout the current study, almost certainly do not work in cells by direct inhibition of CDK4/6, but rather by binding CDK monomers and redistributing CDK inhibitor (CKI) proteins, notably p21, to CDK2. To be fair, this is a very recent paper, which, to their credit, the authors cite and try to address. But they address it only obliquely and, I'm afraid, inadequately; although they show that effects of Palbociclib et al. are partially independent of p21 (Fig. 3B,D), this doesn't rule out contributions by other CKIs such as p27 or p57, all of which could potentially be redistributing to CDK2 complexes if CDK4 complex assembly is impaired (Guiley et al. did not test this possibility and only evaluated CDK2-CKI binding in wild-type cells). Nor do they address the strong implication of Guiley et al., that loss of CDK4/6 activity is not the mechanism by which these compounds act. This is a hugely important point; the entire study (and several previous ones from the Meyer lab) depends on the ability to inhibit CDK4/6 or CDK1/2 with different inhibitors and distinguish the effects on various cellular phenotypes and biosensor signals, which is now in considerable doubt.

      2) More generally, the study relies on small-molecule inhibitors of different CDKs that are at best only modestly selective for their intended targets. The problem with using Palbociclib in this way has been discussed above, and is a recent development, but it should be noted that major "off targets" for the "CDK4/6 inhibitors" include transcriptional CDKs such as CDK9, which are also potently inhibited by "CDK1/2" inhibitors such as roscovitine (and others). One could make the case that these drugs are hitting different targets, because they have different effects on different biosensors, but the specificity of those biosensors was established in part by using the inhibitors, so the case that their effects occur solely or primarily through their intended targets is in the end circular.

      3) The "CDK4/6 biosensor" has in fact been shown in a previous paper by the PI to detect CDK1/2 activity in addition to CDK4/6; there was residual signal after Palbociclib treatment in cells with high CDK2 activity. Setting aside the aforementioned problem of Palbociclib specificity, if I understand correctly, to "correct" for this lack of specificity, the authors subtract 35% to generate the signal they attribute to CDK4/6. This seems to assume that the relative contributions to this fluorescence by CDK4/6 and CDK1/2 will be in a fixed proportion, or am I missing something?

      4) In previous papers from the Meyer lab, Rb hyperphosphorylation was "inferred" from concurrently increased immunofluorescence signals, in fixed cells, from a panel of phosphoRb-specific antibodies (Chung et al., 2019, ref. 18). I have my problems even with inferring stoichiometry from these types of measurements, but in this manuscript the language is even stronger: IF signals are flatly described (and interpreted) as "markers" of Rb hyperphosphorylation. This too is a major issue; a prevailing model, supported by biochemical data that are by necessity ensemble measurements, holds that CDK4/6 is primarily responsible for Rb monophosphorylation, whereas hyperphosphorylation coincides with and is dependent on activation of CDK2 (Narasimha et al., 2014, ref. 28). Although for the moment the larger concern (that anything the authors have done to inactivate CDK4/6 is likely to be indirectly inhibiting CDK2) renders this more technical point somewhat moot, conclusions, or even inferences, about hyper- versus mono-phosphorylated forms of Rb should be based on actual measurements of stoichiometry.

    3. Reviewer #1:

      This manuscript reports a series of studies probing the relative roles of CDK4/6 and CDK2 in inactivation of the retinoblastoma (Rb) protein and in determining the restriction point, which marks the commitment of a cell to S phase and subsequent cell division. The work builds off the recent development of live-cell reporters for CDK activity, and it primarily uses relationships between those signals to conclude that while CDK4/6 activity is sufficient for Rb inactivation and E2F activation, CDK2 activation determines passage through the restriction point. Though well-studied over the last two decades, the questions addressed here related to the G1-S cell cycle transition are still not sufficiently answered, and they are important to understanding fundamental cell biology and cancer biology. The use of single-cell imaging and application of a CDK4/6 sensor is an exciting approach to study Rb inactivation and the restriction point, and many of the experiments here are well designed. In addition, aspects of the authors' approach, including the use of multiple cell lines, make the observations robust. However, there are several significant concerns. While most of the concerns could be addressed through more analysis of experiments already performed and rewriting, more experiments are likely necessary to address the first point.

      Significant concerns:

      1) The study relies on interpretation of the adjusted "CDK4/6 sensor" signal as a specific reporter of CDK4/6 activity. Because this assumption of specificity is so critical, the authors should briefly review the evidence supporting it and better explain the accounting of other activities that may result in sensor phosphorylation. It is problematic that one of the conclusions in the discussion is that "the CDK4/6 sensor may report other activities which can be targeted by CDK4/6 inhibitors," particularly as these inhibitors were used to validate specificity in ref 19 (Yang et al 2020). It is also important that mounting evidence here (for example Fig. 3A) and elsewhere shows that CDK4/6 inhibitors such as palbociclib may also impact CDK2 activity.

      The conclusion that CDK4/6 activity is sufficient for Rb phosphorylation is in large part based on the correlation of the CDK4/6 sensor response with measurements of Rb phosphorylation using phosphospecific antibodies (Fig. 1). However, the sensor was constructed using an Rb-based docking site, which is expected to give the sensor properties of Rb as a substrate. With the perspective that the sensor reports on Rb-like substrate phosphorylation, rather than CDK4/6 activity per se, the reported correlation is inevitable and cannot be used to support the conclusion. The sensor phosphorylation of course correlates with Rb phosphorylation, as it was designed precisely to behave that way. Some other independent measurement of CDK4/6 activity, for example activity toward a different substrate or measurement of the abundance of CDK4/6-CycD complexes is needed to avoid this circular reasoning.

      The plausible interpretation that the sensor merely reports on the threshold of any CDK activity sufficient to phosphorylate Rb would also make other conclusions less novel, for example, that sensor phosphorylation correlates with E2F activation. If one replaces "CDK4/6 activity sensor" with "Rb-phosphorylation sensor," few conclusions from the first two figures are compelling. For this reason, it is critical that the authors further detect and quantify CDK4/6 activity in some independent way. Otherwise, the data as presented are not sufficient to support several of the main conclusions of the paper as stated, and the conclusions that likely could be fairly drawn lack novelty.

      2) Experiments similar to those presented in Fig. S3 were published before in ref 19 (Yang et al 2020). In the previous paper, the effects of the drugs were used to validate the specificity of the CDK sensors. Here, the sensors are invoked to characterize the specificity and effects of the drugs. Again, this circular logic undercuts the validity of the conclusions. It is similarly plausible that either both the sensor and drugs have specificity or both lack specificity; the outcome of the set of experiments would be the same. These experiments are not as critical to the overall study, and the authors may consider removing this part of the manuscript, if further experiments are not possible.

      3) These conclusions following presentation of the data in Fig. 3 are not well substantiated: "the temporal location of the restriction point with respect to stress and CDK4/6 inhibition is closely coupled with engagement of feedback pathways" and "our data demonstrates that inhibition of CDK4/6 activity before threshold-based activation of CDK2-Rb feedback causes cell-cycle exit." The experiments only measure CDK activity and not engagement of CDK2-Rb feedback, so there must be some assumption about the correspondence of a threshold of CDK2 activity to activation of the feedback. How is it known that feedback is engaged? This question persists throughout the study. The authors should more carefully define what CDK2-Rb feedback is and how its initiation is detected experimentally. Is it Rb hyperphosphorylation, mRNA expression of an E2F target gene, or protein levels of CycE? One of these should perhaps be measured in Fig. 3 to state the conclusion in terms of CDK2-Rb feedback rather than a CDK2 activity threshold. Alternatively, if further experimentation is not possible, the conclusions should be carefully stated in terms of CDK2 activity rather than invoking the idea of "CDK2-Rb feedback."

      4) A number of recent studies have similarly used single cell reporter and other analyses to probe the relative roles of CDK4/6, CDK2, and APC-Cdh1 in the restriction point (including Rb inactivation) and S phase entry (e.g. refs 2-4, 16-19, 22, 26, 28). The authors need to better explain how the observations here fit into the paradigms being developed and disputed through this body of work. Several of the conclusions stated here have been reached before. For example, the order in which CDK4/6, CDK2, and APC-Cdh1 activities change en route to S phase, that CDK4/6 is sufficient for Rb hyperphosphorylation, and that CDK2 activity is a threshold for the restriction point have all been described and supported in some of the referenced papers and contradicted in others. Yet, similar conclusions are stated here as if they are novel. This study is still important in that the use of a CDK4/6 activity reporter may be a powerful approach to investigating these questions. But the subtleties of how this work is distinct from, or confirms, earlier work need to be made clearer for the reader to understand its significance.

      A related concern is that the results and conclusions described in Fig. 5 are not particularly surprising or novel. There is extensive literature characterizing high CDK2 activity, including its upregulation through CycE expression, as a mechanism of acquired tumor cell resistance to CDK4/6 inhibitors (see for example references reviewed in PMID: 32289274). Other published studies have examined the effects of ectopic CycE expression on accelerating G1-S, including in the absence of CycD activity or even the absence of Rb (see for example PMID: 8108147, PMID: 7601350, PMID: 1388095, PMID: 14645251, and PMID: 9192874). The authors should place their results in the context of these previous results and emphasize what insights are novel here.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Although the reviewers all agreed that you are addressing an important problem, and that a single cell approach is likely to yield important insights, they had serious concerns over the specificity of the probes and reagents you are using, and the degree of advance that your study represents over the current literature. With regard to the latter, the referees strongly suggested that a more comprehensive literature review is needed to put your results in context.

    1. Reviewer #3:

      This work reports the results from a set of predominantly coarse-grained (CG) simulations of phospholipid interactions with the yeast flippase Drs2p:Cdc50p in the outward-facing state. Using the popular MARTINI force field, these simulations reveal multiple putative binding sites of lipid molecules and support the "credit-card" model of lipid transport as a likely mechanism. The authors have also analyzed the possible preference of different lipids at these sites. While these are interesting observations, they are severely limited by the CG nature of the model and lack strong corroborating support from either atomistic simulations or experiment.

      1) While this work includes a substantial set of atomistic simulations, they do not appear to provide much useful information or provide much support to any of the central conclusions of the work.

      2) Instead, virtually all key conclusions are based on MARTINI simulations. While this is indeed an outstanding CG model that has been successfully applied to an increasing number of problems (particularly self-assembly), it is highly questionable that MARTINI is appropriate for predicting binding sites. To the best of my knowledge, this model has not been demonstrated to be reliable for such purposes. It requires great caution and careful validation to establish and support the predicted binding sites.

      Is there any corroborating experimental evidence to support these sites? The authors only made minimal efforts to validate this critical prediction, largely by noting that EM densities suggest multiple binding sites. This needs to be investigated thoroughly, such as by direct comparison of these locations.

      Can one at least test if lipids can stably occupy those sites using atomistic simulations?

      3) Membrane thinning is only observed in CG but not atomistic simulations; this is alarming, as atomistic simulations should be able to capture membrane thinning within a few hundred ns. This has been demonstrated clearly in several published simulations of scramblases (e.g., Bethel and Grabe PNAS 2016, among others). This calls into question the ability of the MARTINI simulations to capture detailed properties of this flippase complex.

      4) Free energy analysis was done with the MARTINI model, which greatly reduces its usefulness. As stated above, the MARTINI model is really not appropriate for such detailed free energy analysis of these putative binding sites.

    2. Reviewer #2:

      This manuscript "Computational Studies of Substrate Transport and Specificity in a Phospholipid Flippase" presents multiscale simulations to understand the details of a yeast flippase in lipid binding, membrane deformation, and protein hydration. Overall, an examination of the Drs2p-Cdc50p complex was carried out with 500-ns-long all-atom and 100-us-long coarse-grained simulations in different membrane models (pure PS, PE, PC and mixtures). Free-energy simulations were also employed to compare lipid binding free energies. A major finding is the identification of the anionic PS lipid binding to a water-filled substrate binding groove. However, I find the work lacks clarity, novelty, and biological insight.

      1) My primary concern is that three different phospholipids were selected in this work: PS, PE, and PC, but only the PS lipid is anionic. First of all, it is quite obvious that the PS lipid is preferred in this limited set, due to the formal charge difference. The higher affinity of anionic lipids to transmembrane proteins has been extensively studied (too many to list, but here are a few recent examples: PNAS 2020, 117, 7803-7813; Structure 2019, 27, 392-403.e3; Sci Rep. 2018, 8, 4456; Sci Rep. 2016, 6, 29502).

      Second, according to prior experiments (Appl Environ Microbiol. 2014, 80, 2966-2972), the major phospholipids in yeast are phosphatidylcholine (PC), phosphatidylethanolamine (PE), phosphatidylinositol (PI), phosphatidylserine (PS), and phosphatidic acid (PA), with minor amounts of cytidinediphosphate-diacylglycerol (CDP-DAG). There are also glycosphingolipids, ergosterol, and proteins. None of the membrane models simulated in this work approximates a realistic yeast cellular membrane. Because the lipid composition has important physiological impacts, I found the lack of justification for excluding key anionic lipids (like PI and PA) and ergosterol to be a weakness.

      2) In addition, it was claimed "As our atomistic simulations were limited to 0.5-1.0 𝜇𝑠 due to their high computational cost". I cannot agree with the authors, given the system size of ~340,000 atoms. It is not rare to see microsecond or multiple-microsecond all-atom simulations (of this size or larger) in current studies of membrane proteins. Further, longer simulations might be more likely to sample lipid exchange and competition within the groove, as well as relevant protein conformational changes (which cannot be captured in CG simulations).

      3) Moreover, while I found the results presented in Fig. 5 quite interesting, the related paragraphs seem to lack the in-depth analysis and clarity to support "a 'credit-card'-like model". First, it is not clear to me how the lipid in Fig. 5 was selected. How did this lipid look in the outer leaflet vs. in the deep state of the groove? Second, there is no analysis of the event at ~21-23 us when the lipid starts to transition. What was the trigger of the event? Were there any specific interactions? Last but not least, as the authors said "X-ray diffraction and Cryo-EM experiments on ATP8A1 and ATP11C show density for PL head groups", it is possible to compare the simulation results (lipid density) to the experimental density. It would greatly strengthen this paper if such analysis were included.

      4) The "water-filled cavities" results overall may need more clarification and probably even experimental support. First of all, how did the AA simulations compare with the CG simulations in terms of the cavities? Given the ENM constraints, there was little conformational change of the cavities (of the protein) in response to PS moving in the groove. There might be some induced-fit effect, and the cavities may adopt different shapes when such an effect is fully considered in the AA modeling. Second, is there any experimental evidence to support this observation from MD simulations? For example, one could mutate the key residue Ile508, which the authors suggest separates the two cavities.

    3. Reviewer #1:

      This is an outstanding paper. MD simulations at two resolutions are employed to provide convincing predictions regarding the lipid-binding to flippases in terms of mechanism of binding and specificity. The topic is of fundamental biology interest and the results provide deeper insights than are possible with experimental structural biology methods alone.

      The simulations are certainly state-of-the-art in terms of methodology and are well ahead of the field in terms of simulation length.

      The paper is written and presented clearly. The results are explained in detail and have the necessary statistical treatment to provide confidence in them. The discussion is based on the results and contextualised appropriately; there is no claim that cannot be supported by the results.

      A number of important observations are reported including those concerning lipid tail orientations, water-filled cavities, and lipid binding affinity.

      Overall the authors should be commended on a thorough computational study.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      In general, all reviewers agreed that the problem is of importance and the simulations have been well conceived and thoroughly conducted at the coarse-grained level. However, there is the concern that while MARTINI is able to capture many collective properties of lipid membranes, it is not sufficiently reliable for dissecting molecular recognition processes governed by subtle free energy differences, especially when electrostatics (difference in charge state) and protein conformational rearrangements are expected to play major roles. In the absence of direct supporting experimental verification, this concern undermines the central conclusions of the study.

    1. Reviewer #2:

      Here the authors expand on their prior modeling of origin activity (Platel 2015) in Xenopus extracts. Their prior work, while successful in some estimates, failed to reproduce the tight distribution of interorigin ("eye to eye") distances. Here the authors generate a series of nested models (MM1-MM4) of increasing complexity to describe the distribution and frequency of observed initiation events in an unperturbed S-phase. Not surprisingly, the fit improves with the increasing complexity of each model. The authors then built an even more complex model based on prior published work to generate in silico data against which they tested their MM4 model. I admit to being a little lost at this point as to why the authors were using simulated data to assess their model and identify key parameters. Finally, the authors compare prior published experimental data from an unperturbed S-phase and one with an abrogated intra-S-phase checkpoint (Chk1 inhibition). Three parameters stood out: J (rate limiting factor), 𝜃 (fraction of the genome with high origin initiation activity), and Pout (probability of remaining origins to fire). This suggests that Chk1 limits the probability of origin activation outside of the regions of the genome with high origin activation efficiency and modulates the activity of the rate limiting factor (J). These conclusions are consistent with prior observations in other systems. In summary, the authors apply elegant modeling approaches to describe Xenopus in vitro replication dynamics and the effects of Chk1 inhibition, but the work fails to reveal new principles of eukaryotic origin regulation and replication dynamics. The most powerful modeling approaches are those that reveal a new or unexpected mode of regulation (or parameter) that can then be experimentally tested.

      Additional points:

      This was a very specialized manuscript and would be difficult to read for general biologists. The terms/parameters were only defined in a table and many of the figures would not be parsable by a broad audience.

      Figure 1 sets out the challenge at hand, namely that the previous model couldn't account for the distribution of "eye to eye" distances, but this is never assessed in a similar format with the newer model. I assume this is captured in the appendix 1 figures, but it was unclear whether this was eye length or gap length.

    2. Reviewer #1:

      The current work by Goldar and colleagues uses numerical simulations to model the spatiotemporal DNA replication program in an in vitro Xenopus DNA replication system. By comparing modeled data and experimental DNA combing data generated during unperturbed S-phase replication and upon intra-S checkpoint inhibition (which the authors published previously), the authors find that DNA replication in Xenopus extracts can be modeled by segmenting the genome in regions of high and low probability of origin activation, with the intra-S-phase checkpoint regulating origins with low but not high firing probability. Recapitulating the kinetics of global and local S-phase replication under different conditions through mathematical simulations represents an important contribution to the field. However, one concern I have pertains to the generality of the model, as the authors did not explore whether the model can accurately simulate replication under other conditions (e.g., checkpoint activation).

      Major comments:

      1) In figure 1a and 1c, the authors show data that were previously published by the authors. Yet, the displayed values in 1a and 1c differ from those displayed in Figure 10 of Platel et al, 2015. This discrepancy should be explained.

      2) The authors test whether their model can simulate replication when S-phase is perturbed by Chk1 inhibition, but not under opposite conditions of Chk1 activation. This important analysis should be included.

      3) Although the MM4 model developed by the authors is in agreement with previously published experimental DNA combing data measured in the Xenopus system, it is unclear whether it can also accurately predict the replication program in other systems. Comparing simulated data with experimental data from another metazoan system would serve as an important additional validation of the authors' model.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This paper uses numerical simulations to model DNA replication dynamics in an in vitro Xenopus DNA replication system, both in unperturbed conditions and upon intra-S-checkpoint inhibition. The current work extends previous studies by the authors that recapitulated some but not all features of the replication program. The new model is superior as it can model both the frequency and the distribution of observed initiation events. Although the reviewers found the work in principle interesting and well executed, they have identified limitations of the study, both with respect to model validation and the extent to which the findings represent new biological insights into origin regulation and replication dynamics.

    1. Reviewer #3:

      The manuscript presents a nice confirmation study and further validates the PET index (Stender et al., 2016) as well as the EEG classification (Sitt et al., 2014; Engemann et al., 2018). The introduction is clear, the method is clear as well, results are well described and the discussion is concise and precise. The clinical impact of the study would greatly benefit from the availability of the PET index code on a platform such as GitHub to allow all centers with a PET scanner to use this index and provide a better diagnosis for DoC patients.

      Major comments:

      1) Regarding the behavioural assessment (i.e., number of CRS-R assessments), is there a minimum number of CRS-R assessments performed? This should be stated in the method. Based on the table, some patients received only 2 CRS-R assessments, while the rate of misdiagnosis with 2 CRS-R assessments is as high as 26% for UWS patients (Wannez et al 2017). This is an important limitation. The number of CRS-R assessments should be included in supplementary material, in a table providing all individual data (see comment 5). Some UWS patients with a high index (>3.07) may have received only 2 CRS-R assessments, which would have an important impact on the validity of the results.

      2) Were the EEG and PET acquisitions done on the same day? Which CRS-R was taken? The best or the one done on the day of the PET-scan? As the study compared the validity of the PET index and EEG classification, the fact that the two exams may not have been performed on the same day, and knowing that DoC patients fluctuate a lot, is a clear limitation and should be clearly acknowledged and discussed in the limitation section.

      3) For the PET voxel-based analysis, the significance threshold was set at p<0.005 uncorrected. Why did the authors use this threshold? It seems a bit arbitrary or convenient for the authors. It would be interesting and more transparent to present the corrected results too (e.g. in Supplementary Material).

      4) It is crucial to add a limitation section. The study has many limitations (not 5 CRS-R, heterogeneity of the population, PET-EEG and behavioural assessments not done on the same day, while comparing their respective accuracy, PET isn't easily available which limits the clinical impact of the present study, etc.).

      5) Individual data should be added (initial diagnosis, gender, age, etiology, best CRS-R, number of CRS-R assessments, index, EEG classification, outcome, etc.) in supplementary material. The Excel file provided is very hard to read. Could the authors at least tabulate the columns and provide a legend? In any case, I strongly suggest adding a table in supplementary material with the individual data.

      6) The references should be carefully checked. Some of them are in the text but not in the list, and some of them are in the list but not referenced in the text. The reference "Wannez et al 2018" does not seem to be the appropriate one.

    2. Reviewer #2:

      The study is a prospective cohort study evaluating both PET and EEG regarding the diagnosis and prognosis in VS/MCS patients. Thus, it represents a logical advancement from Stender et al 2016 and Bekinschtein et al 2009 towards clinical evaluation of the retrospectively established methods. To my knowledge there is no other prospective data set examining these methods. The authors plausibly show that the methods are capable of improving the diagnosis. The included number of subjects (57) is sufficient given the high effort necessary for this multimodal assessment. The results regarding the prognosis using the combined methods, though significant, certainly need a targeted study with a fixed design before use in clinical practice.

      In the following I would propose some minor improvements:

      1) I would move the first two sentences of paragraph 2 (31 ff) to the discussion. They introduce a new concept that is not necessary to understand your major points in the introduction. I would stick to your story: a) DoCs are important clinically because we don't know who is aware of what (a potential nightmare for the patient); b) PET seems to be really robust at telling but has not actually been evaluated prospectively; c) EEG might also help but in the past was not very robust in prospective studies; d) maybe a combination of both helps too. The second problem is what to tell relatives about the prognosis. Actually, we know only a little, mainly as a side finding of Stender 2014 and 2016. In my opinion the latter points are told nicely.

      2) I would remove the regional differences as a discriminator. I have two concerns about them. The first is technical in nature: you applied an anatomical atlas to potentially deformed brains after injury. The paper does not convince me that this worked sufficiently because it is not described in detail and from my experience it is very difficult to segment this type of brains. The second concern is that the result does not really support your main findings and is thus dispensable. I would recommend to focus on the main points: PET is really robust in your sample (even the cut-off from Stender et al 2016 is pretty much reproducible) and EEG is also pretty robust (although sensitivity drops from 94% in-sample to 58% out-of-sample). Also, the combination works well. I think these are the main findings that have potential to make it into clinical routine.

      3) I would also focus the discussion on two points. First, the clinical impact of your findings. I think if you would deliver a fully automatized tool to reproduce your data pipeline people world-wide would be willing to use PET for their VS patients. As a second point you should also discuss the concept of the cortically mediated state and how your work is related to that.

      In conclusion, I think the study presented is technically and conceptually strong and provides a valuable step towards clinical routine application of the demonstrated methods. The language is also enjoyable to read.

    3. Reviewer #1:

      The authors intended to test whether the FDG-PET pseudo-quantitative metabolic index of the best preserved hemisphere (MIBH), EEG-based classification (the auditory local-global paradigm), and the combination of the two methods were accurate complementary markers to discriminate VS from MCS. Their results showed that the MIBH was an accurate and robust procedure across sites to diagnose MCS, which can be improved further in combination with EEG-based classification, allowing the detection of covert cognition and 6-month recovery of responsiveness in unresponsive patients. Additionally, their results indicated that the behavioral diagnosis of MCS does not correspond to an elusive and generic conscious state, but rather to a CMS that reveals the preservation of metabolic activity in specialized cortical networks. These results provide valuable information for the clinical use of the MIBH and the local-global paradigm in the future. There are several issues which should be mentioned:

      1) As the authors put the "methods and materials" before the results, they should describe the patients’ information in a clear way in the "methods and materials", not in the results.

      2) The authors may need to provide more information about the EEG design. For example, what is the exact experimental design, ITI, number of stimuli, and so on? More importantly, the authors need to provide the exact number of epochs remaining for each patient after rejection of the bad epochs.

      3) The authors indicated that the auditory local-global paradigm could be used to detect the consciousness. Furthermore, they also mentioned the cognitive-motor dissociation patients (CMD). If they can discuss the distinction of local-global paradigm and motor imagery tasks (or other tasks) which were used to detect the CMD, this will be very helpful.

      4) The results about accuracy of MIBH to discriminate between MCS and VS are not strongly related to results about how MCS did not correspond to an elusive and generic conscious state. The latter is more interesting. I would suggest the authors put them into two independent papers.

      5) Please provide more information about the "MCS items are associated with metabolic specific of subscales", such as how many patients in the analysis for each subscale?

      6) Please clarify why there are two sets of results about the Motor CRS-R subscale: one in Fig.5 and the other in supplementary Figure e-1.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      All reviewers in general agree that your study is solid with clear-cut results. In particular, the multimodal assessments of both PET and EEG, regarding the diagnosis and prognosis in VS/MCS patients, were carefully executed. As such, the results provide valuable information for future prognosis research guiding clinical use, e.g., a targeted study with a fixed design.

    1. Reviewer #3:

      1) As I state below, the paper is carefully done (with a few minor issues) using a difficult and sophisticated biophysical technique, FCS, to assess the changes in beta catenin diffusion within the cell following Wnt signaling. So it passes the test of being an original piece of work executed well. However, what has been learned is quite limited. A few observations, such as the slow diffusion in the cytoplasm, can be interpreted several ways. It is very helpful to have concentrations in the nucleus and cytoplasm for beta catenin for future modeling. They could have tried to use single cross correlation with labeled APC or axin or the proteasome to derive more important information about the path through the destruction sequence. But that may be too hard to ask for at this stage. They could have combined their measurements with appropriate mutants or knockouts. I come down close to the line, high on the importance of the problem and the methods and execution; lower on the current take-home lesson.

      2) The support for the somewhat limited conclusions is strong as it is.

      3) There are some technical issues. There is some concern with the FCS data itself. Figure 5F and 5G are of some concern. The curve doesn't drop to 1 at long correlation times (>100 ms) and there are big fluctuations in the region of short correlation times (<0.1 ms). This could be due to the very long time course (120 s) used in the experiment. Have the authors tried to image the same spot multiple times in short intervals (e.g. 10 s), or tried to analyze 10 s sub-traces of the original long trace to see if the conclusions hold? This type of error could influence the calculation of the diffusion coefficient of complexes of CTNNB1. It also affects the quantification of concentration. In lines 352-353 the authors mention that the nuclear concentration of CTNNB1 increases 2.1 fold based on FCS measurement, which is smaller than the fluorescent intensity change. Could this be the result of such errors?

      For confocal imaging analysis, the description was not clear as to whether there is background subtraction during the intensity quantification. If there is, the authors should mention it in the method explicitly. If not, the background could decrease the fold change estimation.

      In the model description line 877, equation (6), k7x6 should be k7x5

      Line 901 equation (15), there is no unit for the binding affinity

      Normally, a fraction of the fluorescent protein is not bright; the authors may not have a tool to measure the dark component but they should mention how it may affect the quantification in the discussion.
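      The background-subtraction point above can be illustrated with simple arithmetic. The sketch below uses made-up intensity values (200, 500, and a background of 100 are hypothetical, not taken from the manuscript) and a hypothetical helper `fold_change`; it only shows that leaving a constant background in place lowers the apparent fold change when the signal increases.

```python
def fold_change(before, after, background=0.0):
    """Fold change of mean intensity after subtracting a constant background.

    `before`, `after`, and `background` are mean grey levels; all values used
    below are hypothetical and serve only to illustrate the arithmetic.
    """
    corrected_before = before - background
    if corrected_before <= 0:
        raise ValueError("background must be smaller than the pre-treatment intensity")
    return (after - background) / corrected_before


# Same raw intensities, with and without background subtraction.
raw = fold_change(200.0, 500.0)                            # background ignored
corrected = fold_change(200.0, 500.0, background=100.0)    # background removed

print(f"uncorrected fold change: {raw:.2f}")
print(f"background-corrected fold change: {corrected:.2f}")
```

      With these illustrative numbers the uncorrected ratio is smaller than the corrected one, which is the direction of bias the comment describes.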

    2. Reviewer #2:

      The manuscript by S.M.A. de Man et al. presents a study on the cellular response to Wnt activation and on the intracellular kinetics of beta catenin (CTNNB1). The authors have developed cell lines expressing GFP reporters of CTNNB1 using CRISPR CAS9. They present different convincing controls on the specificity of the reporter and decided to analyze the temporal behavior of the best reacting clone. Then, they investigate the temporal evolution of fluorescent signals in the cell cytoplasm and nucleus upon Wnt signaling activation. They quantify the kinetics of the relocalization of CTNNB1 from the cytoplasm to the nucleus upon different strength of activation of the Wnt signaling and GSK3 inhibition. Using FCS, they identify that a dual diffusion model fits better the experimental data than a classical single diffusion model, suggesting the presence of complexes of different sizes. They measure the diffusion parameters and concentrations of the complexes in the nucleus and in the cytoplasm. Using a dynamical model, the authors reveal that, to recapitulate the experimental observations, the regulation of CTNNB1 upon Wnt signaling has to be controlled at three levels, the destruction complex, the nuclear transport and the binding affinity to the chromatin.
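      For readers less familiar with FCS, a commonly used generic form of the two-component, three-dimensional free-diffusion autocorrelation model is sketched below. This is the textbook form under the assumption of equal molecular brightness for both species, and it is not necessarily the exact fitting function used in the manuscript. The symbols N (mean number of particles in the focal volume), f (fraction of the fast component), τ_fast and τ_slow (diffusion times), S (structural parameter of the focal volume) and w_xy (lateral beam waist) are introduced here for illustration only.

```latex
G(\tau) = 1 + \frac{1}{N}\Big[ f\, g(\tau;\tau_{\mathrm{fast}}) + (1-f)\, g(\tau;\tau_{\mathrm{slow}}) \Big],
\qquad
g(\tau;\tau_D) = \left(1+\frac{\tau}{\tau_D}\right)^{-1}\left(1+\frac{\tau}{S^{2}\tau_D}\right)^{-1/2},
\qquad
D = \frac{w_{xy}^{2}}{4\,\tau_D}
```

      Setting f = 1 recovers the single-component model, so the two models are nested and can be compared on the same trace. In this normalization G(τ) tends to 1 at long lag times.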

      Overall, the study is solid, presenting novel information on the kinetics of CTNNB1 during Wnt signaling. The results are consistent with the classical view on the regulation of beta catenin during Wnt signaling. I have a few comments, mainly on the methodology.

      Specific comments:

      -The authors have designed a new cell line allowing for tracing the kinetics of beta catenin over time following Wnt signaling activation. They follow the relative changes in concentration in the nucleus and cytoplasm upon activation of Wnt signaling. Normalized changes make it difficult to evaluate whether the difference between the increase in the cytoplasm and in the nucleus is due to a higher increase in the nucleus or simply due to the absence of beta catenin in the nucleus at the onset of the process, which would inflate the quantification. A non-normalized plot showing the increase in grey levels in the nucleus and cytoplasm should be added to complement the quantification and identify the differences between nuclear and cytoplasmic beta catenin. It would also help the reader to compare with the results of concentrations extracted from the FCS.

      -The responses in figure 4 upon Wnt signaling activation and GSK3 inhibition are different (with the absence of a plateau in the case of GSK3 inhibition). The explanation of this difference is unclear as it is. I would suggest that the authors detail a bit more their thoughts on the reason for the difference. Could this simply be that Wnt activation clusters just a subset of GSK3 at the membrane and that inhibition can reach a higher level of depletion of GSK3 in the cytoplasm?

      -How does GSK3 inhibition treatment affect the FCS measurements, particularly the concentrations and the compositions of the different complexes? The differences with Wnt3 activation could provide additional information on the nature of the identified complexes.

      -The dynamical model presented in the paper shows a non-monotonic change in the concentration of beta catenin in the cytoplasm after activation. This seems to be due to the kinetics of nuclear transport and does not seem to be present in the experimental observations. Can the authors comment on this point? Is there a way to suppress this discrepancy by modulating the parameters associated with transport?

      -Finally, the model is consistent with the experimental observations, but the authors did not check with any type of perturbation how the model would compare with the experiments. For instance, how does the model compare with experiments in the case of GSK3 inhibition, or when nuclear transport is affected? Adding a perturbation case would significantly strengthen the connection between model and experiment and the message of the manuscript.

      -The labels of figure 4 and the respective movies are inverted.

      -Figure 1 only presents the classical model and no new concept/data. In my view, figure 1 and figure 2 should be merged.

      -The labels in the table 1 Wnt (ON -OFF) are inverted.

    3. Reviewer #1:

      CTNNB1 is a core component of canonical Wnt signalling that is frequently mutated in cancers. A constitutively active destruction complex (degradosome) binds and phosphorylates CTNNB1, earmarking it for proteasomal degradation. This complex is inactivated upon Wnt3a stimulation or GSK3β inhibition, leading to CTNNB1 stabilisation and nuclear translocation. The authors have successfully employed CRISPR-mediated endogenous tagging of CTNNB1 and determined its cellular concentration and diffusion dynamics in HAP1 cells, in both the cytoplasm and nucleus, by live-cell imaging and analysis. They provide the relative subcellular CTNNB1 concentration for the nucleus and cytoplasm, like previous studies in other cell lines (Tan et al., 2012) and in Xenopus (Lee et al., 2003). In addition, their results suggest CTNNB1 resides in slow-moving complexes that persist upon Wnt but become slightly more mobile. These results are intriguing but raise several unanswered questions, such as whether these complexes represent the destruction complex (cytoplasm) or enhanceosome (nucleus). The work has been completed to a high standard but I have several concerns listed below.

      1) The authors acknowledge significant cell-cell heterogeneity. This is particularly noticeable in Fig.4A upon Wnt3a and CHIR99021 treatment. Fig.4B suggests all cells are analysed regardless of heterogeneity, and the only exclusion criterion mentioned in the methodology is cells with a cytoplasm of less than 10 pixels. Fig.4C/D does not seem to reflect the variation observed in Fig.4A. What is the spread pre-normalisation before and after treatment? How is the relative increase in nuclear/cytoplasmic intensity affected by cell size? By nuclear and cytoplasmic area? This may affect the relative fold increase, and the cytoplasmic area seems highly variable at the confluence of cells shown.

      2) Using point FCS the authors determined two diffusion speeds corresponding to monomer and complexed CTNNB1 in both the nucleus and cytoplasm. A modest increase in cytoplasmic diffusion speed of complexed CTNNB1 was observed after Wnt3a (0.461 μm²/s), but this is far from the speed of the monomer (14.9 μm²/s), suggesting it remains complexed upon Wnt3a. In addition, the fraction of complexed CTNNB1 (~40%) remains largely unaltered. Is the same true under CHIR99021 treatment? Point FCS samples a very small area of the cell cytoplasm/nucleus and therefore gives a small representation of the subcellular pool (which is likely heterogeneous). Only a single point appears to have been analysed per cell, and within the 21 cells analysed clear outliers can be observed (Fig.6A/B); this has not been adequately discussed. What is the variation in diffusion measured at different points within a single cell? Some discussion has been made as to these complexes reflecting the destruction complex/proteasome or the enhanceosome, but this really needs to be tested in order to make any conclusions about these observations, especially as cytoplasmic complexes are maintained under Wnt conditions, which would challenge the notion that CTNNB1 dissociates from the destruction complex upon Wnt. Ideally, endogenous tagging of other destruction complex components with a different fluorophore would be done to address this. If these complexes do represent the destruction complex and remain bound after Wnt, this would have significant implications for our understanding of complex inactivation and greatly enhance the manuscript.

      3) The N&B analysis averages out monomeric and complexed CTNNB1 intensity across an image stack around a single ROI within each cell. The authors interpret Fig.6C to mean SGFP2-CTNNB1 is present as a monomer whether in a complex or not. This is based on the fact that the relative brightness averages at 1.0, similar to a monomeric GFP control. However, the spread of relative brightness is large, and often less than 1, so can a relative brightness of 1 be taken to indicate monomeric SGFP2-CTNNB1? Does cellular concentration affect relative brightness? If so, transiently expressed monomer and dimer GFP may not be the best controls. Aggregation is spatially homogeneous and limited by the diffusion rate of protein/complexes, which your FCS measurements suggest is consistent with a large complex. Thus a single average may not represent the diversity of protein complexes; eN&B could be used (Cutrale et al., 2019). As mentioned in point 2, like FCS, you are only sampling a small region of the cell, which may or may not contain a destruction complex, for example. Super-resolution imaging techniques such as STORM or LLSM may help with visualisation of cell complex heterogeneity and give a different impression of complex occupancy. I don't think the N&B data is sufficient to say complexes don't exist that contain more than one SGFP2-CTNNB1 molecule.

      4) The computational model relies on a number of assumptions determined in other studies that may not reflect the HAP1 cells used in this study. The study by Lee et al., 2003 was performed in Xenopus, and Tan et al., 2012 found a number of differences in their mammalian cell studies. Important information regarding the concentration of destruction complex components has also been omitted; this information is important for future comparisons of cell-type specific behaviours.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The authors investigate how cells respond to WNT signaling by altering beta catenin (CTNNB1) dynamics. They generated a number of cell lines in which they use different light microscopy techniques, such as FCS and number & brightness (N&B) measurements, to quantitatively investigate the diffusion behavior and complex formation of intracellular CTNNB1. The results are in general well explained, reasoned and technically well-controlled (except for some, which raised concerns that were pointed out by the reviewers). The main finding of the paper is that CTNNB1 seems to reside in slow-moving complexes (that exist both in the presence and absence of WNT) that become slightly more mobile after WNT addition. As pointed out by the reviewers, these results can be interpreted in different ways, and it is not clear whether these complexes represent the destruction complex (cytoplasm) or enhanceosome (nucleus). In summary, although the work shows technical proficiency that could address some critical issues in Wnt signaling, the authors would need to identify the issues that could be resolved by the technique and then design experiments to resolve them in the future.

    1. Reviewer #4:

      This paper presents CytofRUV, a new tool to remove technical batch effects in CyTOF data, inspired by tools used in the transcriptomics field. There is still a strong need for such tools and I expect this tool to be a valuable addition to the cytometry field. I especially appreciate the authors' effort in providing multiple evaluation measures and informative figures to estimate the properties of the batch effects before and after normalization. There is currently no one-size-fits-all solution for batch normalization, and having sufficient quality control along the way is absolutely invaluable.

      I recommend no major changes to the manuscript, but mainly some additional guidance in the reader's interpretation of some results, and some smaller suggestions to improve figures. Some of the more unexpected results are not commented on in the text and it would be helpful if some interpretation could be given in those cases.

      -Many methods cause an increased batch silhouette score compared to raw, does this mean that in those cases the methods increase the batch effects?

      -Also the Hellinger distances sometimes become bigger than originally. Would there be any way to check if this distance would be small given an adapted manual gating? Or could there be any reason that actually some cell types are indeed differing in proportion in the different batches, so you would not expect the batch correction to "restore" this (as no cells are added or removed by the correction)? As both CytoNorm and CytofRUV apply the normalization on a cluster-by-cluster basis, I am also not sure why the cluster proportions afterwards would become more similar. Can you give any further intuition about this?

      -While there is a section regarding "keeping biological differences" this is only explored on the population level in the individual samples. I would also find it of interest to read something about biological differences between samples which are preserved (e.g. maybe quantifying the differences between the healthy controls?)
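      The question above about Hellinger distances and cluster proportions can be made concrete with a small sketch. It assumes the standard definition of the Hellinger distance between two discrete distributions and uses made-up cluster labels; the helper names `cluster_proportions` and `hellinger` are hypothetical and are not part of the CytofRUV package. Under this definition, the distance depends only on the per-batch cluster proportions, so it can only change after normalization if cells are assigned to different clusters, for example by re-clustering the normalized data.

```python
import math
from collections import Counter


def cluster_proportions(labels, clusters):
    """Fraction of cells assigned to each cluster, in the order given by `clusters`."""
    counts = Counter(labels)
    total = len(labels)
    return [counts.get(c, 0) / total for c in clusters]


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    squared = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / 2.0)


# Hypothetical cluster labels for the same replicate sample run in two batches.
clusters = ["A", "B", "C"]
batch1 = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
batch2 = ["A"] * 40 + ["B"] * 30 + ["C"] * 30

distance = hellinger(
    cluster_proportions(batch1, clusters),
    cluster_proportions(batch2, clusters),
)
print(f"Hellinger distance between batches: {distance:.3f}")
```

      If the cluster labels are held fixed, rescaling marker intensities within each cluster leaves `distance` unchanged, which is why the reported change in this metric presumably reflects a new clustering of the normalized data.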

    2. Reviewer #3:

      The manuscript under review discusses a new method to address technical variance in CyTOF data, called CytofRUV and based on the Remove Unwanted Variation methodology. CyTOF datasets are prone to significant batch-to-batch variation due to the technical nature of signal registration, and this method adds to the group of previously published algorithms aimed at solving the same task.

      The manuscript is well-written and the narrative flows well. The authors come up with compelling examples of batch effects in CyTOF data (e.g. Fig.2) that honestly not only call for robust algorithmic normalization but make me somewhat question the claimed reliability of the CyTOF technology to deliver precise measurement of protein expression without robust replicates built into every experimental design of CyTOF experiments; this publication would surely raise awareness of existing issues. The authors also line up a series of metrics to quantify the efficiency of their own and alternative methods for data normalization, and propose a strong battery of visual cues built into their Shiny app to evaluate the algorithm results.

      1) The algorithm performance deserves more discussion, which is currently outsourced to the reference to the original RUV paper (Molania et al). How computationally demanding is it? What computational resources were used? How does it scale to large datasets? How does parametrization (choice of k value) affect the results specifically for CyTOF data (this is slightly touched upon in the Molania et al paper, but the data context is very different)?

      2) Are any of the metrics mentioned in the paper built into the R package/Shiny app? From the paper, it looks like the only outputs that the interface presents are the four visual plots but no evaluation metrics of how the normalization affected/improved the data.

      3) Besides silhouette scores, were there any other attempts to verify the data integrity post processing? For instance, how reproducible are clustering results after normalization if the processed data are clustered from scratch and compared to clustering performed before normalization?

      4) Based on existing datasets and metric outputs, would the authors suggest a way to estimate the minimal number of replicates (as discussed in lines 488-492) required for the specific panel/sample/instrument type to provide necessary power to preserve the resolution of the data post normalization?

    3. Reviewer #2:

      The authors have presented a novel approach based on RUV-III for normalizing CyTOF data leveraging replicate samples across batches. The article is clear, well laid out, thoughtful and presents well-substantiated conclusions. The RUV class of method has been applied across high throughput technologies including RNASeq, single-cell RNASeq, nanostring and others and it is a natural extension to single cell cytometry. I have few issues with the paper. My one minor concern is the conflation of the term cell subpopulation with cluster. I don't think this detracts from the conclusions of the paper, but the former typically is reserved for cells of a consistent and verified phenotype. FlowSOM and just about all other clustering methods do not necessarily produce clusters that correspond to consistent cell sub populations (the phenotype of the cells included in a cluster can and does vary). I think to make statements about sub populations, the authors would have to look at manual phenotype assignments as well. I am not suggesting that it is necessary, and I find the evaluation of the method with respect to clusters much more compelling and natural. However, I would request that the authors make the distinction between clusters and cell sub populations in this context.

      After looking at the software implementation I think some discussion of the computational complexity and limitations of the method and implementation is warranted, particularly time and memory considerations. Could the method scale to large data sets (100s or 1000s of samples with several 100k cells each), which are typical in clinical studies? Do all data need to be loaded into working memory for the current implementation, or in general?

    4. Reviewer #1:

      The article describes CytofRUV, an algorithm for normalization of mass cytometry datasets. The article is well written, the data is publicly available, and the source code is usable and well-documented. My comments are provided below:

      Major comments:

      1) I believe the focus of this article can be improved. The abstract is a bit confusing. If the article is focused on the algorithm, the focus of the abstract should not be on leukemia. This can be used in many settings. Similarly, much of the article (including 4 of the main figures) is dedicated to establishing that this one dataset indeed does have a batch effect issue. Other datasets are not introduced until the very end of the manuscript. However, for an article focused on the development of a new bioinformatics method, I believe the focus should be on evaluation of the algorithm on a broad range of datasets (which the authors have already done, but which should be presented more prominently).

      2) Comparison with prior algorithms is only presented in a qualitative manner. Quantification of these comparisons, followed by appropriate statistical tests, would strengthen this article. I don't believe a new algorithm needs to outperform existing algorithms in every test (as it runs against the no free lunch theorem) but quantification should be provided regardless.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      The authors present a new CyTOF normalization approach based on RUV-III, which has proven useful for other technologies including RNASeq, single-cell RNASeq and nanostring. The reviewers all agreed that this was a strong manuscript that makes an important contribution to an area of the field that remains under-served.

    1. Reviewer #3:

      The relative contributions of both asymptomatic infections and super spreading events to the ongoing SARS-COV-2 pandemic are critical, controversial questions. As far as I know this may be the first paper to utilize the approach combining phylogenetic inferences from genomic data with time series case data to estimate these parameters from available data applied to the ongoing SARS-COV-2 pandemic. However, with so many papers coming out so quickly it's possible I missed this.

      Here, the authors combine viral phylogenetics with time series case data to estimate parameters (including temporally structured estimates of the reproductive number) about the SARS-COV-2 pandemic in 12 locations globally. They find that the number of undetected infections ranges substantially by location from 13% to 92% and the precision of their estimates improves substantially with the number of viral genomes included from each location and this is visualized in Figure 2.

      However, in its current form the manuscript suffers from some shortcomings.

      SARS-COV-2 evolves slowly relative to other viruses and this can lead to high levels of phylogenetic uncertainty in recovered trees and this can have a strong influence on parameter estimates. According to the methods and the supplemental material the authors inferred a single phylogenetic tree for each location. The authors should be encouraged to infer a distribution of trees for each location and condition their analyses across this additional uncertainty. If this has already been done then the manuscript needs to be augmented to make this clear.

      Abstract:

      This section requires a thorough edit to improve clarity. In its current form it is rather discombobulated and needs to better link aims to results to conclusions.

      Introduction:

      The first 2 paragraphs of the introduction should be switched. The introduction should start with the big questions - in this case why it is important in the big picture of epidemiology to estimate parameters like the total number of infections - and then introduce the study system in play to address the big questions in this case SARS-COV-2.

      The third paragraph addresses other ways to directly estimate the number of infected through serological surveys. Missing from this paragraph is an acknowledgement of the assumption that markers of immunity last long enough for such surveys to be effective in detecting past infected individuals.

      The final paragraph of the introduction outlines the aims and is rather lacking in scientific detail. Namely, what are the hypotheses? What are the alternatives? What are the predictions and tests of hypotheses in play? What specific hypotheses are the authors testing by applying their method? This requires clarification.

      Methods:

      Generally, the methods lack sufficient detail to replicate what the authors have done.

      In the Viral genomes section of the methods it is stated that several locations were excluded due to "multiple circulating lineages" however nearly all of the locations included (e.g. Guangdong, Hubei, Shanghai, UK) also have multiple circulating lineages. What was done here needs to be clarified greatly.

      Phylogenetic inference as performed in IQ-TREE is fine however as previously mentioned the authors need to minimally infer a distribution of trees for each region to condition their subsequent analyses across.

      In the section on sub-sampling the sequences to the dominant lineages, how was lineage assignment done? Using Pangolin? Or another classification system? More detail is needed.

      A bit more detail on how the authors determined convergence was achieved would be valuable. For example, how was visual confirmation of convergence done? Via visual inspection of parameter traces? A generalist reader may need more detail than has been provided.

      Results:

      More detail is needed in the figure legend for Figure 1. For example, unless I misunderstand this, it is mentioned that the red lines are HPD intervals on those days, but it is actually a shaded area with a measure of central tendency as a red line.

      Discussion:

      Overall, the discussion puts the results in appropriate context. It seems though that caveats associated with these analyses were not appropriately acknowledged. A bit more thought should be put into appropriate acknowledgements of things which may affect the authors estimates and interpretations of findings.

      On balance I do think that the approach utilized in this manuscript makes a potentially useful contribution to addressing the current pandemic and, to my knowledge, this approach has not yet been applied to SARS-COV-2. I would like to see additional analyses (incorporation of phylogenetic uncertainty) and a thorough edit and revision for clarity.

    2. Reviewer #2:

      The authors presented a Bayesian inference framework to fit a branching process model that incorporates both viral genomes and time series of case data to estimate the undetected COVID-19 infections. While the method seems to be valid, the application of the method on the data is subject to some uncertainties especially for locations in Asia, such as Japan, Shanghai and Hong Kong. Please see below for my comments/suggestions:

      Major comments:

      1) My biggest concern is that in many of the locations in Asia in Table 1/Figure 1, no sustained local outbreak has been detected. So far the majority of cases in Hong Kong were imported cases (https://www.chp.gov.hk/files/pdf/local_situation_covid19_en.pdf). By the end of Feb 2020, more than 50% of cases in Guangdong of China were imported cases from Hubei. How would the sequence analysis and model fit be if imported cases are excluded?

      2) As mentioned above, the proportion of imported cases would likely affect the estimation of the Rt and undetected infections. What if the method is applied to imported cases and local separately for some of the locations such as Hong Kong (in which the imported/local case status is clear for every case)?

    3. Reviewer #1:

      In this work the authors use previously-developed methods linking viral sequence data and reported case counts to estimate the percentage of undetected infections and the effective reproduction number Rt through time in a number of locations. This is an extremely important topic. It remains the case that despite the urgency, there has not been consistent population-based viral testing and the fraction of COVID-19 cases that are reported remains largely unknown. This is an important topic and if genomics can help it is very valuable.

      However, there are some concerns about the methods for this specific application. Validation on simulated data, and exploration of robustness to some of the assumptions and limitations, could help.

      Dates of confirmation may differ from dates of symptom onset by many days. This is discussed briefly but the impact of a shift is not explored. The bias may additionally depend on the population size, with more bias towards the beginning when there are few cases and few sequences. It could also impact the sequencing; this is discussed briefly but could be explored to some extent by shifting the dates and re-estimating.

      The authors subsampled the sequences to the dominant lineages. More information about how this was done would be helpful. In addition, of course without information to link viral genomes to reported case counts, the same adjustment cannot be made to the reported cases -- could this impact the results? It is not quite clear how multiple lineages, introductions, geographical mixing in the phylogeny are treated. For example, consider an example in which the California sequences have some Minnesota ones embedded in them, scattered in a clade. If the Minnesota sequences in entirety are treated as one phylogeny (without any of the CA tips) then there would be very long branches between these and other Minnesota sequences, and the likelihood would reflect no branching events on these branches. In reality there were plenty of events but they were in CA. Meanwhile those branching events do not occur in the CA tree either, because their descendants have been pruned out of the CA analysis. In any case it is not clear what precisely is meant by not including locations with co-circulating lineages, nor how geographical mixing is treated.

      The probability of sequencing, and its variation over time, may affect the model's inferences, because in times of more dense sequencing the intervals in the tree will be shorter (and conversely). The model may not be able to distinguish this from changes in prevalence and reporting fraction. Should there be a rho_t that applies to the sequencing data?

      I wonder if the authors are able to model tips that occur in the reported data, handling these dates differently. It seems that the only link is through the conditional independence of the yi and zi information (condition on the xi information). I also wonder about the impact of phylogenetic uncertainty.

      There seems to be a possible identifiability issue with rho_t and x_t, because surely a higher x and lower rho could give the same likelihood, particularly since we can't sequence cases that we can't detect.
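
      The concern can be illustrated with a minimal sketch. It assumes, purely for illustration, that reported cases follow a Poisson distribution with mean rho * x; this simplified observation model and the function name `poisson_loglik` are assumptions and are not taken from the manuscript. Under that assumption only the product rho * x enters the likelihood, so the case counts alone cannot separate the two parameters.

```python
import math

def poisson_loglik(y, x, rho):
    """Log-likelihood of y reported cases given x infections and
    reporting fraction rho, under the ASSUMED model y ~ Poisson(rho * x)."""
    mu = rho * x
    return y * math.log(mu) - mu - math.lgamma(y + 1)

observed_cases = 95  # hypothetical count, chosen only for illustration

# Two different (x, rho) pairs with the same product rho * x
ll_a = poisson_loglik(observed_cases, x=1000, rho=0.10)
ll_b = poisson_loglik(observed_cases, x=500, rho=0.20)
print(ll_a, ll_b)
```

      Whether the genomic data break this symmetry in the authors' full model is exactly the question raised here; the sketch only shows why case counts by themselves do not.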

      How do the estimates of the reporting fraction compare to those obtained for example with the model by Russell et al (https://cmmid.github.io/topics/covid19/global_cfr_estimates.html) or with other estimates of under-reporting? (Some of these are given in the results but CIs are wide).

      I would have liked to see more information for how this was done: "we computed the smallest number of individuals that could contribute to 80% of infections during each week (Figure 4)". Similarly, detailed methods are not given for the 'time to detect an outbreak' results.
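
      One possible reading of this calculation, offered only as a sketch since the manuscript does not give the procedure: given the number of secondary infections caused by each infected individual in a week, sort individuals from most to least infectious and count how many are needed to reach 80% of the total. The function name `smallest_group_for_share` and the example counts are hypothetical.

```python
def smallest_group_for_share(offspring_counts, share=0.8):
    """Smallest number of individuals whose secondary infections sum to
    at least `share` of all secondary infections (hypothetical helper)."""
    total = sum(offspring_counts)
    if total == 0:
        return 0  # no transmission, so nobody is needed
    running = 0
    # Take the most infectious individuals first
    for n, count in enumerate(sorted(offspring_counts, reverse=True), start=1):
        running += count
        if running >= share * total:
            return n
    return len(offspring_counts)

# Hypothetical week: 8 infected individuals, 20 secondary infections in total
week = [10, 5, 3, 1, 1, 0, 0, 0]
print(smallest_group_for_share(week))
```

      In a model-based analysis the counts would presumably be drawn from the fitted offspring distribution rather than observed directly, which is part of the detail requested above.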

      It would be interesting to see the comparison between the estimated reporting fractions and the testing data available at (for example) https://covidtracking.com which allows downloads of data on testing through time by state. It is mentioned in the discussion; information about testing is available for many places (US states and otherwise).

      I am also concerned about the large population assumption that is inherent in the mathematics behind the core equation for lambda_t (which the authors should either derive or give the citation for). This equation requires that the mean of the number of offspring in the data is equal to the mean of the offspring distribution, which only happens in the limit when the present and past populations are both large. The same assumption is required for the variance. Particularly in the early stages the large population assumption is unlikely to be met.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript.

      Summary:

      This paper uses a combination of sequence and case data to estimate the ascertainment rate of COVID19 in different settings. The methods are known but this is the first application to SARS-CoV-2 data, and the topic is of very high importance. The reviewers had some substantial concerns about the methodology and the clarity of description.

    1. Reviewer #3:

      In this manuscript the authors used high throughput light microscopy and image analysis to study the effects of essential gene knockdown via an arrayed CRISPRi library in M.smegmatis.

      There are many technical advances in this paper, and the experiments are well executed. The data and their analysis add value to the mycobacterial field. I particularly appreciated the thoughtful Discussion, which honestly laid out the limitations of the authors' work.

      However, in some areas, the lengthy manuscript came across as a bit unfocused. For example, in addition to describing the methods of their technique, the authors validate or give examples of what their data contain (identification of cryptic putative RM system, histidine auxotroph phenotypes, effects of disrupting mycolic acid biosynthesis). They then discuss the potential to use CRISPRi to confirm compound MOA. This is a lot of information (10 figures with many subpanels), but none of these threads are really taken to completion. I appreciate the amount of work that doing that would take, so I'm not suggesting that as a revision. But reshuffling or restructuring some of these sections may help to guide the reader towards the utility of these data.

      Lastly, and I think importantly, after reading this manuscript, I was left with the lingering question: for any essential gene that I'm interested in, would these data help to make hypotheses about its function? And ... I'm not sure... The data as presented in Figure 6 do not help this case. While some functionally related genes cluster together, many do not, especially for genes that fall into cluster 2.

      With some textual changes to streamline the manuscript, I think the manuscript could be improved.

    2. Reviewer #2:

      de Wet et al. screen a CRISPRi library of M. smeg. essential genes for morphological phenotypes. Using a sensitive analytical approach, they find that most essential knockdown strains have morphological phenotypes. They further show that functionally related genes cluster by morphology in multidimensional space. Finally, they associate morphological changes with antibiotics to probe antibiotic MOA. This manuscript will be of interest to researchers studying essential genes in Mycobacteria.

      General Comments:

      1) "Moreover,to verify the reproducibility of the imaging workflow, replicate imaging was performed on separate days for 134 strains." Does this mean that the authors don't have replicate data for 29 strains? If so, imaging of these strains must be repeated to verify reproducibility. Have the authors validated any phenotypes with a second guide RNA to rule out off target effects?

      2) MSMEG_3213 isn't an example of defining the function of an uncharacterized gene--instead it simply validates existing database predictions. Further, the data presented here do not demonstrate that MSMEG_3213 is the methylase of an R-M pair.

      3) The his gene depletion phenotypes are likely due to translation defects that result from uncharged tRNAs. This is consistent with tRNA synthetase/ribosomal protein knockdown phenotypes presented in this manuscript, as well as the observation that translation inhibition by knockdown or serine hydroxamate produced elongated cells in Bacillus subtilis (PMID: 27238023).

    3. Reviewer #1:

      General assessment:

      This manuscript addresses the lag in identifying functions of genes annotated in bacterial genomes. It is an epic presentation of a line of investigation from inception through assay development and validation to identifying previously unknown functional associations. Beyond these initial novel insights, the developed phenoprinting approach and the resulting UMAP space provide a solid foundation for future conditional gene function and initial drug mechanism of action studies.

      Substantive concerns:

      None

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This manuscript combines a CRISPRi library in Mycobacterium smegmatis with high throughput light microscopy and image analysis to investigate the effects of essential gene knockdown on bacterial morphology. The reviewers all agree that there are many technical advances presented in this paper, the experiments are well executed, and the data and its analysis is significant for the field. However, there are some questions regarding the reproducibility of the data and the utility of these data as a predictive tool. The reviewers believe that these questions should be straightforward to address, as described more below.

    1. Author Response

      Reviewer #1:

      Summary:

      In this paper, the authors utilize CRISPR-Cas9 to generate two different DMD cell lines. The first is a DMD human myoblast cell line that lacks exon 52 within the dystrophin gene. The second is a DMD patient cell line that is missing miRNA binding sites within the regulatory regions of the utrophin gene, resulting in increased utrophin expression. Then, the authors proceeded to test antisense oligonucleotides and utrophin up-regulators in these cell lines.

      Overall opinion (expanded in more detail below).

      The paper suffers from the following weaknesses:

      1) The protocol used to generate the myoblast cell lines is rather inefficient and is not new.

      2) Many of the data figures are of low quality and are missing proper controls (detailed in points 5, 7, 10, 12, 13, 14)

      Detailed critiques:

      1) The title needs to be changed. The method used by the authors is inefficient. The title should instead focus on the two cell lines generated.

      We appreciate the reviewer’s comments: thanks to them, we have realized the focus of the manuscript should be on the new models we described and less on the methodology used to create them.

      Originally, we wanted to share the problems we faced when applying new CRISPR/Cas9 editing techniques to myoblasts: our conversations with other researchers in the field confirmed that many were having similar problems. However, the reviewer is right that there are many ways around this problem. We do describe ours, and we are working on a new version of the manuscript with additional data to characterize our new models further, in which the method used to create them, although included, is not the main focus. In this new version we will change the title accordingly.

      2) Line 104: The authors declare that the efficiency of CRISPR/Cas9 is currently too low to provide therapeutic benefit for DMD in vivo. There are lots of papers that show efficient recovery of dystrophin in small and large animals following CRISPR/Cas9 therapy. The authors should cite them properly.

      Thank you for your appreciation. We have reviewed the literature again to include new evidence of efficient dystrophin recovery as well as other studies with lower efficiency.

      3) Figures 1, 2, 3, and 4 can be merged into one figure.

      4) Figure 2A and 2B can be moved to supplementary.

      5) Figure 2C and 2D are not clear. Are the duplicates the same? Please invert the black and white colors of the blots.

      Thank you for your comments. We have inverted the colors of the blots and changed the marks used in figure 2C and 2D to clarify that duplicates are indeed the same sample, assayed in duplicates. We have also merged figures 1 and 4 and moved figures 2 and 3 to supplementary in this new version.

      6) Figure 3: In order to optimize the efficiency of myoblast transfection, the plasmids containing the Cas9 and the sgRNA should have different fluorophores (GFP and mCherry). This approach would increase the percentage of positive edited clones among the clones sorted.

      We think the reviewer may have misunderstood our methodology: we are not using a plasmid with the Cas9 and another with the sgRNA, we are using two plasmids, both containing Cas9 and each a different sgRNA. We did try to use two different plasmids, one expressing GFP and one expressing puromycin resistance, but we found out that single GFP positive cell selection plus puromycin selection was too inefficient. We could have tried with two different fluorophores, but we tested the tools we had in our hands first and were successful at obtaining enough clones to continue with their characterization, so we did so instead of a further optimization to our editing protocol.

      7) Figure 4A: In the text, the authors state that only 1 clone had the correct genomic edit, but the PCR genotyping in this figure shows at least 2 positive clones (number 4 and 7).

      Thank you for your appreciation. As you said, we got two positive clones (as we also indicate in figure 3B) but we completed the full characterization of one of them (clone number 7= DMD-UTRN-Model). In the new version of the manuscript we explain this further.

      8) Figure 4C: The authors should address whether one or both copies of the UTRN gene were edited in their clones.

      Thank you for your comment. Both copies of the UTRN gene were edited in our clones. We have included this information both in the text and in the figure 4 legend.

      9) Figure 4 B and D: The authors should report the sequence below the electropherograms.

      Thank you for this correction, we have included the sequence under the electropherograms.

      10) Figure 5B: This western blot is of poor quality. Also, the authors should specify that the samples are differentiated myoblasts. Lastly, a standard protein should be included as a loading control.

      Thank you for your comment. Poor quality of dystrophin and utrophin western blots was the main reason to validate a new method in our laboratory to measure these proteins directly in cell culture (1) as an alternative to western blotting. Since then, the myoblot method has been routinely used by us and in collaboration with other groups and companies. We included the western blot as it is sometimes easier for those used to this technique to be able to assess a blot in which there is no dystrophin expression. As you pointed out, our samples were all differentiated myotubes, not myoblasts, and we have modified this accordingly. Thank you very much for pointing out this mistake.

      On the other hand, as described in the methods, Revert™ 700 Total Protein Stain (Li-Cor) and alpha-actinin were included as standards in dystrophin and utrophin western blots, respectively.

      11) Figure 5E: We would like to see triplicates for the level of Utrophin expression.

      We thank the reviewer for his/her recommendation, but we do not consider western blotting a good quantitative technique; we have included western blots to show the expression/absence of protein at the same level. We have included many more replicates than needed to show the level of utrophin by myoblots. We acknowledge that western blotting is the preferred method for some reviewers, so in the new version of our manuscript we clearly indicate the value we give to each technique, with myoblots being our choice for quantification.

      12) Figure 6: A dystrophin western blot should be included to demonstrate protein recovery following antisense oligonucleotide treatment. Also, the RT-PCR data could be biased as you can have preferential amplification of shorter fragments.

      Thank you for your recommendation but as we have explained before, myoblots have been validated in our laboratory to replace western blot for accurate dystrophin quantification in cell culture.

      13) Figure 6A: Invert the black and white colors. The authors should also report the control sequences and sequences of the clones under the electropherograms.

      Thank you for your suggestion, we have inverted the colors and added the sequences under the electropherograms.

      14) Figure 6B: Control myoblasts should be included in figure 5C.

      Thank you for this correction, we will include control myoblasts in the new manuscript version.

      15) Figure S2A: Invert the black and white colors.

      Thank you for your suggestion, we have inverted the colors.

      Reviewer #2:

      The work from Soblechero-Martín et al reports the generation of a human DMD line deleted for exon 52 using CRISPR technology. In addition, the authors introduced a second mutation that leads to upregulation of utrophin, a protein similar to dystrophin, which has been considered as a therapeutic surrogate. The authors provide a careful description of the methodology used to generate the new cell line and have conducted meticulous evaluations to test the validity of the reagents.

      However, if the main purpose of this cell line is to perform drug or small molecule compound screenings, a single line might not be sufficient to draw robust conclusions. The generation of additional DMD lines in different genetic backgrounds using the reagents developed in this study will strengthen the work and will be of interest to the DMD field.

      Thank you for your appreciation. We think that a well characterized immortalized culture, like the one we describe is sufficient for compound screening, as described in other recently published studies (2), (3). About the other suggestion, we have indeed used our method to generate other cultures for collaborators, but they will be reported in their own publications, as they are interested in them as tools in their own research projects.

      Further, the future use of the edited DMD line with upregulated utrophin is unclear. The utrophin upregulation adds a complexity to this line that might complicate the assessment of screened compounds. In contrast, this line could be used to test if overexpression of utrophin generates myotubes that produce increased force compared to the control DMD line.

      We think we may have not explained our screening platform well enough. Our suggestion is to offer our newly generated culture ALONGSIDE the original unedited culture: the original is treated with potential drug candidates, while the new one may or may not be treated, if these drug candidates are thought to act by activating the edited region (see an example in the figure below). In this case, the new culture will be a reliable positive control to the effects that may be reported in the unedited cultures by the drug candidates. We will make this clear in the new version of the manuscript.

      Created with BioRender.com

      In summary, while there is support and enthusiasm for the techniques and methodological approach of the study, the future use of this single line might be dubious and could be strengthened if additional lines are generated.

      We share the reviewer’s enthusiasm for this approach, and we have included in the new version of the manuscript further characterization of this new cell culture that we think would demonstrate its usefulness better.

    2. Reviewer #2:

      The work from Soblechero-Martín et al reports the generation of a human DMD line deleted for exon 52 using CRISPR technology. In addition, the authors introduced a second mutation that leads to upregulation of utrophin, a protein similar to dystrophin, which has been considered as a therapeutic surrogate. The authors provide a careful description of the methodology used to generate the new cell line and have conducted meticulous evaluations to test the validity of the reagents.

      However, if the main purpose of this cell line is to perform drug or small molecule compound screenings, a single line might not be sufficient to draw robust conclusions. The generation of additional DMD lines in different genetic backgrounds using the reagents developed in this study will strengthen the work and will be of interest to the DMD field.

      Further, the future use of the edited DMD line with upregulated utrophin is unclear. The utrophin upregulation adds a complexity to this line that might complicate the assessment of screened compounds. In contrast, this line could be used to test if overexpression of utrophin generates myotubes that produce increased force compared to the control DMD line.

      In summary, while there is support and enthusiasm for the techniques and methodological approach of the study, the future use of this single line might be dubious and could be strengthened if additional lines are generated.

    3. Reviewer #1:

      Summary:

      In this paper, the authors utilize CRISPR-Cas9 to generate two different DMD cell lines. The first is a DMD human myoblast cell line that lacks exon 52 within the dystrophin gene. The second is a DMD patient cell line that is missing miRNA binding sites within the regulatory regions of the utrophin gene, resulting in increased utrophin expression. Then, the authors proceeded to test antisense oligonucleotides and utrophin up-regulators in these cell lines.

      Overall opinion (expanded in more detail below).

      The paper suffers from the following weaknesses:

      1) The protocol used to generate the myoblast cell lines is rather inefficient and is not new.

      2) Many of the data figures are of low quality and are missing proper controls (detailed in points 5, 7, 10, 12, 13, 14)

      Detailed critiques:

      1) The title needs to be changed. The method used by the authors is inefficient. The title should instead focus on the two cell lines generated.

      2) Line 104: The authors declare that the efficiency of CRISPR/Cas9 is currently too low to provide therapeutic benefit for DMD in vivo. There are lots of papers that show efficient recovery of dystrophin in small and large animals following CRISPR/Cas9 therapy. The authors should cite them properly.

      3) Figures 1, 2, 3, and 4 can be merged into one figure.

      4) Figure 2A and 2B can be moved to supplementary.

      5) Figure 2C and 2D are not clear. Are the duplicates the same? Please invert the black and white colors of the blots.

      6) Figure 3: In order to optimize the efficiency of myoblast transfection, the plasmids containing the Cas9 and the sgRNA should have different fluorophores (GFP and mCherry). This approach would increase the percentage of positive edited clones among the clones sorted.

      7) Figure 4A: In the text, the authors state that only 1 clone had the correct genomic edit, but the PCR genotyping in this figure shows at least 2 positive clones (numbers 4 and 7).

      8) Figure 4C: The authors should address whether one or both copies of the UTRN gene were edited in their clones.

      9) Figure 4 B and D: The authors should report the sequence below the electropherograms.

      10) Figure 5B: This western blot is of poor quality. Also, the authors should specify that the samples are differentiated myoblasts. Lastly, a standard protein should be included as a loading control.

      11) Figure 5E: We would like to see triplicates for the level of Utrophin expression.

      12) Figure 6: A dystrophin western blot should be included to demonstrate protein recovery following antisense oligonucleotide treatment. Also, the RT-PCR data could be biased as you can have preferential amplification of shorter fragments.

      13) Figure 6A: Invert the black and white colors. The authors should also report the control sequences and sequences of the clones under the electropherograms.

      14) Figure 6B: Control myoblasts should be included in figure 5C.

      15) Figure S2A: Invert the black and white colors.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Lee Rubin (Harvard University) served as the Reviewing Editor.

      Summary:

      While the paper by Soblechero-Martín et al. may present an ultimately useful method for modifying genes in skeletal muscle, the reviewers felt that, at the current time, the robustness of the methods and the amount of data presented were insufficient. The reviews below point towards additional experiments that could be done to improve this paper.

    1. Author Response

      Reviewer #1:

      This study is an in silico analysis of data from the Cancer Genome Atlas (TCGA) on hepatitis B virus (HBV)-positive liver tumours and human papillomavirus (HPV)-positive cervical and head and neck tumours and association with viral load, genotype(s) and expression. It is unclear to me the rationale behind including two unrelated DNA tumour viruses in the study, especially as the number of HBV-positive samples is much less than for HPV. Overall the manuscript seems to be a validation of a bioinformatic tool rather than reporting significant research findings.

      We strongly believe that a global summary of key oncoviral-associated tumors makes sense in this context precisely because of the fundamental importance viral genotype is already known to have. While HBV and HPV are of course quite different viruses, there is extensive clinical evidence that linking outcomes to specific viral genotypes and phenotypes is of great value, which we expand upon in our work via a working demonstration of ViralMine. For this reason we think it is crucial to present both virally related cohorts together, as they support each other, demonstrate the robustness of our methods across completely different systems while allaying concerns about fine-tuning, and create a cohesive picture of the effect of viral genotype across the molecular landscape of two key onco-viruses. As the reviewer notes, this does implicitly demonstrate the utility of ViralMine, but we emphasize that it also uncovers significant research findings.

      Concerning the HBV/HPV sample sizes, in fact the number and percentage of infected HCC samples is substantially higher than that of cervical or head and neck HPV samples as discussed in detail on page 4 of our manuscript.

      Use of the TCGA has allowed analysis of a reasonably large number of RNASeq data sets. However, once the authors drill down to individual genotypes, numbers become quite small, which may compromise some of the observations. For example, the large discrepancy between numbers of HPV16 (173) and HPV18 (39)-positive cases makes it difficult to make firm conclusions about the significance of differentially expressed cellular genes for each set of cancers. Similarly, in Figures 4 and 6 they compare HPV18 (23 cases) with HPV45 (39 cases) and HPV18/45 coinfections (number not stated but likely far fewer).

      While there is an imbalance in group size between HPV genotypes in the cervical cancer cohort, the test statistic used by the DESeq2 pipeline to identify differentially expressed genes does account for class imbalance and even in the most extreme case we have analyzed the dispersion parameter estimates are easily verified as accurate. In fact accurately inferring group-wise dispersion parameters given unequal group sizes is a well-known problem, and in any case this problem only becomes acute when one group becomes so small (~1 sample) that it becomes difficult to estimate its common dispersion parameter. That situation clearly does not arise here. Additionally, in Figure 4b, it should be noted that we are comparing ALL HPV co-infected cervical tumor samples (92 cases) against single-infection samples (193 cases), which the reviewer may find more confidence in and which is obviously statistically reasonable. Furthermore, while the comparison of cervical cancer HPV18 (n=10), HPV45 (n=9), and HPV18/45 coinfected (n=39) cases in Figure 6b does compare relatively small patient groups, the significant difference in neoantigen population TCR binding affinity is confirmed by a one-sided, non-parametric KS-Test and shown to be robust to subsampling, which formally demonstrates that the signal is not artefactual. Therefore from a statistical point of view the concerns raised about class imbalance and power are not fundamental and were addressed in the original manuscript draft. Thus, we believe we can completely address the reviewer’s concerns by:

      In Figure 3a, Figure 4a and b, signify the group sizes (n=X) compared in the barcode plots to improve transparency in the contrasts, and additionally add group numbers to Figure 6a and b. Further, we will include a new supplementary figure demonstrating that a bootstrap resampling of the HPV group neoantigens to balance for group size validates that the difference in TCR binding affinity distributions is robust.
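
      As an illustration only, and not the authors' code, the balancing check described above could be sketched as follows. The sketch assumes each group is a plain list of binding-affinity values, resamples both groups with replacement to the size of the smaller group, and records a two-sample Kolmogorov-Smirnov statistic per replicate. The function names, the number of replicates, and the use of the two-sided statistic (the response mentions a one-sided test) are assumptions made for this example.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def balanced_bootstrap_ks(group_a, group_b, n_boot=1000, seed=0):
    """Resample both groups (with replacement) to the smaller group's size
    and return the KS statistic of every bootstrap replicate."""
    rng = random.Random(seed)
    n = min(len(group_a), len(group_b))
    stats = []
    for _ in range(n_boot):
        sample_a = rng.choices(group_a, k=n)
        sample_b = rng.choices(group_b, k=n)
        stats.append(ks_statistic(sample_a, sample_b))
    return stats


if __name__ == "__main__":
    # Synthetic toy values, not data from the manuscript.
    rng = random.Random(1)
    small_group = [rng.gauss(0.0, 1.0) for _ in range(10)]
    large_group = [rng.gauss(1.0, 1.0) for _ in range(39)]
    replicates = sorted(balanced_bootstrap_ks(small_group, large_group))
    print("median bootstrap KS statistic:", replicates[len(replicates) // 2])
```

      If the statistic stays large across most size-matched replicates, the observed difference is unlikely to be driven by the group-size imbalance alone; a formal p-value would need a permutation or analytic null, which this sketch omits.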

      Much of the information that they derive from their analyses is not novel. For example, they report no preferential sites of HPV integration. Despite what they claim, quite a bit is known about HPV co-infection in cervical cancers and it is not uncommon but varies according to geographical regions, which was not a variable they used.

      We acknowledge that other oncoviral survey papers have provided evidence of preferential integration (as we originally cited, as well as referenced in Dall et al. (2008), Zhang et al. (2016)). However, these and other previous characterizations of recurrent HPV integration do not attempt to organize these sites by either genotype or co-infection status, which was our explicit and stated aim, principally because they could not efficiently and accurately determine these parameters from in-situ tumor RNA. As we found no preference in integration along these axes of variation (which we acknowledged openly in the manuscript as being expected when using RNA rather than DNA), we deliberately chose not to present these results as a main finding and included them in supplemental results for the sake of completeness.

      We also agree that HPV co-infection in cervical lesions is not per se a novel finding, although to be clear most literature focuses on side-by-side infections of HPV with another virus (HHV, EBV, HIV, etc.), or uses the term to describe groupings of sub-variants or isolates under the same viral genotype header (Mirabello et al. (2016)). Additionally, most of the literature focuses on HPV co-infection in cervical neoplasia or high-grade lesions and cervical cancer risk (Chaturvedi et al. (2011); Senapati et al. (2017)) rather than assessing HPV co-infection in the tumoral tissue itself, post oncogenesis. As such, we believe that our approach of looking at in situ cervical tumor infections, and the relatively high rate of HPV co-infections we observe, does merit particular notice compared with previous studies. Furthermore, the analyses linking this cross-genotype co-infection phenotype with tumor gene expression, survival adjusted for major known clinical covariates, and tumor immunogenicity measures have not been reported elsewhere to our knowledge.

      For HPV, viral exon-level RNASeq analysis is irrelevant because HPV gene expression is polycistronic and is subject to changes by random viral integration events in individual cases. Therefore, it is unlikely that general overall viral gene expression signatures will be diagnostic. Besides, from multiple studies we understand that what matters in cervical cancer is the level of expression of the E6/E6 isoforms/E7 oncogenes.

      We agree that the post-transcriptional polycistronic nature of HPV expression makes it difficult to elucidate the effect of differing HPV gene-level expression on ultimate HPV gene translation and protein expression. However, our related yet distinct question here is on the effect HPV genotype and cancer type has on HPV gene transcriptional differences (as seen in Figure 7), so we believe we are within the limits of reasonable interpretation. Additionally, while E6 and E7 expression are well known to drive oncogenesis, it seems crucial to quantify the expression of these viral oncogenes across viral genotype and tissue type, which has not been done previously to our knowledge. Finally, even if we somehow accept that the average tumoral viral gene exon expression itself is best described as a random variable, which we do not, it remains to be explained why we observe and report persistent genotype-specific expression patterns across completely different cell-types.

      The references chosen for the HPV part of the study are either rather out of date or not representative of the extensive literature.

      We acknowledge that we have cited only a portion of the vast HPV-related cancer literature, so we have made an effort to include more recent surveys and studies as references.

      Reviewer #2:

      1) The authors comment that averaged infection phenotypes such as viral load or predominant genotype may be replaced by more granular measures, such as exon-level viral expression or the ratio of expressed viral genotypes. In reality, viral expression, and the ratio of expressed viral genotypes, are still 'tumor averages' in the way that the authors have analysed them. HPV-associated tumors are heterogeneous, and without in situ analysis, it is hard to discern which transcripts are involved in driving the cancer phenotype, and which are found in associated precancerous tissue.

      We concede that the viral genotypes quantified by our method represent a computed average measure across the tumor, as would any measurement of any quantity in a bulk sequencing assay. However, the information provided by the admixture of genotypes and exon-level viral expression does provide an additional measure of granularity over previous bulk measures, and allows additional analyses not explored prior to our work. To make a comparison, this criticism could identically apply to cell-type decomposition algorithms like Cibersort, which despite their problems and inherent limitations do provide insightful information. We agree with the reviewer that more targeted in situ analyses would allow for a truly specific association of particular viral transcripts with tumor phenotype, and would serve as a useful validation of some of our results, but this certainly does not invalidate the tumor-aggregated genotype and co-infection presence associations we present here. We also agree that multiple biopsies would allow for intra-tumoral heterogeneity to be taken into account in our study; however, no major public resources (e.g. TCGA) include such data, and we believe that such an undertaking lies outside any reasonable scope of this work.

      2) The authors use the term co-infection quite widely. For HPV, previous studies have shown that coinfection within cells in an individual cancer or neoplasia is rare, although independent infections by different HPV types can occur side-by-side. I expect something similar with HBV, although the study would need a higher level of analysis to establish this. The use of terminology, and the way in which data is interpreted, needs to be much more rigorous.

      We agree with the reviewer that the use of ‘co-infection’ in this context is unclear, as co-infection on a cellular level with two different HPV/HBV genotypes is impossible to determine by bulk RNA sequencing analysis. We will clarify ‘co-infection’ as strictly a mixture of independent HPV infections contained in the same tumor tissue.

      We will clearly define our meaning of ‘co-infection’ in the introduction as the aggregated mixture of HPV genotypes expressed in the tumor tissue (‘side-by-side’ infections), to remove ambiguity as to our cohort characterization.

      3) Viral load is generally used in the field as a measure of viral genome or genome-fragment abundance. This is already a misuse of the terminology, as the term implies virus numbers, or even infectious virus numbers. Here the term is used to refer to viral transcript abundance. The authors need to say precisely what they're measuring, and need to be aware that they are measuring the average across a heterogeneous tumour, which may have areas of high grade neoplasia, cancer, and even low-grade neoplasia. My feeling is that the level of analysis is too great, given the uncertainties regarding the heterogeneous nature of tissue that is being analysed, and the different cells with different levels of viral gene expression that are most likely present.

      We agree that, as the reviewer frames it, our use of ‘viral load’ should be clarified as ‘viral transcript abundance’ as determined from the tumor RNASeq data in variance-stabilized units of log2 counts per million reads mapped across the viral contig. We do note, however, that it has been previously indicated that levels of viral transcripts correlate well with virus numbers in infected tissue. Concerning the last comment of the reviewer, we wish to point out that our analysis goes no further, either in analytic complexity or in drawing inference from expression data, than any other published study based on tumor bulk RNA-sequencing data. All samples will contain a mixture of cells, and we emphasize that we are only measuring average signals, viral or host tumor specific, across this mixture.

      To address these comments we will change all references to viral load to normalized viral transcript abundance, to remove ambiguity. We can once again emphasize that our conclusions hold only in a strict averaged sense.
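
      For readers unfamiliar with the unit, a minimal sketch of a log2 counts-per-million calculation is given below. It is not taken from the manuscript: the function name, the pseudocount, and its placement are assumptions for illustration, and a plain log2 transform like this one is only a rough stand-in for a formal variance-stabilizing transformation.

```python
import math


def log2_cpm(viral_reads, total_mapped_reads, pseudocount=0.5):
    """Log2 counts per million for reads mapped to a viral contig.

    The pseudocount (an assumption of this sketch) keeps the value finite
    when no viral reads are observed.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    if viral_reads < 0:
        raise ValueError("viral_reads must be non-negative")
    cpm = (viral_reads + pseudocount) * 1e6 / total_mapped_reads
    return math.log2(cpm)


if __name__ == "__main__":
    # Hypothetical sample: 1,200 viral reads out of 40 million mapped reads.
    print(log2_cpm(1200, 40_000_000))
```

      Because the value is normalized to library size, it describes average transcript abundance across the sequenced tissue and says nothing about how that abundance is distributed among cells.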

      4) Several of the figures don't obviously support the conclusions. For instance, it is not clear how the data shown in figure S2 supports the title of the S2 figure legend. Surely some statistical analysis is needed to support the conclusion stated in the legend. Given previous studies, I'm not at all convinced that the distribution of causative HPV genotypes is the same between SCC and Adenocarcinoma. An additional limitation of these large cancer association studies, comes from limitations in pathology diagnosis, which cannot always accurately distinguish borderline SCC/adenocarcinoma cases. With the large-scale transcriptional analysis, maybe the authors can use molecular information available in their samples to look at this.

      As the reviewer points out, we agree the statistical evidence backing our claim of no association between cervical histology and HPV infection genotype or co-infection should be added. This calculation was actually carried out and only reported in the text, but we will amend the figure to include the results and apologize for this key omission. We also note in passing that we are not making any claims about ‘causative’ HPV genotypes for the respective subtypes, but rather much more conservative statements about association. Concerning the reviewer’s concern about the quality of the phenotypic data reported in the TCGA, we heartily agree but are unable to really do much else. Indeed, concerning the last interesting comment about utilizing molecular information in our samples to distinguish SCC/adenocarcinoma subtypes, we did not find reliable gene expression signatures which could be used to validate or correct the phenotypic results.

      We will add the Spearman correlation rho and test significance results for the correlation between cervical cancer histological type and both viral phenotypes represented in figure S2.
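
      As a hedged illustration of the statistic named above, and not the authors' analysis, Spearman's rho can be computed as the Pearson correlation of tie-averaged ranks. The sketch assumes histology is coded numerically (for example 0 for SCC and 1 for adenocarcinoma, a coding chosen here for illustration) and leaves out the significance test.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Toy values only: hypothetical histology codes and transcript abundances.
    histology = [0, 0, 0, 1, 1, 0, 1, 0]
    abundance = [5.1, 6.3, 4.8, 5.9, 6.1, 5.0, 4.7, 6.6]
    print(spearman_rho(histology, abundance))
```

      With a binary variable the ranks are heavily tied, so a rank-sum test or a contingency-table test may be an equally natural choice for the histology comparison.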

      5) The APOBEC analysis is quite rudimentary in the text, and does not discuss the different members of the APOBEC family. Similarly, the different effects of single and multiple HPV infections on the IRF3 responsive genes is poorly developed at the biological level, which most probably reflects the general way in which the utility of the approach.

      We agree with the reviewer that our APOBEC expression analysis in the HPV+ cervical cohort could be more comprehensive, and therefore the interpretations of the results may be too far reaching. We believed the initial result to be of sufficient interest in the context of a very similar result from Zapatka et al. (2020), but concede it may make more sense as a supplemental result alone without additional evaluation or discussion of the greater APOBEC family. Additionally, the pathway analysis involving the differentially expressed genes from the co-infected and non-coinfected cervical tumors most likely should be moved to a supplemental result as well without further analyses to support the enrichment trends, following how we reported the HBV associated liver cancer co-infection DEG results (figure S5).

      We will move Figure 3d to a supplemental figure, limit our comments in the results to just an observation in reference to Zapatka et al., and delete any associated interpretation. We will move Figure 3c to a new supplemental figure as well, and remove the suggestion of expanded antiviral activation in co-infected tumors.

    2. Reviewer #2:

      The title of the manuscript suggests a detailed analysis of cancers using in situ gene expression approaches, which aims to provide new insight into tumour heterogeneity and co-infection. The manuscript is in fact an analysis of viral transcription and the presence of cellular mutations in a collection of tumours associated with HPV and HBV infection. Much of the starting data for the analysis has been drawn from the TCGA database. It is a little unclear as to whether the authors are pitching this paper as a methodological development manuscript, but I think that this is what it is at its heart. The ability to deconvolute RNA sequencing data from virus-associated tumours is interesting, and could be widely used as a research tool. However, much of the manuscript is concerned with interpreting the data, and I think the interpretation goes well beyond what can feasibly be achieved from the analysis of transcripts in extracts of total tumour tissue. The authors' term 'co-infection' most likely refers to heterogeneous mixtures of viral infected cells which are competing with each other in the tumour. In my view, the biological interpretations are not particularly useful at the level that they are presented, but could serve as the starting point for future research. This manuscript could be repackaged as a description of a new analytical tool, or the most exciting aspects drawn out with the addition of biological studies to explain what the transcriptional analysis may mean. This would be a complex process, and would be facilitated by focus on either HPV or HBV, as trying to extend conclusions to the two disparate virus families in one manuscript is probably unrealistic. Without any analysis of tumour tissue using in situ analysis or single cell sequence analysis, or a combination of the two, there is little new information that can be drawn regarding the biology of disease development.

      My suggestion would be to repackage this as an analytical methodology publication, rather than a biology discovery manuscript.

      1) The authors comment that averaged infection phenotypes such as viral load or predominant genotype may be replaced by more granular measures, such as exon-level viral expression or the ratio of expressed viral genotypes. In reality, viral expression, and the ratio of expressed viral genotypes, are still 'tumor averages' in the way that the authors have analysed them. HPV-associated tumors are heterogeneous, and without in situ analysis, it is hard to discern which transcripts are involved in driving the cancer phenotype, and which are found in associated precancerous tissue.

      2) The authors use the term co-infection quite widely. For HPV, previous studies have shown that coinfection within cells in an individual cancer or neoplasia is rare, although independent infections by different HPV types can occur side-by-side. I expect something similar with HBV, although the study would need a higher level of analysis to establish this. The use of terminology, and the way in which data is interpreted, needs to be much more rigorous.

      3) Viral load is generally used in the field as a measure of viral genome or genome-fragment abundance. This is already a misuse of the terminology, as the term implies virus numbers, or even infectious virus numbers. Here the term is used to refer to viral transcript abundance. The authors need to say precisely what they're measuring, and need to be aware that they are measuring the average across a heterogeneous tumour, which may have areas of high grade neoplasia, cancer, and even low-grade neoplasia. My feeling is that the level of analysis is too great, given the uncertainties regarding the heterogeneous nature of tissue that is being analysed, and the different cells with different levels of viral gene expression that are most likely present.

      4) Several of the figures don't obviously support the conclusions. For instance, it is not clear how the data shown in figure S2 supports the title of the S2 figure legend. Surely some statistical analysis is needed to support the conclusion stated in the legend. Given previous studies, I'm not at all convinced that the distribution of causative HPV genotypes is the same between SCC and Adenocarcinoma. An additional limitation of these large cancer association studies, comes from limitations in pathology diagnosis, which cannot always accurately distinguish borderline SCC/adenocarcinoma cases. With the large-scale transcriptional analysis, maybe the authors can use molecular information available in their samples to look at this.

      5) The APOBEC analysis is quite rudimentary in the text, and does not discuss the different members of the APOBEC family. Similarly, the different effects of single and multiple HPV infections on the IRF3 responsive genes is poorly developed at the biological level, which most probably reflects the general way in which the utility of the approach.

    3. Reviewer #1:

      This study is an in silico analysis of data from the Cancer Genome Atlas (TCGA) on hepatitis B virus (HBV)-positive liver tumours and human papillomavirus (HPV)-positive cervical and head and neck tumours and association with viral load, genotype(s) and expression. It is unclear to me the rationale behind including two unrelated DNA tumour viruses in the study, especially as the number of HBV-positive samples is much less than for HPV. Overall the manuscript seems to be a validation of a bioinformatic tool rather than reporting significant research findings.

      Use of the TCGA has allowed analysis of a reasonably large number of RNASeq data sets. However, once the authors drill down to individual genotypes, numbers become quite small, which may compromise some of the observations. For example, the large discrepancy between numbers of HPV16 (173) and HPV18 (39)-positive cases makes it difficult to make firm conclusions about the significance of differentially expressed cellular genes for each set of cancers. Similarly, in Figures 4 and 6 they compare HPV18 (23 cases) with HPV45 (39 cases) and HPV18/45 coinfections (number not stated but likely far fewer).

      Much of the information that they derive from their analyses is not novel. For example, they report no preferential sites of HPV integration. Despite what they claim, quite a bit is known about HPV co-infection in cervical cancers and it is not uncommon but varies according to geographical regions, which was not a variable they used.

      For HPV, viral exon-level RNASeq analysis is irrelevant because HPV gene expression is polycistronic and is subject to changes by random viral integration events in individual cases. Therefore, it is unlikely that general overall viral gene expression signatures will be diagnostic. Besides, from multiple studies we understand that what matters in cervical cancer is the level of expression of the E6/E6 isoforms/E7 oncogenes.

      However, such an in silico approach to quantify various aspects of virus-associated tumours could be a useful prognostic clinical tool in the future.

      The references chosen for the HPV part of the study are either rather out of date or not representative of the extensive literature.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript. Margaret Stanley (University of Cambridge) served as the Reviewing Editor.

      Summary:

      The reviewers agree that the study is technically impressive but the biological data generated is not particularly novel and there are criticisms of the interpretation of the data. The study may have value as a methodological and bioinformatics tool.

    1. Reviewer #3:

      In this paper, Arcaro and colleagues investigate the relationship between bumps along the macaque STS and functional selectivity for faces. Through a series of analyses, they convincingly demonstrate a strong structural-functional relationship between face patches and these small sulcal bumps. They show that this correspondence outperforms functional probabilistic atlases, and does not result from functional specialization per se, as visually-deprived monkeys show similar anatomical folding patterns. As someone familiar with the field of vision and cognitive neuroscience, I can say that this paper is thorough, employs careful single-subject analyses, and I honestly do not have much to add to improve what is already a great paper.

      Points:

      For clarification, were the borders of each bump drawn by hand on the cortical surface (e.g. what's shown in Figure 2B)? Saying so in the text will help future researchers replicate the identification process.

      Monkey M3 looks odd; what do you think is going on there? I know there is individual variability, but the AL patch in that monkey seems atypical in its position in both hemispheres. For most monkeys it almost looks like you could mirror their STS from one hemisphere and predict relatively well their other hemisphere, but in M3 the AL patch doesn't look symmetric.

      The previous point got me thinking, was data across hemispheres within a monkey collapsed before statistical testing? Are there hemispheric differences in bump volume or spacing? Apologies if I missed that in the text.

      In the visually-deprived monkeys, were there any anatomical differences at all within the bumps? Volume differences? Thickness of the cortex that comprises a bump? If this is the topic of another paper and the authors excluded it purposely, I understand, but it might speak to how functional emergence interacts with existing structure (in this case the bumps).

      Did I miss something, or is there really a reference to Julius Caesar's Gaul in the discussion? Is that what "Gallia" is referring to? I appreciate a deep historical reference (if that's what this is) but I'm worried that this will go over most readers' heads. Happy to leave it for poetic purposes, but just noting that it will likely be confusing.

    2. Reviewer #2:

      Summary:

      Neuroimaging and electrophysiological experiments demonstrate a series of face-selective regions in the macaque superior temporal sulcus (STS). In normalised space, these regions partially align across individuals. The present report demonstrates that for some of these regions, local surface properties ("bumps") provide additional reliable information about the likely location of face-selective patches (in fMRI) and cells (in intracranial recordings). That pre-cursors of these bumps are identified both pre-natally, and in macaques reared with abnormal visual experience of faces, indicates that these bumps do not arise due to the normal development of face-selective cortical activity. Similar bumps are found in some other primates, although much of the relevant imaging and electrophysiology data that would help to assess homologies is not yet available.

      General assessment:

      This is a well-presented study that addresses a topic of ongoing interest with highly rigorous methods. On a narrow reading that holds close to the data, the paper offers an interesting observation that would seem to have mainly practical implications (e.g. in informing localisation for future electrophysiological work). In contrast, the effort to draw wider theoretical implications for understanding the visual organisation of STS seems to rely on unpicking the main observation that prompted the report in the first place, and on inferences and speculations that extend too far beyond the data that are reported.

      Substantive points:

      If "bumps" are the relevant physiological markers -- and demonstrating this is the thrust of most of the paper -- then it seems important to understand what a "bump" is. That is, what underlying properties or developmental processes are implied by the presence of a cortical bump, in contrast to regions with less prominent local curvature? The authors only very briefly review some possible mechanisms in the Discussion, and I felt a more complete exploration of this issue would have been useful.

      However, having established a structure-function correlation empirically, at the same time the paper provides many indirect lines of evidence to suggest that this relationship may be tangential at best. As the authors note, "STS bumps are not sufficient to produce face selectivity in the absence of face experience". Nor are bumps necessary to produce face selectivity, given the apparent absence of bumps related to MF and AF. Further, the overlap between bumps and face patches is variable over individuals, and incomplete: the bumps are large, and not entirely comprised of face-selective populations. The authors also note studies that reveal broadly similar tri-partite STS organisation of retinotopic responses, and of body and colour-selective patches. For example, images of bodies tend (in macaque fMRI) to activate regions that are adjacent to face patches, suggesting that there would be a similar anatomy/function relationship for this visual category too. Finally, the authors note that the kinds of physiological processes that are likely to produce bumps are too generic to produce a face-specific mechanism. The authors' speculation, in light of such considerations, is that anatomical bumps in STS are in fact the indirect signals of three distinct, coherent, and complex visual areas that may contribute to a range of visual processes. The main difficulty with the manuscript, as I see it, is that while these wider possibilities are what give the paper the potential to engage a broad neuroscience audience, they are simply too far removed from the actual observations that are reported here. Substantial additional evidence would need to be mustered to support the (admittedly interesting) picture of arealisation in STS that the authors paint. Without such evidence, what remains is mainly a structure-function observation that is interesting, and perhaps practically useful for further studies, but with uncertain theoretical implications.

    3. Reviewer #1:

      This paper reports a correspondence between structural markers, convexities ("bumps") along the superior temporal sulcus (STS), and face-selective patches in the macaque inferior temporal cortex. The authors localized three face patches with fMRI, and each of these face patches overlapped with one of three bumps. These bumps were also present in monkeys that lacked face patches because of being reared without exposure to faces. These data provide some evidence for a correspondence between structure and function in inferior temporal cortex in macaques, in line with recent evidence for a link between structure and function in the temporal lobe of humans. This is interesting work showing novel data on a potential correspondence between structure and function in macaque temporal cortex. The authors examined a relatively large number of subjects for a monkey study and employed two functional measurements, fMRI and multi-unit recordings. However, I have some concerns regarding the correspondence between the face patches and the anatomical structure that need to be addressed.

      Main comments:

      1) The authors employed an automatic procedure to compute the convexity of the pial/white matter, which is excellent because it is objective. However, I found it difficult to differentiate neighboring bumps in some of the animals (Figure 2 S1). One reason for this is the way Figure 2 S1 was made, showing the bumps with different colors that occlude to some extent the underlying convexity map. The authors should show the convexity map for each monkey and then in a separate panel show the identified bumps, so that one can judge the correspondence between the convexity map and the bumps. Also, the group average data shown in Figure 2C do not look very convincing to me: I find it difficult to differentiate the posterior from the middle bump, as it looks like one long continuous convexity instead of two with a clear border in between. This could be due to the averaging across monkeys. That is why Figure 2 S1, which shows the data of the individual monkeys, is important, but that figure needs to be improved by showing the convexity maps alone (see above).

      2) The overlap between the bump surfaces and the patches depends on how the two are defined. As said above, I found it difficult to identify the individual bumps. The surface area/size of a face patch depends on the statistical threshold (and the number of runs, etc.) that is used to define it and thus is arbitrary to some extent. These two factors make it difficult to evaluate the degree of overlap between patches and bumps and to interpret the DICE overlap analysis. The authors should address this by using several thresholds to define the face patch surface and examining how this affects the DICE outcome and the analyses using centroids.

      3) Because a face patch appears to be (in some cases) a small part of a bump and its location can vary within the bump, how predictive is the bump of the location of the face patch? The correspondence between structure and function appears to be rather coarse: I have the impression from the comparison of the centroids of the bumps and face patches (Figure 4) that there is a reasonable correspondence between ML and the middle bump, but that it is weaker for PL and AL. Furthermore, it is highly variable amongst animals. For instance, in M3, face patch AL appears to lie in between the middle and anterior bump. This suggests that the bumps might not determine the presence of a face patch but that perhaps the presence of a bump and a face patch are unrelated mechanistically.

      4) The authors' work ignores the most anterior face patch, AM, which is located outside the STS (as PL typically is, including in the present study). It has been suggested that AM is important for face identification, having a high tolerance for identity-preserving transformations such as viewpoint (see the work by Freiwald and Tsao), and thus is difficult to ignore. How does AM fit into the proposed correspondence between STS bumps and face patches?
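
      The threshold check requested in comment 2) can be illustrated with a small sketch. It is not the authors' pipeline: it assumes that a bump and a face patch are each represented as a set of surface vertex indices, and the names `dice`, `dice_by_threshold`, and `stat_map`, as well as the toy values, are hypothetical.

```python
def dice(a, b):
    """Dice coefficient between two collections of vertex indices."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here to avoid dividing by zero
    return 2 * len(a & b) / (len(a) + len(b))


def dice_by_threshold(stat_map, bump_vertices, thresholds):
    """Recompute the patch/bump Dice overlap at each statistical threshold.

    stat_map maps vertex index -> face-selectivity statistic (hypothetical).
    The face patch at a threshold is every vertex whose statistic exceeds it.
    """
    result = {}
    for t in thresholds:
        patch = {v for v, s in stat_map.items() if s > t}
        result[t] = dice(patch, bump_vertices)
    return result


# Toy values invented for illustration only, not taken from the manuscript.
stat_map = {0: 1.0, 1: 2.5, 2: 3.5, 3: 4.5, 4: 5.5, 5: 0.5}
bump = {2, 3, 4, 5}
scores = dice_by_threshold(stat_map, bump, [2.0, 3.0, 4.0])
for t, d in scores.items():
    print(f"threshold {t}: Dice = {d:.3f}")
```

      With real data, a Dice value that stays roughly flat across thresholds would indicate that the reported overlap does not hinge on one arbitrary cutoff.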

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers agreed that the paper reports an interesting finding: a potential correspondence between structure and function in macaque temporal cortex. However, they also noted that this correspondence was only partial and variable across individuals. Furthermore, the reviewers were unsure of the broader theoretical implications of this finding.

    1. Reviewer #2:

      In this work the authors seek to disentangle the reason for a well-documented effect regarding reduced model-based tendencies among high-compulsive individuals. The authors collected behavioral and EEG data from ~200 participants performing a two-stage decision task. Main findings show a latent compulsivity factor is associated with weaker transition-type effects at the task's 2nd stage. Specifically, high-compulsive individuals show smaller reaction-time and parietal-occipital alpha-band power differences between uncommon and common transitions. These findings are interpreted as evidence in favor of a less accurate model as a reason for reduced deployment of model-based strategies in compulsive individuals. The authors further note reduced theta power for compulsive individuals during 1st stage choice.

      I am generally very impressed with this manuscript. I think the authors are addressing an important question that has a lot of promise in pushing the field forward. I also believe that given the number of participants, this is a relatively well powered EEG and behavioral study. Yet, I have one major concern as detailed below:

      The authors rely on 2nd stage effects to estimate to what extent individuals are more or less aware of the transition structure (i.e., to what extent individuals are surprised by an uncommon state, or unsurprised by the common one). However, unlike Konovalov & Krajbich (2020), who used a mouse-tracking procedure to capture participants' 2nd stage expectation, both the RT and alpha band scores might be confounded due to 1st stage choice strategy. Individuals with stronger deployment of model-based (MB) strategies in the 1st stage tend to get more often to the best 2nd stage choice by means of a common transition. In contrast, the choices made by model-free (MF) individuals at the 1st stage will not direct them more often to the best 2nd stage option by means of a common transition. This means that for a MF individual, the overall value difference for the two options offered at the 2nd stage will be similar in common and rare transitions, while for a MB individual the value difference will be higher in common vs. rare transitions. This holds even when both MB and MF agents have perfect knowledge of the task transition structure, and are equally surprised by an uncommon transition. Since the 2nd stage decision is easier on average on common vs. rare transitions for MB agents, they should also exhibit stronger transition effects compared with MF agents on 2nd stage estimates. One such effect might be greater alpha-band power on rare transitions reflecting a greater mental effort (as the authors note). Also, when the decision is easier due to larger value difference, shorter RTs are to be expected (e.g., Pedersen et al., 2017 on pbr; Shahar et al., 2019 on plos-cb). This means that a transition effect on both alpha-band power and RTs is expected due to the use of MB strategies in the 1st stage, even if knowledge of the transition probabilities is perfect. Indeed, the authors report lower MB deployment at the 1st stage for compulsive individuals, which is in line with their weaker transition-related effects on the 2nd stage.

    2. Reviewer #1:

      In this report, the authors test a hypothesis about the nature of high-level ("model-based") vs. low-level ("model-free") learning across the spectrum of behavioral compulsivity. Prior literature has suggested that high-compulsive individuals have a deficit in either forming a model of the world, or implementing that model due to competition from learned low-level action-outcome tendencies. This report tested a large number of participants with concurrent EEG (N=192) across a range of compulsivity on the well-known two-step reinforcement learning task.

      The authors note that they "replicated prior work in findings that individual differences in compulsivity and intrusive thought ... were associated with reduced model-based planning" with analyses of accuracy (pg. 8). Analysis of RT revealed a novel effect of compulsivity on model-based planning, which was replicated using archival data of the same task. E-phys findings indicated that the candidate biomarkers of control in P300 and frontal midline theta were unrelated or not specifically related to model-based planning deficits in compulsivity, respectively (more on this below). However, the novel biomarker of posterior alpha power during the transition period was indeed linked with model-based planning deficits in compulsivity. This is novel.

      This report is extremely well motivated by prior literature, it is very well written, and very well executed. Supplemental controls for age and IQ, tests of the specificity of EEG effects with compulsivity, and tests of the specificity of this compulsivity dimension on dependent measures in relation to associated personality variables (e.g. anxious depression & social withdrawal, also raw item measures) all work together to bolster the conclusions. This is a very carefully presented report.

      Despite these virtues and advantages, the take-home message that I leave with is that EEG is not ideally suited for revealing the nature of compulsivity on model-based planning. P300 was irrelevant, frontal theta was possibly indirectly related (see below), and only posterior alpha was indicative of the compulsivity-related findings revealed in the behavioral analysis. This is unfortunately the least useful assessment of cognition used here, as it reflects the lowest level of control or decision making amongst these EEG measures. This perceptual effect is likely more of a consequence of the behavior than a candidate mechanism underlying it. This conclusion unfortunately diminishes the utility of these findings.

      Regarding theta: Theta power and compulsivity were related to RT change, and they were related to each other, even though theta was not related to model-based planning (presumably tested via accuracy / choice). Although these patterns are carefully interpreted, it isn't perfectly clear how these were tested and I suspect there may be more that could be tested / inferred here. First, theta may still be related to the latent feature of "model-based choice" even if it is not significant due to the manifest measure based on choice patterns. This requires some careful unpacking of semantics and what latent constructs can be inferred from which manifest variables, but it is always a good idea to question what a single measure can infer about complex cognitive states. Second, taking this theoretical issue and including a methodological point, the single trial theta-RT relationship may still be altered by compulsivity even if theta power is not. Power and power-RT correlations have been presented as different measures of control that can be differently affected by a host of variables. This could presumably be tested by a theta*RT*compulsivity interaction, and could be visualized as a correlation between the individual theta*RT beta weight (Y-axis) with compulsivity on the X-axis.

    3. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The authors aimed to disentangle the processes underlying compulsive individuals' difficulties forming models of the world, or implementing such models due to competition from lower-level action-outcome tendencies. To this end, they obtained behavioral and EEG data from ~200 participants performing a well-established two-step reinforcement learning task. The authors note that they "replicated prior work in findings that individual differences in compulsivity and intrusive thought ... were associated with reduced model-based planning" with analyses of accuracy. RT analyses revealed a novel effect of compulsivity on model-based planning, which was replicated using archival data of the same task. EEG findings indicated that the P300 and frontal midline theta, both well-established measures of cognitive control, were unrelated or not specifically related to model-based planning deficits in compulsivity, respectively. Posterior alpha power during the transition period, a novel marker, was linked with model-based planning deficits in compulsivity.

    1. Reviewer #4:

      The author investigated the relationship between spontaneous (SR), peak (PR), and steady-state (SS) firing rate in sensory neurons (across modalities) by extracting data from approximately ten studies, published between 1928 and 2017. The relationship between SR, PR, and SS is surprisingly simple: SS = sqrt(PR × SR). The author concludes that this is a universal law of sensory adaptation.

      General Assessment: The claims of universality are not supported by the analysis.

      Major comments: The primary claim of a universal law of adaptation is based on a meta-analysis of fewer than 20 hand-picked papers. This is contrary to good scientific practice for meta-analyses. To truly assess the universality of the rule, the author should define a time period, a set of journals, and possibly some other criteria for exclusion/inclusion and then study the relationship across all publications that meet these criteria. Without such a clearly defined approach, the reader cannot know whether the examples were cherry-picked.

      The comparisons of (extracted) experimental data and the model are entirely "by eye"; statistical analysis is lacking.

      Even within the chosen sample, universality is clearly a step too far and the author's explanations of why the universal law fails are not particularly convincing ("the visual system is complex"). There may be something to the claim that the law only applies "in the absence of interaction from others cells in the neural circuitry", but this should be part of the study (i.e., investigate only those papers that studied neurons that were isolated in this sense), not an ad hoc explanation of discrepancies.

      The manuscript only states the "universal law" but leaves an explanation to future work. This is unsatisfactory. Detailed neuronal models exist that explain adaptation (e.g. in terms of the opening of potassium channels). These alternative biophysical explanations need to be considered.

    2. Reviewer #3:

      This manuscript by Wong proposes that steady state responses to a constant sensory stimulus (the responses observed after adaptation) are well predicted by a simple relationship between the spontaneous firing rate and the peak firing rate, namely their geometric mean. The author provides evidence extracted from measurements made in previously published studies, across species and modalities.

      The paper presents a simple and somewhat interesting observation. However, it is difficult to accept the claim and support publication for several reasons:

      1) The comparisons between the predicted and measured responses are entirely qualitative, and there is no alternative model considered. The predictions in Table 1 are pretty good but in many cases the arithmetic mean works reasonably well too (unless peak rates are very high). The steady state will lie somewhere between peak and spontaneous. Where is the quantitative evidence that the geometric mean is better than an alternative? What other relationships might better map the quantities on to each other?

      2) There is little context for the observation: if true, why should we care that Eq 1,2 hold? The discussion hints that the observation is consistent with theoretical principles. If this were laid out in a compelling way, it would greatly increase the impact and relevance of the observation. As it stands, the observation has little context. The implications are unclear.

      3) It is not clear how the studies considered here (i.e. where the data came from) were chosen. Surely there are many studies of sensory responses to constant stimuli. How did the author choose this small subset (~10 studies)? For a 'universal' law, one would want to see many studies considered. In addition, in the studies considered here, the values were extracted in an ad hoc manner.

      4) The discussion points out many cases in which the rule does not apply (whenever neurons are embedded in a circuit as opposed to being primary sensory neurons). This limits the appeal of the proposal, unless one can provide theory/explanation for why such a relationship should hold in the periphery but not in more central structures.

      5) Previous work has dispelled the notion of a steady state response, arguing that responses continue to decrease with adaptation duration, following a power law dependence (Drew and Abbott, 2006, J Neurophysiol 96: 826). If so, the rule proposed here is unlikely to hold across adaptation durations, again suggesting it is not broadly applicable.
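
      The quantitative comparison asked for in comment 1) could take a form like the following minimal sketch. It scores the geometric-mean and arithmetic-mean predictors by root-mean-square log error on (SR, PR, SS) triples. The function names and the three triples are hypothetical; the numbers are invented to show the mechanics and say nothing about which predictor fits the published data.

```python
import math


def geometric_prediction(sr, pr):
    """Steady-state rate predicted as the geometric mean of SR and PR."""
    return math.sqrt(sr * pr)


def arithmetic_prediction(sr, pr):
    """Alternative predictor: the arithmetic mean of SR and PR."""
    return (sr + pr) / 2.0


def rms_log_error(predict, records):
    """Root-mean-square of log(predicted / measured) over all records."""
    errors = [math.log(predict(sr, pr) / ss) for sr, pr, ss in records]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Invented (SR, PR, SS) triples in spikes/s, for illustration only.
records = [(5.0, 80.0, 21.0), (10.0, 40.0, 19.0), (2.0, 200.0, 22.0)]

geo_err = rms_log_error(geometric_prediction, records)
ari_err = rms_log_error(arithmetic_prediction, records)
print(f"geometric mean:  RMS log error = {geo_err:.3f}")
print(f"arithmetic mean: RMS log error = {ari_err:.3f}")
```

      Log error is used so that neurons with very different firing rates contribute on a comparable scale; other error measures or a formal model comparison would be equally reasonable.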

    3. Reviewer #2:

      This paper proposes a universal law of adaptation that occurs during sustained sensory stimulation. The law states that the sustained response of sensory afferents equals the square root of the product of the spontaneous and transient, peak response. The author shows several examples of previously published results to support the claim, some dating back to the seminal studies by Adrian. The author states that the law can be derived from a theory of sensory processing but does not provide further information on this (he refers to a publication in preparation).

      This is interesting work and the paper is well-written. However, I am not convinced by this claim of a universal law of adaptation. First, it does not appear to be universal, and, second, the empirical data that are provided to support its universality are not convincing yet.

      1) The law is not universal: in his Discussion, the author lists exceptions to the rule, in the visual system, auditory system and even for somatosensory afferents. Explanations are given of why the law does not hold in some of these cases, but the exceptions show that the law is not universal. Even when it is not universal, the theory should be able to predict in which cases it holds and when it does not hold.

      2) I am not convinced by the evidence presented in Figure 2. In several instances, the slope of the relationship between the log peak and log sustained (steady state) activity does not seem to be equal to the predicted 1/2, e.g. in panels b and c. The author should have computed the slope and tested whether it was 1/2.
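
      The slope test requested in comment 2) can be sketched as an ordinary least-squares fit of log steady-state rate on log peak rate, followed by a t statistic against the predicted slope of 1/2. This is only an illustration under stated assumptions: the function names are hypothetical, the spontaneous rate is assumed fixed within a panel, and the data are synthetic (generated from the proposed law with small multiplicative perturbations), not values from Figure 2.

```python
import math


def loglog_slope(peak, steady):
    """OLS fit of log10(steady) on log10(peak).

    Returns (slope, intercept, standard error of the slope).
    """
    x = [math.log10(p) for p in peak]
    y = [math.log10(s) for s in steady]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    residual_variance = sum(r * r for r in residuals) / (n - 2)
    return slope, intercept, math.sqrt(residual_variance / sxx)


def t_against_half(slope, se):
    """t statistic for the null hypothesis slope = 1/2 (n - 2 d.o.f.)."""
    return (slope - 0.5) / se


# Synthetic data: SS = sqrt(PR * SR) with SR = 4 and small perturbations.
spontaneous = 4.0
peak = [10.0, 40.0, 90.0, 160.0, 250.0]
noise = [1.05, 0.95, 1.02, 0.98, 1.0]
steady = [math.sqrt(p * spontaneous) * k for p, k in zip(peak, noise)]

slope, intercept, se = loglog_slope(peak, steady)
t_stat = t_against_half(slope, se)
print(f"slope = {slope:.3f} +/- {se:.3f}, t vs 1/2 = {t_stat:.2f}")
```

      The t statistic would then be compared with the critical value for n - 2 degrees of freedom; a confidence interval on the slope conveys the same information.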

    4. Reviewer #1:

      This study reports an interesting observation, namely that the firing rate after sensory adaptation appears to be equal to the geometric mean of the peak firing rate and the spontaneous firing rate. However, there are concerns about the theoretical motivation and general empirical evidence supporting this observation.

      1) Theoretical motivation: still unclear even after discussion, although we are told it exists. "The derivation of Eqs 1-2 of will be the subject of a later publication."

      2) Is this relationship supposed to hold for each stimulation or on average? The author seems to be only working with averages.

      It is not clear why exactly these (quite old) studies were selected. What were the criteria for including these studies in the meta-analysis? A number of exceptions are discussed later, however.

      The alternative, an in-depth analysis of existing datasets requested from other authors, was explicitly not done. Such an analysis could also address trial-wise validity.

      "Not only does adaptation show time-varying changes in firing rate, but the variability makes it difficult to know exactly which value to choose. Averaging the data is not feasible without extracting a large number of data points, and this was not possible from noisy images. As such, with the exception of two studies, ... a visual estimation of the average activity in the final portion of the adaptation curve was used."

      The error introduced by visual estimation remains unknown.

      3) Counter-example in one of the few easily accessible papers, in the ferret, reference 16 (https://pubmed.ncbi.nlm.nih.gov/22694786/#&gid=article-figures&pid=fig-6-uid-5 ). Another counterexample appears in Fig 8 of this randomly chosen paper, although maybe the mechanoreceptor of the cricket doesn't count due to some exclusion criterion, since it is an interneuron. (https://journals.physiology.org/doi/full/10.1152/jn.1997.77.1.207?url_ver=Z39.88-2003&rfr_id=ori:rid:crossref.org&rfr_dat=cr_pub 0pubmed)

      4) The law doesn't hold for a number of exceptions; this is not announced until the discussion (and is missing from the abstract).

      5) The discussion ignores existing literature on information content and possible function of the sustained response, and of adaptation in general (e.g. gain control).

      6) Introduction: failure to cite recent reviews on this topic, e.g. "has been repeated many times.... More modern methodologies..." cites nothing after 1970s.

      7) At which time point is the relationship supposed to hold true? What happens when stimulation time becomes very long? Does the firing rate reach steady state in all of these studies?

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

    1. Reviewer #3:

      This study provides very informative trends regarding long-term (~4-month) recording with Neuropixels probes in chronically implanted, freely moving rats. This is accomplished by recording across many animals (n = 18) and many recording locations and analyzing the number of single (and multi) units that can be automatically isolated as a function of time since implant, recording location, and other features (e.g. shank orientation). The authors perform these experiments with a modular system that allows the implanting of multiple probes simultaneously in a single rat (here they mostly implanted 1 probe, sometimes 2, once 3) and that allows the removal of probes for re-use in another animal, both of which are also valuable contributions. The analysis of neuron yield is framed in terms of a sum of 2 decaying exponentials model (an initial fast decay of one subpopulation of neurons followed by a slower decay of the remainder) that the authors fit to find the primary features determining neuron yield. The major trends they report include: substantially better yield over time in regions anterior to the bregma and ventral to the most dorsal 2mm of the brain surface. They also show that re-used probes perform essentially as well as new probes in terms of noise and unit quality (e.g. average unit amplitude), and also neuron yield (at least for medial frontal cortex, but see below).
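
      For readers unfamiliar with this kind of model, a sum of two decaying exponentials can be written as in the sketch below. The parameterization is an assumption made for illustration (a fraction `alpha` of units decaying with time constant `tau_fast`, and the remainder with `tau_slow`); the manuscript's exact form, parameter names, and fitted values may differ, and the numbers used here are invented.

```python
import math


def unit_yield(t_days, n0, alpha, tau_fast, tau_slow):
    """Expected number of isolated units t_days after implantation.

    n0       : units on the day of implantation (t = 0)
    alpha    : fraction of units in the fast-decaying subpopulation
    tau_fast : time constant (days) of the fast decay
    tau_slow : time constant (days) of the slow decay
    """
    fast = alpha * math.exp(-t_days / tau_fast)
    slow = (1.0 - alpha) * math.exp(-t_days / tau_slow)
    return n0 * (fast + slow)


# Invented parameter values, for illustration only.
days = [0, 1, 7, 30, 120]
curve = [
    unit_yield(d, n0=100.0, alpha=0.4, tau_fast=2.0, tau_slow=200.0)
    for d in days
]
for d, n in zip(days, curve):
    print(f"day {d:3d}: {n:6.1f} units")
```

      In this form, regional differences in long-term yield would show up mainly in `tau_slow`, whereas a uniform early loss would correspond to `alpha` and `tau_fast` being similar across regions.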

      Major comments:

      The results averaged over many animals and the model are good for extracting the major trends, but there are hints of significant and important variability across animals or probes. Points 1-3 are about this variability, potential sources of variability, and displaying the variability whether or not potential causes can be found.

      1) The results shown in Figure 2, especially the averages in Figure 2H,K, indicate severe losses in unit yield over time for probes implanted posterior of bregma and electrode sites in the dorsal 2mm of the rat brain. However, Figure 2B shows at least one animal (open circles) for which high neuron yield was obtained in motor cortex and dorsal striatum for at least 4 months. First, is this from 1 or 2 probes? Whether it is from 1 or 2 probes, the stability of the recording over time from day 1 is much greater than the other animals, and much better than what is expected from Figure 2H. Is the preservation of units over time for this animal due to the stability of units in dorsal striatum (presumably mostly >2mm below the surface) or also motor cortex?

      2) There are some additional potential causes that could also account for yield differences, and one of these is age of the animal at the time of implant. The authors should list age at implant in their table in Figure 4-supplement 1. The authors should display yield over time as a function of age at implant, and also try adding age as one of the regressors for their model of neuron yield.

      3) Another potential cause is whether the probe is new or reused. The authors showed that probe re-use did not result in statistically different yield for the medial prefrontal cortex. But is this also true for the other brain regions? Does the data in Figure 2 include implants of both new and re-used probes, or only new probes? The authors should try to add whether the probe was new or re-used as a regressor in their neuron yield model.

      Regarding points 1-3, whether or not it is possible to add age or probe newness as regressors in the model, the authors should create a supplementary figure that shows the single unit yield curves as in Figure 2A-C for all probes in all animals: one panel per major brain region (e.g. splitting motor cortex from dorsal striatum from ventral striatum), with one curve per probe. There should be a legend for each panel that gives the (AP,ML,DV) coordinates of the approximate midpoint of the probe's location within that brain region. The legend should also indicate for each probe/curve: the animal, age at time of implant, probe newness, probe tip depth, estimated number of electrodes recorded from in that region, and shank orientation. This will repeat some pieces of information that are in the tables; however, it is very useful to see all this information together, in a form that would be very valuable for readers, especially experimenters who may want to record from some of the more posterior and dorsal areas. The information that could be gleaned would include knowledge of the variance in yield over time across implant attempts, so they could see if, say, 1 of 3 attempts to implant in a given area may give very good long-term yield.

      4) It is stated starting on Line 172 that "The relative number of units corresponding to the fast- and slowly-decaying subpopulations did not significantly vary across brain regions along either anatomical axis, nor did the rate of decay of the fast population (Figure 2--supplement 3). This suggests that the rapid decline in yield observed in the days after surgery may be due to a process that is relatively uniform across brain regions."

      The support for this statement can be seen in the indicated Figure 2-supplement 3. On the other hand, the point is made (and shown in Figure 2-supplement 4) that there is no loss of units in mPFC over time. This is apparently at odds with the Line 172 statement and model assumption of a fixed fraction of fast-decaying units. Was a model tried in which alpha varies with location? If the Line 172 statement is ultimately kept, there should at least be a comment made there that the most anterior, ventral regions appear to differ from the model's assumption/interpretation.

    2. Reviewer #2:

      In this paper, the authors report a device that can be used to implant and later explant Neuropixels probes in freely moving rats. The device consists of an adaptor, an internal holder and an external chassis. The chassis protects the probe, is attached to the animal's head via adhesive cement and acrylic. The internal part can be explanted at the end of the experiment, allowing the NP probe to be re-used.

      The work builds on existing technology in important ways: the authors examined the long-term yield across different brain regions, they more extensively assessed the feasibility of probe reuse compared to previous work, and they evaluated probe performance over a long period of time and also after explantation (measuring the input referred noise of explanted probes in saline). It was also impressive that they used a cohort of 18 rats to evaluate performance of both the animals and the probe, and that they were able to implant up to 3 NP probes at a time. Because of the importance of using freely moving animals in Neuroscience research, and the differences between rats and mice that necessitate modifications on existing technology, this paper is timely and likely to be very useful to a sizeable group of researchers. My suggestions are aimed at furthering the usefulness of this "Tools and Resources" paper for investigators who wish to use this important technology.

      At the moment, the majority of the paper seems aimed at evaluating the performance of the device as a function of time, depth and location. This performance evaluation was useful, very carefully done, and makes important points that aid in the interpretation of other papers (such as the unusual stability of recordings in mPFC reported in previous papers). Nonetheless, readers are likely interested in the paper because they wish to make and implant the device in order to benefit from the scholarly analysis done here. The manuscript does contain very helpful technical details, but these are hard to find and are not front-and-center in the main text. For instance, the material from the "Neuropixels implant procedure" is really helpful and would be critical for anyone who wants to use this technique. But at the moment, that information is in a google doc linked from the associated GitHub, a long way from the main manuscript. This information should be in the main manuscript, either in Results or Methods. Also use of consistent nomenclature across documents would help a lot. I believe the part referred to as the "chassis" in the main text is referred to as the "external" on the google doc with the instructions. Similarly, the part referred to as the "internal" in the google doc is called an "internal holder" in the manuscript.

      A reader hoping to use the device might also benefit from more information on the grounding procedure. The text in the "Implantation" section of the methods was helpful, but more information would be useful, such as where on the probe the ground wire should be connected and how one should fix the grounding wire (tapping the wire and covering with Metabond?). Also, it would be nice to know how one should protect the grounding wire from being touched by the animal. Figure 6 in the google doc protocol is really helpful and should definitely be in the main manuscript. An additional figure showing how to connect the wire to the ground during the surgery would be quite useful. Finally, are the craniotomy and durotomy necessary for grounding? Could one simply connect the grounding wire to a couple of screws on the skull?

    3. Reviewer #1:

      This manuscript presents new techniques for obtaining chronic recordings using multiple Neuropixels probes in rats. The resources, I imagine, will be of high value to the neuroscience community at large. The authors also address short- and long-term unit stability, probe recovery, and the impact of the probe on behavior. I have only a few minor comments.

      I understand the authors' rationale for avoiding manual curation, but there have been reports of inconsistencies in the identification of units across different sorters. Did the authors consider comparing their kilosort unit identification with manual curation or another sorting software?

      The authors speculate in the discussion about the possible reason for the slow loss of units. It wasn't quite clear to me however, what types of changes might improve this loss?

      Figure 2 is perhaps one of the most informative findings but I wonder how applicable this will be to future probe iterations. Do the authors have a hypothesis for what features of the probe might contribute (or not contribute) to the long term loss of units?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript. Lisa Giocomo (Stanford University School of Medicine) served as the Reviewing Editor.

      Summary:

      This manuscript presents new techniques for obtaining chronic recordings using multiple neuropixel probes in rats. The study provides very informative trends regarding long-term (~4-month) recording with Neuropixels probes in chronically implanted, freely moving rats. This is accomplished by recording across many animals (n = 18) and many recording locations and analyzing the number of single (and multi) units that can be automatically isolated as a function of time since implant, recording location, and other features (e.g. shank orientation). The authors perform these experiments with a modular system that allows the implanting of multiple probes simultaneously in a single rat (here they mostly implanted 1 probe, sometimes 2, once 3) and that allows the removal of probes for re-use in another animal, both of which are also valuable contributions. The work builds on existing technology in important ways: the authors examined the long-term yield across different brain regions, they more extensively assessed the feasibility of probe reuse compared to previous work, and they evaluated probe performance over a long period of time and also after explantation (measuring the input referred noise of explanted probes in saline). Because of the importance of using freely moving animals in Neuroscience research, and the differences between rats and mice that necessitate modifications on existing technology, this paper is timely and likely to be very useful to a sizable group of researchers.

    1. Author Response

      Reviewer #1

      This paper investigates the role of Lhx6 and other transcription factors in the development of GABAergic neurons in the hypothalamus. The authors report that a small fraction of hypothalamic GABAergic neurons express Lhx6 and further depend on this expression for their survival. Dlx1/2, Nkx2-1 and Nkx2-2 define 5 subpopulations and at least three of these populations depend on these TFs to maintain Lhx6 expression. A strength of the paper is the multimodal analysis and the fact that descriptive assays like RNAseq and ATACseq are followed up with specific knockouts of candidate transcription factors. However, the relationships between the developmental populations identified and adult subtypes of hypothalamic neurons remain unclear. Although the results will surely interest those already interested in hypothalamic development, it is not clear that broader developmental or functional principles have been identified. The authors make much of the fact that the identified populations do not resemble forebrain interneurons defined by Lhx6 expression, but it is not clear why this should have been expected. Many developmental transcription factors are utilized both across diverse brain regions and across tissues outside of the brain. Perhaps the emphasis of this point could be tempered.

      We thank the Reviewer for his/her comments, although we respectfully but strongly disagree with the statement that “it is not clear that broader developmental or functional principles have been identified”. This manuscript aims to provide a broad, and by no means exhaustive, overview of the molecular mechanisms controlling the development of hypothalamic neurons that express Lhx6. Although these neurons comprise only approximately 2% of all hypothalamic GABAergic neurons, they are highly heterogeneous at the molecular level. Using traditional methods such as histology and more recent methods such as scRNA-Seq, we have not found a selective marker of hypothalamic Lhx6+ neurons other than Lhx6 itself. However, we have found multiple spatially distinct domains in hypothalamic Lhx6+ neurons that express specific sets of transcription factors such as Dlx1/2, Nkx2-1, and Nkx2-2, as we and others have previously observed in developing hypothalamic nuclei.

      In addition, a subpopulation of these neurons later gives rise to a subset of Lhx6+ neurons of the zona incerta, which we have previously shown to promote sleep. Unlike all previously described sleep-promoting neurons, Lhx6+ zona incerta neurons are one of only a few neuronal subtypes that can regulate both REM and NREM, which likely reflects molecular and functional heterogeneity among these neurons.

      Our manuscript thus speaks both to broader developmental principles, by demonstrating the molecular heterogeneity of hypothalamic Lhx6+ cells that arises through the action of diverse transcriptional networks, and to broader functional principles, by identifying developmental networks that potentially control the specification, differentiation, and survival of sleep-promoting neurons.

      We believe that there are several compelling reasons for including a direct comparison of hypothalamic and cortical Lhx6 neurons, both of which arise from different regions of the forebrain (or secondary prosencephalon, if using the prosomere model). First, the role of Lhx6 in development of telencephalic interneurons is extensively studied, with 72 publications (PubMed: Lhx6 AND development AND (cortex OR telencephalon OR interneuron), accessed 7/27/20), and virtually all our understanding of how Lhx6 controls neuronal development has been acquired from this work. It is thus critically important that we directly connect our findings to a prior understanding of the mechanism of action of Lhx6.

      Second, current work in the field of developmental neuroscience in general is heavily focused on studying telencephalic development. It is very much an open question, however, whether telencephalic structures are themselves particularly good models for studying the development of physiologically vital brain regions, such as the hypothalamus. By identifying many key differences in the function of this extensively studied gene between Lhx6+ MGE-derived neural precursors and hypothalamic Lhx6+ neurons, we establish some important caveats in generalizing studies of telencephalic development even to nearby forebrain structures.

      Nonetheless, we certainly agree with the Reviewer that the organization and clarity of the manuscript can be substantially improved. To this end, we have revised the manuscript carefully to improve clarity, focusing on its key findings.

      The presentation of the manuscript could be improved by clarifying the relationships between embryonic and more mature structures within the hypothalamus. For example, it is extremely hard to follow the evidence split across figures 5, S6 and S7 for parsing the cell groups by TF expression.

      We have revised the manuscript carefully to improve clarity. We have moved the scRNA-Seq analysis of postnatal Lhx6-expressing neurons to Fig. 3, and that of embryonic Lhx6-expressing neurons to Fig. 4, to improve the overall flow of the manuscript.

      The ATAC seems to be used only to bolster the impression that the populations identified by gene expression are different. The description of footprinting seems to imply an effort to analyze binding sites for specific factors (e.g. to identify targets of the TFs studied), but the statistical approach employed and even the conclusions reached are not fully spelled out. As such, this part of the study is underdeveloped or not well enough described.

      Specific details of the ATAC-Seq analysis are extensively described in the Method section, with each bioinformatics package (and package version) listed and, when non-default parameters were used, parameters clearly stated. However, we have added details of the statistical approaches used for data analysis to the revised manuscript.

      There is little use in conducting ATAC-Seq analysis without a matched RNA-Seq dataset, as changes in peaks (open chromatin regions) do not necessarily correlate with changes in gene expression levels. By integrating ATAC-Seq data with differential gene expression obtained using RNA-Seq, we have been able to identify changes in motif accessibility and candidate transcription factor footprinting, and thereby to identify changes in gene regulatory networks that control Lhx6 expression in both hypothalamus and cortex. We have revised the manuscript to make this clearer, and better explain the findings of this part of the study.

      Reviewer #2:

      Kim and colleagues used a combination of state-of-the-art sequencing and mouse genetic tools to study the mechanisms that control the development of a subset of GABAergic neurons in the developing hypothalamus.

      While neurodevelopment of GABAergic neurons has been extensively studied in the developing telencephalon, little is known about their counterparts in the developing hypothalamus. The authors focused their work on a specific subset of GABAergic neurons that express the LIM homeodomain factor Lhx6. Lhx6 is a master regulator of GABAergic neuron differentiation, specification, and migration in cortical interneurons. In contrast, Lhx6-expressing neurons make up only 2-3% of GABAergic neurons in the hypothalamus. The authors' previous work demonstrated that these neurons play a critical role in sleep homeostasis. Therefore, understanding how these neurons are formed and maintained is of great importance.

      The authors show that hypothalamic Lhx6 is necessary for neuronal differentiation and survival. Furthermore, by profiling and comparing multiple RNA-seq, scRNA-seq, and ATAC-seq datasets, they were able to identify three transcription factors Nkx2.1, Nkx2.2, and Dlx1/2 that each delineates non-overlapping subdomains of Lhx6 neurons and are necessary for Lhx6 expression in the hypothalamus. Finally, the authors demonstrate that mature Lhx6 neurons manifest extensive molecular heterogeneity that is distinct from their counterparts in the telencephalon.

      We thank the Reviewer for his/her comments, and for appreciating the key findings of the manuscript.

      The work presented is of high quality and is a technological tour de force. The scope and depth of the study are unparalleled among similar studies of hypothalamic neurodevelopment. That said I only have a couple of minor suggestions.

      1) In Figure S2, the number of tomato+ cells appears to be reduced, but not eliminated. Do the authors think that Lhx6 is necessary for the survival of all Lhx6 neurons, or just a subset? The use of the floxed Bax allele is clever, but is there evidence directly supporting increased cell death? Can the authors completely rule out the possibility of the mismigration of cell bodies after the postnatal deletion of Lhx6?

      We thank the Reviewer for his/her comments. We conclude that Lhx6 is necessary for the survival of all Lhx6 neurons due to the lack of read-through transcription in Lhx6-CreER/CreER mice (Fig. 2), and the rescue of Lhx6-deficient mice that is seen using conditional Bax mutants (Fig. 2). The fact that numbers of cells labeled with Lhx6-CreER are rescued by the deletion of this key positive regulator of apoptosis strongly implies that Lhx6-deficient neurons simply die. Finally, we observe very few Lhx6-expressing hypothalamic neurons that undergo even short-range tangential migration (Fig. 1), and observe no evidence for an increase in these cells in the analysis described in Fig. 2.

      The fact that postnatal loss of function of Lhx6 leads to a more modest cell loss than the constitutive mutant may simply reflect a reduced overall requirement for Lhx6 in regulating neuronal survival in the postnatal hypothalamus or may indicate that the survival of a specific subset of Lhx6+ neurons is no longer Lhx6-dependent at this age. We cannot currently distinguish between these alternatives, and state this fact in the text.

      2) In Figure 4, the authors acknowledged that the ectopic gene expression in Lhx6CreER/lox; Baxlox/lox mice could be due to the loss of function of Bax. If so, would Lhx6CreER/+; Baxlox/lox mice be a better control in this experiment?

      We initially thought of using Lhx6-CreER/+;Baxlox/lox as a control since our phenotype could be due to loss of Bax itself rather than to changes in cell survival. However, we observed the same rescue phenotype in initial experiments using Lhx6-CreER/Bak-null (#006329), which strengthened our initial hypothesis. We now discuss potential limitations that may result from the fact that RNA-Seq data from Lhx6CreER/+;Baxlox/lox mice are not included in this study.

      Reviewer #3:

      Kim et al. aimed to characterize the similarities and differences between the development and molecular identity of telencephalic versus hypothalamic (HT) Lhx6+ GABAergic neurons. By analyzing a diverse repertoire of transgenic mice at different developmental stages and through the use of fate mapping, bulk and single cell sequencing approaches, ISH and immunostaining, the authors descriptively compare transcriptional networks and upstream regulators of LHX6. They found essential differences between LHX6-dependent networks and those in telencephalic neurons and suggest a role of LHX6 in survival instead of migration regulation in HT neurons. Moreover, spatially distinct LHX6+ HT cell clusters were identified and transcriptionally profiled.

      1) Only 1-2% of the GABAergic neurons express LHX6, and the cells expressing LHX6 in the HT were identified to be very diverse. Apart from a putative role for LHX6 in promoting the survival of HT neurons, which in my opinion is not analyzed convincingly, nothing functional was revealed. For this, I do not judge the potential significance and influence of the findings as broad or fundamental.

      We respectfully but strongly disagree with this conclusion, for reasons most of which have already been described at length in our response to Reviewer #1. In brief, hypothalamic Lhx6+ neurons are key regulators of sleep initiation and maintenance, and nothing is known about their development. In much the same way that studies of the development of Lhx6+ cortical interneurons potentially help inform our understanding of neurodevelopmental disorders such as autism, so too may an understanding of the development of hypothalamic Lhx6+ neurons improve our understanding of sleep disorders and their treatment. In this study, we characterize the fate of hypothalamic Lhx6+ neurons, identify transcriptional regulatory networks that control their patterning and survival, and characterize their molecular heterogeneity in the postnatal period. We identify the homeodomain factor Nkx2.2 not only as a key regulator of regional patterning of hypothalamic Lhx6 neurons, but also as a marker of a substantial subset of Lhx6+ ZI neurons that are activated by sleep pressure. This represents the groundwork needed for a basic understanding of the development of this physiologically important cell type, and forms the basis of more detailed future studies.

      Unless the Reviewer simply believes that studies of hypothalamic development are inherently uninteresting and of little significance, these comments simply do not seem to reflect a careful reading of the manuscript, and come across as vague and unconstructive. In future reviews, we urge the Reviewer to be more specific, and to offer concrete and constructive comments, to support sweeping statements of this sort.

      2) The manuscript could be better focused, and more coherent. The authors jump between different aspects of the story. First, the authors address a potential role of LHX6 in survival regulation in HT interneurons, and try to identify potential LHX6 target genes mediating this effect. The latter was neither analyzed convincingly nor validated. Then the authors switch to the comparative analysis of transcriptional networks in cortical versus hypothalamic LHX6+ interneurons, and the identification of different clusters of LHX6+ HT cells. Next, potential upstream regulators of LHX6 in HT neurons were addressed by fate mapping studies. Then, the authors again switch focus, and analyzed distinct anatomical regions covered by Lhx6+ neurons by single cell RNA seq and investigated an instructive role of Nkx2-1, Nkx2-2 and Dlx1/2 in the establishment of these hypothalamic regions.

      Subheadings in the result section might be very useful. However, the focus of this study requires clarification and also respective consideration in the introduction.

      As stated in our response to Reviewer #1, we have sought to conduct a broad characterization of the development and diversity of hypothalamic Lhx6+ neurons, a subset of which are important regulators of sleep. While we cover multiple aspects of this question, we strongly disagree that the manuscript “lacks focus”. However, we do agree that organization and clarity could be improved. To this end, we have incorporated subheadings into the Results section, and clearly outlined the experiments conducted, and the reasons why each was conducted.

      3) The authors use a variety of different reporter and loss of function mouse models and jump between developmental stages for analysis. Apart from being confusing, the experimental/analytical pipeline is not sufficiently rigorous with respect to age and genetic background. E.g. to analyze target genes of LHX6 through which the effect on cell survival could be mediated, the authors compared expression profiles from P10 Lhx6CreER/+;Ai9 neurons with hypothalamic and cortical Lhx6-GFP positive and negative cells from P8 mice. Hypothalamic enriched genes were then compared to single-cell RNA-Sequencing (scRNA-Seq) datasets of E15.5 and P8 hypothalamic Lhx6-expressing neurons. Transcriptional profiles tremendously change with progressing development, and different mouse lines were used, which were not all time-matched. This might have caused Lhx6-independent variation, which likely masks relevant genes. This could be an explanation why so few LHX6 target genes were identified through which LHX6 putatively acts on neuronal survival.

      This is another instance where the Reviewer seems to have failed to appreciate the rationale for the work presented here. We have modified the text to make this clearer. In summary, while it is certainly true that gene expression patterns are dynamic during development, cells of common origin and/or function also typically show core patterns of gene expression that are expressed across multiple stages of development. Our findings suggest that constitutive loss of function seen in Lhx6CreER/Lhx6CreER mice leads to a complete loss of hypothalamic Lhx6+ cells (Fig. 2), while the postnatal loss of function leads to a partial loss of Lhx6+ cells (Fig. 2). This suggests that Lhx6 may control the expression of similar target genes in both embryonic and postnatal hypothalamus to promote neuronal survival. In addition, since Lhx6 clearly is not required for survival of telencephalic neurons, we predict that Lhx6 will regulate the expression of specific sets of genes in both embryonic and postnatal hypothalamus, but not telencephalon, which promotes neuronal survival.

      In Figure 4, we therefore identify candidates for these prosurvival genes both by comparing gene expression profiles between embryonic (E15) and postnatal (P8) hypothalamic and cortical Lhx6+ cells and also by directly comparing the gene expression profile of P10 control Lhx6-CreER;Ai9 and Lhx6-deficient but viable Lhx6CreER/Lhx6lox;Baxlox/lox;Ai9 mice. These were analyzed at P10 rather than P8 because of the need to ensure efficient disruption of the conditional alleles of Lhx6 and Bax, and induction of sufficient levels of tdTom to allow for efficient cell isolation, following daily 4-OHT administration between P1 and P5. While this might lead to a failure to identify whatever small number of Lhx6-regulated genes are differentially expressed between P8 and P10, we believe that this will identify the great majority of Lhx6-dependent genes that promote neuronal survival. Any readers who wish to delve further into this dataset, and identify additional genes we may have missed in this initial screen, can do so using the data in Table S1.

      We are frankly puzzled by the Reviewer’s statement that we “identified so few Lhx6 target genes”, when we clearly state in Figure S2 that over 2,000 differentially expressed genes were observed between control and Lhx6/Bax-deficient hypothalamic neurons. A major reason why data was incorporated from the E15 and P8 datasets was to better select strong candidate regulators of neuronal survival from this very long list of genes.

      4) The proposed survival regulatory function of LHX6 in HT interneurons represents the main functional finding of this study, which however was not analyzed in great detail. Likewise, the analysis of LHX6 target genes that mediate the survival regulating function was not very successful, identifying only the ERBB4 receptor and other genes related to the neurotrophic neuregulin pathway. Of note, the authors proposed a clear difference of LHX6-associated transcriptional networks and LHX6 function in telencephalic versus HT neurons (migration versus survival). However, THE identified target gene of LHX6 suggested to regulate survival in HT neurons was Erbb4. Erbb4 is likewise expressed in telencephalic neurons, here being involved in migration regulation. Studies that confirm Erbb4 function in survival regulation in HT neurons are lacking. By applying a more coherent analysis, comparing transcriptional profiles of Lhx6 KO and WT cells of the same age, better candidates might be identified. For this, the time window of the LHX6-dependent survival regulation needs to be identified.

      This is exactly the point we were trying to make here. Lhx6 is strongly expressed in a large subset of progenitors and precursors of GABAergic neurons in the telencephalon, and in a much smaller subset of GABAergic neuronal precursors in the hypothalamus. Lhx6 thus has different functions between telencephalic and hypothalamic populations, yet is strongly expressed in both populations.

      Quoting Reviewer #1: “Many developmental transcription factors are utilized both across diverse brain regions and across tissues outside of the brain”. Erbb4 has been shown to regulate tangential migration in cortical interneurons but has been shown to promote neuronal survival in other cell types. Since hypothalamic Lhx6+ neurons do not undergo long-range tangential migration, we therefore conclude that the function of Erbb4 in hypothalamic Lhx6+ neurons is likely related to promoting survival, rather than controlling migration. It is certainly possible, however, that Erbb4 could also contribute to the regulation of short-range tangential migration of Lhx6-expressing neuronal precursors, such as the likely migration of Nkx2.2-expressing cells from the hinge to the ZI. We have revised the text to make this point clearer. We certainly believe that further functional studies of these genes are worthwhile and compelling, but are also beyond the scope of this study.

      5) With respect to the survival analysis, the analysis of Lhx6CreER/lox;Baxlox/lox;Ai9 mice, although elegant, should be supplemented with other data, e.g. caspase and/or TUNEL labeling, to support this main conclusion.

      Both TUNEL and Caspase-3 staining are detectable for only a relatively brief period during apoptosis, and neither is a highly sensitive tool for detecting neuronal death. We were unable to observe changes in staining with either marker between P5 and P10 following the postnatal loss of function of Lhx6 (Fig. 2). This is now mentioned in the text. The use of Bax mutants in this analysis, in which apoptosis is blocked altogether, was done with the aim of maximizing our ability to detect Lhx6-dependent regulation of neuronal survival.

    2. Reviewer #3:

      Kim et al. aimed to characterize the similarities and differences between the development and molecular identity of telencephalic versus hypothalamic (HT) Lhx6+ GABAergic neurons. By analyzing a diverse repertoire of transgenic mice at different developmental stages and through the use of fate mapping, bulk and single cell sequencing approaches, ISH and immunostaining, the authors descriptively compare transcriptional networks and upstream regulators of LHX6. They found essential differences between LHX6-dependent networks and those in telencephalic neurons and suggest a role of LHX6 in survival instead of migration regulation in HT neurons. Moreover, spatially distinct LHX6+ HT cell clusters were identified and transcriptionally profiled.

      1) Only 1-2% of the GABAergic neurons express LHX6, and the cells expressing LHX6 in the HT were identified to be very diverse. Apart from a putative role for LHX6 in promoting the survival of HT neurons, which in my opinion is not analyzed convincingly, nothing functional was revealed. For this, I do not judge the potential significance and influence of the findings as broad or fundamental.

      2) The manuscript could be better focused, and more coherent. The authors jump between different aspects of the story. First, the authors address a potential role of LHX6 in survival regulation in HT interneurons, and try to identify potential LHX6 target genes mediating this effect. The latter was neither analyzed convincingly nor validated. Then the authors switch to the comparative analysis of transcriptional networks in cortical versus hypothalamic LHX6+ interneurons, and the identification of different clusters of LHX6+ HT cells. Next, potential upstream regulators of LHX6 in HT neurons were addressed by fate mapping studies. Then, the authors again switch focus, and analyzed distinct anatomical regions covered by Lhx6+ neurons by single cell RNA seq and investigated an instructive role of Nkx2-1, Nkx2-2 and Dlx1/2 in the establishment of these hypothalamic regions.

      Subheadings in the result section might be very useful. However, the focus of this study requires clarification and also respective consideration in the introduction.

      3) The authors use a variety of different reporter and loss of function mouse models and jump between developmental stages for analysis. Apart from being confusing, the experimental/analytical pipeline is not sufficiently rigorous with respect to age and genetic background. E.g. to analyze target genes of LHX6 through which the effect on cell survival could be mediated, the authors compared expression profiles from P10 Lhx6CreER/+;Ai9 neurons with hypothalamic and cortical Lhx6-GFP positive and negative cells from P8 mice. Hypothalamic enriched genes were then compared to single-cell RNA-Sequencing (scRNA-Seq) datasets of E15.5 and P8 hypothalamic Lhx6-expressing neurons. Transcriptional profiles tremendously change with progressing development, and different mouse lines were used, which were not all time-matched. This might have caused Lhx6-independent variation, which likely masks relevant genes. This could be an explanation why so few LHX6 target genes were identified through which LHX6 putatively acts on neuronal survival.

      4) The proposed survival regulatory function of LHX6 in HT interneurons represents the main functional finding of this study, which however was not analyzed in great detail. Likewise, the analysis of LHX6 target genes that mediate the survival regulating function was not very successful, identifying only the ERBB4 receptor and other genes related to the neurotrophic neuregulin pathway. Of note, the authors proposed a clear difference of LHX6-associated transcriptional networks and LHX6 function in telencephalic versus HT neurons (migration versus survival). However, THE identified target gene of LHX6 suggested to regulate survival in HT neurons was Erbb4. Erbb4 is likewise expressed in telencephalic neurons, here being involved in migration regulation. Studies that confirm Erbb4 function in survival regulation in HT neurons are lacking. By applying a more coherent analysis, comparing transcriptional profiles of Lhx6 KO and WT cells of the same age, better candidates might be identified. For this, the time window of the LHX6-dependent survival regulation needs to be identified.

      5) With respect to the survival analysis, the analysis of Lhx6CreER/lox;Baxlox/lox;Ai9 mice, although elegant, should be supplemented with other data, e.g. caspase and/or TUNEL labeling, to support this main conclusion.

    3. Reviewer #2:

      Kim and colleagues used a combination of state-of-the-art sequencing and mouse genetic tools to study the mechanisms that control the development of a subset of GABAergic neurons in the developing hypothalamus.

      While neurodevelopment of GABAergic neurons has been extensively studied in the developing telencephalon, little is known about their counterparts in the developing hypothalamus. The authors focused their work on a specific subset of GABAergic neurons that express the LIM homeodomain factor Lhx6. Lhx6 is a master regulator of GABAergic neuron differentiation, specification, and migration in cortical interneurons. In contrast, Lhx6-expressing neurons make up only 2-3% of GABAergic neurons in the hypothalamus. The authors' previous work demonstrated that these neurons play a critical role in sleep homeostasis. Therefore, understanding how these neurons are formed and maintained is of great importance.

      The authors show that hypothalamic Lhx6 is necessary for neuronal differentiation and survival. Furthermore, by profiling and comparing multiple RNA-seq, scRNA-seq, and ATAC-seq datasets, they were able to identify three transcription factors Nkx2.1, Nkx2.2, and Dlx1/2 that each delineates non-overlapping subdomains of Lhx6 neurons and are necessary for Lhx6 expression in the hypothalamus. Finally, the authors demonstrate that mature Lhx6 neurons manifest extensive molecular heterogeneity that is distinct from their counterparts in the telencephalon.

      The work presented is of high quality and is a technological tour de force. The scope and depth of the study are unparalleled among similar studies of hypothalamic neurodevelopment. That said I only have a couple of minor suggestions.

      1) In Figure S2, the number of tomato+ cells appears to be reduced, but not eliminated. Do the authors think that Lhx6 is necessary for the survival of all Lhx6 neurons, or just a subset? The use of the floxed Bax allele is clever, but is there evidence directly supporting increased cell death? Can the authors completely rule out the possibility of the mismigration of cell bodies after the postnatal deletion of Lhx6?

      2) In Figure 4, the authors acknowledged that the ectopic gene expression in Lhx6CreER/lox; Baxlox/lox mice could be due to the loss of function of Bax. If so, would Lhx6CreER/+; Baxlox/lox mice be a better control in this experiment?

    4. Reviewer #1:

      This paper investigates the role of Lhx6 and other transcription factors in the development of GABAergic neurons in the hypothalamus. The authors report that a small fraction of hypothalamic GABAergic neurons express Lhx6 and further depend on this expression for their survival. Dlx1/2, Nkx2-1 and Nkx2-2 define 5 subpopulations and at least three of these populations depend on these TFs to maintain Lhx6 expression. A strength of the paper is the multimodal analysis and the fact that descriptive assays like RNAseq and ATACseq are followed up with specific knockouts of candidate transcription factors. However, the relationships between the developmental populations identified and adult subtypes of hypothalamic neurons remain unclear. Although the results will surely interest those already interested in hypothalamic development, it is not clear that broader developmental or functional principles have been identified. The authors make much of the fact that the identified populations do not resemble forebrain interneurons defined by Lhx6 expression, but it is not clear why this should have been expected. Many developmental transcription factors are utilized both across diverse brain regions and across tissues outside of the brain. Perhaps the emphasis of this point could be tempered.

      The presentation of the manuscript could be improved by clarifying the relationships between embryonic and more mature structures within the hypothalamus. For example, it is extremely hard to follow the evidence split across Figures 5, S6 and S7 for parsing the cell groups by TF expression.

      The ATAC seems to be used only to bolster the impression that the populations identified by gene expression are different. The description of footprinting seems to imply an effort to analyze binding sites for specific factors (e.g. to identify targets of the TFs studied), but the statistical approach employed and even the conclusions reached are not fully spelled out. As such, this part of the study is underdeveloped or not well enough described.

    5. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

    1. Reviewer #3:

      The properties and mechanism of DNA transformation by Streptococcus pneumoniae have been intensively studied for nearly a century. This elegant and insightful paper develops a powerful new set of quantitative assays based on recombining out stop codons of fluorescent protein fusions to reprise several issues that have largely been addressed by conventional antibiotic resistance selection. This approach leads to new answers for a number of fundamental questions about pneumococcal transformation, thereby re-setting the paradigm in this area. This is an extremely well-written, complete study that answers interesting and important questions about bottlenecks and recombination during transformation of this genetically plastic pathogen.

      This is a rigorous study that represents a substantial amount of work and creative thinking. The results will be of interest to a large audience concerned with genome evolution by transformation in different bacteria, points of limitation, or not, in the different steps in transformation, and mechanisms of recombination. The conclusions of this paper are well supported by extensive, often corroborative data, and provide new insights that go way beyond traditional genetic approaches. Rather complicated assay schemes are presented in highly effective diagrams and descriptions. Some of the new findings include that: all pneumococcal cells become competent and express the competence machinery in response to added competence stimulatory peptide or during natural competence; confirmation of brief non-genetic inheritance of phenotypes during transformation by single-cell tracking of recombination through lineage trees; a ≈50% limitation of transformation through RecA-dependent recombination that is unaffected by mismatch repair or restriction/modification; cell-cycle independence of recombination, regardless of reading strand or distance to the origin of replication; quantitation of direct multiple recombination (up to three was tested); and reduction of transformation recombination by non-homologous DNA.

      Many of these conclusions overturn and/or refine previous results that were obtained by less precise genetic methods. Together, this paper shows that any site or orientation with regard to DNA replication can be transformed in pneumococcal cells, including multiple chromosomal insertions; however, there is an intrinsic limitation to the efficiency of recombination, possibly related to the level of off-marker recombination. This limitation may have implications to pneumococcal evolution.

    2. Reviewer #2:

      In this work Kurushima et al. use recently developed fluorescent labelling techniques to study natural transformation in the human pathogen Streptococcus pneumoniae. Previously, genetic marker analyses have been used to study the different aspects of this process, but with these new techniques the process can now be studied at the single cell level. The authors used the single cell analysis to identify new transformation bottlenecks and tried to determine why some cells are genetically transformed and others are not. Related experiments have been performed in the past using classic genetics and Kurushima et al. were able to confirm these studies. In that sense, in my opinion, the novelty is limited and no important new molecular insights are provided. They found that the number of cells that are ultimately transformed is plateauing at approximately 50%, despite the fact that most cells bind DNA. This is partially the result of the heteroduplex formed after recombination followed by separation by strand replication, combined with the fact that the DNA binding sites on cells are limited so that there is a competition between DNA markers at saturating DNA concentrations. The authors argue that this mechanism entails a "fail-safe strategy for the population as half of the population generally keeps an intact copy of the original genome". I find this conclusion far-fetched for two reasons.

      Firstly, the DNA recombination event followed by DNA replication will automatically assure that only half the population will inherit the mutation, and to speak of a strategy implies that the organism has specifically evolved this system, but we are dealing here with a well-known and general recombination system found in many organisms that will generally result in a 50/50 distribution. Maybe more importantly, under natural conditions it is highly unlikely that cells encounter saturating levels of tDNA. In their experiments the authors use 3.2 nM DNA for transformation. If my calculation is correct, this would amount to 19 × 10^11 DNA molecules per ml, which seems a bit high when assuming tDNA comes from lysed bacteria. In nature, this number will be much (much) smaller; therefore there is no need for the bacterium to come up with a dedicated strategy to assure that not all cells in a population are being transformed.
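
      The molecule count quoted above can be checked with a short calculation. The following is only a sketch of the unit conversion, assuming the 3.2 nM figure refers to the molar concentration of DNA molecules in the transformation mix; the function and variable names are illustrative and not taken from the manuscript.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_per_ml(molar_concentration):
    """Convert a molar concentration (mol/L) into molecules per millilitre."""
    molecules_per_litre = molar_concentration * AVOGADRO
    return molecules_per_litre / 1000.0  # 1 L = 1000 ml


# Assumption: 3.2 nM is the concentration of tDNA molecules in the assay.
tdna_concentration = 3.2e-9
count = molecules_per_ml(tdna_concentration)
print(f"{count:.2e} molecules per ml")
```

      Under that assumption the result is about 1.9 × 10^12 molecules per ml, which agrees with the reviewer's estimate.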

      Finally, the results are very well presented and the paper makes easy reading.

    3. Reviewer #1:

      Overall I thought this to be an extremely compelling story, both in terms of general scientific interest and the overall high degree of experimental rigor. The data provide strong experimental evidence to support the authors' conclusions.

      I found it very interesting that the maximal efficiency is capped at 50%, as this makes for a very intriguing evolutionary bet-hedging strategy for a naturally competent bacterial pathogen that frequently undergoes both intra- and inter-species recombination events. In addition, this study provides a very elegant experimental framework for understanding the finer points of pneumococcal recombination through both clever genetic approaches and rigorous experimental design. The data were presented in a clear, concise manner and the overall manuscript followed a clear and logical progression.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 2 of the manuscript.

      Summary:

      All three reviewers felt that the manuscript was experimentally sound and praised the authors for their use of single cell analysis to tackle the question of why some cells are transformed and others are not in a population of genetically competent Pneumococci. The thoughtful presentation of complicated and extensive data was appreciated by all. Two reviewers were enthusiastic about the study's conclusions regarding bet hedging and the potential for an intrinsic limit on recombination efficiency. The latter reduces the potential for off-marker recombination which, as Reviewer #3 notes, might have implications to pneumococcal evolution. At the same time, Reviewer #2 had some reservations about the significance of the data in light of previous studies of Pneumococcus and other naturally competent organisms. Most importantly, this reviewer questions whether the finding that only a portion of bacteria incorporate exogenous DNA is a particularly novel one and, regardless, whether the saturating DNA concentrations used in the study are representative of a "natural" environment.

    1. Reviewer #3:

      The manuscript by Abdulhay, McNally and colleagues presents an effort to combine DNA modification detection and PacBio sequencing to contribute to the growing body of methods designed to gain epigenome information at the chromatin fiber level, i.e. beyond existing short read NGS-chemistry constraints. They do so by leveraging micrococcal nuclease to cleave and help solubilize DNA, which they then treat with adenine methyltransferases to footprint nucleosomes; single-molecule adenine methylated oligonucleosome sequencing assay - SAMOSA. Fiber-level epigenetic information will be of great use to the field and is expected to answer many open questions.

      However, many of the claims made about the potential of the method are insufficiently supported by the data provided. It appears that additional data are required to support the conclusions made from SAMOSA with respect to existing chromatin information, such as signal differences as a function of transcription factor binding (see below).

      1) The authors should make an attempt to investigate where sequence bias influences a methylation call in their datasets. Clearly the pattern on the in vitro chromatinized template suggests that on average their methylated calls are correct. However, there appear to be clear positions in their chromatinized template datasets where this is not the case, i.e. lines in sup fig 5a representing methylation calls in unmethylated template DNA and unmethylated calls on fully methylated template DNA. Upon close examination, this also seems the case in the chromatinized template, with certain positions inflexibly methylated/unmethylated and at odds with the surrounding linker/nucleosome patterning (Fig1D). The authors should use Kmer analysis of methylated A's genome-wide to detect sequence bias in either the methyltransferase or sequencing platform.

      2) It seems reasonable that the clustered data by NRL estimate (fig 3) should correlate with existing measurements (i.e. MNase-seq). The authors should identify regions of the genome with strong enrichment for the seven clusters and compare this to nucleosome repeat length as can be estimated using conventional MNase measurements, i.e. the average distance between 5' mapping read positions across the genome (Valouev et al., 2011, Teif et al., 2012). Some agreement (for at least a few of these clusters with very regular nucleosomes) would strengthen the conclusions made by this approach, especially where there are irregular positioning patterns. Additionally, for these clusters the authors should display raw read alignment/methylation calls for SAMOSA at a few representative loci, where a sense of the raw data can be gleaned.

      3) The comparisons of SAMOSA at different TF-bound regions are likely influenced by the fraction of actually TF-bound molecules present in the original cellular sample. For example, CTCF is known to occupy its strong motifs in the majority of cells, while few other factors have such regular binding/residency (Kelly et al., 2012 NomeSeq data at CTCF sites). It seems reasonable that some cluster fractions should scale with the enrichment for the factor (for at least CTCF and REST, the strong binding/nucleosome positioners), especially those associated with chromatin accessibility at the motif (i.e. A-accessible, HA-hyper-accessible). The authors should try to illustrate this, as well as representative read alignments/methylation calls at a few loci where these signals are prevalent.

      4) The meta-plotted data seems noisy for most TFs profiled (Fig 4 A-L) and the authors should show that their replicates agree with each other in terms of the relative size of clusters and at the metaplot level. Similarly, the data shown in Figure 5 should be broken into replicates. It is difficult to know to what extent the differences quoted are quantifiable/reproducible. For example, in panel A the reported deviation seems quite large around the median to make strong claims: e.g. "In specific cases, we observed small effect shifts in the estimated median NRLs for specific domains-for example, a shift of ~5 bp (180 bp vs. 185 bp) in H3K9me3 chromatin with respect to random molecules..." This should also apply to the analysis done in Figure 5B and C, where it is difficult to get a sense of reproducibility from cluster size and the heatmap of Odds ratio and q-values.
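
      As an illustration of the k-mer analysis suggested in point 1, the sketch below tabulates, for each k-mer centred on an adenine, the fraction of occurrences that were called methylated. It is a minimal example under stated assumptions and not the authors' pipeline: the function names are hypothetical, the input is a single read sequence with a list of 0-based methylated positions, and only adenines on the read strand are considered. Contexts whose methylation rate departs strongly from the overall rate in fully methylated or unmethylated control DNA would point to sequence bias.

```python
from collections import Counter


def kmer_context_counts(seq, positions, k=5):
    """Count k-mers centred on the given 0-based positions (k must be odd)."""
    half = k // 2
    counts = Counter()
    for pos in positions:
        start, end = pos - half, pos + half + 1
        if start < 0 or end > len(seq):
            continue  # skip positions too close to the ends of the read
        counts[seq[start:end]] += 1
    return counts


def methylation_rate_by_kmer(seq, methylated_positions, k=5):
    """Fraction of A-centred k-mer occurrences that were called methylated."""
    all_a = [i for i, base in enumerate(seq) if base == "A"]
    methylated = set(methylated_positions)
    background = kmer_context_counts(seq, all_a, k)
    foreground = kmer_context_counts(seq, [i for i in all_a if i in methylated], k)
    return {kmer: foreground[kmer] / n for kmer, n in background.items()}


# Toy input (hypothetical): three adenines, two of them called methylated.
toy_seq = "GGATCCGATCGGATCC"
toy_calls = [2, 7]
rates = methylation_rate_by_kmer(toy_seq, toy_calls, k=3)
```

      In practice the counts would be pooled over all reads and both strands before computing rates, and rare k-mers would need a minimum-count filter.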

    2. Reviewer #2:

      The authors describe SAMOSA, a novel method for mapping accessibility on single chromatin fibers, using a non-specific adenine methyltransferase and taking advantage of the long-read high-accuracy capability of the PacBio platform. The method allows for chromatin arrays to be precisely mapped for nucleosomal and non-nucleosomal footprints on single chromatin fibers. When combined with light MNase treatment, the method provides two orthogonal readouts of the chromatin landscape for single molecules, with advantages over other single-molecule long-read methods. Proof-of-concept application of this new method to human K562 cells reveals global heterogeneity, with surprisingly little distinction in nucleosome array patterns between regions distinguished by various active or repressive histone modification patterns. The heterogeneity observed using the unbiased approach represented by SAMOSA highlights the fact that the most common chromatin profiling methods favored by both large projects such as ENCODE and individual researchers are dominated by features such as histone modifications and hyper accessible sites. The method itself and insights into global nucleosomal heterogeneity are of substantial interest to the fields of chromatin and gene regulation. The data are of high quality and the methods are well-described. I have only one suggestion and a couple of minor issues.

      In Figure 5, controls are randomly chosen nucleosomes, but it would be interesting to see what unmarked nucleosomes show. For example, unmarked alpha-satellite should be dominated by highly regular arrays with a 171-bp repeat length present in higher-order repeats corresponding to active centromeres, which consist of nucleosomal complexes that lack Histone H3 (CENP-A instead). The authors speculate that satellite irregularity might result from dynamic restructuring by HP1, and this predicts that other (H3-containing) unmarked satellites that lack H3K9me3 and presumably lack HP1 will be in regular arrays.

    3. Reviewer #1:

      The authors validate the method on a reconstituted array of 9 nucleosomes, and convincingly show that m6dA is found in linker DNA, and not (or greatly reduced) at positions bound to nucleosomes.

      They then apply the approach to chromatin fibers released from K562 cells. Long read patterns were clustered to identify 7 clusters. The idea is that because the fragments are released by mild MNase digestion, there will be a positioned nucleosome at one end. The 7 clusters differ in nucleosomal spacing. I am not familiar with Leiden clustering; it would be good if the authors could confirm these clusters with alternative clustering methods. These clusters appear differentially represented in domains that differ in histone modifications.
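
      One simple way to quantify agreement between the Leiden assignments and those from an alternative clustering method is the adjusted Rand index, which is 1 for identical partitions and close to 0 for unrelated ones. The sketch below is a plain-Python implementation of the standard formula; the label lists are hypothetical toy data, and nothing here is taken from the authors' code.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two cluster assignments of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # fewer than two items: the partitions trivially agree
    # Pairs of items placed together in both partitions, in A only, in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in a single cluster
    return (together_both - expected) / (maximum - expected)


# Hypothetical labels: the same grouping under different cluster names.
leiden_labels = [0, 0, 1, 1, 2, 2]
other_labels = [1, 1, 0, 0, 2, 2]
agreement = adjusted_rand_index(leiden_labels, other_labels)
```

      Because the index ignores how clusters are named, it can be applied directly to the outputs of two different algorithms run on the same molecules.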

      Aggregation of data around TF binding sites further reveals a range of different states that show variable nucleosome positioning. This section is interesting but seems rather shallow in analysis. The authors have the ability to look at specific sites and determine the variation in nucleosome positioning in the cell population. However, they look only at aggregated data.

      Overall the approach works well and promises to address important questions, but the current work does not yet take full advantage of the single molecule nature of the assay and as such falls a bit short compared to very related methods that have recently been published (the works cited in the ms, and recently published work from the Stamatoyannopoulos lab). Also, the use of mild MNase is presented as an advantage, but is it really necessary? Adding EcoGII to isolated nuclei may work as well as shown in the recent Stamatoyannopoulos paper in Science.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      This manuscript describes a method, named SAMOSA, to identify nucleosome positions along chromatin segments that can be over 10 Kb in size. The approach employs EcoGII-modulated m6dA deposition on accessible non-nucleosomal DNA (linkers, nucleosome-free regions) released from nuclei after mild MNase cleavage. The DNA modification is then read out using PacBio sequencing. Mapping nucleosome positions along longer DNA stretches can provide information on variation in nucleosomal arrays, and how that relates to chromatin state and factor binding etc. The assay is validated using a reconstituted chromatin template and then applied to K562 cells, revealing significant variation in nucleosome positioning and nucleosome repeat lengths at transcription factor binding sites, and throughout domains with various histone modifications.

    1. Reviewer #3:

      In this study, the authors present data aimed at supporting their conclusion that microbiota-derived SCFAs result in increased AD pathology, including microglial activation, ApoE upregulation, and A-beta deposition.

      First and foremost, the biggest issue with this study is the lack of male versus female comparisons and the very small sample sizes of the mice. Especially given the past literature of microbiome effects on AD pathology, e.g. with antibiotic cocktails, it is essential to look at sufficient numbers of both female and male mice, individually, and not just group them. Moreover, the average number of mice used in each experiment (N=5) is relatively small for making any firm conclusions.

      Specific concerns:

      Figure 1D: Based on the observation of more smaller plaques in SPF mice vs GF mice, the authors conclude, "This result highlights the impact of bacterial colonization on early amyloid plaque deposition rather than plaque growth," The problem here is that these mice are 5 months old. It is well-known that SPF APPPS1 mice start depositing at only 6 weeks old. So, they would need much earlier (and later) time points to support this conclusion. In addition, N=5 animals/group is very small and not appropriate for making conclusions.

      The authors also need to show the total plaque burden distribution in each group and the level of variability.

      Figure 2: Again, N=5/group is very small for a high-impact paper. They also need to show plaque burden distribution, especially since there is much more variability in 3-month-old animals.

      Figure 2D: The authors claim SCFA brings up plaque load to a "significant increase", i.e., 2X GF levels. But what are these values compared to SPF animals? They would need to have data on the 3 month SPF group for comparison sake to make the claim that SCFAs are driving pathology. Otherwise, this is just not convincing.

      Figure 3: Westerns should also include 3 month SPF animals. The small differences in CTFalpha and CTFbeta are not convincing. Even if there were a change, how does it account for elevated Abeta?

      Figure 4: SCFA trigger microglial activation: the data in this figure fail to support this conclusion:

      Fig 4B: Why did the authors perform in situ for CX3CR1 instead of Iba1 ICC for microglia? The quantification is unconvincing. There should be other CX3CR1 microglia that are not plaque associated, but we don't see these in the field. This brings into question the sensitivity of the in situ analyses. They need to also do Iba1 ICC.

      Fig 4 C/D: Regarding the statement, "we directly investigated the influence of bacterial colonization on microglial reactivity in the WT background. To this end, we injected brain homogenates from 8 months old APPPS1 mice containing abundant Ab into the hippocampus of GF or SPF WT mice (Fig. 4C) and subsequently analyzed microglial abundance and activation by smFISH. We observed a significant increase in overall microglial cell counts at the peri-injection site of SPF compared to GF WT mice (Fig. 4D)."

      This experiment does not support the conclusion since one would expect microglial reactivity to increase in this experimental paradigm. The authors claim more activation in SPF mice, thus "gut microbiome triggers microglial activation and reactivity towards an exogenous insult containing Ab". But, this is unfortunately not supported by the experiments, as performed.

      Figure 4F: The ex vivo amyloid clearance assay is not useful or convincing since cultured microglia lose their transcriptional phenotype after 6 hrs in culture (Gosselin et al, Science 2017).

    2. Reviewer #2:

      This is an interesting and well-written paper on the relationship between gut microbiota metabolites and AB production. Although previous studies have documented a link between the gut microbiome and Ab pathology, the underlying mechanisms and molecular mediators remain elusive. Here the authors use a germ-free Alzheimer's Disease mouse model to examine the role of short chain fatty acids on amyloidogenesis and neuroinflammation.

      The studies thus add another welcome piece to the puzzle of how the microbiota affects the brain.

      My comments are relatively minor:

      How does the behaviour of GF animals compare with that of non-GF animals, given that cognitive deficits have been reported in them (Gareau et al., 2011)?

      I am somewhat surprised that more metabolite differences were not observed between GF & SPF mice as all microbial metabolites should be only in the latter.

      Fig 2B should include all metabolites tested individually.

      Were the concentrations of the metabolites increased in the plasma following administration in drinking water? The physiological relevance of the doses used in the rescue experiments could be better supported with experimental data.

      If acetate is most important then it is not clear why they used a pooled cocktail in rescue experiments.

      The analysis of the transcriptome of brain samples from control- and SCFA-supplemented GF APPPS1 mice is a nice addition but the molecular targets for SCFAs on microglia remain unresolved.

      The comments about modulating dietary fibre to reduce central SCFA concentrations are provocative and although beyond the scope of the current study are clearly studies that would be very welcome for the field to test.

      The potential effects of SCFAs on HDACs are completely left as a cliff-hanger...

    3. Reviewer #1:

      The authors do a good job of citing the prior literature; however, Harach et al., 2017 did diminish my enthusiasm as it covers much of the same ground as this study, limiting the novelty of the current findings.

      Essential Revisions:

      1) Experimental perturbation of the proposed pathway. The manuscript leads to a nice model; however, the data are descriptive in nature, without any experiments using either genetic or pharmacological approaches to test the proposed mechanisms. The impact of this study would be increased substantially if at least one link between SCFAs and AB, microglia, or ApoE were experimentally validated. While most of the text avoids making causal claims based on correlative evidence, the one sentence summary states that SCFAs impact disease "via activation of microglial cells and upregulation of ApoE."

      2) Identify which SCFA matters. The experiments all rely on a mixture of 3 SCFAs, making it impossible to determine which compound is responsible. There is also high salt in this mixture, which confounds the interpretation further. At a minimum, each individual compound needs to be tested using an equimolar amount of salt as a negative control. The authors should also note issues with oral delivery of SCFAs, which does not necessarily mimic production in the colon. Ideally, tributyrin, or a similar ester for acetate or propionate, should be used. Another key missing control is the administration of SCFAs to SPF mice. It is also important to be clear that while SCFAs are sufficient to impact AB, there is no evidence in the paper to suggest that they are necessary; the full scope of "key microbial metabolites" remains to be determined. If the authors want to claim necessity, they would need to deplete specific SCFAs in the presence of a complex gut microbiome.

      3) Be more cautious in discussing the role of the microbiome in Alzheimer's disease. The background discussion includes studies that show correlations in humans and phenotypic differences in germ-free mouse models, which in my opinion are insufficient to claim a causal role in human disease. The authors should discuss the level of evidence in humans for a causal role of the microbiome and its impact relative to other risk factors, including any prospective or intervention studies that have been conducted. They should also take care not to extrapolate differences in intermediate phenotypes in mice (plaque levels, microglial activation, and ApoE expression) to human disease. For example, the one sentence summary says, "contributing to AD disease progression". The authors should also discuss whether or not cognitive performance was evaluated in response to SCFAs.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Colombo et al. present an intriguing set of findings from the amyloidosis mouse model (APPPS1). Rederivation of this model under germ-free conditions led to both decreased plaque load and impaired cognitive performance. Administration of a cocktail of SCFAs and salt (sodium propionate, butyrate, and acetate) significantly increased plaque levels, microglial activation, and ApoE expression. Together, these findings suggest a potential pathway through which the microbiome could impact cognitive performance. The paper is well-written, with a clear description of the current results and a logical flow to the text and figures. These data are a good starting point for further mechanistic dissection and add another welcome piece to the puzzle of how the microbiota affect the brain.

  3. Jul 2020
    1. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 3 of the manuscript. The major points agreed by the reviewers are included below, as well as the separate reviews.

      Summary:

      The work presented is a major scientific achievement. This is the first functional reconstitution of any CO2 concentrating mechanism (CCM). The work has major implications for engineering of CCMs into crops for increasing yields: the authors have definitively identified a set of components that confer CCM activity in a heterologous host. As a bonus, the authors demonstrate a new way of generating a Rubisco-dependent E. coli.

      Major points:

      1) The EM images shown in Figure 5-figure supplement 1 should be presented as a main figure, not a supplement. The negative control is too dark and difficult to compare with the other micrographs. Moreover, it is concerning that the positive control (WT:pHnCB10) failed. It should be repeated as it would allow comparison of the putative carboxysomes to a native carboxysome and would greatly improve the quality and value of this figure.

      2) For the benefit of a non-expert reader, the names of the 20 proteins and corresponding genes should be listed in a table, together with their function and the relevant references.

      3) In Figure 3-figure supplement 1A, the authors should discuss why the gene csos1D is present in both pCB and pCCM.

      4) In Figure 4B, the large variance in the OD600 after 4 days for CCMB1:pCB'+pCCM' cultures was explained as being due to genetic effects or non-genetic differences (line 1064). However, in Figure 3 - figure supplement 2B the measured growth kinetics did not show such big differences. Could the authors please explain?

      5) It would be nice if the authors could demonstrate that Rubisco localizes to the putative carboxysomes by performing an experiment such as immunogold labeling. It would strengthen the claim that the observed polyhedral bodies are in fact carboxysomes. We leave the decision of such an experiment to the authors.

    1. Reviewer #3:

      In this manuscript, Urchs and colleagues use transductive conformal prediction (TCP) applied to rsfMRI functional connectivity data to predict autism in a subset of cases. The approach is novel for applying to autism research and also is pinpointed at a topic that is very much needed in autism - the problem of heterogeneity. The logic applied is that only a subset of autism cases will have powerful biomarker differences in terms of resting state functional connectivity and TCP is utilized to isolate that subset. Thus, while the approach is novel and maps onto similar kinds of logic in the realm of genetics of autism, the utility is somewhat limited, as TCP will not be able to tell us much about the majority of cases. This is the same problem with many highly penetrant genetic mechanisms that lead to high risk for autism. However, it is still an issue that the approach can only make statements about a very small percentage of the total autism cases in the population. Could the authors comment more on this issue/limitation? For instance, what does this biomarker in a small percentage of cases tell us? Are there powerful, specific, and homogeneous biological mechanisms behind such cases, whereas for the rest of the population the underlying mechanisms are highly diverse and not powerful enough to penetrate up into macroscale functional connectivity phenotypes? The result could help to generate new hypotheses focused on such a group. However, I think the authors should try to lead readers in discussing how to take such results further for new discoveries.

      Besides this main issue noted above about the utility or meaning behind the novel findings, the following are comments about how to make the introduction more readable, and how to potentially better facilitate a reader's understanding of the analyses.

      1) Introduction: I would suggest that some modifications need to be done to the introduction in order to make the ideas flow a bit better. The problem is that the authors are introducing a variety of complex and not necessarily easily linked information - e.g., risk from a variety of different types of genetic mechanisms, failure of neuroimaging classifier studies, and TCP. With a bit of effort and a couple of re-readings it is clear that the logic the authors are using is that we have some understanding of how much risk there is from different types of genetic mechanisms, and we would like to understand how neuroimaging data might match up to that. Using TCP would hopefully allow you to do that, hence the goals of the study. This logic is not clearly spelled out as one reads the introduction however, because the different topics are either mixed together within a paragraph with little linking text to help the reader follow the logic, or the bits of information for each topic are segregated into their own paragraphs with little linking text at the beginning or ends of the paragraphs to help the ideas flow from one paragraph to the next. A good example of this is that the background paragraph to start with has these topics mixed together within the very first paragraph, and then the subsequent 3 paragraphs solely focus on each topic, without helping the reader understand why they are jumping from very different topics. By the time the reader gets to line 120 of the Objectives, then things are spelled out a little better, but the reader has to then go back and connect the ideas about how the authors are trying to compare how a TCP approach to identify a high risk imaging marker would match up against more well known risk markers at the genetic level. It may be the case that the manuscript here will get readers of various different backgrounds (e.g., autism researchers, those with expertise in genetics, neuroimaging, or machine learning). Few have expertise in all those areas, and for those individuals, it may be hard to understand how these different topics flow together and are linked in a specific logical way. The logic is there, but even for this reviewer, it required a couple of readings to see how all this information lined up in a logical way to justify the study. Thus, I would suggest that the authors make changes to the writing so that the reader can clearly follow the logic without too much extra effort to connect what isn't written about how these topics are supposed to line up.

      2) Methods: The methods and analysis are fairly complex. Can the authors make a figure that clearly lays out the analysis pipeline? It would help to have a visual that clearly outlines how the authors selected the subset of individuals from the larger ABIDE datasets, how the preprocessing was done, how the features were estimated, and how the TCP analysis was implemented with all the associated added aspects like the bootstrapping, etc. Furthermore, to facilitate understanding of the complexities of the analysis, can the authors create a GitHub repo that has all the reproducible analysis code that generates the results and figures produced in the paper, along with tidy data files that have the features used by the TCP model? Although in the data availability statement the authors write that a GitHub repo exists, having had a look through this, no tidy data files are available that the code can load up to have readers reproduce the analysis or figures. In addition, the code consists of only 4 brief R scripts. That code isn't easily readable with regards to how the analysis was done. The R code could be done in another way that is more in line with literate programming, such as an Rmd file, that has the analysis code, along with plain text to describe the different steps, and then the figures embedded within the html or pdf report that it creates when it is knitted in R Studio. There are also some Jupyter notebooks that show how the figures were generated. This was helpful to see and is what is needed for the R code too. In those Jupyter notebooks, it seems like there are certain tidy data files that those notebooks load, but they are absent in the repository and therefore, the readers cannot reproduce the analysis.

    2. Reviewer #2:

      This work represents an investigation into autism(s). For this purpose, multi-network inputs to transductive conformal prediction are used. This approach provides a measure for how much an individual resembles a pattern linked to autism(s) or healthy controls. The resulting predictions are translated to the population prevalence. The authors state correctly that their models are in the ballpark of what has previously been reported. However, they claim that their improvements with respect to predictions in the general population are a major improvement, achieved by a bias towards specificity of their model. While machine learning papers often do not report this translation it is also apparent that they easily could. Therefore, the novelty of this approach is not clear to me as it may be to the authors. This requires clarification in the context of the literature in addition to addressing the major concerns below.

      1) The paper would benefit from a more in depth discussion of the literature. There have been more than 50 papers published using different pattern recognition approaches on ASD. It is important that the authors evaluate their work in the context of those findings. There are a bunch of reviews on pattern classification approaches in psychiatry in general and ASD in particular.

      2) A slightly longer and more in-depth description of the methods section would help the reader, especially a description of the method used to calculate the relevant score.

      3) Based on Figure 3 it is a bit unclear to me if the small number of individuals identified with higher HRS score indeed also show higher symptoms. This should be statistically tested.

      4) The strongest confounding effects are usually induced by scanner differences, as both the discovery and the replication samples are multi-site samples. It would be important to investigate the effect of scanners on the proposed models. This is particularly problematic should there be imbalances between the groups across scanners.

      5) Probabilistic predictive approaches have already been applied to ASD using, for instance, Gaussian process regression (e.g. Ecker et al. 2010, Neuroimage). The paper would benefit from stating clearly how the authors' method improves on the approach in that paper as well as on other approaches in ASD. The adjustment of the prediction to the population prevalence is a minor achievement.

      6) The authors discuss: "Although our model made only few predictions, those predictions carry a much higher risk of an ASD diagnosis for the identified individuals. The result is a prediction with a much higher specificity (99.5% compared to 72.3% and 63% for traditional approaches, Heinsfeld et al., 2018; Abraham et al., 2017) and much lower sensitivity (4.2%, compared to 61% and 74% respectively). It is thus important to point out that here we have not proposed a better prediction learning model, but rather addressed a different objective." However, sensitivity and specificity are always a trade-off and dependent on the decision threshold. You can bias this for either of the two. For probabilistic models this is easy to do by adjusting the decision threshold to the population prevalence of a disorder. It is also possible to determine a decision margin which will naturally lead to higher performance, similar to the approach presented here and has been done and proposed earlier.
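
      The trade-off described in this comment can be sketched in a few lines. This is only an illustration, not the authors' TCP procedure: the scores and labels below are hypothetical toy values, and the helper names `sens_spec` and `predict_with_margin` are assumptions introduced for the example. It shows how raising the decision threshold of a probabilistic classifier trades sensitivity for specificity, and how a decision margin abstains on uncertain cases.

```python
from typing import List, Optional, Tuple


def sens_spec(probs: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive iff p >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probs, labels) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(probs, labels) if y == 0 and p >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def predict_with_margin(probs: List[float], low: float, high: float) -> List[Optional[int]]:
    """Predict 1 if p >= high, 0 if p <= low, and abstain (None) in between."""
    return [1 if p >= high else 0 if p <= low else None for p in probs]


# Hypothetical predicted probabilities and true labels (1 = case, 0 = control).
probs = [0.05, 0.20, 0.35, 0.45, 0.55, 0.60, 0.70, 0.80, 0.90, 0.97]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

# A conventional threshold versus a deliberately strict one.
for threshold in (0.5, 0.95):
    sens, spec = sens_spec(probs, labels, threshold)
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")

# A decision margin: only the most confident cases receive a prediction.
print(predict_with_margin(probs, low=0.10, high=0.95))
```

      In this toy setting the strict threshold leaves few positive calls, all of them correct, which mirrors the high-specificity, low-sensitivity profile the reviewer describes.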

    3. Reviewer #1:

      This is a well-written manuscript examining prediction of ASD diagnosis from resting-state fMRI data. The primary innovation is the application of Transductive Conformal Prediction (TCP), which quantifies the confidence with which one can accurately make a prediction. The authors show that they can identify a functional connectivity (FC) signature with high PPV for a subset of patients.

      The approach is certainly interesting, but it also seems circular. As I understand it, predictions are limited only to individuals who can be classified with high accuracy. A priori, we might expect that these people would be patients with severe illness, and the results show that the subset of patients who are correctly identified do have more severe symptoms. It therefore seems unfair to compare the high PPV of this method with other approaches, when the current method, by construction, focuses only on those cases who are easier to classify (whereas others don't). Could the authors please clarify whether this interpretation is accurate?

      Related to the above, the PPV of the test is high, but this is only one side of the coin. The sensitivity is very low and I imagine the NPV is also low. Given its low sensitivity, it does not seem correct to speak of the FC signature as a risk marker, since many people at risk (indeed with a diagnosis) do not show it. In practical terms, it seems like a positive result with this FC marker is a conservative, relatively accurate indicator of someone's risk for a severe form of ASD, but a negative result carries almost no information at all. What is the practical utility of such a marker, given that severe autism should be evident from clinical observation? That is, how could the current results add value to clinical decision-making? If the FC signature could be detected in newborns, it would be of value, but this analysis is conducted in adults after diagnosis has been established.
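
      To make the PPV/NPV point concrete, the following sketch applies Bayes' rule to the sensitivity (4.2%) and specificity (99.5%) quoted from the manuscript. The two prevalence values are hypothetical placeholders (a balanced case-control sample and a low-prevalence population), not figures from the manuscript, and the function name `ppv_npv` is an assumption introduced for the example.

```python
from typing import Tuple


def ppv_npv(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Positive and negative predictive value from Bayes' rule."""
    tp = sensitivity * prevalence                   # true positive mass
    fp = (1.0 - specificity) * (1.0 - prevalence)   # false positive mass
    tn = specificity * (1.0 - prevalence)           # true negative mass
    fn = (1.0 - sensitivity) * prevalence           # false negative mass
    return tp / (tp + fp), tn / (tn + fn)


# Sensitivity and specificity as quoted from the manuscript.
SENS, SPEC = 0.042, 0.995

# Hypothetical prevalences: a balanced sample and a low-prevalence population.
for prevalence in (0.5, 0.01):
    ppv, npv = ppv_npv(SENS, SPEC, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

      Both predictive values depend on the assumed prevalence, so any comparison of PPV across studies needs to state the prevalence at which it was computed.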

      The methods section indicates that the approach prioritises specificity, but the reasons for this decision are unclear.

      How were site differences addressed in the analysis?

      It would be useful to see how results vary as the 5% threshold is varied.

      The evidence for cluster structure in Fig 1b seems quite weak.

      The Figure 1 caption requires greater detail explaining what is actually shown in the plots.

      Were any of the participants taking psychotropic medications? To what extent could this have impacted the findings?

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The reviewers shared a number of concerns in common, as outlined in their detailed reviews. In addition, the following points were raised upon further discussion between the reviewers:

      -A comprehensive analysis of the potentially confounding effect of site differences is required

      -The potential circularity of the method - classifying only cases that can be confidently classified - and practical limitations of this approach should be discussed in greater detail. The algorithm is biased towards specificity. This could also be achieved using probabilistic machine learning approaches by, for instance, adjusting the decision threshold to the population prevalence or by defining a margin for cases for which you do not make a decision.

      -The findings are considered in relation to population prevalence rates, but the algorithm is not applied to a population sample. It seems likely that the classifier would not detect cases with the same accuracy in a population sample. If this claim is made, it needs to be explicitly tested.

      -The passage "The result is a prediction with a much higher specificity (99.5% compared to 72.3% and 63% for traditional approaches, Heinsfeld et al., 2018; Abraham et al., 2017) and much lower sensitivity (4.2%, compared to 61% and 74% respectively)." seems problematic. If you calculate the balanced accuracy for the current approach, i.e. (Specificity + Sensitivity)/2, you end up only slightly above chance accuracy. The other papers actually perform better.
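
      The balanced-accuracy comparison can be checked directly from the figures quoted in the passage. This is a minimal sketch: the pairing of sensitivity and specificity with each cited study follows the passage's "respectively" and is an assumption about how the numbers line up, and the function name `balanced_accuracy` is introduced only for the example.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity; 0.5 corresponds to chance."""
    return (sensitivity + specificity) / 2.0


# (sensitivity, specificity) as quoted in the passage; the pairing with
# each study is assumed from the order in which the values are listed.
reported = {
    "current approach": (0.042, 0.995),
    "Heinsfeld et al., 2018": (0.61, 0.723),
    "Abraham et al., 2017": (0.74, 0.63),
}

for name, (sens, spec) in reported.items():
    print(f"{name}: balanced accuracy = {balanced_accuracy(sens, spec):.4f}")
```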

    1. Reviewer #3:

      The article by Delabouglise and collaborators presents a longitudinal analysis of farms in Southeast Asia to understand farmer behaviours in response to disease outbreaks in poultry. The study is original in its design and the results are important for the prevention of avian flu epidemics in the region, as they suggest that smallholder farmers are more likely to sell their poultry to traders following outbreaks, which could contribute to the rapid disease spread. There are important differences in terms of response to outbreaks (harvest, vaccination, etc.) between large and small farms, which suggests that targeted sensitization campaigns and programs are necessary to modify these behaviours. The article is well written, although the discussion needs some work to lay out the limitations of the study and to expand the practical implications of the study in terms of policies or interventions to put in place.

      I have a few comments to improve the manuscript:

      Introduction:

      -I am uncertain about whether the term cohort applies to their study, as the unit of follow-up are farms but they're not following the same individuals (chickens) over time. I would suggest changing the term to longitudinal study.

      Methodology:

      -How reliable is the classification of outbreaks with and without sudden deaths? Are farmers able to quickly recognize the onset of symptoms and then a death within 24h of the onset of those symptoms? I imagine that misclassification can happen, so I would mention this as a potential limitation.

      Results:

      -Table 2: it would be much more easily interpretable if variables are described fully, with the function used for transformation in brackets. For example, instead of "square root of Nbc", I would include "Number of broiler chickens in the farm (sqrt)", and so on.

      Discussion:

      -I think the different limitations of the study should be explained and discussed. For instance, 1) the use of a proxy for weight instead of weight itself, 2) potential misclassification of outbreaks (see above), 3) some behaviours may depend on events happening in longer time frames, for example the previous year, but this is not accounted for in the models.

      -Also, the harvest of chickens could be greatly influenced by economic needs of the household (a family event, an economic shock, disease, etc.), especially for smallholder farmers in the developing world who may use chicken as a form of cash savings. I am actually surprised that this was not included in the questionnaires, and I think it's an important limitation that should be discussed (and appropriate literature referenced).

      -I feel that the discussion lacks insights into practical implications/solutions coming from this study (policies, interventions, etc.). Given the results, what can the government or NGOs or international organizations implement in order to reduce the risk of future outbreaks? This part should be expanded and be more specific.

    2. Reviewer #2:

      This manuscript addresses an important gap in knowledge of infectious disease emergence and spread within small-scale poultry production systems. The study design allows for analysis of longitudinal epidemiological and human behavioral data, not commonly found in animal health research; the statistical analysis is well thought out and robust. The authors find that farmers with small flocks respond to disease outbreaks with the rapid sale of sick birds to traders, and that despite government-supported programs, there is little uptake of vaccination in this population. Findings point to future areas of research that could inform policy development or better target activities to reduce disease transmission within similar poultry production systems.

    3. Reviewer #1:

      General assessment:

      The manuscript is very well written, easy to follow despite the substantial statistics, and has a clear goal that the authors address with strength. The study is highly relevant in the context of emerging infectious diseases, and addresses one of the main understudied candidate drivers of emergence. The study design allows a thorough analysis of the observed patterns, providing highly useful insights into the potential ways in which avian influenza can spread.

      Substantive concerns:

      I have no substantive concerns. The longitudinal study seems to have been designed and conducted well, allowing the incorporation of potentially important variables in the statistical models. The authors made great and responsible use of MGAMs, and clearly have an excellent background in statistics. I have no reservations or concerns about any aspect of the statistics. In fact I would like to compliment the authors on the way in which the methods were described and results were reported, which was done in a clear way despite the large and potentially confusing number of results.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      Your manuscript surveyed 53 poultry farms in Southern Vietnam and identified that small-scale farmers with smaller flocks were more likely to rapidly harvest and sell diseased birds to mitigate loss of profit. This finding is of great potential importance for developing prevention efforts for introduction of avian influenza into human populations.

      The reviewers were all highly complimentary of this paper. They all felt the manuscript was methodologically sound, clearly written, highly original and of substantial public health and policy relevance. Particularly noted strengths were the appropriate use and description of mixed-effects generalized additive models, the appropriate study design and the inclusion of all raw data for public use.

    1. Reviewer #3:

      PREreview of "The gene cortex controls scale colour identity in Heliconius" Authored by Luca Livraghi et al. and posted on bioRxiv DOI: 10.1101/2020.05.26.116533

      Review authors in alphabetical order of last name: Monica Granados, Vinodh Ilangovan, Katrina Murphy, Aaron Pomerantz

      This review is the result of a virtual, live-streamed preprint journal club organized and hosted by PREreview and eLife. The discussion was joined by 17 people in total, including researchers from several regions of the world.

      Overview and take-home message:

      In this preprint, Livraghi et al. present noteworthy advances in evolutionary biology by characterizing the role of the cortex gene in multiple Heliconius butterfly species, where it is responsible for the wing patterns: yellow bar or the Type I scale cell fates (white/yellow). The authors identified the cortex gene’s major role in sympatric speciation, the modulation of convergent wing patterns, and the regulation of scale identity in multiple Heliconius species, which naturally have different niches to help explain different co-mimetic morphology. Livraghi’s team provides strong evidence for the cortex gene as one of the earliest regulators and its ability to set the differentiation of scale cells in a molecular switch fashion from yellow to red/black at a particular developmental stage through distal localization. This important discovery on the role of the cortex gene fills a gap in our existing knowledge about the gene’s ability to control scale cell identity and wing color patterns. Since this work is of significant interest in evolutionary biology, we outlined some concerns below that could be addressed in the next version.

      Positive feedback:

      1) We strongly recommend this preprint to others/for peer review. In addition, we recommend this article to trainees as educational material to learn evolutionary developmental biology through interactive tutorials.

      2) The authors have provided a good amount of novel results and have utilized current tools to address their questions.

      3) This research fills a gap in our understanding of wing patterning in Heliconius while doing so in a very comprehensive way across multiple species and using techniques that systematically detail the association between gene expression and phenotype.

      4) It was interesting to learn that the cortex gene doesn’t follow the typical pattern gene paradigm. We do not have many examples of integrator genes like cortex, which give binary outputs from a network of genes and integrate elements to produce a singular output.

      5) This is a textbook example and is important for evolutionary development and mimicry studies. It is hard to find and/or work with a developmentally important gene that is amenable for genetic modification and still be able to work with viable offspring and have it be relevant for evolution.

      6) The current cortex protein data as seen in Figure 6 adds novel data to the manuscript.

      7) Thanks to the authors for setting a great example of showing modeling information. The graphics are visually appealing and convey complex information well.

      8) This preprint sets up a good next step of how cortex evolved in a more broad context. We know the cortex gene is potentially implicated in wing pattern evolution in other distantly related butterflies and moths (e.g. peppered moth Biston betularia) and in possible roles of evolution/speciation by pattern changes due to genomic inversions at cortex locus.

      9) The authors did a good job of creating a well-composed manuscript. Yellow bar with one species had a contradiction but did reconcile with further research questions.

      10) Definitely, [the results are likely to lead to future research] especially with understanding how a cell cycle regulator affects developmental cell fates in terms of these scale colors and structures.

      11) Antibodies can open up future research. This research team figured out three elements and there are possibly more to explore. Future research might investigate how cortex possibly regulates endocycling and what this means for color identity determination.

      Major concerns:

      1) The use of the term “race” to define butterflies with specific phenotypes needs to be revised to clines or strains or variants. “Race” is a social construct and not a biological reality and we strongly suggest revising this term.

      2) The authors state that cortex and dome/wash genes are controlled by inversion (see Line 375, page 19). Does the strain they engineered carry the inversion?

      -We are aware that inversion for species is complex - strains, genetic background - starting material for inversion.

      -Inversion events occurred millions of years ago in the loci contributing to the wing pattern. The authors describe the first generation of CRISPR knock-outs in Heliconius sp. and hence we suggest including further information.

      3) We strongly suggest the authors elaborate on their qRT-PCR analysis pipeline. Did the authors follow MIQE guidelines in their quantitative real time PCR assays?

      4) More explanation could be provided for cortex protein experiments. Figure 6 could explicitly say what developmental stage/time after pupation (they report this in the Methods section) and the rationale behind presenting data for this stage in development.

      -If a systematic developmental time series of cortex protein expression is observed using immunostaining, we suggest adding the data. Otherwise we request the authors to comment on the rationale behind selecting this particular stage of development.

      5) We recommend the authors mention institutional or local animal care ethical approval and safety regulations in the field working on Heliconius sp. for setting best practice reporting standards.

      6) We suggest clarifying the lack of a clear correlation between in situ stains and the mutational effects of cortex CRISPR knock-outs.

      7) Please add statistical analyses in figure legends, e.g. Figure 2 lacks statistical analysis information. Which test was performed and why? A statistical analysis subsection under the Methods section could be useful.

      8) Could a sized-down Figure S10 be added to Figure 6 in the manuscript to provide more information about the nuclear ploidy and cortex antibody signal? Even no association is informative and helps the reader think about the connection between color/endopolyploidy.

      Minor concerns:

      General

      1) We request that the authors revise the introduction so that it presents easy-to-comprehend information on the gene regulatory complex affecting each patterning region.

      2) We strongly recommend minor rephrasing of the on/off switch to guide non-experts in evo-devo biology.

      Figures

      1) Figure S10 has a couple of typos - ‘localisation’ and ‘punctae’ in the first sentence of the figure caption.

      2) It will be helpful to guide the readers, if a high-level phylogenetic tree mapping the related Heliconius’ evolution is presented in Figure 1. We suggest a compass guide to be added in the map of Figure 1b.

      3) The scale bar is missing in Figures 6a and 7a.

      4) In Figure 4, some of the mosaic KOs are very apparent and others are not especially for researchers unfamiliar with butterfly CRISPR, e.g. H. charithonia. I might suggest highlighting or using arrows to indicate the mKO regions.

      5) We request that the authors consider showing the distribution of individual samples in the qPCR data, superimposed on the box-whisker plots.

      Sufficient Detail

      1) More information about the genes would be helpful, such as accession numbers and annotated gene information rather than the complete genome data.

      -Might not be able to repeat CRISPR from the details in the Methods section. If the gene information is not well annotated as a model system then it is difficult. What about Heliconius? It might be helpful to report the scores for low off-targets.

      -Non-standard genetic model systems present a challenge particularly to create genomic resources.

      2) Multiple people mentioned not being able to repeat the in situ hybridization methods from the available information on methods. The hybridization conditions for thicker whole mounts were not fully explained.

      3) Please provide more information about the number of animals.

      Data Accessibility

      1) We appreciate the authors adding supplemental information as figures, and we request that the data files associated with the manuscript also be reported.

      2) R code was used for morphometric analysis - this is difficult to track from the paywalled reference mentioned and is thus a problem. We request that this analysis information/pipeline be made openly available.

      3) Please include supplemental information on the microscope settings and metadata of images used for analysis explicitly.

      4) High-resolution images of the CRISPR mutants could be provided in a supplemental/data repository.

      5) Providing gene sequences used in this study will be very helpful rather than the SRA repository, especially probes used for in situ and sequences targeted for CRISPR.

      Acknowledgments:

      We thank all participants for attending this preprint journal club. We especially thank those that engaged in the discussion. Their participation contributed to both a constructive and lively discussion.

      Below are the names of participants who wanted to be recognized publicly for their contribution to the discussion:

      -Monica Granados | PREreview | Leadership Team | Ottawa, ON

      -Vinodh Ilangovan | Labdemic - Founder |Postdoc | @I_Vinodh

      -Katrina Murphy | PREreview | Project Manager | Portland, OR

      -Aaron Pomerantz | UC Berkeley/Marine Biological Laboratory | Ph.D. Candidate | Berkeley, CA/Woods Hole, MA

    2. Reviewer #2:

      This manuscript explores the role of the gene cortex in the specification of wing scales in the butterfly genus Heliconius. Species of Heliconius butterflies are notorious for their reciprocal mimicry of wing color patterns. Several genes are known to control variation of specific color pattern elements within and between species; cortex is one of them. The authors combine RNAseq analysis across wing development, in situ hybridizations, antibody stainings and analysis of CRISPR somatic mutations to dissect the role of cortex in the specification of scales. Their main claim is that cortex imparts scale identity (color, morphology), namely type II and type III identity.

      Although this paper includes a substantial amount of work and a number of interesting observations, I am not sure what can really be concluded in the end, and several results would need follow-up experiments to reach a stable conclusion.

      The strongest part, in my opinion, is the analysis of somatic mutant clones of cortex in the wings of different species. The authors show that the lack of cortex consistently results in the conversion of type II and type III scales into type I scales, and thereby demonstrate the necessity of this gene for type II & III identity. This is solid, interesting, but not a novel concept from a genetic or developmental biology point of view. There are countless examples in the 1990s literature of genes whose mutations result in such shifts in cell identity (e.g., poxn and cut in the peripheral nervous system of flies).

      From this result, two questions emerge: how and when does cortex assign this identity during development? And how does cortex explain the variation in color pattern among Heliconius morphs and species? Although the paper discusses these two questions, I find the answers unclear and the results confusing.

      The authors first examine the expression dynamics of cortex. They re-annotated the 47-gene genomic interval where cortex maps and analyzed the differential expression of all genes in the interval, across developmental stages, across species and morphs and also compared wing compartments. Their main conclusion is that cortex is the most likely candidate in this interval to explain color pattern variation. I am not sure why the authors did this. I thought this was already clearly established from a previous paper (Nadeau et al. 2016, Nature). Moreover, the explanations of the differential gene expression (DGE) analysis are often too shallow to really understand what the authors did, including the method description. The figures are poorly annotated and it's difficult to understand if there are replicates in the RNA-seq analysis (see minor comments). One striking result from this part is that the DGE suggests that cortex is differentially expressed in the 5th instar larvae between 2 morphs of Heliconius erato and 2 morphs of Heliconius melpomene, but the differential expression goes in opposite directions between these 2 species. How could the same phenotypic variation between morphs of 2 species be caused by opposite DGE? The authors note that it is interesting but do not comment or analyze further.

      They pursue their investigation with in situ hybridization on 5th larval instar wings and temper the notion, proposed by Nadeau et al., 2016, of a spatial correlation between the distribution of cortex transcripts and color pattern elements. Here again, the figure would benefit from better annotation. The authors indicate subtle differences in the local distribution of cortex transcripts between morphs but do not really conclude anything from their observation. They also give no indication of sample size or replicates, which I find unsettling given the noise associated with this experiment. I am not sure what this figure really adds to the published work, or to the present manuscript.

      Finally, the authors examine the distribution of Cortex protein in late (2-day pupa) developing wings with a polyclonal antibody. They find, surprisingly, that the protein is distributed more or less uniformly in the wing epithelium and localizes to the cell nuclei. While this is very different from the patterned transcript distribution, it is consistent with the somatic mutant clone analysis that showed that any mutated cell at any position of the wing displayed a phenotype. But this opens many questions: what is the origin of the apparent difference in expression between protein and transcripts? Is Cortex secreted, and does it diffuse across the wing? Or is the transcript expression spatially dynamic, with the protein distribution revealed by the authors reflecting the temporal integration of this expression? And if Cortex is present and functional across the wing, how does it produce discrete pattern elements?

      The authors conclude their paper with a figure suggesting that cortex specifies type II/III scale identity early during wing disc development and that the distinction between type II and type III is subsequently governed by the gene optix at a later stage. But what substantiates the idea that cortex imparts cell type identity early on? What does Cortex larval (5th instar) distribution look like? Is it as uniform as that of later stages? The data presented here do not offer the temporal or functional resolution to support this conclusion.

      In conclusion, this paper shows that the mutation of the gene cortex results in scale type transformation, but fails to explain or suggest how this may happen during development. It also does not suggest how cortex may control the "fantastically diverse" pattern variation in Heliconius.

    3. Reviewer #1:

      This is an interesting but complex study that examines the role of a few genes in a previously mapped interval in being the "switch" gene that regulates the presence or absence of a yellow band in the wings of Heliconius butterflies. The study first examines whether there is a correlation between expression level of several (47) genes in the mapped interval in developing wings (or parts of wings) in two separate species of Heliconius each having a race with the yellow band and a race without the yellow band. This part of the study highlights three genes (among others) that show some pattern of differential regulation but shows that there is no simple correlation between the expression level of these three genes in either larval or pupal wings and the presence of the yellow band. The authors then examine the function of one of the genes in the interval, cortex, in scale color development by using CRISPR. They find that cortex crispant individuals display color changes across the whole wing, not just in the region of the yellow band. In particular the black scales (Type II) become white or yellow (Type I), and the red scales (Type III) also become white or yellow (although this last transformation is not documented at the SEM level). The authors examine, once again, the expression domain of the cortex gene, this time during pupal development with an antibody, and they find that the gene is expressed across the whole wing, supporting its functional effects also across the whole wing. They observe that cortex is expressed in multiple punctate domains in the nuclei of scale building cells, which are polyploid cells, and in a single punctate domain in the nucleus of non-scale building epidermal cells, which are not polyploid. They then test whether perhaps there are more of these punctate nuclei in the region of the yellow band, but they find no such correlation.

      In the end the authors try to argue that either 1) cortex is the yellow-band switch gene they are after, but the switch is not in the form of a typical spatially expressed gene (in the shape of the yellow band) and is perhaps instead in the form of some threshold or heterochronic mechanism (not clearly explained), or 2) another gene in the mapped interval, not examined for function in this study, is instead the switch gene they are after, and may (or may not) interact with cortex in the differentiation of the yellow band.

      I believe the authors are trying hard to implicate cortex in some way, as the yellow band switch locus, but the data just does not support this. Instead the authors implicate cortex in scale color identity (the title of the manuscript). However, given that cortex (alone) cannot control a specific color either, because the effect of cortex on color is different in different parts of the wing, their model for how cortex acts is too simple and does not fit their data. A combinatorial genetic code for both scale color and morphology (see below), where cortex is simply one of the players (rather than a major switch/homeotic gene) is required to explain the data in this manuscript.

      Furthermore there are several data missing from the manuscript that need to be added to support some of the conclusions drawn, and several other data that would be important to add for purposes of data replication across labs.

      1) The authors claim that cortex converts Type II (black) scales into Type I (white/yellow) scales but their SEM data and scale morphological measurements presented in the supplement don't fully support this conclusion. These transformations vary from species to species (e.g. H. melpomene and H. erato show different degrees of transformation) and only some features of the scale are actually transformed (e.g., cross rib periodicity in both species, and scale width and length and ridge periodicity in H. melpomene). The remainder of the measurements show that cortex is not sufficient to convert scale Type II into scale Type I.

      2) I suggest that the definition of the scale types presented should be made more explicit. What are scale types I, II and III really? In line 87 it is mentioned that these scale types are based on scale color and on scale morphology but what follows is just a description of the pigments found in each scale and not their morphology. Furthermore, the data presented in the manuscript suggests that color and morphology can be uncoupled with genetic perturbations of cortex, so is it even useful to stick to this scale type nomenclature going forward? Something to consider.

      3) There is a need for a new figure showing how the scale morphological measurements were actually conducted. There is no scale bar in the SEM images of yellow and black scales and this should be added. The SEM images used to represent a typical yellow WT scale and a transformed yellow scale of H. melpomene (in Figure 7) show very different densities of cross-ribs (but I am not even sure what exactly is being considered a cross-rib), yet the graph indicates that there is no difference between these scale types. This is confusing and needs clarification. Please consult the scale morphology nomenclature in Ghiradella 1991 (Applied Optics) to ensure that ribs (cross-ribs) and microribs are designated appropriately. There seem to be quite a lot of differences in microrib density between WT scales and transformed yellow scales in H. melpomene.

      4) The authors claim that cortex converts Type III (red) scales into Type I (white) but they only describe conversions of Type II (black) into Type I (yellow) scales at the SEM level and don't provide any SEM images or quantitative data for the red to yellow, red to white, and black to white scale transformations. Adding these data is important to support the conclusions of the study.

      5) I suggest the authors remove the dome-t and dome/washout gene data from the manuscript as 1) nothing about these genes is mentioned in the abstract; 2) the expression of these genes doesn't correlate with presence of the yellow band; 3) the genes are not investigated at the functional level; 4) the gene duplication issues surrounding these genes make the whole manuscript more difficult to read and do not, in the end, contribute to the main story that yielded results - which is the function of cortex in scale development. The function of these genes might still be worth investigating using CRISPR at a later date, and perhaps it would be useful to include the expression pattern data in that subsequent paper. This is merely a suggestion that I believe will make this manuscript less heavy and easier to read by focusing the reader's attention on the main points of this story.

      6) Pigmentation and scale morphology are most likely controlled at the pupal stages of wing development, and by measuring RNA levels of candidate switch genes at just two time points during pupal development (36 hrs and 60-70 hrs after pupation) you may not have sampled the correct time window for yellow band differentiation. Several genes are expressed only during the first 16-30 hrs of pupal development, in species that need 7 days for pupal development (see Monteiro et al. 2006 for genes such as Wg, pMad and Sal), so sampling wings (for RNA-seq and antibody stains) at 36 hrs and 60-70 hrs may not be an ideal sampling strategy going forward.

      7) The authors mention that because cortex causes changes in both scale color and morphology this suggests "that cortex acts during early stages of scale cell fate specification rather than during the deployment of effector genes". This conclusion needs more discussion. Matsuoka and Monteiro (2018) showed that knockout of the gene yellow, an effector gene at the end of a gene regulatory network for melanin pigment production, also led to changes in both scale color and morphology. These authors proposed instead that absence of certain pigments on the wing, such as dopa melanin, caused chitin to polymerize differently and form an extra lamina that prevents the windows from forming in the scales (just as seen in cortex mutants). The authors should consider and evaluate this alternative explanation in their discussion.

      8) Did the authors examine whether any of the 47 genes in the mapped interval show protein coding changes between the yellow and black races? Please mention whether this was done. Please also upload the sequences of the genes that were studied and provide accession numbers for these sequences.

    4. Preprint Review

      This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.

      Summary:

      The topic of your work is timely and intriguing, but the reviewers raise several issues with the study. For example, the reviewers propose that the major conclusions of the manuscript are not supported by the data presented, and that a full set of SEM data across all scale type color transformations should be presented. Given the results presented, the relationship between cortex expression and the actual pigmentation remains unclear, and the sole phenotypic analysis is insufficient to make conclusions about the role of a gene in producing pigmentation pattern variation.

    1. Reviewer #3:

      In this manuscript Hashmi et al describe the emergence of an endoderm population in a gastruloid model. They observe that endoderm cells are positive for E-Cad, likely express E-Cad continuously from an epiblast state, initially form small islands, and finally coalesce into a larger endoderm region at the pole of the gastruloid. There are several issues with this manuscript in its current form.

      1) No evidence is provided that there is a relationship between how endoderm forms in this gastruloid model and in vivo. In fact, endoderm is believed to derive from a restricted area in the anterior primitive streak. This is evident from the mouse imaging data of McDole et al Cell 2018 as well as from more recent genetic labeling experiments (Probst et al bioRxiv 2020). It is well known that cells of different germ layers may self-segregate, and this may drive the behavior observed here downstream of heterogeneous differentiation in the gastruloids, but that is not necessarily the mechanism which occurs in vivo. The authors suggest that their experiments show something about endoderm formation in vivo without addressing this point, which substantially detracts from the interest of the manuscript.

      2) The authors suggest that this view of endoderm differentiation, which doesn't require full EMT, is novel. However, many of the observations here are already known. It is known that future endoderm cells do not down-regulate E-Cadherin but instead must continue to express it. They are also known to migrate collectively rather than as single cells in a cadherin-dependent way (Montero et al Development 2005; reviewed in Nowotschin et al Development 2019). The authors should discuss this literature and make clear which aspects of the proposed mechanisms are novel.

      3) The authors are assessing the status of EMT based on a single marker, E-Cad. If this is a major point of the manuscript, other markers, e.g. Snail and N-Cad, should be examined.

      4) It is well known that embryoid bodies form an outer layer of visceral endoderm, e.g. Coucouvanis & Martin Cell 1995, Doughton et al PLOS ONE 2010. None of the markers here are exclusive to definitive endoderm (including Sox17 which is used throughout, see Artus et al Dev Biol 2011). The authors should address the possibility that their observations may be consistent with a similar mechanism and may not reflect definitive endoderm differentiation.

    2. Reviewer #2:

      In this manuscript, the authors propose a new mechanism of endoderm formation in 3D gastruloid models based on cell migration and fragmentation. Specifically, they find that E-cad is first uniformly expressed inside mESC aggregates. After exposure to the Wnt agonist Chiron (Chi), a gradual repression of E-cad and an increase of T-Bra are detected. Cells in the core are tightly packed and express E-cad. T-Bra expressing cells are sparsely wrapped around the core. A directed flow of E-cad expressing cell islands surrounded by T-Bra expressing cells helps to accumulate E-cad expressing cells at the tip of the aggregate and form the endoderm domain. I think the dynamic expression of E-cad and T-Bra and the directed cell flow reported in this manuscript are interesting. The results and videos show that the elongation and formation of the endoderm region is a collective cell behavior rather than the result of single cells undergoing epithelial-to-mesenchymal transition. But I am not convinced that the process follows the three-step mechanism proposed by the authors. Moreover, I am not sure whether this phenomenon really happens in mouse embryo development, given the considerable differences between the gastruloid model and the embryo. Since there are methods for culturing mouse embryos in vitro up to the early organogenesis stage, I would suggest the authors provide more evidence showing that the proposed mechanisms might also happen in vivo.

      In addition, the manuscript provides too little information to understand the phenomenon, and the authors do not clearly describe the experimental and computational methods they used to acquire the results. I list some of my comments below.

      Major comments:

      1) Did all 3D aggregates adopt an elongated shape in the presence of Chi? If not, what do E-cad and T-Bra expression and cell migration dynamics look like in those spherical aggregates? Without Chi, is there also cell migration inside the spherical aggregates, since the aggregates keep growing larger?

      2) When did the collective cell migration start? Right after exposure to Chi? Or after some percentage of cells had become T-Bra positive? Did the gastruloid keep elongating with directed cell flow inside it when cultured for a long time?

      3) Is the collective cell migration driven by the T-Bra cells? Is it a spontaneous property of E-cad cells when the E-cad cell density exceeds some critical threshold (e.g. glassy dynamics)?

      4) Do the elongation and migration dynamics depend on the concentration of Chi or the size of the aggregates? I noticed the authors used different initial seeding densities.

      5) In the elongated cell aggregates, the cells on one side express E-cad. What about the cells on the other side? Did they all become mesoderm-like (T-Bra+) cells?

      6) Many results are based on only a few (3 or 4) gastruloids. For example, figure 1 (d) (e), figure 2 (b), figure 3 (c). And in Figure 4 (b), the authors only quantify 13 junctions, probably in the same gastruloid. Due to the heterogeneity among the gastruloids, I am not sure how repeatable the experiments are. Do those observations really reflect phenomena that happen inside the majority of gastruloids? I think the authors should quantify, across a large number of gastruloids, the percentage in which the reported results are observed.

      Unclear results or experimental descriptions:

      1) Can the authors show a schematic of the experimental process, such as the time of adding Chi and fixation?

      2) 'We find that 30/37 ... set to 0.125.' How did the authors define and calculate the elongation ratio and E-cadherin polarization ratio? How did the authors define the elongation threshold?

      3) Figure 1 (a): what is the y axis? 1 (d): how did the authors measure the E-cad and T-Bra expression? By fixing at different time points or by live imaging? If it is live imaging, is the acquisition process influenced by adding and removing Chi? 1 (e): how can the authors get continuous results for polarization?

      4) Figure 2 (b) Do those dots represent the nuclear positions? Can the authors provide a 3D view of the whole gastruloid? (c) What information are the authors trying to get from the connectivity graph?

      5) Figure 3 (a) What are those white dots in the images, and also in movie 6? Can the authors replace t1, t2, t3, t4 with the real times, such as 24h, 36h? (d) How did the authors calculate the intensity? How did the authors normalize the intensity? The schematic in (b) is hard to understand. What do the light and dark colors represent? How did the authors measure theta_1 and theta_2, especially in 3D? More quantitative information should be extracted from (a).

      6) I am not able to identify islands of E-cad expressing cells in Figure 3 (a) and movie 6.

    3. Reviewer #1:

      General assessment:

      The manuscript by Hashmi et al describes the emergence of endoderm-like cells in a stem-cell-based embryo model. The particularity of the protocol is that it stimulates transition through an epiblast-like state, then differentiation towards mesoderm after a pulse of Chiron, a Wnt agonist. In those conditions, islets of E-cadherin positive cells emerge, surrounded by Ecad+Brachyury+ cells, then Brachyury positive cells. Those islets fuse together at the tip, possibly due to distinct surface tension and directed cell movements, and express endoderm markers such as Sox17 and FoxA2.

      It is an original approach and concept, raising new questions and possibilities about the mechanisms of endoderm emergence in the mouse embryo. The manuscript is well written and clear.

      Concerns:

      1) The data would benefit from increased clarity in stating, for each experiment, the proportion of aggregates in which a given phenomenon was observed, as well as the number of cells counted in each aggregate, in particular in supplementary figures.

      2) For the migration analysis, it could be interesting to resolve individual cell trajectories in order to distinguish the behaviours of the subpopulations.

      3) In terms of the surface tension analysis, performing a similar analysis at different timepoints might be helpful to understand how the islets come to fuse at the tip.

      4) I am not sure about the specificity of the Gata6 staining; in any case, it does not add a lot to the story.

      5) The authors might want to discuss how those aggregates evolve, and whether the endoderm-like cells have a potential for further differentiation.

      Conclusion:

      Overall it is an interesting and original observation, well substantiated. More details on the quantification methods would help convince readers of the solidity of the model: the chance of obtaining those cells, the number of cells in each subpopulation including those described in supplementary figures, the technical possibility of sorting them for transcriptome analysis, etc.