  1. Nov 2021
    1. Author Response:

      Reviewer #3 (Public Review):

      The paper contains a substantial amount of novel experimental work, the experiments appear well done, and the analysis of the data makes sense. Raw data and analysis scripts have been made fully available.

      I have two specific comments:

      • While the paper talks extensively about deep mutational scanning, I don't think this is a deep mutational scanning study. In deep mutational scanning, we usually make every possible single-point mutation in a protein. This is not what was done here, as far as I can tell.

      In the revised manuscript, we have avoided using the term "deep mutational scanning" to describe our experimental design. Instead, we described our approach as “a high-throughput experimental approach that coupled combinatorial mutagenesis and next-generation sequencing”.

      • For the analysis of epistasis vs distance (Fig 4d, e, f), it would be better to look at side-chain distances rather than C_alpha distances. In covariation analyses, it can be seen that C_alpha distances are not a good predictor of pairwise interactions. Similar patterns may be observable here.

      See e.g.: A. J. Hockenberry, C. O. Wilke (2019). Evolutionary couplings detect side-chain interactions. PeerJ 7:e7280.

      Thank you for the suggestion. In the revised manuscript, we replaced the Cα analysis with a side-chain analysis according to Hockenberry and Wilke (see response to Essential Revisions above).
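
      To illustrate why the choice of distance matters, the following is a minimal sketch that contrasts a Cα–Cα distance with a side-chain distance for one residue pair. It assumes the side-chain position is taken as the centroid of the side-chain heavy atoms, which is one common choice; the exact metric used in the revised manuscript is not stated here. The function names, the residue dictionaries, and the coordinates are hypothetical and chosen only for illustration.

```python
import numpy as np

# Backbone atom names; everything else in a residue is treated as side chain.
BACKBONE = {"N", "CA", "C", "O"}


def ca_distance(res_a, res_b):
    """Distance between the C-alpha atoms of two residues."""
    a = np.asarray(res_a["CA"], dtype=float)
    b = np.asarray(res_b["CA"], dtype=float)
    return float(np.linalg.norm(a - b))


def side_chain_centroid(res):
    """Centroid of side-chain heavy atoms (assumed metric).

    Falls back to the C-alpha position for glycine, which has no
    side-chain heavy atoms.
    """
    atoms = [xyz for name, xyz in res.items() if name not in BACKBONE]
    if not atoms:
        atoms = [res["CA"]]
    return np.asarray(atoms, dtype=float).mean(axis=0)


def side_chain_distance(res_a, res_b):
    """Distance between the side-chain centroids of two residues."""
    return float(np.linalg.norm(side_chain_centroid(res_a) - side_chain_centroid(res_b)))


# Hypothetical coordinates (in angstroms): two side chains pointing at each other.
res_a = {"CA": (0.0, 0.0, 0.0), "CB": (1.5, 0.0, 0.0), "CG": (3.0, 0.0, 0.0)}
res_b = {"CA": (10.0, 0.0, 0.0), "CB": (8.5, 0.0, 0.0), "CG": (7.0, 0.0, 0.0)}

print(ca_distance(res_a, res_b))          # backbone separation
print(side_chain_distance(res_a, res_b))  # smaller: the side chains face each other
```

      In this toy geometry the two residues look distant by Cα but are much closer by side chain, which is the situation the reviewer's comment is concerned with.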

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript describes single molecule measurements of rotation of the C10 ring of E. coli ATP synthase in intact complexes embedded in lipid nanodiscs. The major point of the work is to identify the mechanisms by which protonation/deprotonation steps produce torque between the a-subunit and the C10 ring, which is subsequently conveyed to F1 to couple to ATP synthesis. The work explores the pH dependence of the "transient dwell" (TD) phenomenon of rotational motion to identify likely intermediates, showing a likely step of 11° in the "clockwise" (ATP synthase-related) direction. The results are then interpreted in the context of detailed structural information from previous cryo-EM and X-ray crystallographic reports, to arrive at a more detailed model for the partial steps for coupling of proton translocation to motion. The effects of site-specific mutations in the c-subunits appear to support the overall model.

      While the detailed structural arguments seem, at least to this reader, to be plausible, the text is not structured for any hypothesis testing, and one might imagine that alternative models are possible. No alternative models were presented, so it is not clear to what extent the 11° rotation step rules out such possibilities. This leaves the reader with the feeling that a lot of speculation occurs in the Discussion, but it is very difficult to figure out which parts are solid and which parts are speculation.

      We have now presented the results in terms of hypothesis testing specifying alternative hypotheses that exist in the literature. We then specify results presented in the manuscript that discriminate between alternate hypotheses.

      The Discussion also tries to pack in too many concepts, going well beyond the advances enabled by the TD results themselves. For example, the proton "funnel" concept is quite interesting, but it is not easy to see how the TD leads up to it. This overpacking makes it difficult to pinpoint the real advances, dilutes the message, and sets the reader up to ask for more support for such extensive modeling. Do the mechanistic details set up good testable hypotheses for future experimental tests?

      It is clear that the pKa values we determined are the result of multiple residues involved in the proton transfer process. It is currently not possible to determine where the input channel starts. The recent structures now show that the residues that were thought to define the input channel are far from the surface and must communicate with the periplasm via the funnel. Residues in the funnel likely impact the pKa values that we have measured. The proton transfer-dependent 11 degree step that we measured must also depend upon the funnel. Our results clearly show that this 11 degree step depends upon the correct protonation states of both the input and output channels, and that this depends on the differences in the high and low pKa values. The possibility also exists that this funnel that is absent in the output channel may provide a proton reservoir to supply the input channel, which promotes the ability of input and output channels to drive the 11 degree synthase steps. We have now included this information in the manuscript, and for these reasons, we decided that discussion of the funnel must remain.

      Overall, the text would be far more impactful if it focused more tightly on the implications of the TD results themselves, testing specific sets of models, and taking more care to guide the readers through the interpretation.

      We have extensively rewritten the entire manuscript to address these issues. To help guide the readers, we added more background information to the introduction and pose alternate hypotheses. In the results, we now guide the readers by restating how the experiments can test a given hypothesis, and include brief conclusions that explain why a hypothesis is eliminated or favored based on the results. We shortened the Discussion to make it more focused, with the exception that we provided additional information that has been requested by the reviewers. We also tie each point in the discussion to the results presented. Of course, a good discussion is meant to put the results and conclusions of the manuscript into the context of results and conclusions from other laboratories, which we have done.

      Reviewer #2 (Public Review):

      This brilliant, beautiful and important study provides the essential kinetic framework for the recent, static high-resolution cryo-EM structures of F1FO ATPases from bacteria, chloroplasts and mitochondria. The elegantly conducted single-molecule work is necessarily complex, and its analysis is difficult to follow, even for someone who is intimately familiar with F1FO ATPases. Some more background and better explanations would help.

      We added additional background information to the Introduction, and we now periodically explain the reasoning and conclusions in the Results to help guide the readers.

      For F1FO ATPases, CCW rotation has little if any biological relevance, whereas CW rotation is centrally important. Evidently, the CW ATP synthesis mode is not accessible to the approach taken in this manuscript, since the ATP synthase is reconstituted into lipid nanodiscs rather than liposomes. This critical fact should be stated more clearly in the introduction.

      We now state explicitly that net rotation was observed as the result of F1-ATPase activity as requested. We also note that E. coli does sometimes use F1Fo as an ATPase-dependent proton pump to maintain a pmf across the membrane depending upon metabolic conditions.

      The central concepts of "transient dwells", "dwell times" and "power strokes" need to be introduced more fully for a general, non-expert audience.

      We added this information to the Introduction as requested.

      The manuscript describes the power stroke and dwell times in CCW ATP hydrolysis mode in unprecedented detail. Presumably the dwell times and power strokes apply equally to the physiologically relevant CW ATP synthesis mode, but are they actually the exact reverse? Is there evidence for transient dwells and 36° power strokes divided into 11°+25° substeps during ATP synthesis?

      The 36° subunit-c stepping that contains 11° synthase-direction steps is a novel observation first reported in this study. To date, single-molecule studies of rotation during net ATP synthesis have been carried out using single-molecule FRET, which has been able to observe only 3 or 4 consecutive synthase steps for a given F1Fo molecule (Dietz et al. (2004) Proton-powered subunit rotation in single membrane-bound FoF1-ATP synthase. Nature Struct & Mol Bio). The FRET measurements do not have the time resolution to resolve sub-steps. Whether or not continuous rotation in the synthesis direction is the exact reverse of ATPase-dependent rotation is an important question that remains to be answered.

      The meaning of low, medium and high efficiency of transient dwell formation (Figure legend 2; lines 189/190; Figure 3; line 365) is not obvious and not well explained. How are these efficiencies defined? Why are they important? What would be 100% efficiency? And what would be 0%?

      Background information concerning the three efficiencies of transient dwell formation has been added to the Introduction, and we now explain their importance. We also now explain what 100% and 0% efficiency is in the Results.

      Why is it important whether transition dwells do or do not contain a synthase step? Is this purely stochastic? If not, what does it depend on?

      We added a paragraph to the Discussion to explain that their formation depends on the rate of formation of the interaction between subunit-a and the c-ring relative to the velocity of ATPase-dependent rotation in the opposite direction, and on the energy that can drive the synthase-direction step relative to the energy that drives the ATPase-direction power stroke. More work, beyond the scope of the work presented here, is required to define the energetic parameters of these opposing rotations.

      The formation of a salt bridge between aR210 of subunit-a and cD61 of the c-ring rotor would seem to be counter-productive for unhindered rotary catalysis. What is the evidence for such a salt bridge from the cryo-EM structures or molecular dynamics simulations?

      This is an excellent question, especially since the distances between aR210 and cD61 are more consistent with intervening water molecules. We revised the paragraph in the Discussion describing this point and have been more explicit about the important role that the aqueous vestibule between the output channel and aR210 must play during rotation, which includes the impact of the dielectric constant inside the vestibule. As a direct answer to the reviewer’s question, in the absence of water, a salt bridge between aR210 and cD61 in such a hydrophobic environment would be so strong that the energy of a proton from the input channel would never be able to dislodge them.

      Reviewer #3 (Public Review):

      Yanagisawa and Frasch utilise a gold nanorod single molecule method to probe the pH dependency of F1FO rotation. The experimental setup has been previously used to investigate both F1-ATPase and FO function in multiple studies. In this study, clockwise rotations are observed in transient dwells which may correlate to synthesis sub-steps. Mutations along the proposed proton path modify the pH dependency of the transient dwells.

      The strength of this manuscript can be seen in the rigorous way in which the problem has been explored. Testing the pH dependence of mutants along the proposed proton path and linking this to potential sub-steps using the known atomic structure.

      In my view, the main weakness of this study is the experimental design (shown in Fig. 1C). Strictly, the measurements show rotation of the c-ring relative to subunit-Beta rather than relative to subunit-a. Recent structures of E. coli F1FO ATP synthase inhibited by ADP (doi: 10.1038/s41467-020-16387-2) have shown that the peripheral stalk is flexible and can accommodate movements of the c-ring relative to the F1 (AlphaBeta)3-subunit ring. For example, comparison of PDB entries 6PQV and 6OQS shows that FO (the c-ring and subunit-a) can rotate 10 degrees as a rigid body relative to the F1 (AlphaBeta)3-subunit ring - with no relative rotation between the c-ring and subunit-a, or rotation of subunit-gamma. The authors discuss structures from this study related by a 25 degree rotation of the c-ring relative to subunit-a, but I do not believe they have ruled out the possibility that their observations show rotation of the FO as a rigid body. A preprint investigating E. coli F1FO ATP synthase in the presence of ATP has proposed that the complex becomes more flexible during ATP hydrolysis (doi: 10.1101/2020.09.30.320408), with the central stalk twisting by up to 65 degrees. The small CW movements seen in the transient dwells in this study could be attributed to 36 degree FO sub steps, facilitated by central stalk flexibility, with counter rotation facilitated by peripheral stalk flexibility.

      The data clearly show that the mutations of subunit-a residues in the input or output channels significantly change the pKa values of TD formation (Figs 2B and 2C), and can dramatically change the occurrence of the synthase-direction steps (see Figs 4D and 4E). These results clearly indicate that the rotational events observed in this study do not result from rotation of subunit-a and the c-ring as a unit.

      With regard to the recent structures that the reviewer refers to, we report differences in the efficiency of TD formation that are consistent with torsion induced by rotation of the c-ring relative to the beta subunit, which we reported previously (Yanagisawa and Frasch, JBC 2017), and which has been confirmed by independent single-molecule studies by the Junge lab and by the Boersch lab using approaches different from our own (Sielaff et al., Molecules 2019). Both papers are cited in the manuscript. We have now expanded the introduction to include these results describing the impact of central stalk flexibility on the ability to form synthase-direction steps, and how these results are consistent with E. coli cryo-EM structures similar to those referred to (27).

      It is also unclear what causes the stochastic nature of transient dwells. Are these related to inhibition of F1-ATPase? Could increased drag in FO increase the likelihood of F1-ATPase inhibition?

      We now include background information from our prior publications that characterizes the kinetic component that affects the ability to form a transient dwell. Ishmukhametov et al., EMBO J (2010), reported an increase in TDs upon increasing the drag on the nanorod that slowed the power stroke angular velocity. We described the kinetics of TD formation in that paper. In the Discussion, we also now provide information concerning how the bioenergetics can impact the probability of TD formation.

    1. Author Response:

      Reviewer #3 (Public Review):

      [...]Overall, the quality of the RNAseq data seems sound, and the conclusions presented seem mostly supported by the data. Additionally, the manuscript is well written and easy to read.

      We are thankful to hear that the quality of the paper convinces the reviewer.

      The spatial profiling of Physarum by physically segregating it by centrifuging it into a 384-well plate is clever. While the approach probably cannot be generalized to most organisms, it still provides a nice example of creative experimental design that is somewhat lacking in the single-cell genomics field at the moment. Moreover, given that there seem to be no/few published studies with RNA in situ hybridization gene expression patterns in this animal, it probably provides a wealth of information to Physarum researchers.

      We are glad to hear that the reviewer finds our experimental design clever, and we also believe that our manuscript is a great resource for the Physarum community and potentially also for the wider protozoan and fungal communities.

      Some aspects of the experimental design potentially limit the conclusions that can be drawn from the data. The authors find that plasmodia in distinct states of life (mitotic, non-mitotic, chemotaxing, contacting food) have broad syncytium-wide transcriptional differences. A major caveat of this finding is each separate condition was only profiled once without replicates, which makes it more difficult to tie which of these transcriptional differences are related to the samples' biological differences and which might be a batch effect.

      We agree with the reviewer on this point. In the revised manuscript, we now highlight not only the differences but also the similarities between replicate samples (SM1/2) and between similar plasmodium parts (fans) in Fig. 2- Fig. supp. 1D-F, to show that there are overlaps, which suggests that the differences are not driven by batch. However, some of our observations might be affected by batch, and we have been careful to point this out in the revised manuscript. We feel that the methods that we have established in this manuscript can be employed at higher throughput in the future in order to sample multiple environmental conditions across multiple batches. Before our work, it had been entirely unclear whether there is transcriptional heterogeneity within the syncytium, and this is one of our major findings, which we feel is very robust, interesting, and important.

      Additionally, it's not clear why the authors profiled different timepoints via snRNAseq (1 week with oat flake) and spatial RNAseq (only a few hours with oat flakes) in their experiment to assess feeding behavior.

      We thank the reviewer for the comment. Not having the same time point is indeed not ideal, but it happened for technical reasons. The main goal of the snRNA-seq data acquisition was to obtain a rough spatial separation of fan and network parts, which we believe is important to link the nuclei data to the spatially resolved data as shown in Fig. 3 (see point below). In order to obtain enough material, we needed to grow the plasmodium longer, whereas we performed all the Spatial Transcriptomics experiments in a more condensed time frame to keep at least these experiments as comparable as possible.

      While the work identifies spatial heterogeneity and nuclear heterogeneity, they are not directly compared (how much of the nuclear heterogeneity is explained by spatial heterogeneity?) perhaps because different timepoints were used with the two approaches?

      We apologize that this point was not conveyed clearly in our manuscript. We do try to draw a direct link between the nuclear and the spatial heterogeneity in Fig. 3G, where we correlate the transcriptomes of the clusters of SM4 with the clusters identified for the secondary plasmodium. Strikingly, the fan-specific cluster in SM4 is most highly correlated with the nuclei cluster that is strongly enriched with nuclei from the fan sample, which allows us to draw a direct link between the nuclei and the spatially resolved data. We rewrote the corresponding paragraph accordingly. This also suggests that age has only a minor impact on the comparison. However, we agree that technically more direct techniques are available, such as measuring the mRNA in situ (e.g. SeqFish or OsmFish), which are, unfortunately, far from being routinely established in Physarum.

      Additionally, some aspects of the analysis seem to miss opportunities. A significant portion of the presentation of the gene expression results discovered by the authors is focused on the cell cycle, which seems less exciting than perhaps other biological phenomena related to structural specialization within different parts of the organism or related to its feeding and metabolic behaviors might be.

      We are thankful for the reviewer’s suggestions and added new panels to Fig. 2- Fig. supp. 1D-F that focus on similarities between and within samples to emphasize other biological phenomena like common transcriptomic fan signatures. In line with this, we performed additional GO enrichment analyses (Fig. 2- Fig. supp. 3 and Fig. 3-Fig. supp. 1) which, for instance, also highlight GO enrichments in dependence on nutrient interaction in addition to our findings regarding cell cycle progression in Physarum. We agree that there are a lot of very interesting biological phenomena that can be explored with Physarum, however, we also think that regulation of cell cycle within the syncytium and in free-living cells is also interesting and important.

      Also, while multiple classes of nuclei (stationary and mobile) are identified, it's unclear how those relate to the different transcriptional states identified through the snRNAseq.

      We thank the reviewer for making this point, and we would also like to understand the differences between stationary and mobile nuclei. We agree that the next step is to find ways to specifically label, track and isolate moving/stuck nuclei in order to understand how the different classes of nuclei are linked to the transcriptomic differences. Unfortunately, this is not yet possible, but it has to be the goal for future research.

      Lastly, some aspects of the presentation detract from the work. Some of the results and discussion focus on 'coordinated intra-syncytial behaviors', and a major one of focus is that a wave of mitosis seems to proceed across the organism. However, to my knowledge, many syncytial systems (e.g. Xenopus or Drosophila embryos) exhibit synchronized mitosis, so I would have expected this to be the default state, rather than an exciting finding. Is this result unexpected? If so, it would be helpful if better contextualized. One aspect that would likely improve this manuscript would be to place it more firmly within the larger context of well-studied syncytial cells that exhibit specialization. For instance, a major example of a well-studied syncytium that exhibits spatial gene expression and nuclear specialization is the Drosophila embryo, which undergoes much of its early patterning while syncytial. Furthermore, muscle cells are typically syncytial, and some exciting recent studies have similarly used snRNAseq to observe heterogeneity and specialization of particular nuclei.

      We are thankful for that comment and now contextualize this finding more clearly in our Discussion. Briefly, synchronous cell division is not the default state in acellular slime molds, nor in similarly functioning systems such as fungal hyphae. However, the reviewer is right that synchronized division is not a new finding, either for Physarum or for other syncytial systems in general. Our research adds to this view by showing that such a synchronization wave can be established over a large distance in Physarum (much larger than in ‘standard’ developmental biology systems like frogs and flies) while the nuclei are, in addition, in a continuous shuttle flow, which makes it a unique model system compared to the different nuclei states that are established in more rigid systems like muscles and embryos. In addition, a link between the mode of nuclear division and plasmodial size and age exists but is not yet understood, emphasizing its uniqueness as a model system.

      Lastly, while not meant to diminish the contributions of this work, it does seem that given the diversity of syncytia that are well studied and exhibit nuclear specification, perhaps the title "Nuclei are mobile processors enabling specialization in a gigantic single-celled syncytium" oversells the results presented in this work.

      We thank the reviewer for this point and changed the title of the manuscript to avoid overselling and to more accurately reflect the research presented in the manuscript. The new title is: “Spatial transcriptomic and single-nucleus analysis reveals heterogeneity in a gigantic single-celled syncytium.” We agree that more work is needed to finally prove that mobile nuclei can be used to more quickly ‘seed’ new transcriptomic states in other plasmodial parts, thereby acting as mobile processors.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors extend their previous work on thymus epithelial cells (TECs) and antigen presenting cells (APCs), which focused on TLR signaling in TECs and monocyte-derived dendritic cells (mDCs), and now focus on medullary (m) TECs and ask whether different subsets of DCs can uniquely serve as APCs for tissue-restricted antigens (TRAs) expressed by mTECs.

      The approach used makes use of several reporter transgenic mouse models to conditionally express a fluorescent reporter gene in mTECs (Defa6-cre) or in TECs more broadly (Foxn1-cre or Csnb-cre). Their findings show that restricted expression of reporter genes in a subset of mTECs, which typically expressed AIRE-dependent TRAs, are typically presented by a subset of DCs that express XCR1 and CCR7, which, together with mDCs, are able to perform cooperative antigen transfer (CAT) effectively. The authors also examined the ability of different DC subsets to acquire antigens from other DCs, and show that mDCs excel in this ability.

      Overall, the work is clearly presented and makes use of several elegant mouse models to further delineate the role of different DC subsets in T cell selection. However, as pointed out by the authors, it remains to be determined whether the differences in the ability of different DC subsets to perform CAT have any fundamental impact on establishing self-tolerance, either by negative selection or induction of Treg differentiation.

      We thank Reviewer #1 for this positive comment. We concur that describing a direct effect of the preferential pairing in CAT between TECs and DCs on the mechanisms of central immune tolerance would be very insightful and beneficial for the scientific community. However, a better understanding of the rules underpinning the interactions of this dynamic thymic cellular network in its complex and physiological outcomes will require the development of novel cellular tools and organismal genetic models, which is beyond the scope of this current work.

      Reviewer #3 (Public Review):

      In this manuscript, Voboril et al. address the question of whether specific dendritic cell (DC) types in the thymus acquire antigens from distinct thymic epithelial cell subsets. It is well-documented that both medullary thymic epithelial cells (mTECs) and thymic DCs present self-antigens to thymocytes to induce central tolerance. mTECs express the majority of the proteome, including AIRE-dependent tissue restricted antigens (TRAs), and thymocytes must be tolerized against this diverse set of self-antigens to prevent autoimmunity. While some self-antigens are expressed abundantly in an AIRE-independent manner by mTECs, a given AIRE-dependent TRA is expressed at low levels by only 1-3% of mTECs, raising the question of how thymocytes encounter rare, sparse self-antigens during their residence in the medulla. Part of the answer to this comes from the fact that thymic DCs can acquire self-antigens from mTECs to improve the efficiency of display to thymocytes. As the authors point out, several recent studies have utilized single-cell transcriptional profiling to identify multiple distinct medullary thymic epithelial cell subsets. Furthermore, work from the authors' lab and others has demonstrated heterogeneity within the thymic DC compartment. Thus, the authors set out to address whether DC interactions with mTECs are promiscuous or whether specific DC cell types interact with different mTEC subsets to acquire self-antigens to induce tolerance to different types of self-antigens, such as Aire-dependent versus Aire-independent TRAs or ubiquitous antigens.

      Using mouse strains that express a fluorescent protein in TECs under the control of different promoters, the authors use flow cytometry to determine the relative ability of thymic DC subsets to acquire self-antigens, TdTomato in this case, from TEC subsets. Using linear regression modeling to compare the frequency of TEC subsets expressing TdTomato in the different reporter strains to the frequency of DC subsets that acquired TdTomato, the authors conclude that there is specificity to the interactions of different DC subsets with distinct mTECs, resulting in antigen acquisition by the interacting DCs. Specifically, they conclude that pDCs and macrophages acquire antigens from mTEClow cells, cDC1 and activated XCR1+ DCs acquire antigen from mTEChigh cells, activated XCR1+ and activated XCR1- DCs acquire antigen from pre-post-Aire mTECs, cDC2 exclusively acquire antigen from Post-Aire mTECs, and activated XCR1+ DCs acquire antigen from Tuft cells. It is well documented that activated XCR1+ DCs (Ardouin et al. 2016, Oh et al. 2018, Perry et al. 2018) and activated XCR1- DCs (Leventhal 2016) acquire and present Aire-dependent self-antigens from mTECs to induce thymic central tolerance. Thus, the most novel claim is that cDC2 acquire antigens from Post-Aire mTECs. However, given that the model is derived from the linear regression analysis of fluorescent reporter mice in which multiple TEC subsets express each reporter, and there are some known caveats to these analyses, this interesting conclusion is not adequately supported by the data. The authors cleverly use Foxn1-cre ConfettiBrainbow2.1 reporters to conclude that moDCs are particularly adept at acquiring antigen serially from different mTECs. Furthermore, they use mixed bone marrow congenic/reporter mice to demonstrate that moDC are particularly good at acquiring antigens from other DC subsets. 
      These two conclusions are well-supported by the data, although it is notable that while moDC are efficient at these two processes, other DC subsets acquire antigens from DCs, and fluorescent reporter acquisition does not indicate the ability to process and present antigens to developing T cells to promote central tolerance, somewhat reducing the impact of the findings. Altogether, this is a promising study that cleverly uses a variety of mouse models with flow cytometric analysis of recently identified TEC and DC subsets to delve into whether specificity of interactions between TEC and DC subsets could enable distinct DC subsets to contribute differentially to central tolerance induction against distinct types of self-antigens. However, the conclusions could be strengthened by additional analyses.
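
      The kind of linear regression described in this review can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' actual model: each row is a hypothetical reporter strain, each column a hypothetical TEC subset, and the response is the fraction of one DC subset that acquired TdTomato. All numbers are made up for the example and are not measurements from the study.

```python
import numpy as np

# Rows: hypothetical reporter strains. Columns: hypothetical TEC subsets.
# Entry = fraction of that TEC subset expressing TdTomato in that strain.
# These values are invented for illustration only.
tec_labeled = np.array([
    [0.9, 0.8],
    [0.1, 0.6],
    [0.0, 0.3],
])

# Invented "true" contribution of each TEC subset to one DC subset.
true_weights = np.array([0.2, 0.5])

# Fraction of the DC subset that is TdTomato+ in each strain (noise-free toy data).
dc_labeled = tec_labeled @ true_weights

# Ordinary least squares: find weights minimizing ||tec_labeled @ w - dc_labeled||.
weights, residuals, rank, _ = np.linalg.lstsq(tec_labeled, dc_labeled, rcond=None)

print(weights)  # estimated contribution of each TEC subset
print(rank)     # must equal the number of TEC subsets for a unique solution
```

      The rank check reflects the caveat raised by the reviewer: when several TEC subsets express each reporter and there are few strains, the design matrix may not separate the subsets, and the fitted weights are then not uniquely determined.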

      We thank the Reviewer for the many comments and suggestions. We are well aware that our model is based on linear regression analysis, which brings with it some known caveats; these are now described in the Discussion section. As suggested by the Reviewer, to better resolve some discrepancies and strengthen our conclusions, we conducted a new analysis of our data and made some important changes to the manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      Mulholland et al show that there is a very close relationship between the development of excitatory and inhibitory networks in the developing cortex. This paper makes an important contribution to our understanding of the structure of inhibition during an early stage in cortical development. The work has been carefully performed and analysed. The changes I suggest are principally to improve clarity in some places.

      Lines 48-55: express these possibilities more didactically as individual items in a list, rather than grouping the first two together.

      We thank the reviewer for this suggestion and have revised our manuscript accordingly.

      Fig 2: Show a raw image of the imaging window, with a scale bar.

      We thank the reviewer for the suggestion, and we have added a raw image of the imaging window to Figure 2.

      Line 81: clarify the type of imaging used (both wide-field and 2-photon are mentioned in the Introduction).

      The experiments in Figure 2 were performed using wide-field calcium imaging. We have now clarified this point in the main text and figure legend.

      Line 89: This is presumably in a different set of animals: make this clearer. n value?

      The reviewer is correct that the animals in Figure 3 are a different experiment from those in Figure 2. We have clarified the text, and now more clearly state the number of animals in these experiments.

      Fig 3b: "Example mean trace…"

      We have edited the figure legend to incorporate this suggestion.

      Fig 3e: Is this for one animal or averaged over all? Why not show negative correlations as well?

      Figure 3e shows the values of correlation maxima for a single representative experiment shown in panels (b-d, f). We quantified the spatial extent of correlations by measuring the amplitude of positively correlated peaks in the correlation pattern as a function of distance from the seed point, and therefore only plotted the values of these peaks in Figure 3e.

      We have revised the axis label for this panel to more clearly indicate that we are plotting the values of correlations at local maxima (New label: “Correlation at maxima”).

      Fig 3f: This appears to be the same image as S1c?

      The reviewer is correct. Our original Figure S1 (now Figure S2) aims to elaborate on the description of correlation fractures shown in Figure 3f, and thus reproduces the fracture image shown previously with a more detailed explanation. We have revised the legend for Figure S2c to clearly indicate that this panel is reproduced from Figure 3f.

      Line 119-120: Suggest you explain more clearly that this is a preliminary step towards the later simultaneous E-I imaging, and why it's still useful given that it's somewhat indirect compared to the later simultaneous imaging.

      We thank the reviewer for the suggestion, and have revised the manuscript to better convey the rationale behind imaging spontaneous inhibitory and excitatory activity in separate animals.

      Line 124: Why tenth of maximum? Can you confirm that the main results are not highly sensitive to this choice?

      We selected the tenth of the maximum of the fitted two-dimensional gaussian as a way to estimate the diameter of the active pixels within an event domain. When using an alternative threshold, such as the full-width at half maximum, the values for inhibitory and excitatory domain size scale together, indicating that the relationship between the relative size of inhibitory and excitatory active domains is not sensitive to this threshold. However, we found that full-width half max underestimated the diameter of the active region, leaving out active pixels. We have now included the values for full-width half max in the text of the results, to show that the threshold does not impact the relationship between inhibitory and excitatory domain size.
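
      The threshold argument above can be illustrated with a short sketch. It assumes an idealized isotropic Gaussian profile, which may differ from the fitted two-dimensional Gaussians used in the study, and the two widths (`sigma_inhibitory`, `sigma_excitatory`) are hypothetical values chosen only for illustration. Under that assumption, changing the threshold rescales both diameters by the same constant, so the inhibitory-to-excitatory size ratio is unchanged.

```python
import math


def full_width_at_fraction(sigma, fraction):
    """Full width of a 1-D Gaussian cross-section at `fraction` of its peak.

    Solves exp(-x**2 / (2 * sigma**2)) = fraction for x and doubles it.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return 2.0 * sigma * math.sqrt(2.0 * math.log(1.0 / fraction))


# Hypothetical fitted widths (arbitrary units), not values from the study.
sigma_inhibitory = 0.30
sigma_excitatory = 0.36

for fraction in (0.5, 0.1):  # half maximum, then tenth of maximum
    d_inh = full_width_at_fraction(sigma_inhibitory, fraction)
    d_exc = full_width_at_fraction(sigma_excitatory, fraction)
    print(f"fraction={fraction}: I={d_inh:.3f}, E={d_exc:.3f}, I/E={d_inh / d_exc:.3f}")
```

      For a Gaussian, the width at a tenth of the maximum is about 1.82 times the full width at half maximum whatever the value of sigma, which is consistent with the tenth-of-maximum threshold capturing more of the active region.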

      Fig 4: It would be clearer to give the graph parts of a and d their own panel labels. Is it worth explicitly flagging that none of the E-I distributions are significantly different? It would be useful to explicitly define pink and blue in the caption. Does panel e have units? In panel a I'm confused about what is being referred to as "circles" and what as "lines". Panel f caption: "principal components"

      We have incorporated these suggestions into a revised figure. We have included bars to indicate non-significance of E-I distributions, updated the axes of panels e and f, and edited the figure legend to more explicitly define the inhibitory and excitatory data color scheme.

      Line 143: suggest "similarity in the statistical structure", to make it clearer what you mean by "structure".

      We have revised the text to explicitly refer to “similarity in the spatial structure” in this paragraph.

      Line 159: state the n value here.

      We have updated the text to include the number of animals used in this particular experiment.

      Fig 5: It would be interesting to speculate about the significance of the correlation fractures. In the rhs of panel b the red seed points are almost impossible to see, since they occur on top of a red patch (perhaps include a black or white border to these circles?). a caption: "-2-2 z-score" is confusing; perhaps say "-2 to +2 z-score". g caption: Is this correlation similarity for E, I or both?

      Fractures show areas in the network where there are sharp discontinuities in the spatial patterns of the correlations. As discussed in the results, mature functional maps in the visual cortex also exhibit smooth variation punctuated by abrupt discontinuities, such as pinwheels and fractures in orientation preference maps (Bonhoeffer and Grinvald, 1991), or direction reversals in direction preference maps (Okai et al., 2005). Previous work demonstrated that correlation fractures derived from spontaneous activity after eye opening precisely coincided with the high-rate-of-change regions in the orientation preference map (Smith et al., 2018). Therefore, the correlation fractures observed from spontaneous inhibitory activity prior to eye opening reveal that the underlying network is already precisely organized, and while the fractures are refined over development (Smith et al., 2018), they may provide insight into the future structure of the mature orientation preference map, although this remains to be directly tested.

      We thank the reviewer for the suggestions to improve the clarity of this figure. We have changed the color of the seed points, to make them easier to see, and have adjusted the figure legends for Panel a. In Panel g, the correlation similarity is comparing E vs I networks as a function of distance, and the figure legend has been updated to more explicitly state this.

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript is clearly written, the data are largely of sufficient quality, and the findings are certainly of interest to the endosomal research community. I also agree with the model the authors propose.

      We thank the reviewer for their appreciation of this study.

      One weak point of the study is the dependence on microscopic techniques to analyze integrin surface levels and endosomal recruitment of the retriever complex. The study would have benefited from additional methods to confirm the microscopy data.

      We agree. We have now added an experiment showing the changes in the surface levels of β1-integrins by flow-cytometry. This technique allowed us to easily sample 10,000 cells per experiment (New Figure 1G).

      It would also be good if the authors could confirm their inhibitor studies with genetic suppression/deletion of PIKfyve, ideally followed by rescues with a kinase deficient mutant.

      We agree. We now include experiments where we depleted PIKfyve with siRNA and rescued PIKfyve by exogenous expression of siRNA resistant PIKfyve. Depletion of PIKfyve caused a decrease in surface levels of β1- and α5-integrin by 21% and 17% respectively (Figure S3), which is very similar to what we observed by immunofluorescence.

      Later in the manuscript, we also tested changes in the endosomal localization of COMMD1 due to PIKfyve inhibition (Figure 7), which resulted in a 22% decrease. We now include new experiments showing PIKfyve depletion. For the depletion studies, we also tested rescue by PIKfyve expression. PIKfyve depletion resulted in a statistically significant but modest lowering of COMMD1 endosomal localization by 15%. This decrease was rescued back to basal levels on endosomes by re-expression of PIKfyve (Figure S8).

      Since PIKfyve has so many roles, and since the PIKfyve inhibitor has been shown to be specific by so many labs, we consider acute PIKfyve inhibition preferable. Long-term depletion studies are more likely to result in effects that are indirect or to give cells a chance to adapt.

      The authors rely on microscopy of integrin beta 1 for most of their data. However, SNX17 and the retriever complex are not required for the recycling of all beta 1 integrins (Steinberg et al., 2012). In HeLa cells, it is mainly integrin alpha 5/beta1 that is recycled by SNX17. Therefore, the other beta 1 integrins that recycle SNX17 independently tend to mask the recycling phenotypes caused by the loss of SNX17/retriever. I think that the authors could detect a much more pronounced recycling phenotype upon PIKfyve inhibition if they stained integrin alpha 5 instead of integrin beta 1. Does integrin alpha 5 "get stuck" in a LAMP1 positive compartment similar to what Steinberg et al., 2012 or McNally et al., 2017 describe in their studies? These two studies clearly show that almost all integrin alpha 5/beta 1 accumulates in a LAMP1 or LAMP2 positive compartment upon loss of retriever/SNX17 function. If the authors are correct in their assumptions, this should be happening upon loss of PIKfyve activity. One could use the Abcam antibody against integrin alpha 5 that was used in the McNally et al. study as it works very well.

      We did not see pronounced differences in surface levels of α5-integrin vs. β1-integrin upon inhibition or depletion of PIKfyve. For example, PIKfyve siRNA treatment lowered the surface levels of α5-integrin and β1-integrin in a similar range, by 21% and 17% respectively, as measured by immunofluorescence microscopy.

    1. Author Response:

      Reviewer #1:

      Summary:

      Moody et al. presented a comprehensive investigation into the choice of marker genes and its impact on the reconstruction of the early evolution of life, especially regarding the length of the branch that separates domains Bacteria and Archaea in the phylogenetic tree. Specifically, this work attempts to resolve a debate raised by a previous work: Zhu et al. Nat Commun. 2019, that the evolutionary distance between the two domains is short as estimated using an expanded set of marker genes, in contrast to conventional strategies which involve a small number of "core" genes and indicate a long branch.

      Through a series of analyses on 1000 genomes, Moody et al. defended the use of core genes, and reinforced the conventional notion that the inter-domain branch (the AB branch) is long, as inferred by the core gene set. They proposed that with the 381 marker genes (the "expanded" set) used by Zhu et al., the observed short branch length is an artifact due to inter-domain gene transfer and hidden paralogy. Through topology tests, they ranked the markers by "verticality", and showed that it is positively correlated with the AB branch length. They also conducted divergence time estimation and showed that even the most vertical genes led to an implausible estimate of the origin of life.

      In parallel, Moody et al. surveyed the best marker genes using a set of 700 genomes. They recovered 54 markers, and demonstrated that ribosomal markers do not indicate a longer AB branch than non-ribosomal markers do. With the better half (27) of these marker genes, they conducted further phylogenetic analyses, which shows that potential substitutional saturation and the use of site-homogeneous models could contribute to the underestimation of the AB branch. Using this taxon set and marker set, they reconstructed the prokaryotic tree of life, which revealed a long AB branch, a basal placement of DPANN in Archaea, and a derived placement of CPR in Bacteria.

      Prokaryotic tree of life:

      The scope(s) of the manuscript is somewhat split. First, it is posed as a point-to-point rebuttal to the Zhu et al. paper, on the long vs. short AB branch question. Second, it introduces a new phylogeny of prokaryotes using 27 "good" marker genes, and demonstrates that DPANN is basal to Archaea, and CPR is derived within Bacteria.

      Thanks for the summary. The two aspects of the manuscript identified by the reviewer are closely related, because the different issues boil down to the same underlying question: which genes should we use to infer the deep structure of the tree of life? The provocative work of Zhu et al. acted as an impetus to compare and evaluate the properties of several published marker gene sets, and then to identify (what our analyses suggest are) the subset best-suited for deep phylogeny, which we then use to infer an updated tree of life. We have clarified this logical structure in the revised manuscript, writing (at the end of the Introduction):

      “Here, we investigate these issues in order to determine how different methodologies and marker sets affect estimates of the evolutionary distance between Archaea and Bacteria. First, we examine the evolutionary history of the 381 gene marker set (hereafter, the expanded marker gene set) and identify several features of these genes, including instances of inter-domain gene transfers and mixed paralogy, that may contribute to the inference of a shorter AB branch length in concatenation analyses. Then, we re-evaluate the marker gene sets used in a range of previous analyses to determine how these and other factors, including substitutional saturation and model fit, contribute to inter-domain branch length estimations and the shape of the universal tree. Finally, we identify a subset of marker genes least affected by these issues, and use these to estimate an updated tree of the primary domains of life and the length of the stem branch that separates Archaea and Bacteria.”

      The second scope has inadequate novelty. A recent paper (Coleman et al. Science. 2021), which was from a partially overlapping group of authors, was dedicated to the topic of CPR placement, and indicated the same conclusion (CPR being derived and sister to Chloroflexi) as the current work does, albeit using more sophisticated approaches. The paper also addressed the debate of CPR placement (including citing the Zhu et al. paper). Additionally, the basal placement of DPANN has also been suggested by previous works (such as Castelle and Banfield. Cell. 2018). Therefore, re-addressing these two topics using a largely well-established and repeatedly adopted method on a relatively small taxon set does not constitute a significant extension of current knowledge.

      We disagree. Resolving the deep structure of the tree of life is an important topic --- this is what we, Zhu et al. (2019), and of course many others have been trying to achieve, in different and sometimes conflicting ways. Most of the published work is based on limited or biased taxon sampling (see Figure 1, Figure Supplements 14, 15, 16) or else focused on just one of the two prokaryotic domains of life. Furthermore, deep phylogeny is uncertain, and new results become convincing only when they receive support from multiple datasets and approaches. For instance, Coleman et al. (2021) recently found support for the placement of CPR as a sister clade to Chloroflexota rather than as a basal branch within the Bacteria. Notably, this work focused only on Bacteria, and made use of a different rooting method (with its own strengths and limitations) and taxon sampling. Most previous analyses using Archaea as an outgroup to root the bacterial tree recovered CPR as a deeply branching lineage within Bacteria, a placement likely resulting from LBA. In turn, our present findings represent an important confirmation of the CPR+Chloroflexi clade. Similarly, the basal placement of DPANN within Archaea remains controversial despite a number of studies on the topic, and our study also contributes to that ongoing debate.

      The debate:

      The first scope appears to be the more important goal of this manuscript, as it extensively discusses the claims made by Zhu et al. and presents a point-to-point rebuttal, including counter evidence. This may narrow the interest of this work to a small audience of specialists. Nevertheless, to best evaluate the current work, it is necessary to review the Zhu et al. paper and compare individual analyses and conclusions of the two studies.

      In doing so, I found that the two articles have distinct scopes that appear similar but are not actually in line. To a large extent, the current work does not constitute actual rebuttal to the points made by Zhu et al. In contrast, some of the analyses presented in the current work support those by Zhu et al., despite being interpreted in a different way. For the claims that directly contest Zhu et al., I do not see sufficient evidence that they are supported by the analyses.

      Below is a summary of the comparison, which I will explain point-by-point in later paragraphs.

      • Moody et al. assessed AB branch length, while Zhu et al. assessed AB evolutionary distance (which is different).
      • Moody et al. evaluated the phylogeny indicated by a small number of core markers, while Zhu et al. evaluated the genome average using hundreds of global markers.
      • Zhu et al.'s results also showed that gene non-verticality, substitutional saturation, and site-homogeneous models shorten the AB distance, which is consistent with Moody et al.'s.
      • However, Zhu et al. found that some core markers are outliers in the genome-wide context, and the long AB distance indicated by them cannot be compensated for by the aforementioned effects. Moody et al. hasn't addressed this. Therefore, the novelty and potential impact of the current work is less compelling: It used a classical method (a few dozen core genes) and found a pattern that has been found many times by some of the same authors and others (including Zhu et al., who also analyzed core genes).

      Thanks for this detailed comparison of the two studies --- the points raised here and elaborated on below have prompted us to perform additional analyses which provide further insight into the properties and behaviour of the various marker gene sets analyzed. We nonetheless disagree that “the current work does not constitute actual rebuttal to the points made by Zhu et al.”: our finding that ribosomal and other “core” proteins are among the best phylogenetic markers for resolving both within- and between-domain relationships, estimating the length of the AB stem, and performing divergence time estimation, challenges an important claim of Zhu et al.’s study, and will be of broad interest to the community of researchers working on early life/early evolution.

      That said, we do also agree that one aspect of the disagreement between our study and that of Zhu et al. has to do with what is meant by evolutionary distance, and we have now discussed these issues in detail in the revised manuscript (as detailed below). In revising the manuscript, we have also sought to avoid a reductive focus on rebuttal, have revised the text to acknowledge important strengths and interesting features of the Zhu et al. analyses, and have made text revisions to ensure a consistent constructive tone: these are fundamental and challenging questions, and different perspectives and analyses are valuable in making progress. We also note that there has been an ongoing debate about the suitability of ribosomal genes for deep phylogeny in the literature (e.g. Petitjean et al. 2014, discussed in more detail below). Our analyses, and those of Zhu et al. (2019) previously, contribute to that broader discussion.

      Detailed responses to each of the above points follow below.

      AB distance metric:

      There is a subtle but critical difference between the scopes of the two papers: The Zhu et al. paper "reveals evolutionary proximity between domains Bacteria and Archaea". By stating "evolutionary proximity", it investigated two metrics:

      1) The length of the branch separating Archaea from Bacteria in the phylogenetic tree, i.e., the "AB branch". This was the main focus of the current work.

      2) The average tip-to-tip distance (sum of branch lengths) between pairs of Archaea and Bacteria taxa in the tree. A significant proportion of the Zhu et al. work was discussing this metric, and it led to several important conclusions (e.g., Figs. 4F, 5). The current work has not explored this metric.

      Thanks for raising the point about relative AB distance. In our revised manuscript, we have expanded Figure 1 and the associated analyses to include this metric. These analyses demonstrate that relative AB distance behaves similarly to AB branch length: they are positively correlated with each other; both are reduced by inter-domain HGT, and both are negatively correlated with ΔLL and with split score, an additional metric of within- and between-domain marker gene verticality which we have included in the revised Figure 1. Taken together, these results suggest that high-verticality marker genes (as judged both by the recovery of reciprocal AB monophyly, and of established within-domain relationships) support a longer AB branch and show a higher relative AB distance.
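
      To make the two metrics concrete, the following minimal sketch computes the AB branch length and the mean Archaea-to-Bacteria tip-to-tip distance on a toy tree. The tree, the tip names, and the branch lengths are hypothetical, and the sketch reports the raw mean inter-domain distance only; any normalization used to obtain a "relative" AB distance in either study is not reproduced here.

```python
from itertools import product


def path_to_root(tip, parent):
    """Map each ancestor of `tip` (and the tip itself) to its distance from the tip."""
    distances = {tip: 0.0}
    node, total = tip, 0.0
    while node in parent:
        up, branch_length = parent[node]
        total += branch_length
        distances[up] = total
        node = up
    return distances


def tip_distance(tip_a, tip_b, parent):
    """Sum of branch lengths on the path joining two tips."""
    up_a = path_to_root(tip_a, parent)
    up_b = path_to_root(tip_b, parent)
    # Among shared ancestors, the lowest common ancestor gives the smallest sum.
    return min(up_a[node] + up_b[node] for node in up_a if node in up_b)


def mean_inter_domain_distance(archaea, bacteria, parent):
    """Average tip-to-tip distance over all Archaea-Bacteria pairs."""
    pairs = list(product(archaea, bacteria))
    return sum(tip_distance(a, b, parent) for a, b in pairs) / len(pairs)


# Hypothetical toy tree: child -> (parent, branch length).
parent = {
    "anc_A": ("root", 0.5), "anc_B": ("root", 0.5),
    "a1": ("anc_A", 0.2), "a2": ("anc_A", 0.3),
    "b1": ("anc_B", 0.1), "b2": ("anc_B", 0.4),
}

# The AB branch joins the two domain ancestors through the root.
ab_branch = parent["anc_A"][1] + parent["anc_B"][1]
mean_ab = mean_inter_domain_distance(["a1", "a2"], ["b1", "b2"], parent)
print(f"AB branch length: {ab_branch:.2f}; mean tip-to-tip AB distance: {mean_ab:.2f}")
```

      In this toy case every inter-domain path crosses the AB branch, so the two quantities move together; an inter-domain transfer would place an archaeal tip inside the bacterial clade and shorten some of the pairwise paths.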

      These two metrics implicate distinct research strategies: For 1), HGTs and paralogy are usually considered problematic (as the current and many previous works argued). However, 2) is naturally compatible with the presence (and prevalence) of HGTs and paralogy.

      Authors of the current work equate "genetic distance" to "branch length" (line 70), and only investigated the latter. This equation is misleading. If organism groups A and B diverged early, but then exchanged many genes post-divergence, then this is indisputable evidence that their "genetic distance" is close. This point needs to be clearly explained in the manuscript.

      We agree with the reviewer that various definitions of evolutionary distance are possible, and some may be more useful than others for particular applications. The reviewer’s argument that “If organism groups A and B diverged early, but then exchanged many genes post-divergence, then this is indisputable evidence that their "genetic distance" is close” makes the case for a kind of phenetic distance: a distance based on overall similarity, regardless of how that similarity was brought about in terms of evolutionary process. We appreciate the democratic appeal of such a metric, and we have no desire to impose any particular philosophy of classification on the reader. However, the key point here is that methods that rely on concatenation for branch length or divergence time estimation (as used by Zhu et al., and in our current study) make the assumption that all of the sites in the concatenate evolved on the same underlying tree, and if this assumption is not met, analyses can be misled. Thus, the shorter AB branch length and the more recent Archaea-Bacteria divergence times estimated from concatenations of incongruent marker genes result from unmodelled gene transfers which are misinterpreted as evidence for more recent common ancestry. Gene transfer is an important aspect of genome evolution, but none of the currently available methods, including those used by Zhu et al., allow for genome-scale comparisons to be made in a way that accounts for our understanding of the underlying evolutionary processes.

      The point about different possible definitions of evolutionary distance made by the reviewer is valid, and we have now revised the opening of our conclusion to discuss these issues in more detail, writing:

      “We note that alternative conceptions of evolutionary distance are possible; for example, in a phenetic sense of overall genome similarity, extensive HGT will increase the evolutionary proximity (Zhu et al., 2019) of the domains so that Archaea and Bacteria may become intermixed at the single gene level. While such data can encode an important evolutionary signal, it is not amenable to concatenation analysis.”

      Core vs genome:

      This difference between "AB distance" and "AB branch length" is relevant to a more fundamental question: What defines the "evolutionary distance" between two groups of organisms? Neither paper explicitly discussed this topic. It likely cannot be resolved in one article (as many scholars have continuously attempted on related topics in the past decades). But the discordance in understanding led to very different research strategies in the two papers, rendering them incongruent in methodology.

      Specifically, the current work (and multiple previous works) based phylogenetic inference on only genes that demonstrate a strong pattern of vertical evolution. HGTs were considered deleterious, and needed to be excluded from the analysis. This left a few dozen genes at most, and many are spatially syntenic and functionally related (e.g. ribosomal proteins). In this work, the final number is 27. Previous critiques of this methodology have suggested that this is not a tree of life, but a "tree of one percent" (Dagan and Martin, Genome Biol. 2006).

      In contrast, Zhu et al. (and related previous works) attempted to evaluate the evolution of whole genomes by "maximizing the included number of loci.". They used a "global" set of 381 genes. They faced the challenge of "reconciling discordant evolutionary histories among different parts of the genome", because "HGT is widespread across the domains". To resolve this, they adopted the gene tree summary method ASTRAL.

      Therefore, the "AB distance" estimated by Zhu et al. is a genome-level distance, calculated by merging conflicting gene evolutions (which itself can be disputed, see below). Whereas the "AB branch" evaluated in this work is strictly the branch length in the core gene evolution. Therefore, the results presented in the two papers do not necessarily conflict, because of the different scopes.

      This point is closely related to the previous one, and the new section (final paragraphs of the Conclusion, quoted directly above) goes some way to addressing this comment. Regarding the issue of a focus on just a small proportion of vertically-evolving genes, the critical point is as above: current methods for branch length and divergence time estimation (including those used by Zhu et al.) require such vertically-evolving genes, because they make the assumption that all of the sites evolve on the same tree, i.e. trace back to the same origin via vertical evolution. We agree that most prokaryotic gene families do not evolve under these restrictive assumptions and therefore cannot be analysed using concatenation methods for branch length estimation. Indeed, one of the main points of our study is that most of the genes in the 381-gene set of Zhu et al. do not meet these assumptions and are thus unsuited for estimating evolutionary distance and divergence times.

      There is much ongoing method development which will allow more of the genome to be used in deep-time comparative analyses; Astral-Pro, FastMulRFS and SpeciesRax, among others, are recent promising steps in this direction. However, our central critique of Zhu et al. is that inferences under concatenation-based methods can be misled by HGT and other sources of incongruence, and indeed our analyses show that these unmodelled signals underlie the difference between the conclusions of Zhu et al. and other studies (e.g. Liu et al., 2021; Spang et al., 2015; Williams et al., 2020) that have instead supported a deep divergence between Archaea and Bacteria. In our revised manuscript, we have shown that the relative AB distance, like the AB branch length, is shortened by unmodelled gene transfers (Figure 1), and that estimates of the AB stem length from different studies are similar when the congruent subset of the data is analysed with the best available substitution models (Figure 6). We therefore disagree that the scopes are distinct: richer, broader measures of genomic diversity can be proposed and, with the development of new methods, estimated; but so far, the vertical signal is the only signal that can be harnessed to infer divergence times using concatenations.

      The expanded marker set:

      The authors made a valid critique (line 121-135) that many of the 381 genes in the "expanded marker set" adopted by Zhu et al. are under-represented in Archaea. According to the PhyloPhlAn paper (Segata et al. Nat Commun. 2013) which originally developed the 400 markers (a superset of the 381 markers), these genes were selected from ~3,000 bacterial and archaeal genomes available in IMG at that time (note that it was 2013). Zhu et al. also admitted, in the discussion section, that this marker set falls short in addressing some questions (such as the placement of DPANN). What is important in the current context is that they were not specifically selected to address the AB distance question.

      We agree that the taxon sampling of archaea and the choice of marker genes in the Zhu et al. study were not ideal for estimating the evolutionary distance between the domains. However, we note that this distance (or proximity), and the hypothesis that traditional core genes over-estimate the Archaea-Bacteria divergence, was one of the main results of the paper (c.f. the title of that paper, “Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea”).

      However, note that Zhu et al.'s Fig. 5A, B presented the AB distance informed by 161 out of the 381 genes. These genes have at least 50% taxa represented in both domains - the same threshold discussed in the current work (line 132).

      While the 50% sampling criterion indeed enriches for the genes of the expanded set that were present in LUCA and on the AB branch, we note that the 50% criterion represents a minimum of 4953 bacteria and 335 archaea; that is, it still reflects the unbalanced sampling of the dataset overall. For example, 30 of the genes had fewer than two archaeal homologues, and in 100 of the trees there were fewer than 50 archaea, reflecting the large disparity in taxon sampling (Supplementary Information Table S1). The phylogenetic signal in these genes is discussed in more detail below. Looking at the subsampled versions of these 161 genes, we found the majority of these genes (123/161) to have no discernible AB branch length. The 38/161 genes which had an arguable AB branch length (but still with transfers/paralogs) possessed a range of AB lengths from 0.0814 to 5.26, with a mean AB length of 1.03 and a median of 0.635.

      Even with those sufficiently represented genes, they still found that ribosomal proteins and a few other core genes are "outliers" in the far end of the AB distance spectrum.

      The reviewer raises an interesting point about outliers with high relative AB distances, which gets to the heart of the debate about how best to estimate the evolutionary distance between Archaea and Bacteria. The new analyses of relative AB distance introduced in our revised manuscript (Figure 1) demonstrate that this metric is affected by HGT in a similar manner to AB branch length; that is, high-verticality marker genes have a greater relative AB distance (relative AB vs ΔLL: p = 0.0001051, R = -0.2213292; relative AB vs between-domain split score: p = 2.572e-06, R = -0.2667739). Thus, core genes can be viewed as “outliers” compared to other prokaryotic genes in the sense that they have experienced an unusually low amount of HGT. This high verticality makes them among the few prokaryotic gene families that can be analysed by concatenation methods, which make the assumption that all sites evolve on the same underlying tree topology.

      Domain monophyly in gene trees:

      The authors' efforts in manually checking the gene trees are appreciable (Table S1), considering the number and size of those trees. They found (line 147) "Archaea and Bacteria are recovered as reciprocally monophyletic groups in only 24 of the 381 published (Zhu et al., 2019) maximum likelihood (ML) gene trees of the expanded marker set."

      The domain monophyly check was valid; however, the result could be misleading because any sporadic A/B mixture was considered evidence of non-monophyly for the entire gene tree. As the taxon sampling grows, the opportunity of observing any A/B mixture also increases. For example, in Puigbò et al. J. Biology. 2009, 56% (a much higher ratio) of nearly universal gene trees had perfect domain monophyly based on merely 100 taxa. This is because even the "perfect" marker genes (such as ribosomal proteins) are not completely free from HGTs (e.g., Creevey et al. Plos One. 2011), let alone the fact that there are many artifacts in the published reference genomes (Orakov et al. Genome Biol. 2021).

      Therefore, to have an objective assessment of this topic, it would be better to have a metric that allows some imperfection and reports an overall "degree" of separation (also see below).

      We agree that complementing the monophyly check with a more nuanced metric is useful. In our revised manuscript, we now also evaluate the split score (Dombrowski et al. (2020) Nat Commun) of each marker, which reflects the degree to which a gene recovers the monophyly of established taxonomic ranks (a higher score reflects the splitting of monophyletic groups into a number of smaller clades in the gene tree, and so the metric permits a degree of “imperfection”, as suggested; in addition, the metric is averaged over bootstrap replicates, so that lack of resolution or poorly-supported disagreements with the reference taxonomy do not disproportionately affect the score). This expanded analysis (Figure 1) indicates that both within- and between-domain split score and ΔLL are significantly positively correlated (R = 0.836679, p < 2.2 × 10^-16), and that phylogenetic markers that more strongly reject domain monophyly (higher ΔLL) also perform worse at recovering between-domain (and within-domain) relationships (higher split score) and support a shorter AB branch length.
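
      As an illustration of the kind of correlation reported here, the following is a minimal sketch of Pearson's product-moment correlation between two per-marker metrics. The marker values are hypothetical and chosen only for illustration; they are not taken from the study, and the function name `pearson_r` is our own.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-marker values (illustrative only, not from the study).
delta_ll = [5.0, 40.0, 120.0, 300.0, 650.0]
split_score = [0.05, 0.10, 0.22, 0.41, 0.78]

r = pearson_r(delta_ll, split_score)
print(f"Pearson R between delta-LL and split score: {r:.3f}")
```

      A positive R in this toy example means that markers with a higher ΔLL also have a higher split score, which is the direction of the relationship described above.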

      AB branch by gene: correlation and outliers

      Figure 1 is the single most important result in this work, because it argues that the short AB branch observed in Zhu et al. is an artifact due to "inter-domain gene transfer and hidden paralogy" (line 202). This argument is based on the observation that the indicated AB branch length is negatively correlated with "verticality" (measured by ΔLL and split score) of the gene.

      Our argument that the short AB branch results from inter-domain gene transfer and hidden paralogy is based on three main lines of evidence: (i) documentation of extensive transfers and intermixing of paralogues in the gene trees for the 381 gene set; (ii) the analyses in Figure 1, which demonstrate that verticality positively correlates with AB branch length and AB distance; (iii) the demonstration that the incremental addition of low-verticality markers to a concatenate results in a concomitant decrease in AB branch length.

      However, Zhu et al. also investigated the impact of verticality on AB distance, and they also found that they are negatively correlated (Fig. 5E). Therefore, the current result does not appear to deliver new information (as do multiple other analyses, see below).

      Zhu et al. indeed identified a weak positive relationship between gene verticality and AB distance. Our analyses go beyond that work by showing, using a variety of complementary metrics of verticality, that AB branch length and relative AB distance are strongly positively correlated with verticality (see Figure 1), and that the low verticality of the genes in the 381 gene set largely explains the difference in stem length inference between that dataset and earlier analyses (Figure 6). An additional factor not considered in the analyses of Zhu et al. was the question of whether a gene was present in LUCA, and so can provide information on the AB branch length. Our analyses (detailed below) suggest that the majority of genes (317) in the 381 gene set do not contain an unambiguous AB branch, and so do not contribute interpretable signal to estimates of the AB branch length.

      An important finding in Zhu et al., which is largely not discussed in the current work, is that a handful of "core" genes are outliers in the spectrum of AB distance, as compared to the majority of the genome (Fig. 5A). The AB distance indicated by these core genes is so long compared with the genome average that it cannot be compensated for by the impact of non-verticality, substitutional saturation, site-homogeneous model, etc (see below).

      Fig. 1A of the current work also clearly shows that many long-AB branch genes are outliers compared with the majority of the genome (the bottom of the blue bar).

      Figs. 3 and 4 attempted to show that ribosomal proteins are not outliers, but that analysis was based on a very small set of core genes, and the figures clearly show that there are outliers even in this small set (to be further discussed below).

      This comment re-iterates the reviewer’s earlier points about “core” genes as outliers compared to the majority of the genome. The key issue is that “most of the genome”, and a significant portion (317 genes) of the 381-gene set, contain features that make them unsuitable for estimation of AB branch length by concatenation, or indeed estimation of an interpretable relative AB distance. We have documented the cases of HGT and mixing of paralogues in the 381-gene dataset; this information is summarised in the main text and presented in more detail in Supplementary Information Table S1.

      Focusing on the 161 genes with >50% representation in both Archaea and Bacteria, manual inspection of gene trees inferred on the 1000-species subsample under the LG+G+F model indicates that 123/161 do not have a clear AB branch (that is, a branch that separates most or all Archaea from Bacteria). While distinguishing such cases from early gene transfers is not straightforward, there is no compelling reason to think that these genes were present in LUCA. The simplest explanation for these gene phylogenies is instead an origin within Bacteria and subsequent transfer on one or multiple occasions into Archaea. As a result, estimates of AB branch length or relative AB distance inferred from these genes cannot be straightforwardly compared to those of the traditional “core” or other genes for which the evidence of a pre-LUCA origin is stronger. Considering only the 38/161 genes for which a LUCA origin appears, from the gene phylogeny, to be likely, the mean AB branch length is 1.03, greater than that estimated from the concatenation of the most vertical genes in the expanded set (0.56), and suggesting that phylogenetic incongruence, combined with (for some families) a more recent origin, explains the shorter AB distances inferred from the 381 gene set. Thus, it is not the case that the AB branch lengths (or relative distances) estimated from the majority of genes form a “null distribution” against which “core” genes can be seen as outliers; instead, our analyses suggest that “core” genes are among the limited number of genes that trace vertically to LUCA.

      Regarding Figures 3 and 4, see the more detailed discussion below.

      Verticality is not causative of short AB branch:

      In spite of the outlier question, there is an important logic problem in these analyses: The authors observed that gene verticality (measured by negative ΔLL) is correlated with AB branch length (Fig. 1), and concluded that HGTs and paralogy shortened the AB branch (line 202). However, they did not directly assess the rate of evolution in this model. It is totally possible that the most vertical genes happen to be those that evolved faster at the AB split. In order to support the claim made in this work, it is important to separate the effect of the rate of evolution from the effect of HGT / paralogy.

      The ideal solution would be to include ALL genes (not just "good" ones), build gene trees, identify parts of the gene trees that once experienced HGT or paralogy, and prune off these PARTS, instead of excluding the entire gene tree. The remaining data are thus free of HGT / paralogy, based on which one can quantify the "true" AB branch length, and further assess how much it is correlated with "verticality", and whether there are still "outliers". This solution is likely not trivial in implementation, though. However, without such assessment, the observed short AB branch still only applies to the "tree of one percent", not the "tree of life".

      Thanks for this comment --- the reviewer raises a subtle and valid point. Our analyses indicate that vertically-evolving genes have longer AB branch lengths, but in the first version of our manuscript we did not test the alternative hypothesis that this relationship might simply result from a faster rate of evolution in vertically-evolving genes. To evaluate the relationship between evolutionary rate, verticality, AB branch length and relative AB distance on as broad a set of genes as possible, we took the 302 genes from the 381-gene expanded set, excluding 56 genes for which the 1000-species subsample included no archaea, and another 23 which included only 1 archaeon. To estimate per-gene evolutionary rate, we rooted each gene tree using MAD (Tria et al. 2017) and calculated the mean root-to-tip distance on the MAD-rooted gene tree, then evaluated the relationship between rate and verticality. This analysis indicated that vertically-evolving genes evolve more slowly (have shorter mean root-to-tip distances) than less vertical genes (using ΔLL and between-domain split score as proxies for marker verticality, with a Pearson’s product-moment correlation: MAD-rooted mean root-to-tip distance against ΔLL: R = 0.1397803, p = 0.01506, or against split score: R = 0.1902056, p = 0.000893), despite having longer AB branches and relative AB distances (using a Pearson’s product-moment correlation of MAD-rooted mean root-to-tip distance against AB length: p = 0.2025, R = 0.1143076, or against relative AB distance: p = 0.007435, R = 0.1537479). Thus, the longer AB branches of vertically evolving genes do not appear to be the indirect result of faster evolution of those genes. These analyses are reported in the main text, where we write:

      An alternative explanation for the positive relationship between marker gene verticality and AB branch length could be that vertically-evolving genes experience higher rates of sequence evolution. For a set of genes that originate at the same point on the species tree, the mean root-to-tip distance (measured in substitutions per site, for gene trees rooted using the MAD method (Tria et al., 2017)) provides a proxy of evolutionary rate. Mean root-to-tip distances were significantly positively correlated with ∆LL and between-domain split score (∆LL: R = 0.1397803, p = 0.01506, split score: R = 0.1705415, p = 0.002947; Figure 1 Figure Supplement 5,6), indicating that vertically-evolving genes evolve relatively slowly (note that large values of ∆LL and split score denote low verticality). Thus, the longer AB branches of vertically-evolving genes do not appear to result from a faster evolutionary rate for these genes. Taken together, these results indicate that the inclusion of genes that do not support the reciprocal monophyly of Archaea and Bacteria, or their constituent taxonomic ranks, in the universal concatenate explains the reduced estimated AB branch length.
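
      The rate proxy described above can be sketched as follows. This is a minimal illustration that assumes the gene tree has already been rooted (the MAD rooting step itself is not implemented here). The nested-tuple tree encoding, the function names and the toy branch lengths are our own assumptions for illustration.

```python
def root_to_tip_distances(node, dist_from_root=0.0):
    """Yield (tip_name, distance_from_root) for every tip below `node`.

    A node is a tuple (name, branch_length, children); tips have an empty
    list of children. Branch lengths are in substitutions per site.
    """
    name, length, children = node
    here = dist_from_root + length
    if not children:
        yield name, here
    else:
        for child in children:
            yield from root_to_tip_distances(child, here)


def mean_root_to_tip(root):
    """Mean root-to-tip distance over all tips of an already rooted tree."""
    dists = [d for _, d in root_to_tip_distances(root)]
    return sum(dists) / len(dists)


# Toy rooted tree (hypothetical branch lengths).
toy_tree = ("root", 0.0, [
    ("A", 0.3, []),
    ("n1", 0.1, [("B", 0.2, []), ("C", 0.4, [])]),
])

mean_rtt = mean_root_to_tip(toy_tree)
print(f"mean root-to-tip distance: {mean_rtt:.4f}")
```

      In this toy tree the tips lie 0.3, 0.3 and 0.5 substitutions per site from the root, so a gene with a larger mean would be read as faster-evolving under this proxy.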

      Differential metric for verticality:

      In spite of the similarity between the current result and Zhu et al.'s (see above), the two works approached this goal using different metrics.

      First, the authors attempted to quantify the AB branch length in individual gene trees, including those that do not have Archaea and Bacteria perfectly separated. To do so they performed a constrained ML search (line 210). I am wary of this treatment because it could force distinct sequences (due to HGT or paralogy) to be grouped together, and the resulting branch length estimates could be highly inaccurate.

      We agree with the reviewer that estimating AB branch lengths in this way might lead to inaccuracy. We note that this is, in effect, what was done in the published analysis (Zhu et al. 2019): a topology in which Archaea and Bacteria were reciprocally monophyletic was inferred using ASTRAL (a reasonable analysis, given the robustness of ASTRAL to some degree of HGT/gene tree incongruence), and then the AB branch length was estimated from the concatenation of these 381 genes, fixing on the ASTRAL topology. We performed this experiment (inferring AB branch length on constrained trees) in order to evaluate how incongruence between the gene and species trees might affect AB branch length inference.

      In contrast, Zhu et al. quantifies the average taxon-to-taxon phylogenetic distance between the two domains, regardless of the overall domain monophyly. That method was free of this concern, although it computed a different metric.

      Thanks for raising this point. As described above, in the revised manuscript we have also evaluated the relative AB distance metric used by Zhu et al., and show that it behaves similarly to the AB branch metric we evaluated in the first version of the manuscript (see revised Figure 1).
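
      For readers unfamiliar with the distance-based metric, the following is a minimal sketch of an average taxon-to-taxon distance between the two domains, computed from pairwise tree distances. The taxon names and distances are hypothetical, and any normalisation that makes the published metric "relative" is not described in this response and is therefore omitted.

```python
from itertools import combinations


def mean_between_domain_distance(dist, domain):
    """Average pairwise distance over taxon pairs drawn from different domains.

    `dist` maps frozenset({taxon_a, taxon_b}) to a pairwise distance;
    `domain` maps each taxon name to a domain label.
    """
    total, n_pairs = 0.0, 0
    for a, b in combinations(sorted(domain), 2):
        if domain[a] != domain[b]:
            total += dist[frozenset((a, b))]
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("need at least one taxon from each domain")
    return total / n_pairs


# Hypothetical taxa and distances (illustrative only).
domain = {"arc1": "Archaea", "arc2": "Archaea",
          "bac1": "Bacteria", "bac2": "Bacteria"}
dist = {
    frozenset(("arc1", "arc2")): 0.4,
    frozenset(("bac1", "bac2")): 0.5,
    frozenset(("arc1", "bac1")): 2.0,
    frozenset(("arc1", "bac2")): 2.2,
    frozenset(("arc2", "bac1")): 2.1,
    frozenset(("arc2", "bac2")): 2.3,
}

ab_distance = mean_between_domain_distance(dist, domain)
print(f"mean between-domain distance: {ab_distance:.3f}")
```

      Because this quantity averages over all cross-domain pairs, it is defined even when the gene tree does not recover the two domains as monophyletic, which is the property the reviewer highlights.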

      Second, the authors assessed "marker gene verticality" using two metrics: a) AU test result (rejected or not) (Fig. 1A), b) ΔLL, the difference in log likelihood between the constrained ML tree and ML gene tree (line 222, Fig. 1B, C). I am concerned that they are sensitive to taxon sampling and stochastic events, as I explained above regarding domain monophyly. It is possible that a single mislabeling event would cause the topology test to report a significant result. In addition, they evaluate how severely domain monophyly is violated, but they do not account for intra-domain HGTs and other artifacts, which are also part of "verticality", and they can potentially distort the AB branch as well.

      In the revised manuscript, we also evaluate a complementary metric for marker gene verticality, the split score (see above), which measures the extent to which marker genes recover established relationships at a given taxonomic level (we computed both within-domain and between-domain split scores). The split score is a more granular measure than ΔLL and, by summing over bootstrap replicates, it also better accommodates phylogenetic uncertainty. The two metrics (ΔLL and split score) are positively correlated and the analyses come to the same conclusions regarding the impact of HGT and other sources of incongruence on estimates of the AB branch length and relative AB distance.

      I did not find the ΔLL values of individual markers in any supplementary table. I also did not find any correlation statistics associated with Fig. 1B.

      The ΔLL values for individual markers can be found in the data supplement in: “Expanded_Bacterial_Core_Nonribosomal_analyses/Individual_gene_tree_analyses/Expanded//Expanded_AB_AU.csv”

      We have now updated the readme.txt file for clarity and included all the new results from the analyses which we have undertaken as part of the review process in the latest version of the supplemental available on figshare (10.6084/m9.figshare.13395470) as well as updated the directory and file names for clarity. We also have added the statistics associated with the correlations in Figure 1 to the Figure Legend.

      Statistical test:

      Line 157: "For the remaining 302 genes, domain monophyly was rejected (p < 0.05) for 232 out of 302 (76.8%) genes." Did the authors perform multiple hypothesis correction? If not, they probably should.

      Thanks for this suggestion. We have now used a Bonferroni correction to account for multiple testing. As a result, fewer marker genes are rejected at the 5% level (151/302), although the overall conclusions are unaffected.
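
      For context, the correction can be sketched as below: with m tests, each p-value is compared against alpha/m rather than alpha (for 302 tests at the 5% level, a per-test threshold of roughly 1.66e-4). The p-values in the example are hypothetical and the function name is our own.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one boolean per test: True if rejected after Bonferroni correction."""
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m  # per-test significance threshold
    return [p < threshold for p in p_values]


# Hypothetical AU-test p-values for four markers (illustrative only).
p_values = [0.0001, 0.01, 0.04, 0.2]

uncorrected = [p < 0.05 for p in p_values]
corrected = bonferroni_reject(p_values, alpha=0.05)

print("rejected without correction:", sum(uncorrected))
print("rejected with Bonferroni:   ", sum(corrected))
```

      As in the analysis described above, the number of rejections can only stay the same or fall once the correction is applied.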

      Line 217: "This result suggests that inter-domain gene transfers reduce the AB branch length when included in a concatenation." and Fig. 1A. If I understand correctly, this analysis was performed on individual gene trees, rather than in a concatenated setting. Therefore, the result does not directly support this conclusion.

      Thanks for pointing this out. The reviewer is correct that this inference depends not only on the single gene analyses, but also on the subsequent concatenation results presented in this section. We have therefore moved this sentence later in the section, after the concatenation analysis.

      Line 224: "Furthermore, AB branch length decreased as increasing numbers of low-verticality markers were added to the concatenate (Figure 1(c))". While this statement is likely true, Zhu et al. also presented similar results (Fig. 5) despite using a different metric, and they concluded that the impact is moderate and cannot explain the status of some core genes as outliers.

      Zhu et al. did identify some of these trends, as we acknowledge in our manuscript (“The original study investigated and acknowledged (Zhu et al., 2019) the varying levels of congruence between the marker phylogenies and the species tree, but did not investigate the underlying causes.” --- line 178; “These results are consistent with (Zhu et al., 2019), who also noted that AB branch length increases as model fit improves for the expanded marker dataset.” --- line 337) and as discussed above. Our analysis (Figure 6, Table 1) goes further in showing that the most vertical subset of the 381-gene set supports an inter-domain branch length closely similar (2.4 subs./site compared to e.g. 2.5 subs./site for the 27-gene dataset) to analyses using the traditional marker gene set.

      Concatenation and branch length:

      The authors pointed out that "Concatenation is based on the assumption that all of the genes in the supermatrix evolve on the same underlying tree; genes with different gene tree topologies violate this assumption and should not be concatenated because the topological differences among sites are not modelled, and so the impact on inferred branch lengths is difficult to predict." (line 187).

      This argument is valid. In my opinion, this is the single most important potential issue of Zhu et al.'s analysis. In that work, they inferred genome tree topology through ASTRAL, which resolves conflicting gene evolutions. However, ASTRAL does not report branch lengths in the unit of number of mutations. Therefore, they plugged the concatenated alignment into this topology for branch length estimation, hoping that it will "average out" the result. That workaround was apparently not ideal.

      Yes, we agree --- this is our main critique of the Zhu et al. analyses.

      However, the practice of molecular phylogenetics is complicated. Theoretically, every gene, domain, codon position and site may have its unique evolutionary process, and there have been efforts to develop better partition and mixture models to address these possibilities. But there is a trade off; these technologies are computationally demanding and have the risk of overfitting. It is plausible that in some scenarios, the gain of concatenating many loci (despite conflicting phylogeny) may outweigh the cost of having unpredictable effects.

      But this dilemma needs to be analyzed rather than just being discussed. The Zhu et al. paper did not assess the impact of such concatenation on branch length estimation. The best answer is to conduct an analysis to show that concatenating genes with conflicting phylogeny would result in an AB branch that is shorter than the mean of those genes, and the reduction of AB branch length is correlated with the amount of conflict involved. The current work has not done this.

      Thanks for raising this point. We agree that phylogenetics is complicated and that we lack methods that can account for all possible factors. With respect to the impact of gene transfers on the AB branch length, and as touched on above, there are two issues here.

      The first is with the analysis actually performed by Zhu et al: of the 381 extended set genes, 79 have one or no archaea in the 1000-taxon subsample, and a further 176 have an AB branch length close to 0 (<0.00001) in the constrained analyses. To investigate further, we manually inspected ML gene trees for the 381 genes (1000 taxon subsample). Allowing for recent gene transfer, we nevertheless identified only 64 genes with an unambiguous branch separating most Archaea from most Bacteria that might correspond to the ancestral AB divergence (Supplementary File 1).

      Taken together, these analyses suggest that there is no strong evidence that these genes were present in LUCA or evolved along the AB branch, and so they do not provide information on its length. Since the branch length in the concatenation is an average over the branch length per site, the inclusion of this set of genes in the analysis did reduce the AB branch length, as demonstrated by our analyses (Figure 1(H)).
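
      The averaging argument can be illustrated with a deliberately simplified sketch. It assumes that the branch length inferred from a concatenate behaves like a site-weighted mean of the per-gene branch lengths, which is an approximation of, not a substitute for, maximum likelihood estimation on the full alignment. The gene lengths and branch lengths below are hypothetical.

```python
def site_weighted_branch_length(genes):
    """Site-weighted mean branch length across genes.

    `genes` is a list of (n_sites, branch_length) pairs, with branch lengths
    in substitutions per site. This is a simplifying approximation of how a
    concatenate averages signal over sites.
    """
    total_sites = sum(n for n, _ in genes)
    if total_sites == 0:
        raise ValueError("no sites to average over")
    return sum(n * b for n, b in genes) / total_sites


# Two hypothetical genes with a clear AB branch.
informative = [(300, 1.0), (300, 1.2)]
# The same genes plus one gene whose constrained AB branch is ~0.
with_uninformative = informative + [(600, 0.0)]

before = site_weighted_branch_length(informative)
after = site_weighted_branch_length(with_uninformative)
print(f"AB branch, informative genes only: {before:.2f}")
print(f"AB branch, with a zero-length gene: {after:.2f}")
```

      In this toy case, adding sites that carry no AB signal halves the averaged branch length, even though the informative genes themselves are unchanged.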

      The second issue is: for genes which were likely present in LUCA and evolved on the AB branch, does gene transfer cause a reduction in the AB branch length inferred from their concatenation? To test this, we initially tested iterative concatenations of increasing numbers of non-vertical markers (Figure 1H), as well as a comparison of the most vertical genes to the whole expanded marker set (Figure 3 Figure Supplement 2). This revealed that as more markers were added (with lower verticality), the inferred AB branch length from the concatenate was reduced. We also found an increased AB branch length when only the 20 most vertical markers were used as opposed to the whole (381 marker) dataset (0.56 vs 0.16 substitutions/site, Figure 6).
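
      The ordering step behind such iterative concatenations can be sketched as follows: markers are ranked from most to least vertical (lowest to highest ΔLL) and progressively larger subsets are formed. The marker names, ΔLL values and step size are hypothetical, and the tree inference on each subset is not shown.

```python
def incremental_marker_sets(delta_ll, step):
    """Return nested marker subsets ordered by increasing delta-LL.

    `delta_ll` maps marker name -> delta-LL (lower means more vertical).
    Each successive subset adds the next `step` markers; the final subset
    always contains every marker.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    ranked = sorted(delta_ll, key=delta_ll.get)  # most vertical first
    sizes = list(range(step, len(ranked), step)) + [len(ranked)]
    return [ranked[:k] for k in sizes]


# Hypothetical markers and delta-LL values (illustrative only).
delta_ll = {"g1": 12.0, "g2": 3.5, "g3": 250.0, "g4": 48.0, "g5": 0.8}

subsets = incremental_marker_sets(delta_ll, step=2)
for subset in subsets:
    print(len(subset), subset)
```

      Each subset would then be concatenated and analysed separately, so that the AB branch length can be tracked as less vertical markers are added.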

      The reviewer proposes an additional test of the impact of marker gene incongruence on branch length inference from concatenations: to compare the AB branch length before and after pruning of HGTs from individual marker gene alignments. To do this, we took the 54 marker genes from our new dataset and concatenated them before and after pruning of unambiguous HGTs. The AB branch length inferred from the concatenation with HGTs removed was 1.946 substitutions/site, compared to 1.734 substitutions/site without pruning HGTs, demonstrating the impact of even a relatively small number of HGTs on branch length estimation from concatenates.

      Divergence time estimation:

      The manuscript dedicates one section (lines 230-266) to argue that the divergence time estimation analysis performed by Zhu et al. was not good evidence for marker gene suitability. Zhu et al. showed congruence of the expanded marker set with geological records whereas ribosomal proteins were conflicting with the geologic record. To support their argument, the authors estimated divergence times using the top 20 most "vertical" genes measured by ΔLL.

      It would be good to clarify which genes they are, and it would be important to check whether they include some of the most "AB-distant" ones found by Zhu et al. Their Fig. 5A shows that there are genes that divide the two domains several folds further than the ribosomal proteins (such as rpoC). If they are among the 20 genes, it will not be surprising that the estimated AB split is older than it should be.

      We now include the annotations for these 20 genes in Supplementary File 5a. The 20 most vertical genes include two of the “AB-distant” outliers identified by Zhu et al., tuf and infB, and one ribosomal marker, rpsG.

      Overall, I think this section is logically questionable. Zhu et al. suggested that "They show the limitation of using core genes alone to model the evolution of the entire genome, and highlight the value in using a more diverse marker gene set.". The current work showed that using another set of a few genes (I do not know if they include multiple "core" genes, as discussed above, but it is plausible) also did not work well. This does not refute Zhu et al.'s claim.

      What's important in Zhu et al.'s analysis is this: they demonstrated that using a small set of genes in DTE may cause artifacts due to them significantly violating the molecular clock at certain stages of evolution. Instead, using a larger set of markers that represent a portion of the entire genome would help to "smooth out" these artifacts. This of course is not the ideal solution, likely because concatenating conflicting genes and modelling them uniformly is not the best idea (see above). But as an operational workaround, it was not challenged by the analysis in the current work.

      Finally, I agree with the authors' statement that more and reliable calibrations are the best way to improve divergence time estimation.

      The dating analyses presented in the first version of our manuscript demonstrated that the apparent agreement between molecular clock estimates using the 381-gene set and the fossil record was the result of artifactual shortening of the AB branch, as discussed in detail above. Once the subset of the data least affected by these issues (that is, the most vertical subset) was used, the limitations of current clock methods, particularly with few calibrations, for dating deep nodes became clear.

      That said, we agree with the reviewer (and also R3) that the dating section in the first version of our manuscript was somewhat unsatisfactory: it identified an important limitation of the published analysis, but did not explore the underlying question of why molecular clock methods infer unrealistically old divergence times from vertically-evolving genes. In the revised manuscript we have reworked and improved this section extensively, including new analyses on the 27-gene dataset, with more fossil calibrations, that help to diagnose how and why clocks struggle to date the archaeal and bacterial stems from the available data. We now show that the old ages result from a combination of low rates of molecular evolution across the tree inferred from “shallow” calibrations, combined with a lack of age maxima for nodes other than the root of the tree; when the rate distribution is informed in this way, the long AB branch is interpreted as representing a long period of time and estimates of LUCA age are strongly influenced by prior assumptions about root maximum age. These analyses now suggest how the difficulties might be overcome in the future, for example using better calibrations (particularly maximum ages, and indeed any fossil calibrations within the Archaea), or alternatively other sources of time information such as from gene transfers. Reflecting the new, broader focus, we have moved this section to the end of the manuscript.

      AB branch by ribosomal and non-ribosomal genes:

      Two figures (Figs. 3 and 4) and two sections (lines 270-303) are dedicated to the argument that ribosomal markers do not indicate a longer AB branch than non-ribosomal ones. However, this is a small scale test (38 ribosomal markers vs. 16 non-ribosomal markers) compared with the similar analysis in Zhu et al. (30 ribosomal markers vs. 381 global markers). A closer look at Figs. 3 and 4 suggests that while the AB lengths indicated by the ribosomal markers are within a relatively narrow range, those by the non-ribosomal ones are very diverse, including ones that are several folds longer than the ribosomal average. This result is in accordance with that of Zhu et al.'s Fig. 5A, although the latter was describing a different metric. Do these genes also overlap the ones found by Zhu et al.?

      Nevertheless, this analysis does not falsify Zhu et al.'s, because it compared a different, much smaller, and deliberately chosen group of genes.

      As the reviewer indicates, the purpose of the analyses presented in Figures 3-4 is to evaluate the hypothesis of accelerated ribosomal protein evolution: that is, the idea that ribosomal proteins over-estimate the AB branch length due to accelerated evolution during the divergence of Archaea and Bacteria. Although this hypothesis was independently proposed in Zhu et al., to our knowledge it actually originates with Petitjean et al. (2014) GBE (https://academic.oup.com/gbe/article/7/1/191/601621; see their Figure 2), and has been at play in analyses of deep evolution and in particular the position of DPANN Archaea in the phylogeny since that time. Thus, this section of our manuscript (indeed, all but the first section) is not a critique of Zhu et al.’s work, but a contribution to the broader ongoing discussion about which marker genes are best to use in deep phylogeny. We compare only vertically-evolving genes in Figures 3-4 so as to distinguish the impact of gene function (ribosomal versus non-ribosomal) from confounding factors such as HGT, paralogy, and gene origination time.

      To clarify this point, we have modified our main text discussion to make it clear that we are making a comparison between ribosomal genes and other vertically-evolving members of the traditional “core” gene set, rather than a broader genome-wide claim. We now write:

      “If ribosomal proteins experienced accelerated evolution during the divergence of Archaea and Bacteria, this might lead to the inference of an artifactually long AB branch length (Petitjean et al., 2014; Zhu et al., 2019). To investigate this, we plotted the inter-domain branch lengths for the 38 and 16 ribosomal and non-ribosomal genes, respectively, comprising the 54 marker genes set. We found no evidence that there was a longer AB branch associated with ribosomal markers than for other vertically-evolving “core” genes (Figure 2(b); mean AB branch length for ribosomal proteins 1.35 substitutions/site, mean for non-ribosomal 2.25 substitutions/site).”

      Substitutional saturation:

      The comparative analysis of slow- and fast-evolving sites is interesting. The result (Fig. 5) is visually impactful. In my view, this analysis is valid, and the conclusion is supported. It would be better to explain the rationale with more detail to facilitate understanding by a general audience.

      Thanks for this assessment. We have now expanded on the rationale of this analysis in the main text, writing:

      “It is interesting to note that the proportion of inferred substitutions that occur along the AB branch differs between the slow-evolving and fast-evolving sites. As would be expected, the total tree length measured in substitutions per site is shorter from the slow-evolving sites, but the relative AB branch length is longer (1.2 substitutions/site, or ~2% of all inferred substitutions, compared to 2.6 substitutions/site, or ~0.04% of all inferred substitutions for the fastest-evolving sites). Since we would not expect the distribution of substitutions over the tree to differ between slow-evolving and fast-evolving sites, this result suggests that some ancient changes along the AB branch at fast-evolving sites have been overwritten by more recent events in evolution --- that is, that substitutional saturation leads to an underestimate of the AB branch length.”

      Zhu et al. also tested the impact of substitution saturation on the AB branch, using a more traditional approach (Fig. S19). They also found that the inter-domain distance is more influenced by potential substitution saturation, but the difference is minor. They concluded that (AB distance) "is not substantially impacted by saturation."

      Like other analyses, these two analyses involved very different locus sampling (27 most "vertical" genes vs. 381 expanded genes). They also differ by the metric being measured (AB branch length vs. average distance between AB taxa). Therefore, the analysis in the current work does not falsify the analysis by Zhu et al. In contrast, it is in line with (though not in direct support of) Zhu et al. and others' suggestion that there was "accelerated evolution of ribosomal proteins along the inter-domain branch" (line 25) in the 27 core genes (of which 15 are ribosomal proteins).

      We disagree that our analysis is consistent with the hypothesis of accelerated ribosomal protein evolution. The analysis that directly addresses this point is Figure 3, where we show that the distributions of AB branch lengths in single gene trees are not significantly different between ribosomal and non-ribosomal datasets (Figure 3; mean AB branch length for ribosomal proteins 1.35 substitutions/site, mean for non-ribosomal 2.25 substitutions/site).

      Evolutionary model fit:

      The authors compared the AB branch length indicated by the standard, site-homogeneous model LG+G4+F vs. the site-heterogeneous model LG+C60+G4+F, and found that the latter recovered a longer AB branch (2.52 vs. 1.45). The authors' reasoning for using a site-heterogeneous model is valid, and this analysis is sound.

      However, Zhu et al. also analyzed their data using the site-heterogeneous model C60 -- the same as in this work, but through the PMSF (posterior mean site frequency) method. Zhu et al. also compared it with two site-homogeneous models (Gamma and FreeRate). The results were extensively presented and discussed (Figs. 3, 4E, F, S23, S24, Note S2). They also found that C60+PMSF elongated the AB branch compared with the site-homogeneous models (Fig. S24A). As for the average AB distance (another metric evaluated by Zhu et al., as discussed above), C60+PMSF increased this metric when using ribosomal proteins, but not much when using the expanded marker set (Fig. S25A). And overall, the elongation by C60+PMSF with the expanded markers cannot compensate for the long branch indicated by the ribosomal proteins.

      Therefore, similar to the point I made above, this analysis is sound but it does not logically falsify the conclusion made by Zhu et al., as it only concerns a small set of markers, and it recovered a previously described pattern.

      Thanks for this comment. As above, note that the second part of our manuscript presents a general analysis of the issues around marker gene and model selection using our meta-analysis and new dataset, and is not a direct response to Zhu et al’s work. On reflection, we agree that this was not sufficiently clear in the first version of the paper, and we have now modified the text to acknowledge the model fitting analyses of Zhu et al.

      The manuscript also did not clarify what the phrase "poor model fit" refers to (line 34 and line 304). If this is addressing the Gamma model evaluated by the authors, then this claim is valid though not novel (but see my previous comment on the trade-off). If that is a general reference to Zhu et al.'s methodology, then the authors should at least include the C60+PMSF model in the analysis, and show that C60 indicates a significantly longer AB branch than C60+PMSF does (if that's the case, which is doubtful). Admittedly, C60+PMSF is cheaper than the native C60 in computation, but "In some empirical and simulation settings PMSF provided more accurate estimates of phylogenies than the mixture models from which they derive." (Wang et al. Syst Biol. 2018).

      Thanks for this comment. We did not intend the phrase “poor model fit” to imply a critique of Zhu et al.’s work; as the reviewer notes, those authors carried out a range of analyses to investigate the impact of model choice on their inferences. Rather, the title of the section is intended to summarise its main conclusion, which is that substitutional saturation and poor model fit (on any dataset, and even with the best available models) can lead to under-estimation of the AB branch length. Note that the analyses in Table 1 illustrating the impact of model fit are from the new dataset that is assembled and analysed in the second part of the manuscript. As above, we agree that this was not sufficiently clear in the first version of the paper. We think the title of this section is accurate and so we have not changed it, but we have changed the final two paragraphs of the section (as quoted immediately above) so as to acknowledge the model fitting analyses of Zhu et al., and to clarify that the results are general (and based on our new dataset), rather than a critique of Zhu et al’s work.

      Finally, Zhu et al. also performed an analysis using the native C60 model on a further reduced taxon set. That result was not presented in the published paper, but it can be found in the "Peer Review File" posted on the Nature Communications website. That tree also recovered a short AB distance, and placed CPR at the base of Bacteria, and showed that this placement was not impacted by the removal of Archaea.

      Thanks for pointing us to this additional analysis. The unrooted, bacteria-only tree referred to by the reviewer (panel B) recovers a clan (that is, a cluster of branches on the unrooted tree) comprising CPR+Chloroflexi, in agreement with the analysis on the new marker dataset we present here (Figure 6). The disagreement between that analysis and the new tree presented here relates to the position of the archaeal outgroup, which in the Peer Review File panel A connects to the bacterial tree between CPR and Chloroflexi. If, as recently suggested, the bacterial root lies between Gracilicutes and Terrabacteria (Coleman et al. 2021), then CPR and Chloroflexi represent monophyletic sister lineages. We note that the CPR+Chloroflexi relationship recovered here and in Peer Review File Panel (B) has also been obtained in several other recent analyses (Taib et al. 2020, Coleman et al. 2021, Martinez-Gutierrez and Aylward 2021), as cited in the main text.

      Taxon sampling:

      My final comment is about taxon sampling. Zhu et al. developed an algorithm for less biased taxon sampling, and they argued that extensive taxon sampling is important in resolving the early evolution of life. They presented evidence showing that reduced taxon sampling changed overall topology and basal relationships (Figs. S13, S14, S23, Note S2). The analyses were performed in combination with the assessment of site sampling, locus sampling, substitution model and other factors. The importance of less biased and/or extensive taxon sampling was also noted by previous works, especially in a phylogenomic framework (e.g., Hedtke et al. Syst Biol. 2006; Wu and Eisen. Genome Biol. 2008; Beiko. Biol Direct. 2011). The current work is based on a smaller set of taxa, and it has not addressed the impact of taxon sampling. As I suggested above, some results may be sensitive to taxon sampling.

      We agree that taxon sampling is important for phylogenetics. While the analyses of Zhu et al. (2019) included a very large number of genomes, sampling of genomes (and indeed marker genes) was biased, both towards Bacteria compared to Archaea, and also within Bacteria. In our revised manuscript, we now compare the taxon sampling between Zhu et al.’s work and our new analyses (see Figure 1 Figure Supplements 13,14,15 and Figure 4 Figure Supplements 1,2). Balanced sampling is important for phylogenetic inference (Heath et al., 2008; Hillis, 1998) and, by this criterion, the taxon sampling in the analyses of Zhu et al. was not ideal. Our new analyses made use of fewer genomes (700), but these sample the known diversity of Archaea and Bacteria in a more representative way (Figure 4 Figure Supplement 1,2).

      Reviewer #3:

      Moody and coworkers principally address a recent paper presented by Zhu et al. (Nature Communications, 2019). In their paper, Zhu and coworkers claim that (i) ribosomal protein genes, commonly used in resolving deep phylogenies, have experienced an increased rate of evolution right after LUCA, and (ii) that an expanded set of markers shows that the branch separating archaea from bacteria (AB-branch) is 10-fold shorter than previously thought. Moody and coworkers first demonstrate flaws in the Zhu et al. analysis: first, the expanded gene set is biased towards bacteria, with 25% of the single-gene trees having very few archaeal counterparts. Second, that over 75% of the single-gene trees from Zhu et al are not monophyletic at domain level, suggesting a large influence of horizontal gene transfers (HGT), inter-domain exchanges, and inclusion of paralogous sequences in the original datasets. Third, they show that genes with fewer HGT display longer AB-branches. Fourth, they show that the argument by Zhu et al. that the longer AB-branch yields absurd LUCA dating is not relevant. Fifth, and maybe most important, they show that the shorter AB-branches recovered by Zhu et al in their expanded dataset result from inadequate substitution models, which lead to underestimating rates and thus branch lengths.

      Going further, they select a set of 54 manually curated markers (showing mostly monophyletic archaea and bacteria), both from ribosomal proteins (36) and non-ribosomal proteins (18) and retrieve these in a balanced set of 350 archaea and 350 bacteria. With this set, they show that ribosomal protein markers do not display longer AB-branches than non-ribosomal ones. They also show that diversity among Archaea and Bacteria, as measured as the total tree length within each domain, is very similar, when sampling equal number of genomes in both domains.

      Strengths:

      The paper is well-written and well structured. In general, the methodology chosen here is adapted to the question at hand and very rigorously followed. The balanced dataset (with equal amounts of bacteria and archaea) of 54 carefully selected genes is also appropriate to explore diversity differences between the two domains of life.

      Although all arguments presented in Zhu et al are carefully re-evaluated, the part where Moody et al show that substitutional saturation and poor model fit is artifactually producing short AB-branches is quite compelling and elegantly presented.

      Weaknesses:

      One potential weakness, more in terms of significance than in terms of scientific soundness is that the paper is mostly "reactive", responding to a single other paper. The authors might have used the data and methodology presented here to give the paper a broader scope. An example would be to provide the audience with a solid protocol or general guidelines on how to avoid artifacts in making deep phylogenies. I believe that the authors have demonstrated that they have the authority to do that.

      Thanks for this suggestion. We considered including guidelines of this type in the first version of the manuscript, but we were --- and remain --- wary of attempting to promote one particular way of doing deep phylogeny over others. These are difficult and slippery questions, and different approaches and perspectives (including ones we might disagree with) are, in a broader sense, useful in refining ideas and helping the field to make progress as a whole. That said, a recurring issue appears to be the question of the fit between model and data, both in terms of substitution model fit (as with the impact of site-heterogeneous models on branch length inferences) and the broader issue of using models that, for example, account for gene duplication or transfer. There are several recent reviews (including one by some of us) which treat these topics in detail and provide detailed advice. We have now raised and discussed these issues in our conclusion. We have also updated Figure 6 to illustrate the approach we used in assembling the new 27-gene dataset, which may be of use to others, and goes some way towards the suggestion of providing guidelines for future analyses. We now write:

      “Our analysis of a range of published marker gene datasets (Petitjean et al., 2014; Spang et al., 2015; Williams et al., 2020; Zhu et al., 2019) indicates that the choice of markers and the fit of the substitution model are both important for inference of deep phylogeny from concatenations, in agreement with an existing body of literature (reviewed in Kapli et al., 2021, 2020; Williams et al., 2021). We established a set of 27 highly vertically evolving marker gene families and found no evidence that ribosomal genes overestimate stem length; since they appear to be transferred less frequently than other genes, our analysis affirms that ribosomal proteins are useful markers for deep phylogeny. In general, high-verticality markers, regardless of functional category, supported a longer AB branch length. Furthermore, our phylogeny was consistent with recent work on early prokaryotic evolution, resolving the major clades within Archaea and nesting the CPR within Terrabacteria. Notably, our analyses suggested that both the true Archaea-Bacteria branch length (Figure 6A), and the phylogenetic diversity of Archaea, may be underestimated by even the best current models, a finding that is consistent with a root for the tree of life between the two prokaryotic domains.”

      In the figure 6 legend, we also expand on guidelines for future analyses, writing:

      “(B) Workflow for iterative manual curation of marker gene families for concatenation analysis. After inference and inspection of initial orthologue trees, several rounds of manual inspection and removal of HGTs and distant paralogues were carried out. These sequences were removed from the initial set of orthologues before alignment and trimming. For a detailed discussion of some of these issues, and practical guidelines on phylogenomic analysis of multi-gene datasets, see (Kapli et al., 2020) for a useful review.”

      The authors use the difference in log-likelihood between the constrained and unconstrained gene trees as a proxy for verticality and thus marker gene quality (Figure 1b). However, they don't demonstrate that that metric is actually appropriate. Could the monophyly (or split score) be also involved here? The authors might want to comment on that.

      Thanks for this suggestion, which has substantially improved our analysis of Archaea-Bacteria distance and marker gene verticality (see the revised Figure 1 and associated text). We have now evaluated the relationship between AB branch length and split score (both within- and between-domain level relationships) for the expanded marker set and have updated our results and discussion accordingly. We found that deltaLL and split score (both within- and between-domains) are positively correlated with each other, and negatively correlated with AB length (that is, high-verticality markers have longer AB branch lengths). These analyses also revealed that within-domain and between-domain split scores are strongly positively correlated, implying that genes that recover domain monophyly also do better at resolving within-domain relationships.
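      The relationship described above can be sketched as a rank correlation between a per-marker verticality score and AB branch length. The example assumes that deltaLL is the log-likelihood of the unconstrained gene tree minus that of the tree constrained to domain monophyly; the marker values are invented for illustration and are not taken from the manuscript, and the helper names are hypothetical.

```python
# Sketch: Spearman rank correlation between deltaLL (a proxy for how
# strongly a marker rejects domain monophyly) and AB branch length.
# All numbers below are made-up illustrative values, not real data.

def delta_ll(unconstrained_ll, constrained_ll):
    """Log-likelihood cost of forcing domain monophyly (larger = less vertical)."""
    return unconstrained_ll - constrained_ll

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# (unconstrained lnL, constrained lnL, AB branch length) per hypothetical marker
markers = [
    (-10250.0, -10252.0, 2.9),
    (-8400.0, -8415.0, 1.8),
    (-12100.0, -12160.0, 0.9),
    (-9300.0, -9450.0, 0.3),
]
dll = [delta_ll(u, c) for u, c, _ in markers]
ab_length = [ab for _, _, ab in markers]
rho = spearman(dll, ab_length)
print(f"Spearman rho between deltaLL and AB length: {rho:.2f}")
```

      A negative rho in such an analysis corresponds to the pattern reported in the response: markers that reject domain monophyly more strongly tend to have shorter AB branches.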

      The argument about the age of LUCA is an ad absurdum one, showing that using better suited genes one gets impossible time estimates. However, the argument presented by Zhu et al is also a "just so" argument (if we get a time estimate that doesn't make sense then the phylogeny must be wrong), which doesn't give it much weight. The authors themselves note well that this part of the paper is more revealing of the limitations of the strict clock method, or of the relaxed clock with one single calibration point, than of the quality or appropriateness of the dataset.

      We agree that the dating section in the first version of our manuscript was somewhat unsatisfactory. We have now expanded it to include new analyses on our 27-gene dataset, using more fossil calibrations, in order to diagnose why current clock methods struggle to estimate evolutionary rate near the root of the tree, and how this impacts on the age of LUCA and other deep nodes. These analyses add substantial value to this section, which has been moved to the end of the manuscript to reflect its expanded focus.

      Another small weakness (or loose end) is that manual curation of the 95 genes dataset is not consistently reducing the percentage of non-monophyletic genes (e.g. 62 to 69% from the 95 to the 54 genes dataset for non-ribosomal genes; 21 to 33% from the 95 to the 27 genes dataset for ribosomal genes). The authors don't discuss how this impacts the manual curation they perform on the datasets; however, they state that "manual curation of marker genes is important". The authors might want to discuss that aspect further.

      Thanks for raising this point. We were not sufficiently clear in describing the logic of our approach in the first version of the manuscript, and have now revised the text to clarify. In this analysis, we used a strict binary definition of monophyly --- that is, even a single inter-domain transfer leads to non-monophyly (note that this is in contrast to the re-analysis of the expanded set, where we considered whether each marker statistically rejected domain monophyly). For some genes scored as non-monophyletic in this way, manual removal of a small number of unambiguous recent transfers is sufficient to restore domain monophyly; for others, HGT is extensive and it is difficult to know how to filter the sequences so as to obtain a reliable marker gene alignment; it was these latter cases that we set aside. We have now revised this section to make the logic of the approach clear, writing:

      “Prior to manual curation, non-ribosomal markers had a greater number of HGTs and cases of mixed paralogy. In particular, for the original set of 95 unique COG families (see ‘Phylogenetic analyses’ in Methods), we rejected 41 families based on the inferred ML trees, either due to a large degree of HGT, paralogous gene families or LBA. For the remaining 54 markers, the ML trees contained evidence of occasional recent HGT events. Strict monophyly was violated in 69% of the non-ribosomal and 29% of the ribosomal families. We manually removed the individual sequences which violated domain monophyly before re-alignment, trimming, and subsequent tree inference (see Methods). These results imply that manual curation of marker genes is important for deep phylogenetic analyses, particularly when using non-ribosomal markers. Comparison of within-domain split scores for these 54 markers indicated that markers that better resolved established relationships within each domain also supported a longer AB branch length (Figure 2A).”
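      The strict binary definition of monophyly described above can be illustrated with a small check on a gene tree. This is a minimal sketch rather than the authors' pipeline: it assumes the tree is given as nested tuples of leaf names, treats the tree as unrooted, and scores a marker as monophyletic only if a single branch separates every archaeal sequence from every bacterial one. The leaf names and function names are hypothetical.

```python
# Sketch of a strict, binary domain-monophyly test on an unrooted gene tree.
# The tree is a nested tuple of leaf names; each subtree corresponds to one
# branch, so the domains are separable by one branch exactly when the set of
# archaeal (or bacterial) leaves equals the leaf set of some subtree.

def collect_clades(tree, out):
    """Append the leaf set under every node to `out`; return this node's set."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(collect_clades(child, out) for child in tree))
    out.append(leaves)
    return leaves

def domains_are_monophyletic(tree, domain_of):
    """True if one branch splits all Archaea from all Bacteria.
    A single inter-domain transfer is enough to return False."""
    clades = []
    all_leaves = collect_clades(tree, clades)
    archaea = frozenset(l for l in all_leaves if domain_of[l] == "Archaea")
    bacteria = all_leaves - archaea
    if not archaea or not bacteria:
        raise ValueError("tree must contain sequences from both domains")
    return archaea in clades or bacteria in clades

# Hypothetical leaf names: A* are archaeal, B* are bacterial sequences.
names = ["A1", "A2", "A3", "B1", "B2", "B3"]
domain_of = {n: ("Archaea" if n.startswith("A") else "Bacteria") for n in names}

vertical = ((("A1", "A2"), "A3"), (("B1", "B2"), "B3"))
transferred = ((("A1", "B2"), "A3"), (("B1", "A2"), "B3"))

print(domains_are_monophyletic(vertical, domain_of))
print(domains_are_monophyletic(transferred, domain_of))
```

      Because the test looks for a separating branch rather than a rooted clade, the result does not depend on where the nested-tuple representation happens to be rooted.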

      In summary and despite the small weaknesses listed above, my opinion is that the authors reach their goal of showing that the AB-branch is indeed a long one, and that the results support the conclusion.

      Impact:

      The main point addressed by the authors here, the time of divergence between Archaea and Bacteria, is crucial to our understanding of early evolution. The branch separating Bacteria and Archaea has long been thought to be a long one, and the paper by Zhu et al cast doubt on the validity of this long-standing hypothesis. Here, Moody et al convincingly establish that the divergence between archaea and bacteria is a profound one. The paper also has profound implications on the validity of the commonly used core-gene phylogenies, particularly those based on ribosomal protein genes. Indeed, it shows that these proteins are appropriate for deep phylogenies. They also show the impact of model violations on deep phylogenies, and how to avoid them.

      We thank the reviewer for this positive assessment of impact.

    1. Author Response:

      Reviewer #1 (Public Review):

      This is a well-executed study looking at the association of urinary metabolites to the types of diets consumed by European children. They focus on four analytes that have opposing patterns from a "good" KIDMED Mediterranean style diet versus a "bad" diet with processed foods and high sugars. They then create an association with levels of C-peptide, which has in turn been linked to health outcomes.

      Overall there is extensive data provided in the supplementary data to justify their findings. The one omission is the effects of activity levels and total caloric consumption. There is an attempt to link body weight to C-peptide associations, but in a minor revision, it would be nice to also include BMI as a parameter for the concentrations of metabolites.

      We thank the reviewer for his/her positive feedback. We agree that inclusion of information on physical activity levels and total caloric consumption would strengthen our study. Unfortunately, we do not have available data on these variables. To counteract this, we adjusted all our models for child sedentary behavior (minutes/day of time spent watching TV, playing computer games or other sedentary games) which has been shown to associate to physical activity levels - this association could be due to the fact that the time devoted to sedentary screen-time activities affects availability of time devoted for exercise, or vice versa.(1-3) Further, we adjusted our models for child body mass index (BMI), a measure that strongly correlates to energy intake, and assessed ultra-processed food intake as proportion of total food intake in order to take into account inter-individual differences in total food (and hence caloric) consumption. We clarify these points in the discussion section.

      Regarding the second part of the reviewer’s comment to consider BMI as a parameter for the concentrations of metabolites, we considered, and controlled for, any potential influence of BMI on the associations of both diet and C-peptide with the urinary metabolites as all our models were adjusted for this measure. We have previously reported the associations of children’s BMI with the urinary metabolome in the same study population (Lau CHE et al (4)) and hence we did not repeat this analysis in our manuscript. In our previous HELIX study by Lau CHE et al, we found significant associations between children’s BMI z-score and three urinary metabolites; positive associations with valine and 4-deoxyerythronic acid and a negative association with pantothenic acid. Of these metabolites, we found that KIDMED score was positively associated with pantothenic acid after adjustment for BMI (Supplementary Table 5), suggesting that Mediterranean diet adherence could affect urinary levels of this metabolite independently of BMI. Further, in our analysis, we found that UPF intake was negatively associated with the branched-chain amino acid (BCAA) valine after adjustment for BMI. Even though the importance of the BCAAs in adiposity has been reported previously, our findings provide an important foundation for future research to better understand the role of UPF intake on BCAA metabolism. Throughout the discussion section, we now discuss our results in context with our previous HELIX analysis examining associations of BMI with the urinary metabolites.

      Modified manuscript text:

      "We did not have data available on children’s physical activity. Nevertheless, we adjusted all our models for sedentary behavior (including time spent in front of the screen) which has been shown to associate to physical activity levels, as the time devoted to sedentary screen-time activities might affect availability of time devoted for exercise, or vice versa.(1-3) Further, we did not have data available to control for energy intake. However, in all our models, we included BMI of the children, a measure strongly correlated to energy intake,(5) and assessed ultra-processed food intake as proportion of total food intake."

      "In addition, adherence to the Mediterranean diet was also positively associated with urinary levels of pantothenic acid and acetate. Both compounds have a central role in human biochemistry and the metabolism and synthesis of carbohydrates, proteins, and fats. Pantothenic acid (vitamin B5, necessary to form coenzyme-A) is present in many foods, and we have previously reported a positive association between consumption of dairy products and urinary pantothenic acid in the same study population.(4) Further, we have previously shown that BMI is negatively associated with urinary levels of this metabolite,(4) and our results suggest that higher adherence to the Mediterranean diet associates with pantothenic acid independently of the potential influence of BMI."

      "Moreover, we found that UPF intake was negatively associated with two urinary amino acids, valine and tyrosine. Tyrosine is regarded as a conditionally essential amino acid in adults and essential in children. Foods high in dietary tyrosine include dairy, meat, eggs, beans, nuts, and grains. Tyrosine is a precursor for neurotransmitters and hormones, and increases dopamine availability, which in turn could enhance cognitive performance.(6) Valine is an essential branched-chain amino acid (BCAA) critical to energy homeostasis, protein and muscle metabolism.(7,8) In many studies, it has been observed that elevated BCAAs are associated with insulin resistance and diabetes.(9) Also, in our previous HELIX study,(4) we found that urinary valine was associated with higher children’s BMI. However, it remains to be elucidated whether these associations are causal (e.g. via mTOR activation) or consequential (e.g. due to reduced mitochondrial oxidation) in metabolic disease,(9) and whether UPF intake plays a role in the etiology of the association of BCAAs with metabolic health."

      References:

      1. Serrano-Sanchez JA, Marti-Trujillo S, Lera-Navarro A, Dorado-Garcia C, Gonzalez-Henriquez JJ, Sanchis-Moysi J. Associations between screen time and physical activity among Spanish adolescents. PLoS One. 2011;6(9):e24453.
      2. Pearson N, Braithwaite RE, Biddle SJ, van Sluijs EM, Atkin AJ. Associations between sedentary behaviour and physical activity in children and adolescents: a meta-analysis. Obes Rev. 2014;15(8):666-675.
      3. Aira T, Vasankari T, Heinonen OJ, et al. Physical activity from adolescence to young adulthood: patterns of change, and their associations with activity domains and sedentary time. Int J Behav Nutr Phys Act. 2021;18(1):85.
      4. Lau CE, Siskos AP, Maitre L, et al. Determinants of the urinary and serum metabolome in children from six European populations. BMC Med. 2018;16(1):202.
      5. Jakes RW, Day NE, Luben R, et al. Adjusting for energy intake--what measure to use in nutritional epidemiological studies? Int J Epidemiol. 2004;33(6):1382-1386.
      6. Kühn S, Düzel S, Colzato L, et al. Food for thought: association between dietary tyrosine and cognitive performance in younger and older adults. Psychological Research. 2019;83(6):1097-1106.
      7. Brosnan JT, Brosnan ME. Branched-Chain Amino Acids: Enzyme and Substrate Regulation. The Journal of Nutrition. 2006;136(1):207S-211S.
      8. Nie C, He T, Zhang W, Zhang G, Ma X. Branched Chain Amino Acids: Beyond Nutrition Metabolism. Int J Mol Sci. 2018;19(4).
      9. Lynch CJ, Adams SH. Branched-chain amino acids in metabolic signalling and insulin resistance. Nat Rev Endocrinol. 2014;10(12):723-736.
    1. Author Response:

      Reviewer #1 (Public Review):

      5. The reported data point to an important role of the premotor and parietal regions of the left as compared to the right hemisphere in the control of ipsilateral and contralateral limb movements. These are also the regions where the electrodes were primarily located in both subgroups of patients. I have 2 concerns in this respect. The first concern refers to the specific locus of these electrodes. For premotor cortex, the authors suggest PMd as well as PMv as potential sites for these bilateral representations. The other principal site refers to parietal cortex but this covers a large territory. It would help if more specific subregions for the parietal cortex can be indicated, if possible. Do the focal regions where electrodes were positioned refer to the superior vs inferior parietal cortex (anterior or posterior), or intra-parietal sulcus? Second, the manuscript's focus on the premotor-parietal complex emerges from the constraints imposed by accessible anatomical locations in the participants but does not preclude the existence of other cortical sites as well as subcortical regions and cerebellum for such bilateral representations. It is meaningful to clarify this and/or list this as a limitation of the current approach.

      On the first issue, we have updated the manuscript to specify the subregion within the parietal cortex in which we see stronger across-arm generalization - namely, the superior parietal cortex. On the second issue, we have added text in the Discussion that references subcortical areas shown to exhibit laterality differences in bimanual coordination, providing a more holistic picture of bimanual representations across the brain. In addition, we acknowledge that with our current patient population we are limited to regions with substantial electrode coverage, which does not include all areas of the brain.

      6. The evidence for bilateral encoding during unilateral movement opens perspectives for a better understanding of the control of bimanual movements which are abundant during everyday life. In the discussion, the authors refer to some imaging studies on bimanual control in order to infer whether the obtained findings may be a consequence of left hemisphere specialization for bimanual movement control, leading to speculations about the information that is being processed for each of both limb movements. Another perspective to consider is the possibility that making a movement with one limb may require postural stabilization in the trunk and contralateral body side, including a contribution from the opposite limb that is supposedly resting on the start button. Have the authors considered whether this postural mechanism could (partly) account for this bilateral encoding mechanism, in particular, because it appears more prominent during movement execution as compared to preparation? Furthermore, could the prominence of bilateral encoding during movement execution be triggered by inflow of sensory information about both limbs from the visual as well as the somatosensory systems?

      Thank you for these comments. We have added a paragraph to the Discussion to address the hypothesis that some component of ipsilateral encoding may be related to postural stabilization.

      In response to the final point in this comment, we agree that bilateral information during execution could be reflective of afferent inputs (somatosensory and/or visual). However, the encoding model shows that activity in premotor and parietal regions is well predicted based on kinematics during the task. While visual and somatosensory system information are likely integrated in these areas, the kinematic encoding would point to a more movement-based representation.

      Reviewer #2 (Public Review):

      Weaknesses:

      1. Although the current human ECoG data set is valuable, there is still large variability in electrode coverage across the patients (I fully acknowledge the difficulty). This makes statistical assessment a bit tricky. The potential factors of interest in the current study would be Electrode (=Region), Subject, Hemisphere, and their interactions. The tricky part is that Electrode is nested within Subject, and Subject is nested within Hemisphere. Permutation-based ANOVA used for the current paper requires proper treatment of these nested factors when making permutations (Anderson and Braak, 2003). In this regard, sufficient details about how the authors treated each factor, for instance, in each pbANOVA, are not provided in the current version of the manuscript. Similarly, the scope of statistical generalizability, whether the inference is within-sample or population-level, for the claims (e.g., statement about the hemispheric or regional difference) needs to be clarified.

      We discuss at length the issue of electrode variability and have addressed this in the revised manuscript. Graphically, we have added a Supplemental Figure (S2). Statistically, we appreciate the point about the need for the analysis to address the nested structure of the data. We have redone all of the statistics, now using a permutation-based linear mixed effects model with a random effect of patient. This approach did not change any of the findings.
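      The nesting concern raised by the reviewer can be illustrated with a permutation test that shuffles hemisphere labels at the level of patients rather than electrodes. This is a deliberately simplified sketch, not the permutation-based linear mixed-effects model used in the revision: it averages electrodes within each patient and compares the hemisphere means. The patient identifiers, the values, and the function names are all invented for illustration.

```python
import random
from statistics import mean

# Sketch of a permutation test that respects nesting: electrodes are nested
# in patients, and each patient contributes one hemisphere, so hemisphere
# labels are exchanged between patients, never between single electrodes.

def hemisphere_difference(patient_values, patient_hemisphere):
    """Left minus right difference of per-patient mean scores."""
    left = [mean(v) for p, v in patient_values.items() if patient_hemisphere[p] == "L"]
    right = [mean(v) for p, v in patient_values.items() if patient_hemisphere[p] == "R"]
    return mean(left) - mean(right)

def patient_level_permutation_test(patient_values, patient_hemisphere,
                                   n_permutations=5000, seed=0):
    """Two-sided p-value from shuffling hemisphere labels across patients."""
    rng = random.Random(seed)
    patients = sorted(patient_values)
    labels = [patient_hemisphere[p] for p in patients]
    observed = hemisphere_difference(patient_values, patient_hemisphere)
    extreme = 0
    for _ in range(n_permutations):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # keeps the number of L and R patients fixed
        permuted = dict(zip(patients, shuffled))
        stat = hemisphere_difference(patient_values, permuted)
        if abs(stat) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_permutations + 1)

# Invented per-electrode scores (e.g. ipsilateral encoding strength) per patient.
patient_values = {
    "P1": [0.30, 0.34], "P2": [0.28, 0.26, 0.30], "P3": [0.36],
    "P4": [0.12, 0.16], "P5": [0.20, 0.18], "P6": [0.10, 0.14, 0.12],
}
patient_hemisphere = {"P1": "L", "P2": "L", "P3": "L",
                      "P4": "R", "P5": "R", "P6": "R"}

observed, p_value = patient_level_permutation_test(patient_values, patient_hemisphere)
print(f"left - right difference: {observed:.2f}, permutation p = {p_value:.3f}")
```

      With only six hypothetical patients there are just 20 distinct label assignments, so the smallest attainable two-sided p-value is about 0.1 regardless of how many electrodes each patient has. This is the practical consequence of permuting at the level at which the factor actually varies.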

      As to the comment about hemispheric or regional differences, the data show that both are important factors. Our hemispheric effect is characterized by stronger ipsilateral encoding in the left hemisphere and subsequently better across-arm generalization (Figures 2-4). We then examine the spatial distribution of electrodes that generalized well or poorly and found clusters in both hemispheres of electrodes that generalize poorly. In contrast, only in the left hemisphere did we find clusters of electrodes that generalize well. These electrodes were localized to PMd, PMv and superior parietal cortex (Fig 5D). In summary, we argue that activity patterns in M1 are similar in the left and right hemispheres, but there is a marked asymmetry for activity patterns over premotor and parietal cortices.

      Additional context that would help readers interpret or understand the significance of the work: The greater amount of shared movement representation in the left hemisphere may imply the greater reliance of the left arm on the left hemisphere. This may, in turn, lead to the greater influence of the ongoing right arm motion on the left arm movement control during the bimanual coordination. Indeed, this point is addressed by the authors in the Discussion (page 15, lines 26-41). One critical piece of literature missing in this context is the work done by Yokoi, Hirashima, and Nozaki (2014). In the experiments using the bimanual reaching task, they in fact found that the learning by the left arm is to a greater degree influenced by the concurrent motion of the right arm than vice versa (Yokoi et al., J Neurosci, 2014). Together with Diedrichsen et al. (2013), this study will strengthen the authors' discussion and help readers interpret the present result of left hemisphere dominance in the context of more skillful bimanual action.

      The Yokoi paper is a very important paper in revealing hemispheric asymmetries during skilled bimanual movements. However, we think it is problematic to link the hemispheric asymmetries we observe to the behavioral effects reported in the Yokoi paper (namely, that the nondominant, left arm was more strongly influenced by the kinematics of the right arm). One could hypothesize that the left hemisphere, given its representation of both arms, could be controlling both arms in some sort of direct way (and thus the action of the right arm will have an influence on left arm movement given the engagement of the same neural regions for both movements). It is also possible that the left hemisphere is receiving information about the state of both the right and left arms, and this underlies the behavioral asymmetry reported in Yokoi.

      Reviewer #3 (Public Review):

      In the present work, Merrick et al. analyzed ECoG recordings from patients performing out-and-back reaching movements. The authors trained a linear model to map kinematic features (e.g., hand speed, target position) to high frequency ECoG activity (HFA) of each electrode. The two primary findings were: 1) encoding strength (as assessed by held-out R2 values) of ipsilateral and contralateral movements was more bilateral in the left hemisphere than in the right and 2) across-arm generalization was stronger in the left hemisphere than in the right. As the authors point out in the Introduction, there are known 'asymmetries between the two hemispheres in terms of praxis', so it may not be surprising to find asymmetries in the kinematic encoding of the two hemispheres (i.e., the left hemisphere contributes 'more equally' to movements on either side of the body than the right hemisphere).

      There is one point that I feel must be addressed before the present conclusions can be reached and a second clarification that I feel will greatly improve the interpretability of the results.

      First, as is often the case when working with patients, the authors have no control over the recording sites. This led to some asymmetries in both the number of electrodes in each hemisphere (as the authors note in the Discussion) and (more importantly) in the location of the recording electrodes. Recording site within a hemisphere must be controlled for before any comparisons between the hemispheres can be made. For example, the authors note that 'the contralateral bias becomes weaker the further the electrodes are from putative motor cortex'. If there happen to be more electrodes placed further from M1 in the left hemisphere (as Supplementary Figure 1 seems to suggest), then we cannot know whether the results of Figures 2 and 3 are due to the left hemisphere having stronger bilateral encoding or simply more electrodes placed further from M1.

      The reviewer makes a very valid point and this comment has led to our inclusion of a new Supplementary Figure, S2, in which we quantify the percentage of electrodes in each subregion.

      Second, it would be useful if the authors provided a bit of clarification about what type of kinematic information the linear model is using to predict HFA. I believe the paragraph titled 'Target modulation and tuning similarity across arms' suggests that there is very little across-target variance in the HFA signal. Does this imply that the model is primarily ignoring the Phi and Theta (as well as their lagged counterparts) and is instead relying on the position and speed terms? How likely is it that the majority of the HFA activity around movement onset reflects a condition-invariant 'trigger signal' (Kaufman, et al., 2016)? This trigger signal accounts for the largest portion of neural variance around movement onset (by far), and the weight of individual neurons in trigger signal dimensions tend to be positive, which means that this signal will be strongly reflected in population activity (as measured by ECoG). This interpretation does not detract from the present results in any way, but it may serve to clarify them.

      To address this comment, we have added a new figure (Fig 6) which shows the relative contribution of each kinematic feature as well as their average weights across time for both contralateral and ipsilateral movements. This figure also addresses the reviewer’s question about the contribution of the target position to the model. As can be seen, features that reflect timing/movement initiation (position, speed) make a larger contribution compared to the two features which capture directional tuning (theta, phi). As the reviewer suggested, this result is in line with Kaufman et al. (2016), who reported that a condition-invariant ‘trigger signal’ comprises the largest component of neural activity. We note that the target-dependent features theta and phi still make a substantial contribution to the model (relative contribution: contra = 32%, ipsi = 37%). Previously, we have tested the contribution of the theta and phi features by comparing two models, one that only used position and speed (Movement model) and one that also included the two angular components phi and theta (Target Model). For a subset of electrodes, the held-out predictions were significantly better using the Target Model, a result we take as further evidence of electrode tuning within our dataset.
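The comparison of held-out predictions between the Movement model and the Target Model can be sketched with a plain R² computation. This is an illustrative example only: the response does not describe how the authors fitted or scored the models, so the function name and the numbers below are hypothetical, and the two prediction lists stand in for the outputs of already-fitted models on held-out trials.

```python
def held_out_r2(observed, predicted):
    """Coefficient of determination on held-out data: 1 - SS_res / SS_tot."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up held-out HFA values for one hypothetical electrode.
hfa_held_out = [1.0, 2.0, 3.0, 4.0]
# Stand-in predictions: a model that only captures the mean response ...
movement_model_pred = [2.5, 2.5, 2.5, 2.5]
# ... and a model whose extra features track trial-to-trial differences.
target_model_pred = [1.1, 1.9, 3.2, 3.8]

r2_movement = held_out_r2(hfa_held_out, movement_model_pred)
r2_target = held_out_r2(hfa_held_out, target_model_pred)
```

An electrode for which the richer model scores reliably higher on held-out trials is a candidate for target tuning. Whether the gain is significant would still need a separate test across folds or permutations.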

      The figure below shows an electrode located in M1 that is tuned to targets when the patient reached with their contralateral arm as an example. We believe that having an explicit depiction of how the four features contribute to the HFA predictions will help the reader evaluate the model. These points are now addressed in the text in the results section discussing Figure 6.

    1. Author Response:

      Evaluation Summary:

      This manuscript addresses outstanding questions about the molecular mechanisms by which the two types of arginine-methylating enzymes affect the processing and fate of transcripts in mammalian cells. This work makes important inroads into these questions, uncovering an inverse effect of the two types of enzymes on intron retention during post-transcriptional splicing, linking the effects to specific target proteins. With better support of some key claims, the paper will provide a lot of new information about the functional consequences of asymmetric and symmetric demethylation.

      We thank the reviewers for their support of the study and hope that the described revisions better support the key claims.

      Reviewer #2 (Public Review):

      Previous work has established that inhibition or knockdown of the Type II (symmetric) arginine methyl-transferase PRMT5 has global effects on splicing, and although it is less well-characterized, loss of the major Type I (asymmetric) enzyme PRMT1 affects both splicing and RNA export activities. In both cases, inhibition or depletion of the enzymatic activities of these proteins has been shown to negatively impact cancer cells, but the specific targets that are required for the RNA processing effects have not been identified. The key findings here are that levels of transcripts containing unspliced introns were inversely affected by the two classes of inhibitor, with intron retention increasing upon PRMT5 inhibition, and decreasing relative to the control in the case of PRMT1 inhibition. The affected introns were shown to be localized to the nucleus, indicating that they belong to the class of 'detained' introns (DI). Using kinetic assays to measure transcriptional elongation and splicing rates, the authors concluded that PRMT inhibition affects DI levels post-transcriptionally. They found that spliceosome component SNRPB and nuclear RNA export factor CHTOP were both enriched in chromatin-associated, poly(A) RNA fractions, that SNRPB was specifically demethylated by PRMT5 inhibitors while PRMT1 inhibition demethylated CHTOP in the chromatin associated fractions, and that both knockdown of the methyltransferases as well as replacement of the modified arginine residues in each protein recapitulated the effects of the inhibitors. Together, these experiments provide strong evidence supporting a coherent mechanism of differential arginine methylation on RNA processing. They support and significantly extend previously published observations implicating the PRMT enzymes in gene expression. These findings are of broad interest to those who study RNA processing and transcription, cancer biology, and signaling through post-translational modifications.

      We thank the reviewer for support of our main findings and that they are of broad interest.

    1. Author Response:

      Evaluation Summary:

      The present work aims to increase our understanding of marine epizootics caused by the dinoflagellate parasite Hematodinium sp. in crabs. The work includes a large data set of field collected specimens from a wide geographical area. The authors have evaluated presence or absence of this parasite as well as co-infections by several other groups of pathogens and model the main factors that shape crab community structure. The topic of study is very important in the context of current marine pandemics and, therefore, adequate examination of this data set may lead to significant advances in the field. Refinement of the approaches to produce quantitative data is needed in order to reach more solid conclusions.

      We are grateful to the reviewers and editorial members of eLife for their evaluation of our submission and recognising the importance/rarity of our dataset for advancing our understanding of marine epizootics. In our revised text, we have included further quantitative data in the context of Hematodinium-parasitised crabs and re-run all our analyses.

      Reviewer #1 (Public Review):

      The study would have benefited from qPCR instead of presence or absence. It is interesting to note that differences were observed when CFUs were compared. I understand that for most organisms, in the absence of a draft genome, assigning copy numbers to cells is not yet possible; however, it would have provided a more robust dataset for the statistical analysis.

      Similarly, histology relies upon one of a hundred possible for every single specimen. With this caveat, scoring the presence of the parasites rather than (+) (-) would also have been more informative. Without knowing the intensity of the infection for each of the other pathogens makes it more difficult to reject the hypothesis. The authors discuss other papers with cellular immune data, which is lacking in this manuscript. Observing fresh hemolymph and histology are just not enough. The data strongly support that parasite community composition is affected by the location. Smaller crabs also appear to be more likely to display co-infections compared to disease-free crabs.

      We thank reviewer 1 for spending their time considering our submission.

      We used population genetic markers for Hematodinium and crabs, and PCR-based molecular diagnostics, in addition to freshly withdrawn haemolymph and multi-tissue histology in our original submission. In our revised text we have incorporated additional quantitative Hematodinium loads in haemolymph, gill and hepatopancreas in crabs.

      In our original submission, we presented data that discriminated between Hematodinium-positive crabs (n = 162) and Hematodinium-negative crabs (n = 162) based on targeted PCR. Rather than use qPCR – due to the absence of sufficient molecular information to ensure single-copy gene targets per genome for Hematodinium spp. and the diversity of ecotypes presented in our study – we have incorporated count data (actual number of parasites per mL haemolymph) from the original liquid tissue screens using haemocytometry, and grade data (0-4) from solid (gill and hepatopancreas) tissues assessed using histology. In our view, these quantitative data are more powerful than those that we could have achieved via qPCR.

      We have analysed these additional data in the context of co-infection presence. Indeed, when we restricted the Hematodinium-positive crabs to n = 111 based on individuals that were positive for the parasite in all three diagnostic methods across key tissues (haemolymph via haemocytometry and PCR, and gill/hepatopancreas via histology), and re-ran all our analyses/models, the outcomes did not yield contradictory conclusions. Hematodinium-positive crabs were no more likely to contain co-infections when compared to Hematodinium-negative crabs, and location remains the determining factor for pathogen diversity among crabs.

      Reviewer #2 (Public Review):

      Strength:

      1. The authors looked into the prevailing idea that parasitic infection makes crabs immune-compromised, although evidence to support this idea is lacking except for one study where immune gene expression was found to be modulated upon Hematodinium infection in Japanese blue crab. Apparently, the authors studying gene expression in Japanese blue crab upon parasitic infection did not identify any effector molecules of parasitic origin.
      2. It was interesting to note that haplotype diversity analysis revealed higher genetic diversity in the host compared to the Hematodinium parasite in two locations examined.

      We thank Reviewer 2 for their time and insight.

      Weakness:

      1. The authors should clarify how the sample sizes were selected.

      We apologise for the unintentional ambiguity. The initial survey covered ~50 crabs per location per month for 1 year (n = 1191). A power calculation using an alpha value of 0.05 and a desired power >80% indicated that a minimum of 38 (1-sided test) or 48 (2-sided test) crabs were needed, based on an a priori prediction of 15% Hematodinium prevalence in the C. maenas population. The 162 Hematodinium-positive crabs (PCR) were size/sex/location matched to 162 Hematodinium-negative crabs.

      The methods section has been edited to ensure that this information is clear.

      1. It is also recommended that details on sample collections are included (for example the times in which samples from the surrounding waters of infected crabs)

      More details on sample collection have been included in our revision.

      Reviewer #3 (Public Review):

      This manuscript aims to assess the main factors that drive disease in crabs across Europe. Crabs are critical members of the trophic chain in marine environments and also serve as food source for human consumption. Therefore, understanding the health of this group of organisms is very important for ecosystem health as well as human health. This is a well written manuscript that identifies important knowledge gaps in the field. The authors evaluate whether crabs are infected by an important parasitic dinoflagellate, Hematodinium, and question several current dogmas in the field. The first one is whether or not presence of Hematodinium infection is a significant factor driving co-infections by other pathogens including trematodes, bacteria or fungi. The authors, based on their data set, conclude that this is not the case, since co-infections are observed in similar proportions in Hematodinium-free animals. The authors perform in depth assessment of other drivers of infection and identify geographical location as the main factor driving community structure.

      We thank Reviewer 3 for their time and insight.

      A second important aspect of Hematodinium biology is the previous notion that this parasite immunosuppresses the crab host (and therefore co-infections may be found). However, there is a paucity of studies to confirm or reject this hypothesis, and therefore, the current study is important to expand our current knowledge of marine invertebrate immunobiology and specifically in the mechanisms by which Hematodinium is harmful to the host. The current study provides some evidence that, in fact, this parasite is not an immune suppressor but rather evades the cellular immune response of the host. In particular, the authors claim that the host hemocytes (immune cells) fail to phagocytose (engulf) and encapsulate (surround and fend off) this parasite. These two cellular immune responses are the most common in invertebrates such as crabs.

      Overall, this study is important because very few studies evaluate wild marine diseases, especially in invertebrate hosts and because it contributes with a large data set from a wide geographical range.

      We thank Reviewer 3 for their time and insight.

    1. Author Response:

      Reviewer #1:

      This manuscript explores the role of ER and PR in the endometrial cancer cell model called the Ishikawa cell line. The authors conduct a series of detailed experiments to assess the estrogen (E2) and progesterone (R5020) response in this model and show that E2 can promote cell growth which is subsequently inhibited by co-treatment with R5020. RNA-seq revealed E2 or R5020 gene targets with most differential genes being unique to the treatment condition. The functional role of PR was assessed and confirmed on a specific locus of interest and using a reporter assay. ChIP-seq was conducted revealing gained PR binding events following R5020 and some that were already present prior to treatment, as well as sites that were lost. Substantially more PR binding sites were observed in the T47 breast cancer model and the authors mimic the elevated levels of PR, by expressing exogenous PR in Ishikawa cells and conducting a series of ChIP-seq experiments. Analysis of specific binding regions revealed the enrichment of motifs for the Pax family of transcription factors and the authors assess the hormonal regulation of PAX2 cellular expression. PAX2 ChIP-seq was conducted, revealing very few binding peaks and these were partially integrated with the PR and ER binding peaks. Finally, a Hi-C experiment was conducted, revealing that ER, PR and PAX2 binding occurs in genomic compartments and specific gene signatures were derived from this analysis.

      This is a topical area and the work is of potential interest, but several key issues need to be addressed:

      • The role for PAX2 (over other family members) is inferred by the enrichment for motifs specific to that family member, but the motif enrichments are not good as defining individual family members that share a common motif. What are the expression levels of the PAX family members in the Ishikawa cell line and in primary endometrial cancers, to support the role for PAX2 over other family members? This wouldn't require any experiments and could simply involve analysis of public expression datasets.

      We agree with the reviewer that “the motif enrichments are not good as defining individual family members that share a common motif”. Besides choosing PAX2 because its motif was the most represented in PR and ERbs, we evaluated expression of PAX2 and other family members in publicly available normal and endometrial cancer samples. The expression levels of the PAX family members in Ishikawa cells and in primary endometrial cancer samples have been detailed in Figure 5, Figure 5-Figure Supplement 1, Figure 7 and Figure 7- Figure Supplement 1 (see new version of the manuscript). Although other PAX family members seem to be more expressed than PAX2 in tumors (like PAX8), many reports link the loss of PAX2 in endometrial tissue to bad prognosis of tumors (EIN) (see Sanderson et al, 2017 in revised manuscript). This was included in the discussion and now it has been included in the corresponding result argument connected to Figure 5 and Figure 7 of the revised manuscript.

      • The authors conclude that PAX2 binding overlaps with PR in pre-treated cells, but data in Figure 5C and 5D could simply represent co-binding at open enhancers, which are notorious for recruiting many transcription factors that are expressed in that cell type. What is the overlap in peaks between PAX2 and PR/ER, ideally via a Venn diagram or some visual that allows for a comparison of the total number of peaks for each factor and the common ones? Is there a statistically enriched co-binding of PAX2 to PR/ER sites, at levels that are more than expected?

      The fraction of overlap between sets of binding sites was studied in a pairwise fashion and is graphically represented in Figure 5C of the revised manuscript. Also, we performed a Fisher test to statistically evaluate the significance of the co-localization between PAXbs and hormone receptors, which was included in Figure 5E and F. Results were included in the text of the revised manuscript (line 314).
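A one-sided Fisher's exact test for co-binding can be sketched as follows. This is a minimal illustration, not the authors' analysis: the response does not state how the background set of regions was defined, so the "neither" count (candidate regions bound by neither factor) is an assumption the analyst has to supply, and the function name and counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(overlap, only_a, only_b, neither):
    """P(overlap >= observed) under the hypergeometric null of independence.

    The 2x2 table counts regions bound by both factors, by A only,
    by B only, and by neither (the assumed background).
    """
    n_a = overlap + only_a
    n_b = overlap + only_b
    total = overlap + only_a + only_b + neither
    max_overlap = min(n_a, n_b)
    # Sum the hypergeometric probabilities of all overlaps at least as large.
    tail = sum(comb(n_a, k) * comb(total - n_a, n_b - k)
               for k in range(overlap, max_overlap + 1))
    return tail / comb(total, n_b)


# Small made-up table: 3 shared sites, 1 site unique to each factor, 3 unbound.
p_enrichment = fisher_exact_greater(overlap=3, only_a=1, only_b=1, neither=3)
```

The result depends strongly on the background. Restricting it to open chromatin regions, for example, gives a more conservative test than using the whole genome, which speaks to the reviewer's concern about generic co-binding at open enhancers.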

      • The link with PAX2 is potentially exciting but is not convincing in its current form. The only real evidence linking PAX2 to ER/PR is that PAX2 cellular location can be altered and there are some binding peaks where there is co-enrichment, but this would likely happen with any transcription factor expressed in that cell line model. What is currently missing (and essential), is some evidence providing a functional link between ER/PR and PAX2. Is PAX2 required for PR and/or ER function, either ER/PR binding or induction of target genes? If not, then the data on PAX2 is circumstantial and isn't really relevant to the transcriptional pathways regulated by PR or ER.

      We completely agree with the reviewer. Functional data linking PAX2 to ER/PR signaling were studied using depletion of PAX2 with specific siRNA, showing clear effects on PR binding and hormone-regulated genes. These effects were observed in both E2 pretreated and non-pretreated cells, indicating that PAX2 is involved in many aspects of PR regulatory action and its interplay with ERalpha. Results on PR binding and gene expression with or without siRNA against PAX2 have been included in new Figure 5 (line 329 of revised manuscript). M&M and discussion were also revised accordingly.

      • There is no explanation put forward to the 307 lost PR sites.

      The sentence has been removed because we do not have replicates of 30 min time point ChIP-seq samples.

      • The GEO dataset indicates that only one replicate was conducted for the ChIP-seq experiments. This does not meet the minimum ENCODE requirement and many of the differential peaks (i.e. the 307 lost peaks) are potentially false positives that result from having only one replicate.

      We processed data from replicates and found similar results between them (see Figure 2 -Figure Supplement 2). While PRbs in E2-pretreated cells and ERbs were mostly reproduced (80% and 70%, respectively), PRbs in non- pretreated cells showed 40% of common peaks. This is mainly due to the fact that one of the replicates produced a higher number of peaks above threshold during peak calling, forcing a lower percentage of common peaks. However, we have demonstrated that almost all reported PRbs were independently reproduced by PR ChIP-seq data in other conditions, namely E2-pretreated PRbs and PRbs from FPR cells (Figure 2 and Figure 2-Figure Supplement 2). It is important to note that only PRbs from E2-pretreated cells and ERbs promoted the main conclusions of our manuscript. We will upload new data to GEO as soon as possible.

      • The authors claim that the motif enrichment supports a conclusion where monomer PR could bind at 30 min and dimers at 60 min, but there is no direct evidence that this is the case. Unless the authors plan to pursue this functionally and can show dimeric vs monomeric binding, this statement should be removed, as it is not backed up by data and the presence of a half site vs a full palindromic motif does not provide evidence for the genuine mode of binding.

      The statement has been removed.

      • All the work is conducted in a single cell line model. I understand that there are few endometrial cancer cell line models and I also acknowledge that the authors have conducted a complicated series of genomic experiments and it would be unrealistic to repeat these in another model. However, the findings from this one model should reveal new insight that can be validated in either another model or in a cohort of clinical samples of the cancer types. But, in its current state, neither are done. The authors attempt to extract gene signatures from the genomic data to assess in patient cohorts, but the data (see my next comment) is not compelling or convincing and the only conclusion I can make, is that out of the hundreds of somatic mutations and hundreds of PgCR genes, only a handful of genes correlate with outcome. I suspect the same conclusion could be made with a random set of several hundred genes.

      We reformulated the analysis of tumor samples to avoid biased conclusions. Results are part of the new version of Figure 7.

    1. Author Response:

      Reviewer #2 (Public Review):

      Yu et al provide a comprehensive set of experiments to determine that bradyzoites have much slower cytosolic Ca2+ parameters, which impact on gliding motility, a key process of Toxoplasma spread and persistence.

      The only main criticism that I have is the use of the MIC2-GLuc reporter to measure microneme secretion in bradyzoites. Do bradyzoites have any appreciable level of MIC2 and its associated protein M2AP? This is important because it may affect the outcome. If bradyzoites do not, then the MIC2-GLuc reporter might not have appropriate levels of M2AP to correctly traffic to the micronemes. I recommend that the authors quantitate, either by western blot or IFA, the levels of MIC2 and M2AP in bradyzoites versus tachyzoites and also show that M2AP co-localises with MIC2-GLuc to give confidence that MIC2-GLuc is trafficked correctly and thus the low readings of secretion are not just a result of the reporter being mistrafficked. It would also be pleasing to see that 1 hr incubation leads to restoration of MIC2-GLuc secretion.

      We acknowledge that the expression and localization of MIC2-Gluc reporter is a potential concern. We performed western blotting (Figure 2C) and IFA (Figure 2 supplement 1A) to confirm that bradyzoites express MIC2-Gluc and M2AP albeit at lower levels compared with tachyzoites. Moreover, MIC2-GLuc and M2AP were properly co-localized to the apical end in bradyzoites, ruling out the possibility of mis-localization of the MIC2-GLuc reporter. Based on these results, we believe that MIC2-GLuc provides a reliable read-out for microneme secretion in in vitro differentiated bradyzoites. Additionally, the conclusion that MIC secretion is dampened in bradyzoites is also supported by the studies using the FNR-Cherry reporter in Figure 2E,F,G.

      Reviewer #3 (Public Review):

      This is a first study that looks in detail at Ca-controlled gliding motility and ATP supply in bradyzoites. A comparison of such different parasite stages by manipulating Ca and ATP metabolism is challenging. Intervention by chemical compounds needs to overcome a prominent cyst wall and the usage of genetic tools needs to consider the broad changes in protein expression between tachyzoites and bradyzoites as well as a heterology between individual bradyzoites. The authors used excysted bradyzoites to exclude the cyst wall as a diffusion barrier as a major factor in the efficacy of different Ca agonists. To address differences in expression levels between tachyzoites and bradyzoite stages the authors developed a ratiometric Ca sensor based upon an autocleaved GCaMP6f-BFP dimer protein.

      Overall the conclusions are well supported but there are methodological questions that need to be addressed.

      Bradyzoites show a heterogeneous expression of Bag1 / Sag1 markers as well as heterologous proteins. This is shown in Fig 1A and Fig 2b for example. However, in most time-dependent measurements of Ca-dependent fluorescence (Fig 2G, 3D) the authors only average three cells. This appears to be insufficient to represent the bradyzoite population. What is the variance between the three measured cells?

      We have quantified more cells in all figures related to fluorescence measurements. For measurements of single parasites in Figure 5B, 5D, 5E, 6F, 8A, 8B and Figure 7 supplement 1A, we have now quantified 10 parasites for each condition and plotted the data as means ±S.D. to show the variance. For in vitro induced cysts or ex vivo cysts in Figure 2G, 3D, 3E, 4C, 4G, 6E, 7B and Figure 4 supplement 1A, we measured 5 cysts or vacuoles per condition. Because these samples contain many parasites within each vacuole or cyst, they represent a greater sample size. The data are also plotted as means ±S.D.

      In addition, the Mic2 promoter driven Gluc-myc protein is not expressed in all bradyzoites. This is perhaps not surprising as Mic2 seems to be downregulated in bradyzoites according to Pittman and Bucholz et al dataset in ToxoDB. If interpreted correctly the lower expression of Gluc in some bradyzoites would favour an underestimation of the RLUs in Fig 2D.

      We acknowledge that the expression and localization of MIC2-Gluc reporter is a potential concern. We performed western blotting (Figure 2C) and IFA (Figure 2 supplement 1A) to confirm that bradyzoites express MIC2-Gluc and M2AP albeit at lower levels compared with tachyzoites. Moreover, MIC2-GLuc and M2AP were properly co-localized to the apical end in bradyzoites, ruling out the possibility of mis-localization of the MIC2-GLuc reporter. Based on these results, we believe that MIC2-GLuc provides a reliable read-out for microneme secretion in in vitro differentiated bradyzoites. Additionally, the conclusion that MIC secretion is dampened in bradyzoites is also supported by the studies using the FNR-Cherry reporter in Figure 2E,F,G.

      The maturation of bradyzoite takes several weeks. This cannot be accomplished with currently available system in vitro and the authors use 1 week matured bradyzoites. To facilitate comparability to data from other manuscripts it would be helpful if the authors could quantify the differentiation stage of the in vitro bradyzoites. This could be done by measuring the fractions of Bag1-positive and Sag1-negative bradyzoites.

      We thank the reviewer for this useful comment. We have quantified the percentage of BAG1-positive SAG1-negative bradyzoites within each cyst induced for 3, 5 or 7 days by IFA and spinning disc confocal microscopy (Figure 3 supplement 1A). This analysis demonstrated that the percentage of BAG1-positive and SAG1-negative bradyzoites reached ~70% at day 7 after induction (Figure 3 supplement 1B). For this reason, we used a 7 day induction treatment for the majority of experiments. Also, where imaging was used in the analysis, we focused on regions of in vitro differentiated cysts that expressed high levels of BAG1-mCherry.

      The mCherry and GCaMP6f signals in Fig 3B seem mutually exclusive. This may be due to differences in calcium signalling between Bag1 pos or neg parasites or due to expression differences of GCaMP6f.

      To test the possibility of expression differences in GCaMP6f, we quantified the fluorescence of BAG1-mCherry and GCaMP6f in different bradyzoites within the cyst shown in Figure 3B. At time 0 prior to stimulation, we observed heterogeneous expression of BAG1-mCherry while the signal for GCaMP6f expression was relatively constant (Figure 3B supplement 1C and 1D). In contrast, when in vitro differentiated bradyzoites were stimulated with A23187, they showed reduced levels of GCaMP expression in cells that were strongly positive for BAG1-mCherry (Figure 3B). Collectively, these findings are consistent with the difference in GCaMP fluorescence being due to dampened calcium responses in bradyzoites rather than expression differences. This conclusion is supported by studies on GCaMP responses in cells where we normalized for expression level using a dual-expression BFP reporter in Figure 6. Therefore, we do not think that heterogeneity in the expression of GCaMP is responsible for the observed dampened response in bradyzoites.

      The authors use syringe, trypsin-released and FACS sorted bradyzoites in multiple Ca assays. How can it be excluded that this procedure affects (depletes) Ca stores?

      In all the figures except Figure 2C-2D, we did not use FACS to sort bradyzoites. Instead, we scraped cells cultured at pH 8.2 and used syringe passage through a 25g needle followed by centrifugation. Cyst pellets were resuspended and digested with trypsin to liberate bradyzoites. For tachyzoites, all procedures were similar except that we did not use trypsin digestion. As a control, we have now treated tachyzoites similarly with trypsin and monitored the calcium stores using ionomycin. We found that trypsin digestion did not affect the calcium stores or response, as shown in Figure 7 figure supplement 1A.

      In my opinion, several experiments in this manuscript would benefit from clarification of this point. For example: In Fig 7A Fu et al measure Ca for 5min during trypsin digestion; however, for gliding assays cysts are digested for 10min. The Ca monitoring should cover the complete 10min of trypsin digest.

      We understand the concern, but there were practical reasons for the slightly different times used. In panel A, where we are monitoring calcium during trypsin digestion, the majority of cysts are dispersed after 5 min, resulting in the parasites being out of focus. As such, it is not practical to monitor beyond this time point. In panel C, we were interested in observing parasites after the cysts were fully digested, and hence we used a slightly longer time period to allow complete digestion and for the parasites to settle to the bottom of the dish before further recording. In this instance, similar to the result in A, most parasites remained dormant and did not show elevated calcium levels. In the figure, we are selectively showing a rare example where calcium signaling was observed in order to compare the patterns to what is normally observed with tachyzoites. These combined panels are not meant to be a comparison of kinetics, as this aspect is tested more directly in later experiments. We have modified the text to make the rationale for this experiment clear.

      In Fig 2B Fu et al digest infected monolayers with trypsin to release mCherry from cyst matrices. How can the authors exclude that trypsin is digesting the mCherry protein in this assay?

      I think the reviewer means 2F as in 2B we are using BAG1 mCherry to visualize bradyzoites – but they are not being liberated in this image. In 2F we use a different construct, FnR-mCherry that directs the reporter to be constitutively secreted to either the PV (surrounding tachyzoites) or the cyst matrix (surrounding bradyzoites). When the cysts are disrupted with trypsin, the mCherry is likely to disperse and may also be digested. However, this would not happen if it remains inside the parasite. This control is provided to show that the protein is secreted into the matrix. We have revised the text to clarify the use of this control.

      Fig 7 E,F: the authors measure shorter gliding distances of bradyzoites as compared to tachyzoites. Trails of both parasites, however, are visualized using different antigens that may have different shedding behavior on the FBS-coated glass surface. The Bag1 trail also depends on Bag1 expression, which is shown in numerous images to not be equal among individual bradyzoites. This point is very challenging to address but should at least be discussed.

      BAG1 is used here to discern the bradyzoites, not to detect the trail. Trails are stained with either SAG1 or SRS9 – corresponding to the most abundant surface GPI anchored antigen in each stage. Since these proteins are part of the same C-C fold family and are similarly anchored, we feel they are comparable. We have added the following statement to the results: “These two surface markers are both members of the cysteine rich SRS family that are tethered to the surface membrane by a GPI anchor, thus they represent comparable reporters for each stage.”

      Fig 7E: Bradyzoites are considered to satisfy their ATP needs mostly via glycolysis and the data shown do support this capability. I find the ability of OligomycinA to block glucose-dependent gliding surprising as this suggests a necessary mitochondrial transport chain for ATP-production from glucose. This result should be mentioned clearly in the text and its implications discussed.

      The Discussion has been revised as suggested.

      Figure 8: The authors claim a recovery of bradyzoite ATP and Ca levels after 1hr incubation with carbon sources and Ca, that together enable efficient gliding. However, the elevation of bradyzoite ATP occurs after the parasites spend 2 hours in glucose-free and Ca-free conditions, whereas gliding assays are done after a short 10min trypsin digest. I am not entirely convinced that low ATP levels post-egress are responsible for the low gliding activity. Ideally gliding assays should be done after a similar purification procedure to correlate the two experiments.

      We have repeated the gliding assays using bradyzoites purified in the same manner as for the ATP measurements and found the same result: a combination of exogenous calcium and glucose enhances recovery of gliding motility (Figure 8D, 8F). In addition, we used the same time point to purify bradyzoites for MIC2-GLuc secretion and found that exogenous calcium and glucose also led to an increase in MIC2-GLuc secretion, indicative of the recovery of microneme secretion (Figure 8C).

    1. Author Response:

      Reviewer #3 (Public Review):

      Miskolci et al have investigated if it is possible to measure the natural fluorescence of two important co-enzymes (NADH/NADPH and FAD) in living cells to determine their metabolic status. This tests the hypothesis that changes to the relative ratio of NADH/NADPH to FAD+ reflect a shift between glycolytic and oxidative phosphorylation in living macrophages. To investigate this they have used 2-photon FLIM to measure intensity and fluorescence lifetime of NADH/NADPH and FAD+ in mouse macrophages in vitro and zebrafish macrophages in vivo in a tail injury model. By comparing their measures of NAD(P)H and FAD+ from macrophages responding to different injury or infection cues and comparing this to a marker of inflammation (TNF-alpha) they argue that there is a reduced redox state indicative of glycolytic metabolism in pro-inflammatory macrophages.

      The adoption of label free imaging techniques to measure metabolic processes in cells in vivo is a valuable and important development that, although not novel to this work, will help researchers to probe cell biology in situ. FLIM using time correlated single photon counting (TCSPC) allows an accurate and robust measure of the relative state of a molecule that shows changes in its fluorescent lifetime as a consequence of changing chemical state. Although Stringari et al (doi.org/10.1038/s41598-017-03359-8) were the first to describe the utility of wavelength mixing FLIM for measuring NAD(P)H and FAD+ levels in zebrafish, they did not focus on macrophages which is the focus of this work.

      The results from this work are interesting, as they argue that it is possible to determine cell metabolism in cells within living animals without a need to use a genetically encoded sensor and they argue that pro-inflammatory macrophages in zebrafish appear to have a lower redox state, which may reflect a more glycolytic metabolism. This assumption is not tested but rather inferred based on the measures of fluorescence intensity and lifetime of endogenous NADH/NADPH and FAD coupled with a small metabolic sampling of injured tissue. This lack of corroboration for the supposed difference in metabolism between pro-inflammatory and non-inflammatory macrophages is a weakness of the paper and makes it hard to accept the conclusion that the redox state may reflect different metabolic profiles. A biosensor for NADH/NADPH (iNap) has been demonstrated to be a sensitive tool for measuring NADPH concentration in vivo in zebrafish during the injury response (Tao et al., doi: 10.1038/nmeth.4306) and it would be intriguing to know how similar the response of this biosensor is to the label-free measurements described using FLIM. This is additionally relevant as the authors also note that in mouse macrophages cultured in vitro, they observe an opposite redox response, which is well supported by the literature and a variety of different methods. Why the zebrafish macrophages should show a different redox state to mouse macrophages is not clear, and an alternative explanation is that the measures used do not directly reflect the metabolic profile of the cells. One further caveat to the chosen method of using fluorescence lifetime to measure the redox state of NADH/NADPH is that the lifetime of NADH is affected by which proteins it is bound to. This is not accounted for in the method used for calculating the redox ratio used for defining the redox state and could potentially alter the interpretations of relative NADH/NADPH levels in a cell. The authors acknowledge this, but do not consider whether this would affect the conclusions they arrive at from their measures of NAD(P)H intensity and fluorescence lifetime in macrophages.

      We thank the reviewer for their comments. We have added additional data that indicate that the imaging does indeed reflect the metabolic profile of the cells (see Metformin and STAT6 data).

    1. Author Response:

      Reviewer #1 (Public Review):

      The reviewer believes that there is a fundamental problem with the approach of the current MS. Dense reconstruction from serial EM images is a powerful tool for revealing the connectivity matrix in many brain areas, where the majority of synaptic connections are made by glutamatergic pyramidal cells and GABAergic interneurons. Many studies have convincingly demonstrated that the site of synaptic communication among these cells is the well-known EM-defined synapse, with a presynaptic cloud of vesicles and a rigid presynaptic active zone membrane facing a rigid postsynaptic membrane that either has or does not have a pronounced postsynaptic density. We know from many EM localization results that, e.g., the active zone contains the essential molecules of the release sites and the presynaptic Ca2+ channels, and the PSD contains the appropriate receptors. Thus, with this information, when the connectome is created from serial EM sections, the sites of communication can be defined based on the EM images. To the knowledge of the reviewer, such pre-existing information is lacking for the DA varicosities. The authors argue that almost all varicosities lack synapses. Verification of such a statement would require the molecular characterization of these varicosities, demonstrating that the molecules essential for vesicle docking/priming/release are lacking. However, if these molecules are present in these varicosities without forming an apparent active zone, then the conclusion of the MS is misleading.

      We thank the reviewer for this point and now believe we have addressed it in the discussion.

      The authors demonstrate the clear labeling of DA neuronal processes using the cytoplasmic-targeted Apex2. However, due to the well-known masking effect of DAB precipitate in the cytoplasm, which prevents the unequivocal identification of vesicles, the authors decided to use the mitochondria-targeted Apex2 in the first half of the MS. However, for the cocaine part, they turned to the cytoplasmic version for some reason. They then analyzed the axonal branching structure and the varicosities/contact points. The reviewer cannot see how the latter was achieved with densely filled DAB-containing structures.

      We apologize for the confusion. There are several reasons we turned to cyto-Apex for the cocaine part of the MS. We have added a broader discussion of this topic and reproduced it below.

      "we used cyto-Apex2 for reconstructing axons and their contact points for a several reasons. First, we found that tracing axons in low resolution EM data sets, for both controls and experimental groups, was substantially easier in cytoplasmic Apex2 axons, thus increasing our tracing throughput. In addition, we found that contact points, (e.g., spinules), were also easier to detect. Detecting either a darkly Apex2 labeled cytoplasmic process in an unlabeled structure, or vice versa, was easier because of the stark contrast between Apex2 labeled and unlabeled processes. In Figure 6 and Figure 6-Fig Supplement 2 an example of this difference is shown. While it is possible that cytoplasmic Apex2 expression could potentially obscure the internal contents of varicosities (e.g., synaptic vesicles or endoplasmic reticulum), we found little evidence that cyto-Apex2 obscured the relevant ultra-structural features that were investigated here."

      The evidence for DA varicosities making synapses (Fig4) is not convincing. The presented EM images do not have the quality/resolution to see the opposing rigid pre- and postsynaptic membranes and the widening of extracellular space in the cleft.

      To make the results clearer, we now show an 8-panel montage of each putative DA synapse as Figure 4- Figure Supplement 1 and 2. We hope that showing the serial sections that span the synapse will make details of the pre- and post-synaptic membranes more convincing. All the images in the manuscript are SEM, not TEM.

      Analyzing the structures immediately next to DA varicosities is questionable. If DA is indeed a volume transmitter, how would the authors know how far it can exert its effect? 1 or 5 microns? If 5 microns, there are many structures of all kinds (axon, glia, spine, dendrite) and only their DA receptor content will tell whether they are sensitive or not (and not necessarily their physical distance) to the released DA.

      We agree with the reviewer that this analysis distracts from the main finding of the story and is better investigated with a more accurate model of which types of cellular processes are sensitive to released DA. We have removed these experiments and their discussion from the manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      In their manuscript, Sengupta et al. describe a developmental mechanism that positions a single neuron across multiple layers in the hierarchical C. elegans nerve ring. The authors show that neighborhood placement of the interneuron AIB is established during embryogenesis and is maintained throughout development. AIB is one of the few C. elegans neurons that are divided into distinct pre- and post-synaptic regions, and its axons curiously occupy two physically separated neighborhoods or layers. How this occurs is not known. This study uses time-lapse imaging to show that unlike canonical axon tip outgrowth mediating fasciculation in a target region, AIB's axon occupies two neighborhoods by first growing completely into one, and then gradually unzippering from the first, switching, and zippering onto the second neighborhood. Importantly, axon outgrowth and neighborhood choice are continuously visualized during embryogenesis, an impressive experiment typically constrained by lack of cell-specific reporters during early development as well as the struggle of imaging embryos.

      The authors posit that zippering is mediated by temporally regulated differential adhesive forces between AIB's neighboring pre- and post-synaptic neurons. How this differs from differential adhesion in classic fasciculating neurons is described but could be made much clearer. They proceed to identify the immunoglobulin syg-1/syg-2 receptor-ligand pair to be necessary and sufficient for AIB's axon switch; in syg-1/syg-2 mutants, AIB is not able to position itself in the second neighborhood and remains fasciculated with the first one, suggesting that adhesive forces are dampened in syg-1/syg-2 mutants. Lastly, the authors show that pre-synapse assembly follows zippering, linking AIB axon placement with synaptogenesis, and that this is also compromised in syg mutants.

      The pipeline used to study axon outgrowth at a single-cell level in the embryo at relevant time points is commendable and will be useful to people studying C. elegans nervous system establishment. Although the overall manuscript and data are well-presented, we think the mechanism of retrograde zippering could be better described. Also, syg-1/syg-2 expression needs to be delineated to support the notion of differential adhesion between neighborhoods.

      We have further clarified the novelty of the zippering mechanism, contrasting it with tip-directed outgrowth. We have also performed a thorough analysis of syg-1 expression.

      Reviewer #2 (Public Review):

      A large amount of data is presented in this paper. The experiments are carefully documented and support the conclusions. Of particular importance is the live imaging of the outgrowth of the AIB neurite in the embryo. This is challenging and required the development of a new marker for labelling and the adaptation of a new type of microscope. This enabled the initial and surprising observation that part of the neurite relocates after outgrowth. I'm not sure that the mathematical modeling adds much. The main conclusion is that the modeling is consistent with a "net increase of adhesive forces in the anterior neighbourhood", which is to be expected. The authors then try to identify the relevant adhesion molecules and find that a pair of IgCAMs (syg-1 and syg-2), which are known to act as a receptor-ligand pair, are involved. A series of experiments establishes that syg-2 acts in the AIB neurons, whereas syg-1 does not. The neurite positioning defects in syg-1 and syg-2 mutants are partially penetrant, suggesting that other adhesion molecules must be involved. While a large percentage of mutant animals show defects, the defects within an individual animal are surprisingly low with only 21.5% +/- 4% of the neurite detached. This would suggest that syg-1/syg-2 aren't even the major adhesion molecules involved here. In further studies, where the authors ablate the RIM neurons (which express syg-1), the authors use a different measure to quantify the defects (minimal distance between neurite segments, Suppl Figure 7). This makes it difficult to compare the results to those of the syg-1 mutants. For the ectopic expression experiments with syg-1 the authors only report the percentage of animals with defects and not the extent of the defects (how much of the neurite was in an abnormal position).

      Overall, this is a very detailed study describing an important novel mechanism for neurite positioning within a nerve bundle.

      We have added in this revised version additional quantifications, including ‘the minimal distance between neurite segments’ measure (the one used for the RIM ablation experiments) in Figure 4-figure supplement 1 for the placement defects in syg-1(ky652) and syg-2(ky671) mutants, allowing direct comparisons between the phenotypes. We have also added a measure for the percentage of the distal neurite that is mispositioned in the ectopic expression experiments (Figure 6-figure supplement 1A). We do not claim that SYG-1/SYG-2 are the only adhesion molecules involved in AIB neurite placement, but that they are required for complete and proper placement of the neurite. We clarify this in the text.

      Reviewer #3 (Public Review):

      This is a very interesting manuscript describing the changes of neurite position in a complex neuropil during development. The experimental system is well chosen because AIB's function within the circuit requires its neurite to be in two different neuropil "neighborhoods". The manuscript included some technically difficult experiments of imaging neurite outgrowth in C. elegans embryos which are very hard to do. The surprising finding here is that neurite position is not solely dependent on its growth cone navigation. In the case of the AIB neuron, the growth cone is anchored after it reaches its destination point and then a segment of the neurite shifts direction towards its final position through a zippering action. They also show that this shift in position is driven by adhesion molecules SYG-1 and SYG-2. Overall, I think this is a strong candidate for eLife. I have one main point and a few minor points.

      My main point is about the relationship between synapse formation and neurite zippering. In my opinion, this is an interesting point because it would tell us if the zippering behavior is a consequence of synapse formation or if it is a distinct specificity step before synapse formation. From the time course that was described in the paper, it seems that the accumulation of RAB-3 only starts after the zippering has completed. I would suggest that the authors examine at least one other synaptic marker, like SNB-1 or SYD-2. We have created cell-specific endogenous labeling of several active zone markers that can be used for these experiments. If the results hold, then I think the authors should make it clear in the text that the zippering takes place before synapse formation and serves as a distinct step in achieving the neighborhood specificity.

      We thank the reviewer for generously sending us strains for cell-specific endogenous active zone protein labeling (McDonald et al., 2021) using the SapTrap method (Schwartz and Jorgensen, 2016). We made constructs expressing FLP recombinase downstream of the inx-1 and unc-42 promoters for cell-specific labeling of these active zone proteins in AIB and injected them into the strains. Although we observed cell-specific synaptic signal in larvae with both inx-1p and unc-42p-driven FLP, we were not able to observe signal during embryogenesis, probably due to cell-specific synaptic protein expression levels being low.

      Therefore, to address the reviewer’s question about the temporal order of zippering and synapse formation, we have cell-specifically expressed two active zone proteins in AIB (CLA-1 and SYD-2) and measured their intensities over time in AIB in embryos (Figure 7, Figure 7-figure supplement 1A-F). We find that, similar to RAB-3, synaptic signal is not visible until after the end of zippering, and progressively increases over time following zippering. These observations suggest that synapses do not initiate retrograde zippering. We added the time-course of active zone protein localization in AIB in the context of the time course of retrograde zippering (Figure 7J). Consistent with these observations, in a syd-2(ola341) allele identified in our screen, we find that although synapses are mislocalized, AIB neurite placement is unaffected, consistent with the idea that synapse formation is not upstream of zippering-mediated placement (Figure 7-figure supplement 1G-K).

      We acknowledge, however, that our studies are limited by detection of the synaptic proteins, and that, while it does not appear that synaptogenesis leads to zippering, a cooperative and synergistic relationship might exist between the process of zippering and synaptogenesis to hold the neurite position in place. We have added text to better discuss this relationship between zippering and synaptogenesis in light of these findings.

      Minor points

      1. The schematic diagram is somewhat misleading because in the axial view, the anterior and posterior segment of the nerve ring should appear on top of each other. The lateral view is the right view to show the anterior and posterior segments.

      This is a valid point - if we look at the worm directly head-on, the two segments of the neurite would be on top of one another. A slight tilt of the worm head enables visualization of the parts of the neurite in the two neighborhoods. We have now clarified this in the figure legends.

      2. Describe the screen that led to the mutant alleles of syg-1 and syg-2 better. Any other mutants?

      We have described the screen further in Methods and now include another allele, corresponding to syd-2 (Figure 7-figure supplement 1), isolated from the screen.

      3. "Consistent with the importance of adhesion-based mechanisms in the observed phenotypes, ectopic expression of the SYG-1 endodomain in the posterior neighborhood did not result in mislocalization of AIB (Figure 6-figure supplement 1A,B)." This statement is wrong. I suspect the authors meant in syg-2 mutants.

      The statement might have been confusing, but it is not wrong. We have found that ectopic expression of the SYG-1 endodomain, which lacks SYG-1’s extracellular domains, does not cause ectopic placement of the AIB neurite, which is what we described in that statement. We have edited the text to make this clearer.

      4. For Fig. 7-figure supplement 1, please quantify this phenotype.

      We have now included quantifications for this in Figure 7-figure supplement 3.

      References

      MCDONALD, N. A., FETTER, R. D. & SHEN, K. 2021. Author Correction: Assembly of synaptic active zones requires phase separation of scaffold molecules. Nature, 595, E35.

      SCHWARTZ, M. L. & JORGENSEN, E. M. 2016. SapTrap, a Toolkit for High-Throughput CRISPR/Cas9 Gene Modification in Caenorhabditis elegans. Genetics, 202, 1277-88.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary

      Moncunill et al set out to investigate a very important question: why are half of children vaccinated three times with RTS,S AS01 protected from clinical malaria - and half not? To do so they isolated PBMCs before vaccination and one month after third vaccination and stimulated them in vitro with DMSO (vehicle control), two malaria antigens (CSP (part of RTS,S) & AMA1) or HBS (hepatitis B antigen - part of RTS,S).

      They then assessed their transcriptional response by blood transcriptional module analysis and correlated those results with previous published data on antibody titers and T cell cytokine production to find associations. To assess risk of clinical malaria, responses were compared between RTS,S vaccinated children who developed clinical malaria in the one year follow-up (cases) and those who received RTS,S or a comparator vaccine and did not (controls). They found that responses after RTS,S vaccination did not predict protection from clinical malaria. Instead a blood transcriptional module signature related to dendritic cells, inflammation, and monocytes before vaccination may be associated with clinical malaria risk.

      Strengths

      Immune correlates of protection are evaluated in African children (who are the RTS,S target population) in a natural transmission setting.

      Excellent set of controls: children (same age) vaccinated with RTS,S or comparator vaccine alongside each other -> retrospectively stratified by whether they did or did not develop clinical malaria: this controls for the effect of a developing immune system and would allow disentangling RTS,S-specific and clinical malaria-specific response patterns.

      Weaknesses

      RTS,S is composed of CSP & HBS. Yet when PBMCs from children vaccinated three times with RTS,S are stimulated with these peptides, no transcriptional differences compared to children receiving a rabies or meningitis vaccine were detected (Figure 2). This lack of recall response impacts all downstream conclusions and comparisons made in the paper.

      The fact that bulk transcriptional profiling of Ag-stimulated PBMCs (and specifically to CSP) did not identify large significant differences in BTM expression between the RTS,S vs. comparator group could be due to several factors. First of all, the frequency of antigen-specific CD4+ T cells was very low among CD4+ T cells (Figure 4 of the manuscript shows that CSP-specific CD4+ T cells comprise < 0.004% of all CD4+ T cells). This low frequency of CSP-specific T cells is consistent with other RTS,S studies [e.g. as we state on line 330, we have previously found that CSP-specific T-cells in RTS,S/AS01 vaccinees comprise < 0.10% of all CD4+ T cells (1)]. Moreover, CD4+ T cells themselves comprise approximately 45-57% of all PBMCs (2). Thus, finding an expression signal between the RTS,S vs. comparator group would require the signal to be high enough to be detected in only 0.002% of all PBMCs [0.004% (% CSP-specific CD4+ T cells out of total CD4+ T cells) x 51% (average % of CD4+ T cells out of all PBMCs) = 0.002%]. Thus, lack of a detectable recall response does not mean lack of a recall response. Moreover, as suggested below, we opted not to focus the rest of the manuscript on the Ag-stimulation results.
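      The dilution estimate in the preceding paragraph can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the variable names are hypothetical, the 0.004% figure is the upper bound quoted from Figure 4, and taking 51% as the midpoint of the quoted 45-57% range is the same simplification made in the response.

```python
# Back-of-the-envelope check of the dilution estimate quoted in the response.
# All inputs are the figures stated in the text; names are illustrative.

csp_specific_pct_of_cd4 = 0.004            # % of CD4+ T cells that are CSP-specific (upper bound)
cd4_fraction_of_pbmc = (0.45 + 0.57) / 2   # midpoint of the quoted 45-57% range, i.e. 0.51

# Percentage of all PBMCs expected to be CSP-specific CD4+ T cells
csp_specific_pct_of_pbmc = csp_specific_pct_of_cd4 * cd4_fraction_of_pbmc

# The same quantity expressed as cells per million PBMCs
cells_per_million = csp_specific_pct_of_pbmc / 100 * 1_000_000

print(f"{csp_specific_pct_of_pbmc:.3f}% of PBMCs")           # 0.002% of PBMCs
print(f"~{cells_per_million:.0f} cells per million PBMCs")   # ~20 cells per million PBMCs
```

      Expressed as a count, this is on the order of 20 responding cells per million PBMCs, which illustrates why a bulk transcriptional readout could miss an antigen-specific signal of this size.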

      Second of all, the PBMCs were stimulated on site for 12h and then cryopreserved. This stimulation time was chosen based on the kinetics of the IFN-γ and IL-2 mRNA responses (3), but other responses may have had different kinetics and thus have already resolved or have not yet occurred by the 12-h cryopreservation. We have added text in the manuscript to discuss these caveats (“Another potential reason for why no BTMs were found to associate with the response to RTS,S/AS01 vaccination or with protection when analyzing CSP-stimulated PBMC is that all PBMC were stimulated on site for 12 hours (this stimulation time was chosen based on the kinetics of the IFN-gamma transcriptional response) and then cryopreserved. Thus, we were unable to detect earlier transient responses that had already resolved by 12 hours, as well as more delayed responses that had not yet initiated by 12 hours, if such responses occurred”).

      It should be noted that in all our analyses, the stimulated results were adjusted for DMSO to focus on the antigen-specific response only. This would explain why we detect signal in the DMSO samples but not in response to stimulation. We have realized that this was not very well described in the figure captions and the Methods section and have added more details, including the model description in Methods section. As such, we do not believe that these results impact all downstream conclusions. We believe that the unstimulated results provide significant new insights into the immune and molecular mechanisms of RTS,S vaccine efficacy, not necessarily directly related to the RTS,S-specific acquired immune response. Finally, we would like to highlight the fact that we have improved our model specification to directly account for the pairing of some of the samples using a random effect using the limma package. This has slightly increased statistical power, and as such the number of significantly differentially expressed BTMs in response to stimulation is a bit higher (but still much less than that for the DMSO). Originally, we had decided against the use of a random effect due to the computational cost of estimating the random effect.

      Transcriptional responses 1 month after the final RTS,S vaccination do not predict clinical malaria risk (Figure 3) - this is a key finding, which should be central to the conclusion of this paper.

      Considering that Kazmin et al. (4) showed that the transcriptional response to the third RTS,S/AS01 dose peaks at Day 1 post-injection, with some decline by Day 6 and approximately 90% of the response having waned by Day 21 (with the caveat that Kazmin et al.’s study population was malaria-naïve adults), we do not find it surprising that there were only a few BTMs whose levels at 1 month post-final RTS,S dose associated with clinical malaria risk. However, the point about the relative merits of the baseline is well taken. We have edited the Discussion to include discussion of the Month 3 correlates results:

      “Compared to the 45 BTMs whose baseline levels significantly associated with clinical malaria risk in RTS,S/AS01-vaccinated African children, fewer BTMs (seven) had levels at one month post-final RTS,S/AS01 dose that significantly associated with clinical malaria risk. Moreover, if a more stringent FDR cutoff had been used (i.e. 5%), six of these seven BTMs would not have been identified. Thus it is entirely possible that, at one month post-final RTS,S/AS01 dose, there is no circulating immune transcriptomic signature predictive of risk. Such a conclusion would not be surprising, given that in malaria-naïve adults, the transcriptional response to the third RTS,S/AS01 dose has been shown to peak at Day 1 post-injection, with some decline by Day 6 and approximately 90% of the response having waned by Day 21 (17). Therefore, it is likely that the sampling scheme in this study (one month post-final dose) misses the majority of the transcriptional response to RTS,S/AS01.”

      The take-home message put forward in the title/abstract (that a monocyte and DC related pre-vaccination signature predicts risk of clinical malaria in RTS,S vaccinated children) is not strongly supported by the data. It is based on blood transcriptional modules related to monocytes being picked out when comparing RTS,S vaccinated cases and controls.

      Thank you for giving us the opportunity to provide further rationale for our focus on the 7 monocyte-related and 4 DC-related BTMs shown in Figure 6B (MAL067 column) out of the 45 total BTMs whose baseline expression associated with clinical malaria risk in RTS,S/AS01-vaccinated children. The reviewer implies that these modules were chosen for focus somewhat randomly or without justification (or, even worse, “cherry picked”), which we would agree would be an imperfect method for drawing conclusions.

      First, we have consistently noted that the 45 baseline modules that correlated with risk in RTS,S recipients (Fig 6B, MAL067 column) belonged to many functional annotations, including DCs and monocytes. (Abstract: “In contrast, baseline levels of BTMs associated with dendritic cells and with monocytes (among others) correlated with malaria risk”) (Main text, lines 519-522: “Compared to the results from the month 3 analysis (7 BTMs), the baseline correlates analysis of MAL067 revealed a larger number (45) of BTMs, spanning many functional categories, whose month 0 levels in vehicle-stimulated PBMC nearly all associated with clinical malaria risk in RTS,S/AS01 recipients.”)

      The focus on DCs and monocytes is due to two reasons. First, the DC-related modules and the monocyte-related modules were some of the most significant correlations (lines 522-524: “The BTM with the most significant association with risk was “enriched in monocytes (II) (M11.0)” (FDR = 1.80E-14), followed by “inflammatory response (M33)” (FDR = 2.45E-07) and “resting dendritic cell surface signature (S10)” (FDR = 6.03E-07).”)

      Second, the baseline association of DC- and monocyte-related modules appeared to generalize across populations (Abstract: “A cross-study analysis supported generalizability of the baseline dendritic cell- and monocyte-related BTM correlations with malaria risk to healthy, malaria-naïve adults, suggesting that certain monocyte subsets may inhibit protective RTS,S/AS01-induced responses.”; Main text: “BTMs related to dendritic cells and to monocytes were most consistently associated with risk across these three studies [“resting dendritic cell surface signature (S10)”, “DC surface signature (S5)”, “enriched in dendritic cells (M168)”, “enriched in monocytes (I) (M4.15)”, “enriched in monocytes (II) (M11.0)”, “enriched in monocytes (IV) (M118.0)”, and “monocyte surface signature (S4)” significantly correlated with risk in all three studies].”)

      The first two sentences of the Discussion (lines 577-580) explain our focus on monocytes and DCs:

      “Our main finding is the identification of a baseline blood transcriptional module (BTM) signature that associates with clinical malaria risk in RTS,S/AS01-vaccinated African children. In a cross-study comparison, much of this baseline risk signature – specifically, dendritic cell- and monocyte-related BTMs – was also recapitulated in two of the three CHMI studies in healthy, malaria-naïve adults.”

      Finally, we note that the title (“A baseline transcriptional signature associates with clinical malaria risk in RTS,S/AS01-vaccinated African children”) is not restricted to DC-related or monocyte-related BTMs; rather, we chose this title based on the larger number of BTMs, and higher correlations with risk, in the baseline analysis compared to the Month 3 analysis.

      We have revised all instances where we have communicated this less clearly, e.g. “for why we identified a baseline monocyte transcriptional signature of risk” has been changed to “for why we identified monocyte-related BTMs in our transcriptional signature of risk”.

      Many other modules are picked out as well, e.g. cell cycle (Figure 6B). An in-depth analysis of the genes in these modules and what their up- and downregulation can tell us about their function is warranted to support the conclusions.

      Thank you for the suggestion to look at the cell cycle module in Figure 6B. You make a good point that this module is the only module to show a significant association with clinical malaria risk across all 4 of the RTS,S studies and should therefore be further examined. First, we have added this to the text:

      “Only one BTM, “cell cycle and transcription (M4.0)”, was significantly associated with risk across all four studies. Of the 335 genes in this module (M4.0), 130 were also present in one or more of the six “monocyte-related” BTMs shown in Figure 6B (297 genes total across all six BTMs), suggesting that the “cell cycle” and “monocyte” results may actually be picking up the same signal.”

      We have done the gene-level analysis as suggested, resulting in 8 new supplemental figures (Figure 6-figure supplements 1-8) and one new supplemental table (S5). We have also made the following revisions to the text:

      In Results: “To gain insight into specific module-member genes that may be involved in the RTS,S/AS01 baseline risk signature, we performed the same analysis on the gene level, i.e. examined associations with clinical malaria risk for each of the constituent genes in the 45 BTMs shown in Figure 6B. Figure 6-figure supplements 1-8 show the gene-level association results within the eight BTMs that were significantly associated with clinical malaria risk in MAL067 and at least two of the three CHMI studies, and had at least one gene in MAL067 that was significantly associated with risk (these eight correspond to M4.0, S10, S5, M168, M4.3, M11.0, M4.15, and S4). Within MAL067, 35 unique genes were shown to significantly associate with malaria risk (Supplementary Table 5); 9 of these genes (CCNF, MKI67, KIF18A, NPL, RBM47, CFD, MAFB, IL13RA1, and CCR1) also had significant association with non-protection in one of the CHMI studies. Although no individual gene was significantly associated with risk across >2 studies, many showed consistent effect (direction and magnitude) across 3 studies. This further supports our choice to focus on modules instead of individual genes as GSEA increases power to detect more subtle but coordinated changes in gene expression data that would be missed otherwise. For this same reason, GSEA has been shown to enhance cross-study comparisons (45).”

      In Discussion: “Our gene-level correlates analyses suggest an alternative hypothesis, however. With the caveat that the gene-level analyses were performed post hoc, high baseline expression of STAB1 (which is present in DC-related, monocyte-related, and cell cycle-related modules) was found to positively associate with clinical malaria risk (Figure 6-figure supplements 1, 2, and 6). STAB1 encodes stabilin-1 (also called Clever-1), a transmembrane glycoprotein scavenger receptor that links extracellular signals to intracellular vesicle trafficking pathways (58). Interestingly, stabilin-1high monocytes show downregulation of proinflammatory genes, and T cells co-cultured with stabilin-1high monocytes showed decreased antigen recall, suggesting that monocyte stabilin-1 suppresses T cell activation (56). Thus one possibility is that stabilin-1high immunosuppressive monocytes circulating at baseline could decrease protective RTS,S-induced T-cell responses, or inhibit another aspect of adaptive immunity. Single-cell transcriptomic profiling of PBMC or purified monocyte subsets in future RTS,S trials in African children in malaria-endemic areas could help test this hypothesis.”

      Impact

      This paper will inform future studies looking for correlates of RTS,S induced protection from clinical malaria in a variety of ways:

      It validates the blood transcriptional module approach (as published by Li S, Rouphael N, Duraisingham S, Romero-Steiner S, Presnell S, Davis C, Schmidt DS, Johnson SE, Milton A, Rajam G, et al: Molecular signatures of antibody responses derived from a systems biology study of five human vaccines. Nat Immunol 2014) to find target cell populations which can then be investigated in much more detail.

      It shows that studying PBMC recall responses after peptide stimulation post final vaccination is not the way forward, since no response is detected (Figure 2). Future studies can now take an alternative approach: for example, since unstimulated PBMCs (vehicle control) from RTS,S-vaccinated children were different from those who received a comparator vaccine (Figure 2), RTS,S vaccine signatures could be picked up much more easily by whole blood RNAseq.

      It implicates innate immune cells in shaping an individual's response to a vaccine - an exciting basis for future functional and mechanistic studies.

      We are glad the reviewer appreciates the value of the study.

      Reviewer #2 (Public Review):

      This paper reports a sub-study of the RTS,S/AS01 malaria vaccine Phase 3 trial, which aimed to identify groups of genes (blood transcriptional modules, BTMs) for which expression in DMSO or antigen-stimulated PBMCs was associated with clinical malaria during a 12-month follow-up period. Study subjects were infants and children who received either RTS,S/AS01 or comparator vaccines (meningococcal C for infants, rabies vaccine for children), enrolled in the study in Tanzania and Mozambique (with some additional analyses using samples from Gabonese infants).

      Using PBMCs collected at baseline before vaccination and 3 months later (a month after the third vaccine dose), stimulated with DMSO or parasite antigens, the authors used RNA-sequencing to identify BTMs which were different between recipients of RTS,S/AS01 vs comparator vaccines; which were different between baseline and month 3 in RTS,S/AS01 recipients; and which differed between RTS,S/AS01 recipients with a malaria episode and those without a malaria episode during the follow-up period. This combination of analyses might help to distinguish BTMs specifically associated with RTS,S/AS01 vaccine efficacy from those associated with other factors influencing susceptibility to malaria. To further aid mechanistic understanding the authors examined correlations between BTMs and measures of cellular and humoral immune responses. To try to establish generalisability the authors examined whether BTMs identified in African children were also associated with developing malaria in RTS,S/AS01-vaccinated malaria-naïve adults in the United States who underwent controlled human malaria infection (CHMI).

      Strengths of the study include:

      1) The relatively large number of subjects, the large amount of transcriptomic and immunological data which has been generated (and made publicly accessible), and the extensive analysis to evaluate associations between BTMs and numerous immunological variables.

      2) Clear explanation of both the rationale and methods for most of the analyses

      3) The attempt to validate findings in the CHMI studies

      4) Matching of subjects to try to eliminate the confounding effects of age, study site, and time of vaccination

      Weaknesses of the study include:

      1) Despite the relatively large size of the study, it is hard to know whether it had sufficient power to achieve its main objective, and we are not presented with data to demonstrate how successfully the authors managed to match subjects for age, timing of vaccination and follow-up duration

      We have added the following to our “limitations” paragraph in the Discussion: “Fourth, despite the relatively large size of the study, our statistical power was limited by the number of malaria cases with available samples; sampling additional controls would not have increased our statistical power.”

      Moreover, we now also provide the new Supplementary Table 1, which provides complete information on participant match ID, site, age cohort, sex assigned at birth, and time of vaccination.

      2) The comparator group to the RTS,S/AS01 vaccine is not a single vaccine, but two vaccines, but the presentation of the data makes it difficult to identify what effect this may have had on the results

      Indeed, comparators received a rabies vaccine or the meningococcal C conjugate vaccine depending on the age cohort. However, we think that the impact on the study results and conclusions is minimal since the main results are based on baseline gene expression and its association with malaria risk within RTS,S vaccinees. Correlates of malaria risk in comparators are analyzed separately. Comparator vaccine is confounded with age cohort, but we are not analyzing the effect of age cohort on the transcriptional profile. Comparators are only included in the analysis of RTS,S immunogenicity at post-vaccination (RTS,S vs Comparators, Fig 2A, Comparison (1)), and we have adjusted analyses by age cohort and hence by comparator vaccine. The fact that the comparators received different control vaccines only stresses that the BTMs found to be associated with RTS,S vaccination are specific to the RTS,S vaccine.

      Moreover, as an alternative way to identify RTS,S-specific transcriptional responses, we also include Comparison (2), which compares Month 3 to Month 0 transcription levels within RTS,S vaccinees. We include in the text extensive discussion of the merits and drawbacks of each comparison:

      “Two comparisons were done to characterize the transcriptional response to RTS,S/AS01 vaccination: Comparison (1): comparing gene expression in month 3 samples from RTS,S/AS01 vs comparator recipients (month 3 RTS,S/AS01 vs comparator); and Comparison (2): comparing gene expression in month 3 vs month 0 from RTS,S/AS01 recipients (RTS,S/AS01 month 3 vs month 0). Each comparison has its own advantages: Comparison (1) allows the identification of RTS,S/AS01-specific responses while taking into account other environmental factors to which the children are exposed, such as malaria exposure (albeit malaria transmission intensity was low during the study at both sites). Moreover, the very young ages of the trial participants mean that RTS,S/AS01-induced changes may be confounded with normal developmental changes in participant immune systems, further underscoring the value of Comparison (1), as it does not involve comparison across two different time points. On the other side, an advantage of Comparison (2) is that it takes into consideration each participant’s intrinsic baseline gene expression. Comparison (1) uses data from both infants and children, whereas Comparison (2) can only yield insight into RTS,S/AS01 responses in children (as baseline samples were not collected from infants).”

      3) A very "liberal" false-discovery rate (FDR) threshold has been used throughout to define significant associations. An FDR of 0.2 indicates that 20% (or 1 in 5) of the results considered significant are expected to be false discoveries. This means that the "significant" results must be interpreted with a high degree of caution. Typically researchers use lower FDR thresholds, like 0.05 or 0.01, although one may argue for different thresholds under different circumstances.

      While it is not uncommon to use a threshold of 20% for immune correlates studies [e.g. (5-10)], we agree with you that it is important to clearly state the chosen FDR and to discuss conclusions in the context of the FDR used. We see that we could improve our manuscript in this respect. We have added the following:

      Results: “Compared to the 45 BTMs whose baseline levels significantly associated with clinical malaria risk in RTS,S/AS01-vaccinated African children, fewer BTMs (seven) had levels at one month post-final RTS,S/AS01 dose that significantly associated with clinical malaria risk. Moreover, if a more stringent FDR cutoff had been used (i.e. 5%), six of these seven BTMs would not have been identified. Thus it is entirely possible that, at one month post-final RTS,S/AS01 dose, there is no circulating immune transcriptomic signature predictive of risk…”

      Discussion: “Finally, while it is not uncommon to use an FDR cutoff of 20% in high-dimensional immune correlates studies [e.g. (65-70)], our results should be interpreted with the requisite level of caution. However, we do note that many of our significant modules in the baseline risk analysis would have survived even lower FDR cutoffs (in many cases even a 1% cutoff), giving us a fair degree of confidence in our results. For example, of the seven monocyte-related BTMs whose baseline levels associated with risk, all would have survived a 5% FDR cut-off, and three even a 1% cut-off; likewise, of the four dendritic cell-related BTMs whose baseline levels associated with risk, all would have survived a 5% FDR cut-off, and three even a 1% cut-off.”

      Moreover, we have revised Figures 2, 3, and 6 so that it is easy to discern whether a specific BTM correlation would also pass more stringent FDR cutoffs, through the addition of 1, 2, or 3 asterisks where appropriate: “|FDR| < 0.2 (*), < 0.05 (**), < 0.01 (***).” Note that, most central to the key message of the paper, many of the monocyte-related, DC-related, and cell cycle-related BTMs would have passed more stringent FDR cutoffs, with many even passing a 1% FDR cutoff (as discussed above).
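
      The asterisk convention described above can be illustrated with a minimal sketch. It assumes Benjamini-Hochberg adjustment, which this response does not specify, and it uses hypothetical p-values rather than study data; the function names `benjamini_hochberg` and `fdr_stars` are introduced here for illustration only.

      ```python
      def benjamini_hochberg(p_values):
          """Return Benjamini-Hochberg adjusted p-values, in the input order."""
          m = len(p_values)
          order = sorted(range(m), key=lambda i: p_values[i])
          adjusted = [0.0] * m
          running_min = 1.0
          # Walk from the largest p-value down so adjusted values stay monotone.
          for rank in range(m, 0, -1):
              i = order[rank - 1]
              running_min = min(running_min, p_values[i] * m / rank)
              adjusted[i] = running_min
          return adjusted


      def fdr_stars(q):
          """Map an adjusted value to asterisks: < 0.2 *, < 0.05 **, < 0.01 ***."""
          if q < 0.01:
              return "***"
          if q < 0.05:
              return "**"
          if q < 0.2:
              return "*"
          return ""


      # Hypothetical p-values for five modules (illustrative only, not study data).
      p_values = [0.0001, 0.006, 0.02, 0.12, 0.6]
      q_values = benjamini_hochberg(p_values)
      stars = [fdr_stars(q) for q in q_values]

      # Number of modules that would be called significant at each cutoff.
      n_significant = {cut: sum(q < cut for q in q_values) for cut in (0.2, 0.05, 0.01)}
      ```

      In this toy example the number of modules called significant shrinks as the cutoff tightens, which is the pattern the asterisks are meant to make visible in the figures.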

      4) A perplexing finding, which is not addressed in detail, is the large number of BTMs which differ between RTS,S and comparator vaccine groups after DMSO stimulation of PBMCs, but these are not seen when PBMCs are stimulated with parasite antigens in DMSO (and a similar finding for month 3 vs month 0 samples from RTS,S recipients). This raises some concern about the stimulation experiments, because one might expect that the DMSO vehicle in the antigen preparations would trigger a similar response to DMSO alone.

      It should be noted that in all our analyses, the stimulated results were adjusted for DMSO to focus on the antigen-specific response only. This would explain why we detect signal in the DMSO samples but not in response to stimulation. We realized that this was not well described in the figure captions and the Methods section and have added more details, including the model description in the Methods section. As such, we do not believe that these results affect the downstream conclusions. We believe that the unstimulated results provide significant new insights into the immune and molecular mechanisms of RTS,S vaccine efficacy, not necessarily directly related to the RTS,S-specific acquired immune response. We would also like to highlight that we have improved our model specification to directly account for the pairing of some of the samples by including a random effect, fitted with the limma package. This has slightly increased statistical power, and as such the number of significantly differentially expressed BTMs in response to stimulation is a bit higher (but still much lower than that for DMSO). Originally, we had decided against the use of a random effect because of the computational cost of estimating it.

      The fact that bulk transcriptional profiling of Ag-stimulated PBMCs identified almost no significant differences in BTM expression between the RTS,S and comparator groups could be due to several factors. First of all, the frequency of antigen-specific CD4+ T cells was very low among CD4+ T cells (Figure 4 of the manuscript shows that CSP-specific CD4+ T cells comprise < 0.004% of all CD4+ T cells). This low frequency of CSP-specific T cells is consistent with other RTS,S studies [e.g. as we state on line 330, we have previously found that CSP-specific T-cells in RTS,S/AS01 vaccinees comprise < 0.10% of all CD4+ T cells (1)].

      Moreover, CD4+ T cells themselves comprise approximately 45-57% of all PBMCs (2). Thus, finding an expression signal between the RTS,S vs. comparator group would require the signal to be high enough to be detected in only 0.002% of all PBMCs [0.004% (% CSP-specific CD4+ T cells out of total CD4+ T cells) x 51% (average % of CD4+ T cells out of all PBMCs) = 0.002%]. Thus, lack of detectable recall response does not mean lack of recall response. Moreover, as suggested below, we opted not to focus the rest of the manuscript on the Ag-stimulation results.
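
      The arithmetic in the preceding paragraph can be checked with a short sketch. The inputs are the figures quoted above (the < 0.004% upper bound and the 51% average); the variable names are introduced here for illustration, and the cells-per-million figure is simply a restatement of the same product.

      ```python
      # Values quoted in the response above.
      csp_specific_fraction_of_cd4 = 0.004 / 100  # upper bound: < 0.004% of CD4+ T cells
      cd4_fraction_of_pbmc = 51 / 100             # average of the cited 45-57% range

      # Fraction of all PBMCs expected to be CSP-specific CD4+ T cells (upper bound).
      csp_specific_fraction_of_pbmc = csp_specific_fraction_of_cd4 * cd4_fraction_of_pbmc

      percent_of_pbmc = csp_specific_fraction_of_pbmc * 100
      cells_per_million = csp_specific_fraction_of_pbmc * 1_000_000

      print(f"{percent_of_pbmc:.4f}% of PBMCs, about {cells_per_million:.0f} cells per million")
      ```

      Expressed this way, the upper bound corresponds to roughly 20 responding cells per million PBMCs, which helps explain why a bulk transcriptional signal would be hard to detect.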

      Second of all, the PBMCs were stimulated on site for 12 h and then cryopreserved. This stimulation time was chosen based on the kinetics of the IFN-γ and IL-2 mRNA response (3), but other responses may have had different kinetics and thus may have already resolved, or not yet occurred, by the 12-h cryopreservation. We have added text in the manuscript to discuss these caveats (“Another potential reason for why no BTMs were found to associate with the response to RTS,S/AS01 vaccination or with protection when analyzing CSP-stimulated PBMC is that all PBMC were stimulated on site for 12 hours (this stimulation time was chosen based on the kinetics of the IFN-γ transcriptional response) and then cryopreserved. Thus, we were unable to detect earlier transient responses that had already resolved by 12 hours, as well as more delayed response that had not yet initiated by 12 hours, if such responses occurred.”).

      The authors partly achieved their aims. They identified BTMs differentially expressed between RTS,S/AS01 and the comparator vaccines, and between baseline and month 3 in RTS,S/AS01 recipients. They also identified BTMs at month 3 associated with developing malaria, and BTMs at baseline associated with developing malaria. These latter BTMs were partly replicated in the CHMI study subjects. Higher expression of BTMs associated with monocytes and dendritic cells was most consistently identified across the different analyses and their expression in stimulated baseline samples was most consistently associated with development of clinical malaria in RTS,S/AS01 recipients. However, there were inconsistencies in associations between some of the studies, and it is possible that the "consistent" monocyte and dendritic cell BTMs would not be so consistent if a more stringent FDR threshold were used. Nevertheless, the authors' conclusions are largely quite measured and for the most part they do not over-interpret the significance of their findings.

      We have added the following to the Discussion: “Finally, while it is not uncommon to use an FDR cutoff of 20% in high-dimensional immune correlates studies [e.g. (65-70)], our results should be interpreted with the requisite level of caution. However, we do note that many of our significant modules in the baseline risk analysis would have survived even lower FDR cutoffs (in many cases even a 1% cutoff), giving us a fair degree of confidence in our results. For example, of the seven monocyte-related BTMs whose baseline levels associated with risk, all would have survived a 5% FDR cut-off, and three even a 1% cut-off; likewise, of the four dendritic cell-related BTMs whose baseline levels associated with risk, all would have survived a 5% FDR cut-off, and three even a 1% cut-off.”

      Overall the work provides some evidence that baseline immunological status, particularly related to monocyte and dendritic cell responses and possibly their role in or response to baseline inflammation, may be a determinant of how well the RTS,S vaccine works to prevent malaria. This provides a basis for further work to optimise the effectiveness of the vaccine. The usefulness of PBMC stimulation to predict an individual's response to vaccination will be limited because this is not a method which can be used at scale in resource limited settings, but the concept that vaccine response could be enhanced by modifying pre-vaccine immunological or inflammatory status is potentially important. The data published with this study will be a valuable resource and will undoubtedly be used by others to address similar questions. Increasing the efficacy of malaria vaccines remains an extremely important goal, and identifying possible mechanisms which restrict the efficacy of RTS,S is important.

      References:

      1. Moncunill G, De Rosa SC, Ayestaran A, Nhabomba AJ, Mpina M, Cohen KW, Jairoce C, Rutishauser T, Campo JJ, Harezlak J, Sanz H, Diez-Padrisa N, Williams NA, Morris D, Aponte JJ, Valim C, Daubenberger C, Dobano C, McElrath MJ. RTS,S/AS01E Malaria Vaccine Induces Memory and Polyfunctional T Cell Responses in a Pediatric African Phase III Trial. Front Immunol. 2017;8:1008.
      2. Kleiveland CR. Peripheral Blood Mononuclear Cells. In: Verhoeckx K, Cotter P, López-Expósito I, Kleiveland C, Lea T, Mackie A, et al., editors. The Impact of Food Bioactives on Health: in vitro and ex vivo models. Cham: Springer International Publishing; 2015. p. 161-7.
      3. Schultz-Thater E, Frey DM, Margelli D, Raafat N, Feder-Mengus C, Spagnoli GC, Zajac P. Whole blood assessment of antigen specific cellular immune response by real time quantitative PCR: a versatile monitoring and discovery tool. J Transl Med. 2008;6:58.
      4. Kazmin D, Nakaya HI, Lee EK, Johnson MJ, van der Most R, van den Berg RA, Ballou WR, Jongert E, Wille-Reece U, Ockenhouse C, Aderem A, Zak DE, Sadoff J, Hendriks J, Wrammert J, Ahmed R, Pulendran B. Systems analysis of protective immune responses to RTS,S malaria vaccination in humans. Proc Natl Acad Sci U S A. 2017;114(9):2425-30.
      5. Liu C, Martins AJ, Lau WW, Rachmaninoff N, Chen J, Imberti L, Mostaghimi D, Fink DL, Burbelo PD, Dobbs K, Delmonte OM, Bansal N, Failla L, Sottini A, Quiros-Roldan E, Han KL, Sellers BA, Cheung F, Sparks R, Chun TW, Moir S, Lionakis MS, Consortium NC, Clinicians C, Rossi C, Su HC, Kuhns DB, Cohen JI, Notarangelo LD, Tsang JS. Time-resolved systems immunology reveals a late juncture linked to fatal COVID-19. Cell. 2021;184(7):1836-57 e22.
      6. Andersen-Nissen E, Fiore-Gartland A, Ballweber Fleming L, Carpp LN, Naidoo AF, Harper MS, Voillet V, Grunenberg N, Laher F, Innes C, Bekker LG, Kublin JG, Huang Y, Ferrari G, Tomaras GD, Gray G, Gilbert PB, McElrath MJ. Innate immune signatures to a partially-efficacious HIV vaccine predict correlates of HIV-1 infection risk. PLoS Pathog. 2021;17(3):e1009363.
      7. Lu P, Guerin DJ, Lin S, Chaudhury S, Ackerman ME, Bolton DL, Wallqvist A. Immunoprofiling Correlates of Protection Against SHIV Infection in Adjuvanted HIV-1 Pox-Protein Vaccinated Rhesus Macaques. Front Immunol. 2021;12:625030.
      8. Haynes BF, Gilbert PB, McElrath MJ, Zolla-Pazner S, Tomaras GD, Alam SM, Evans DT, Montefiori DC, Karnasuta C, Sutthent R, Liao HX, DeVico AL, Lewis GK, Williams C, Pinter A, Fong Y, Janes H, DeCamp A, Huang Y, Rao M, Billings E, Karasavvas N, Robb ML, Ngauy V, de Souza MS, Paris R, Ferrari G, Bailer RT, Soderberg KA, Andrews C, Berman PW, Frahm N, De Rosa SC, Alpert MD, Yates NL, Shen X, Koup RA, Pitisuttithum P, Kaewkungwal J, Nitayaphan S, Rerks-Ngarm S, Michael NL, Kim JH. Immune-correlates analysis of an HIV-1 vaccine efficacy trial. N Engl J Med. 2012;366(14):1275-86.
      9. Fletcher HA, Snowden MA, Landry B, Rida W, Satti I, Harris SA, Matsumiya M, Tanner R, O'Shea MK, Dheenadhayalan V, Bogardus L, Stockdale L, Marsay L, Chomka A, Harrington-Kandt R, Manjaly-Thomas ZR, Naranbhai V, Stylianou E, Darboe F, Penn-Nicholson A, Nemes E, Hatherill M, Hussey G, Mahomed H, Tameris M, McClain JB, Evans TG, Hanekom WA, Scriba TJ, McShane H. T-cell activation is an immune correlate of risk in BCG vaccinated infants. Nat Commun. 2016;7:11290.
      10. Young WC, Carpp LN, Chaudhury S, Regules JA, Bergmann-Leitner ES, Ockenhouse C, Wille-Reece U, deCamp AC, Hughes E, Mahoney C, Pallikkuth S, Pahwa S, Dennison SM, Mudrak SV, Alam SM, Seaton KE, Spreng RL, Fallon J, Michell A, Ulloa-Montoya F, Coccia M, Jongert E, Alter G, Tomaras GD, Gottardo R. Comprehensive Data Integration Approach to Assess Immune Responses and Correlates of RTS,S/AS01-Mediated Protection From Malaria Infection in Controlled Human Malaria Infection Trials. Front Big Data. 2021;4:672460.
    1. Author Response:

      Reviewer #1:

      In this study, the authors seek to understand the target and mechanism of action of two structurally related orally available antibiotic drug candidates active against Neisseria gonorrhoeae (Ng). The experimental approach involves a detailed investigation of drug efficacy in bacterial culture experiments and a mouse model for gonorrhea infections, along with biochemical experiments to identify the drug target. The latter experiments include discovery of resistance-inducing mutations in class Ia ribonucleotide reductase (RNR), in vitro validation of the ability of the Ng inhibitors to diminish enzyme activity, and structural studies to evaluate the effects of the compounds on Ng RNR structure. The work succeeds in providing convincing evidence for inhibition of the RNR but it does not fully explain how the drug candidates bind to the enzyme. Although the findings represent an important advance that could motivate other work in exploiting bacterial RNRs for antibiotic drug development, conclusions about the mechanism of action could be better supported by more thorough understanding of inhibitor-enzyme interaction. This insight would be important for improving drug design and broad expansion of the approach to other pathogens. Additionally, the inhibitors are billed as narrow-spectrum antibiotic candidates, but these claims are based on analysis of a small and specialized group of bacteria that are not likely to contain or exclusively rely on a class Ia RNR. It is not clear from this study if the inhibitors could affect growth of commensal organisms that contain aerobic RNRs.

      Table S7 has been included to highlight the known forms of RNRs present in various organisms. The selective inhibition of Neisseria gonorrhoeae and N. meningitidis is corroborated by the presence of only a single RNR, type Ia. Other aerobic Gram-negative organisms, including E. coli and K. pneumoniae, have types Ia, Ib and III, and are not inhibited by the PTC compounds. The presence of multiple RNR types may explain why other organisms lack sensitivity to the PTC compounds, but the data are not conclusive. Obtaining conclusive data will require additional experimentation with engineered isogenic knock-out strains to fully elucidate why other organisms are not inhibited.

      Reviewer #2:

      This study by Narasimhan et al. describes the identification of ribonucleotide reductase (RNR), a critical enzyme in all organisms, as a new target for treatment of antibiotic-resistant gonorrhea, via a novel mechanism for RNR inhibition. The authors begin with the identification of two inhibitors that selectively target Neisseria gonorrhoeae, including multidrug resistant strains, over other pathogens and microbiota. They then show that these inhibitors target the synthesis of DNA, but not by the mechanism of other members of this class of compounds; instead, isolation of resistant mutants indicates that the class Ia ribonucleotide reductase of this organism is the target of the molecules. These results are supported by in vitro activity assays of the RNR, along with electron microscopy characterization of the RNR, showing that the resistance mutations disrupt the protein's ability to form a ring-like inhibited state, implying that the mechanism of action of the compounds involves that ring-like state. While other compounds have been shown to induce formation of the inhibited state of the human RNR, the mechanism of inhibition of the gonorrheal RNR evidenced here is distinct. Finally, the authors present data from a mouse infection model showing efficacy of the compounds. The comprehensive nature of this study, from small molecule to in vitro analysis to in vivo efficacy, is compelling, and the results are of interest both from an enzymological perspective and from the development of new strategies to combat important pathogens. There are two issues that I believe the authors should address to support two important aspects of their work, the mechanisms of inhibition and resistance:

      1) The major question that sticks in my mind after reading this manuscript is the mechanism by which the molecules inhibit the RNR. They act as potent inhibitors in vitro, and the identification of the resistance mutations, H25R and S41L, which interfere with the ability of the RNR to form the inactive a4b4 form, are strong pieces of evidence in favor of the authors' proposal that the inhibitors "potentiate conversion of its active a2b2 state to an inactive a4b4." The clincher of this argument would be EM evidence that the presence of the inhibitors leads to a4b4 formation, just as the H25R and S41L variants do, or (perhaps simpler) size exclusion chromatography or direct evidence from another analytical method pointing to formation of the a4b4 species.

      Relatedly, it would also be helpful if the dependency of inhibition on dATP would be clarified. Figure S6 suggests that dATP is not required for this state, but on p. 14, line 12, the authors write "the dependency on a dATP-induced inactive a4b4 state also explains why the Ng and Ec Ia RNRs are both sensitive to these inhibitors…" and on p. 20, line 27, it is also implied that dATP could be involved. Please clarify this point.

      We have carried out additional experiments to probe the mechanism of inhibition of PTC compounds, and our results suggest that there is likely to be more than one mode of enzyme inhibition. One mode of inhibition does appear to be related to dATP-inhibition and thus α4β4 ring formation. In particular, we now present data that show that PTC-672 and PTC-846 potentiate the inhibitory effects of dATP (Figure 4). However, the PTC compounds themselves are not dATP mimics; they do not appear to be able to substantially increase the amount of α4β4 ring formation in vitro. This finding is based on new mass photometry data that we now present (Supplemental data Figure S8, S9). We also now present data that show that PTC-672 can inhibit (to some degree) variants of Ng RNR that can’t form rings (H25L and S41L) (Figure S7), providing evidence of a second mode of inhibition. Collectively, these results indicate that the mode(s) of inhibition of PTC compounds are complex. We don’t know why E. coli is less affected than Ng by these compounds. It could be that the class Ia RNR in E. coli is equally inhibited but that the presence of multiple RNRs in E. coli is protective. It is also possible that the synergistic effect of the PTC compounds with dATP is less dramatic for the E. coli enzyme or that the second (unknown) mode of inhibition is not in play in E. coli. Much more work will need to be done to answer this question, and that work is outside the scope of this paper. We are pleased, however, to present several new pieces of data, including the mass photometry results, to fill out this story.

      2) The observation that the resistance mutant strains have lower fitness is an interesting and important one. I suggest that the authors determine whether this decreased fitness might be the result of the mutations in the RNR leading to lower activity - the authors should give the activities of the H25R and S41L with the normal substrate, vs. wild-type alpha. If the activities are similar to wild-type, perhaps the authors could suggest another potential explanation. One that seems possible to me is that the loss of dATP inhibition (see Fig S5) might lead to loss of fitness via misregulation of (deoxy)nucleotide pools.

      H25R and S41L Ng RNR variants are impaired in their ability to be down-regulated by dATP but are not otherwise impaired. Our hypothesis is, as the reviewer suggests, that loss of fitness is due to the inability to down-regulate RNR, which misregulates nucleotide pools. This phenomenon was observed for the E. coli class Ia RNR: S39F and E42K variants were shown to be impaired only in their ability to be down-regulated by dATP, and yet these mutations were linked to a mutator phenotype (see Chen 2018 JBC 293, 10404 and Ahluwalia 2012 DNA Repair 11, 480). Not being able to turn off an RNR is problematic for the cell.

    1. Author Response:

      We thank the reviewers and editor for the feedback on our manuscript. We have now included high-magnification images of somite-like structures, which clearly show the shape of the somitic cells and the polarity of these epithelial cells by staining for multiple polarity markers (new Figure 3-Supplemental Fig 1; new Figure 3-Video 1). Consistent with the morphology and polarity of in vivo somites, we observe PAX3+/TCF15+ bottle-shaped cells radially arranged around a central cavity, which form rosette-like structures that are approximately the same size as the Carnegie stage 11 in vivo somites (Fig 5B). This can be observed in multiple new images added to the manuscript (Fig 3-Supplemental Fig 1A-A’’, Fig 3-Supplemental Fig 1B-B’’, Fig 2-Supplemental Fig 3A,B). Additionally, apical surface markers F-ACTIN (Fig 5A) and N-CADHERIN (Fig 3-Supplemental Fig 1) are both expressed around the central cavity, suggesting that the apical side of the somitic cells is facing the inside of the somite structure, again consistent with their in vivo counterparts. This is particularly evident in the newly added high-magnification images (Figure 3-Supplemental Fig 1) and accompanying movie showing a full confocal z-stack through the in vitro somites (Figure 3-Video 1). Together with our protein (Fig 3B,C and Fig 5A) and gene expression data (both in bulk (Fig 1C, Fig 5C, Fig 1-Supplemental Fig 1B) and at the single-cell level (Fig 4, Fig 4-Supplemental Fig 1-5)), and directed differentiation experiments of Somitoid-derived cells towards sclerotome (Fig 5C) and dermomyotome (newly added Figure 5-Supplemental Fig 1), we conclude that our in vitro somites are molecularly, morphologically, and functionally equivalent to in vivo somites.

      Reviewer #1:

      The foundational differentiation protocol up until day 3 (formation of PSM) has been published previously in Diaz-Cuadros et al., 2020 and Matsuda et al., 2020. The main difference between this manuscript and the published protocols is 2D (published) vs 3D differentiation. In this manuscript the authors were able to generate Pax3+ somites (day 4-5) from PSM. Both Diaz-Cuadros et al., 2020 and Matsuda et al., 2020 were unable to generate Pax3+ somites in their 2D culture systems but instead could only obtain a TCF15+ somitic mesoderm intermediate state. Moreover, the somites obtained in this manuscript could be further differentiated to sclerotome.

      The experiments across the paper were validated using three repeats using appropriate quantitative microscopy. Imaging data are high quality and mostly presented in a clear manner. However, it is unclear exactly what the authors are scoring as a somite. Moreover, for each figure it is not clear whether technical or biological replicates are presented. Similarly, the heatmap block presented in Fig2C,D, 3C,D apparently represents just one organoid/replicate. Authors should comment on the efficiency of the protocol over the different cell lines used. Transcriptome data presented strongly support the reproducibility and accuracy of the 3D differentiation, although comparison with the in vivo situation is highly limited - in part due to lack of availability of human in vivo data at these developmental stages.

      We thank the reviewer for the insightful comments and questions. Following the suggestion of the reviewer, we generated new organoids with somite structures and imaged them at a higher magnification (shown in the newly added Figure 3-Supplement Figure 1). The higher magnification clearly shows the ‘bottle-shaped’ morphology of the cells that comprise the somites in that the apical surface of these columnar cells is typically smaller than their basal side. These higher-magnification images also more clearly show the rosette structure (radial arrangement of columnar cells) with a central cavity in which NCAD is highly expressed. Nuclear expression of PAX3 is also clearly visible in these cells. These empirical observations were the criteria used to manually identify somites in the images acquired from the organoid screen. Additionally, we generated movies of z-stacks acquired on a confocal microscope showing examples of organoids with and without somite-like structures based on our scoring criteria using PAX3 and NCAD staining (Newly added Figure 3-Video 1).

      Regarding the number of replicates used and the questions related to variability/efficiency of our optimized protocol, we have made the following changes and performed new experiments to address these questions. To clarify which cell lines were used for each figure, we have added the information to each figure caption in the manuscript. We have also specified the number of organoids used to quantify variation for each experiment in the figure captions. As described below, we have conducted new experiments to further quantify the technical variability across experiments, as well as quantify the efficacy of our optimized differentiation protocol when applied to genetically independent cell lines.

      For our initial screens (Figures 2 and 3), we only used one cell line (NCRM1 hiPSCs). As mentioned by the reviewer, we measured variability across individual organoids for each condition of the screen to help identify the conditions that minimized variability and produced the most reproducible organoids (Figure 2-Supplemental Fig 2B; Figure 3-Supplemental Fig 1B). To analyze technical variability of our optimized protocol, we have repeated our optimized protocol two more times and quantified the number of somites across 10 organoids for each experiment (see new Figure 3-Supplemental Fig 3). Inter-organoid variability was comparable to our initial results from the secondary screen (CV for Experiment 1 = 17% and CV for Experiment 2 = 9.8%; see also Fig 3-Supplemental Fig 2B). Furthermore, mean and median are very comparable between the two additional experiments (Experiment 1: 39+/-8 (mean+/-std), median=40; Experiment 2: 43+/-4 (mean+/-std), median=41, p-val = 0.16). Please note that the absolute number of somites in our new experiments has increased compared to our initial screen (see Figure 3). This is a result of improvements in both our immunostaining protocol as well as the image acquisition workflow. The two new experiments both used the same improved staining and image acquisition workflow and are therefore comparable with each other.
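
      The inter-organoid variability statistics quoted above (mean, standard deviation, and coefficient of variation) can be computed as in the following minimal sketch. The somite counts are hypothetical placeholders, not the study's data, and the function name is an illustrative choice; the sketch assumes the CV is defined as the sample standard deviation divided by the mean.

```python
from statistics import mean, stdev


def coefficient_of_variation(counts):
    """Return the sample CV (standard deviation / mean) as a percentage."""
    return 100.0 * stdev(counts) / mean(counts)


# Hypothetical somite counts for 10 organoids (illustrative only,
# NOT the measurements reported in the manuscript).
example_counts = [38, 42, 45, 40, 47, 39, 44, 41, 46, 43]

if __name__ == "__main__":
    print(f"mean = {mean(example_counts):.1f}")
    print(f"std  = {stdev(example_counts):.1f}")
    print(f"CV   = {coefficient_of_variation(example_counts):.1f}%")
```

      Because the CV is normalized by the mean, it allows variability to be compared between experiments whose absolute somite counts differ, as is the case between the initial screen and the repeat experiments.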

      To extend our variability analysis to other cell lines, we tested the following cell lines: our original cell line (NCRM1), the WTC cell line released by the Conklin Laboratory at the Gladstone Institute, and a reporter cell line, ACTB-GFP, from the Allen Cell Collection. We tested our optimized protocol alongside another high scoring condition in all three cell lines:

      • CL+FGF2 for 24h, basal media for 24h (our optimized protocol)
      • WNTihi(C59, 2 µM) for 48h

      Applying our optimized protocol to two other cell lines confirmed that our protocol is reproducible across different genetic backgrounds/cell lines. NCRM1, average somite number = 43+/-4 (mean+/-std); ACTB-GFP, average somite number = 40+/-6; WTC, average somite number = 33+/-4. We also tested one additional top scoring condition (C59, 2µM for 48 hours) in all three cell lines. Notably, for this condition, the ACTB-GFP derived Somitoids (32+/-4 (mean+/-std)) showed a higher average number of somites compared with the Somitoids derived from the other cell lines (NCRM1, 20+/-3; WTC, 17+/-4). However, our optimized protocol resulted in the highest number of somites across all cell lines.

      We agree with the reviewer that our transcriptome analysis strongly supports the reproducibility and accuracy of our optimized 3D differentiation protocol. We unfortunately were not able to obtain human in vivo data at the relevant developmental stages to make a direct comparison. However, we did extend our analysis to compare our single-cell RNA-seq dataset with the previously published transcriptome data from 2D paraxial mesoderm differentiation protocols (Diaz-Cuadros et al., 2020, Matsuda et al., 2020), which we have included in our updated Discussion section.

      We compared the single-cell RNA-seq data from Diaz-Cuadros et al. with our own single-cell RNA-seq data. The 2D differentiated cells in Diaz-Cuadros et al. at the final time point of their experiment do not show a clear somitic cell state signature. PAX3, MEOX1, MEOX2, FOXC2, UNCX, and TBX18 are not expressed (compared for example to FOXC1). FOXC1 does not appear to be a somite-specific marker as it is expressed in early and late PSM-like cells as seen in our own data, starting on day 2 (Figure 4-Supplemental Fig 3). In the Diaz-Cuadros et al. dataset, TCF15 is not expressed uniformly in all cells nor specifically in the late-stage cells. Conversely, TCF15 is specifically expressed in day 5 somitic cells in our dataset both in a uniform manner and at high levels (as shown in the figure below and Figure 4C).

      We also analyzed the bulk RNA-seq data in Matsuda et al. 2020 as shown in Figure 1B of their paper. They show the expression of 4 somite markers (TCF15, MEOX1, PAX3, and RIPPLY1). As can be seen in the figure below (and Figure 4-Supplemental Fig 3), all markers highlighted in their paper are specifically expressed in our day 5 somitic cell population, including RIPPLY1 (see also updated Figure 4-Supplemental Fig 3). Beyond comparison of marker gene expression, it is difficult to assess the similarities and differences with the Matsuda dataset since their data lacks single-cell resolution. Thus, heterogeneity and efficiency of somitic fate induction within their cell population is unclear. Finally, neither of these two papers report the formation of somite-like structures.

      Reviewer #2:

      This study is interesting in the sense that it brings us one step closer to the formation of complex structures (such as somites) from human iPSCs. Markers that are typical of somites are indeed present at the end of the (rather complex) culture protocol. There is also quite a lot of work involved and the illustrations are of good quality.

      However, the spatial organization that is typical of somites is lacking in a number of important ways. Early somites in amniotes are epithelial (in fact it is a pseudo-stratified epithelium made of bottle shaped cells), apical facing the somitocoele (a cavity filled with a loose mesenchyme) and basal to the outside. Quite similar to the organization of the neural tube. This organization is initiated in the anterior last third of the PSM and amplified concomitant to segmentation. It would be important to show that somitoids display such structure. It does not seem that there is a central cavity. It is also unclear from the picture whether somitoid cells are bottle-shaped. Importantly, one sees (Figure 5a) that F-actin (which labels the apical side of epithelial cells) is facing the outside of the somitoid, and not the inside as it should. In this condition, the term "somitoid" seems quite inaccurate in comparison to other organoid systems that faithfully reproduce their in vivo counterparts, not only the "classical" intestinal crypts but also the more recently published neural tube organoids. Aggregates of somite-like cells may be more accurate.

      We are glad the reviewer found our study interesting and a step towards formation of complex structures in vitro. We believe that the reviewer has misunderstood the structure of the somites that we observe in vitro. We now include high-magnification images that more clearly show the shape of the cells and the locations at which the epithelial polarity markers are expressed (new Figure 3-Supplemental Fig 1; new Figure 3-Video 1). Consistent with in vivo somites, we do observe bottle-shaped cells radially arranged around a central cavity forming rosette-like structures that are the same size as their in vivo counterparts (Fig 5B). This can be observed in multiple images shown in our manuscript (Fig 3-Supplemental Fig 1A-A’’, Fig 3-Supplemental Fig 1B-B’’, Fig 2-Supplemental Fig 3A,B). Importantly, both F-ACTIN (Fig 5A) and N-CADHERIN (Fig 3-Supplemental Fig 1) are indeed expressed around the central cavity, suggesting that the apical side of the PAX3+/TCF15+ somitic cells is facing the inside of the somite structures. This is especially evident in the newly added high-magnification images (Figure 3-Supplemental Fig 1) and accompanying movie showing a full confocal z-stack through the in vitro somites (Figure 3-Video 1). Taken together with our protein (Fig 3B,C and Fig 5A) and gene expression data (both in bulk (Fig 1C, Fig 5C, Fig 1-Supplemental Fig 1B) and at the single-cell level (Fig 4, Fig 4-Supplemental Fig 1-5)) and our directed differentiation experiments of our Somitoid cells to dermomyotome (Fig 5-Supplemental Fig 1) and sclerotome fates (Fig 5C), we believe that the somite structures generated by our optimized in vitro protocol are indeed equivalent to their in vivo counterparts.

    1. Author Response:

      Reviewer #1 (Public Review):

      Previous studies have provided crystallographic snapshots of the autokinase domains of several sensor histidine kinases (HKs) involved in signal transduction in bacteria. Nevertheless, the lack of a full-length structure of these HKs has hampered the understanding of the molecular mechanism of signaling. Moreover, how a stimulus perceived by the membrane-bound sensor domain is transmitted to the catalytic cytoplasmic domain of an HK to modulate its activity is poorly understood. To probe the coupling between the sensor and autokinase domains, Mensa et al. used cysteine crosslinking and a reporter gene assay to probe the signaling state of E. coli PhoQ in a set of point mutants. Using these data they developed a 3-domain model in which the sensor, HAMP and catalytic domains are in allosteric communication to interconvert the kinase between "on" and "off" conformations. The authors conclude that signals transmit to the catalytic domain through intradomain allosteric transitions, rather than through a concerted conformational change.

      This work represents an important and novel attempt to understand the mechanism of signal transduction by sensor kinases in two-component systems. This contribution challenges the concept that signals are transmitted via propagation of a single concerted conformational change of the sensor kinase. The authors, instead, propose that the signal is transmitted by the sensor via an interdomain allosteric mechanism.

      The way in which the paper is presented appears to be directed at enzymologists working with enzyme kinetics models, rather than at a wide audience. For example, the paper starts by saying that "Fully cooperative two-state models are unable to explain the gamut of activities of mutants". This statement seems too abrupt without defining what kind of model is meant. Is it a molecular model, a kinetic model or a thermodynamic model? The authors should explain these concepts before showing the results. Once one starts reading, it becomes clear that they propose a thermodynamic model to explain the coupling of the different domains.

      We have consolidated writing on modeling to the latter half of the manuscript. The introduction to this section now clearly states that we are exploring several thermodynamic allosteric coupling models.

      For an enzymologist, it is quite worrying to interpret data from activity assays with gene reporters without knowing the answers to the following: Do the mutations affect the PhoQ protein levels in cells? How accurate are the Western blots for quantifying dimer formation in Y60C and establishing the kinase "on" or kinase "off" states of PhoQ? The error bars of the crosslinking experiments, shown in the different figures, seem quite small for a Western blot quantification. Nevertheless, in the Figure 8-figure supplement 2 panel for the mutant I221F, a poor fit is obtained which is not taken into account. Is it because the error is dismissed? The same applies to panels A and D. Is the proposed model missing something?

      We have included Figure 2-figure supplement 1, which gives examples of crosslinking western blots and quantification (also “Figure 2-figure supplement 1 - Source data 1”). These western blots also allow for the evaluation of total protein expression. In our model fitting, each individual replicate experiment was treated separately, so that data with more replicates carry increased statistical weight, as discussed in Results and Methods. The I221F mutation was measured in a single experiment. We also have conducted the assays under conditions where the concentration is approximately linear with respect to receptor occupancy, as discussed previously by Goulian et al (ref 41).

      Reviewer #2 (Public Review):

      This manuscript characterized the effects of 35 mutational substitutions in three domains of the bacterial transmembrane two-component sensor protein for Mg2+, PhoQ, on the signaling state of the periplasmic sensor domain and the cytoplasmic histidine kinase domain. Signaling state was assayed by a diagnostic cysteine cross-link for the sensor domain and the expression of a coupled beta-galactosidase reporter for the kinase domain. The results of those characterizations were used to develop an allosteric coupling model of conformational signaling from sensor domain to kinase domain, with a key role played by the HAMP domain that connects sensor to kinase. Single-site mutational substitutions were at positions expected to be in the interior of the protein structure in the periplasmic, HAMP and the S-helix regions of the protein as well as at boundaries of transmembrane segments. In addition, the connections between the second transmembrane helix and the HAMP domain, and between the HAMP domain and the S-helix were disrupted by introduction of a sequence of seven glycines. Each mutant protein was assayed for the signaling states of the sensor and kinase domains at five different concentrations of Mg2+. Some of the resulting dose-response curves showed patterns much like that for the wild-type receptor in which the signaling state of the sensor and kinase domain were correlated. However, a majority of the curves exhibited a variety of altered relationships between patterns for the two domains. Importantly, the effects of the glycine insertions before and after the HAMP domain indicated that this domain reduced the native "on" signaling state of both the sensor and kinase domain to be less extreme and thus in a more balanced state between on and off. Examination of the effects of similar glycine substitutions in two related two-component sensor kinases showed a similar negative influence of HAMP domain coupling on kinase domain signaling state.
      A global fitting using the allosteric coupling model between the three domains was performed for all 35 pairs of dose-response curves for PhoQ, allowing variation of one or a few individual parameters relevant to the position of the particular mutational substitution. An important validation of the resulting global parameters was a reasonable fit of the wild-type dose-response curves. The global parameters fit the experimental data for most but not all mutant receptors. Overall, the allosteric coupling model performed well, providing support for its validity.

      Thus, this work provides support for the concept of intra-receptor signaling via allosteric coupling between independent domains that each have their own intrinsic equilibrium between the "on" and "off" state. This allosteric coupling model introduces a third way of thinking about how ligand occupancy of a transmembrane receptor site facing the cell's exterior generates altered activity of a cytoplasmic domain inside the cell. Instead of considering that ligand binding "sends a signal" by sequential conformational changes that travel through the receptor structure or that ligand binding shifts a conformational equilibrium of the entire receptor in a concerted manner, the allosteric model suggests that signaling occurs by allosteric coupling between relatively independent domains of a multi-domain receptor protein. This constitutes an important contribution to our concepts of receptor conformational signaling.

      However, the impact of this contribution is likely to be less than it could be because of the way the manuscript is written. Specifically, the devotion of a majority of the Results section to consideration of models for signaling obscures the most compelling parts of the work, the experimental observations of the striking effects of mutational substitutions throughout PhoQ on the signaling state of the sensor and kinase domains and the explanation of those disparate effects by the allosteric coupling model of conformational signaling. For many experimental scientists interested in mechanisms of signaling, this work would be much more accessible if the experimental results were presented first, the allosteric coupling model was introduced as a way to explain the results, and much of the consideration of other models and the development and details of the allosteric model were shifted to the Materials and Methods or provided as part of supplementary materials.

      We thank the reviewer for this useful criticism. We have made several changes to bring forward and emphasize the experimental observations in our data. 1) We have moved the concerted signaling model from Figure 2 (formerly Figure 2B, 2D) to Figure 5. New Figure 2 now contains an expanded set of experimentally generated data only. 2) We have supplemented Figure 2C with additional representatives of our diverse experimentally generated functional data. 3) We have added Figure 2-figure supplement 1 with examples of western blots and crosslinking quantifications. 4) We have moved the former Figure 4 forward to Figure 3. 5) We have moved the former Figure 5 forward to Figure 4. 6) We have added Figure 7-figure supplement 1 to further highlight and discuss the rationale for the choice of point mutations and Gly7 insertions. 7) We have made several changes in the Results section that mirror the above changes.

      We have also made several changes to consolidate the discussion of the modeling work. 1) The concerted signaling model from the former Figure 2 (Figure 2B, 2D) and the 2-domain signaling model from former Figure 3 have been moved to a new Figure 5. 2) Former Figure 3D has been moved to Figure 5-figure supplement 1. 3) Population fraction equations in new Figure 5 (formerly Figure 2B, 2D, 3A) have been moved to Materials and Methods. 4) Text discussing alternate allosteric models has been consolidated into one section.

      Reviewer #3 (Public Review):

      This manuscript describes a comprehensive study of kinase activation and allosteric coupling in the sensor histidine kinase (SHKs) PhoQ. Quantitative assays for sensor domain activation and kinase response are used to evaluate a large number of variant proteins that display a range of properties with respect to ligand binding, interdomain coupling and kinase activity. The data is used to construct and fit a conceptually elegant model that provides a thermodynamic explanation for domain interactions, allostery and sensing responses in SHKs. The experiments also demonstrate that sensor kinase domains intrinsically favor their "on" states and that HAMP domains act to deactivate both the sensor and the kinase units. In all it is a very impressive study that sets the bar for enzymatic approaches aimed at understanding signaling by multidomain transmembrane kinases. Generality of key principles are explored by examining several SHKs related to PhoQ. The paper is well written and the complex data and their interpretation are for the most part clearly discussed. That said, there are some issues the authors should address:

      The model applied for Mg binding should be described to a greater extent. The equations of Figs. 1, 2,3,6,7 represent a situation more complex than the accompanying schematics portray. Even the simplest equation of 1B implies sequential binding of 2 Mg ions to one PhoQ dimer (presumably 1 site per subunit). Furthermore, the binding sites are assumed to be independent and, importantly, there are no intermediate states in the model in which one subunit is "on" (in either its sensor or kinase domain) and the other is "off". Is it known experimentally that the two subunits act independently and what is the consequence of not allowing for hybrid activation states within the dimer?

      Our discussion of how we handle Mg2+ binding has been elaborated based on this feedback. Given the data, we are unable to distinguish between a cooperative 2-state model and a sequential binding model. There are 3 options: 1) binding of 1 Mg2+ per dimer creates an asymmetric signaling state, which is well precedented in the review article cited (ref 17); 2) binding occurs at both subunits in independent, unlinked events; or 3) binding occurs with negative cooperativity (first site higher affinity) or positive cooperativity (second site higher affinity). These alternatives differ only subtly in the steepness of the transition from low to high signaling states. Unfortunately, our data are not sufficiently precise to distinguish between these options.
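
      To illustrate how subtly such binding schemes can differ in steepness, the sketch below evaluates a generic two-site (Adair-type) binding curve. It is not the model fitted in the manuscript: the function name, the microscopic dissociation constant `kd`, and the cooperativity factor `c` are assumptions introduced here for illustration (c = 1 for independent sites, c > 1 for positive and c < 1 for negative cooperativity).

```python
def fractional_saturation(mg, kd, c=1.0):
    """Fraction of occupied sites for a dimer with two identical sites.

    mg : free ligand concentration (same units as kd)
    kd : microscopic (per-site) dissociation constant (assumed parameter)
    c  : hypothetical cooperativity factor applied to the doubly bound state
    """
    x = mg / kd
    # Binding polynomial: empty + two singly bound states + doubly bound state.
    z = 1.0 + 2.0 * x + c * x * x
    # Average occupancy per site (bound ligands / 2 sites).
    return (x + c * x * x) / z


if __name__ == "__main__":
    # Compare independent, negatively and positively cooperative binding
    # over a range of ligand concentrations (arbitrary units, kd = 1).
    for mg in (0.1, 0.3, 1.0, 3.0, 10.0):
        row = [fractional_saturation(mg, 1.0, c) for c in (1.0, 0.3, 3.0)]
        print(mg, [round(v, 3) for v in row])
```

      With only a handful of ligand concentrations per curve, moderate values of `c` shift the occupancy at each point by a modest amount, which is consistent with the statement that such data may not discriminate between the options.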

      In addition, there may be a factor of 2 missing in treating the relative dissociation constants for Mg binding to an empty PhoQ or to a singly Mg-occupied PhoQ. Because the multiplicity changes by a factor of 2 in going from both the empty to the half-occupied state and again by 2 in going from the half-occupied to the fully occupied states, the effective Kd for binding to the singly occupied state is 4x larger than for binding to the empty state. It appears that all of the models accommodate only a factor of 2. This issue affects the (1 + [Mg]/Kd)^2 term, likely to a minor extent.

      Due to the difficulties in explicitly handling ligand binding in PhoQ as discussed above and in the text, we report an overall ‘observed’ Kd for Mg2+. This observed Kd represents the true Kd for Mg2+ binding implicitly scaled by the statistical factor of 2 (dimeric ligand binding), which we now state in lines 266-267. However, it is noteworthy that our purpose is to determine how mutants alter the energetic landscape, so differences such as multiplication of both the WT and mutant equilibrium constants by a constant factor (of 2.0) cancel out when comparing mutants.
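
      The statistical factors at issue can be made explicit for the simplest case. The derivation below is a sketch that assumes two identical, independent sites per dimer with a microscopic (per-site) dissociation constant K_d; the symbols P, K_1 and K_2 are introduced here for illustration and are not taken from the manuscript's equations.

```latex
% Binding polynomial for two identical, independent sites on a dimer P
Z = \left(1 + \frac{[\mathrm{Mg}]}{K_d}\right)^{2}
  = 1 + \frac{2[\mathrm{Mg}]}{K_d} + \frac{[\mathrm{Mg}]^{2}}{K_d^{2}}

% Macroscopic (stepwise) dissociation constants read off term by term
K_1 = \frac{[P][\mathrm{Mg}]}{[P\mathrm{Mg}]} = \frac{K_d}{2},
\qquad
K_2 = \frac{[P\mathrm{Mg}][\mathrm{Mg}]}{[P\mathrm{Mg}_2]} = 2K_d,
\qquad
\frac{K_2}{K_1} = 4
```

      Under this assumption, the fourfold ratio between the stepwise constants arises purely from multiplicity and is already contained in the squared term, and any constant rescaling of K_d applies equally to wild-type and mutant proteins.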

      In a similar vein, for the final models of Fig. 6,7, why is Mg binding only considered to selected states (SenOFF/HAMP1/Akon/off, for example)? And in Fig. 6A what does AK "on/off" signify?

      All species are allowed to bind Mg2+, but only 2 such species are shown for clarity. Figure 6 legend has been modified to state this explicitly.

      Line 549 - Discussion of the setpoint of the autokinase domain depends on the "reference point" given that KAK and alpha2 are correlated parameters. For example, one could view the intrinsic activity of the autokinase as being the fully uncoupled state, with KAK defined close to 1.0 and alpha2 having a smaller value than currently modeled in the case of the Y60C (WT) protein. Could one fix KSen and KAK at the values for the Gly-decoupled systems and allow the shifts in equilibrium owing to HAMP coupling to be compensated for solely by alpha1 and alpha2? This framing might be more straightforward for understanding the HAMP coupling.

      While some HAMP coupling mutations are adequately compensated by changes in α1/ α2, we adopted a universal standard state for local parameter variation, described in lines 420-425, 866-877. Thus, coupling mutations within the HAMP primary sequence were also allowed to alter KHAMP. In cases where α1/ α2 modulation is sufficient for fitting, the KHAMP value was found to be close to the global fit parameter value (which we have highlighted in green in Table 2). HAMP mutations near the autokinase junction also necessitated floating the KAK parameter for adequate fitting; therefore, we cannot fix KAK across the board for these coupling mutants. Furthermore, Gly7 insertions do not relieve the restraints associated with PhoQ being membrane-localized, and it is hard to consider the sensor and autokinase domains as fully uncoupled.

      Although the reference position is largely arbitrary and in any given fitting scheme likely depends on the choice of constraining and fixing parameters, it does alter how one views the role of kinase-activating mutations. i.e. with the fully decoupled state as the reference, the HAMP is always deactivating, with different variants (including the WT) deactivated to varying extents. Some additional comments on this issue may help readers understand the range of kinase behavior and how it is influenced by HAMP.

      Based on this comment, we have added lines 695-698 in the Discussion.

      Related to the previous point, in Fig. 7 the alpha2 parameter seems to have a large amount of uncertainty, and appears biphasic in the fits, this behavior deserves a comment as to its impact in the model. How much would the interpretations change if alpha2 is considered to hold its extreme values?

      We have added Figure 7-figure supplement 3 to show the effect of holding α2 at one of the 2 parameter value peaks, and have made additional comments in text (lines 468-473). There is no change in fitting quality at all values of α2 < 0.1, and the biphasic behavior appears artificial in the sense that it does not appreciably change the fit so long as α2 < 0.1. The bottom line is α2 for WT is consistent with strong negative coupling between the HAMP and catalytic domain.

      p. 30 line 587 - It's unclear what is meant by the statement that the HAMP domain "serves to tune the ligand-sensitivity amplitude of the response". In this model, the HAMP domain does alter the sensitivity of the sensor domain by favoring the sensor OFF state (even though it does not directly modulate KdOFF), but what is meant by "sensitivity amplitude"?

      We have clarified this phrase on lines 665-667.

    1. Author Response:

      Evaluation Summary:

      The authors studied the neural correlates of planning and execution of single finger presses in a 7T fMRI study focusing on primary somatosensory (S1) and motor (M1) cortices. BOLD patterns of activation/deactivation and finger-specific pattern discriminability indicate that M1 and S1 are involved not only during execution, but also during planning of single finger presses. These results contribute to a developing story that the role of primary somatosensory cortex goes beyond pure processing of tactile information and will be of interest for researchers in the field of motor control and of systems neuroscience.

      We thank all reviewers and the editor for their assessment of our paper. We acknowledge that our description of the methods and some interpretation of the results can be clarified and expanded. We address every comment and proposed suggestion below.

      Reviewer #1 (Public Review):

      This is a very important study for the field, as the involvement of S1 in motor planning has never been described. The paradigm is very elegant, the methods are rigorous and the manuscript is clearly written. However, there are some concerns about the interpretation of the data that could be addressed.

      We thank Reviewer #1 for the positive evaluation of our study. We clarify our methodological choices and interpretation of the data in the following response.

      • The authors claim that planning and execution patterns are scaled versions of each other, and that overt movement during planning is prevented by global deactivation. This is an interesting perspective, however the presented data are not fully convincing to support this claim:

      (1) the PCM analysis shows that correlation models ranging from 0.4 to 1 perform similarly to the best correlation model. This correlation range is wide and suggests that the correspondence between execution/planning patterns is only partial.

      The reviewer is correct that the current data leaves us with a specific amount of uncertainty. However, it should be noted that the maximum-likelihood estimates of correlations between noisy patterns are biased, as they are constrained to be smaller or equal to 1. Thus, we cannot test the hypothesis that the correlation is 1 by just comparing correlation estimates to 1 (for details on this, see our recent blog on this topic: http://www.diedrichsenlab.org/BrainDataScience/noisy_correlation/). To test this idea, we therefore use a generative approach (the PCM analysis). We find that no correlation model has a higher log-likelihood than the 1-correlation model, therefore we cannot rule out that the underlying true correlation is actually 1. In other words, we have as much evidence that the correspondence is only partial as we do that the correspondence is perfect. The ambiguity given by the wide correlation range is due to the role of measurement noise in the data and should not be interpreted as if the true correlation was lower than 1. What we can confidently conclude is that activity patterns have a substantial positive correlation between planning and execution. We take this opportunity to clarify this point in the results section.
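
      A related and simpler effect can be illustrated with a small simulation. The sketch below is not the PCM analysis and does not use the study's data: it assumes hypothetical Gaussian voxel patterns in which the true planning pattern is an exact scaled copy of the true execution pattern (true correlation of 1), adds independent measurement noise of the same size as the signal to each, and computes the plain Pearson correlation. All parameter values (`n_voxels`, `scale`, the noise levels) are assumptions chosen for illustration.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def simulate(n_voxels=2000, scale=0.3, seed=0):
    """Return (correlation of true patterns, correlation of noisy measurements)."""
    rng = random.Random(seed)
    true_exec = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
    # Planning is an exact scaled copy of execution: true correlation is 1.
    true_plan = [scale * v for v in true_exec]
    # Independent noise with the same standard deviation as each signal.
    meas_exec = [v + rng.gauss(0.0, 1.0) for v in true_exec]
    meas_plan = [v + rng.gauss(0.0, scale) for v in true_plan]
    return pearson(true_exec, true_plan), pearson(meas_exec, meas_plan)


r_true, r_measured = simulate()
print("correlation of true patterns:     ", round(r_true, 3))
print("correlation of noisy measurements:", round(r_measured, 3))
```

      With signal and noise variances equal in both conditions, the measured correlation is expected to be near 0.5 even though the underlying correlation is 1, which is why an estimate well below 1 does not by itself argue against perfectly scaled patterns.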

      (2) in Fig.4 A-B, the distance between execution/planning patterns is much larger than the distance between fingers. How can such a big difference be explained if planning/execution correspond to scaled versions of the same finger-specific patterns? If the scaling is causing this difference, then different normalization steps of the patterns should have very specific effects on the observed results: 1) removing the mean value for each voxel (separately for execution and planning conditions) should nullify the scaling and the planning/execution patterns should perfectly align in a finger-specific way; 2) removing the mean pattern (separately for each finger conditions) should effectively disturb the finger-specific alignment shown in Fig.4C. These analyses would corroborate the authors' conclusion.

      The large distance between planning and execution patterns (compared to the distance between fingers) is caused by the fact that the average activity pattern associated with planning differs substantially from the average activity pattern during execution. Such a large difference is of course expected, given the substantially higher activity during execution. However, here we are testing the hypothesis that the pattern vectors that are related to a specific finger within either planning or execution are scaled versions of each other. Visually, this can be seen in Figure 4B (bottom), where the MDS plot is rotated such that the line of sight is in the direction of the mean pattern difference between planning and execution, so that this difference disappears in the projection. Relative to the baseline mean of the data (cross), you can see that the arrangement of the fingers in planning (orange) is a scaled version of the arrangement during execution (blue). The PCM model provides a likelihood-based test for this idea. The model accounts for the overall difference between planning and execution by including (and estimating) model terms related to the mean pattern of planning and execution, respectively, therefore effectively removing the mean activation of planning and execution. We have now explained this better in the results and methods sections, also referring to a Jupyter notebook example of the correlation model used (https://pcm-toolbox-python.readthedocs.io/en/latest/demos/demo_correlation.html).

      Regarding your analysis suggestions, removing the mean pattern for planning and execution across fingers as a fixed effect (suggestion 1) leads to the distance structure shown in Fig 4B (bottom)—showing that the finger-specific patterns during planning are scaled versions of those during execution (also see Fig. R1 below). On the other hand, subtracting the mean finger pattern across planning and execution (suggestion 2) will not fully remove the finger specific activation as the finger-specific patterns are differently scaled in planning and execution. Furthermore, neither of these subtraction analyses allows for a formal test of the hypotheses that the data can be explained by a pure scaling of the finger-specific patterns.

      Figure R1. RDM of left S1 activity patterns evoked by the three fingers (1, 3, 5) during no-go planning (orange) and execution (blue) after removing the mean pattern across fingers (separately for planning and execution). The bottom shows the corresponding multidimensional scaling (MDS) projection of the first two principal components. Black cross denotes mean pattern across conditions.
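
      The logic of the mean-removal argument can be sketched with a toy example. The numbers below are hypothetical and noise-free, not data from the study: each condition is built as a condition-specific mean pattern plus a finger-specific component, with the planning component being a scaled copy (`scale`) of the execution component. Before mean removal, the planning-execution distance dominates the between-finger distance; after removing the mean across fingers separately for each condition, the planning patterns are exact scaled versions of the execution patterns.

```python
import math


def mean_pattern(patterns):
    """Voxel-wise mean across a list of patterns."""
    n = len(patterns)
    return [sum(col) / n for col in zip(*patterns)]


def center(patterns):
    """Subtract the mean pattern across fingers from every pattern."""
    m = mean_pattern(patterns)
    return [[v - mv for v, mv in zip(p, m)] for p in patterns]


def distance(a, b):
    """Euclidean distance between two patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical finger-specific components: three fingers, four voxels.
finger = [
    [1.0, 0.0, -1.0, 0.5],
    [0.0, 1.0, 0.5, -1.0],
    [-1.0, -1.0, 0.5, 0.5],
]
mean_exec = [5.0, 6.0, 4.0, 5.0]      # assumed strong activation in execution
mean_plan = [-1.0, -0.5, -1.5, -1.0]  # assumed mild deactivation in planning
scale = 0.3                           # assumed planning/execution scaling

execution = [[m + f for m, f in zip(mean_exec, p)] for p in finger]
planning = [[m + scale * f for m, f in zip(mean_plan, p)] for p in finger]

# Before mean removal: the condition difference dwarfs the finger difference.
between_conditions = distance(execution[0], planning[0])
between_fingers = distance(execution[0], execution[1])
print(between_conditions, between_fingers)

# After mean removal: planning equals scale * execution, voxel by voxel.
exec_centered = center(execution)
plan_centered = center(planning)
max_deviation = max(
    abs(pc - scale * ec)
    for p, e in zip(plan_centered, exec_centered)
    for pc, ec in zip(p, e)
)
print(max_deviation)
```

      With real, noisy data the match after mean removal would only be approximate, which is why a likelihood-based model is needed for a formal test.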

      • A conceptual concern is related to the task used by the authors. During the planning phase, as a baseline task, participants are asked to maintain a low and constant force for all the fingers. This condition is not trivial and can even be considered a motor task itself. Therefore, the planning/execution of the baseline task might interfere with the planning/execution of the finger press task. Even more controversial, the design of the motor task might be capturing transitions between different motor tasks (force on all fingers towards single-finger press) rather than pure planning/execution of a single task. The authors claim that the baseline task was used to control for involuntary movements, however, EMG recordings could have similarly controlled for this aspect, without any confounds.

      Participants received training the day before scanning, which made the “additional” motor task very easy, almost trivial. In fact, the system was calibrated so that the natural weight of the hand on the keys was enough to bring the finger forces within the correct range to be maintained. Thus, very little planning/online control was required by the participants before pressing the keys. As for the concern of capturing transitions between different motor tasks, that is indeed always the case in natural behavior. Arguably there is no such thing as “pure rest” in the motor system: active effort has to be made even to maintain posture. Furthermore, if the motor system considered the hold phase as a simultaneous movement phase, it should have prevented M1 and S1 from participating in the planning of upcoming movements, as they would be busy with maintaining and controlling the pre-activation. Having found clear planning-related signals in M1 and S1 in this situation makes our argument, if anything, stronger.

      Finally, we specifically chose not to do EMG recordings because finger forces are a more sensitive measure of micro movements than EMG. Extensive pilot experiments for our papers studying ipsilateral representations and mirroring (e.g., Diedrichsen et al., 2012; Ejaz et al., 2018) have shown that we can pick up very subtle activations of hand muscles by measuring forces of a pre-activated hand, signals that clearly escape detection when recording EMG in the relaxed state. Based on these results, we actually consider the recording of EMG during the relaxed state as an insufficient control for the absence of cortical-spinal drive onto hand muscles. This is especially a concern when recording EMG during scanning, due to the decreased signal-to-noise ratio.

      • In Fig. 2F, the authors show no planning-related information in high-order areas (PMd, aSPL), while such information is found in M1 and S1. This null result from premotor and parietal areas is rather surprising, considering current literature, largely cited by the authors, pointing to high-order motor or parietal areas involved in action planning.

      We agree with the reviewer that, to some extent, the lack of involvement of high-order areas in planning is surprising. However, we believe that task difficulty (i.e., planning demands) plays a role in the amount of observed planning activation. In other words, because participants were only asked to plan repeated movements of one finger, there was little to plan. The fact that this may have contributed to the null result in premotor and parietal areas was further confirmed by the second half of the dataset, which is not reported in the current paper. Here, we investigated the planning of multi-finger sequences, where planning demands are certainly higher. We found that high-order areas such as PMd and SPL were indeed active and involved in the planning of those, as expected. We decided to split the dataset across two publications as the multi-finger sequences have their own complexities, which would have distracted from the main finding of planning related activity in M1 and S1.

      Reviewer #3 (Public Review):

      I found the manuscript to be well written and the study very interesting. There are, however, some analytical concerns that in part arise because of a lack of clarity in describing the analyses.

      1) Some details regarding the methods used and results in the figures were missing or difficult to understand based on the brief description in the Methods section or figure legend.

      We thank Reviewer #3 for pointing out some lack of clarity in our description of the methods. We have now expanded both the methods section and the figure captions (Figs. 2, 3 and 4).

      2) I think the manuscript would benefit from a more balanced description on the role of S1. As the authors state, S1 is traditionally thought to process afferent tactile and proprioceptive input. However, in the past years, S1 has been shown to be somatotopically activated during touch observation, attempted movements in the absence of afferent tactile inputs, and through attentional shifts (Kikkert et al., 2021; Kuehn et al., 2014; Puckett et al., 2017; Wesselink et al., 2019). Furthermore, S1 is heavily interconnected with M1, so perhaps if such activity patterns are present in M1, they could also be expected in S1?

      To better characterize the role of S1 during movement planning, we now include recent research showing that S1 can be somatotopically recruited even in the absence of tactile inputs.

      3) Related to the previous comment: If attentional shifts on fingers can activate S1 somatotopically, could this potentially explain the results? Perhaps the participants were attending to the fingers that were cued to be moved and this would have led to the observed activity patterns. I don't think the data of the current study allows the authors to tease apart these potential contributions. It is likely that both processes contribute simultaneously.

      We agree that our results could also be explained by attentional shifts on the fingers. It is very likely that, during planning, participants were specifically focusing on the cued finger. However, as the reviewer points out, our current dataset cannot distinguish between planning and attention as voluntary planning requires attention. We expanded the discussion section to include this possibility.

      4) The authors repeatedly interpret the absence of significant differences as indicating that the tested entities are the same. This cannot be concluded based on results of frequentist statistical testing. If the authors would like to make such claims, then I think they should include Bayesian analysis to investigate the level of support for the null hypothesis.

      We have now clarified the parts in the manuscript that sounded as if we were interpreting the absence of significant difference (null results) as significant absence of differences (equivalence).

    1. Author Response:

      Reviewer #2 (Public Review):

      Chan et al set out to assess the transcriptomic (bulk and single cell), proteomic and metabolic changes that occur as primary WI38 human lung fibroblasts progress from early proliferative stages through to replicative senescence (RS) in vitro, as well as using ATAC-seq to assess changes in chromatin accessibility in senescence. The authors compare findings from RS in primary WI38 cells with immortalised cells of the same lineage expressing hTERT, cells that are quiescent through contact inhibition and cells with radiation-induced DNA damage. The data presented confirm findings in the literature from individual -omics studies; what makes this work novel and provides new insight is the combination of a range of -omics techniques, including time resolved scRNAseq, to provide deep molecular profiling across the cell lifespan. This indicates that senescence is a process of gradual onset throughout the proliferative lifecourse, and that a few key pathways are strongly associated with (and probably drive) replicative senescence, particularly a fibroblast to mesenchymal transition (FMT) akin to the epithelial-mesenchymal transition (EMT) observed in cancer development. The identification of changes that occur at different stages along the senescence trajectory is important in that it may allow tailored interventions. Moreover, their finding of nicotinamide N-methyltransferase (NNMT) upregulation in senescence provides an explanation for the greater chromatin accessibility observed in senescence as well as NAD+ depletion.

      The reliance on -omics techniques is also to some extent a weakness - no attempt is made to orthogonally validate the findings e.g. by qRT-PCR for transcripts, or western blotting for proteins identified to change on senescence. While the data on replicative senescence appear mostly robust, there are potential weaknesses in comparisons with DNA damage-induced senescence, as the early time points analysed may reflect more the acute DNA damage response rather than senescence. While it is sensible to conduct the full range of analyses on the same cell line to identify degree of concordance between gene expression control at RNA and protein levels, and correlate with metabolic consequences, there is only a cursory attempt to compare with other senescence models (a single published dataset on oxidative stress induced senescence in astrocytes) so the findings are at this stage confined to senescence in the WI38 cell line studied, though it is likely they will have much wider applicability.

      We apologize for the lack of clarity in communicating that we did in fact compare our data to a model composed of multiple replicative senescence studies from different labs and different fibroblast cell lines compiled by Judith Campisi’s lab [Hernandez-Segura et al., 2017]. This comparison is the subject of what is now manuscript Figure 1D (1C previously). We report a very high correlation (r^2 = 0.92) between our data and a large compendium of replicative senescence data. We apologize that this was not clear and have added clarifying text.

      The comparison with oxidative stress induced senescence in astrocytes was used specifically to show that we observe similar putative regulatory TFs enriched in a senescence context far removed from replicative senescence in WI38 fibroblasts to suggest the possibility of wider applicability outside of WI-38 cells as mentioned in discussion. Although interesting and suggestive, this is outside the main thrust of the paper which was to generate a high resolution molecular description of replicative senescence in WI-38 cells.

      A larger, comprehensive analysis to determine to what extent the TFs we highlight could be master regulators of senescence across many different senescence models and tissue cell lines would be useful for the field, but it is outside the scope of this paper given the size of the literature. That analysis would be a publication in its own right, similar to Hernandez-Segura et al. If the reviewer finds the astrocyte comparison misleading or unhelpful, we are happy to remove it.

    1. Author Response:

      Reviewer #1 (Public Review):

      The study aims to investigate the role of A11 neurons in courtship behavior and vocalizations. In particular, the authors determine the inputs/outputs of A11 neurons and uncover that the outputs are both dopamine and glutamate positive. They then lesion A11 cell bodies and terminals in the songbird song-motor nucleus HVC and find that these lesions affect song production, especially, though not exclusively, of courtship song. They also measure the location and movement of lesioned birds and find that birds with lesions of A11 cell bodies show less engagement with a female. Finally, they use fiber photometry to study the activity of A11 terminals in HVC during singing. While this is an interesting question supported by novel data, and I appreciate the diverse and creative approaches employed in this study, the role of A11 in courtship behavior appears complicated and does not easily fit into the framework proposed by the authors. In particular, the authors argue that A11 is important for coordinating innate and learned aspects of courtship, however, their data fall short of supporting this idea.

      Strengths This is an impressive data set with considerable attention to detail.

      The tracing and histology data identify some novel connections not previously described in songbirds as well as the potential of A11 neurons to co-release glutamate and dopamine.

      Photometry provides real-time monitoring of A11 and HVC neuron activity during singing.

      In principle, targeting both HVC terminals and A11 cell bodies has the potential to lend insight into the role of HVC terminals vs. the role of projections to other areas (see below for caveats).

      We appreciate the reviewer’s efforts and attention in evaluating our manuscript. We are grateful that the reviewer recognizes strengths in our study, which we agree provides novel insights into the brain circuits that enable a fully integrated courtship display comprising learned and innate behaviors.

      Weaknesses 1) While I find the overall question and the data interesting, I am not convinced that they demonstrate that A11 is important for "coordinating innate and learned aspects of courtship". In general, birds with A11 lesions appear less motivated to perform female-directed song, however, it's not clear that this is a consequence of a lack of coordination between innate/learned aspects of behavior. Rather, perhaps A11 neurons are important to instigate or drive courtship behavior, or to relay signals from the POA or other regions important for courtship. Because the lesions abolish behavior, it is difficult to discern the role of these neurons in courtship.

      We agree that discerning the precise role of A11 is tricky. It could be acting to gate a (motivational) drive from another source, providing a primary source of this drive, and/or performing a more intricate role in coordinating the various aspects of the courtship display. The reviewer is correct that the current experiments do not allow us to clearly distinguish between these possibilities, and we have revised the manuscript accordingly, first by replacing “coordinate” with “gate” in the title and introduction and including a more thorough treatment of gating and other possible roles for A11 in lines 258-262 of the discussion. That said, we lean towards the latter possibility - a coordinating role for A11 - because of its location immediately proximal to regions that drive learned (HVC) and innate (ICo, RPgc) aspects of behavior and because A11 neurons can contain synthetic enzymes for a fast acting neurotransmitter (glutamate) in addition to DA. But, ultimately, we acknowledge that future experiments are needed to more completely answer this question.

      In addition, I disagree with the innate vs. learned distinction as recent data indicate that introductory notes, which the authors treat as innate, are actually learned (e.g. Kalra et al., 2021). Further, there is also no quantification of the effects of lesions on female-directed calls and little analysis of the activity during call production. This would seem to further complicate the overall interpretation. Overall, it's difficult to make sense of how A11 activity relates to vocalizations, especially given the innate/learned framework that they focus on.

      We thank the reviewers for drawing our attention to the recent Kalra 2021 paper, which we now cite while also making sure to emphasize that introductory notes may have learned features (lines 194-195 and 278-279). However, even that recent study concluded that males raised without a tutor or tutored on recorded songs that lack introductory notes altogether still developed songs that include introductory notes. Nonetheless, we include citation of this recent study and qualify our characterization of introductory notes as being shaped by innate predispositions and experience. Furthermore, we conducted additional analyses to quantify female-directed calling before and after 6-OHDA lesions in either HVC or A11 (results can be found in lines 164-165 and Figure 4C). In line with the divergent effects of these two types of lesions on the production of introductory notes, lesions in HVC did not affect female-directed calling whereas lesions in A11 largely abolished these vocalizations. While we acknowledge that the fiber photometry data on female-directed calling were limited, they nonetheless reinforce the conclusion that A11 transmits information to HVC about innate vocalizations, and that it also transmits information to HVC about introductory notes. Along with the loss of introductory note production following A11 lesions, we believe that our findings support the idea that A11 is essential to female-directed vocalizations generally, regardless of whether they are learned or innate, and to somehow enabling the transition from female-directed calling and introductory notes to motifs. We have done our best to draw out these points in the revised discussion.

      2) The HVC lesions appear to create damage/necrosis (Fig 3-suppl 2) and this raises the question of the degree to which the HVC lesion effects are the result of dopamine/glutamate depletion or local damage. In particular, it is surprising that syllable structure and stereotypy show such a dramatic breakdown with HVC A11 input lesions and effectively no change with lesions of the cell bodies, even though both treatments lead to effectively similar reductions in song production.

      We appreciate that 6-OHDA lesions are not highly specific and can introduce unwanted effects on non-TH+ cells and processes. To further quantify the effects of 6-OHDA lesions on HVC cells, we conducted additional 6-OHDA injections in HVC and TUNEL staining studies in addition to the preliminary efforts we had made in the original manuscript. Quantification of these data confirmed our original impression that 6-OHDA treatment in HVC increased HVC cell death (these data are shown in Figure 3-figure supplement 2J, K). To further address this issue, we also added an analysis of song structure when D1 receptor blockers were dialyzed into HVC. No changes in song morphology were detected, similar to the lack of effects on song morphology following A11 cell body lesions (Figure 3 - figure supplement 3). Taken together, these additional experiments and analyses indicate that the changes in song morphology following 6-OHDA treatment in HVC may arise from local damage to HVC cell bodies. In contrast, the reduction in singing following A11 terminal or cell body lesions is likely to reflect diminished DA signaling from A11. However, as the reviewer notes, our primary finding is the differential effects on female-directed singing, and the distinction between more purely singing-related effects following 6-OHDA treatment in HVC and a broad effect on all courtship behaviors following 6-OHDA treatment in A11.

      3) If the idea is that A11 is important for coordinating innate and learned movements, it seems that a detailed analysis of the movements would be important. As is, the movement data provide further support of a decrease in either the motivation or ability to perform female-directed song, but they do not speak to a more specific role for A11 in coordinating innate and learned movements.

      We maintain that we did provide a detailed analysis of a number of important nonsong behaviors, including changes in head orientation and translational movements that the male makes towards the female, both of which are major appetitive features of courtship in songbirds and other vertebrates. We also appreciate that these analyses do not allow us to say much about precisely how movements are being coordinated during courtship, and we have changed language throughout the manuscript to emphasize a gating rather than coordinating role for A11. Furthermore, in response to the reviewer’s concern, we performed additional analyses of the male’s movements during courtship, including beak wipes and vertical changes in posture (“standing tall”), which are finer components of female-directed displays. Notably, this new analysis reveals that all of these behavioral components are abolished by A11 cell body lesions, but not by A11 terminal lesions in HVC (lines 168-190 and Figure 4I, J). We appreciate the reviewer’s suggestion, as we believe these additional analyses strengthen our core finding, namely that A11 functions as a hub to gate, recruit and possibly coordinate innate and learned movements to generate a complete courtship display. These different roles are more fully considered in the revised discussion (lines 256-262).

      Reviewer #2 (Public Review):

      Ben-Tov et al. investigate function of midbrain region A11 and provide evidence that it plays a role in promoting and coordinating a variety of motor responses to sexually or socially salient stimuli. They show lesions of A11 cell bodies abolish female directed calling, orienting and singing, while lesions of terminals in the song premotor nucleus HVC prevent female directed singing, but leave female directed calling and orienting intact. Together with anatomical data indicating projections from A11 to multiple downstream targets associated with song (HVC), calling (DM/ICO) and locomotion, these data support the authors' idea that A11 forms a 'hub' that drives and 'coordinates' multiple different aspects of behavioral responses to social (here female/sexual) stimuli. The results are intriguing and begin to reveal how a single social context can elicit and coordinate multiple coordinated responses. However, as outlined below, I think that some of the specific stronger claims would benefit from additional data, discussion or moderation.

      The authors also provide compelling support for the idea that A11 plays a differential role in female-directed versus undirected song. This is especially underpinned by the observations that 1) A11 afferent activity in HVC appears to differ between directed and undirected singing, with increases in activity preceding song motifs only during directed song, and 2) lesions of A11 cell bodies or inputs to HVC have a dramatic suppressive effect on directed singing, but can leave undirected song largely unchanged. These observations that A11 differentially contributes to socially elicited versus spontaneous singing seem especially interesting and merit further highlighting and discussion as one of the especially striking aspects of the study that seems distinct from the thesis of a role in coordinating learned and unlearned behaviors.

      We appreciate the reviewer’s efforts and attention in evaluating our manuscript. We are grateful that the reviewer recognizes strengths in our study, which we agree provides novel insights into the differential contribution of A11 to socially elicited versus spontaneous singing. We also agree that this point should be highlighted and we expanded our treatment of this point in the discussion section of the revised manuscript (lines 296-309).

      Specific comments

      A central idea around which the results are discussed is that A11 plays a particular role in coordinating learned versus innate behaviors. I have several questions around this thesis where further guidance from the authors about both technical points and interpretation would be helpful.

      First is the question of how specific are the manipulations and conclusions to A11 itself versus other neighboring midbrain dopaminergic regions within which it is embedded. The authors show histology of lesions, injection sites and retrograde labelling in supplementary figures, but do not provide enough guidance for me to understand the strength of the argument that manipulations are restricted to A11 and/or its afferents. Can the boundaries between A11 and neighboring regions be better demarcated? What are the neighboring regions to which there might have been spillover? For lesions of A11 axons within HVC, wouldn't 6-OHDA also damage any other dopaminergic afferent to HVC, including those coming from regions such as VTA? Some discussion of these and related points regarding the specificity of manipulations to A11 would be helpful, especially in light of the literature that points to potential roles of neighboring dopaminergic regions in contributing to motivated behaviors and song more specifically.

      We appreciate that the definition and boundaries of A11 might be confusing. We demarcated A11 and neighboring regions in the relevant figures to better define A11’s boundaries. The reviewer is correct in surmising that the VTA is fairly close to A11 and hence a reasonable concern is that 6-OHDA treatment in A11 could spill over to the VTA and possibly the SNc. To address the concern that 6-OHDA lesions in HVC might cause cell damage to other DA sources to HVC, we quantified the number of VTA/SNc cells following HVC DA lesions. This additional analysis, provided in Figure 3-figure supplement 1D-F, shows that the number of VTA/SNc cells following 6-OHDA injections into either A11 or HVC is comparable to that of intact birds. These additional analyses support the conclusion that the behavioral deficits that emerge following 6-OHDA treatments reflect damage to A11 or A11 terminals in HVC.

      These points also relate to the general question of what is meant by A11 being a 'hub for coordination of learned and innate courtship behaviors'. Ultimately, it seems likely that many regions must work together to orchestrate these behaviors, and it is not clear from the present results how much I should view A11 as having a more specific role than other neighboring dopaminergic regions (or hypothalamic regions such as POA) that are interconnected and seem likely to also play critical roles. As the authors note, many of the relevant structures, including A11 and song system structures, are recurrently connected, further complicating interpretation of any one area as a hub. In this respect, I am not sure how much the authors are intending to argue that A11 is both necessary and sufficient for driving each of the studied behaviors in a courtship context, and it would be helpful to discuss this more specifically - does 'coordination' as used here imply that A11 is capable of triggering these behaviors - an interesting possibility raised by the current results but that does not yet seem to be demonstrated - or something else?

      As we noted in our response to a similar point made by the first reviewer, we agree that discerning the precise role of A11 is tricky. As we commented in that earlier response, A11 could be gating a (motivational) drive from another source, providing a primary source of this drive, and/or performing a more intricate role in coordinating the various aspects of the courtship display. We agree that the current experiments do not allow us to make a clear distinction between these possibilities, and we have revised the manuscript accordingly, including a more thorough treatment of these various roles for A11 in the discussion (lines 256-262). That said, we lean towards the latter possibility - a coordinating role for A11 - because of its location immediately proximal to regions that drive learned (HVC) and innate (ICo, RPgc) aspects of behavior and because A11 neurons can release a fast acting neurotransmitter (glutamate) in addition to DA. But, ultimately, we acknowledge that future experiments are needed to more completely answer this question. In the revised manuscript, we emphasize a gating role for A11 in the title and introduction, and then in the discussion expand to encompass the possibility of a coordinating or timing role for A11.

      One additional question regarding the framework for interpreting the function of A11 as coordinating 'learned and innate' courtship behaviors, is for some further clarification and citations regarding what is learned versus innate, especially as it relates to song. The authors characterize introductory notes as 'innate', but previous work from Rajan and colleagues has demonstrated that aspects of introductory notes including acoustic structure and patterning are influenced by learning, and I am not sure what the literature says about orienting and calling to females.

      We thank the reviewer for drawing our attention to this recent study from the Rajan group, which indeed concluded that some aspects of the introductory notes are learned. We also note that this study showed that juvenile males tutored on song playbacks that lacked introductory notes or that were raised without a tutor still produced introductory notes. Nonetheless, we include a citation of this recent study and qualify our characterization of introductory notes as being shaped by innate predispositions as well as through experience and learning (lines 194-195 and 278-279). Furthermore, our original analyses of birds with 6-OHDA treatment in HVC revealed that introductory note morphology was unchanged, whereas syllable morphology was degraded. Therefore, even if certain features of introductory notes are influenced by tutor experience, they apparently do not depend on HVC in the same manner as do the learned syllables in the motif. Lastly, we conducted additional analyses to quantify female-directed calling and other movements, before and after 6-OHDA lesions in either HVC or A11. In line with the divergent effects of 6-OHDA treatment in these two regions on the production of introductory notes, lesions in HVC did not affect female-directed calling, beak wipes or changes in the male’s posture, whereas lesions in A11 largely abolished all of these behaviors (Figure 4C, I, J). While we agree with the reviewer that a distinction between innate and learned behaviors may not be straightforward, the more fundamental observation is that we can dissociate different aspects of the courtship display and that A11 is situated in a position to drive, gate or coordinate a unified display that involves a variety of learned and innate vocal and non-vocal movements.

      I also would find it helpful to have some further clarification in this context about what it means to coordinate learned and innate aspects of song. The authors indicate that undirected song is largely unaffected by A11 lesions while directed song is largely eliminated, leaving only innate calls or introductory notes. I think it would be helpful to see here a more complete characterization of the nature of vocalizations that remain following A11 lesions in the female directed context. While I understand that no recognizable 'learned motifs' are produced, it is unclear from the example that is shown how much the residual vocalizations should be construed as 'severely disrupted songs' versus strings of calls that resemble innate calls that were present prior to lesions, versus 'normal' patterns of introductory notes that resemble in acoustic structure what the birds produced prior to lesions, but that never proceed to song motifs, etc. A better understanding of the nature of these residual vocalizations might also help to interpret what A11 is doing. Do these birds seem motivated to 'sing' in terms of their posture? Do the authors think that HVC is engaged or that the same residual vocalizations would be produced in a bird that had HVC lesions? How do the authors interpret these data in terms of how learned and unlearned vocalizations are normally coordinated in the context of directed singing?

      We performed additional analyses of the male’s vocalizations and movements during courtship, including female-directed calls, beak wipes and vertical changes in posture (“standing tall”), all of which are components of female-directed courtship displays. Notably, this new analysis reveals that all of these behavioral components are abolished by A11 cell body lesions, but not by A11 terminal lesions in HVC (lines 168-190 and Figure 4C, I, J). Along with our prior report that males with A11 cell body lesions do not sing female-directed motifs, the additional analysis indicates that these males produce few or no female-directed vocalizations or non-vocal behaviors of any kind.

      We previously reported that males with A11 terminal lesions produced introductory notes but not motifs, but we realize that this observation would benefit from more quantification. As noted in the previous response, we established that introductory note morphology was unchanged by 6-OHDA treatment in HVC (Figure 4 - figure supplement 1A-D). To extend this analysis further in this revised manuscript, we built on the observation that males with 6-OHDA treatment in HVC produce only introductory notes to females, with no song motif, whereas they produce a series of introductory notes followed by motifs comprising distorted syllables when alone (Figure 3K, Figure 3 - figure supplement 2, Figure 4B). To confirm that the directed introductory notes and undirected syllables were indeed distinct vocalizations, we computed their durations and spectral similarity scores (using Sound Analysis Pro). The introductory notes produced during directed conditions differed markedly in their durations from distorted syllables produced during undirected conditions, and these two types of vocalizations had very low similarity scores, indicating that they were cleanly separable vocal behaviors (Figure 4 - figure supplement E, F). Given that introductory notes are unchanged by 6-OHDA treatment in HVC, these analyses support the idea that males treated in this manner can still produce motifs, albeit distorted ones, when alone but not when in the company of a female.

      These questions relate in part to that of how much is the trigger to sing eliminated by A11 afferent lesions versus the ability to produce the relevant song output? It seems like there may still be a trigger to sing - short latency vocal response to female - but inability to produce motif. One point that may be interesting to note in this regard is that this seems somewhat opposite of observations made in other contexts about the effects of directed versus undirected context on song - for example, juveniles can produce better song when it is directed (Kojima), and deafened birds that are beginning to exhibit song deterioration can exhibit normalization of song structure during directed conditions (Nordeen).

      We agree with the reviewer’s point that birds with 6-OHDA lesions in HVC may still be triggered to sing, but are unable to produce a motif, given that they still produce introductory notes and seem to have the right posture, orientation and proximity to the female. We appreciate the reviewer’s comment regarding changes in song that can be elicited by females in either juvenile males or adult males that are deaf, although these additional contexts fall outside of the current study, which focused on adult male finches with normal hearing.

      Reviewer #3 (Public Review):

      The authors use a combination of quantitative acoustic and other behavioral analyses to evaluate the role of the midbrain dopaminergic area A11 in the production of female-directed song in adult male zebra finches. They show that female-directed courtship displays, which consist of song and the production of female-directed displacement behaviors, are dependent on A11 because targeted chemical lesions of this structure, using 6-hydroxydopamine (6-OHDA), permanently (i.e. for at least several months) eliminate both the vocal and non-vocal elements of this behavior. Destruction of A11 axons that directly target HVC, by administering 6-OHDA into HVC, only eliminates female-directed singing without causing any change in the other observed female-directed behaviors. Because these same lesions only temporarily (5-10 days) abolish undirected song, these findings suggest that A11 is not directly involved in song production but acts instead as a gate for the production of female directed courtship behaviors. The authors follow these lesion studies with fiber photometry-based calcium imaging of A11 axons that target HVC to show that A11 activation patterns precede activity in HVC during female-directed singing and that calcium elevation is primarily elevated during the production of the many introductory notes (a component of song that is primarily observed during female-directed singing) that precede the production of the learned song motif. These findings suggest that A11 inputs to HVC likely play a role in triggering and/or activating HVC to synchronize the production of introductory notes (which are likely produced by midbrain circuits) with the learned song component that immediately follows them. In contrast, activation of A11 axons during undirected song (which contain few to no introductory notes) do not precede HVC activation patterns. 
Consistent with the rapid transmission of A11 neurons, the authors also confirm, as has been suggested for A11 in mammals, that A11 dopaminergic neurons co-release glutamate.

      The findings of this study are of significant interest to our understanding of the neural mechanisms by which these complex behaviors are synchronized and open up a new way of thinking about how learned behavioral motifs can be synchronized with non-learned (e.g. female displacement behavior) behaviors. The study is rigorous, with many different experimental approaches being used to examine the proposed hypotheses, and the findings are convincing. Particularly impressive is the complete elimination of female-directed courtship behaviors following targeted elimination of A11. The primary weaknesses of the manuscript lie (1) in the way they present their anatomical findings and (2) how the authors discuss their findings in the discussion. In the discussion, which is very short (~750 words), the authors miss the opportunity to draw parallels with similar studies in drosophila (they only provide a cursory statement with a few references). In the discussion, the authors propose a model that seems quite oversimplified and lacks, in fact, many of the anatomical connectivity that they show in the first part of their study (for example A11 is only shown having a unidirectional connection to ICo/DM when in fact the connections are bidirectional). The model is also presented in simple hierarchical fashion with many connections omitted. Perhaps these omissions were made to simplify the model but in my opinion such simplification possibly misrepresents the actual mechanisms involved in the coordinated control of courtship song.

      We thank the reviewer for their careful reading of the manuscript and their supportive and constructive comments. We agree that the loss of all female-directed behaviors (which we now extend to female-directed calling and other non-vocal behaviors, such as beak wipes and postural changes) following A11 cell body lesions is especially intriguing. Further, the different effects of A11 cell body lesions and A11 terminal lesions in HVC, along with the connectivity of A11, indicate that A11 acts via a range of downstream sites to gate these various female-directed behaviors. We have addressed the two primary weaknesses identified by the reviewer as follows. First, we now provide a more detailed accounting of the anatomical findings. Second, we have expanded the discussion to address parallels with other studies, as in the fly, and to provide a more nuanced and complete consideration of how A11 may function to facilitate male courtship behaviors.

    1. Author Response:

      Reviewer #1:

      Reviewer 1 expresses some concerns regarding concentrations of soluble proteins during our experiments. This is a good point and, in response, we are rewriting the manuscript to more clearly describe the metastable nature of the soluble protein pool. The key feature of our reaction mixture is that it contains both profilin and capping protein, which work together to suppress filament assembly. Spontaneously nucleated filaments are rapidly capped at their barbed ends. Profilin then effectively prevents elongation from the pointed ends of these filaments and they disassemble. We will cite relevant work that establishes and discusses these synergistic activities. Young et al. (1990) found that factors that cap >90% of filament barbed ends increase the critical concentration from that of the barbed end to that of the pointed end, and several groups demonstrated that profilin and barbed-end capping proteins work together to suppress filament assembly and promote disassembly of filaments with free pointed ends (e.g. DeNubile, 1985; Blanchoin, 2000; Pernier, 2016). This combination produces a large pool of monomeric actin that is capable of transiently elongating any newly formed barbed ends. We previously described this pool as ‘metastable’ (Pollard, 2000) while others have described it as ‘dynamically stable’ (Pernier, 2016). Only the branched actin networks formed by the micro-patterned nucleation promoting factors have an appreciable lifetime and consume a significant fraction of the soluble proteins, because filaments can only be formed by continual rounds of nucleation and only remain stable when their pointed ends are capped by the Arp2/3 complex (Blanchoin, 2000). In addition, the total amount of protein incorporated into the micro-patterned branched networks is only a small fraction of the total protein present in the reaction mix. This is demonstrated by the fact that the network growth rate is constant over the course of each experiment. 
We will mention this in the revised manuscript and provide the following simple calculation to emphasize this point: The concentration of actin in our reaction mixes is 5 µM, with a total volume of 150 µl, giving 750 pmol of actin in total. The maximum concentration of actin in our networks is 1.25 mM, but the maximum total volume of these networks is less than 0.002 µl (based on a total of 400 WAVE1 patches with an average area of 50 µm², generating networks with a maximum height of <100 µm), so the networks contain at most about 2.5 pmol of actin. The fraction of actin used up during an experiment, therefore, is roughly 0.3% at most.
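
The arithmetic behind this estimate can be checked with a few lines of Python. This is only a sketch using the upper-bound values quoted in the response; the function and variable names are illustrative and not taken from the authors' analysis scripts.

```python
# Back-of-envelope check of the actin budget quoted in the response.
# Every input is an upper-bound value stated in the text; names are hypothetical.

UM3_PER_UL = 1e9  # 1 microliter = 1 mm^3 = 1e9 cubic micrometers


def network_actin_fraction(
    bulk_conc_um=5.0,        # actin in the reaction mix, micromolar
    bulk_volume_ul=150.0,    # reaction volume, microliters
    network_conc_um=1250.0,  # 1.25 mM actin inside the networks, in micromolar
    n_patches=400,           # number of WAVE1 patches
    patch_area_um2=50.0,     # average patch area, square micrometers
    max_height_um=100.0,     # maximum network height, micrometers
):
    """Return (network volume in microliters, fraction of total actin in networks)."""
    network_volume_ul = n_patches * patch_area_um2 * max_height_um / UM3_PER_UL
    bulk_pmol = bulk_conc_um * bulk_volume_ul           # micromolar x microliter = pmol
    network_pmol = network_conc_um * network_volume_ul  # pmol held in the networks
    return network_volume_ul, network_pmol / bulk_pmol


volume_ul, fraction = network_actin_fraction()
print(f"network volume <= {volume_ul:.4f} microliters")
print(f"fraction of actin consumed <= {fraction:.2%}")
```

With these inputs the bound on the network volume is 0.002 µl and the bound on the consumed fraction is about a third of one percent, the same order as the figure given above.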

      REFERENCES:

      Blanchoin L, Pollard TD, Mullins RD. (2000) Interactions of ADF/cofilin, Arp2/3 complex, capping protein and profilin in remodeling of branched actin filament networks. Curr Biol. 10(20):1273-82.

      DeNubile MJ, Southwick FS. (1985) Effects of Macrophage Profilin on Actin in the Presence and Absence of Acumentin and Gelsolin. J. Biol. Chem. 260(12):7402-7409.

      Pernier J, Shekhar S, Jegou A, Guichard B, Carlier MF. (2016) Profilin Interaction with Actin Filament Barbed End Controls Dynamic Instability, Capping, Branching, and Motility. Dev Cell. 36(2):201-14.

      Young CL, Southwick FS, Weber A. (1990) Kinetics of the Interaction of a 41-Kilodalton Macrophage Capping Protein with Actin: Promotion of Nucleation during Prolongation of the Lag Period. Biochemistry, 29:2232-2240.

      Reviewer #2:

      Reviewer #2’s comments about the molecular mechanism underlying the force-induced increase in free barbed ends indicate that our explanation was not as clear as it should have been. We will provide more detailed derivations for our mathematical methods, but in the meantime, we hope that the following explanation will clear up any misunderstanding.

      Reviewer 2 rightly notes that “…for both capping and branching, the authors find that they decrease the same way with increasing loads - as they should: this is imposed by their being at steady state, where the birth rate of growing barbed ends (branching) must match their death rate (capping).” This steady state condition is actually the starting point for our analysis. At steady state the overall rates of nucleation and capping must be equal (Rcapping = Rnucleation). Importantly, the overall rate of nucleation is a complicated function that depends on the occupancy of the WH2 domains, the surface-associated Arp2/3 complex, and the local density of polymeric actin. On the other hand, filament capping in our system appears to be a simple bimolecular interaction between soluble capping protein and free barbed ends. We demonstrated this by showing that the average filament length (i.e. the ratio of polymeric actin to capping protein in the growing network) varies as a simple inverse function of the capping protein concentration. This means that the overall rate of capping (Rcapping) equals the product of the capping protein concentration ([CP]), the surface density of free barbed ends (E), and an appropriate capping rate constant (kc). Setting this equal to the overall rate of nucleation yields,

      kc[CP]E = Rnucleation

      Which can be rearranged to give the density of free barbed ends,

      E = Rnucleation/(kc*[CP])

      As the reviewer notes, this equation describes a density, not a unitless number. Note that a sudden decrease in per-filament capping rate (e.g. a decrease in the rate constant, kc) with no change in overall nucleation rate will cause the number of free barbed ends to increase until the overall rate of capping (kc[CP]E) once again matches the overall rate of nucleation. This equation is an “iron law” imposed by the steady-state (or quasi-steady state) character of the system, and it means that any increase in the density of free barbed ends must reflect EITHER an increase in the overall rate of nucleation OR a decrease in the per-filament capping rate (or possibly both). Our direct measurements of the overall nucleation rate (the quantity in the numerator) rule out the first possibility, meaning that the per-filament capping rate MUST go down with applied force. Furthermore, our measurements demonstrate that this capping rate displays the same force sensitivity as actin filament elongation. The best explanation for this phenomenon is that the insertion of a capping protein onto a filament barbed end is subject to the same constraints as the insertion of an actin monomer. This could have been predicted from Brownian Ratchet theory, but as the reviewer points out, it was not. Our “bulky capping protein” experiments are a direct test of whether Brownian Ratchet theory can account for the force sensitivity of filament capping, and they demonstrate that it can.

      In summary, we stand by our original explanation, namely that applied forces cause a decrease in the rate at which individual filaments are capped (via a change in the rate constant for filament capping, kc). This decrease, which can be explained by Brownian Ratchet Theory, leads directly to an increase in the steady-state barbed end density.
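
The steady-state argument above can be illustrated with a short numerical sketch. It assumes the simplest kinetics consistent with the text, dE/dt = Rnucleation - kc[CP]E, which is our own illustrative assumption rather than a model taken from the manuscript, and all parameter values are arbitrary.

```python
# Minimal sketch of the steady-state balance kc*[CP]*E = Rnucleation.
# Assumption: barbed-end density follows dE/dt = R - kc*[CP]*E.
# All values are in arbitrary units and purely illustrative.


def steady_state_ends(r_nucleation, k_cap, cp_conc):
    """Free barbed-end density at which capping balances nucleation."""
    return r_nucleation / (k_cap * cp_conc)


def relax(e0, r_nucleation, k_cap, cp_conc, dt=0.01, steps=20000):
    """Forward-Euler integration of dE/dt = R - kc*[CP]*E starting from e0."""
    e = e0
    for _ in range(steps):
        e += dt * (r_nucleation - k_cap * cp_conc * e)
    return e


R = 1.0            # overall nucleation rate, held fixed under load
CP = 0.1           # capping protein concentration
KC_UNLOADED = 4.0  # per-filament capping rate constant without load
KC_LOADED = 2.0    # hypothetical halving of kc under applied force

e_before = steady_state_ends(R, KC_UNLOADED, CP)
e_after = steady_state_ends(R, KC_LOADED, CP)
e_simulated = relax(e_before, R, KC_LOADED, CP)

print(f"E before load: {e_before:.2f}")
print(f"E after load (analytic): {e_after:.2f}")
print(f"E after load (simulated): {e_simulated:.2f}")
```

With the nucleation rate held constant, halving kc doubles the steady-state density of free barbed ends, and the simulated density relaxes to the same value. This is the behavior the response describes qualitatively.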

    1. Author Response:

      Public Review:

      This manuscript from Pacheco-Moreno et al. compares the microbiome of potato fields with and without irrigation. Irrigation is known to control potato scab caused by Streptomyces scabies and the authors hypothesized that changes in the microbiome may contribute to disease suppression after irrigation. Using 16S rRNA sequencing, they identified a number of taxa, including Pseudomonas that are enriched after irrigation. They went on to isolate and sequence the genomes of many Pseudomonas strains. By correlating the ability of Pseudomonas to suppress Streptomyces growth in vitro with genomic data, the authors identified a novel group of cyclic lipopeptides (CLPs) that can inhibit Streptomyces in vitro and in planta.

      This work provides a substantial contribution that advances our understanding of disease suppressive soil mechanisms. It is novel in scope in that it focuses on suppression of a bacterial pathogen, while many prior studies focus on suppression of fungal pathogens. Additionally, the large-scale comparative genomics is a useful resource, and the identification of CLPs that inhibit Streptomyces is novel. Importantly, the authors provide in planta data to show a role for CLPs in disease suppression in vivo. The manuscript is well written and the data are well presented. The analyses are quite thorough and I appreciate the extensive use of genetics and metabolomics to support the genomic predictions. The main weakness is a lack of data that conclusively link the change in microbiome function to disease suppression after irrigation in the field. However, I think the data they've presented, combined with those in the drought literature, might suggest that an increase in total Pseudomonas (and the corresponding disease-suppressive genes) in well-watered soil might contribute to suppression, rather than a change in function of Pseudomonas.

      While the reviewer is correct that we cannot conclusively link disease suppression to a change in microbiome function after irrigation, we are confident that our results demonstrate a real and repeatable phenomenon that must be considered in future studies of soil scab suppression. Independent field experiments conducted two years apart both show a decrease in the proportion of suppressive pseudomonads associated with potato roots. The first experiment (Figures 1 & 2) contained too few sequenced isolates to draw statistically robust conclusions; therefore, we designed the second experiment (Figure 8) to investigate this phenomenon further. This experiment showed highly significant differences in the proportion of suppressive isolates on irrigated and non-irrigated roots. The alternative hypothesis presented by the reviewer, that relative Pseudomonas and Streptomyces abundance are affected by irrigation and that this may be a factor in scab suppression, is also a valid possibility, although relatively small abundance changes were observed in the data reported in Figure 1. We have amended the discussion to include this as an alternative explanation for our results.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors conducted an extensive analysis of transcription termination in Methanococcus maripaludis, which is an archaeal species. Using a combination of functional genomics, statistical analyses, in vitro/biochemical approaches, and reporters, the authors explore mechanistic aspects of termination. The authors determine that the archaeal CPSF protein binds an upstream uridine tract using a KH domain. The RNA binding activity of aCPSF is not present in eukaryotes and is shown to be important for termination in a uridine tract dependent manner.

      Overall, the work is well-conceived and, in general, the conclusions by the authors are supported by the data. Investigation of archaeal species is not mainstream but still has significant potential to impact the field of transcription as the process of termination is still being unraveled. Moreover, several aspects of the methodology would benefit other researchers - notably, the development of Term-seq. The readers would benefit from consulting early structural studies of aCPSF to more fully gain a perspective on the interesting aspects of KH domain presence in this homolog of CPSF.

      Thank you very much for the positive comments.

      Reviewer #2 (Public Review):

      Transcription termination defines accurate transcript 3'-ends and ensures programmed transcriptomes, making it critical to gene expression regulation. Our understanding of archaeal transcription termination mechanisms is still limited. Li et al present new evidence that supports their model of aCPSF1-dependent transcription termination in Archaea (Yue et al NAR 2020). Importantly, they show that aCPSF1 recognizes U-rich terminator signals with its KH domain, using the methanogen Methanococcus maripaludis as a study model. The reported results of the manuscript are the continuation of their work published in Nucleic Acids Research in 2020 (Yue et al NAR 2020). Previously, the authors demonstrated the importance of the absolutely conserved ribonuclease aCPSF1 to terminate transcription by cleaving the transcripts at their 3' ends. They proposed a model for transcription termination in Archaea in which they defined aCPSF1 as a general transcription termination factor. Here, by reinvestigating Term-seq data and by using RNA-protein binding assays (EMSA & SPR) and genetics experiments, Li et al argue that (i) PolyU-tracts are signals for transcription termination in M. maripaludis, (ii) aCPSF1 binds to PolyU-tract signals in transcript 3'-ends through its KH domain, and (iii) transcription termination is more effective in presence of aCPSF1. In general, the experiments and analyses that are shown are well-conducted. A major criticism is the lack (i) of quantification of RNA-protein binding assays, which would allow going deeper in the understanding of how aCPSF1 specifically recognizes PolyU-tract signals, and (ii) of data on the oligomerisation status of aCPSF1. It is important to decipher if aCPSF1 is acting as a monomer or a dimer. Both will be helpful for proposing a more precise model for transcription termination mechanism in Archaea.

      Thank you very much for the positive comments and the valuable suggestions. Following your comments and suggestions, we have added the following experimental data in the revision: (i) the binding affinities (Kd) of aCPSF1 for U-rich RNA have been quantified in the binding assays; (ii) truncation of the C-terminal 13 residues (C13) that are essential for aCPSF1 dimerization leads to the loss of the in vivo termination function of aCPSF1, demonstrating that the dimerization of aCPSF1 is essential to its normal function; (iii) a footprint assay also determined that aCPSF1 binds to the U-tract region of the tested terminator. All the related results and discussions have been added to the revised manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, Li et al. examine U-rich motifs enriched at the transcript end sites in the archaeon M. maripaludis and analyze the role of aCPSF1 and its KH domains in binding these U-rich motifs. Their data indicate aCPSF1 binding to U-rich motifs is necessary for efficient 3' end definition of transcripts. Overall this work is well carried out, but there are also several key issues that should be addressed in order to more fully support the conclusions of the authors.

      Immensely thank you for the positive comments.

      Major:

      1. The conclusion that aCPSF1 functions in back-up termination is not supported by their data. As far as I can tell, back-up termination should happen at non-primary sites, which have a lower frequency of U quadruplex. It is not clear how aCPSF1 functions at those sites.

      In light of your comments and those of Reviewer 1, this inappropriate conclusion has been removed from the revised manuscript.

      1. The authors appear to indicate that aCPSF1 is the sole factor for 3' end cleavage of archaeal transcripts. But this is not supported by the data. Their data indicate both U-rich motif and aCPSF1 are necessary for cleavage. No data are shown to indicate that these two alone are sufficient for 3' end cleavage.

      The unclear description has been revised to “the only trans-action factor” (Line 36 in the Abstract and Lines 476-477).

      1. The U-rich motif (U quadruplex) was recently reported in their NAR paper (Yue et al.). There appears to be limited additional information on motifs in this work. It would be useful to readers to know if U-rich motifs are the only type for 3' end cleavage. The authors may want to examine motifs beyond single nucleotide models (which is what they are doing in this work). For example, are there any k-mer enrichments besides U quadruplex?

      Following your advice, we searched the RNA sequences proximal to TTSs; however, no motif other than the U-rich sequence was found.

      Minor:

      1. Strictly speaking, they are measuring transcript ending sites, not polymerase termination sites. The Term-seq data do not map the polymerase termination site. The authors should make this distinction in their writing. Likewise, it is better to rename transcription termination efficiency (TTE) to transcript termination frequency, because they are measuring steady state RNAs which embody both 3' end cleavage efficiency and RNA stability. The efficiency implies termination kinetics, which is not what they are measuring.

      Thank you for the advice. To convey this more clearly, a description of the Term-seq method has been added: “Recently, Term-seq, an approach that enables accurate mapping of all exposed RNA 3′-ends in prokaryotes and determines the transcription termination sites (TTSs) at the genome-wide level in representative bacteria and archaea (Dar et al., 2016b; Porrua et al., 2016; Yue et al., 2020), has been developed.” (Lines 82-85). In addition, “transcription termination efficiency (TTE)” has been revised to “Transcription termination efficacy” throughout the manuscript in response to the suggestion.

      1. Figure 3 needs better quantification. Binding curves showing binding constant are needed to make quantitative conclusions.

      Equilibrium dissociation constants (Kd) were calculated from the binding curves, which were generated by quantifying the unbound and bound substrates in Figures 3, 4, and 6A, and the Kd values are indicated in the respective figures.
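
      As an illustration only (this is not the authors' analysis code), the step from quantified bound/unbound fractions to a Kd can be sketched as follows. The sketch assumes a simple one-site binding isotherm and protein in excess over RNA, so that total protein approximates free protein; the titration values and the function names `fraction_bound` and `fit_kd` are hypothetical.

```python
import math

def fraction_bound(protein_nM, kd_nM):
    """One-site isotherm: fraction of RNA bound at a given protein concentration."""
    return protein_nM / (kd_nM + protein_nM)

def fit_kd(protein_nM, bound_fraction, lo=1e-3, hi=1e6, rounds=6, points=200):
    """Least-squares Kd estimate via repeated log-spaced grid search (stdlib only)."""
    def sse(kd):
        # Sum of squared residuals between the model and the measured fractions.
        return sum((fraction_bound(p, kd) - f) ** 2
                   for p, f in zip(protein_nM, bound_fraction))
    best = lo
    for _ in range(rounds):
        step = (math.log10(hi) - math.log10(lo)) / (points - 1)
        grid = [10 ** (math.log10(lo) + i * step) for i in range(points)]
        best = min(grid, key=sse)
        # Narrow the search window to one grid step on either side of the best value.
        lo, hi = best / 10 ** step, best * 10 ** step
    return best

# Hypothetical, noise-free titration generated with Kd = 50 nM (not data from the study).
protein_nM = [5, 10, 25, 50, 100, 250, 500, 1000]
bound_fraction = [fraction_bound(p, 50.0) for p in protein_nM]

kd_estimate = fit_kd(protein_nM, bound_fraction)
print(f"estimated Kd = {kd_estimate:.1f} nM")
```

      With real gel or SPR data, a cooperative (Hill) model or a correction for ligand depletion may be more appropriate, particularly if the protein binds as a dimer.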

      1. Figure 7. the eukaryotic model seems based on budding yeast. This should be noted. 3' end motifs in other eukaryotes are quite different than those in the budding yeast.

      Thank you for pointing this out. Budding yeast is now indicated in the Figure 7 legend (Line 1200).

      1. There are numerous grammatical errors, which the authors should address.

      The English language and grammar have been revised by a professional English language editing company. We hope the language and grammar have been improved.

    1. Author Response:

      Reviewer #1:

      This work significantly advances our understanding of the role of Strongyloides stercoralis nuclear receptor Ss-DAF-12 in determining parasite life cycle and infection outcomes. S. stercoralis is a facultative soil-transmitted nematode that infects hundreds of millions of humans. While infections are typically subclinical, 'hyperinfections' associated with host immunosuppression bring about severe morbidity and mortality that are not well-controlled by existing anthelmintics. DAF-12 signaling in free-living nematodes is known to promote growth and development, lifespan extension, and avoidance of the dauer dormancy state. Some of these findings have been shown to be conserved in parasitic nematodes, where infective stages are argued to be akin to dauer, motivating the study of the orthologous DAF-12 pathway as an antiparasitic substrate.

      Through a lipid fractionation strategy, the authors first show that Δ7-DA is a potent endogenous ligand for Ss-DAF-12 and that Δ7-DA abundance throughout the life cycle is correlated with states of parasite reproductive development. The authors show that S. stercoralis possess the machinery to synthesize Δ7-DA from dietary cholesterol and use an insect cell expression system to identify biosynthetic enzymes in this pathway: Ss-DAF-35, Ss-DHS-16, and the cytochrome P450 Ss-CYP-22a9. CRISPR/Cas9-mediated disruption of Ss-CYP-22a9 is shown to near-completely inhibit cGMP-mediated Δ7-DA production and activation of infective larvae as measured by feeding, which can be rescued by exogenous Δ7-DA. Finally, a gerbil infection model is used to show that Δ7-DA treatment drastically reduces fecal output of larvae in uncomplicated strongyloidiasis and that a combined Δ7-DA/ivermectin regimen can decrease the burden of autoinfective intestinal larvae and prevent death in hyperinfection induced by immune compromise.

      Overall, the conclusions are well-supported and this work provides important new insights on a pathway that is of broad relevance to the regulation of the complex and diverse life cycles of parasitic nematodes. The discovery that the combinatorial action of a DAF-12 agonist and ivermectin can synergistically control hyperinfective strongyloidiases is a major and impactful finding. This work will be of great interest to the larger parasitology and nematode biology community. My enthusiasm is only slightly tempered by the acknowledged caveats that currently limit the therapeutic outlook of this approach. The eventual development of therapeutics targeting this pathway could aid the treatment of uncontrolled strongyloides infections and be of potential value for the treatment and control of other parasitic worm infections.

      Strengths:

      • Experiments are generally well-designed and rigorous, clearly establishing that Δ7-DA is a primary ligand for Ss-DAF-12 and resolving the primary biosynthetic pathway for the production of Δ7-DA in S. stercoralis. While Δ7-DA is the known ligand of C. elegans DAF-12, significant difference in primary sequence justified caution and experimental validation. Similarly, while two of the Δ7-DA biosynthetic pathway enzymes have one-to-one C. elegans orthologs, the role of Ss-CYP-22a9 as the DAF-9 isoenzyme could not have been bioinformatically inferred.

      • CRISPR-based knockdowns are not trivial in this system and both the HDR and NHEJ pathways were elegantly leveraged to confirm the in vitro activity of Ss-CYP-22a9 and its essentiality to the life stage responsible for infection.

      • Animal studies convincingly reveal the synergistic effect of Δ7-DA and ivermectin in disseminated strongyloidiases. The burdens of intestinal larvae and adults in both uncomplicated and disseminated infection after treatment align with the standing model that Δ7-DA is acting against the intestinal larval stages (L3a) that are naturally deficient in the hormone.

      • While there is precedent for combinatorial drug therapies in antifilarial control, there is great novelty in combining drugs that are known to target different stages as opposed to just different molecular targets. This provides the first clear demonstration of this as far as parasitic nematodes are concerned.

      Thank you for your enthusiastic and supportive comments.

      Weaknesses: Weaknesses are categorically minor.

      • As the authors recognize, the animal studies required daily Δ7-DA dosing over a two-week period. While some of this is explained by a short half-life and poor pharmacokinetics, drug was continually delivered directly to the gut at high (uM) concentrations. This is a major hurdle to surmount and it is entirely possible that even if drugs with equivalent potency and more favorable pharmacokinetics were discovered, they would have to be administered in multiple dosing regimens or at prohibitively high concentrations to achieve a curative effect.

      We addressed this potential limitation in the second-to-last paragraph of the Discussion and believe that developing such drugs should not be an insurmountable task. The repeated dosing that we used was necessary because the endogenous ligand has poor pharmacokinetic properties, which is a relatively common characteristic of other endogenous nuclear receptor ligands as well. Notably, however, this hurdle has been overcome successfully with the design of potent, long-acting agonists and antagonists (the depot drugs used in reproductive medicine are great examples of this strategy).

      • While there is some evidence in other nematode clades that modulation of the DAF-12 pathway can affect developmental phenotypes, many of these parasites have huge phylogenetic separation from strongyloides and lack dormant stages requiring 'activation' or free-living stages. Given the independent evolution of parasitism across the phylum, it is just as likely that drugs acting on DAF-12 will have subtle (and not curative) effects in these other parasite systems.

      This will be an important point to address with other parasites. However, we note that a large number of them (including all of the soil-borne nematode parasites that have been surveyed) have a DAF-12 and a similar L3i stage. In fact, the so-called “rule of the infective third stage” is a paradigm that is broadly conserved throughout the nematode phylum.

      Reviewer #2:

      Mangelsdorf, Kliewer and colleagues here identified the endogenous ligand in Strongyloides stercoralis that governs that parasitic nematode's capacity to autoinfect its mammalian hosts. The ligand, Δ7-DA, interacts with nuclear receptor Ss-DAF-12, just as it does with its C. elegans ortholog, Ce-DAF-12, governing transcription of genes essential for metabolism and reproductive growth. Specifically, Δ7-DA appears to mediate a switch in DAF-12 function: in unfavorable conditions, Δ7-DA is absent and unliganded DAF-12 is said to arrest growth in both species and produces developmentally quiescent infective S. stercoralis larvae; in favorable conditions, the ligand is synthesized and the liganded DAF-12 triggers infection by S. stercoralis and subsequent development and reproduction in both species. The authors determined the ligand's biosynthetic pathway and showed by mutating the rate-limiting enzyme in the pathway that Δ7-DA is essential for parasite reproduction, whereas its absence is required for infectious larval development. In an animal model, they demonstrated that administration of Δ7-DA suppresses autoinfection and host lethality. Given in combination with an existing drug targeted to actively developing stages, Δ7-DA virtually cures the disease.

      This work establishes a finding and an implication. The finding: gene circuits for growth and reproduction in nematode species with distinct life cycles -- parasitic vs free-living -- are regulated by a hormonal signal and cognate receptor that are structurally and functionally conserved. This is evolutionarily unremarkable, made mildly surprising because S. stercoralis lacks a cytochrome P450 with strong sequence identity to C. elegans DAF-9, which catalyzes the rate-limiting step in Δ7-DA synthesis in C. elegans; a screen of S. stercoralis P450s demonstrated that Ss-CYP22a9 is the DAF-9 isozyme. The implication: targeting DAF-12 function (either agonism to block lethal hyperinfection or antagonism to prevent development of adult worms) or ligand synthesis may offer a therapeutic route to treating nematode parasitism. This is a valuable implication, identifying three therapeutic target approaches -- Ss-DAF-12 agonism or antagonism, and Ss-CYP22a9 inhibition -- that potentially might be advanced from these pre-clinical observations. Overall, the manuscript makes a modest yet significant contribution.

      Typical of the work from Mangelsdorf and Kliewer, the research plan and experiments are rigorously designed, executed and interpreted. My only quibble is that the requirement for Ss-DAF-12 (unliganded) to produce infectious L3i larvae is claimed (lines 165-167) but not directly demonstrated here. Instead, the authors depend on, and eventually cite in the Discussion (line 340) their nice PNAS paper earlier this year, which makes this case. Because of its importance in rounding out the implication noted above, my preference would be for the authors to add an experiment to this work that documents that Ss-DAF-12 is essential both pre- and post-ligand production.

      Thank you for your positive comments and enthusiasm for our work. Regarding your comment on the requirement of Ss-DAF-12 for L3i, we have updated the citation of our previously published results in the text to support our finding that unliganded Ss-DAF-12 is required for L3i formation. Importantly, we also showed this requirement using a different strategy in our present study: we demonstrated that the enzyme that makes the ligand is required for recovery from L3i and that knocking it out results in worms that develop into L3i but do not progress unless exogenous ligand is added back.

      Regarding the preference for an experiment that details Ss-DAF-12’s requirement for pre-ligand production, this was shown in our 2021 PNAS paper by Cheong et al. using the Ss-DAF-12 loss of function worms. For post-ligand production, these Ss-DAF-12 KO worms could not be used for the following reason. Recall that the apo-receptor functions as a transcriptional repressor and so loss of Ss-DAF-12 results in de-repression of its targets, thereby phenocopying the presence of the ligand. Thus, adding the ligand to the Ss-DAF-12 KO has no easily discernable phenotype. However, in our PNAS paper we did knock out the essential coactivator (DIP-1) for Ss-DAF-12, which is required for post-ligand activity. This knockout eliminates Ss-DAF-12 ligand transactivation (but not transrepression) and shows the expected phenotype of not being able to recover from L3i.

    1. Author Response:

      Reviewer #1:

      For this manuscript, I focused on the metabolite analysis. The data is presented as supporting a common response based on shared selective histories if I'm understanding properly. However, primary metabolite data is hard to interpret in the same fashion as genetic data. This arises because of the high degree of pleiotropy wherein it is very hard to find a mutant or variant that doesn't alter primary metabolism. As such, it is possible that there is a common response less because of shared history and more because there is constraining selection that shapes what is the optimal primary metabolite response to cold in photosynthetic organisms. For example, in Arabidopsis, it has been found that accessions tend to have a highly similar primary metabolism but when they are crossed, the progeny have a vastly wider array of primary metabolism phenotypes, suggesting that the similarity in accessions is not shared genetics but constraining selection that forces compensatory variants. I don't think this detracts from the utility of including the primary metabolism but it would help to have more clarity in the strengths and weaknesses in using metabolite data to track theories and arguments that are largely genetic based.

      We fully agree with the reviewer. The idea of constraining selection is at least as interesting as our explanation, and should be in the forefront. Given this interesting idea of compensatory mutations that are private to each accession (or ‘lineage’ or ‘line’), in principle this idea also hints towards the parallel/convergent evolution (‘constraining selection’ in the reviewer’s words) of this important trait or trait complex. We re-phrased this within the manuscript and considered this comment seriously throughout. We also incorporate into our manuscript this interesting compensatory variant notion and metabolic network pleiotropy.

      One difference we would still like to highlight is that in our study (compared to Arabidopsis thaliana studies) we are comparing across many different species, ploidy levels, and varying species-level evolutionary histories. This makes our experiment different from Arabidopsis thaliana ecotype experiments and crossings; but the reviewer is indeed right that our results may also follow an evolutionary path similar to that of Arabidopsis thaliana.

      Reviewer #2:

      Cochlearia, and other species that have rapidly evolved new ecological niches, represent excellent systems to study adaptation to past, present, future and changing environments. Furthermore, reticulate evolution within such groups offers a natural experiment to test hypotheses about the roles of hybridization, introgression, etc. on evolutionary dynamics, including pre-adaptation. However, there are also several significant challenges to using such systems, most crucially separating adaptation as the causal mechanism from the wide array of non-adaptive processes that could also cause the observed patterns. Overall, Wolf and colleagues do a nice job describing this complex taxonomic system and provide multiple lines of inquiry into how observed patterns may align with various adaptive scenarios. Despite the strong descriptive framework, I had trouble understanding exactly how causality could be assigned. Thus, the interpretation and discussion of the results felt speculative.

      Thank you for the encouraging comments. Yes, we agree: this points towards an important aspect of this kind of phylogenetic-systematic-evolutionary research, namely demonstrating causality. Honestly speaking, in such studies we are not able to show causality in its strict sense, and we think that the reviewer wants to say this without using quite such strong wording. We considered this while re-phrasing the respective paragraphs and also toned down some speculative conclusions.

      Reviewer #3:

      There has been intense interest in how plants have responded during periods of rapid climate change in the past. Understanding these responses can increase our understanding of how plants might respond to rapidly accelerating anthropogenic climate shifts and help set conservation priorities. Many paleoecological studies have provided insight on how plants have migrated and persisted in suitable climate refugia (i.e. pockets of suitable habitat that exist even if regional climate is unfavorable for the persistence of a species) throughout glacial cycles, however there has been considerably less work that details the evolutionary dynamics of plants during these periods. This piece provides timely and valuable analyses illustrating the potential influence of pronounced climate change on the evolutionary dynamics of the genus Cochlearia.

      Thank you for the encouraging comments.

      The authors' use of cytogenetic analyses, organellar phylogenies, and demographic modeling allows for insights into the geographic patterns of diversity, speciation rates, and postglacial expansion scenarios of Cochlearia. Drawing unique conclusions from these different lines of evidence provides new understandings into the putative role of Pleistocene glacial cycles in driving evolutionary processes such as speciation. The study also aims to provide insight into the origins of the stated putative cold tolerance exhibited by Cochlearia by using a metabolomics approach; however, the framing and use of a single related outgroup (sister genus Ionopsidium) obfuscate the link between the results and stated conclusions.

      We appreciate this point, but there is indeed no other outgroup that could be used. In this study we included all (both) genera of the tribe Cochlearieae, with most of their species. Within a family-wide phylogenetic context, this tribe is placed along a polytomy (together with other tribes that are not well resolved), and the stem group age of Cochlearieae is approximately 18.9 million years (Walden et al., 2020). Therefore, for our research question, additional outgroups from other tribes would not contribute any further information, because the more basal splits occurred nearly 20 million years ago (Early Miocene), with no biogeographically and environmentally defined scenarios that can be compared. Most tribes of evolutionary lineage II underwent an early radiation with the highest net diversification rates 16-23 million years ago (Walden et al., 2020). We included some of this information in the introduction.

      Specifically, regarding the approach that resulted in figure 4 which encompassed the metabolomics and related analyses, the initial climate groupings into 'climate ecotypes' would benefit from clarification and consideration of assignment methods. Typically, using the term ecotype invokes the idea of distinct forms of a species with phenotypic differences adapted to local conditions rather than groupings to those under broad climate regimes. While grouping populations according to climate origin can be useful, it is not clear how the final 9 WorldClim bioclimatic variables were selected (e.g. it is not apparent how importance of or correlations between climate variables, etc. were considered). Consequently, knowing this information would help understand the patterns in figure 4b, which seems to indicate that geographically distant populations experience very similar climate conditions (understanding that similarities can exist but variable selection can greatly influence these patterns).

      Thank you for this reminder to explain the selection and analysis of BioClim variables.

      As for the term “ecotype”: in plant taxonomy, ecotypes are often treated at the subspecies level, in particular if environmental conditions are extremely different (e.g. heavy metal contaminated versus non-contaminated soils), and often these subspecies do not differ significantly in morphology (Noccaea caerulescens, Minuartia verna). In Cochlearia, morphology is at best a morphospace that is more or less shared between all species in different ways. Species definition and taxonomy are based on a combination of largely overlapping morphospace, cytotype, ecotype and habitat type (bedrock; arctic, lowland to alpine; soil type and salt; life cycle) and distribution. Morphology alone is often a poor species predictor (morphologically cryptic species; this is also well known for some other arctic species, such as those from the genus Draba). But the reviewer is right that using the term ecotype here is somewhat misleading. Our idea was to highlight that groups of taxa are combined by bioclimatic variables (and biomes or habitat types) while spanning the entire species/ecotype space of the genus, and that this grouping also follows evolutionarily meaningful clusters. We clarified this.

      As for the selection of BioClim variables: we agree that the selection might have appeared arbitrary to the reader. Our original selection followed our field and cultivation experience. However, the structuring into four clusters, as originally shown in the first submission, is also robust when all 19 BioClim variables are included. The same four clusters are retained in the PCA when only temperature-related BioClim variables are used.

      Therefore, we first added a Principal Component Analysis as the starting point for BioClim variable selection, and second, we added a PCA using only the temperature-related BioClim variables 1-11. Building on this, we added a sentence explaining why our nine selected variables were used to highlight the four groups in Fig. 4. The two PCA scree plots (including vector data), the correlation matrix and the results of a KMO test (Kaiser-Meyer-Olkin test: testing significant difference between the correlation matrix of variables and an identity matrix) are additionally provided in the Suppl. Material.
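
      For readers unfamiliar with this workflow, the following is a minimal sketch of the first two ingredients mentioned above, a correlation matrix of climate variables and the share of variance captured by the first principal component (the first bar of a scree plot). It is not the analysis used in the manuscript: the site values are hypothetical, the helper names are invented for illustration, and the leading eigenvalue is obtained by plain power iteration rather than a statistics package.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of `rows` (one row per site)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - mu) ** 2 for x in c) / (n - 1))
           for c, mu in zip(cols, means)]
    # Standardize each variable so that all variables carry equal weight.
    z = [[(x - mu) / sd for x in c] for c, mu, sd in zip(cols, means, sds)]
    m = len(cols)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(m)]
            for i in range(m)]

def leading_component(matrix, iterations=500):
    """Largest eigenvalue and its eigenvector, by power iteration."""
    m = len(matrix)
    # Asymmetric start vector; a start orthogonal to the leading axis would fail.
    v = [float(i + 1) for i in range(m)]
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(m)) for i in range(m)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v

# Hypothetical sites: (annual mean temperature, minimum temperature of the
# coldest month, annual precipitation). These are not values from the study.
sites = [
    (1.5, -12.0, 800.0),
    (4.0, -6.5, 1100.0),
    (7.2, -1.0, 950.0),
    (9.8, 2.5, 700.0),
    (11.0, 4.0, 1300.0),
]

corr = correlation_matrix(sites)
eigenvalue, loadings = leading_component(corr)
# The trace of a correlation matrix equals the number of variables.
explained = eigenvalue / len(corr)
print(f"PC1 explains {explained:.0%} of the standardized variance")
```

      Strongly correlated variables (here the two temperature variables) load on the same component, which is one reason to prune or justify the variable set before clustering.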

      The other concern is in regards to the framing and interpretation of these results. For instance, in the results (lines 329-330) and discussion (lines 419-423), the impression is given that experimental results here match those found in plants belonging to a different genus (i.e. Arabidopsis). However, rather than attributing this to more generally conserved mechanisms in response to considerable cold stress, the authors relate this to the unique history of Cochlearia (and its relationship to the drought adapted sister genus). The authors also note that surprisingly there was no demarcation of cold responses between the climate-defined groupings. Detailing why this is surprising given some of the other conclusion statements would be helpful. Some targeted revision to strengthen this link would be useful to bolster the inference of about the origins of cold tolerance in Cochlearia, rather than making it seem like this result could be expected in other taxa.

      Thank you for this. We agree that we did not explain our reasoning as well as we could have, and we have now reworked this. Original lines 329-330 simply refer to the (expected) and obvious general response to cold; some explanatory text has been added, e.g. at the end of the discussion and directly at the above-mentioned lines.

      Lastly, another area that would benefit from some clarification and tightening is revisiting the connection between the results and stated conclusions. For instance, some of the statements from the introduction and conclusions indicate the reader might expect explicit niche exploration analyses and more detailed genomic approaches. It is not abundantly clear for a general audience how these results definitively demonstrate how genetic diversity was rescued in reticulate and polyploid gene pools or species barriers were torn down. These are very specific, strong claims that do not appear to be explicitly discussed outside of the introduction/discussion or directly related to the results presented in this manuscript.

      Thank you for pointing out that this could be read in this way. We have revised this to indicate that we agree: we do not think our data ‘definitively demonstrate’ (the reviewer’s words, not ours) this. We have modified the text to avoid such an interpretation.

      This in no way diminishes the considerable effort of the authors to conduct the informative array of presented analyses, but more closely aligning the conclusions within the scope of presented results (or providing direct links on how the results provide these insights) would help increase the effectiveness of this manuscript.

      Many thanks for this very encouraging note. We have worked to incorporate these thoughtful comments.

    1. Author Response:

      Reviewer #1 (Public Review):

      Molecular probes that respond to disease-specific activities to produce a diagnostic readout have had a major impact in the clinical management of cancer. The current study extends the team's previous work on the development of a molecular sensor for the cancer-associated fibroblast activation protein (FAP). The molecular sensor is based on CGRP (calcitonin gene-related peptide), a potent vasodilator of human arteries which mediates relaxation of arteries via activation of the CGRP(1)-type receptor. The sensor is fused to biotin with a linker sequence that contains FAP cleavage sites. In its intact form, the sensor fails to activate the CGRP receptor; in the presence of FAP, however, proteolytic release of CGRP from inhibition leads to CGRP-receptor engagement, which is then detected by changes in MRI contrast. Receptor activation on the vasculature provides a diagnostic readout via local changes in hemodynamic image contrast for MRI. This is a technical report that provides proof-of-principle evidence that a sensor for FAP proteolytic activity can be used in rodent models, with a robust signal-to-noise ratio. However, the discussion and abstract overstate the clinical impact of the findings.

      We are grateful for the Reviewer’s overall assessment and have made a number of additions and edits to the manuscript to characterize and clarify the clinical potential.

      Reviewer #2 (Public Review):

      In this manuscript, the authors create an imaging probe for magnetic resonance imaging (MRI) that is based on triggering vasodilation in a protease-dependent manner. The imaging probe is a steric blocking domain fused to the N-terminus of a vasoactive peptide connected by a linker that is sensitive to proteolytic cleavage by fibroblast activation protein (FAP). The linker design was optimized for FAP-mediated cleavage and led to a 34-fold increase in activity when there was FAP present. The imaging probe detected cells overexpressing FAP implanted into the rat striatum when infused directly at the transplantation site or into the cerebrospinal fluid. The authors also create a kinetic model to determine FAP catalysis rate, k, from temporal MRI signal. Lastly, the authors demonstrate in a proof-of-concept experiment that the vasoactive peptide is able to create imaging contrast in a nonhuman primate brain.

      This group has previously described using peptide-mediated vasodilation as a method for image contrast in MRI. In this work, they advance this concept to make the peptide activity triggered by protease cleavage, thus creating an activity-based molecular imaging probe. The design presented in this work could likely be adapted to a wide range of proteases through substitution of the substrate that links the steric blocking domain and the vasoactive peptide allowing for the study of a wide range of protease activity in diseases that affect the brain. The creation of activity-based imaging probes is an important area of study for advancing precision medicine because the imaging signal may more accurately represent disease prognosis and stratification over conventional imaging probes.

      A claim made in the abstract that is provided with limited support is whether the probe allows for quantitative analysis of FAP activity. A useful measure for a diagnostic would be whether the imaging signal can quantitate the amount of enzyme activity. In this study, all in vivo experiments were conducted with the injection of a single concentration of cells overexpressing the FAP transgene.

      We thank the Reviewer for this considered assessment. In this paper, we use kinetic modeling to quantify FAP activity over animals and over voxels in the experiments of Figures 2 and 3. Although these experiments followed implantation of a fixed number of FAP-expressing or control cells per animal, the three-dimensional distribution of FAP activity produced in the brain is heterogeneous, thus allowing for spatially resolved quantification of enzyme activity that addresses the Reviewer’s point. To this revision, we further add an analysis of enzymatic constants obtained on a per-voxel level from the data in Figure 4, indicating approximate linearity of the relationship between maximum signal change and the probe cleavage rate constant k. Values of k that range from 0 to 0.03 s^-1 are obtained from our analysis, corresponding to estimated localized FAP concentrations ranging from 0 to 20 nM in rat brains. These values are physically plausible and consistent with in vitro quantification of FAP activity over a range of transfected and patient-derived tumor cell numbers in data added to Figure 4 of the revised manuscript.
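
      To make the quoted numbers concrete, here is a rough sketch of the arithmetic that links a cleavage rate constant k to an enzyme concentration. It assumes, purely for illustration, pseudo-first-order probe cleavage with k proportional to FAP concentration, and it derives the proportionality constant from the two endpoints quoted above (0.03 s^-1 and 20 nM) rather than from any measured kcat/Km; the actual kinetic model in the manuscript may differ, and all names below are hypothetical.

```python
import math

# Endpoints quoted in the response; the linear scaling between them is an assumption.
K_MAX_PER_S = 0.03   # largest fitted cleavage rate constant, s^-1
FAP_MAX_NM = 20.0    # corresponding estimated FAP concentration, nM

# Implied catalytic efficiency if k = (kcat/Km) * [FAP], in M^-1 s^-1.
IMPLIED_EFFICIENCY = K_MAX_PER_S / (FAP_MAX_NM * 1e-9)

def fap_concentration_nM(k_per_s):
    """Estimated FAP concentration (nM) for a fitted rate constant k (s^-1)."""
    return k_per_s / IMPLIED_EFFICIENCY * 1e9

def fraction_cleaved(k_per_s, t_s):
    """Fraction of probe cleaved after t seconds under first-order kinetics."""
    return 1.0 - math.exp(-k_per_s * t_s)

for k in (0.005, 0.015, 0.03):
    print(f"k = {k:.3f} s^-1 -> ~{fap_concentration_nM(k):.1f} nM FAP, "
          f"{fraction_cleaved(k, 60):.0%} of probe cleaved after 60 s")
```

      Under these assumptions a voxel with half the maximal k would be assigned half the maximal enzyme concentration, which is the sense in which a spatially heterogeneous k map can stand in for a map of enzyme activity.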

    1. Author Response:

      Reviewer #1 (Public Review):

      In their manuscript, Urtatiz and colleagues propose that gain-of-function mutations affecting the G-alpha-q signaling pathway are not tolerated in melanocytes residing in the interfollicular epidermis because of paracrine signals from neighboring keratinocytes. This is an interesting and important hypothesis that would explain a mystery to the melanoma field - i.e. why are GNAQ/11 mutations common in uveal melanoma, among other rare subtypes, yet are exceedingly rare in cutaneous melanoma.

      Specific comments on experimental work:

      Previously, this group showed that forcible expression of oncogenic GNAQ during embryogenesis depletes melanocytes in the interfollicular epidermis. This paper offers an advance because the TYR-cre mouse is inducible at later points in life, which also permits in vitro explant cultures for more in-depth studies. This is a major strength of the manuscript.

      Major concerns:

      The most significant issue with the experimental results is that in most of the explant cultures, the melanocytes are not proliferating. Instead, the authors are observing which melanocytes are dying slower than others. This seems a bit strange because there are countless examples of laboratories who have established healthy murine melanocytes in tissue culture, and it raises the question that there is something off with the culture conditions.

      Thank you to the reviewer for their thorough analysis and for raising interesting questions.

      Yes, it is true that the wildtype and GNAQ^Q209L expressing melanocytes in the survival curves (in Figure 2C, for example) did not increase above the number plated during the experiment. However, the Braf^V600E expressing cells did increase, which suggests that the culture conditions were not fundamentally off. It is important to note that the melanocytes in our study were sorted from mouse tail IFE. As far as we know, other labs have used mouse trunk skin or neonatal skin, which is populated by immature migrating melanocytes. These tissues have stem cells, for example in the hair follicle niche, and highly proliferative cell populations. We chose tail skin because it has a permanent population of IFE melanocytes.

      Our goal in these studies was a short observation with minimal interference. Encouraging the melanocytes to behave in a certain way in culture could change the results of the experiment or mask the differences between wildtype and GNAQ^Q209L melanocytes resulting from the microenvironment. We do not agree that studying the rate of cell loss is inapplicable, given that the relevant in vivo phenotype is that GNAQ^Q209L melanocytes experience gradual attrition from the IFE.

      In addition, when sorting the GNAQ-mutant melanocytes, there is a selection for a subpopulation that did not die. This introduces a bias and seems, at minimum, worthy of discussion. One potential experiment to remove these doubts would be to isolate the GNAQ-mutant melanocytes prior to tamoxifen treatment and then induce the mutation formation in vitro.

      If we understand this correctly, the concern is that since some melanocytes have been lost in the GNAQ^Q209L IFE by the time we do the FACS, this approach may have selected for a subpopulation of melanocytes that are more resistant to the effects of GNAQ^Q209L. Maybe the GNAQ^Q209L cells survived better than wildtype in culture on fibronectin because they were naturally more robust. How then do we explain why the GNAQ^Q209L melanocytes survived less well than wildtype when cultured with IFE? If there is a subpopulation at play, it is behaving exactly as one would expect from the mice, so we don't see how it changes the conclusions.

      We have added this sentence to the manuscript: "Note, since some melanocytes have been lost in the GNAQ^Q209L IFE, this approach may have selected for melanocytes that are more resistant to the effects of GNAQ^Q209L."

      The suggested experiment is difficult to do because Cre is required to turn on both Tomato and GNAQ^Q209L. To be able to do the suggested experiment, we would have to first obtain a different transgenic mouse line that constitutively expresses a fluorescent marker in melanocytes and cross it into our mouse model. Alternatively, we could try to sort melanocytes following antibody staining against a cell surface marker. In fact, neither of these approaches is free from the possibility of selecting subpopulations that vary in expression of the markers.

      Specific comments on bioinformatic work:

      Major issues:

      The evidence that there is selection against PLCB4 in cutaneous melanoma is weak. It is true that mutations frequently affect PLCB4, but this is equally true for a great number of genes in melanoma because of the high mutation burden in this cancer type. Following their lines of reasoning, the authors could make an equally compelling case that TTN, the gene encoding the muscle fiber titin which is the largest gene in the genome, is under selection. Unfortunately, the ratio of nonsynonymous to synonymous mutations is not supportive of the authors' argument that PLCB4 is under selection. It is somewhat bizarre that the authors entirely disregarded synonymous mutations, but this reviewer looked them up in the TCGA study, and they are abundant. To identify genes under selection, there are much more sophisticated strategies that take into account the trinucleotide context of mutations, the ratio of nonsynonymous to synonymous mutations, and/or the ratio of exonic to intronic mutations. To be sure, the authors correctly point out these sophisticated algorithms have missed driver mutations in the past, but the missed mutations tend to be exceedingly rare, hotspot mutations. If the authors are going to make the case that loss-of-function PLCB4 mutations are under selection in melanoma, then the onus is on them to explain why the much more sophisticated strategies, previously invoked, have missed this finding, and they should employ even better methods to make their point. Unfortunately, the strategies that the authors do employ fall for the same traps that many older papers in the field stumbled upon. For example, they make the case that PLCB4 mutations are more frequent in melanomas with high mutation burdens. While this seems to denote a biological signal, it is exactly what one would expect to observe if a gene only harbored passenger mutations.

      We understand the concerns of the reviewer. To explain, we were led to the question of whether the frequent non-synonymous mutations in PLCB4 could play a role in cutaneous melanoma from a logical deduction based on biological insight, which we found quite convincing:

      1) We had found that oncogenic Gq inhibits melanocyte growth/survival when the cells are located in the epidermis.

      2) Phospholipase C beta (PLC-Beta) is the immediate effector of heterotrimeric G protein alpha subunits of the q class, GNAQ and GNA11. It is very likely to be involved in the pathway inhibiting melanocytes in the IFE.

      3) PLCB4 was already identified as a significant melanoma gene in uveal and CNS melanomas, through a recurrent hotspot mutation. Furthermore, the hotspot mutation in PLCB4 is mutually exclusive with the hotspot mutations in GNAQ or GNA11 in these cancers.

      We agree that more could have been done to describe the synonymous mutations and examine the ratio of non-synonymous (missense, nonsense) mutations to synonymous mutations. These studies can be found in the manuscript and in Supplementary File 1 Table 1f, 1g and Supplementary File 3. Using the approach described in Van den Eynden and Larsson (1), which takes into account the mutational signature of melanoma, we found that the synonymous mutations were significantly less frequent than expected based on the expected ratio (p = 7.35 x 10^-4; one-tailed binomial test). This indicates that there is positive selection for the non-synonymous mutations in PLCB4 in cutaneous melanoma.
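      For readers unfamiliar with the test named above, a one-tailed binomial test for a deficit of synonymous mutations can be sketched as below. The counts and the expected synonymous fraction are hypothetical placeholders, not the values from the manuscript or its supplementary files. In the actual analysis, the expected fraction would come from the signature-aware approach of Van den Eynden and Larsson.

```python
from math import comb


def binom_lower_tail(k: int, n: int, p: float) -> float:
    """Return P(X <= k) for X ~ Binomial(n, p), i.e. the one-tailed p-value
    for observing k or fewer synonymous mutations among n coding mutations."""
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


# Hypothetical inputs, chosen only to illustrate the calculation.
n_total = 200                  # all coding mutations observed in the gene
n_synonymous = 30              # synonymous mutations among them
expected_syn_fraction = 0.25   # fraction expected under neutrality (assumed)

p_value = binom_lower_tail(n_synonymous, n_total, expected_syn_fraction)

if __name__ == "__main__":
    print(f"one-tailed p = {p_value:.3g}")
```

      A small p-value here means fewer synonymous mutations than the neutral expectation, which is the direction of the effect reported in the response.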

      (1)Van den Eynden and Larsson (2017) Front Genet Jun 8;8:74 [https://doi.org/10.3389/fgene.2017.00074]

      In addition, not included in the above analysis were 13 more mutations having to do with splicing: 7 splice region, 3 splice donor site and 3 splice acceptor site mutations in the melanomas. While reviewing the literature, we noticed that PLCB4 was already identified in Chen et al. (2) as a gene subject to positive selection for splice-site mutations (p = 7.86 x 10^-4; q = 5.77 x 10^-2) in a genome-wide analysis of melanoma.

      (2)Chen et al (2015) Mol Biol Evol. 2015 Aug;32(8):2181-5 (mentioned in the Supplement).

      These findings strengthen our hypothesis. We propose that loss-of-function mutations in PLCB4 simultaneously reduce Galpha_q and Galpha_11 signaling, releasing some of the inhibitory effects of the pathway caused by the epidermal microenvironment.

      As you know, due to a number of factors related to the process of mutagenesis, gene size and accessibility, different genes have different rates of mutation. PLCB4 might be a gene that is more likely to be mutated. It is the 99th most frequently mutated gene in the SKCM dataset. PLCB4 has more synonymous mutations (34 affected cases in total) than NF1 (14 cases), TP53 (0 cases) or PTEN (0 cases). Looking at the problem from our perspective, it seems that biological insight and hypothesis driven inquiry could help determine whether some of the genes with a higher inherent rate of mutation are playing a role in melanoma.

      We also would like to clarify the statement that "...PLCB4 mutations are more frequent in melanomas with high mutation burdens." What we saw was that the melanomas with 2000 to 3000 mutations overall had the highest rate of PLCB4 non-synonymous mutations (Figure 7E). Melanomas with more than 3000 mutations actually had a reduced rate of non-synonymous mutations in PLCB4. It was not simply that more overall mutations equaled more mutations in PLCB4.

      The Semaphorin mechanism is interesting but remains too speculative to warrant so much attention and space.

      We have reduced the emphasis on semaphorin signaling.

      In addition, there was too much speculation regarding signaling mechanisms in the apoptosis section of the manuscript. Generally speaking, RNA-sequencing is a powerful tool, but when there are >1000 differentially expressed genes, it is too easy to construct "in silico" mechanistic stories centered around a handful of genes. These would need to be backed up with biological data, but in this case, additional mechanistic studies would go beyond the scope of the manuscript. Instead, this reviewer suggests removing these points.

      We have reduced the discussion of particular genes that supported increased apoptosis in GNAQ^Q209L melanocytes in our RNAseq analysis.

    1. Author Response:

      Reviewer #1 (Public Review):

      Reichardt M. et al. investigated cardiac tissue of Covid-19 samples compared to other infections (influenza and coxsackie) and control. Using X-ray phase-contrast techniques, they provide interesting results on the microstructure description. For instance, they report the orientation of the cardiomyocytes, their degree of anisotropy, and their shapes (obtained via structure tensor analysis). They also present interesting findings from the segmentation of the vascular network (via a deep learning method).

      This paper uses state-of-the-art techniques. The experiments use the latest developments in X-ray phase-contrast techniques (at laboratory and synchrotron). The analyses use a machine learning approach. This paper will therefore serve as a reference for future analysis of cardiac tissue. Furthermore, most of the tools used are publicly available.

      The results presented show that X-ray imaging is providing more information than standard histology by accessing 3D information. This is illustrated for example by the 3D vasculature tree to assess intussusceptive angiogenesis.

      Overall, the paper is well written and gives clear background and explanations that people outside the field can follow. The findings and conclusions of this paper are mostly well supported by data and analysis, but some aspects of the image acquisition, sample choice and data analysis need to be clarified.

      We are very motivated by this positive assessment of the reviewer, regarding the methodology, the information gain by the 3d approach, and in particular by his/her judgment of the work to be a reference for future analysis.

      1/ In the abstract, the authors don't mention the laboratory setup which is a big part of their results and actually the technique that could pave the way to a clinical translation of the technique. A sentence mentioning it is necessary.

      We have now added the information on the laboratory data in the abstract.

      2/ The authors present their results as "fully quantified". It would be nice to see how the results presented in this paper are comparable to the gold standard analysis used so far (i.e. conventional histology analysis). This technique seems to provide more or different information not accessible by 2D slices analysis as done in histology.

      The structural parameters obtained by shape measure analysis introduced in this manuscript as well as the segmentation of the vasculature extend conventional 2d histological investigations by a third dimension. Figure 1 of the appendix shows the HE stain which is gold standard in clinical routine of all samples.

      3/ A major drawback of the technique presented here is that it is necessary to make a biopsy punch on the initial paraffin block. It means that the original sample is destroyed (which also goes a bit against the "non-destructive" claim of the method). For the high resolution acquisition done with the Wave-Guide setup, a second biopsy punch is even done. Several questions can then be raised: How have those biopsy punches been selected, how representative are they compared to the entire samples, etc.

      In general X-ray phase contrast tomography is a destruction-free imaging technique. However, in order to reduce absorption, we chose to take biopsies from the paraffin blocks. In view of the desired resolution and image quality our intention was to suppress artifacts of local (interior) tomography, and hence we did not record scans on large tissue blocks, but chose the approach of biopsy punching. Importantly, the structure of these biopsy cores is still intact, and does not suffer from cutting and staining artifacts as in conventional histology. Further, the biopsy cores taken can either be used separately for additional histological slices or reconstructed into the existing core leaving near to no trace of the procedure itself and not hampering clinical diagnoses. We agree that for future work, larger tissue pieces up to an entire organ would be an interesting option.

      4/ The statistics obtained are based on sparse data with large error bars. Only 26 samples have been used. For instance, the parameters obtained for the shape of the cardiac tissue represented in a ternary diagram in Figure 5, present tendencies but it would need more statistics to clearly affirm that there is a clear difference between the groups.

      At this point, the main limitation with respect to increasing the cohort is actually to obtain more samples from post mortem autopsies as this is not always possible in clinical routine. For this reason, we think that 26 samples is already a good start, in particular since the shape measure analysis described in this manuscript is intended as a proof-of-concept pipeline for future investigations of the 3d cardiac tissue structure. Each tomographic reconstruction yields a volume of approximately 8x10^9 voxels, hence there is certainly no sparsity for each patient. At the same time, we fully agree that increasing the cohort size is the next important step. Once the potential gain of these investigations is made clear, it will be easier to convince the medical community that this needs to be done.
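      As an illustration of how three shape measures can be placed on a ternary diagram, the sketch below uses one common (Westin-style) definition based on the sorted eigenvalues of a positive semidefinite tensor. This definition is an assumption made for the example. The manuscript's own shape measures, and which measure it associates with fiber-like tissue, may be defined differently.

```python
def shape_measures(eigenvalues):
    """Return (linear, planar, spherical) measures from three eigenvalues.

    Uses the normalisation by the largest eigenvalue, so the three values
    are non-negative and sum to 1, as required for a ternary diagram.
    """
    l1, l2, l3 = sorted(eigenvalues, reverse=True)
    if l3 < 0 or l1 <= 0:
        raise ValueError("expected non-negative eigenvalues with a positive maximum")
    c_linear = (l1 - l2) / l1
    c_planar = (l2 - l3) / l1
    c_spherical = l3 / l1
    return c_linear, c_planar, c_spherical


if __name__ == "__main__":
    # Hypothetical eigenvalues for a single voxel neighbourhood.
    print(shape_measures((3.0, 2.0, 1.0)))
```

      Because the three values sum to 1 for every voxel, each voxel (or each sample average) corresponds to a single point in the ternary plot.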

      5/ Concerning the sample selection, several samples have been taken for control (2 patients and 6 samples), coming from 2 young patients, while for the diseased samples only one sample per patient has been collected, from older patients. Furthermore, the majority of the patients are men. How are the authors sure, for instance, that the control patients didn't have another disease that could have affected the cardiac structure? Could they see any differences between genders, as has been shown in other COVID-19 studies? Could the difference in age also have an impact?

      The medical background of all patients is provided in Appendix 2 Table 1. All samples have been investigated by pathologists before we investigated the structure of the tissue using X-ray tomography. Effects of gender and age were not taken into account in this study since the focus of this manuscript is on the introduction of the shape measure analysis pipeline and more importantly the evidence of the presence of intraluminal pillars in cardiac tissue of Covid-19 patients. However, the background information is publicly available and can be further investigated.

      Reviewer #2 (Public Review):

      The work aims to characterize cardiac tissues from patients who have succumbed to Covid-19. The authors studied pathological and normal tissues using microtomography scans performed at different resolution scales. Starting from the reconstructed volumes, special automatic analytical procedures were developed to extract some quantitative structural parameters about the samples themselves. This characterization method was used previously in the study of murine heart models. The main outcome of the research is that there are some well defined characteristics found in Covid tissues that are not revealed in other pathological and normal samples. The authors achieved the proposed aims and their conclusions are supported by the obtained data. The sample statistics should be further improved, but they are already significant enough to validate the outcomes.

      We are grateful for this precise and positive assessment.

    1. Author Response:

      Reviewer #1:

      This paper uses intracellular patch-clamp recordings of hippocampal CA1 pyramidal cells in awake mice running on a cued treadmill to investigate whether dendritic plateau potentials, which can induce place fields in silent cells through Behavioral Time Scale Plasticity (BTSP), could also modify the spatial modulation of existing place cells. They report that plateau potentials can lead to the formation of a secondary place field by synaptic potentiation while reducing the primary place field by synaptic depression. As for place fields induced in silent cells, the spatial extent of this bidirectional plasticity depends on the speed of the animal during induction, suggesting a fixed time course. Further analysis revealed that the sign and magnitude of Vm changes varied in a distance/time-dependent manner from the location/time of plateau induction, such that Vm tended to increase at the plateau location and to decrease away from the plateau in both directions, adding a bidirectional property to previously described BTSP. The sign and magnitude of the plasticity also depend on the value of Vm at the time/location of plateau induction: if Vm is more hyperpolarized than -55 mV, the plasticity induces a depolarization at the plateau location and less hyperpolarization away from that location, as observed in silent cells, while if Vm is more depolarized than -55 mV, the plasticity induces a smaller depolarization and more hyperpolarization further away. The authors then used Vm manipulations and computational modeling to show that the critical factor was not the absolute Vm level but instead the level of potentiation of activated synapses. Finally, the authors used a network model of CA1 to show that BTSP can account for over-representation at reward location within a familiar environment.

      Altogether, this work represents a nice combination of cutting-edge experimental work and modeling that sheds new light on cellular mechanisms allowing experience-dependent modifications of spatial maps in familiar environments.

      Major comments:

      1) The procedure for BTSP induction is described without much detail in the text and Methods. According to the text, from 1 to 8 laps/stimulations were used to induce the secondary place field. Why is there such variability? I guess that sometimes one stimulation is not enough to induce the place field, but this should be clearly stated in the introduction. Also, how does one know if a new place field is induced? Sometimes the new and old place fields strongly overlap (e.g. Fig. 1C, blue trace) and it must be difficult "by eye" to decide if a new place field was induced. For readers to get a better idea of the whole process, could you report the average number of laps/stimulations used to induce the secondary place field? Could you mention how many laps/stimulations were used for the example traces shown in Figure 1C? Was the success rate 100%? How many laps were recorded after induction and used to compute the average traces shown in Figure 1C? Could you please report those numbers in the text and the figure legend? Also, in case a primary place field was induced by BTSP, how many laps/stimulations were used? Was it more difficult to induce a secondary place field compared to a primary one?

      We have now included additional details in a new Supplementary Figure S1, and have added the following text to the Materials and Methods:

      P25, L664: “Plasticity was induced in vivo by injecting current (700 pA, 300 ms) intracellularly into recorded CA1 neurons to evoke dendritic plateau potentials at the same position on the circular treadmill for multiple consecutive laps. In most cases, plateaus were evoked on five consecutive laps (Figure S1D, left). However, during some experiments, large changes in the spatial Vm ramp depolarization could be observed to develop after as few as one plateau (consistent with the observation that plasticity could be induced by a single spontaneously-occurring plateau), and so fewer induction laps were used. In other experiments, plateaus were induced on more than five consecutive laps if place field expression remained weak after the first five trials (Figure S1D, left). The source of this variability across cells/animals is not yet clear, and requires future investigation. Overall, this procedure induced changes in spatial Vm ramp depolarization in 100% of cells in which it was attempted by three investigators. In some cells, the initial place field was first induced by this procedure, and then the procedure was repeated a second or third time in the same cell with plateaus induced at different locations. In those cases, there was no systematic difference in the number of plateaus required to induce the first place field compared to subsequent fields (Figure S1D, right).”

      We also now report in the legend of Figure 1 and in the Materials and Methods (P27, L716) that spatial Vm ramps before and after plasticity were computed by averaging across 10 laps.

      2) Along the same lines, the behavior of the mice is not extensively described. However, behavior, and notably running speed, seems to have a major impact on the spatial extent of the plasticity. Could you plot the speed profile of mice below the voltage traces, for example in Figure 1B, S1I and so on? Also, could you show lap-by-lap speed profiles superimposed for one recording session to give an idea of the variability and overall stereotypy of behavior? It appears that the place fields can span the border between laps (i.e. start before and end after the reward zone). How is this possible if the animal stops at the end of a lap to get reward? Usually these stops induce a state change with reduced theta oscillations, which is less favorable to place cell coding. Is it the case that some animals do not stop at the reward location? Could you give more details here?

      We have now included additional details in a new Supplementary Figure S1, and have added the following text to the Materials and Methods:

      P26, L679: “Since the time window for plasticity induction by BTSP extends for seconds around each plateau, and plateaus were typically evoked on multiple consecutive laps, the changes in synaptic weights induced by BTSP depended on the run behavior of the animals across all induction laps. We showed in Figure 3D that the spatial width of place fields induced by BTSP varied with the average velocity of animals across all plasticity induction laps. Another factor that contributed to the spatial width of induced fields is the proximity of the evoked plateaus to the reward site, as animals tended to stop running briefly to lick near the fixed reward site. Variability across laps in either the run velocity or the duration of pauses could pose a challenge in trying to relate spatial changes in Vm ramp depolarization to the time delay to the plateau (see below). Figure S1 shows the full run trajectories of animals during all plasticity induction laps for the five example cells shown in Figure 1. While some variability across induction laps was observed, each animal tended to run consistently at similar velocities across laps.”

      Please also note that, on the circular treadmill, the place fields of presynaptic neurons in CA3 can “wrap around” the track (e.g. see presynaptic firing rates schematized in Figure 3J). In some cases, this meant that the same synapse that generated an eligibility trace and underwent plasticity before the animal stopped to lick for reward, was also activated and contributed to Vm ramp depolarization once the animal continued running, since the spatial positions were traversed contiguously. In the model, spatial CA3 inputs were considered to be silent during pauses in running.

      3) The rationale behind the analysis of delta Vm against time from plateau induction shown in Figures 2E, 3 and 4 and associated supplementary figures is not clear from the text and Methods sections. If I understood well, this analysis uses the difference between the average Vm of several laps after the induction laps and the average Vm of several laps before the induction laps, but then uses the speed of the animal during the induction laps to convert this delta Vm trace into the temporal domain. But this assumes a relatively constant behavior of the animal during induction. If induction is performed over 1 or 2 laps, the chances of a constant speed are probably higher than if it is performed over, say, 7-8 laps. If the animal slows down consistently or even stops during induction laps 6-8 but runs fast during induction laps 1-5, how does one interpret the delta Vm over time representation? The authors should report the number of laps used for induction for the traces illustrated in Fig. 2, and show the time against position traces for all individual induction laps superimposed on top of the average in Fig. 2C and the delta Vm against time traces for all individual induction laps superimposed on top of the average in Fig. 2E.

      We have now included additional details in a new Supplementary Figure S1, and have added the following text to the Materials and Methods:

      P27, L732: “In order to relate spatial changes in Vm ramp depolarization to the time delay to a plateau (e.g. Figures 2E, 2F, 3A – 3F, 3I, 4B, 4C, 4E, 4F and 6E), we assigned to each spatial position the shortest time delay to plateau that occurred across multiple induction laps (Figure S1). This is a conservative estimate, as the shortest delay between presynaptic activity and postsynaptic plateau will generate the largest overlap between eligibility traces (ET) and instructive signals (IS), and will result in the largest changes in synaptic weight. While this method is imperfect and did discard variability in running behavior across laps, it enabled direct comparison of the time-course of BTSP across neurons. We also note that, to generate the modeling results shown in Figure 6, the full run trajectory of each animal during all induction laps, including pauses, was provided as input to the model (see details below). This resulted in good quantitative agreement between experimentally-recorded and modeled spatial Vm ramps (Figure 6D).”
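      The assignment rule quoted above (each spatial position receives the shortest time delay to a plateau across induction laps) can be sketched as follows. This is a simplified illustration with hypothetical inputs: each lap is represented by the time at which the animal occupied each spatial bin, plus one plateau onset time. Track wrap-around and pauses, which the authors' full analysis and model handle, are ignored here.

```python
def shortest_delay_per_bin(lap_bin_times, plateau_times):
    """For each spatial bin, return the signed delay (s) to the plateau that has
    the smallest absolute value across all induction laps.

    lap_bin_times: list of laps, each a list with the time (s) at which the
                   animal occupied each spatial bin on that lap.
    plateau_times: plateau onset time (s) on each lap, on the same clock.
    Negative delays mean the bin was visited before the plateau.
    """
    if len(lap_bin_times) != len(plateau_times):
        raise ValueError("need one plateau time per lap")
    n_bins = len(lap_bin_times[0])
    delays = []
    for b in range(n_bins):
        candidates = [lap[b] - t_plateau
                      for lap, t_plateau in zip(lap_bin_times, plateau_times)]
        delays.append(min(candidates, key=abs))
    return delays


# Hypothetical example: a fast lap and a slow lap, plateau evoked in bin 2 on both.
fast_lap = [0.0, 1.0, 2.0, 3.0]
slow_lap = [0.0, 2.0, 4.0, 6.0]
delays = shortest_delay_per_bin([fast_lap, slow_lap], [2.0, 4.0])

if __name__ == "__main__":
    print(delays)
```

      In this toy case the fast lap sets the delay at every bin, which shows why the rule is described as conservative: the lap with the tightest pre/post timing dominates.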

      4) In Figure 3D it is unclear where the PCs within-field data come from. The n = 26 suggests that this data includes all stimulations, but in most cases induction was performed by stimulating outside the place field location (as shown in Figure 1D), and induction is always done by stimulating in the same position (except for 2/24 cells where there was a third induction). Could you please specify?

      We acknowledge that it was confusing how data points were selected for inclusion in the category “PCs (within-field)” in Figures 3D and 4C. For each of the 26 inductions performed in cells with pre-existing place fields, cells were most depolarized at spatial bins within their place fields, but were also relatively hyperpolarized at positions outside their place fields. On this background, we induced plasticity by evoking plateau potentials at a fixed location, which was at a different distance from the initial place field in different cells, as highlighted in Figure 1D. In Figure 3D, we sought to determine if changes in Vm ramp were different at positions that were depolarized, compared to positions that were hyperpolarized. To do this, we pooled data from all 26 inductions, selecting only the spatial bins in each recording where the Vm ramp depolarization exceeded a threshold of -56 mV.

      We have now revised the text to clarify this point:

      P8, L184: “In Figures 3D – F, we examined this further by comparing data from initially hyperpolarized silent cells (black; n=29 inductions, see Figure S3 and Materials and Methods) to data from place cells (dark red; n=26 inductions). Place cells were on average more depolarized before plasticity than silent cells (Figure 3D), and more depression occurred in place cells compared to silent cells (Figure 3E). However, each place cell had both spatial positions where it was depolarized within its place field, and positions where it was hyperpolarized out-of-field. To determine if spatial positions that were initially depolarized were associated with larger depression, we grouped Vm ramp data from all place cells, considering only spatial bins where each cell was more depolarized than a threshold of -56 mV (light red traces labeled “PCs (within-field)” in Figures 3D – F). Indeed, more depression and less potentiation was induced in place cells at those spatial positions that were initially most depolarized (Figure 3E).”
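      The within-field selection described above amounts to a threshold filter applied per spatial bin and then pooled across inductions. A minimal sketch follows, assuming each induction is stored as two equal-length lists (initial Vm ramp and change in Vm ramp, both in mV). The data layout, names and example values are hypothetical. Only the -56 mV threshold comes from the response.

```python
WITHIN_FIELD_THRESHOLD_MV = -56.0  # threshold stated in the response


def within_field_bins(initial_vm_mv, delta_vm_mv, threshold_mv=WITHIN_FIELD_THRESHOLD_MV):
    """Keep (initial Vm, delta Vm) pairs for bins whose initial Vm exceeds the threshold."""
    if len(initial_vm_mv) != len(delta_vm_mv):
        raise ValueError("both traces must cover the same spatial bins")
    return [(vm, dvm) for vm, dvm in zip(initial_vm_mv, delta_vm_mv) if vm > threshold_mv]


def pool_within_field(inductions):
    """Pool the selected bins across all inductions into one list."""
    pooled = []
    for initial_vm_mv, delta_vm_mv in inductions:
        pooled.extend(within_field_bins(initial_vm_mv, delta_vm_mv))
    return pooled


# Two hypothetical inductions with four spatial bins each.
example_inductions = [
    ([-60.0, -55.0, -52.0, -58.0], [1.0, -2.0, -3.0, 0.5]),
    ([-57.0, -56.0, -54.0, -61.0], [2.0, 1.0, -1.5, 0.0]),
]
pooled = pool_within_field(example_inductions)

if __name__ == "__main__":
    print(pooled)
```

      Bins exactly at the threshold are excluded here because the response says the depolarization "exceeded" the threshold. That strictness is an interpretation, not a documented detail.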

      5) In the modeling experiment (Figure 6A) it is unclear why the dV/dt trace shows no change for synapses activated before the plateau (unlike what is illustrated in Figure 5C). In my understanding, the eligibility traces of these synapses, shown in green, allow them to be potentiated by a certain amount that depends on the overlap of their eligibility trace with the instructive signal. Maybe to facilitate understanding, the authors could show the post-synaptic potentials before and after plateau induction in Figure 5A.

      The Reviewer pointed out that in Figure 5, changes in synaptic weight occur for inputs activated before a plateau, but this appeared to not occur in Figure 6A. This is not the case – in both Figures, an eligibility trace (ET) is generated at the time of presynaptic activity, but changes in synaptic weight do not occur until later when the plateau arrives and an instructive signal (IS) is generated. In the example shown in Figure 6A, nonzero changes in synaptic weight occur for all presynaptic inputs (each row in the bottom panel labeled dW/dt). However, these changes do not begin until after the plateau is initiated. We have revised the text to clarify this point:

      P20, L498: “Note that, at inputs activated before the onset time of the plateau, changes in synaptic weight (bottom row) do not begin until after plateau onset when the instructive signal IS and the signal overlap ET*IS are nonzero.”
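      The timing argument in this reply can be illustrated with a toy calculation in which the weight change rate is proportional to the product ET*IS. This is a deliberately reduced sketch, not the authors' model: the exponential kernels, the time constants and the learning rate are hypothetical, and the weight dependence described in the manuscript is omitted. Only the 300 ms plateau duration is taken from the Methods text quoted earlier.

```python
import math


def eligibility(t, t_pre, tau_et=1.5):
    """Eligibility trace: starts at the presynaptic activation time, then decays."""
    return math.exp(-(t - t_pre) / tau_et) if t >= t_pre else 0.0


def instructive(t, t_on, duration=0.3, tau_is=0.5):
    """Instructive signal: zero before plateau onset, 1 during the plateau, then decays."""
    if t < t_on:
        return 0.0
    if t <= t_on + duration:
        return 1.0
    return math.exp(-(t - t_on - duration) / tau_is)


def weight_change_trace(t_pre, t_on, dt=0.01, t_end=10.0, rate=1.0):
    """Return (times, dW/dt), with dW/dt proportional to the overlap ET * IS."""
    n_steps = int(round(t_end / dt))
    times = [i * dt for i in range(n_steps + 1)]
    dwdt = [rate * eligibility(t, t_pre) * instructive(t, t_on) for t in times]
    return times, dwdt


# Input activated 2 s before the plateau: ET is nonzero early, but dW/dt is not.
times, dwdt = weight_change_trace(t_pre=1.0, t_on=3.0)
first_change = next(t for t, v in zip(times, dwdt) if v > 0.0)

if __name__ == "__main__":
    print(f"first nonzero dW/dt at t = {first_change:.2f} s")
```

      Even though the eligibility trace is present from the moment of presynaptic activity, the product stays at zero until the instructive signal appears, which is the behavior the authors describe for Figure 6A.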

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript integrates conditional mouse models for TRAP, PAPERCLIP and FMRP-CLIP together with compartment specific profiling of mRNA in hippocampal CA1 neurons. Previously, similar approaches have been used to interrogate mRNA localization, differential regulation of 3'UTR isoforms, their local translation, and FMRP-dependent mRNA regulation. This study builds on these previous findings by combining all three approaches, together with analysis of mRNA dysregulation in Fmr1 KO neuron model of FXS. The strengths of the paper are the rich data sets and innovative integration of methods that will provide a valuable technical resource for the field. The weakness of the paper is the limited conceptual advance as well as lack of deeper mechanistic insights on FMRP biology over previous studies, although the present study validates and integrates past studies, adding some new information on 3'UTR isoforms.

      We appreciate the Reviewer’s recognition that “the present study validates and integrates past studies, adding some new information on 3'UTR isoforms”. We also appreciate the Reviewer’s recognition that “The strengths of the paper are the rich data sets and innovative integration of methods that will provide a valuable technical resource for the field.”

      We differ, however, with the concern that the work presents a “limited conceptual advance.” Specifically, we find, for the first time, that FMRP regulates two different biologically coherent sets of mRNAs in CA1 neuronal cell bodies and neurites. This provides a profound new insight into FMRP-RNA regulation, including the fact that these two different sets of mRNA targets (encoding chromatin-associated proteins and synaptic proteins, respectively) are both translationally regulated by FMRP and transcribed from genes implicated in autism.

      We recognize that FMRP was known, by our own work and that of others (as noted by the Reviewer) to regulate specific targets “in bulk” in neuronal cell types, brain and even in CA1 neurons. What is most unexpected here? Among directly bound FMRP mRNAs in brain CA1 neurons, there is subcellular compartmentalization of this regulation. This is new for FMRP, and in fact is new for RNA binding proteins more generally (recognizing of course the extensive work on RNA localization in different compartments previously discovered by others, beginning with Rob Singer’s work on actin localization and up to the present in work on neurons).

      We also think it is important for readers to understand up front the novelty of the “combining approaches” referred to. We use cell-specific (cTag) CLIP to define direct FMRP interactions in subcompartments--dendrites vs cell bodies--of CA1 neurons within mouse brain hippocampus. We also normalize these data to ribosome-bound mRNAs in CA1 neurons, and validate observations by studying WT and FMRP-null brains. This set of complex mouse models and methods is completely new, and its application is what allowed us to make robust conclusions about FMRP translational regulation of different mRNAs in different cellular compartments.

      We strongly disagree with the Reviewer’s comment that the direct interaction of FMRP with functional classes of mRNAs in different cellular compartments “has previously been shown in the field.” To our knowledge, compartment-specific FMRP-CLIP has not been reported, much less in a cell-type specific manner. Our previous cell-type specific FMRP-CLIP experiments have been on bulk neuronal material (Sawicka et al. 2019; Van Driesche et al., n.d.). Although cell-type specific TRAP-seq has been performed on microdissected CA1 compartments (Ainsley et al. 2014), investigators were unable to isolate significant amounts of RNA from resting neurons, and degradation of the isolated RNAs did not allow the types of 3’UTR and alternative splicing analyses that were performed here. The Schuman group has performed extensive analysis of mRNAs from microdissected CA1 compartments (Cajigas et al. 2012a; Tushev et al. 2018), but has not performed FMRP-CLIP or any experiments using cell-type specific or direct protein-RNA regulatory methods. In vitro systems have been used to analyze mRNA localization in FMRP KO systems (e.g., Goering et al. 2020), but in vitro systems are unable to fully recapitulate the complexities of in vivo brain regions, and these studies did not analyze direct RNA-protein interactions. As our work is on in vivo brain slices, is cell-type specific, and integrates TRAP-seq, PAPERCLIP and CLIP-seq datasets, we believe that our work is novel and will be of great interest to the field.

      Despite the fact that FMRP targets are overrepresented in the dendritic transcriptome, it does not appear from this study that FMRP plays an active role in the mechanism of dendritic mRNA localization, at least under steady state conditions. One goal of the manuscript is to address a major question in the mRNA localization field, which is how FMRP may differentially modulate "localization" of functional classes of mRNAs such as those encoding transcriptional regulators and synaptic plasticity genes (Line 78-90). The data here indicate that FMRP directly interacts with functional classes of mRNAs in different cellular compartments, which has previously been shown in the field. However, no evidence is provided that mechanistically reveal a role for FMRP to promote subcellular localization of different functional classes of mRNAs. The correlative evidence presented in this manner does not add mechanistic insight.

      We do recognize that the question of what localizes FMRP mRNA targets differentially in the dendrite (and cell body) is of great interest, and remains unanswered. We also appreciate that, despite the Reviewer’s comment above, they also recognize “it does not appear from this study that FMRP plays an active role in the mechanism of dendritic mRNA localization, at least under steady state conditions.”

      We believe that some of the confusion here lies in the Reviewer’s comment “One goal of the manuscript is to address a major question in the mRNA localization field, which is how FMRP may differentially modulate "localization" of functional classes of mRNAs such as those encoding transcriptional regulators and synaptic plasticity genes (Line 78-90).” While this is a question of interest that has been studied, we think there is a major disconnect here in the Reviewer’s comments and our findings. To be clear, in the original manuscript, we did not find evidence, in WT vs KO CA1 neurons, that FMRP was acting to differentially localize mRNAs, including those mentioned by the Reviewer.

      Nonetheless, to further address the issue of a possible role for FMRP in localizing the transcripts it regulates, we have now performed quantitative analysis of FMRP target mRNA localization in dendrites from WT vs. Fmr1 KO mice. These results are now presented in Supplemental Figures 9 and 10 of the manuscript, which we reproduce and summarize below.

      Supplemental Figure 9. FMRP is not required for localization of its targets into the dendrites of CA1 neurons. A) Dendrite-enriched mRNAs were defined in FMRP KO mice (red) in the same manner as in Figure 1 for FMRP WT animals using bulk RNA-seq and TRAP-seq data. Overlap with dendrite-enriched mRNAs in WT (Figure 1, shown here in green) and CA1 FMRP targets (blue) is shown. 95.6% of dendrite-enriched FMRP targets in the WT were also found to be enriched in the dendrites of FMRP KO animals. B) Dendrite-present mRNAs were defined in FMRP KO. Overlap with dendrite-present mRNAs in WT (Figure 1) and CA1 FMRP targets is shown. 95.7% of dendrite-present FMRP targets in WT are also found to be dendrite-present in KO animals. C-E) FISH was performed to assess FMRP target localization (Kmt2d (C), Lrrc7 (D) and Map2 (E)) in FMRP KO mouse brain slices. Left panel shows the proportion of detected mRNAs that were detected in the neuropil (> 10 µm from the predicted cell body layer) in WT and KO animals. A Wilcoxon rank-sum test was performed to assess significance. Middle panel shows densitometry of 1000 spots sampled from each picture analyzed. Distance from the CB was determined as described in methods and Figure 1. In the right panel, spots were binned into 15 groups according to the distance traveled from the CB, and the fraction of spots in each genotype in this range was analyzed by t-test to determine differences in the fraction of spots at each location in FMRP WT and KO animals (* indicates p-value < .05, ** is < .01).

      Supplemental Figure 10. FMRP is not required for differential localization of 3’UTR isoforms of its targets. A) Differential 3’UTR usage was analyzed using DEXseq as described in Figure 2 to identify 3’UTRs whose ratio of usage between neuropil and CB was altered between FMRP WT and KO animals. Shown are the results of the DEXseq analysis: the log2foldChange (neuropil vs cell bodies, KO vs WT) and -log10(p-value) of each 3’UTR. Gray spots indicate that all 3’UTRs analyzed have an FDR > .05, indicating no significant change in usage between FMRP KO and WT animals. B and C) FISH analysis of the localization of 3’UTR isoforms of Cnksr2 (B) and Anks1b (C) in FMRP WT and KO animals. These genes were found in Figure 2 to express 3’UTR isoforms that are differentially localized to dendrites. Sequestered isoforms are those that are significantly localized to cell bodies in FMRP WT, and Localized isoforms are those that are significantly used in the dendrites of WT CA1 neurons. Left panel, the fraction of spots that are found to be localized to the neuropil (> 10 µm from the cell body layer) is shown for each isoform in FMRP WT and KO animals. Differences were assessed by Wilcoxon rank-sum tests. Middle panel, densitometry of the distance traveled from the cell bodies for a representative 1000 spots from each picture that was analyzed. Right panel, as described in Supplemental Figure 9, detected mRNAs were binned into 15 bins according to the distance traveled from the cell bodies, and differences in the fractions of spots in each bin in FMRP WT and KO slices were analyzed. Significance indicates results of t-tests (* indicates p-value < .05).
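
      The per-image quantities described in the two legends above (the fraction of spots in the neuropil and the fraction of spots in each of 15 distance bins) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the function names are hypothetical, distances are taken to be non-negative values in µm, and the bins are assumed to be of equal width, which the legends do not specify. The statistical comparison between genotypes (Wilcoxon rank-sum test or t-test across images) is not reproduced here.

```python
def neuropil_fraction(distances_um, threshold_um=10.0):
    """Fraction of spots farther than threshold_um from the cell body layer."""
    if not distances_um:
        raise ValueError("no spots given")
    return sum(d > threshold_um for d in distances_um) / len(distances_um)


def binned_fractions(distances_um, max_um, n_bins=15):
    """Fraction of spots in each of n_bins equal-width distance bins.

    Equal-width bins spanning 0..max_um are an assumption of this sketch.
    """
    if not distances_um or max_um <= 0:
        raise ValueError("need at least one spot and a positive max_um")
    width = max_um / n_bins
    counts = [0] * n_bins
    for d in distances_um:
        # Spots at or beyond max_um are placed in the last bin.
        idx = min(int(d // width), n_bins - 1)
        counts[idx] += 1
    return [c / len(distances_um) for c in counts]


# Hypothetical distances (in µm) for five spots from one image.
spots = [0.0, 5.0, 12.0, 20.0, 150.0]
frac_neuropil = neuropil_fraction(spots)
fractions = binned_fractions(spots, max_um=150.0)
```

      Computing these fractions per image for each genotype would give the inputs for the per-bin comparison between WT and KO.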

      In summary, we characterized the dendritic transcriptome in FMRP KO animals, and compared it to the FMRP WT results presented in Figures 1 and 2, as suggested by the Reviewers. We find that the dendritic transcriptome of FMRP KO animals is extremely similar to that of FMRP WT animals, with ~95% of mRNAs found to be dendrite-present or dendrite-enriched in WT also being found in FMRP KO animals (Figure S9). We validated these results with FISH and found no evidence for significant disruption in the localization of FMRP targets Kmt2d (Figure S9C), Lrrc7 (Figure S9D) or Map2 (Figure S9E) to the CA1 neuropil.

      To detect FMRP-dependent changes in distribution of 3’UTR isoforms of FMRP targets, we first performed global analysis of 3’UTR usage in TRAP from FMRP KO animals, using the expressed 3’UTR isoforms that were found in Figure 2. DEXseq analysis on 3’UTR expression in CA1 neuropil vs cell bodies TRAP showed no significant instances of altered 3’UTR usage ratios in FMRP KO animals (Figure S10A). We validated these results by performing FISH on the sequestered and localized 3’UTR isoforms of Cnksr2 and Anks1b genes and show no significant changes in the localization of the 3’UTR isoforms in FMRP KO animals (Figure S10B-C). Taken together, this data suggests that FMRP is not significantly involved in localization of its targets in resting CA1 neurons, but rather shows remarkable selection for localized mRNA isoforms. Instead, we find evidence that FMRP regulates the ribosome association of its targets in a compartment-specific manner by showing an increase in ribosome association of a subset of FMRP targets in the dendrites of CA1 neurons (see Figure 7E).

      Besides the addition of the figures described above, we have also now made corrections to the text of the manuscript, enumerated below, to address this.

      First, we have, as much as possible, reduced our emphasis throughout the manuscript on the “localization” of mRNAs and instead point out that the study seeks to characterize the differences between the regulated transcriptomes in CA1 cell bodies and dendrites. For example, for Figure 4, instead of characterizing the log2FoldChange (neuropil vs CA1 cell bodies) as “dendritic localization”, we changed the wording to “relative dendritic abundance” to focus on changes in the abundance of these transcripts in the dendrite vs the cell bodies. We also changed the section heading in the results that describes analysis in the FMRP KO animal from “Dysregulation of mRNA localization in FMRP KO animals” to “FMRP regulates the ribosome association of its targets in dendrites”. We believe that these changes will help to clear up this confusion for the reader.

      Second, we reformatted the model in Figure 7F. The new version of the model (shown here) emphasizes the point that our study reveals compartment-specific FMRP regulation of a subset of its targets without implying a role for FMRP in the mRNA localization of these transcripts. The text of the manuscript and figure legends have been updated accordingly.

      Figure 7F Distinct, compartment-specific FMRP regulation of functionally distinct subsets of mRNAs in CA1 cell bodies and dendrites. In dendrites, the absence of FMRP increases the ribosome association of its targets; this finding is consistent with a model in which FMRP inhibits ribosomal elongation and thereby translation (J. C. Darnell et al. 2011). In resting neurons, the translation of FMRP-bound mRNAs encoding synaptic regulators (FM2 and FM3 mRNAs) is repressed. When FMRP is absent, due to either genetic alteration (FMRP KO or FXS) or neuronal activity-dependent regulation (e.g. FMRP calcium-dependent dephosphorylation (Lee et al. 2011; Bear, Huber, and Warren 2004)), ribosome association and translation of targets are increased. In cell bodies, FMRP binds mRNAs that encode for chromatin regulators (the FM1 cluster of FMRP targets), as well as FM2/3 mRNAs (consistent with synapses forming on the cell soma). FM1 targets show patterns of mRNA regulation similar to what our group observed in bulk CA1 neurons: FMRP target abundance is decreased in FMRP KO cells, perhaps due to loss of FMRP-mediated block of degradation of mRNAs with stalled ribosomes (Sawicka et al. 2019; R. B. Darnell 2020).

      Third, we have revised the Discussion in order to more completely discuss the model above and also emphasize the finding that FMRP was not found to be involved in the localization of its mRNA targets, but rather in the regulation of the local translation of its targets in a compartment-specific manner. We further speculate on the roles of FMRP in regulation of mRNA abundance and translation in these compartments.

      We hope that these changes better reflect the interpretation and novelty of our findings for both the Reviewers and the readers.

      Further related to a role of FMRP in mRNA localization, a recent paper in eLife reports that FMRP RGG box promotes mRNA localization of a set of FMRP targets through G-quadruplexes (Goering et al 2020). This relevant paper needs to be cited and discussed.

      We apologize for this omission, and have now cited and discussed this paper in the Results and Discussion of the manuscript. Importantly, we find that dendrite-enriched mRNAs have high GC content (see figure below, which is now Supplemental Figure 5). This complicates the discovery of potential G-quadruplexes; put another way, G-rich mRNAs will therefore be enriched when compared to not-localized mRNAs, and this is also true for C-rich mRNAs. Dendrite-enriched FMRP directly-bound CA1 neuronal targets (defined by CLIP) are actually G-poor when compared to dendrite-enriched FMRP non-targets (see new Figure S5 and below).

      Supplemental Figure 5A-D: Dendrite-enriched mRNAs are GC rich, and dendrite-enriched FMRP targets are GC poor compared to dendrite-enriched non-FMRP targets. A) Schematic of the overlap between CA1 FMRP targets and dendrite-enriched mRNAs (defined in Main Figure 1). B) GC content, as defined by percent G + C, for all CA1 mRNAs, dendrite-enriched mRNAs (1211), dendrite-enriched FMRP targets (413), and dendrite-enriched non-FMRP targets (798, see A). Stars indicate significance in Wilcoxon rank-sum tests (* is p < .05, ** is p < .0001). C) G content, as defined by percent G. D) C content, as defined by percent C.
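
      The base-content measures named in this legend (percent G + C, percent G, percent C) are simple per-sequence quantities. A minimal Python sketch is given here for concreteness; the function name `base_percent` and the example sequence are hypothetical, and the sketch says nothing about which transcript regions or isoforms were used in the actual analysis.

```python
def base_percent(seq, bases):
    """Percent of positions in seq that match any of the given bases."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(seq.count(b) for b in bases) / len(seq)


# Hypothetical toy sequence, not taken from the data.
example = "GGCCAAUU"
gc_content = base_percent(example, "GC")  # percent G + C
g_content = base_percent(example, "G")    # percent G
c_content = base_percent(example, "C")    # percent C
```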

      In light of these observations, analysis of G- or C- containing motifs needs to be examined in this context. To this end, we performed the experiments suggested here, but did so by searching for the prevalence of G-quadruplexes in dendrite-enriched FMRP targets versus dendrite-enriched FMRP non-targets (Figure S5A). To do this, we used both experimentally-defined G-quadruplexes (described in (Guo and Bartel 2016), Figure S5E), as well as motifs (described in (Goering et al. 2020), Figure S5F). We include the results below, and in a new Figure S5 in the paper.

      Supplemental Figure 5E-F: mRNAs containing G-quadruplexes are not enriched in dendritic FMRP targets vs dendrite-enriched non-FMRP targets. E) The percent of all CA1 mRNAs, all dendrite-enriched mRNAs, dendrite-enriched FMRP-bound targets (413), and dendrite-enriched non-FMRP targets (798) that contain experimentally-defined G-quadruplexes is plotted. Shown are the results of chi-squared analysis comparing the enrichment of G-quadruplex containing mRNAs in dendrite-enriched FMRP targets vs dendrite-enriched non-FMRP targets. F) As in E, except looking for the presence of mRNAs with G-quadruplex motifs in 3’UTRs as described in (Goering et al. 2020)

      Interestingly, we found no difference in the presence of G-quadruplex motifs in the 3’UTRs of these two sets (above and new Supplemental Figure 5). For example, of 413 dendrite-enriched FMRP targets, 100 (24%) had experimentally defined G-quadruplexes in the 3’UTRs, while 159 (22.5%) dendrite-enriched non-FMRP targets had experimentally defined G-quadruplexes. These differences were not significant (by chi-square test).

      Searching the 3’UTR sequences of the 413 dendrite-enriched FMRP targets above for G-quadruplex motifs (as described in (Goering et al. 2020), which searched for an empirically derived specific motif: GW--G, separated by 7nt), we found only 3 instances in dendrite-enriched FMRP-bound target mRNAs. Similarly, of the 798 non-FMRP targets, only a small subset (6) contained this specific motif in their 3’UTRs. This difference was not significant (chi-square test).
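
      For readers who wish to check this kind of comparison, the following Python sketch computes a Pearson chi-square statistic for a 2x2 table using the motif counts quoted above (3 of 413 targets versus 6 of 798 non-targets). It is a sketch under assumptions: the response does not state whether a continuity correction was applied, so none is used here, and the function name is hypothetical. With expected counts this small, the chi-square approximation is rough, and an exact test may be a reasonable alternative.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: FMRP targets, non-FMRP targets; columns: with motif, without motif.
chi2, p_value = chi_square_2x2(3, 413 - 3, 6, 798 - 6)
```

      Under these assumptions the p-value is well above .05, in line with the statement that the difference is not significant.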

      In summary, we do not find evidence in our data of G-quadruplexes playing a role in determination of FMRP binding in CA1 dendrites. This data is now included in the results and discussed in the Discussion of the paper.

      Reviewer #2 (Public Review):

      The authors performed transcriptomic analyses from compartment-specific, micro-dissected hippocampal CA1 region tissue from transgenic mice. One feature that distinguishes this work from previous studies is the use of conditional knock-in of tags (GFP or HA) and tissue specific expression of the Cre recombinase to target a very specific population of pyramidal neurons in the CA1 region--as well as the combined use of TRAPseq, PAPERCLIP and FMRP-CLIP. Also, central to this work are the analysis pipelines that look at large populations of mRNA with the goal of finding features shared by those mRNA that bind FMRP.

      First, they established the identity of mRNAs that are dendritically enriched or/and alternatively polyadenylated (APA) by sequencing; followed by validation of a few candidates using smFISH. Next, the APA data was filtered through the rMATS statistical program to identify alternatively spliced (AS) mRNA variants within the APA population. The authors concluded that the majority of splicing events were of the exon-skipping type with NOVA2 as the likely culprit leading to this differential localization of AS isoforms. The authors then proceeded to perform FMRP-CLIP which was analyzed against the TRAP dataset. The (413) mRNAs that were shared by the two experiments (TRAP and FMRP-CLIP) exhibited two notable features: dendrite-enrichment and longer average transcript length. More importantly, they demonstrated that FMRP can preferentially bind to an AS isoform that is enriched in dendrites. Further analyses of FMRP CLIP targets showed that they shared a significant level of genes designated by gene set enrichment analysis (GSEA) as involved in ion transport and receptor signaling and similarly for ASD-related candidate genes.

      Strengths: -The combined use of tissue-specific Cre and conditional tags for RPL22, PABPC1 and FMRP help make these pull-downs highly specific and robust. -RNA sequencing approach allows for identification and comparison of populations of ribosome-, PABPC1- and FMRP-associated mRNAs. -Preferential binding of FMRP to AS or APA isoforms in dendrites is an impactful and significant finding.

      Weaknesses: -A caution in interpreting comparative or differential RNA-sequencing results as some are correlative.

      We appreciate this concern, and agree that RNA-seq analysis alone can be difficult to interpret. However, we feel that our unique approach of combining multiple cell-type specific approaches, including CLIP-seq and PAPERCLIP along with TRAP-seq and RNA-seq, results in stronger conclusions that are supported by multiple lines of evidence.

      -Validation of FMRP interaction with AS or APA isoforms or ASD candidates by smFISH-IF is lacking.

      We find that smFISH-IF in the CA1 neuropil is difficult to interpret in mouse brain slices due to dense networks of processes in addition to contaminating cell types, making IF signals dense, noisy and difficult to quantitate. Although we could theoretically attempt these experiments using an in vitro cell culture model, we believe that the novelty of our work is in a) the cell-type specific nature of our analyses and in b) the fact that our analysis and validation is all performed in vivo. We do not feel confident that in vitro systems are similar enough to our in vivo system to be relevant for this work. This is due not only to differences in their transcriptomes, but also due to the limited number of synapses in vitro cells make with other neurons when compared to CA1 neurons in the brain. Instead, we validate the interactions between FMRP and AS and APA isoforms by isolating junction reads among FMRP-CLIP tags isolated in a cell-type specific manner from intact mouse brains (Figure 5). In this manner, we find direct evidence of FMRP selectively binding to dendritic mRNA isoforms in vivo.

      -Although hippocampal CA1 region is an excellent site to study FMRP-RNA interactome, are there other projection systems where altered FMRP-RNA interaction may lead to greater dysfunction?

      We appreciate this point and now include this in the revised Discussion.

    1. Author Response:

      Reviewer #1 (Public Review):

      This study set to test the hypothesis that alleles of paired NLRs Pik-1 and Pik2 have co-evolved to prevent premature inactivation and enable strong activation in response to matching effectors. They show that co-expression of Pikm-1 and Pikp-2 allowed weaker HR in response to Avr proteins compared to co-expression of Pikm-1 and Pikm-2, and this is attributed to a single amino acid residue at 230 in Pik-2. Most interestingly, they found that co-expression of Pikp-1 and Pikm-2 led to Avr-independent HR and this is also determined by polymorphism at residue 230. This HR requires Pikm-2 P-loop and MHD domains, which are known to be required for Pik-2 function. The authors reconstructed phylogenetic tree to trace the evolutionary history of the polymorphism of residue 230. The data showed that Gly230 is the ancestral residue and Pik-2 carrying this residue is functional in working with Pik-1 for Avr-D recognition, whereas Asp230 (Pikp-2) arose from Gly230. Glu230 (Pikm-2) likely resulted from a further mutation from Asp230. The authors further provided evidence that, while both matching and mismatching alleles of Pik-1 and Pik-2 can generally interact, there seems to be a preference for matching pairs, supporting the possibility for the co-evolution of Pik-1 and Pik-2. Other interesting results include greater accumulation of Pik-2 protein associated with autoimmunity. Overall, the study provides insight into the co-evolution of paired NLRs which has important implications in hybrid necrosis. However, the work could be further strengthened if the author could address the following issues:

      1) The study relies solely on transient expression in Nb plants. It will be more convincing if the authors could show whether combination of Pikp-1 and Pikm-2 in rice by either crossing or transgenics leads to autoimmunity.

      2) The observation that autoimmunity caused by Pikp-1 and Pikm-2 can be strengthened by extending to additional alleles. How general is this phenomenon? Do other combinations of mismatched pairs also show autoimmunity?

      Interestingly, all known sensor Pik alleles distribute in two clades according to the HMA domain: Pikp/Pikh form one clade and Pik+/Piks/Pikm form a second (Białas et al., 2021; De la Concepcion et al., 2021). Within clades, variation is very limited. For example, Pikp and Pikh differ in a single amino acid only (De la Concepcion et al., 2021). Further, the Pikp-1 and Pikh-1 sensor NLRs are linked to a helper NLR with 100% amino acid sequence identity to Pikp-2, while Piks-1, Pikm-1 and Pik+-1 are linked to a helper identical to Pikm-2. Therefore, it is reasonable to assume that the autoimmune phenotypes shown between Pikp and Pikm in this paper are representative of what would occur in other mismatches between clades, i.e. Pikh/Pikm, Pikp/Piks, etc.

      We have included a paragraph in the discussion to clarify this.

      3) Figure 2A, Pikp-2D230E showed stronger HR compared to Pikm-2 when co-expressed with Pikm-1 and Avrs. What about co-expressing Pikp-2D230E and Pikm-1 without Avr? Does it show autoimmunity?

      The experiment co-expressing Pikp-2D230E and Pikm-1 without Avr-PikD was shown in Figure 5 of the original manuscript. The Pikm-1/Pikp-2D230E combination shows a low level of autoimmunity in some repeats, although very reduced overall when compared with Pikp-1 (Figure 5). We suggest this may be because Pikm-1 has also evolved to suppress unregulated activation by the D230E polymorphism. We are currently investigating this and, as it was also pointed out by another reviewer, we have expanded our discussion to comment on it.

      Why is it stronger than Pikm-2?

      As we lack a full mechanistic understanding of Pik NLR activation, we cannot explain at this time why the cell death responses triggered by Pikp-2 D230E are stronger than those triggered by Pikm-2 WT. However, as noted in the manuscript, polymorphisms in position 434 and 627 seem to harbour a negative contribution towards cell death (Figure 2-Figure supplement 2, 3 and 4). Although it is tempting to speculate that these polymorphisms harbour a potential regulatory role to tame the increased activation of D230E, we preferred to only state this and not overspeculate.

      4) Figure S5A, why Pikp-2T434S showed weaker HR compared to Pikp-2 in Figure 1A? It is necessary to compare Pikp-2 and Pikp-2T434S side-by-side.

      As mentioned above, we suggest that polymorphisms 434 and 627 may have a negative contribution towards cell death. Therefore, when introduced in Pikp-2, they lead to a lowered response compared to WT. Indeed, the reverse mutations in Pikm-2 increase the HR level over WT Pikm-2 (Figure 2-Figure supplement 2, 3 and 4). We appreciate the point that to directly compare Pikp-2 with the mutants ideally these experiments would be performed side-by-side, but we have not done this as our strategy was to look for mutations that altered cell death outcomes compared to Pikm-2.

      5) Is the autoactivation associated with oligomerization? A blue-native gel assay would do.

      We currently have no evidence for this. We continue to perform experiments to detect oligomerization in Pik NLRs using WT, inactive and autoactive mutants. However, these have proven challenging and have not yielded a conclusive result to date. We continue to work towards optimising these assays.

      6) It needs to be cautious to draw conclusion from the competition experiments where different ODs are compared, as these may not guarantee correlation with protein concentrations. For example, in Figure 9C, a OD of 0.1 gave stronger Pikp-2 band in co-IP compared to higher ODs.

      We fully agree with this cautionary note. As stated in the manuscript, we could not obtain even inputs. Therefore, we chose to report this phenomenon, including a paragraph alerting readers to the limitations of this assay, and avoided drawing strong conclusions based on this assay alone.

    1. Author Response:

      Reviewer #2:

      Weaknesses:

      • The priority given to metagenomic protein sequences over reference genome sequences in the clustering pipeline is not sufficiently justified. Indeed, the metagenomic coding sequences are notably more likely to be fragmented due to challenges in assembly. A combined clustering of both would present a conceptually simpler and potentially less biased workflow. Likewise, the conceptual division between genomic and environmental genes belies their mutual importance in discovering unknown functions.

      We have better explained in the text the rationale for the different decisions we took. Briefly, by using metagenomic data instead of references as initial data, we can show the robustness of AGNOSTOS in dealing with noisy and incomplete data. Most of the studies that will use our methods will use data derived from metagenomes (contigs or MAGs), and it is crucial to show that our validation and refinement steps perform as expected. Later, we added the GTDB sequences to show the capabilities of AGNOSTOS to enrich already processed data. The results of clustering both data sets together or updating the existing gene clusters will be almost identical, but by doing it in two steps, one can track the dynamics of the singletons, the stability of the gene clusters and many other interesting processes that can provide a better understanding of the data. One can also “paint” the gene clusters by integrating other sources of data, such as enzymatic sequences, as we did in Dittmar et al., 2021, where we integrated CAZymes, KEGG and other data sources in the seed database used in this manuscript.

      • The authors do not compare their methods to other possible ways to identify the unknown fraction. It is therefore unclear how much better than a naive approach it might be. Likewise, it is worthwhile to question the sensitivity of their results to analysis parameters. As a suggestive example, in the one case where they did compare possible parameter values-the systematic selection of the inflation parameter for MCL clustering of gene clusters into super-clusters (Supplemental Figure 7-1)-the selected values resulted in distinctly different super-cluster properties compared to all other assessed parameter values. The manuscript would be strengthened by highlighting how the chosen parameters maximize sensitivity to remote homology.

      We moved our comparison against FunkFams out of the supplementary material and expanded it, and we show that many of the FunkFam families belong to the known coding sequence space. In addition, we expanded the section where we reanalyze the data from Salazar et al., 2019, in which eggNOG was used to explore the known and unknown fractions of the OM-RGCv2. Here we show that a large proportion of the genes classified under the category [S] by eggnog-mapper correspond to our known fraction. For the remote homology searches we used the cut-offs recommended by the authors of HHblits (https://github.com/soedinglab/hh-suite/wiki#how-can-i-verify-if-a-database-match-is-homologous).

      • It is not clear why super-clusters ("cluster communities") are identified within each of the cluster classifications (Known / Genomic Unknown / etc.) instead of across all four groups. Intuitively, this would present the opportunity to detect distant homology between clusters with known and unknown function.

      We improved the part where we explain why we perform the identification of the gene cluster communities by category rather than combining all gene clusters. Briefly, as we are dealing with the unknown, we need a reference to evaluate the quality of the MCL clustering. The reference is the known fraction, as we can exploit the information related to the domain architectures to fine-tune the parameter selection and avoid over-splitting or lumping gene clusters together. Then we use the learnt parameters to partition the other categories, where we lack the domain architecture information. One can identify the relationships between known and unknown gene clusters a posteriori, by combining the sequences of a gene cluster community and creating new HMM profiles that can be used to link knowns with unknowns.

      • It is not clear why small clusters and those with many fragmented members are removed entirely from downstream analyses, given that the inclusion of additional sequences in later steps would presumably improve the quality of these clusters by adding new representatives.

      We included parts of Supp. Note 11 in the main text to improve the explanation of how we handle singletons. Singletons, and gene clusters that didn’t pass the validation process, are not removed but flagged, so the user is aware that these gene clusters might be problematic. In the manuscript we keep or remove them depending on the downstream analysis and the question we want to answer; the user can decide what to do. For example, for the collector curves shown in Figure 3, we use a subset of the singletons. These singletons are selected based on the information we gathered by integrating GTDB in the metagenomic dataset, combined with the inferred gene abundances from the metagenomes. As explained in the methods section, to minimize the impact of potential spurious singletons, we removed from the metagenomic dataset the singletons with an abundance smaller than the modal abundance of the singletons that were reclassified as good-quality clusters after integrating the GTDB data. We clarified this point in the main text to avoid confusion.
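      The abundance filter described above can be sketched as follows. This is only an illustration under stated assumptions: the function and variable names are hypothetical, abundances are treated as discrete values so that a mode is well defined, and ties between modes are resolved to the smallest value. It is not taken from the AGNOSTOS code.

```python
from collections import Counter


def modal_abundance(values):
    """Most frequent value; ties resolve to the smallest value (an assumption)."""
    counts = Counter(values)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


def filter_singletons(singleton_abundance, reclassified_ids):
    """Keep singletons at least as abundant as the modal abundance of the
    singletons that became good-quality clusters after adding GTDB.

    singleton_abundance: dict mapping singleton id -> abundance (hypothetical input)
    reclassified_ids: ids of singletons reclassified as good-quality clusters
    """
    reference = [singleton_abundance[s] for s in reclassified_ids
                 if s in singleton_abundance]
    threshold = modal_abundance(reference)
    kept = {s: a for s, a in singleton_abundance.items() if a >= threshold}
    return kept, threshold


# Toy data, invented for illustration only.
abundance = {"s1": 1, "s2": 3, "s3": 3, "s4": 5, "s5": 2}
kept, threshold = filter_singletons(abundance, ["s2", "s3", "s4"])
```

      With continuous abundances, one would presumably bin or round the values before taking the mode.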

      • While maximizing sensitivity to remote homology is appropriate for the overarching goal of characterizing entirely unknown protein clusters, the likely decrease in specificity means that the accuracy of functional annotations and the shared function of all sequences in a cluster are suspect (as the authors are aware). It would have been interesting and valuable to extend the hierarchical clustering framework, already partially developed here, to enable both sensitive and specific annotations.

      We now state explicitly in the main text that we don’t perform any functional annotation besides PFAM, as this is not our purpose. We believe that predicting function from sequence similarity methods is not a trivial task and, in many cases, it might be wrong. Regarding the reviewer’s last comment, as shown in Figure 1C, this is possible with the current implementation of AGNOSTOS, as one can use the most adequate level of resolution. Our gene clustering and gene cluster community inference are highly constrained to preserve domain architectures. With this approach we have a good balance between sensitivity and specificity, although some noise is expected, as we show in Figure 2C-D. To support this, we included a new table and figure in the Supplementary, where we evaluated the entropy of the eggNOG annotations at the gene cluster and gene cluster community level. In both cases the entropy values are very low.
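      As an illustration of what such an entropy evaluation could look like, here is a minimal sketch that assumes Shannon entropy (in bits) over the annotation labels of a cluster's members. The response does not specify the exact entropy definition, and the labels below are invented.

```python
import math
from collections import Counter


def annotation_entropy(labels):
    """Shannon entropy (bits) of the annotation labels within one cluster.

    0.0 means every annotated member shares the same label; higher values
    mean the cluster mixes several annotations.
    """
    counts = Counter(labels)
    n = sum(counts.values())
    # p * log2(1/p) avoids returning -0.0 for a perfectly homogeneous cluster
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical clusters: one homogeneous, one evenly split between two labels.
homogeneous = annotation_entropy(["COG0001", "COG0001", "COG0001"])
mixed = annotation_entropy(["COG0001", "COG0001", "COG0002", "COG0002"])
```

      A homogeneous cluster scores 0 bits and an even two-way split scores 1 bit, so values close to zero indicate that the members of a cluster largely share one annotation.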

    1. Author Response:

      Reviewer #1 (Public Review):

      Sherrard and colleagues investigated the dynamics of stress fibers of the follicular epithelia during the migratory stages of egg chamber development. During these stages, the follicular epithelium forms actin-based protrusions at the leading edge followed by a parallel array of stress fibers. As the follicular epithelium migrates along the basement membrane that encapsulates the egg chamber, the egg chamber rotates. At later stages of development the follicle cells stop migrating and the stress fibers are then used to form a contractile network, which is needed to help make the elongated shape of the egg. Using near total internal reflection microscopy, Sherrard and colleagues show that during this migratory period these stress fibers showed treadmilling behavior (growing in the front and shrinking in the back) and persisted for longer than the time for a cell to travel at least one cell length (62%). Concomitant with these treadmilling stress fibers was the appearance of adhesions at the front of these stress fibers and their disappearance at the rear. This means that rather than just observing adhesions at the termini of the stress fibers they can be found all along their lengths. These dynamics were captured by the use of fluorescently tagged adhesion proteins (Paxillin, Talin) but could also be visualized by staining for endogenous proteins (betaPS integrin). The authors go on to correlate the treadmilling stress fibers with adhesion dynamics and observed that new adhesions are pulled rearward, consistent with their maturing under tension. Blocking cell migration (with CK-666 or genetically with depletion of Abi) resulted in a conversion of these migratory (modular), treadmilling stress fibers to canonical stress fibers with adhesions on either terminus rather than along the length. The authors go on to demonstrate that the modular migratory stress fibers are dependent on the formin DAAM, as depletion of this formin specifically led to a 30% reduction in F-actin levels without affecting other actin-based structures. Furthermore, as the migratory phase of these follicle cells subsides, DAAM expression decreases, with the more canonical stress fibers being dependent on the formin Dia.

      Strengths:

      The claims of the manuscript are well supported by the data presented. The analyses rely on the dynamics of stress fibers being observed in the context of the organism in vivo. This is in contrast to other studies where stress fiber dynamics have been limited to tissue culture cells in 2D on a single extracellular matrix protein. The finding that stress fibers a) treadmill in this context and b) form adhesions along their length and not just their termini will likely have a big impact on our understanding of cell migration and the role of stress fibers in general. This study also elegantly capitalizes on Drosophila genetics in order to identify which formins are involved in stress fiber formation and dynamics and clearly shows that there is a developmental shift from DAAM to Dia as the follicular epithelium stops migrating. They also show that modular stress fibers are dependent on behavior: when cell migration is inhibited there is a shift to canonical stress fibers. Thus modular stress fiber formation is both DAAM-dependent and dependent on migration.

      Weaknesses:

      The 30% reduction in F-actin in the DAAM amorphic allele does suggest that there is something else contributing to the formation of modular stress fibers, which is not surprising but could be more thoroughly addressed in the manuscript. While the authors went to great lengths to quantify their observations, many of their claims are based on qualitative observations, in particular the descriptions of adhesion dynamics when correlated to modular stress fiber dynamics (Figure 4) and the shift from modular stress fibers to canonical stress fibers (Figure 5). The near total internal reflection microscopy offers them an opportunity to derive rate constants for focal adhesion dynamics, and quantifying the change in stress fiber type would have strengthened the manuscript.

      We measured the lifetimes of both stationary and sliding adhesions and reported these data in Figure 4E. We also related these lifetimes to the distance required for the cell to travel one cell length within the text to provide the reader with additional context for these values. Although it would be possible to derive rate constants for focal adhesion assembly and disassembly, the relative lifetimes of the two adhesion types seemed to be a more pertinent piece of information for this first description of treadmilling stress fiber assembly.

    1. Author Response:

      Reviewer #1 (Public Review):

      Munc13 is a key regulator of synaptic vesicle (SV) fusion that is thought to mediate SV tethering and regulate SNARE assembly. Based primarily on Munc13 crystal structures, the authors design a set of four charge reversal mutations in the C1C2B region that are predicted to affect the interaction of Munc13 with the plasma membrane (PM). Various in vivo and in vitro consequences of these mutations are studied, leading to two main conclusions: (1) an interaction between the PM and a polybasic surface of Munc13 is likely important for SV tethering, and (2) two residues in the Ca2+-binding loops of the C2B domain are important for SV fusion.

      So far, so good - I think the data strongly support the two main conclusions noted above. It is less clear that these studies support (or could falsify) the main hypothesis, stated in the title, that re-orientation of membrane-bound Munc13 controls neurotransmitter release. Primed vesicles appear to exist in dynamic equilibrium between two states, one of them "loosely" primed (LS) and the other "tightly" primed (TS). Inasmuch as this simple model is correct, one could characterize the various players - SNAREs, synaptotagmin, complexin and of course Munc13 - in terms of their ability to influence the LS/TS equilibrium, perhaps in response to Ca2+ or other small molecules. This manuscript postulates that the orientation of Munc13 relative to the membrane has a major impact on the LS/TS equilibrium, with a perpendicular orientation favoring LS and a slanted orientation favoring TS.

      The authors' previous structure (Xu et al., 2017) suggested that two partially-discrete faces of C1C2B, one polybasic and the other centered around the Ca2+-binding loops of C2B, are likely involved in PM binding. In that paper they hypothesized that the polybasic face would dominate in the absence of Ca2+ whereas the 'Ca2+-binding face' [not a very good name, but the authors haven't suggested a better one] would dominate in the presence of Ca2+. Binding to the PM via the polybasic face would yield a more erect or 'perpendicular' binding orientation, whereas binding to the PM by the Ca2+-binding face would yield a more tilted or 'slanted' binding orientation.

      In the revised manuscript we use the term DAG/Ca2+/PIP2-binding face, or Ca2+-dependent face when we discuss the effects of Ca2+ in particular.

      Here the authors performed two molecular dynamics simulations, one without and one with bound Ca2+. In the Results section, they correctly point out that their findings cannot be used to support their hypothesis because, in each case, Munc13 was placed in the hypothesized orientation - perpendicular for minus Ca2+, slanted for plus Ca2+ - at the beginning of the simulation. In the Discussion however the authors argue that the MD simulations support their model. I disagree because the simulations needed to falsify the model have not yet been conducted. In addition, an opportunity was seemingly missed by not doing MD simulations on the mutants.

      We have removed the sentence stating that the MD simulations support the model in the corresponding paragraph of the discussion (page 22). A meaningful analysis of the effects of the mutations would have required much longer simulations of this large system, which would take several months for each mutant in the UT Southwestern BioHPC facility or acquisition of a dedicated allocation at the Texas Advanced Computing Center.

      Of the four mutations studied, two (K603E and K720E) should specifically destabilize PM binding by the polybasic face, one (K706E) should destabilize binding by the Ca2+-binding face, and one (R769E) is expected to destabilize both. Two of the mutants (K603E and R769E) in fact abrogate priming. This result, along with biochemical experiments, implicates the polybasic face in SV tethering and thus represents the main evidence supporting the first of the main conclusions (see Evaluation Summary above). However, since an unprimed vesicle does not participate in the LS/TS equilibrium, these mutants are in this respect uninformative. Only the remaining mutants, K720E and K706E, would therefore appear to have the potential of yielding information about the LS/TS equilibrium and its relationship to Munc13 orientation.

      Although we understand the concern expressed by the reviewer, we do not fully agree with the last sentence. If we accept that the K603E and R769E mutations impair priming, this result implies that binding through the polybasic face occurs for WT Munc13-1. This conclusion does not demonstrate the LS/TS equilibrium, but it does support the notion that one of the proposed states exists.

      Both K720E and K706E support normal priming but have opposite effects on vesicular release probability and evoked release. These results can be rationalized in terms of an LS/TS equilibrium. The K720E mutation, which selectively destabilizes binding by the polybasic face, would shift the equilibrium toward TS and thereby increase the release probability. Conversely the K706E mutation, which destabilizes binding by the Ca2+-binding face, would shift the equilibrium toward LS and thereby reduce the release probability.

      However, the authors themselves cast serious doubt on this straightforward interpretation. In the case of K720E, they point out that the other 'polybasic mutant', K603E, has no effect on release probability. (I argued above that, perhaps, K603E is best viewed as uninformative about the LS/TS equilibrium owing to its strong upstream priming defect.) In the case of K706E, the authors point out that phorbol ester potentiation was similar for K706E and wild-type, suggesting to them "that the effects of the K706E mutation might not be related to the transition to slanted orientations but rather to another mechanism that directly influences fusion. For instance, the Munc13-1 C2B domain might cause membrane perturbations analogous to those that are believed to underlie the function of the Syt1 C2 domains in triggering release (Fernandez-Chacon et al., 2001; Rhee et al., 2005). It is also possible that the phenotypes caused by the K706E mutation and other mutations studied here reflect effects of Munc13-1 in more than one step leading to release, which complicates the interpretation of the data." If this is indeed the case, we are down to one mutant - K720E - that can be informative about the LS/TS equilibrium. (For the most part, I did not find the double and quadruple mutants informative, especially because each of them contains at least one mutation that strongly abrogates priming.)

      We again understand the concerns expressed by the reviewer but do not agree that the K706E mutant does not provide any information on the LS/TS equilibrium. If we accept that the K706E mutation does have an effect on evoked release and that K603E has an effect on priming, these results support the notion that both proposed binding modes occur and are functionally relevant. We do agree however that this conclusion does not prove that there is an equilibrium between two primed states.

      It looks like K720E is right in the center of the polybasic surface (although it's hard to tell from a single 'projection' image) so it would have been expected to impair Ca2+-independent liposome binding, and it does. However the liposome clustering effects are very weird, displaying a much broader distribution than any other experiment, an observation which the authors disregard. However, overall, I would say that the authors' K720E findings offer modest support for their overall main hypothesis. But for me it's not enough to justify making that hypothesis the title of the paper.

      We agree with the reviewer that it is a stretch to include the hypothesis in the title of the paper. We have changed the title to: ‘Control of neurotransmitter release by two distinct membrane-binding faces of the Munc13-1 C1C2B region’, which emphasizes the notion that there are two functional membrane-binding faces of the Munc13-1 C1C2B region without making a specific claim about a role of the two faces in presynaptic plasticity. We note that the notion that the Ca2+- and DAG-dependent face of the C1C2B region is functionally relevant was already supported by previous studies (Rhee et al. 2002; Shin et al. 2010), which we now cite in additional sentences to emphasize this point (e.g. pages 22, 23). Hence, we believe that, together with the previous data, our results strongly support the conclusion that two faces of the C1C2B region are functionally important. We still present the LS/TS model and use it often to interpret our results, but have tried to be careful throughout the manuscript not to overstate our conclusions and to point out when our results are consistent with the model without concluding that they prove it.

      For the most part I could not follow the discussion of figures 4 and 5. But I am struck by strong similarity between the data for K603E and K706E (comparing Fig. 4B/C to Fig. 4H/I). How can these results be reconciled with the opposite roles predicted for K603 and K706?

      The normalized data obtained for K603E and K706E mutants do look similar (new Fig. 8C,I), but the absolute amplitudes are lower for the former (new Fig. 8B,H). Nevertheless, we agree that it is not straightforward to interpret some of the data obtained in the repetitive stimulation experiments. To acknowledge this difficulty, we have included the following sentence at the end of the first paragraph of the corresponding section (line 416): ‘Nevertheless, interpretation of some of the data was not straightforward, and there may be alternative explanations to those offered below, which are based in part on the proposed LS-TS equilibrium.’

      I'm not sure how the results of the PDBu experiments contribute to the conclusion that "two faces of the C1-C2B region are critical for Munc13-1-dependent short-term plasticity" (p. 15), since the only mutant that selectively affects one of the faces, K706E, has no impact (Fig. 6).

      We have toned down the sentence at the end of the section describing the PDBu data, which now reads (line 475): ‘Overall, these results show that basic residues in the Munc13-1 C1-C2B region influence the potentiation of synaptic responses by PDBu and, together with the data obtained with repetitive stimulation, they support the notion that two faces of the C1-C2B region are involved in Munc13-1-dependent short-term plasticity.’

      Why are the liposome-binding assays in Fig. 7 done with C2C present - isn't that just a confounding factor? And if Ca2+-independent binding by C2C is as weak as suggested by the results in Fig. 7, how do any of the Munc13 constructs cluster liposomes in Fig. 8? (Note that, according to my reading of the methods, V-type liposomes are simply T-type liposomes without the DAG and PIP2.)

      Binding of the C2C domain to liposomes is indeed weak but still can contribute to liposome clustering because multiple C1C2BMUNC2C molecules can cooperate in this activity (see Quade et al. 2019). We used C1C2BMUNC2C mutants in the binding assays because they were also employed for the liposome clustering and fusion assays, in which C1C2BMUN is much less active (see Quade et al., 2019). We agree that having the C2C domain present could be a confounding factor, but we included the binding results because the effects of the mutations did correlate, albeit qualitatively, with those of the clustering and fusion assays.

      What is the basis for the claim (p. 22) that "the perpendicular orientation of Munc13-1 is expected to facilitate initiation of SNARE core complex assembly"?

      The perpendicular orientation may hinder the initiation of SNARE complex assembly if Munc13-1 is located between the vesicle and the plasma membrane, but can facilitate initiation of assembly if the bridging Munc13-1 molecules are located further from the center of the vesicle-plasma membrane interface (e.g. as in Fig. 1D; see also cryo electron tomography images of Quade et al. 2019). We agree that the term ‘expected’ is too strong and now state that the perpendicular orientation ‘may facilitate initiation of SNARE complex assembly’ (line 523).

      Reviewer #2 (Public Review):

      In this manuscript, Rosenmund and colleagues describe new results regarding the mode of action of Munc13 in neurotransmitter release. Based on molecular dynamics simulations of Munc13 (C1C2BMun) with phospholipid membranes, the authors selected promising point mutations and comparatively investigated their functional impact with electrophysiological experiments in hippocampal neurons and with a variety of in vitro experiments (lipid binding assay, liposome clustering and fusion). The results show that specific mutations in the C1C2B-domain (also referred to as polybasic face) of Munc13 (K603E, R769E) strongly inhibit vesicle priming, a property that correlates well with their reduced ability to bind to phospholipid membranes in a calcium-independent manner.

      The manuscript describes comprehensive electrophysiological and biochemical experiments that are complemented and extended by thoughtful analyses. The direct combination of electrophysiological and biochemical expertise from the Rosenmund and Rizo laboratories, respectively, represents a particular strength of this study, allowing the authors to develop new insights into the function of the Munc13 protein. A welcome (but not necessary) extension of the data presented would be the demonstration that the mutants in question (K603E, R769E) also show altered phospholipid binding in the MD simulations. In any case, the presentation of the data is clear and the authors' conclusions are convincing.

      Taken together, the manuscript and the results represent a significant advance in the understanding of the molecular mechanisms underlying synaptic vesicle priming.

      We thank the reviewer for the very positive evaluation of our study. As mentioned above, a meaningful analysis of the effects of the mutations would have required much longer MD simulations of this large system.

    1. Author Response:

      Reviewer #1 (Public Review):

      This ms targets an interesting question, whether changes of feedforward inhibition at the DG-CA3 synapses regulate the representational capabilities of contextual fear memory at CA1 and the anterior cingulate cortex (ACC). The paper exploits a recent tool developed by the group (viral-mediated shRNA interference of Ablim3 in DG), to enhance PV+ mediated inhibition of CA3 pyramidal cells by increasing both their recruitment by DG cells and their number of contacts over postsynaptic cells. Using micro-endoscopic imaging of mice experiencing contextual fear conditioning, the authors nicely evaluate the effect of feedforward inhibitory control of CA3 outputs in the formation, stabilization and specificity of contextual fear memory representations in the CA1 and ACC. Data is relevant to understand how specific microcircuit motifs can influence representational dynamics in downstream regions. I have some methodological comments and recommendations for authors to improve their presentation and to exclude potential confounding factors.

      1- Since imaging is performed in CA1 and ACC separately, the study design entails 4 groups: shNT vs shRNA, which is the main experimental manipulation, plus CA1 vs ACC. While data is in general carefully presented, some analyses may require additional validation to rule out that some regional effects attributed to the manipulation actually reflect group differences. This is important because there may be some differences between ACC and CA1 groups in some behavioral readouts (e.g. Fig.2c; Fig.S2b) which may actually explain the different effects of the manipulation. Formal comparisons of behavior in ACC and CA1 shNT groups may be required to rule out this effect.

      We compared behavior data in the control groups across brain regions to test whether our calcium imaging findings are driven by differences between groups rather than by the virus manipulation. We did not find a significant difference for any of the data sets (see figure legend of Rebuttal Figure 1 a-d for details). In general, we tried to avoid presenting the same (or part of the same) dataset in multiple figures. An alternative would be to plot all 4 groups in 1 graph and test them as such, but that would decrease readability in our opinion. Therefore, we are happy to provide the additional graphs and analysis (Rebuttal Figure 1a-d) but prefer not to include them in the main manuscript.

      <a href="https://imgur.com/Y4mLzrg">Rebuttal Figure 1a-d</a>

      2- Differences of activity level (calcium rate) are examined using bins of 5 seconds for a total of 360 sec of exploratory activity. To discard motility effects an analysis is implemented using 1 sec bins. Thus, the two data samples are not commensurate. Also, an ANOVA on calcium rate is applied over uneven multiple comparisons to account for statistical effects of region x time or context x time. This is relevant for fig.1g vs 1i and Fig.S2j,l and may require correction.

      We assume you mean “1 minute” and not “1 sec” here. We indeed presented the two datasets (calcium event rate and moving index) using different time bins (5 sec and 1 minute, respectively). It is true that a difference in binning, and therefore a different sample size in one factor (time), could affect the result of the ANOVA. Rebuttal Figure 1 e-f shows the behavior comparison made in Suppl. Figure 2b of the original manuscript with a 5-second bin. A two-way ANOVA with repeated measures reveals no main virus effect [Two-way repeated measures ANOVA, ACC (e): virus x time effect p=0.0113; virus main effect N.S., time main effect N.S., n=5 per group; CA1 (f): virus x time effect N.S.; virus main effect N.S., time main effect N.S., n=5 shNT, n=6 shRNA]. In ACC, we find a significant interaction effect, but a post hoc Sidak test did not reveal a difference between virus groups at any time point. This confirms our previous finding that differences in movement do not seem to drive the differences between virus groups.
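      To make the binning issue concrete, the sketch below shows one way to put event timestamps onto a common 5-second grid over a 360-second session, so that two readouts share the same number of time points (72) before a repeated-measures test. The function name and the toy timestamps are hypothetical; this is not the analysis code used for the rebuttal figure.

```python
def binned_event_rate(event_times, bin_width=5.0, duration=360.0):
    """Event rate (events per second) in consecutive fixed-width bins.

    event_times: event timestamps in seconds from session start.
    Events outside [0, duration) are ignored.
    """
    n_bins = int(duration // bin_width)
    counts = [0] * n_bins
    for t in event_times:
        if 0 <= t < duration:
            counts[int(t // bin_width)] += 1
    return [c / bin_width for c in counts]


# Toy timestamps, invented for illustration.
rates = binned_event_rate([0.5, 4.9, 5.0, 359.9, 360.0])
```

      Applying the same `bin_width` to both the calcium events and the movement readout gives two series of equal length, which is what makes the comparison commensurate.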

      <a href="https://imgur.com/0x0xqqw">Rebuttal Figure 1e-f</a>

      3- Fig.3 nicely shows accurate context classification based on calcium activity from A&C context neurons using a support-vector machine. The authors report very interesting representational effects for shNT vs shRNA manipulations. Is prediction accuracy of the SVM classifier correlated with behavioral discrimination? That would reinforce conclusions.

      Thank you for raising this very interesting point. Indeed, we found a positive correlation between the discrimination ratio and the accuracy of the SVM classifier (Pearson’s r, shNT: R2 = 0.5794, p = 0.0282, n = 4; shRNA: R2 = 0.5771, p = 0.0288, n = 4). We added these data to Figure 4 (Figure 4c) and to Rebuttal Figure 1g.
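      For readers who want to see how such a correlation is computed, a minimal sketch of Pearson's r and the corresponding R2 follows. The paired values are invented placeholders, not the study's data, and the variable names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Placeholder values: one discrimination ratio and one classifier accuracy per animal.
discrimination_ratio = [1.0, 2.0, 3.0, 4.0]
svm_accuracy = [2.0, 4.0, 6.0, 8.0]

r = pearson_r(discrimination_ratio, svm_accuracy)
r_squared = r ** 2
```

      The sign of `r` gives the direction of the relationship, and `r_squared` is the fraction of variance in one variable explained by a linear fit to the other.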

      <a href="https://imgur.com/J8mZhfJ">Rebuttal Figure 1g</a>

      Regarding conclusions and physiological relevance, the authors may need to discuss why enhanced feedforward inhibition at DG-CA3 synapses is not naturally established given the beneficial effect in context discrimination.

      We apologize that we did not make that aspect of our manipulation clearer in our discussion. We edited the introduction and discussion (LL 65, LL 365) to clearly convey that FFI in DG-CA3 is naturally temporarily increased following learning (Ruediger 2011, Ruediger 2012, Guo et al 2018).

      Reviewer #3 (Public Review):

      In this study, Twarkowski et al. aim to understand the role of a specific circuit motif, dentate gyrus (DG) to CA3 feed-forward inhibition (FFI), for memory encoding and consolidation. FFI is a ubiquitous circuit motif in the brain. As a result, providing insights on its function is an interesting and a potentially very impactful contribution to neuroscience.

      To tackle this issue, the authors describe how increasing DG-CA3 FFI impacts the ensemble activity in hippocampal area CA1 and the anterior cingulate cortex (ACC) in mice undergoing a contextual fear conditioning paradigm. To selectively increase FFI onto CA3 neurons, the study uses a molecular tool (downregulation of Ablim3 using virally mediated expression of shRNA), which has been developed by the same group (Guo et al, 2018, Nature Medicine). The impact of this manipulation is assessed via chronic in vivo one-photon Ca2+ imaging of dorsal CA1 and ACC neurons on the day of fear conditioning, one day after (recent recall), and 16 days after (remote recall) the fear conditioning. During and after fear conditioning, the results show in both experimental groups (shRNA and control) various population activity changes in both CA1 and ACC. Furthermore, the study finds improved context discrimination in the shRNA group only at the remote recall timepoint. The authors' conclusion is that increasing FFI enhances the formation of learning-specific ensembles, first in CA1 and later in ACC, which is associated with an improved memory recall. The experiments presented here were very technically challenging and produced a comprehensive and valuable dataset describing the parallel ensemble activity changes in CA1 and ACC after fear conditioning, with or without increasing DG-CA3 FFI. However, a causal relationship between the manipulation of DG-CA3 FFI, the network activity changes in CA1 and ACC, and the behavioral improvement is, in my opinion, not fully demonstrated. This is for a couple of reasons:

      1) The magnitude of the effect of the shRNA manipulation on the immediate downstream area CA3 remains unclear. Therefore, the findings in the downstream areas CA1 or even ACC (which is at least three synapses removed from CA3) are, in my opinion, difficult to interpret. This uncertainty includes (1) the extent of the virus injection in the dentate gyrus and the extent of subsequent changes in CA3, and (2) the effect of the manipulation on CA3 pyramidal cell activity in vivo. The original paper (Guo et al, 2018) uses in vitro voltage-clamp recordings to record EPSCs/IPSCs in CA3, but does not exclude possible compensatory changes in vivo, e.g., in the excitability of CA3 neurons, which could result from increasing FFI chronically over a few weeks. The data in Figures 1f and g seems to suggest that there are baseline activity changes in CA1, which might be caused by changes in the upstream CA3 network activity. Along the same lines, I am unsure how to interpret the comparisons between CA1 and ACC in Figure 1; within brain region comparisons are more relevant and should be shown instead.

      This is a great point and was raised by all reviewers. We acknowledge the weakness of this comparison, apologize for this misstep in our analysis, and have accordingly removed this dataset from our manuscript. Instead, we performed new experiments using in vivo electrophysiology to allow for cross-region comparison of LFPs in CA1 and ACC within the same animal. We removed data from Figure 1 e-i and added new, simultaneous electrophysiological LFP recordings (Figure 5 and supplementary Figure 4 in the revised manuscript).

      We found an increased number of CA1 ripples that are coupled with ACC spindles (“coupled ripples”) in shRNA mice compared to control mice prior to a learning event (Figure 5c, two-tailed unpaired Student’s t-test with Welch’s correction, p=0.0499, n=5), with no difference in time spent in slow-wave sleep (SWS) (supplementary Figure 4a) or in total numbers of spindles or ripples (supplementary Figure 4b-c). Control mice show a learning-dependent increase in coupled ripples (Figure 5f, two-tailed paired Student’s t-test, p=0.019, n=5) to a level similar to that seen in shRNA mice prior to learning. No further increase is seen in shRNA mice, indicating a saturation of circuit changes that cannot be further amplified following learning.
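      The idea of a "coupled ripple" can be illustrated with a short sketch. The response does not give the coupling criterion, so the code assumes the simplest one: a ripple counts as coupled when its timestamp falls inside a detected spindle interval. All names and values are hypothetical.

```python
from bisect import bisect_right


def count_coupled_ripples(ripple_times, spindle_intervals):
    """Count ripples whose timestamp lies within any spindle interval.

    ripple_times: ripple timestamps in seconds.
    spindle_intervals: (start, end) pairs in seconds, assumed non-overlapping.
    """
    intervals = sorted(spindle_intervals)
    starts = [start for start, _ in intervals]
    coupled = 0
    for t in ripple_times:
        i = bisect_right(starts, t) - 1  # last spindle starting at or before t
        if i >= 0 and t <= intervals[i][1]:
            coupled += 1
    return coupled


# Toy example, invented for illustration.
n_coupled = count_coupled_ripples([1.0, 2.5, 7.0, 10.2], [(2.0, 3.0), (6.5, 7.5)])
```

      A real analysis would likely allow a tolerance window around each spindle and normalize by time spent in SWS; both are left out here because the response does not describe them.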

      2) Several parameters are used in this study to describe the network activity in CA1 and ACC. These include the number of correlated neuron pairs, the number of neurons active in both the training context and a neutral context (so-called A-C neurons), or the event rate observed in these A-C neurons. Most of the activity changes observed do not appear specific to the shRNA group and also occur under control conditions, suggesting that they are not caused by an increase in DG-CA3 FFI. It would be helpful to clarify the sequence by which increasing FFI onto CA3 is hypothesized to cause the changes in CA1 or even ACC.

      We apologize for failing to make this clearer. Prior work has shown that learning increases FFI in DG-CA3 and downregulates Ablim3 in DG (Ruediger 2011, 2012, Guo et al 2018). Therefore, it is not surprising that we observe similar changes in the control (shNT) group as in the shRNA group.

      From previous work we know that shNT mice show increased DG-CA3 FFI following learning (training day) for approximately 24 hours (Guo et al, 2018). Thus, our manipulation allows us to mimic and boost a naturally occurring learning-induced synaptic modification in an inhibitory microcircuit in DG-CA3 and examine the impact on network mechanisms underlying systems consolidation. Importantly, enhanced feedforward inhibition at the DG-CA3 synapses is naturally established for several hours following a spatial learning event (see Ruediger et al, 2011, Guo et al, 2018). Leveraging a molecular tool to enhance FFI prior to learning, we were able to reveal that DG-CA3 FFI plays a role in tuning the circuit towards cross-regional long-term storage of precise neuronal representations (see also edits in text, LL 365).

    1. Author Response:

      Reviewer #1 (Public Review):

      Short chain fatty acids produced by gut microbiota interact with the short chain fatty acid receptors FFA2 and FFA3 (formerly GPR43 and GPR41). Barki and colleagues report the results of studies designed to define the roles of FFA2 and FFA3.

      Using a Designer Receptor Exclusively Activated by Designer Drugs (DREADD) derived from human FFA2 modified to allow a BRET signal in the presence of agonists, 1210 compounds with structural similarity to the known agonist sorbic acid were screened in an appropriately validated assay. Reconfirmation screens identified sorbic acid and MOMBA. Assessment of 320 additional compounds identified chemicals related to MOMBA.

      MOMBA did not activate human FFA2 in interaction assays. However, interaction assays could not be performed with mouse FFA2, so Gi inhibition of cAMP assays were pursued. Neither sorbic acid nor MOMBA inhibited cAMP levels using human or mouse FFA2, and the same lack of effect was seen using human and mouse FFA3. MOMBA was shown to be an orthosteric agonist of hFFA2-DREADD.

      FFA2 and 3 were shown to be expressed in myenteric neurons. MOMBA increased GI transit time in hFFA2-DREADD-HA mice but not control mice. MOMBA also increased transit in animals expressing only FFA2 or only FFA3. MOMBA also increased GLP-1 release from colonic cells and tissues, an effect already reported by this group for sorbic acid. MOMBA also promoted release of PYY.

      Vagal afferents in the colon were stimulated by activation of FFA3 but not FFA2. Cells from the nodose ganglion were stimulated by C3 and to a lesser extent by MOMBA. FFA2 and FFA3 also appeared to be active and functional in DRG cells.

      Using wild type mice, C3 administered to the rectum activated spinal cord neurons. C3 and MOMBA activated c-Fos in hFFA2-DREADD mice. The authors conclude that there is a SCFA-gut-brain axis.

      1. This paper extends findings from this group published in a very strong paper in 2019 (Nat Chem Biol 15:489-498) using knockin receptor hFFA2-DREADD mice and showing that activation of FFA2 promotes GLP-1 release, accelerates gut transit, and promotes lipolysis in adipocytes. The GLP-1 observations are confirmed here, and a new agonist for FFA2 is identified, but it is difficult to appreciate how these studies "define" a short chain fatty acid receptor gut-brain axis.

      Answer: to address whether we have ‘defined’ a short chain fatty acid receptor-gut-brain axis, we have slightly altered the title to more correctly reflect the work performed. It is now ‘Chemogenetics defines the roles of short chain fatty acid receptors within the gut-brain axis’

      Why did the authors choose to pursue detailed studies of the myenteric neurons, nodose ganglion and DRG?

      Answer: That there is connectivity between gut and brain and therefore a ‘gut-brain axis’ is now well established. Many products of both anabolic and catabolic metabolism by the gut microbiota have been suggested to play roles in this connection. Chief amongst these are the short chain fatty acids (SCFAs). The hypothesis explored herein is that SCFAs derived from the microbiota will impact on the gut-brain axis at many levels, hence regulating physiology and function. The FFA2 and FFA3 receptors are major targets for the SCFAs and the animal models and novel ligands we have developed allow us to define explicitly whether function at key points in this axis involves activation of FFA2, FFA3, both receptors or neither, and this is the first large scale study to have undertaken such mapping at multiple different levels. This is why, among other tissues and cell types, we have explored myenteric neurons, nodose ganglion (NG) and DRG. Each of these tissues showed expression of FFA2 and/or FFA3. This naturally led to the question of whether afferents from the gut innervating DRG/NG were activated by FFA2/3 stimulation – this was the case. It was then entirely rational to assess whether FFA2/3 were functionally expressed in the DRG and NG – this would give information regarding the possibility that there is receptor expression in nerve fibers innervating the gut from the DRG/NG. Through this investigation we present a previously unappreciated profile of FFA2/3 expression and function in neurons associated with the gut.

      Other mediators of this purported axis could also be involved. What is the purpose of this axis? GI motility?

      Answer: Other mediators certainly will play roles and co-ordinate with the actions of the SCFAs. We had shown previously that FFA2 in the gut controls aspects of gut motility and we now show that FFA3 also plays a role. We hypothesised that the roles of SCFAs might also involve control of sensory function. The results provided herein confirm this and dissect the specific roles of each of FFA2 and FFA3.

      If that is the focus, there are technical issues that limit the conclusions that can be drawn.

      Answer: As noted above, the work and the conclusions are not restricted to gut motility but include actions from the gut to the spinal cord.

      1. A strength of the paper is the elegant and rigorous screening strategy and validation of receptor agonists.

      Thank you for this comment: it was indeed a substantial task and component of the work because without rigorous assessment of the selective actions of MOMBA and TUG1907 it would have been impossible to fully dissect the specific contributions of FFA2 and FFA3.

      1. A weakness is that expression of hFFA2-DREADD is induced by use of a whole-body Cre mouse. Given the broad distribution of FFA2, it is likely that this receptor is being activated in multiple tissues when MOMBA is administered. How can the authors be sure that observed effects after administration of agonist in drinking water are due to local expression as opposed to an effect mediated at a distant site removed from the myentery?

      Answer: We can understand where the concern of the reviewer comes from but our aim was to understand the pathway in as close as possible to whole animal and physiologically normal conditions. Hence, we wanted to maintain all elements of FFA2/3 expression whilst investigating the effects of introducing ligands into the gut, which is where the endogenous SCFAs largely originate. So the question we wanted to ask was by introducing FFA2/3 ligands into the gut (thereby mimicking the release of SCFA by the microbiota) what were the physiological consequences of this at levels of the gut, enteric neurons, DRG/NG and spinal cord.

      Reviewer #2 (Public Review):

      Strengths. Barki et al. report extensive and rigorous studies which convincingly establish that FFA2 and FFA3 are functionally expressed in dorsal root ganglia and nodose ganglia where they signal through different G proteins and mechanisms that regulate intracellular calcium concentrations. The authors further demonstrate that activation of both FFA2 and FFA3 within the gut lumen stimulates spinal cord activity and that activation of gut FFA3 directly regulates sensory afferent neuronal firing. These data support the authors' contention that their investigations define a SCFA-gut-brain axis.

      The authors have employed a number of complementary pharmacological, genetic, cell culture, and ex vivo approaches to obtain their data. The use of these diverse methodological approaches is a key strength of the work. They have employed transgenic mice where FFA2 was replaced by an altered form of FFA2 referred to as FFA2-DREADD (Designer Receptor Exclusively Activated by Designer Drugs), which can be activated by novel ligands but not SCFAs. They have further identified through screens of chemical libraries a novel FFA2-DREADD agonist, referred to as MOMBA, for use in their investigations. Using the FFA2-DREADD mice, which are also HA tagged to allow for immunologic detection, and other related transgenic lines, they have been able to establish and identify distinct roles for FFA2 and FFA3 in signaling SCFA production and presence in the gut (specifically the colon) to neuronal pathways that communicate directly with the brain. Using isolated cells, the authors further establish roles for different G proteins and mechanisms that affect cellular calcium levels. Collectively, the data obtained using these diverse experimental approaches support the existence of a SCFA-gut-brain axis.

      The authors' new findings significantly extend understanding of the molecular actions of SCFAs produced in the colon through bacterial fermentation of dietary fiber. Multiple publications have identified altered SCFA levels in the gut as a significant contributor to dysregulated metabolism and metabolic disease. Often such studies fail to provide insight into the molecular basis for the observed linkages between SCFAs and altered metabolic states, only reporting the association. The new data being reported by Barki et al. provide new possibilities for understanding these associations.

      Answer: We thank the reviewer for these very positive comments.

      Weaknesses. The perceived weaknesses of the work are minor compared to the strengths. Although the authors provide a description of the general characteristics of the chemical libraries they screened to identify MOMBA, sparse other information is provided. This is especially true for the second screen of 320 compounds where the data provided indicate a number of compounds may be equally potent agonists. What similarities were there in these structures?

      Answer: We state specifically in the text that compound 132 was 4-methoxy-3-chlorobenzoic acid and compound 235 4-methoxy-3-hydroxy-benzoic acid. These are indeed closely related to MOMBA. We did not test the other hits from the secondary screen for their broader selectivity as none of them were substantially more potent than the MOMBA-related compounds and the others did not provide the close structure-activity relationship of MOMBA, compound 132 and compound 235, so there is no useful information we can provide.

      Were any of these compounds naturally occurring or resemble naturally occurring molecules?

      Answer: No, they were not, although as small molecule carboxylates they can of course be considered to ‘resemble’ certain naturally occurring compounds. For example they ‘resemble’ short chain fatty acids.

      The issue here is one of trying to understand whether endogenous substances exist that might have influenced experimental outcomes and/or data interpretation. This is far from being transparent.

      Answer: We have no evidence to suggest that an endogenous FFA2-DREADD activator exists – in fact our data supports the opposite, that the FFA2-DREADD is a true DREADD receptor i.e. is only activated by a synthetic ligand.

      In the authors' first report of the FFA2-DREADD mice (reference 15), they establish that sorbic acid is an agonist for FFA2-DREADD. Many of the studies being reported in the present manuscript are repetitive of this earlier work but using MOMBA in place of sorbic acid. The authors provide some justification for this, but this reviewer does not find these very compelling. Seemingly, sorbic acid stimulated FFA2-DREADD signaling could have been studied without the need to screen libraries and characterize MOMBA actions. This issue detracts from the overall positive feelings towards the work. Why was the identification of MOMBA needed prior to undertaking the present studies? The authors note greater activity of MOMBA over sorbic acid, but why was this important for establishing FFA2 presence and actions in signaling to the brain?

      Answer: Although this is a good point, one of the important outcomes of this study is that we have generated a novel FFA2-DREADD agonist, and in so doing further validated the FFA2-DREADD model – i.e. it is not dependent on just one ligand:receptor pairing but in fact we are now able to employ two chemically distinct ligands. Interestingly, reviewer 2 found the identification of a novel FFA2-DREADD ligand an important component of the paper.

    1. Author Response:

      Reviewer #2 (Public Review):

      In this manuscript, Markello and colleagues exhaustively characterize the impact and relative importance of the many data-processing decisions that go into constructing whole-brain transcriptomic maps from microarray data in the Allen Human Brain Atlas. The authors motivate the need for and have developed an open-source toolbox, abagen, for standardizing workflows in imaging transcriptomics. The authors propose a taxonomy of analyses commonly performed on these data in the literature; they then use abagen to compute the distributions of statistical outcomes for three prototypical analyses across 750,000 combinatorial choices of end-to-end data-processing pipelines. Informed by these findings, the authors then place into context several specific pipelines reported in recent and influential studies.

      The paper is well-written and the authors are successful in illustrating and attempting to address the need for standardized and systematic research in the burgeoning field of imaging transcriptomics. The abagen toolbox is an important contribution and is to my knowledge the current state-of-the-art. The code is clean, flexible, and very well-documented. The chief weakness of this paper is the lack of clear guidance on best practices. Readers should, however, be sympathetic to the fact that there is currently a lack of ground-truth data against which to benchmark different data-processing pipelines.

      Even after reading the paper thoroughly, it's still not completely clear to me whether the analyses in this study are performed for cortex only, or at the whole-brain level (or bi- or uni-laterally for that matter). I'm assuming this study is cortex-only as you say in the methods that "the brain atlas used in the current manuscript represents only cortical parcels." But abagen supports joint cortical+subcortical atlases too. It'd be helpful to readers to make this explicit.

      To ensure comparability across both the volumetric and surface-based versions of the Desikan-Killiany parcellation examined in our analyses, we investigated bilateral cortical samples (i.e., we omitted samples from the cerebellum, subcortex, and brainstem). We have clarified this in the manuscript (“Materials and Methods” section, “Data” subsection, “Parcellations” subsubsection):

      “To facilitate comparison between volumetric- and surface-based parcellations, samples from the cerebellum, subcortex and brainstem were omitted.”

      Along similar lines, do you expect any of the main findings of this study to change when deriving whole-brain maps?

      We anticipate that examining whole-brain gene expression, rather than just cortical expression as in the current manuscript, would likely strengthen the primary findings of our analyses for several reasons. Primarily, there are known differences in gene expression values between cortical and subcortical / brainstem / cerebellar tissue samples in the AHBA (Arnatkevičiūtė et al., 2019). We expect that differentially normalizing these samples across pipelines would therefore result in greater differences between effect estimates for the three examined analyses. In a similar vein, we expect that the rankings of parameter importance would likely remain stable, especially at the extremes. It is possible that some parameters related to normalization (e.g., normalize matched, normalize structures) may move up in rankings; however, overall, the qualitative interpretation of these results is likely to remain unchanged.

      We have revised the Discussion to highlight this consideration (paragraph #4):

      "Although we only considered cortical tissue samples in the current analyses, we expect that including non-cortical samples would further reinforce these results (Arnatkevičiūtė et al., 2019) as known differences in microarray expression values between cortex and subcortical structures will likely emphasize the impact of different normalization procedures across pipelines."

      Arnatkevičiūtė, A., Fulcher, B. D., & Fornito, A. (2019). A practical guide to linking brain-wide gene expression and neuroimaging data. Neuroimage, 189, 353-367.

      Would it make sense to use PET maps or another type of neuroimaging data as a (pseudo-)benchmark in a future study?

      This is a great question and an area of ongoing research, including in our own group. The few studies that have compared PET data with the AHBA have shown that the spatial correlation between gene expression and receptor density is highly variable, with correspondence strongly dependent on the genes and receptors being considered (Beliveau et al., 2017, Martins et al., 2021). This is likely due to the fact that gene expression (as measured by mRNA) is not equivalent to protein synthesis and that PET tracers vary in their specificity and sensitivity for specific receptors. Our group is currently collating a large sample of PET datasets from multiple tracers to demonstrate this lack of correspondence (work forthcoming; presented by Hansen et al., 2021). Given this, we would be hesitant to suggest such a comparison as a benchmark.

      Comparisons of microarray expression data with the RNAseq data also included in the AHBA (as performed in Arnatkevičiūtė et al., 2019) are also feasible; however, given that some of the pipelines in the current manuscript utilize the RNAseq data to determine probe selection, we felt that using this as a benchmark would be biased. Alternatively, a different dataset (e.g., PsychENCODE) could be used; unfortunately, in these datasets the precise spatial locations of collected samples are uncertain, and for that reason we would also hesitate to use them as a reference.

      Martins, D., Giacomel, A., Williams, S. C., Turkheimer, F. E., Dipasquale, O., Veronese, M., & PET templates working group. (2021). Imaging transcriptomics: Convergent cellular, transcriptomic, and molecular neuroimaging signatures in the healthy adult human brain. bioRxiv.

      Beliveau, V., Ganz, M., Feng, L., Ozenne, B., Højgaard, L., Fisher, P. M., ... & Knudsen, G. M. (2017). A high-resolution in vivo atlas of the human brain's serotonin system. Journal of Neuroscience, 37(1), 120-128.

      Hansen, J. Y., Markello, R. D., Palomero-Gallagher, N., Dagher, A., Misic, B. (2021). Correspondence between gene expression and neurotransmitter receptor and transporter density in the human cortex. In 13th International Symposium of Functional Neuroreceptor Mapping of the Living Brain.

      What about a cross-validation strategy where data are selectively withheld during processing and then predicted after the fact? This may only be possible for a subset of genes and/or pipelines, but it could nonetheless be informative.

      A cross-validation strategy is feasible; however, it will depend on what exactly you are trying to assess. What features are being omitted (i.e., samples or genes) will be strongly influenced by the research question and null hypothesis being tested. For example, when examining the distance-dependent relationship of correlated gene expression, you could leave some tissue samples out and "predict" the fit of these samples (e.g., as in Hansen et al., 2021). As the reviewer suggests, a cross-validation strategy will thus only be possible for some specific research questions, but not generally for entire pipelines.

      One alternative that would be applicable in many cases would be to examine the robustness of the observed effects via a leave-one-donor-out strategy, whereby analyses are repeated six times, omitting one donor each time, to ensure that none of the donors are unduly influencing analytic estimates (Vogel et al., 2020; Arnatkevičiūtė et al., 2019). This may require careful interpretation, however, as different donors contribute variable numbers of samples, and so gene expression estimates will have variable spatial coverage across folds.
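      The leave-one-donor-out idea can be illustrated with a minimal sketch. This is not code from abagen or from the manuscript: the donor labels, the expression values, and the helper `leave_one_donor_out` are all hypothetical, and a simple mean stands in for whatever analytic estimate is of interest.

```python
from statistics import mean


def leave_one_donor_out(samples_by_donor, estimate):
    """Recompute `estimate` once per donor, omitting that donor's samples."""
    results = {}
    for held_out in samples_by_donor:
        pooled = [
            value
            for donor, values in samples_by_donor.items()
            if donor != held_out
            for value in values
        ]
        results[held_out] = estimate(pooled)
    return results


# Hypothetical expression values for one gene in one region.
# Six donors contribute unequal numbers of samples, as in the AHBA.
samples_by_donor = {
    "donor1": [1.0, 1.2, 0.9, 1.1],
    "donor2": [1.1, 1.0],
    "donor3": [0.8, 1.3, 1.0],
    "donor4": [1.2],
    "donor5": [0.9, 1.0, 1.1],
    "donor6": [3.0, 3.2],  # deliberately divergent donor
}

folds = leave_one_donor_out(samples_by_donor, mean)
full = mean(v for values in samples_by_donor.values() for v in values)

for donor, est in folds.items():
    print(f"omit {donor}: estimate={est:.3f}, shift={est - full:+.3f}")

# The donor whose removal shifts the estimate most is the most influential.
most_influential = max(folds, key=lambda d: abs(folds[d] - full))
```

      In this toy example the fold that omits the divergent donor moves the estimate the most, which is the pattern the strategy is meant to expose. Because each fold pools a different number of samples, the size of a shift reflects both how unusual a donor is and how many samples that donor contributes.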

      We have added the following text to the Discussion to expand on these points (paragraph #9):

      "One potential solution to this could be to examine the robustness of pipelines based on a leave-one-donor-out strategy (e.g., Vogel et al., 2020; Arnatkevičiūtė et al., 2019), wherein analyses are repeated six times, omitting one donor each time, to ensure that none of the donors are unduly influencing analytic estimates. This approach is likely to become more useful as data from more individuals becomes available, but at present may be a worthwhile approach for assessing whether chosen processing parameters are appropriate."

      In the discussion, you claim that "the optimal set of processing parameters will very likely vary based on research question." I'd like to see this elaborated on a bit further, at least for the most important parameters. For example, when would it make more sense to use one form of gene normalization over the other? What are the implicit assumptions underlying each choice?

      This is an important aspect of processing for the AHBA data: not only do we believe that the optimal set of processing parameters will vary based on research questions, but which processing parameters are most important may also be influenced.

      Gene normalization is a great example. Some genes have very low expression values whereas others have very high expression, and this variability can influence downstream analysis. For example, consider the distance dependent correlated gene expression (CGE) analysis shown in the manuscript: CGE values derived from non-normalized gene expression values will be high because the correlation will be driven by these differences in expression levels across genes rather than common patterns of expression. Normalizing expression values will therefore result in CGE values being more broadly distributed and better capturing shared spatial expression patterns.
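      A toy calculation makes this concrete. The sketch below is illustrative only: the region-by-gene values are invented, z-scoring each gene across regions stands in for gene normalization, and the helpers `pearson` and `zscore_genes` are hypothetical names rather than part of any toolbox.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def zscore_genes(expr):
    """Z-score each gene (column) across regions (rows)."""
    columns = list(zip(*expr))
    scaled = [[(v - mean(col)) / pstdev(col) for v in col] for col in columns]
    return [list(row) for row in zip(*scaled)]


# Hypothetical data: five genes with very different baseline expression levels.
baseline = [2.0, 5.0, 8.0, 11.0, 14.0]
# Small regional deviations; regions C and D mirror regions A and B.
deviations = [
    [0.5, 0.4, 0.6, 0.5, 0.4],       # region A
    [0.4, 0.5, 0.5, 0.6, 0.5],       # region B
    [-0.5, -0.4, -0.6, -0.5, -0.4],  # region C
    [-0.4, -0.5, -0.5, -0.6, -0.5],  # region D
]
expr = [[b + d for b, d in zip(baseline, dev)] for dev in deviations]

# Correlated gene expression between regions A and C, before and after normalization.
raw_ac = pearson(expr[0], expr[2])
norm = zscore_genes(expr)
norm_ac = pearson(norm[0], norm[2])

print(f"raw CGE (A, C):        {raw_ac:.3f}")
print(f"normalized CGE (A, C): {norm_ac:.3f}")
```

      Without normalization, regions A and C appear almost perfectly correlated because every region shares the same ordering of gene baselines. After each gene is z-scored across regions, the correlation reflects the opposing regional deviations instead.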

      More generally, gene-expression values in the AHBA are imperfect; it is an open problem in transcriptomics to obtain measures of expression that are comparable across genes. Throughout the literature, research has shown that the binding strength of in situ hybridization depends on properties of the RNA sequence used in the binding process, making it difficult to compare "raw" values across different genes. As such, gene normalization allows for a more fair comparison of expression patterns across probes.

      However, even if we were able to obtain perfect measurements that were comparable across genes, there are contexts where researchers may want to retain the variance contributed by genes to accurately reflect their relative expression levels. For example, since many genes measured in the AHBA are not brain-specific, normalization will amplify their noisy expression patterns, potentially obscuring more relevant expression information. This can be avoided by sub-selecting genes in a hypothesis-driven manner, but, as before, this will depend on the research question.

      Within the forms of gene normalization examined (i.e., z-scoring, scaled robust sigmoid normalization), we believe that scaled robust sigmoid is the optimal choice as it is less sensitive to outliers, which are known to exist in the imaging microarray-based transcriptomics data (Fulcher et al., 2019; Arnatkevičiūtė et al., 2019).
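      The difference in outlier sensitivity can be sketched as follows. The form of the scaled robust sigmoid used here (a sigmoid centred on the median and scaled by IQR/1.35, then rescaled to the unit interval) is our reading of the cited literature and should be treated as an assumption to check against the abagen documentation; the expression values are invented, and the z-scores are rescaled to the unit interval only so that the two outputs are comparable.

```python
from math import exp
from statistics import mean, median, pstdev, quantiles


def rescale_unit(values):
    """Linearly rescale values to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def scaled_robust_sigmoid(values):
    """Sigmoid centred on the median with IQR-based scale, rescaled to [0, 1].

    The constant 1.35 is assumed here; it makes IQR/1.35 approximate the
    standard deviation for normally distributed data.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    scale = (q3 - q1) / 1.35
    med = median(values)
    return rescale_unit([1 / (1 + exp(-(v - med) / scale)) for v in values])


def scaled_zscore(values):
    """Z-score, rescaled to [0, 1] for comparison."""
    mu, sd = mean(values), pstdev(values)
    return rescale_unit([(v - mu) / sd for v in values])


# Hypothetical expression of one gene across nine regions; the last is an outlier.
expression = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4, 50.0]

srs = scaled_robust_sigmoid(expression)
z = scaled_zscore(expression)

# Share of the unit interval occupied by the eight non-outlier regions.
print(f"scaled robust sigmoid: {max(srs[:-1]):.3f}")
print(f"rescaled z-score:      {max(z[:-1]):.3f}")
```

      With the z-score, the single outlier dominates the mean and standard deviation, so the remaining regions are squeezed into a small fraction of the range. The median and interquartile range are barely affected by the outlier, so the sigmoid-based scores keep the non-outlier regions well separated.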

      We have added text to the Discussion to expand on these points (paragraph #9):

      “For instance, in most applications gene normalization is appropriate, as it ensures that downstream analyses are not driven by a small subset of highly expressed genes. However, in other applications it may be desirable to retain the variance contributed by genes to accurately reflect their relative expression levels. For example, many genes in AHBA are not brain-specific, so normalization will amplify their expression patterns, potentially obscuring more relevant expression information. This can be avoided by sub-selecting genes in a hypothesis-driven manner and skipping the normalization step altogether.”

      Is there anything to be said about the order of operations? There seem to be several steps in Table 1 which could conceivably be interchanged. If nothing else, this procedural ambiguity is yet another good reason to standardize workflows.

      We believe that the importance of processing order is strongly dependent on which processing steps are being considered. For example, intensity-based filtering of probes must always be performed before probe selection—reversing the order of these operations would, in the majority of cases, be problematic because it would potentially result in the selection of noisy probes to be carried through to analysis. However, the order of other steps (i.e., sample versus gene normalization) could arguably be reversed with no ostensible detriment. We agree with the reviewer that this ambiguity is a good reason to standardize these workflows, and believe that the order of operations implemented in abagen and described in the manuscript is a principled solution to this problem.

      We have added text to the Discussion to clarify this point (paragraph #5):

      “Note that there are some processing steps that should be performed in a specific sequence, and others whose order could potentially be interchanged. For example, intensity-based filtering of probes must always be performed before probe selection—reversing the order of these operations would, in the majority of cases, be problematic because it would potentially result in the selection of noisy probes to be carried through to analysis. However, the order of other steps (e.g., sample versus gene normalization) could arguably be reversed with no ostensible detriment. This procedural ambiguity is a salient example of the need to standardize workflows.”

      I particularly liked the analysis in Figure 2A and thought it made a nice contribution to the paper.

      We appreciate the reviewer's kind words, especially given their extensive foundational work in this field.

      Reviewer #3 (Public Review):

      The work Standardizing workflows in imaging transcriptomics with the Abagen toolbox is a major meta analysis pipeline workflow for comparing and integrating parameter choices in imaging transcriptomics using the Allen Human Brain Atlas (AHBA). The release of the AHBA has strongly increased the interest in determining transcriptomic associations in brain imaging studies, yet there is much variability in the analysis, methods used, and subsequent interpretation.

      This work is illustrative of an important trend in informatics analysis allowing strong metadata control by users so as to access and implement optimal choices of parameters and to study their distribution. The work, implemented as an open source Python toolkit, is likely to be of importance to analysts working in these areas.

      It would be helpful to clarify and specifically define the term pipeline as a specific set of parameter, normalization, and other choices that are selected. Whereas this term is in common use in the field, in the present work the meaning is specific to a set of selectable options. Of course, any number of such variable selections could be implemented in the Abagen toolbox, but it will help for clarity to define this term more clearly up front.

      We have added text to the Results clarifying what we mean when we refer to "pipeline" (“Results” section):

      “We refer to each unique set of processing choices and parameters as a ‘pipeline’.”

      Similarly, we have added text to the Methods to clarify this as well:

      “Each unique set of these 17 processing choices and parameters constitutes a pipeline, yielding 746,946 unique pipelines.”

      My major concerns about this work are two issues. The first is how to characterize and summarize the results of pipeline output produced by Abagen. The manuscript illustrates the workflows and various means of summarizing results but does not offer guidance on the preferred interpretation or relative value of the results. Whereas we may argue that the primary purpose of Abagen is to run the various pipelines, leaving downstream interpretation to the user, it would be helpful to understand how the Abagen toolbox organizes, summarizes, and sets these output options up for interpretation. This appears to be only weakly addressed in the present manuscript.

      The primary output of abagen is a single brain region x gene expression matrix based on a researcher-specified atlas. We believe this is the simplest and most fundamental output object of the AHBA that can facilitate a range of analyses, including those we examined in the paper (i.e., correlated gene expression, gene co-expression, and regional gene expression or gene-of-interest analyses).

      In the manuscript, we examined the outputs of various pipelines only to highlight the potential variability of results as a function of parameter selection; however, in most use cases, we would recommend that researchers only use abagen to run a single pipeline, yielding one brain region x gene expression matrix that they can carry forward to their desired analyses. Selecting different parameters when using abagen will modify the shape or values of this matrix, but not the structure.

      To clarify this we have added the following text to the Results (section "Standardized processing and reporting with the abagen toolbox"):

      "The main output of abagen is a single brain region (or tissue sample) x gene expression matrix. Changing the parameters may modify the shape of the matrix (e.g., different atlases will yield different numbers of regions or samples) or different values (e.g., different processing choices may yield different numbers of genes), but not the structure."

      The second point of importance, I believe, is more description of the available functionality in the toolbox, perhaps as more of a specific use case analysis. The authors provide substantial documentation on installing and working with Abagen, but some more direct indication of how the toolkit would be used would be valuable.

      We agree that it is important to clearly lay out the functionality of the toolbox in the manuscript. We have modified the following paragraph in the Results (Standardized processing and reporting with the abagen toolbox) to elaborate on the tools made available to researchers in abagen:

      “The abagen toolbox supports two use-case driven workflows: (1) a workflow that accepts an atlas and returns a parcellated, preprocessed regional gene expression matrix (Fig. 4a); and, (2) a workflow that accepts a mask and returns preprocessed expression data for all tissue samples within the mask (Fig. 4b). Workflows can be called via a single line of code from either the command line or Python terminal, and take approximately one minute to run with default settings using the Desikan-Killiany atlas. The main output of abagen is a single brain region (or tissue sample) x gene expression matrix. Changing the parameters may modify the shape of the matrix (e.g., different atlases will yield different numbers of regions or samples) or different values (e.g., different processing choices may yield different numbers of genes), but not the structure. The outputs of these workflows can be used generally to examine the three prototypical research questions enabled by the AHBA: correlated gene expression, gene co-expression, and regional expression of genes of interest more broadly (Fornito et al., 2019, Trends Cogn Sci). Beyond its primary workflows, abagen has additional functionality for post-processing the AHBA data (e.g., removing distance-dependent effects from expression data, calculating differential stability estimates; Hawrylycz et al., 2015, Nat Neuro), and for accessing data from the companion Allen Mouse Brain Atlas (e.g., providing interfaces for querying the Allen Mouse API; https://mouse.brain-map.org/; Lein et al., 2007, Nature).”

      As we envision the abagen software continuing to develop in the coming years, we have purposefully omitted code examples from the current manuscript, as the API is liable to change over time. To ensure that these examples stay up-to-date with the abagen API, we only include code in the online abagen documentation (https://abagen.readthedocs.io; citable via Zenodo; https://zenodo.org/record/3726257), which can be continuously updated along with the software package.
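      To make the shape of the output described above concrete without relying on any abagen call, the following is a minimal sketch on synthetic data. It builds a made-up region x gene matrix and derives two of the quantities named in the paragraph: correlated gene expression (region x region) and gene co-expression (gene x gene). The matrix sizes, the random values, and the helper names `pearson` and `correlation_matrix` are illustrative assumptions; none of them comes from abagen or from the manuscript.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(vectors):
    """All pairwise Pearson correlations between the given vectors."""
    return [[pearson(u, v) for v in vectors] for u in vectors]


# Synthetic stand-in for a region x gene expression matrix (assumed sizes).
rng = random.Random(0)
n_regions, n_genes = 5, 8
expression = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)]
              for _ in range(n_regions)]

# Correlated gene expression: compare regions across genes (rows).
region_by_region = correlation_matrix(expression)

# Gene co-expression: compare genes across regions (columns).
genes = [list(column) for column in zip(*expression)]
gene_by_gene = correlation_matrix(genes)

print(len(region_by_region), "x", len(region_by_region[0]))
print(len(gene_by_gene), "x", len(gene_by_gene[0]))
```

      Changing `n_regions` or `n_genes` changes the shape of both derived matrices but not their structure, which mirrors the point made in the quoted paragraph. For real data, the online abagen documentation remains the reference for the actual workflow calls.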

    1. Author Response:

      Reviewer #1:

      This study largely confirms prior observations, and the strength of the study is in its comprehensive nature rather than in shedding new insight into the effects of either FH or SDH loss. Nevertheless, there are some somewhat unexpected observations including a defect in proline synthesis, and changes in glutathione and NADPH metabolism that are interesting and incompletely explained. Some suggestions to strengthen the study include:

      1) Exactly how each perturbation affects cell proliferation is not clear. This should be considered, as some of the differences may result from changes in growth or proliferation rate, which will affect how they normalize their data.

      We thank the referee for raising a critical point and allowing us to clarify how we normalise metabolomics experiments. All the metabolomics data take cell number into consideration. Indeed, prior to metabolite extraction, cells are counted using a separate counting plate prepared in parallel and treated exactly like the experimental plate. In this way, differences in cell number are accounted for and the efficiency of metabolic extraction is preserved. Consumption release (CoRe) metabolomics also takes the proliferation rate into consideration during normalisation, since we normalise data to the final cell number. We have expanded the description in the relevant sections in the Methods.
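      The normalisation described above amounts to scaling each metabolite signal by the cell count from the parallel counting plate. A minimal sketch follows, assuming peak areas as the raw signal and a reference of one million cells. The function name `normalise_to_cell_number`, the `per_cells` parameter, and all numbers are hypothetical and are not taken from the Methods.

```python
def normalise_to_cell_number(peak_areas, cell_count, per_cells=1e6):
    """Scale raw metabolite peak areas to a fixed number of cells.

    peak_areas: dict mapping metabolite name -> raw peak area
    cell_count: cells counted on the parallel counting plate
    per_cells:  reference cell number for reporting (assumed 1e6)
    """
    if cell_count <= 0:
        raise ValueError("cell_count must be positive")
    scale = per_cells / cell_count
    return {name: area * scale for name, area in peak_areas.items()}


# Hypothetical example: the slower-growing condition yields half the
# cells and half the raw signal, so the per-cell abundance is unchanged.
control = normalise_to_cell_number({"fumarate": 2.0e6}, cell_count=1.0e6)
knockout = normalise_to_cell_number({"fumarate": 1.0e6}, cell_count=5.0e5)
print(control["fumarate"], knockout["fumarate"])
```

      Without this step, a difference in proliferation alone would appear as a difference in metabolite abundance, which is the concern the referee raised.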

      2) It is unclear why FH loss is different than SDH loss, and it is also somewhat surprising that the effects of acute and chronic loss of either enzyme are not that different. While explaining this is too much to ask, some additional speculation might be warranted.

      We postulate that FH loss is different to SDH loss for several reasons:

      A. FH is localized to mitochondria (specifically in the mitochondrial matrix) and the cytosol (cytFH). cytFH can translocate to the nucleus to regulate the DNA damage response (PMID: 26237645). In contrast, the SDH complex is only localized to mitochondria. As such, a loss of FH function is likely to have mitochondrial and extramitochondrial consequences.

      B. SDH is also the only TCA cycle enzyme that's physically associated with the electron transport chain (ETC) and tethered to the inner mitochondrial membrane, where it also regulates the ubiquinone pool. The differing distribution of the two enzymes within mitochondria is likely to influence their impact on mitochondrial bioenergetics and on the overall metabolic profile.

      C. A major difference between FH and SDH loss is the accumulation of fumarate. As discussed in the manuscript, fumarate is a mildly electrophilic metabolite that can succinate GSH and protein cysteine residues to form a post-translational modification termed succination. Fumarate-mediated succination is known to impair iron-sulphur cluster metabolism and perturb aconitase and Complex I function (PMID: 29069586). This is just one example of how succination can affect cellular function. In contrast, SDH loss results in a decrease of fumarate and an accumulation of succinate. Unlike fumarate, succinate is not an electrophilic compound that can modify cysteine residues, and so differences between FH and SDH loss are likely owed to succination, at least in part.

      D. While it hasn't been investigated in this study, succinate released from cells can bind to the succinate receptor SUCNR1, which is expressed in the kidney (PMID: 21803970). Autocrine and paracrine ligation of SUCNR1 by high levels of accumulated and released succinate is likely to alter the metabolic and transcriptional landscape of the cells.

      Based on these observations, we also argue that the effects of acute and chronic enzyme inhibition are expected to differ in how the key metabolic and signalling hallmarks of FH and SDH loss develop and interact with each other over time. This hypothesis is part of work currently under way in our laboratory. For example, chronic SDH loss led to a significant increase in 20 metabolites; however, acute SDH inhibition with TTFA and AA5 led to an increase in 60 and 50 metabolites, respectively. Chronic FH loss also led to a significant increase in 92 metabolites, whereas acute FH inhibition led to a significant increase in 49. The fact that only 2 metabolites overlap between all conditions indicates apparent differences between the loss of both enzymes on the metabolome and whether the loss is acute or chronic in nature. There are also notable differences between chronic FH loss and acute FH inhibition in relation to reductive carboxylation. Chronic FH loss triggers a higher accumulation of fumarate and succination of aconitase that impairs reductive carboxylation (PMID: 21849978); however, acute inhibition facilitates reductive carboxylation (Figure S2), likely due to lower levels of succination given the acute treatment. As such, we feel there are notable differences in the metabolite profiles and rewiring events associated with acute versus chronic enzyme inhibition. We have discussed these important points in the discussion section of the manuscript.
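      The overlap statement above is a set intersection across conditions. The sketch below shows that comparison under stated assumptions: the condition labels follow the text, but the metabolite names (`met_A`, `met_B`, and so on) are placeholders rather than the study's hits, and the helper names are hypothetical.

```python
def shared_across_all(increased_by_condition):
    """Metabolites significantly increased in every condition."""
    return set.intersection(*increased_by_condition.values())


def unique_to_each(increased_by_condition):
    """Metabolites increased in exactly one condition, per condition."""
    unique = {}
    for condition, metabolites in increased_by_condition.items():
        others = set()
        for other, other_metabolites in increased_by_condition.items():
            if other != condition:
                others |= other_metabolites
        unique[condition] = metabolites - others
    return unique


# Placeholder sets; metabolite names are not from the study.
increased = {
    "chronic_SDH_loss": {"met_A", "met_B", "met_C"},
    "acute_SDH_TTFA": {"met_A", "met_B", "met_D", "met_E"},
    "acute_SDH_AA5": {"met_A", "met_B", "met_E"},
    "chronic_FH_loss": {"met_A", "met_B", "met_F"},
    "acute_FH_inhibition": {"met_A", "met_B", "met_G"},
}

shared = shared_across_all(increased)
unique = unique_to_each(increased)
print(sorted(shared))
```

      A small shared set alongside large condition-specific sets is the pattern the response describes for acute versus chronic loss of each enzyme.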

      3) The increase in glutathione and GSSG is interpreted as a consequence of increased oxidative stress, but that will not necessarily affect total levels.

      We agree that oxidative stress will not necessarily affect total glutathione levels and this finding is likely a time-dependent phenomenon due to persistent redox signalling. In this instance, the alterations in total glutathione levels are likely linked to transcriptional and post-transcriptional changes in GSH biosynthetic enzymes and the observed metabolic reprogramming. While ATF4 regulates the glutathione redox state and glutathione levels, it's not entirely clear if it is solely responsible for increasing total glutathione levels with TCA cycle inhibition. One possibility is that there is simultaneous activation of the transcription factor NRF2, which is a crucial regulator of glutathione synthesis and is known to be regulated by FH loss and succination (PMID: 22014567) and reactive oxygen species (ROS). ATF4 and NRF2 may cooperate to transactivate glutathione-related metabolic enzymes upon TCA cycle inhibition, as previously reported in other contexts (PMID: 23618921). Further investigation of the crosstalk between these two transcription factors is warranted in this context.

      4) The text in the legends of the Figure S1 PCA plots is too small to read. This should be corrected.

      We thank you for this note and apologize for this oversight. We have now corrected the figure legends for the PCA plots.

    1. Author Response:

      We have now revised the manuscript to address the helpful comments and criticisms from the reviewers. The revised manuscript includes additional experiments demonstrating that inclusion of Csn2/Cas9 in the in vitro assays does not suppress the disintegration activity of Cas1-Cas2 to favor integration. These additional factors do not confer strand selectivity on integration either. Furthermore, the results of integration reactions using substrates mimicking PAM-containing pre-spacers have also been added.

      New figures and figure modifications at a glance:

      1) The new Figure 2 shows Cas1-Cas2 reactions in a linear target site and the effects of Csn2 and/or Cas9 on proto-spacer insertion into this target (Reviewer 1).

      The original Figure 2 (with slight modifications) is now moved to ‘Supplementary Data’ as Figure 2-figure supplement 2, and shows proto-spacer insertion by Cas1-Cas2 into a nicked linear target site (Reviewer 2). Figure 2 is the only one in the main set of figures that has been extensively modified.

      2) The new Figure 2-figure supplement 1 (under ‘Supplementary Data’) shows the effects of Csn2, Cas9 or both on proto-spacer integration-disintegration by Cas1-Cas2 when the target site is present in a supercoiled plasmid (Reviewer 1).

      3) The new Figure 4-figure supplement 1 lists the sequences of the full- and half-target sites used for the reactions shown in Figure 4 (Reviewer 2).

      4) The new Figure 2-figure supplement 3 shows the insertion properties of PAM-containing pre-spacer mimics in reactions with Cas1-Cas2 alone or supplemented with Csn2, Cas9 or both (Reviewer 1).

      5) The new Figure 6-figure supplement 1 gives a structural perspective of the trombone substrates used for the reactions shown in Figure 6B, C (Reviewer 1).

      6) The original Supplementary Figure S8 showing assays for PAM-specific cleavage by Cas1-Cas2 has been removed (Reviewer 1).

      7) There are no changes in the other figures under ‘Supplementary Data’, although several have new numbers consistent with the revisions made.

      Public Review (Reviewers #1 and #2):

      The present work is a critical extension of the in vitro biochemical activities of the Cas1-Cas2 complex described by Wright and Doudna (Nat Struct Mol Biol, 2016; 23: 876-883). We have kept all experimental conditions nearly identical to those used by these authors to make the results from the two studies directly comparable. Importantly, we now show that the prior model for proto-spacer integration into the CRISPR locus by Cas1-Cas2 is an oversimplification of a much more nuanced mechanism.

      While both reviewers recognize the importance of our findings in challenging the current thinking on the adaptation mechanism of CRISPR immunity, they express reservations as to whether the in vitro results recapitulate the in vivo mechanism of spacer acquisition. This seems to us to be too broad a criticism from which few (if any) biochemical experiments can be immune.

      Our key finding is that disintegration during the second step of proto-spacer integration generates a DNA structure that has all the hallmarks of a DNA damage intermediate that the bacterial repair machinery can readily process into an authentic integration product. We invoke no new or ad hoc mechanisms, and the model we propose fits neatly into the DNA gap-filling mechanisms known to operate in DNA transposition pathways.

      The proto-spacer is functionally a ‘micro-transposon’, whose shortness imposes severe torsional strain on the transposition intermediate that precedes the final integration product. In vitro experiments suggest that transcription is potentially capable of resolving this intermediate (Budhathoki et al., Nat Struct Mol Biol, 2020, 27: 489-99). In principle, replication can also accomplish this task. Our study now demonstrates that simply nicking the DNA (disintegration) is an equally effective solution for relieving the topological stress accompanying integration. DNA loose ends can then be readily tied up by the bacterial repair machinery.

      We concur with the concluding sentence of reviewer 2, “The simple conclusion that Cas1-Cas2 catalyzed hydrolysis of a phosphodiester may relieve strain and allow productive transposition to occur doesn’t get emphasized enough in my opinion.” We have now expanded on this point in the revised ‘Discussion’.

      Reviewer #1:

      In addition, the in vitro system used here is only partially reconstituted. The substrates lack a PAM sequence, which is necessary for protospacers to be incorporated in the correct orientation and may help direct the first integration event to the L-R junction. Presumably because of this all the reactions presented do not analyze the orientation of the incorporated prespacer sequence. Cas9 and Csn2 are also absent (as are other potentially required host factors), which are necessary for correct integration in vivo.

      1A. Strand specificity: The in vitro integration reactions with the Cas1-Cas2 complex were done using a protospacer of the optimal size (26 nt on each strand with the four 3’-proximal bases on each strand as unpaired). Either proto-spacer strand is equally competent to initiate the strand transfer reaction, as could be inferred from Figure 3 of the original submission. Here, reactions utilized modified proto-spacers that differed in their top and bottom strand lengths. They gave two insertion products (IP) each at the L-R (leader-repeat) and R-S (repeat-spacer) junctions of a normal target site. In modified targets in which integration was limited to just the L-R junction, two insertion products were formed. One panel of Figure 3 (which is retained in the revised manuscript) showing the four insertion products from the normal target (lane 10) and two from the modified targets (lanes 11-13) for a protospacer with 26 nt and 31 nt long strands is displayed below.

      The ability of either proto-spacer strand to initiate integration is now more directly shown in Figure 2 (new) of the revised manuscript. Here the labeled top or bottom strand of the proto-spacer (PS) gave insertion products (IP) at the L-R and R-S junctions of the target site. Panel B of Figure 2 (pasted below) demonstrates this result.

      1B. Cas9, Csn2 included reactions: The data for reactions containing Csn2 or Cas9 or both were not shown previously, as they did not alter Cas1-Cas2 activity by promoting strand specificity of integration or suppressing disintegration. These results are now shown in the revised Figure 2 (linear target) and the new Figure 2-figure supplement 1 (supercoiled target). Portions of these figures are shown below.

      The relevant revised text describing the lack of strand specificity to proto-spacer integration by Cas1-Cas2 and the Csn2/Cas9 effects on integration is pasted below.

      Page 15, lines 229-235.

      "Unlike orientation-specific proto-spacer integration in vivo, Cas1-Cas2 reactions in vitro showed no strand-specificity (Figure 2B). This bias-free insertion of the top or bottom strand from the proto-spacer was unchanged by the addition of Csn2 or Cas9 or both to the reactions (Figure 2C-E). These proteins, singly or in combination, also failed to stabilize proto-spacer integrations in the supercoiled plasmid target (Figure 2-figure supplement 1). Instead, they inhibited plasmid relaxation. Inhibition could occur at the level of integration per se or strand rotation during integration-disintegration."

      1C. PAM-containing substrates: We have now tested Cas1-Cas2 activity (with and without added Csn2 or Cas9 or both) on PAM-containing substrates that mimic ‘pre-spacers’, Figure 2-figure supplement 3 (new).

      In these substrates, a proto-spacer strand of the standard length (26 nt; lacking PAM or its complement) is inserted at the L-R junction with higher efficiency than the longer strand (containing PAM or its complement). Following the first integration at L-R, the pre-spacer mimics containing > 26 nt in one strand or both strands are inhibited in the second strand transfer to the R-S junction. A portion of Figure 2-figure supplement 3 illustrating these points is shown below.

      The revised ‘Results’ section has the following added description of the activities of PAM-containing pre-spacer mimics.

      Pages 16-19, lines 265-297. Cas1-Cas2 activity on pre-spacer mimics carrying the PAM sequence

      "The strand cleavage and strand transfer steps of proto-spacer insertion at the CRISPR locus must engender safeguards against self-targeting of the inserted spacer as well as its non-functional orientation. However, no strand selectivity is seen in the in vitro Cas1-Cas2 reactions with already processed proto-spacers lacking the PAM sequence (Figures 2 and 3). By coordinating PAM-specific cleavage of a pre-spacer with transfer of this cleaved strand to the L-R junction, the inserted spacer will be in the correct orientation to generate a functional crRNA. To examine this possibility, we tested the integration characteristics of pre-spacer mimics containing the PAM sequence.

      The inclusion of PAM or PAM and its complement in the integration substrates (Figure 2-figure supplement 3A) did not confer strand specificity on reactions with Cas1-Cas2 alone or with added Csn2, Cas9 or both (Figure 2-figure supplement 3B-E). Optimal integration by Cas1-Cas2 occurred with the 26 nt strands of the native protospacer with their 4 nt 3’-overhangs (Figure 2-figure supplement 3B-E; lanes 2). The pre-spacer mimics containing one or both > 26 nt strands had reduced integration competence (Figure 2-figure supplement 3B-E; lanes 4). Even here, the 26 nt strand with the 4 nt overhang (Figure 2-figure supplement 3C; lane 4) was preferred in integration over the longer 29 nt PAM-containing strand (Figure 2-figure supplement 3D; lane 4) or the 33 nt PAM complement-containing strand (Figure 2-figure supplement 3E; lane 4). In contrast to the processed proto-spacer that gave nearly equal integration at L-R and R-S, IP(L-R) ≈ IP(R-S) (Figure 2-figure supplement 3B-E; lanes 2), the longer pre-spacer mimics were inhibited in integration at R-S, IP(L-R) > IP(R-S) (Figure 2-figure supplement 3B-E; lanes 4). This is the expected outcome if the initial strand transfer occurs at L-R, and a ruler-like mechanism orients the reactive 3’-hydroxyl for the second strand transfer at R-S. This sequential two-step scheme for proto-spacer integration is consistent with the results shown in Figure 3 as well. These reaction features were not modulated by Csn2 or Cas9 (Figure 2-figure supplement 3B-E; lanes 6 and 8), although Csn2 plus Cas9 was inhibitory (Figure 2-figure supplement 3B-E; lanes 10).

      There is no evidence for integration accompanying PAM-specific cleavage in our in vitro reactions. In the E. coli CRISPR system, Cas1-Cas2 is apparently sufficient for PAM-specific cleavage in vitro (22). By contrast, in the S. pyogenes system, cleavage is attributed to Cas9 or as yet uncharacterized bacterial nuclease(s) (35). The mechanism for generating an integration-proficient and orientation-specific proto-spacer, which may not be conserved among CRISPR systems, is poorly understood at this time."

    1. Author Response:

      Reviewer #1 (Public Review):

      1. The authors use a surrogate marker for "slow" vs "fast" MNs: immediate vs delayed firing in response to rectangular current injection. They switch their language to call these motoneurons slow and fast, but they should be more cautious about doing so, given that firing is a surrogate marker that has not been fully studied and characterised across this time period. We do not know if developmental changes in Kv1 might also contribute to the effects on spiking seen here. I agree, there is delayed firing from the outset, which is interesting in itself given the rather homogenous other properties. But that doesn't mean that Kv1 expression is stable and thus non-contributory to the changes in firing described.

      As suggested in the editorial summary, we have changed all reference to fast and slow motoneurons to delayed and immediate firing motoneurons up until the discussion. Furthermore, we have expanded our discussion highlighting potential roles for Kv1 in shaping motoneuron rheobase across development as an interesting direction for future study. Page 19, Lines 630-639.

      1. The focus of the manuscript is on MN recruitment, but recruitment is never defined, despite being used in the title, through the abstract and key points, as well as throughout the manuscript. What they are looking at is response to current injected at the soma vs recruitment during a behaviour when synaptic inputs are bombarding an extensive dendritic tree. Thus, this manuscript does not look at recruitment per se, but rather activation of action potentials in response to intra-somatic stimulation. Accordingly, the term "recruitment" would be best kept to the Discussion.

      We have changed the wording in the title from ‘recruitment’ to ‘active properties’, changed references to recruitment current to rheobase in the results section, and re-addressed recruitment in the discussion. Further, we have highlighted the importance of where synaptic inputs terminate on motoneurons and the implications for recruitment, with emphasis placed on the compartmental localization between synaptic terminals and compartmental clustering of voltage-sensitive ion channels on Page 21, Lines 688-692.

      I also note that the "size" principle relates more to electrical than physical size in the 21st century; I agree that the two are correlated, but the authors may not want to stick to arguments about physical size.

      We have tried to clarify when we are referring to electrical or physical size within the revised text.

      1. Figure 6 is problematic, possibly in the way it's presented. It seems to me, but is not clear, that the authors suggest that Ih is active at rest, more so in "F" than "S," and that therefore "F" are more depolarised and have smaller mAHPs. So (a) how come RMP didn't seem to come out in the PC analysis earlier? (b) would that not suggest that "F" would be "recruited" earlier? (Or is it that there is reduced sodium channel availability because of the depolarisation - what are the differences in spike amplitude and rate of rise?) (c) shouldn't the mAHP be larger if the cell is more depolarised and further from the potassium equilibrium potential? On the last note, maybe it's that Ih makes the mAHP smaller, but with the kinetics of Ih, wouldn't the decay be faster, but in Fig 6 the decay seems faster when Ih is blocked? Finally, Fig 6E suggests changes in the fAHP and delayed depolarisation (and spike width?) - how do these come into the picture? If the fAHP is thought to result from a high conductance state, and if therefore one were to align the voltages based on this potential, then the mAHPs would be about the same amplitude? The authors likely have explanations, but I'm afraid that I can't follow them.

      We agree that some aspects of Figure 6 were not ideal, particularly those related to RMP and mAHP. For example, we had attempted to utilise measures of mAHP to help provide evidence that Ih was active at rest. However, upon reflection, the data provided in the original manuscript did not achieve this as clearly as the new data we have added, where we now measure Ih at resting membrane potential. Furthermore, the subset of data utilised in the previous Figure 6B misleadingly suggested a slightly more depolarised RMP in delayed MNs. However, as noted in supplementary table 1, delayed firing MNs in fact have a more hyperpolarised RMP compared to immediate firing MNs. We have therefore removed some of these problematic aspects of the figure that lacked clarity and have become less important towards the main points we were attempting to make, given the new data we have added. The simple explanation for RMP not coming out as a strong contributor in the PCA is likely because this parameter is not a prime contributor of variance across MN type or through development. We did not find any difference in spike amplitude or rise time between subtypes (these data are summarized in Supplementary File 1). As is common practice, we injected bias current to measure parameters included in the overall PCA from approximately -60mV. However, in hindsight, this could have reduced potential effects of RMP on some of the properties related to activation.

      1. A number of labs have looked at development of MN properties, and it would be useful to compare properties seen across different labs, for example Quinlan (e.g. PMID: 21486770) and Whelan (e.g. PMID: 20457856, which is only mentioned in the manuscript) (and for that matter, mine - e.g. PMID 10564356, PMID: 32851667 although I don't want to self-promote).

      We have highlighted the consistencies between our results and those reported by others, including Whelan, Heckman, Zytnicki, and Brownstone Labs. This statement can be found on page 3, Lines 104-106.

      We have also included a comparison table in the supplemental information (Supplementary File 3) and referenced this table in the discussion on Line 528.

      1. In the Discussion, the authors might want to discuss propensity of F vs S MNs to express PICs / sustained firing as described in the Heckman lab (albeit particularly in cat; see PMID 9705452, for e.g.). How do these data correspond?

      It is interesting to note that the properties of PICs that we measured (i.e., onset and amplitude) in immediate and delayed firing motoneurons are similar to those of fully and partially bistable motoneurons described by Lee and Heckman. In particular, we find that higher input resistance, immediate firing motoneurons have smaller PIC amplitude than delayed firing motoneurons but their PICs are activated at more hyperpolarized membrane potentials. This is consistent with fully bistable motoneurons, which are higher input resistance, and have smaller PICs that are activated at more hyperpolarized voltages compared to the partially bistable motoneurons. While we did not see bistability in the samples of motoneurons that we studied, a key difference is that we studied intrinsic properties in the absence of neuromodulation, which is a key factor for promoting bistability in motoneurons. Modulation of PICs might contribute to this propensity for bistability; however, modulation of outward currents is also very likely. We have highlighted these similarities and differences in our discussion on PICs, which can be found on Page 19, Lines 591-596.

      Reviewer #4 (Public Review):

      1) During the 1st postnatal week, the authors suggest that fast and slow MNs can be distinguished neither by their passive properties nor by their rheobase and therefore their recruitment is mainly based on the size. The conclusion that the recruitment is not linked to MN functional differences is difficult to follow since the main distinction between MN subtypes is based upon the presence of a delayed firing, an active property that regulates the recruitment of MNs (Leroy et al., 2014). However, the square current pulse adopted to discriminate between delayed and immediate firing in Figure 1 was replaced with a ramped depolarization protocol on which the authors measured the rheobase (Figure 3A1). This suggests that the slow depolarization in immature motoneurons might minimize the activation of ionic conductance(s) responsible for the delayed firing and thus may bias the measure of the rheobase (the minimal current amplitude of infinite duration). In line with this, the recruitment of a motoneuron has been shown to depend on the rate of membrane potential depolarization preceding a spike (Krawitz et al., 2001). Rather than using a slow ramp depolarization, it therefore seems more appropriate to assess MN rheobase with the current pulse protocol used to distinguish between MN subtypes. With this kind of measure, differences in the excitability of MN subtypes related to active conductances may come out earlier during development.

      The reviewer raises an interesting point, which is consistent with one also raised by Reviewer 1. We have now included additional analysis in which we calculated rheobase values from the long (5 s) square current steps that were used to identify delayed and immediate firing MNs. These rheobase measurements made using current steps correlate strongly with rheobase measured from slow ramps (W1: r = 0.87, p < 1.0e-15; W2: r = 0.95, p < 1.0e-15; W3: r = 0.92, p < 1.0e-15). This is consistent with findings from Leroy et al. (2014) and Buisas et al. (2012), who both also demonstrated similar rheobase values in response to ramps and current steps. Importantly, we also find similar developmental changes in rheobase across motoneuron subtypes when assessed with a current step - showing no difference in rheobase values between delayed and immediate firing cells at week 1, with differences emerging in week 2 due to a progressive increase in rheobase in delayed firing motoneurons in the 2nd and 3rd weeks. These findings are included as a new Supplementary Figure (Figure 3 – Figure Supplement 1) and summarized in the results on Page 6, lines 173-178.

      2) During the 2nd postnatal week, the study suggests that "PICs contribute to the emergence of orderly recruitment amongst MN subtypes". This interpretation appears too definitive because the study did not provide direct evidence for that.

      This is a good point. We did not directly test the contribution of PIC maturation to the staggering of rheobase currents between weeks 1 and 2. We have revised this statement to soften our claim, restricting differences in PIC activation to the second week. This revision can be found on Page 9, Lines 319-329.

      The authors describe a more hyperpolarized activation of PICs in slow MNs suggesting that the early recruitment of PICs in slow MNs may help them to fire before fast MNs. By a pharmacological approach the authors show that the sodium PIC mediated by Nav1.6 channels sets the activation threshold of PICs and that their blockade increases the rheobase (recruitment current). However, since pharmacological investigations have been done only in fast MNs, it is not very informative on the putative role of the sodium PIC (and PICs in general) on the orderly recruitment of MN subtypes. Similar experiments should be extended to slow MNs to compare the effects with those observed on fast MNs. If sodium PIC plays a significant role in the differential recruitment of MN subtypes, its blockade should induce an overlap in the recruitment of slow and fast MNs.

      We initially focused specifically on the roles of PICs in shaping recruitment of delayed firing motoneurons during weeks 2 and 3, because we were trying to account for changes in rheobase that occur within delayed firing motoneurons during this period of postnatal development. However, in response to the useful comments here and above, we have now conducted an additional set of experiments to determine the relative contribution of NaV1.6 (n = 12 MNs, 7 animals) and L-type calcium channels (n = 10 MNs, 7 animals) to PIC and rheobase in immediate firing motoneurons. These results have been integrated into Figure 4, the results section, and discussion.

      Furthermore, voltage clamp recordings to characterize PICs in MN subtypes have been done without blocking potassium conductances. Therefore it is difficult to determine if differences in PICs between MN subtypes are related to inward currents or opposing outward currents.

      We agree that we cannot rule out contributions of other currents to our measures of PIC in voltage clamp given that we did not block potassium conductances. This is indeed an interesting point. However, our approaches are consistent with previously published approaches (Quinlan et al., 2011; Verneuil et al., 2020), which may be useful for the purposes of cross-study comparisons. Of note, Quinlan et al. (2011) did measure PICs in the presence and absence of TEA (n = 18), with no differences in PIC amplitude or onset found. However, recent work from the Brocard Lab has found opposing contributions of M-currents to measures of PICs in voltage clamp in Hb9 interneurons. This would therefore be an interesting direction for future study amongst motoneuron subtypes. As highlighted below, and in our revised manuscript, it is quite possible that outward currents may oppose and diminish the actions of Ih. This is also likely true for PICs and would be an interesting direction for future study. We have included an additional statement in our discussion on Page 19, Lines 630-639 to acknowledge this potential interaction and contribution to maturation of motoneuron recruitment.

      3) During the 3rd postnatal week, the authors suggest that fast MNs display a prominent Ih current at rest that provides a depolarizing shunt delaying their recruitment compared to slow MNs. However, data appear not enough conclusive for such interpretation. First, the relationship between the resting membrane potential (RMP) and the amplitude of Ih (the larger the Ih, the more depolarized the RMP) depicted in Figure 6B from a small sample of MNs is not consistent with values reported in the supplementary table 1. Indeed, fast motoneurons supposed to have a prominent Ih current display a more hyperpolarized RMP compared to slow MNs. The opposite would be expected according to the authors' hypothesis. A similar concern can be raised regarding the strong relationship between the amplitude of Ih and that of the AHP illustrated in Figure 6F, which is not in line with the lack of difference in the amplitude of the AHP between slow and fast MNs in week 3 (see supplementary table 1).

      As discussed above in response to similar comments from another reviewer, we realise that the data presented in our original manuscript (Figure 6) regarding the relationships between RMP, Ih and mAHP lacked clarity and perhaps depicted a subset of recordings that was not representative of the complete dataset reported in our supplementary table (as pointed out by the current reviewer). Furthermore, these data did not achieve our main objective of supporting the existence of a resting Ih current as well as the new data we have included - where we directly measure Ih at resting membrane potentials. Given the addition of these new data, and the potential oversimplification of our attempts to relate RMP, Ih and mAHP (e.g. Ih is unlikely to be the main contributor to RMP), the latter has been removed from the revised manuscript.

      In addition, we have expanded our discussion to highlight potential roles for Ih in shaping recruitment of fast motoneurons during periods of inhibition, such as during rhythmic activity, where the membrane potential often dips below -70 mV. Fast fatigable (MMP9+) motoneurons have been shown to receive a greater density of inhibitory synaptic inputs, particularly those derived from V1 interneurons, compared to slow (ERRB+) motoneurons (Allodi et al., 2021), and this differential synaptic weighting may create greater opportunity for Ih to be engaged and contribute to staggering recruitment, as our pharmacology data suggest. This addition can be found on Page 20, Lines 657-663.

      Second, the inward current recorded in fast MNs to hyperpolarization at -70 mV appears not significantly affected by the Ih blocker ZD7288 (Figure 5J, and 5L) suggesting that Ih is not recruited at rest in this class of MNs.

      We realize that displaying the full IV plots for Ih at weeks 2 and 3, in addition to before and after application of ZD7288 (at week 3), may not have effectively illustrated the magnitude of Ih measured at -70 mV given the wide range of current values, which vary 10-fold between measures made at -70 and -110 mV. We have modified our graphs in Figure 5 to better illustrate the magnitude of Ih measured at -70 mV and -110 mV. We hope that these modified graphs better capture our observations. We have further simplified Figure 5 to reduce redundancy in results that are summarized in Supplementary File 2 and the Results text. We have also included additional traces and analysis in Figure 6 that highlight a significant ZD-sensitive sag potential detected in delayed but not immediate firing motoneurons when hyperpolarizing the membrane potential from -60 mV to resting potential. These data can be found on Page 13, Lines 434-439, and in Figure 6A, B. These results have been further supported by an additional set of recordings of delayed (n = 13) and immediate (n = 11) firing motoneurons obtained from week 3 animals, where Ih was measured during a voltage step (in VC) from a holding potential of -50 mV down to the respective resting potential of each cell (Del: IhRMP = -96 ± 60 pA; Imm: IhRMP = -0.13 ± 11 pA; t(23) = 4.8, p = 8.7e-5). These results have been included on Page 13, Lines 431-434.

      On the other hand, ZD7288 hyperpolarizes the RMP in fast MNs (Figure 6A) and reduces the amplitude of their sags recorded at -70mV (Figure 5M). Similar discrepancies are more striking for slow MNs. Slow MNs did not display inward current sensitive to ZD7288 above -80 mV (Figure 5N). However, ZD7288 unexpectedly hyperpolarizes their RMP (Figure 6C). How the authors can explain such discrepancies?

      An interesting but unexpected observation is the hyperpolarization of the RMP by ZD7288 in immediate firing motoneurons, even though we were unable to detect measurable Ih or sag at resting membrane potential in these cells. We have two explanations for these observations.

      1.) One possibility is that, as pointed out above regarding our measures of PICs, we did not block other conductances during our voltage clamp protocols for measuring Ih. It is therefore possible that immediate firing (and even delayed firing) motoneurons express other ion channels that oppose Ih and may mask its true magnitude and effects on membrane potential. Indeed, this has previously been demonstrated for a variety of currents (e.g., Kjaerulff and Kiehn, 2001; MacLean et al., 2003; Picton et al., 2018; Buskila et al., 2019). If this is true, then the relatively smaller Ih in immediate firing motoneurons may have been masked, whereas the relatively larger Ih in delayed firing motoneurons may have been more apparent. In support of this possibility, we note that many of the immediate firing motoneurons demonstrate a slow hyperpolarization of the membrane potential during current steps intended to measure sag. Interestingly, we also find this phenomenon in delayed firing motoneurons (which demonstrated depolarizing sag potentials at baseline) in the presence of ZD7288. We have included an example in the modified Figure 6 to highlight this phenomenon. We have included additional discussion to highlight these caveats and possibilities (Page 20, Lines 664-675).

      2.) Alternatively, voltage and space clamp errors may have caused an underestimate of Ih. While we expect space clamp errors to be greater in the largest motoneurons, such as the delayed firing motoneurons, it is possible that such errors may have been sufficient to mask the small, albeit significant, Ih in immediate firing motoneurons. It is possible that blockade of this small Ih by ZD7288 in immediate firing motoneurons may have hyperpolarized their RMP due to their high input resistance. This has been highlighted on Page 20, Lines 675-679.

      Finally, there is a mismatch in values reported in supplementary table 2 and figures 5G and 5I. In the table, both Ih amplitude and Ih density (at -70mV) appear significantly different between slow and fast MNs in week 3, but not in figures 5G and 5I. Altogether, these results appear inconsistent.

      We have modified our graphs to better illustrate the range of inward currents measured at -70 and -110 mV, which, due to such high variance (in some cases 10-fold when comparing those measured at -70 and -110 mV), were not as apparent when showing the full IV plot. We have modified our graphs in Figure 5 to better reflect the data summarized in Supp Table 2 and capture our observations. Both the data in the table and the plots have been analyzed using the same two-way ANOVA with cell type and age as factors.

      Regardless of inconsistencies, data should be replicated at least with a second Ih blocker such as Ivabradine hydrochloride or Zatebradine hydrochloride.

      We have performed an additional set of experiments in delayed (n = 8) and immediate (n = 3) firing motoneurons with ivabradine. These new results are included in the text on Page 14, Lines 457-465. Ivabradine (10 µM) produced a 45% reduction in Ih and, consistent with ZD7288, hyperpolarized the RMP of both delayed and immediate firing motoneurons and caused a significant decrease in rheobase of delayed firing motoneurons.

      Minor concerns: 1) Does the pharmacological blockade of Nav1.6 channels 4,9-AH-TTX induce changes in the spiking threshold as already reported in cortical neurons (Hargus et al., 2013)? Such an effect may contribute to the higher rheobase observed in fast MNs under 4,9-AH-TTX (Figure 4M).

      We have included an analysis of spike threshold before and after application of 4,9-AH-TTX. In line with previous reports from cortical neurons (Hargus et al., 2013), 4,9-AH-TTX significantly depolarized the spike threshold of delayed and immediate firing motoneurons and could contribute to the higher rheobase observed in delayed firing motoneurons following blockade of Nav1.6. The results from this analysis have been included in the text and can be found on Page 9, Lines 306-310.

      2) The study reported a more depolarized PIC in fast MNs during the 2nd postnatal week but the acceleration onset voltage in response to a current ramp depolarization (attributed to the activation of PICs), is similar between slow and fast MNs at the same age (Figure 4C). This is in discrepancy with figure 4G, where a significant effect on PIC onset voltage is shown within the same time points.

      Indeed, delayed and immediate firing motoneurons differ in the onset voltage of the PIC measured in voltage clamp at week 2, and these findings are not mirrored by differences in the accelerating phase of the membrane potential depolarization as measured from depolarizing current ramps in current clamp mode. We believe that this difference likely reflects that measurements made in current clamp are indirect estimates of PICs, whereas voltage clamp protocols provide more direct and likely more sensitive measurements. This is highlighted on Page 8, Lines 263-265.

    1. Author Response:

      Reviewer #1:

      Cesanek et al. performed a series of experiments designed to reveal whether or not the encoding of motor memories for novel object weight carries categorical structure. That is, given a set of objects of varying sizes and with weights that must be learned through experience, are objects grouped into categories such that the learned weight of one object generalises to objects within the same category, but not to objects outside of that category. Their results convincingly demonstrate the presence of such a categorical encoding. They show the following:

      1) The weight of an outlier object is not learned if its weight is near the weight predicted by category membership.

      2) The weight of an outlier object is learned if its weight is far from the value predicted by category membership.

      3) The weight of an outlier object is learned if there is no category structure binding the remaining objects (i.e., there is no category against which an outlier can be defined).

      4) If an outlier object is learned, then it influences the estimated weight of the other category members.

      5) If the weight of an outlier object is learned first in isolation, it is unlearned when the remaining objects are introduced if and only if its weight is near the value predicted by category membership.

      6) The threshold that constitutes "near" or "far" from the category boundary depends on recent sensorimotor experience.

      7) Learning of the outlier is all-or-nothing on a per participant basis.

      The major strength of the paper is the persuasiveness of the behavioural experiments, which were designed soundly and yielded clear results. There is little doubt that some motor memories carry the type of categorical encoding detailed by the authors.

      Thank you for this clear summary of our main findings and the positive evaluation of the persuasiveness of our study.

      The major weakness of the paper is that it does not make strong contact with the relevant existing literature to clearly show that categorical encoding is (1) a truly novel behavioural observation in the motor learning literature, and (2) inconsistent with the predictions of common motor learning theories and models. In particular, the authors' own prior work frames motor learning as being governed by multiple internal models that can be switched between depending on contextual cues and environmental demands (see reference below). Insofar as contextual effects can drive similar results as categorical memory encoding, it is unclear how and why this and related models would fail to account for the present data.

      Wolpert, D. M., & Kawato, M. (1998). Multiple paired forward and inverse models for motor control. Neural Networks, 11(7-8), 1317-1329.

      With respect to the first issue, we do not know of any literature that discusses categorical effects in motor control. In our revision, we have added text emphasizing that existing models of motor control cannot account for categorical effects because they do not contain any form of categorical encoding. With respect to the second issue, please see our response to Essential Revision #2, where we explain that the MOSAIC model of Wolpert & Kawato (1998) is effectively an associative learning model, not a categorical learning model, and hence could not explain our data.

      Reviewer #2:

      Using a novel 3D robotic device, the authors had participants learn to lift four similar-looking training objects whose weights were linearly correlated with their sizes and then tested how this training influenced the later memory formation for a middle-sized object with a different density. When the difference between the actual weight and the weight estimated from the linear relationship was not so large (i.e., within a family boundary), surprisingly, the training for lifting the test object was ineffective: the estimation of the object weight was constrained by the family's linear relationship between size and weight. The memory specific to the new object could be developed only when the difference was large enough.

      The results were unexpected from the conventional idea that object properties are encoded in an "associative map." The authors interpreted these results as evidence that the motor memory for lifting objects with different sizes and weights could be formed according to the "object family effect." All results of other control experiments were consistent with this interpretation.

      I was intrigued by the counter-intuitive result that the motor system sticks to estimating the object weight based on the family property even though this estimation is incorrect and induces an error. Although it remains unclear how such a memory for multiple objects is integrated from the memory for each object, it is certain that this study has demonstrated a new aspect of motor memory for manipulating objects with different sizes and weights.

      Thank you for this positive summary of the impact of our study.

      Reviewer #3:

      In this paper, Cesanek et al. use a novel object lifting task to investigate the "format" of memories for object dynamics. Namely, they ask if those memories are organized according to a smooth, local map, or discrete categories ('families'). They pit these competing models against one another across several experiments, asking if subjects' predicted weights of objects follow the family model or a smooth map. This was tested by having people train on objects of varying volumes/masses that were either consistent with a linear mapping between volume/mass, or where those dimensions were uncorrelated. This training phase was either preceded by, or preceded, a testing phase where a novel object with a deviant mass (but a medium-size volume) was introduced. As the authors expected, individuals trained on the linear mappings treated novel objects that were relatively close to the "family" average mass as a member of the family, and thus obligatorily interpolated to compute the expected mass of that object (i.e., under-predicting its true mass); conversely, when a novel object's mass was a substantial outlier w/r/t the training items, it was treated as a singleton and thus lifted with close to the correct force. Additional variations of this experiment provided further evidence that people tend to treat an object's dynamical features as a category label, rather than simply forming local associative representations. These findings offer a novel perspective on how people learn and remember the dynamics of objects in the world.

      Overall, I found this study to be both rigorous and creative. The experimental logic is refreshingly clear, and the results, which are replicated and extended several times in follow-up experiments, are rather convincing. I do think some additional analyses could be done, and data presentation could be improved. I also thought the generalization analysis, as I interpreted it, was difficult to align with the initial predictions.

      Thank you for this assessment of our work. In particular, please note that we have modified the generalization analysis based on some of your recommendations, and we feel that it is now more convincing and easier to understand.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, Khan et al. investigated the roles of SARS-CoV-2 proteins in the activation of immune cells. The authors stimulated macrophage cell lines such as human THP-1 cells and mouse RAW 264.7 cells with recombinant viral proteins, and found that only spike proteins (S1 and S2) could potently activate macrophages to produce pro-inflammatory cytokines and chemokines.

      Strengths:

      It was intriguing that only spike proteins (S1 and S2) could potently activate macrophages to produce pro-inflammatory cytokines and chemokines. The authors also observed that direct contact of macrophages with spike protein-transfected epithelial cells, which mimics viral infection, resulted in the activation of macrophages. Detailed analyses showed that spike proteins were recognized by Toll-like receptor 2 to activate NF-kB signaling. In vivo mouse experiments further supported the in vitro experiments. This study revealed a pathogenic role of the SARS-CoV-2 spike proteins in directly activating the host inflammatory responses, which may therefore have a profound impact on understanding a novel aspect, unveiled here for the first time, of the cytokine signaling that is critically involved in COVID-19 pathogenesis.

      Weaknesses:

      A recent report by Shirato and Kizaki (Heliyon 7 (2021) e06187: 10.1016/j.heliyon.2021.e06187) showed that the SARS-CoV-2 spike protein can stimulate macrophages (RAW 264.7 cells and THP-1 cells) to produce pro-inflammatory cytokines in a TLR4-dependent manner. This is likely to contradict this study. The authors must thoroughly argue these controversial observations.

      We are grateful to the reviewer for finding our study impactful. We also appreciate the reviewer’s suggestion to discuss why our findings are inconsistent with other studies. In fact, three recent studies attempted to explore the role of SARS-CoV-2 structural proteins in inflammatory responses (Shirato K, Heliyon, 2021; Zhao Y, Cell Res, 2021; Zheng M, Nat Immunol, 2021). While two studies demonstrated that the Spike protein is responsible for triggering inflammation (Shirato K, Heliyon, 2021; Zhao Y, Cell Res, 2021), the other study found that the E protein is inflammatory (Zheng M, Nat Immunol, 2021). Further, as they tried to identify the cellular sensor for S and E proteins, the first two studies demonstrated that S protein is sensed by TLR4, which contradicts our findings. Notably, Dr. Kanneganti’s group nicely showed that Tlr2-/- macrophages do not produce cytokines during SARS-CoV-2 infection, while Tlr4-/- macrophages are responsive. These data are consistent with our finding that TLR2 but not TLR4 is the sensor for S protein. It is intriguing that the findings of all these studies are partially consistent and partially contradictory. We believe that the discrepant results of other studies are possibly due to contamination by bacterial pattern molecules. Indeed, we noticed that other studies used recombinant proteins generated in E. coli. Thus, it is possible that those recombinant proteins were contaminated by LPS and other bacterial PAMPs, which may activate the TLR pathways. Since the inception of our studies, we were concerned about the possible contamination of recombinant SARS-CoV-2 proteins. Therefore, throughout the study, we used recombinant proteins generated in mammalian cells (HEK293T cells). We used three different S proteins (S1, S2, and S-tri) from two different commercial sources (RayBiotech and R&D), and we found that all S proteins triggered the inflammatory response (Figure 1-Figure supplement 1E). Further, both the S subunits and S-tri are sensed by TLR2, but not TLR4 (Figure 5E and 5F). We have included a discussion of this concern on Page 16.

    1. Author Response:

      Reviewer #1 (Public Review):

      Rarely do I read a paper and have so little to criticize. The paper is very well written and the studies have been conducted carefully and described fully. The only comment that I would make is that the divergence of quaternary structure in CTPS filaments brings to mind some well-known parallels that should be cited and discussed. Perhaps the most prominent one is hemoglobin, where this protein can form tetramers in some species that are very different from the vertebrate tetramers. In plants, I believe that dimeric hemoglobins have been described. Another example would be the diversity of filaments formed by actin-like proteins in bacteria, while in eukaryotes the actin filament architecture has been extremely conserved.

      We thank the reviewer for pointing out the evolutionary parallels with well-known polymers, and have updated our discussion with a comparison to diverse actin architecture.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper seeks to address how animals use sensory feedback from multiple sensory modalities during the navigation of complex environments. In particular, the authors are concerned with the use of odor-based signals for search behaviors, as odors are sparse and do not provide reliable directional information. An additional goal of this paper is to model the general principles they extract from their experiments so that they can be applied to engineered systems. Through the use of a virtual reality (VR) system that combines visual, wind, and odor input in a controlled manner, the authors investigate the search behavior of the silkworm moth, Bombyx mori. The females of the species emit a pheromone that triggers both a search behavior as well as mating behavior in males. Whereas previous work has taken advantage of the highly stereotyped nature of this behavior, it is unclear how these animals integrate other sensory cues such as wind and vision to locate an odor source. By presenting these stimuli in different combinations where the odor is always present, the authors find that the presence of a wind source strongly influences the animal's behavior (specifically its ability to locate the target), although the direction from which it is presented is crucial when all three are together. Multimodal input also affects the moths' walking speed and turning behavior, and an updated model that incorporates wind direction outperforms other models of this behavior.

      The conclusions of the paper are generally supported by the presented data, although there are some issues with the framing of the paper and the model that I feel should be addressed.

      1) The introduction and discussion go to great lengths to emphasize the technical advance of the constructed VR apparatus. Although it is certainly an impressive achievement, the use of VR for exploring insect behaviors like search and navigation is hardly a new development at this point in time. Indeed, work on flies, Drosophila and otherwise, going back decades has used VR to extract principles regarding issues such as visually mediated flight control or walking. More pertinent to the ideas of multimodal sensory integration explored here, over the past decade numerous researchers have combined visual input with odor and other cues to discern the relative importance of each of these modalities during search behavior. For example, see Duistermars and Frye, 2008. Generally, I feel that the paper overemphasizes the technical advance without providing sufficient biological context. So much work has been done on Bombyx that a paper using these methods has the ability to address, but much of that literature is absent from the paper. I think focusing more on the behavior will broaden the appeal of the paper by putting it in conversation with a well-established phenomenon.

      Thank you for providing these insights. As you pointed out, the original manuscript contained few descriptions of multisensory integration in other insects, so we have added this literature to the introduction and discussion. Regarding VR research, a recent definition holds that "virtual reality creates a physical and a mental space for people." In other words, a device that provides multisensory stimuli to the organism (the physical space) must be connected to a mental space in which an avatar reflects the organism's actions. Based on this definition, we argue that most biological experiments with insects have been conducted in the physical space alone (using multisensory stimulators) and have not measured the purposeful behaviors that result from a connection to the mental space. Behavioral experiments using multisensory stimulators have revealed how each modality is utilized, but they have not directly investigated how other modalities affect actual navigation that relies predominantly on odor, as in the current study. In fact, the previously proposed insect-inspired navigation algorithm was modeled on experimental data obtained with a multisensory stimulus device, yet in simulation it behaves differently from the actual insect. In contrast, the model derived from experimental results obtained with the VR system can reproduce the behavior of the insect (silkmoth). We therefore argue that, to model the superior functions of an organism, providing multisensory stimuli is not enough; it is important to obtain the relationship between multisensory stimuli and behavioral output while the organism performs the actual purposeful behavior (e.g., navigation). Nevertheless, as you pointed out, multisensory stimulation experiments have yielded important biological findings, and we have revised the manuscript to reflect this.

      2) I think that the model is well-done and fits with the goals of the paper. Asking about the role of wind direction in this behavior is an important step given the behavioral data presented. However, I am not convinced based on the data that the new model developed by the authors is much better than the surge-zigzag model. The success rates are slightly different (is the statistical difference a function of the number of runs or timesteps?) and both models search about the same amount of time before finding the source. Finally, the migration probability maps are rather similar, so it is hard for me to conclude that factoring in wind direction is necessary to get good performance out of the model.

      Thank you for your valuable comments. Because the surge-zigzagging algorithm uses only odor stimuli to determine actions, an agent moves toward the edge where the change in odor information is greatest. The behavior of MiM2 is modulated by wind direction information, so it moves into areas with a high probability of odor reach. Hence, we reasoned that the effectiveness of the model could be shown by starting the agent's search near the edge. Consequently, we set the initial position (x, y) = (300 ± 100) in an area that the odor rarely reaches and carried out the simulation. As a result, the MiM2 algorithm, which actively moves into the area where the probability of odor reach is high, performed significantly better than the other conventional algorithms. As for the search time, because MiM2 is based on surge-zigzagging, the movement speeds were almost the same; therefore, both algorithms (MiM2 and surge-zigzagging) require almost the same search time. However, the search success rates differ because the selected routes differ.

      Revised part: Lines 282–311 (red letters) at “Modeling and validation of behavioral modulation mechanisms”.

      Reviewer #2 (Public Review):

      This paper uses a multimodal virtual reality system to assess which combinations of visual, wind, and olfactory information male silk moths rely on to find a female. The overall conclusion is that for the moths to search effectively, wind direction information is an important input. Vision, on the other hand, while it is used to control angular velocity, does not appear to be important for the moths to search effectively. Given what is known about other walking and flying insects these results are not surprising. Although the virtual reality system is advertised as being able to provide naturalistic and complex stimuli, the visual stimuli are limited to a traditional LED array, and the olfactory stimulus is created by projecting a 3-dimensional plume into two dimensions. The analyses of the data are rather simplistic and do not provide a mechanistic description of what the moths are actually doing on a moment-by-moment basis. The authors then proceed to construct a model for search behavior that uses frequency and relative timing information for the odor stimulus in conjunction with the wind direction information. This search algorithm is ever so slightly better than the prior surge-zigzagging algorithm. The role of relative timing information in the olfactory signal supplied to the moths in the virtual reality experiments, however, was not investigated.

      Thank you for your valuable comments. We can respond to the following points:

      Effects of vision: Our biological experiments demonstrated that the visual stimulus maintains the balance of the angular velocities of the left and right rotational movements. When the effect of this balance on the function of an odor search was confirmed by simulation, the path became longer due to the bias in the search trajectory, and the search time became significantly longer. In addition, the success rate decreased as in the biological experiments. This suggests that an efficient odor-source search can be performed by maintaining the balance between the left and right angular velocities.

      Limitation of visual stimulus: The main purpose of our study was the integration of other sensory information during navigation using odor. Therefore, the problem becomes complicated when cognition, i.e., visual object identification, is included. For that reason, we adopted optical flow as a visual stimulus, which does not involve cognition but affects behavior. Because there were several studies using LED arrays to provide optical flow to an insect with compound eyes, we conducted experiments using LED arrays this time as well. We had already tested whether the optical flow by the LED array could be provided correctly (see Supplementary Material).

      The plume model: We utilized plume diffusion, which was observed in two dimensions as a virtual odor field. We captured the movement of particles on a two-dimensional plane using particle image velocimetry technology. Please refer to the Supplementary Material for details.

      The behavior analysis: By plotting each trajectory and calculating a histogram of the change in heading angle, we investigated which features of the odor search differ depending on the conditions of sensory stimulation. Regarding the odor, which is the cue/stimulus, we analyzed the behavioral change with respect to the odor "frequency" rather than the response to a single stimulus. The reason for this is that recent studies have shown that, while searching the environment, (1) the frequency of the odor changes with distance from the odor source, and (2) the walking behavior of Drosophila also changes with frequency. By analyzing the behavioral changes with respect to the odor detection frequency, we modeled the behavior during the search based on the odor frequency information. As a result, we found that, compared to the conventional model, our model not only reproduced behavior closer to the movement of a silkmoth but also increased the search success rate.

      Simulation: As you pointed out, the simulation conditions were limited, so we added further conditions for verification. As a result, we found that the proposed model has a higher odor tracking performance than the conventional models even when the initial position is different. However, as you pointed out, these are still relatively simple simulation conditions; therefore, in the future, we plan to implement the algorithm in an autonomous robot system and verify its performance in more complicated spaces, such as environments with obstacles or outdoor environments.

    1. Author Response:

      Reviewer #1 (Public Review):

      This work presents a new Bayesian method for detecting those patterns of neural responses that are connected to behavioral output and are worth investigating further. The manuscript contains the derivation of the approach and its test on synthetic and real neural data.

      The derivation should be improved by providing additional steps. For example, it was not clear how Eq. 5 was derived and why the double derivative with respect to parameters theta_mu and theta_nu is present (these terms appear to be missing in the definition of log-likelihood).

      Thank you for the suggestion. We have added steps in the derivation leading to the Ising equation for the indicator variables, now in Eq. (8). These intermediate steps correspond to the two main approximations of the BIA method, namely, the saddle point approximation for the posterior (Eq. (5)) and the Taylor expansion in the inverse regularization strength (Eq. (6)). We hope that these changes improve the readability of the derivation.

      Parameter M should be more clearly defined as the number of samples. It is briefly mentioned on line 170, but it was difficult to connect this to equation (7) and those following that use M explicitly.

      We thank the Reviewer for the suggestion. We have clarified the definition of M just after Eq. 11.

      Is it possible to include multiple binary quantifications of behavior, similarly to how words are constructed from neural spike trains? For example, one can envision describing a particular song segment with respect to multiple binary features simultaneously.

      We explicitly examine this question in “Dictionaries for exploratory vs. typical behaviors” and the corresponding Figure 6, which repeats our analysis for different binary discretizations of our behavioral data.

      Reviewer #2 (Public Review):

      Summary:

      Hernandez et al propose a new statistical tool for identifying codewords in multivariate binary data (for instance neural activity patterns) from a small number of measurements. They demonstrate the utility of the approach by analyzing the statistical structure of songbird neural responses and how it changes in different contexts (during exploration vs typical song production).

      Strengths:

      • The approach is innovative, in that it takes advantage of clever tools from sparse linear regression, in particular a method termed Bayesian Ising Approximation (BIA), to be able to identify codewords individually, rather than directly estimating a model of their joint statistics, by comparing to a null model that assumes independence across dimensions. This approach has the advantage of resulting in a very flexible model, with very few assumptions about the statistical structure of the data, that is applicable to a range of dataset sizes; the more data are available, the more of the underlying structure can be revealed.
      • The strong mathematical foundations provide clear bounds on data regimes in which the approximation is theoretically well justified and reasons to expect that the estimated models are minimal and interpretable.
      • The numerical estimation procedures are fast, and computationally efficient (for a reasonably sized neural dataset, can be run on a regular laptop).
      • The code is available on github for quick community dissemination.
      • Application to identification of behaviorally relevant patterns of co-activity goes beyond previous Ising-based models used in neuroscience.
      • When applied to songbird data, it reveals that the variability in neural responses during exploration has much more structure than previously thought.

      Weaknesses:

      • Although the paper is written as a methods paper, emphasizing the technical contributions and promising wide applicability to a range of different types of datasets, the numerical validation of the method is very much restricted to the statistical regime of the songbird dataset. From the perspective of a potential future user of the tool it's less clear how the method would behave on different datasets, and what needs to happen in practice for adapting the tool to data with different statistics.

      We have edited the second half of the abstract and a few sentences in the Introduction (see latexdiff file) to make it clear that our main applications to date have been to songbird data.

      • The numerical comparison to other existing methods is minimal.

      We have argued in our previous submission that there really are no other methods to compare to that are designed to work in a regime similar to that of uBIA. It seemed to us that it would be unfair to run other methods on our datasets, see them not work well (as expected – because they make assumptions that are invalid in our regime), and then claim success. However, since the concern has been raised again, we really have to address it. To do this, we added a section in the Online Methods “Direct application of MaxEnt methods to synthetic and experimental data”, in which we compare uBIA to the relevant interactions model of Ganmor et al., with which uBIA has the highest similarity. The results are as expected – a method not designed for our data regime fails. We emphasize here again that the relative superiority of uBIA on these data should not be taken as a slight directed at other methods, but rather as an indication that, to cover different data regimes, multiple methods should be combined. We emphasized this in the “Overview of prior related methods in the literature” supplemental section.

      • The songbird analysis already reveals some challenges with respect to interpretability: in particular it is not clear how much information about the underlying neural processes can be revealed by summary statistics generated by the method, such as the number of codewords and their length distribution.

      The reviewer is correct that our analysis of the songbird data raises a number of important questions for future studies. Although these remain to be answered, we emphasize that before the biological interpretation of over/underrepresented neural patterns can be attempted, such patterns must first be identified. uBIA therefore represents a crucial advance in our ability to address these questions.

      Most conclusions are reasonably supported by the data. The analysis of the irreducibility of the codewords has insufficient support based on the numerical simulations. Moreover, the generality of the tool and comparison to other methods are discussed in almost entirely theoretical terms, which makes the claim on immediate utility for other datasets less convincing, especially outside the neuroscience community.

      We hope that the addition of the new comparison figure partially alleviates these concerns. Additionally, we point out that 3rd and 4th order words are long, as most others deal with just pairs, as illustrated in the new Figure 7. Indeed, it is not easy to fit an N = 20 Ising model with 4th order terms, because there are 20 ∗ 19 ∗ 18 ∗ 17/(4 ∗ 3 ∗ 2 ∗ 1) = 4845 terms in this model, which cannot be fit from just a few hundred samples, which is precisely why the Ganmor model fails in this case (Fig. 7).
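
      The term count quoted above can be checked with a few lines of code. This is only a sketch of the counting argument; the helper name `n_terms` is hypothetical, and it counts interaction terms as unordered subsets of N = 20 binary units.

```python
from math import comb


def n_terms(n_units, order):
    """Number of distinct interaction terms of a given order among n_units binary units."""
    return comb(n_units, order)


n_units = 20
per_order = {order: n_terms(n_units, order) for order in range(1, 5)}
total_up_to_4 = sum(per_order.values())

for order, count in per_order.items():
    print(f"order {order}: {count} terms")
print(f"total up to 4th order: {total_up_to_4}")
```

      The 4th-order count alone matches the 4845 terms stated above, and including the lower orders only makes the model larger, which illustrates why a few hundred samples cannot constrain it.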

      Nonetheless, the idea is quite interesting and likely of broad interest for theorists interested in the development of unsupervised statistical tools for neural data analysis, with practical applicability for a range of modern systems neuroscience experiments that involve task specific ensembles as the building block of circuit computation.

    1. Author Response:

      We thank the reviewers for highlighting the importance and potential clinical significance of our findings. The concentration of bisphosphonate drug in tissues outside the skeleton, such as lung, are unknown and we recognise that further studies are needed to determine whether the effects of a bisphosphonate that we describe in mice also occur in humans with standard clinical doses of these drugs. Nonetheless, our findings add further weight to the view that bisphosphonate therapy has benefits beyond just preventing bone loss and could be considered as prophylactic agents to reduce the risk of pneumonia in individuals with osteopenia or osteoporosis, who are already eligible for treatment under standard clinical guidelines.

  2. Oct 2021
    1. Author Response:

      Thank you to the reviewers and editors at eLife for the comments on our manuscript. We believe we can address or rebut all of the reviewer comments, and our responses are included below.

      Reviewer #1 (Public Review):

      In this manuscript Bigge et al. use chemical inhibitors and an arpc4 mutant to investigate the role of the Arp2/3 complex in regulating cilia length and assembly in Chlamydomonas. The authors have previously shown that the actin cytoskeleton is required for ciliary assembly and maintenance in this organism, but the precise mechanism(s) involved were unclear. Furthermore, while previous studies targeted the actin cytoskeleton in general, the current study focuses on branched actin networks regulated by the Arp2/3 complex. The authors first demonstrate that chemical inhibition of the Arp2/3 complex leads to shortening of existing cilia, a phenotype that is recapitulated in the arpc4 mutant and which can be rescued by reintroducing V5-tagged ARPC4 in the latter mutant. Next, using similar approaches they show that initial stages of cilium biogenesis are also impaired upon Arp2/3 complex inhibition. They next use a variety of approaches, mostly involving chemical inhibitors and F-actin- or membrane dyes, to assess the mechanism by which Arp2/3 complex affects ciliary biogenesis. They provide evidence indicating that Arp2/3 complex specifically promotes endocytosis at the plasma membrane to support lipid and protein for the growing ciliary membrane. This is an interesting discovery that advances our understanding of how ciliary membrane biogenesis is regulated, especially in Chlamydomonas.

      Thank you for these comments and the accurate summary of our findings.

      Reviewer #2 (Public Review):

      Previous studies have demonstrated that interfering with actin polymerization leads to the shortening of flagella in Chlamydomonas cells, indicating that F-actin is important for ciliary/flagellar elongation. However, the precise roles of F-actin, and especially of branched actin networks nucleated by the Arp2/3 complex in cilia formation have not been elucidated. Here, the authors aimed to examine one of the mechanisms that may be involved in F-actin-dependent cilia elongation/formation, namely, actin- and clathrin-dependent endocytosis of membrane proteins.

      The authors used both pharmacological inhibition and genetic disruption of the Arp2/3 complex to demonstrate that interfering with the activity of the Arp2/3 complex reduces the ability of Chlamydomonas to form or elongate cilia. The authors then showed that incorporation of existing (not newly synthesized) proteins into cilia is perturbed by the genetic or pharmacological inhibition of the Arp2/3 complex, and that new membrane for building cilia may be derived via an Arp2/3-mediated membrane retrieval pathway.

      These experiments are carefully performed and convincing. A description of the Arp2/3 complex components and clathrin-containing structures in Chlamydomonas is novel and will be of interest to the cell biologists working on this model organism.

      The authors propose that the Arp2/3-mediated actin assembly promotes cilia elongation/formation due to the contribution of the branched actin to clathrin-dependent endocytosis. This is an intriguing idea that ties together previous findings showing that F-actin is needed for cilia elongation and that some ciliary proteins are internalized from the plasma membrane and then redistributed to cilia. One concern regarding this portion of the manuscript is that the experiments addressing the role of endocytosis rely solely on the use of a pharmacological inhibitor of clathrin-dependent endocytosis, PitStop2. PitStop2 was previously shown to have non-specific effects on endocytosis, suggesting that its mechanism of action may not be directly related to disrupting clathrin heavy chain interactions (see, for example, Willox et al., 2014).

      We thank this reviewer for their helpful comments regarding the work presented in our manuscript. We understand the concerns regarding the off-target effects of PitStop2. However, the PitStop2 data is only one of multiple lines of evidence that we provide supporting endocytosis occurring in these cells (internalization of membrane dye in Figure 5, internalization and relocalization of a ciliary membrane protein in Figure 6, actin patches in Figures 4 & 7, the presence of plasma membrane proteins in the cilia in Figure 8). Further, the requirement for Arp2/3 in the phenotypes listed points to an endocytic mechanism, as the Arp2/3 complex has repeatedly been shown to be involved in actin dynamics in endocytosis in other systems. To determine the mechanism of endocytosis most likely occurring in these cells, we searched the Chlamydomonas genome for proteins commonly thought to be involved in endocytic pathways. We propose that clathrin-mediated endocytosis is occurring in these cells because this is the only pathway for which the most important components of the mechanism are present in Chlamydomonas. Every other form of endocytosis, by contrast, is lacking some major component. For example, Chlamydomonas does not contain endophilin for endophilin-mediated endocytosis, flotillin for flotillin-mediated endocytosis, or caveolin for caveolin-mediated endocytosis. Therefore, our conclusions regarding clathrin-mediated endocytosis do not rely entirely on PitStop2 data. The PitStop2 data is merely present to support the findings of our comparative analysis and other endocytosis assays. Further, mutants of clathrin heavy or light chain do not exist and cannot be reliably generated in this organism.

      We now have further evidence, obtained with dynamin inhibitors, that endocytosis is occurring in these cells. Specifically, the dynamin inhibitor Dynasore reduces membrane uptake in a dye internalization assay. We plan to include this data in the next version of the paper.

      Intriguingly, a previous paper by Kim et al., 2010 demonstrated that interfering with the Arp2/3-dependent actin assembly resulted in cilia elongation in mammalian cells, suggesting that branched actin assembly was counteracting growth of cilia. Similarly, actin depolymerization is known to promote ciliogenesis in mammalian cells. The difference between mammalian actin organization/cilia growth regulation vs. the new observations in the Chlamydomonas system should be discussed by the authors to help the readers understand whether the findings can be generalized to the diverse ciliated cell types or are unique to algae.

      While Chlamydomonas is an excellent model for ciliary studies due to the structural and mechanistic conservation of the cilia in relation to mammalian cilia, there are several important differences between mammalian cells and Chlamydomonas cells that might influence ciliary dynamics. These differences have been reported on by others in the field (Jack et al. 2019). Briefly, one difference is that Chlamydomonas cells have a cell wall, which means they have no need for a cortical actin network as mammalian cells do. This cortical actin network in mammalian cells is thought to potentially block access of basal bodies and ciliary proteins to the cortex of the cell. This in turn blocks ciliary formation and assembly.

      We now have data that a mechanism of ciliary elongation mediated by lithium, which is conserved between mammalian cells and Chlamydomonas, is blocked by loss of the Arp2/3 complex, suggesting that ciliary/membrane growth not just from zero length but also from steady state requires the Arp2/3 complex. This data will be reported in a subsequent publication. While we could speculate on the conservation of ciliary growth between mammalian cells and Chlamydomonas in this paper, a more well-supported speculation would be in the subsequent paper where we directly discuss the role of the Arp2/3 complex in a conserved ciliary growth process.

      Reviewer #3 (Public Review):

      Wild-type Chlamydomonas cells possess a pool of ciliary precursors that, in conjunction with newly synthesized precursors, are used to build a new cilium after de-ciliation. The cellular and molecular mechanisms that underlie retrieval of the pool are unknown. Given the role of the Arp2/3 complex in endocytosis in other systems, it was reasonable to test whether it also functions in reclaiming of the pool of precursors. The central finding is that cells bearing a mutation in an essential Arp2/3 component, ARPC4, fail to assemble cilia in a timely fashion after de-ciliation. Although the failure of assembly is well-documented here, the manuscript lacks evidence that cells missing ARPC4 actually establish a pool of ciliary precursors in the first place. Without such information it is not possible to determine the cellular function that fails to occur after de-ciliation in the mutant: Retrieval of a pool of ciliary precursors? Establishing the pool during the ciliary assembly that occurs after cell division? Synthesis of the precursors as new cilia are formed? Sensing loss of cilia and activating the events needed for re-ciliation?

      Upon discovering that there was almost no initial ciliary assembly in cells lacking a functional Arp2/3 complex, we immediately suspected that either the mutants lacked a precursor pool, as the reviewer suggests, or they were unable to incorporate it. To discriminate between these possibilities, one typically needs to regenerate cilia in cycloheximide. However, the inability of the genetic mutants to regenerate in cycloheximide prevents us from doing the typical studies testing new protein synthesis, precursor pool size, and new protein incorporation, as they all require regeneration in cycloheximide. Therefore, to get around this roadblock, we used an acute perturbation through chemical inhibition on wild-type cells that have a normal ciliary precursor pool (as evidenced by their ability to grow to half-length in cycloheximide). These cells were deciliated and then the Arp2/3 complex inhibitor CK-666 (in addition to cycloheximide) was added only for the regrowth, and thus it was not able to affect the size of the precursor pool. Cells treated with CK-666 and cycloheximide could not incorporate the precursor pool we know exists in these wild-type cells (Figure 2B). We referred to these results in the text saying: “This can be further dissected because we know in the case of cells treated with cycloheximide and CK-666, the protein pool is available but is still not incorporated, suggesting a problem with membrane incorporation or delivery as opposed to a problem with protein availability” (Lines 195-200). We understand that this piece of data may be confusing the way it is displayed (only the final growth length after 2 hours), so we plan to provide the full regeneration graphs in the supplement.

      While we are not able to concretely rule out a problem with signal transduction alerting cells that they have lost their cilia or with protein synthesis, we believe these are lesser effects because some arpc4 mutant cells do grow immediately following deciliation and because eventually cells reach near full length in the absence of cycloheximide (Figure 1C). However, we plan to test protein synthesis immediately following deciliation using semiquantitative PCR for the next iteration of this manuscript.

      Finally, we provide several pieces of orthogonal data to suggest that membrane incorporation is a problem in these cells. First, we show that blocking membrane delivery from the Golgi causes increased ciliary shortening in arpc4 mutant cells compared to wild-type cells. This suggests there is an alternate source of membrane that is defective in the arpc4 mutant cells (Figure 3). We go on to show that actin structures that form near the membrane at the base of the cilia are absent in the arpc4 mutant but present in wild-type cells, and that these structures increase following deciliation (Figures 4 and 7). We show that arpc4 mutants are less capable of internalizing membrane than wild-type cells (Figure 5) and less capable of internalizing a ciliary membrane protein for mating (Figure 6). Lastly, we show that the pattern of ciliary membrane proteins that came from the cell body plasma membrane is altered in arpc4 mutants, suggesting defects in at least one pathway (Figure 8). Based on this entire body of data together, in addition to our CK-666 data on wild-type cells with an existing precursor pool, we feel that the best-supported model is one in which a defect in membrane delivery may be responsible for the impact we see on ciliary assembly in cells lacking a functional Arp2/3 complex.

      Other conclusions also were not sufficiently supported by the experimental results, including the following: Clathrin function in endocytosis: Pitstop2 was described as specifically blocking clathrin-mediated endocytosis, but reports in the literature indicate that Pitstop2 also blocks nuclear functions and clathrin-independent endocytosis. Experiments with an anti-clathrin antibody were interpreted as showing mislocalization of clathrin in the arpc4 mutants. The antibody was raised against a peptide near the N-terminus of human clathrin, but the antibody was not validated, and it was not reported whether Chlamydomonas clathrin even has that peptide.

      We understand the concerns regarding the off-target effects of PitStop2, and as this concern was shared by reviewer 2 we have addressed this above.

      If further evidence is needed of endocytosis occurring in these cells, we also have new data suggesting that dynamin is also involved in these processes.

      We thank the reviewers for this comment, and plan to add a western blot confirming the clathrin light chain antibody works in Chlamydomonas in the next version. It is true that mammalian antibodies do not always work in Chlamydomonas. This particular antibody was selected based on the immunogen sequence. The immunogen of this antibody was the most similar we could find to the Chlamydomonas sequence (61.5% similar). Our intent with the included data was to highlight the similarities between both membrane internalization and clathrin behavior whether endocytosis or the Arp2/3 complex are inhibited.

      Internalization of a protein from the plasma membrane: In the experiments to use protease-sensitivity to examine SAG1-HA relocalization induced by db-cAMP, the authors assumed that all of the SAG1-HA was on the cell surface in untreated cells and the 2 chemically treated cells, but they never experimentally documented this assumption.

      We respectfully disagree that our assumption is that all of the SAG1-HA is on the surface prior to induction. In fact, our data support the opposite, that there is indeed some percentage of SAG1-HA on the surface and some percentage of SAG1-HA already inside the cell even in uninduced cells given that some portion is protected from trypsin treatment. Regardless of whether the full population of SAG1-HA is on the surface, we can still draw conclusions about the amount of SAG1-HA internalized under the conditions through quantification of the changes in band intensity. This specific assay can only report on whether there is an internalization defect for a plasma membrane protein destined for the cilium.

      Arp2/3 relation to actin dots: Structures termed actin dots that stained with Phalloidin were reported to undergo changes after de-ciliation of wild-type cells and were missing in the arpc4 mutant. The conclusion that the properties of the dots in the wild-type cells changed with de-ciliation were not supported by statistical analysis. Also, without experiments showing localization of Arp2/3-V5 at the dots, it was not possible to assess whether, as the text asserts, Arp2/3 functioned at the dots.

      Statistics were not done for Figure 4 where we show dots are present in wild-type cells and absent in the arpc4 mutant cells because it is clear even without statistics. We do not feel that including an infinitesimally small p value added any substantive information to this piece of data. However, statistics can and should be included in the next version for Figure 7 which shows an increase in dots (both percentage of cells with dots and number of dots per cell) following deciliation. To fully convince readers, we also plan to include images with a larger field of view showing a population of cells, which will make it clear that dots increase following deciliation.

      We attempted to look at ARPC4-V5 localization in Figure 1 Supplemental Figure 2, but saw mostly diffuse localization with high noise. The ARPC4 component of the Arp2/3 complex may be diffusely localized with only some proportion being incorporated into the complex at sites where it is active, or the diffuse nature could be due to overexpression, or both. While we do not definitively show that overexpressed ARPC4 is at the dots in addition to its diffuse localization, we do not believe that this prevents us from drawing conclusions about Arp2/3 complex function in the dots. We do show that the Arp2/3 complex is required for dot formation as cells lacking a functional Arp2/3 complex do not have dots while cells that contain the Arp2/3 complex do have dots. Further, expression of the ARPC4-V5 construct rescues the presence of the dots.

      Cilia resorption induced by BFA in arpc4 mutants: In the experiments with the Golgi-active agent BFA, the percent ciliation in the arpc4 mutants, but not wild type, fell rapidly after drug addition. The authors concluded that BFA induced ciliary resorption, but they did not determine whether the lack of cilia on cells was a consequence of cilia resorption or cilia detachment.

      As the reviewer points out, the percent ciliation of the arpc4 mutant cells treated with BFA fell rapidly after drug addition. We believe this is due to resorption and not ciliary detachment because we see ciliary length decrease over time (not all at once). Additionally, in Figure 3B, we only measure cells that have cilia, and we know that arpc4 mutant cells cannot regrow their cilia to half-length in 1 hour (Figure 1C). Therefore, we know that the cilia we are measuring are in fact shortening. Further, there was no increase in free cilia in the samples. Images can be provided in the supplement for the next version.

    1. Author response

      Reviewer #3 (Public Review)

      Major concerns:

      1. The most substantive concern is that there is an alternative explanation for the data which must be ruled out in order to conclude that mutations are occurring during the study period. Consider the following scenario. Suppose that B cell clones expanded and diversified through somatic hypermutation prior to the study period (that is, prior to the secondary vaccination event which is the focus of the study). It seems that preferential expansion of highly mutated subclones during the study period could bias detected sequences towards more divergent sequences, even without ongoing somatic mutation during the study period. Preferential expansion of divergent sequences would give rise to higher average divergence as the study period goes on, giving the appearance of accumulation of additional mutations, but in fact these mutations had occurred prior to the study period and are simply more readily detected in the sparsely sampled repertoire sequencing data after their expansion. Far from being simply a pathological counter-example, this scenario seems biologically plausible, given that B cells harboring more divergent, affinity-matured sequences should generally have higher affinity antibodies that allow them to better compete for limited antigen and thus provide stronger division stimulus. This model predicts that some highly divergent sequences exist at early timepoints and would occasionally be detected. An example of this is seen in Fig 3C, where a divergent tip from an early sample time is present (labeled PB and colored blue, in the middle of the diagram), indicating that this divergent sequence was present early. While the authors' model of ongoing evolution is supported, this alternative model also appears to be consistent with the data and must be ruled out in order to conclude that clones are accumulating mutations during the study period, which is the central claim and most interesting and impactful finding of the work. 
The authors must provide evidence that their approach can distinguish between these scenarios. This could potentially be accomplished using simulations of the two scenarios to determine the power of the approach to distinguish between them.

      We agree that this is a potential alternative explanation for a significant positive correlation between divergence and time, and have now addressed this as a possibility in the Results and Discussion sections. We have also removed explicit references to detecting “ongoing SHM” in the text, in favor of terms that more directly reflect what our test detects, such as “B cell evolution” or “increasing SHM frequency”, which do not imply novel SHM over the sampling interval. Nevertheless, we believe our results are more easily explained as a result of ongoing SHM, and have added some text making that point. In the context of influenza vaccination, day 5 plasmablasts represent the breadth of the B cell memory pool. If measurable evolution were due solely to preferential recall, we would expect the divergences of sequences at later timepoints to fall within the range of day 5 plasmablasts. Instead, in the high-GC influenza binding lineages we identified (Fig 3B/C), many late-sampled GC sequences are clearly more diverged than the day 5 plasmablast response. Further, if measurable evolution from influenza vaccination were due simply to preferential re-stimulation of highly mutated B cells, we would expect influenza binding lineages without any GC sequences to be measurably evolving. To test this, we repeated the analysis in Fig 3A using only lineages containing influenza-binding monoclonal antibodies (mAbs). Results were highly consistent with Fig 3A: influenza-binding lineages without GC sequences were less likely to be evolving than those with high proportions of GC sequences (Figure 3 – figure supplement 3). Thus, significant GC involvement, rather than simply binding to influenza, is more predictive of measurable evolution. All of these points are more easily explained if measurable evolution is the result of additional SHM. Nonetheless, because we cannot definitively rule out this alternative explanation, we have highlighted both possible mechanisms of B cell evolution.
      We have included descriptions of this new analysis in the Results (pp. 14-15) and Discussion (pp. 20-21).

      1. Statistical support for measurable evolution appears to be lacking in several key examples. The reported percentages of measurably evolving lineages in several scenarios (7.2% for primary hepatitis B vaccination; 6.5% for allergen-specific immunotherapy; 5.9% for HIV infection) are near the false positive rate of the test (5% of lineages measurably evolving). The authors have performed this test on datasets from ~21 studies, raising a concern that multiple hypothesis testing could give rise to false positives in some of the datasets. These results are interpreted as evidence of measurable evolution, even though they could seemingly be explained by the false discovery rate combined with multiple hypothesis testing. The authors should clarify how these results can be interpreted in light of the false positive rate of their test and multiple hypothesis testing, and must consider whether more conservative conclusions are warranted in these scenarios.

      We appreciate the reviewer’s concern and have added a new section on multiple hypothesis testing to the Discussion detailing these caveats (pp. 19-20), as well as additional details to the relevant Results sections (p. 10). We also repeated our initial germline divergence analysis that used “adjusted” measurably evolving lineages without the multiple testing correction, and found similar results (Figure 2 – figure supplement 4). We also repeated these analyses using a stricter cutoff (adjusted p < 0.05), which also yielded similar results (Figure 2 – figure supplement 4). These are discussed in the main text.

      Minor comments

      1. The authors define measurably evolving populations as systems undergoing mutation and selection rapidly enough to be detected. Despite this definition, the test employed for measuring evolution appears to focus purely on accumulation of mutations without examining selection per se. Mutations accumulating neutrally could be detected as measurable evolution. For the sake of clarity, the authors should explain more clearly whether their test examines selection and ensure that initial definitions are consistent with later usage. It may be interesting to further examine whether the mutations detected as measurable evolution in antibody lineages are neutral or selected using classical tests for selection, such as the dN/dS statistic or summary statistics of the site frequency spectrum.

      We have now made it clearer in the initial definition that measurable evolution does not necessarily require selection. This definition is more in line with the original definition of measurably evolving populations from Drummond et al. 2003: “We define measurably evolving populations (MEPs) as populations from which molecular sequences can be taken at different points in time, among which there are a statistically significant number of genetic differences.”

      1. Statistical support for the association between GC B cells and measurable evolution should be clarified. On p14 L7-8, it is reported that "6.5% of lineages containing sequences from GC B cells were measurably evolving, compared to only 3.7% of lineages with no identified GC sequences." However, this does not constitute convincing evidence for the association because this difference of proportions is not significant. The proportions are 3 lineages measurably evolving among 46 lineages containing sequences from GC B cells, and 4 lineages measurably evolving among 107 lineages not containing sequences from GC B cells. Applying Fisher's exact test for a difference of proportion yields P = 0.43. While the evidence based on the trend in Fig 3A (of fraction measurably evolving against GC sequence percentage) is compelling, the authors should clarify whether each difference of proportions is significant. Providing statistical support for the trend itself, such as through bootstrapping or simulation, would seem most direct.

      We have now included a bootstrap analysis of this relationship to demonstrate its significance.
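The reviewer's P = 0.43 can be checked directly from the counts quoted in the comment (3 of 46 lineages containing GC sequences versus 4 of 107 lineages without). The following is a minimal standard-library sketch of a two-sided Fisher's exact test; the function name `fisher_exact_two_sided` is introduced here for illustration only and is not taken from the manuscript's analysis scripts.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Probability that k of the col1 "successes" fall in row 1.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Rows: lineages with / without GC sequences.
# Columns: measurably evolving / not measurably evolving.
p_value = fisher_exact_two_sided(3, 43, 4, 103)
print(f"two-sided P = {p_value:.2f}")
```

The result should land close to the value reported by the reviewer, which is consistent with the difference between 6.5% and 3.7% not being significant on its own.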

    1. Author Response:

      Reviewer #2 (Public Review):

      The authors benchmark their workflow and analyses using fairly well characterized compounds that are relatively potent against established targets. However, the authors appear to use significantly higher concentrations than the reported activity for these inhibitors and observe relatively few stabilized targets. Similarly, the corresponding measured induced-stabilization fold change at these concentrations often appear to be 1.5-2 fold. For example, SCIO-469 has reported in vitro potencies of ~10nM against MAPK14, ~100nM against MAPK11, with ~1000-fold selectivity over other kinases (including other MAPKs), and cell-based IC50s of ~300nM. However, the authors use 100 micromolar of SCIO-469 in their solvent-PISA profiling experiments, where they observe ~2-fold change for MAPK14 and ~1.5 fold changes for MAPK12 and MAPK9, and MAPK11 does not appear to be detected. This might suggest that solvent-PISA might not be sensitive to detecting stabilization to less-well developed compounds, decreasing its utility to identify targets of bioactive compounds that are less characterized/developed. It would be informative if the authors provided context for the concentrations of the small molecules that they use and provided some assessment of the sensitivity of this approach in regard to required compound potencies/target affinities.

      We thank the reviewer for the rigorous assessment of our manuscript. We agree that the concentrations of compounds used in our experiments are much higher than the respective IC50s. Because any thermal or chemical stability measurement represents the average denaturation point of every copy of an individual protein in a proteome, the ability to detect a meaningful curve shift depends on saturating the target with the ligand. This means that the small molecule concentrations generally need to be high. This is even more important in lysate-based experiments, in which the cellular architecture is lost and the proteome is massively diluted. Moreover, the concentrations we employ are in line with similar studies (Zhang et al., Anal. Chem., 2020, 92, 1363-1371, Gaetani et al., J. Proteome Res., 2019, 18, 4027-4037, Savitski et al., 2014, Science, 346). We also agree that overall the fold changes of the PISA assay are relatively small (1.5-2 fold). In solvent-PISA experiments, the magnitude of the fold changes is affected not only by the ligand concentration and the shift in denaturation point, but also by the denaturation point of the protein and the solvent concentrations selected. We have investigated the relationship between the fold changes and these factors in the PISA assay in a former paper (J. Proteome Res. 2020, 19, 5, 2159–2166). We showed that the fold changes are inherently small in the classic PISA assay, a consequence of the nature of sigmoidal curves. We also showed that optimization of the selected temperatures ameliorated the issue (J. Proteome Res. 2020, 19, 5, 2159–2166). In this manuscript, we also showed in Figure 4C-E that a careful selection of the concentrations improved the magnitude of the fold changes in the solvent-PISA assay. Importantly, although the fold changes are relatively small (1.5-2 fold), TMT-based multiplexed quantitative proteomics is capable of analyzing both control and treated samples in multiple replicates in one experiment, which minimizes the variance and provides robust sensitivity in detecting these fold changes.
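The point that pooled (PISA-style) fold changes are inherently small for sigmoidal denaturation curves, and that the choice of pooled conditions matters, can be illustrated with a toy calculation. The sketch below assumes a two-state logistic curve and treats the PISA readout as the sum of soluble fractions over the pooled denaturant levels (temperatures or solvent concentrations). The midpoint, slope, shift and windows are hypothetical values chosen for illustration, not parameters from the manuscript.

```python
from math import exp


def soluble_fraction(level, midpoint, slope=2.0):
    """Logistic denaturation curve: fraction of protein still soluble."""
    return 1.0 / (1.0 + exp((level - midpoint) / slope))


def pisa_fold_change(levels, midpoint, shift, slope=2.0):
    """Ratio of pooled soluble signal (ligand-treated / control).

    `shift` is the hypothetical ligand-induced change in the midpoint.
    """
    control = sum(soluble_fraction(x, midpoint, slope) for x in levels)
    treated = sum(soluble_fraction(x, midpoint + shift, slope) for x in levels)
    return treated / control


# Hypothetical protein with midpoint 50 and a ligand-induced shift of +3.
wide_window = list(range(37, 68, 3))    # 37, 40, ..., 67
narrow_window = list(range(48, 59))     # 48, 49, ..., 58

fc_wide = pisa_fold_change(wide_window, midpoint=50.0, shift=3.0)
fc_narrow = pisa_fold_change(narrow_window, midpoint=50.0, shift=3.0)
print(f"wide window: {fc_wide:.2f}-fold, narrow window: {fc_narrow:.2f}-fold")
```

In this toy setting, the wide window includes many levels at which both curves are fully soluble or fully denatured, which dilutes the ratio. Restricting the pool to levels around and above the midpoint gives a larger fold change for the same shift.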

      Reviewer #3 (Public Review):

      This is a highly interesting work providing an alternative method for drug target deconvolution for thermal proteome profiling. The experiments are thoroughly performed, and the conclusions are mostly supported by the obtained data. The only conclusion that needs further support is the one of the complementarity of SPP and TPP (as in "these two approaches share much in common, they remain distinct and likely serve to complement one another").

      We appreciate that the reviewer raised this point and are happy to clarify this statement. Commonality – Both approaches rely on protein denaturation to determine target engagement. Furthermore, either approach could be used to screen ~70% of the proteome for ligand-induced changes in thermal or chemical stability. Lastly, both approaches follow a similar workflow, take approximately the same amount of time to prep and measure, and in the end generate similar data (Figure 3). Complementarity – We note that there are certain proteins that denature well in only a single condition and that combining the two approaches allows one to cover a greater fraction of the proteome than either approach individually (Figure 5). Furthermore, the two approaches can also be used to corroborate one another. If one observes a ligand-induced thermal shift, for example, then SPP could be used to provide greater confidence in this hit and vice versa. It seems that our original statement was far too general and made with knowledge of data that had not yet been presented in the manuscript (mainly Figure 5). We modified our conclusion in the main text to more accurately reflect the data presented specifically in Figure 3. The new conclusion is stated below and can be found on pages 10 and 11 of the revised manuscript: “Overall these data suggest that while these two approaches are capable of generating a similar set of putative targets, these lists are not completely identical. Therefore, SPP and TPP appear to be complementary, not only because they can provide independent corroboration but because one method could potentially identify a target that the other might miss.”

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript begins with the larger notion that comparing item similarity is an important principle that guides human behavior, and that these similarity representations can have both a general and an idiosyncratic component. While individual-specific representations have been identified in some visual processing areas using personally meaningful object stimuli or simple stimulus features, this study looks at complex real-world stimuli with no personal meaning. They observe that even using the same stimuli across people, there are differences in how people judge the similarity of same-category items, and these differences correlate with performance on comparing the identities of images presented in sequence. Further, they examine the visual stream – specifically early visual areas like the early visual cortex (EVC) and lateral occipital complex (LOC), and late visual areas like the perirhinal cortex (PRC) and the anterior lateral entorhinal cortex (alErC), to see how representations in the brain relate to these behavioral representations. They observe that while EVC and LOC show correlations with behavior, PRC and alErC really show the strongest links to the individual-specific representation and to fine-grained differences across stimuli.

      The analyses in this paper are very methodologically sound. They rely on well-controlled and well-tested analyses (e.g., testing if representational similarity is indeed higher for comparisons with one's own behavior, versus someone else's). They also replicate their results using classification-based analyses. My only key methodological question is more about experimental design. Given that participants performed the object arrangement task right before entering the scanner, I wonder if similarities in the brain to their own behavior could be due to memory for the representation they created just prior (especially given the role of PRC and alErC in memory). So, if instead, participants were shown and interacted with someone else's similarity arrangement, I wonder if these regions would show more similarity to that other person's arrangement, or still show similarity with one's own representations. It is thus currently unclear if the current findings are due to some deep-seated individual, internal representations, or memory for a recently performed task.

      We thank the reviewer for highlighting numerous strengths in our methodological approach. We note that our experiment was intentionally designed to have participants complete the iMDS sorting task prior to completion of the 1-Back task in the scanner. This ensured that all participants had the same exemplar familiarity during scanning. We cannot rule out the possibility that this order led to priming effects, as suggested by the Reviewer, which may have facilitated the emergence of observer-specific effects. Notably, however, such priming effects could be expected to affect equally all exemplars whose neural representations were probed during scanning. Critically, we also obtained new behavioural evidence for this resubmission, now included in Supplementary Materials, revealing that reports of perceptual similarity in our iMDS task reflect an observer characteristic that is temporally stable rather than just a situational idiosyncrasy. In the follow-up behavioural experiment (Supplementary Figure 2; page 49), “a distinct group of 30 participants completed two sessions of the iMDS task for the 10 object categories, separated by 7 days (+/- 1 day). Correlations were computed between each participant’s perceived similarity Representational Dissimilarity Matrices (RDMs) from Session 1 and from Session 2. The mean within-subject correlation across the two sessions was 0.84, indicating high stability of participants’ perceived similarity ratings one week apart. Inter-subject correlations for perceived similarity ratings across all exemplars and categories were computed between each participant’s Session 1 RDM and the mean RDM (excluding the participant) in Session 2. Mean inter-subject correlation was 0.68. Critically, a paired t-test (intra-subject > inter-subject correlation) confirmed that intra-subject correlations were significantly higher than inter-subject correlations (p<0.0001). This pattern of results indicates that the perceived similarity structure that is unique to the individual observer is a stable characteristic.” This new behavioural experiment provides critical support for the perspective that our findings elucidate individual differences equivalent to those discussed as observer-specific effects in the vision literature more broadly (Mollon, Boston, Peterzell, & Webster, 2017, Individual differences in visual science: What can be learned and what is good experimental practice? Vision Research).

      The results presented in this work are very clear and fit in well with previous findings on idiosyncrasies in visual areas (Charest et al., 2014), and various work on the PRC as it relates to oddball tasks and object representational similarity. One question I am stuck with in this work is whether these current results show us something surprising or new. I'm unsure if we would have expected anything different for these generic real-world stimuli (versus the personally meaningful stimuli, or limited visual features tested previously).

      We acknowledge that the paper by Charest et al., 2014 was an important stepping stone for the work presented in our manuscript. We have modified the framing of our main research questions so as to place more emphasis on levels (or grain) of perceived similarity among category exemplars that are reflected in subjective reports and object representations in different VVS regions. We also place more emphasis on the representational-hierarchical model as the theoretical framework that allows for related predictions and that guides our interpretation. In the interpretation of our results in the Discussion, we also speculate that observer-specific effects in fine-grained similarity perception, and in corresponding representations in PrC and alErC, may reflect interindividual differences in category expertise. Here we make reference to recent behavioural findings (i.e., Collins & Behrmann, 2020; Minos, Ferko, & Kohler, 2021) and present hypotheses about neural representations that can be directly tested in future fMRI studies with training paradigms. Indeed, we are in the process of planning to conduct such follow-up work in our laboratory. 
Inasmuch as the current study (i) revealed a mapping of perceived similarities among exemplars of object categories to representations in PrC and alErC (regions traditionally not considered to be part of the VVS and not included in analyses reported by Charest and colleagues); (ii) was guided by the representational-hierarchical model of VVS organization for interpretation of findings in PrC/alErC vs more posterior regions; (iii) showed an impact of observer-specific perceived similarities on behaviour that was most pronounced for fine-grained discrimination; and (iv) involved computational modeling to help interpret differences between observer-specific and observer-general (i.e., averaged) representations in different VVS regions, we feel that the contribution of our study clearly goes beyond replicating idiosyncrasies in VVS object representations as previously reported in Charest and colleagues’ pioneering work.

      The manuscript frames the study around the idea that similarities are an important guiding principle of behavior. But this statement is not necessarily so obvious to me – is judging similarity itself an important ecological behavior, or is it just that looking at similarity structures can give insight into underlying relationships in how we represent information? (The latter is how I often see these sorts of representational similarity analyses.) What is this similarity task really capturing about our representations for these objects, and why do these idiosyncrasies emerge? My main hesitation about the current work is that I struggle with seeing a scope beyond a replication (e.g., finding behavior-correlated idiosyncrasies in the brain, but with a different stimulus set, and in a slightly similar but expected region). I really want to know what factors are driving these idiosyncrasies (e.g., is it visual? mnemonic? semantic?), and what this implies about the mechanisms of the PRC and ERC.

      This is a really interesting point to consider in the context of prior lesion research on the role of PrC in perceptual discrimination. In this rich literature (see Murray et al., 2007 for review) emphasis has been placed on the perception of similarities between objects as probed with oddity-discrimination tasks, which require observers (human or nonhuman) to judge perceived similarities among multiple objects. Findings of behavioural impairments on this task after medial temporal-lobe lesions that included PrC and ErC have played a key role in the development of the representational-hierarchical model that guides the interpretation of our research. While the results of this lesion research have critically informed theoretical arguments that PrC plays a role in perceptual discrimination of objects (see Bonnen et al., Neuron, 2021, for a recent computationally focused review), it is important to recognize that they do not provide a characterization of similarity structure of neural representations in PrC and alErC, nor a characterization of the transformation of representations from more posterior VVS regions to these regions in the medial temporal lobe. Moreover, lesion findings do not address whether neural representations in the medial temporal lobe capture the perceived similarity structure that is unique to individual observers. Critically, our behavioural results also directly reveal effects of the perceived similarities that observers reported on discrimination performance in the 1-back task we employed during scanning. Inasmuch as this task taps discrimination between exemplars of real-world categories, we would argue that the examination of representational similarity structure in our study also sheds light on ecologically relevant behaviour.

    1. Author Response:

      Reviewer #1 (Public Review):

      This study sought to systematically identify the components and driving forces of transcriptome evolution in fungi that exhibit complex multicellularity (CM). The authors examined a series of parameters or expression signatures (i.e. natural antisense transcripts, allele-specific expression, RNA-editing) concluding that the best predictor of a gene behavior in the CM transcriptome was evolutionary age.

      Thus, the transcriptomes of fruiting bodies showed a distinct gene-age-related stratification, where it was possible to sort out genes related to general sexual processes from those likely linked to morphogenetic aspects of the CM fruiting bodies. Notably, their results did not support a developmental hourglass, which is the rather predominant hypothesis in metazoans, including some analysis in fungi.

      The studies involved analyses of new transcriptomic datasets for different developmental stages (and tissue types in some cases) of Pleurotus ostreatus and Pterula gracilis, as well as the analyses of existing datasets for other fungi.

      There are diverse interesting observations, such as those regarding Allele Specific Expression (ASE), suggesting that in P. ostreatus ASE mainly occurs due to cis-regulatory allele divergence, possibly in fast-evolving genes that are not under strong selection constraints, such as those grouped in the youngest gene age categories. In addition, a large number of conserved unannotated genes among CM-specific orthogroups highlights the rather cryptic nature of CM in fungi and stands out as an important area for future research.

      Some of the key aspects of the analyses would need to be better exemplified such as:

      – Providing a better description of the developmentally expressed TFs only in CM species

      – Providing clear examples of the promoter divergence that could be the underlying mechanism behind ASE. In particular, for some cases, there may be enough information in the literature/databases to predict the appearance or disappearance of relevant cis-elements in the promoters showing the highest divergence in genes depicting the highest levels of ASE.

      We appreciate the constructive comments of the Reviewer and have revised the ms in accordance with the suggestions. In particular, we now link the different parts of the ms better to each other and provide a more detailed discussion of developmentally expressed TFs (lines 615-621). We also provide case studies of ASE genes with cis-regulatory divergence (Figure 5 and see below), although we note that these analyses are based on inferred and not directly determined motifs, so they should be considered preliminary.

      We had considered using TF binding motifs previously, and we have now attempted to analyze potential transcription factor binding sites in divergent promoters. We find that there are no P. ostreatus transcription factors for which motifs based on direct evidence are available; rather, all P. ostreatus motifs are based on extrapolations from experimentally determined motifs (typically in Neurospora crassa). Therefore, to avoid overly general motifs, we used only those in which at least 5 nucleotides show at least 80% expected frequency in the PWMs. This left us with 158 motifs (126 excluded). A high motif binding score (>=4) and self-rate (>=0.9) were also required to exclude false positive hits. Cases of different binding ability and of lack of binding in one of the parental genomes were counted for each promoter. We found that genes with allele specific expression (ASE S2 and S4) show significantly higher differences in motif binding (lacking motifs, or different binding ability) than non-ASE genes (Fig. A1). These observations show that not only promoter divergence, but also differential predicted TF binding ability, is more common among ASE genes than among non-ASE genes. This supports our conjecture that ASE arises from cis-regulatory divergence.

      *Fig A1: The left plot below shows the number of cases when the promoter of one allele of an allele pair in the two parent genomes has, but the other lacks a motif. The right plot shows the same in terms of difference in binding score.*
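The motif pre-filter described above can be sketched as follows. This is a minimal illustration under one reading of the criterion, namely that a motif is kept when at least five PWM positions have a most frequent nucleotide with probability of at least 0.8. The PWM representation (a list of per-position dictionaries), the function name `is_informative`, and the two toy motifs are hypothetical and are not taken from the authors' pipeline.

```python
def is_informative(pwm, min_positions=5, min_frequency=0.8):
    """Return True if enough PWM positions are dominated by one nucleotide.

    `pwm` is a list of dicts mapping 'A', 'C', 'G', 'T' to probabilities,
    one dict per motif position.
    """
    strong_positions = sum(
        1 for position in pwm if max(position.values()) >= min_frequency
    )
    return strong_positions >= min_positions


def position(a, c, g, t):
    """Helper to build one PWM position (illustrative only)."""
    return {"A": a, "C": c, "G": g, "T": t}


# Toy motifs: one with six well-defined positions, one with only three.
toy_motifs = {
    "specific_motif": [position(0.90, 0.03, 0.03, 0.04)] * 6
    + [position(0.25, 0.25, 0.25, 0.25)] * 2,
    "general_motif": [position(0.90, 0.03, 0.03, 0.04)] * 3
    + [position(0.40, 0.30, 0.20, 0.10)] * 5,
}

kept = [name for name, pwm in toy_motifs.items() if is_informative(pwm)]
print(kept)
```

The binding-score and self-rate thresholds mentioned in the response would be applied at the scanning step, which depends on the motif-scanning tool used and is therefore not sketched here.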

      We could find examples, such as the allele specific expression of PleosPC15_2_1031042, a Hemerythrin-like (IPR012312) protein which might be regulated by the conserved c2h2 transcription factor, which contains a zinc finger domain of the C2H2 type (Fig. A2). C2h2 has already been shown by targeted gene inactivation to be important during the initiation of primordia formation (Ohm et al 2011, https://pubmed.ncbi.nlm.nih.gov/21815946/). A binding site of c2h2 was detected in the upstream region of PleosPC15_2_1031042. There is a mismatch in the inferred binding motif which causes a reduced binding score in PC15 (Fig. A2/c). Indeed, the PC9 nuclei contribute more to the total expression of this gene.

      Despite this and other examples (not shown) that we have found, we are not convinced of the reliability of this approach. The analysis rests on many assumptions and sources of uncertainty: the position weight matrices (PWMs) that we used are all based on indirect evidence; these PWMs identify a high number of loci; the position of the binding site relative to the transcriptional start site is uncertain; and the relationship between differences in the binding motif and changes in expression is unclear. We consider that these factors potentially contribute too much noise for the analyses to be robust; therefore, we are hesitant to include these results in the ms.

      *Fig A2: An example of promoter divergence. a) Expression of the c2h2 transcription factor (TF) in P. ostreatus. b) Allele-specific expression pattern of PleosPC15_2_1031042 from the two parental genomes. c) Inferred binding motif of the c2h2 TF and a detected potential binding site in the upstream region of the PleosPC15_2_1031042 gene.*

      Reviewer #2 (Public Review):

      The evolution of complex multicellularity represents a major developmental reprogramming, and comparing related species which differ in multicellular structures may shed light on the mechanisms involved. Here, the authors compare species of Basidiomycete fungi and focus on analyzing developmental transcriptomes to identify patterns across species. Deep RNA-Seq data is generated for two species, P. ostreatus and Pt. gracilis, sampling different developmental stages. The authors report conflicting evidence for a "developmental hourglass" using a weighted transcription index vs gene age categories. There is substantial allele-specific expression in P. ostreatus, and these genes tend to have a more recent origin, have more divergent upstream regions and coding sequences, and are enriched for developmentally regulated transcripts. Antisense transcripts have low overlap with coding regions and low conservation, and a subset show a positive or negative correlation with the overlapping gene. Comparison to a species without complex multicellular development is used to further classify the developmental program.

      Overall the new transcriptional data and extensive analysis provide a thorough view of the types of transcripts that appear differentially regulated, their age, and associated gene function enrichment. The gene sets identified from this analysis as well as the potential to re-analyze this data will be useful to the community studying multicellularity in fungi. The primary insights drawn in this study relate to the dating of the developmental transcriptome, however some patterns observed with young genes and noncoding transcripts are primarily reflective of expected patterns of evolutionary time.

      We appreciate the Reviewer’s kind words on our ms. We think the revised version is substantially improved in many of the aspects listed above.

      Reviewer #3 (Public Review):

      Fungi are unique in forming complex 3D multicellular reproductive structures from 2D mycelium filaments, a property used in this paper to study the genetic changes associated with the evolution of complex 3D multicellularity. The manuscript by Merenyi et al. investigates the evolution of gene expression and genome regulation during the formation of reproductive structures (fruiting bodies) in the Agaricomycetes lineage of Fungi. Transcriptome and multicellularity evolution are very exciting fundamental questions in biology that have only become accessible with recent technological developments and the appropriate analysis framework. Important perspectives include understanding how genes acquire new functions and what role transcriptional regulation plays in adaptation. The study gathers a very useful dataset to this end, and relies on generally relevant hypothesis-driven analyses.

      Analysis of the fruiting body transcriptome in nine species revealed that the prediction from the developmental hourglass model (that young genes are expressed in early and late but not intermediate phases of development) was verified in only a few species, including Pleurotus ostreatus. An allele-specific expression (ASE) analysis in P. ostreatus showed that young genes frequently show ASE during fruiting body development. A comparative analysis with C. neoformans, which reproduces sexually without forming fruiting bodies, indicates that young and old (but not intermediate) genes are likely involved specifically in fruiting body morphogenesis. A number of underlying hypotheses could be presented better, and importantly the connection between the various analyses did not appear obvious to me. Some hypotheses and reasoning therefore need clarification. Some important results from the analyses are not provided and not commented on, although they are required to fully meet the manuscript's objectives.

      We appreciate the Reviewer’s suggestions and have revised the ms as explained below.

      1. I do not clearly see the connection between the developmental hourglass model studied in the first part of the ms, and the allele-specific expression patterns in the second half of the ms. Which "phase" of the hourglass is expected to contain true CM-related genes (by contrast to general sexual processes)? Was P. ostreatus chosen for the ASE analysis because evidence for a developmental hourglass pattern was detected in this species? The conclusion that "evolutionary age predicts, to a large extent, the behaviour of a gene in the CM transcriptome" was established thanks to ASE in P. ostreatus, which was also found to be rather an exception for conforming to the hourglass model of developmental evolution. To what extent is this conclusion transferable to other Agaricomycete/fungal species?

      We chose P. ostreatus because this is the only species for which the genomes of both parental strains (PC9 and PC15) are available. Although the hourglass concept is indeed a central hypothesis in animal developmental biology (though see recent critiques, e.g., Piasecka et al 2013), our results suggest that it simply does not generally apply to fungal development. This may be due to the unique developmental mechanisms of fungi, or the independent origin(s) of CM in fungi. Our ms might have been misleading in this respect; in the revision we clarify that the ASE and hourglass analyses are independent of each other. Our interpretation of the hourglass results is that this model is not, or hardly, applicable to fungal development, and the fact that P. ostreatus was the only species that showed an hourglass pattern did not drive our selection of this species. We inserted a note on this in the ms.

      1. The authors acknowledge that fruiting body-expressed genes may relate either to CM or to more general sexual functions, and that disentangling these functions is a major challenge in their study. An overview of which gene was assigned to which function is not explicit in the ms (proposed to be described in a separate publication). Do these functional gene classes show distinct transcriptome evolution patterns (hourglass model, ASE...)?

      We have made the complete list of CM-related genes and genes with more general sexual functions accessible in Table S2/b-c. Due to length restrictions, we do not discuss each of these genes individually here, but provide gene ontology-based overviews (Fig 8/c-d, from line 631). To answer the question of whether CM-specific vs shared genes show distinct transcriptomic patterns, we analyzed ASE, NATs and the hourglass model separately for CM-specific and Shared genes, as follows:

      -hourglass: We calculated and visualised the TAI for CM-specific and Shared gene sets of P. ostreatus separately. The average TAI value was considerably lower in Shared genes, possibly due to the overrepresentation of ancient genes in this set, but the patterns remained similar to the original, which implies that these patterns are not driven by one or the other gene set alone (Fig A3).

      *Fig A3: Transcriptome Age Index for CM-specific and Shared gene sets of P. ostreatus separately*
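For readers unfamiliar with the measure, the Transcriptome Age Index for a developmental stage is commonly defined as the expression-weighted mean of gene age ranks (phylostrata), so lower values indicate a transcriptome dominated by older genes. The sketch below assumes that standard definition; the exact normalisation used in the manuscript may differ, and the gene ages and expression values are invented toy numbers.

```python
def transcriptome_age_index(phylostrata, expression):
    """Expression-weighted mean gene age for one developmental stage.

    phylostrata: gene age ranks (1 = oldest), one per gene.
    expression:  expression levels of the same genes in that stage.
    """
    if len(phylostrata) != len(expression):
        raise ValueError("phylostrata and expression must have equal length")
    total = sum(expression)
    if total == 0:
        raise ValueError("total expression must be non-zero")
    return sum(ps * e for ps, e in zip(phylostrata, expression)) / total


# Toy example: two old genes (rank 1), one intermediate (5), one young (10).
gene_ages = [1, 1, 5, 10]
stage_expression = {
    "early": [10, 10, 5, 20],
    "mid": [30, 30, 5, 2],
    "late": [10, 10, 5, 20],
}

tai_by_stage = {
    stage: transcriptome_age_index(gene_ages, values)
    for stage, values in stage_expression.items()
}
for stage, tai in tai_by_stage.items():
    print(f"{stage}: TAI = {tai:.2f}")
```

In this toy profile the mid stage is dominated by old genes and therefore has the lowest TAI, which is the shape an hourglass pattern would produce. Computing the index separately on a gene subset (for example, only Shared genes) amounts to passing the corresponding subset of ages and expression values.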

      -ASE: As we detailed in the ms, allele-specific expression occurs mainly in young genes. Indeed, only 13.1% of ASE genes belong to the conserved gene sets (CM-specific: 200 and Shared: 144). Although there are more ASE genes (>2FC) among CM-specific genes, they are still underrepresented compared to young genes that are neither shared nor CM-specific. This indicates that ASE is generally a feature of non-conserved genes and is not particularly characteristic of either conserved or CM-specific genes.

      -NAT: We found that 17.3% of CM-specific (141 genes) and 18.3% of Shared genes (165 genes) overlap with antisense transcripts. Since these numbers do not differ substantially from 17.6%, the proportion of all protein-coding genes that overlap with NATs, this implies that NATs occur independently of these gene conservation groups.

      3.) As far as I understand, major functions of the fruiting body transcriptome are either CM or general sexual functions. Could these genes, notably those showing ASE, play a role in general processes other than sexual development (hyphal growth, environment sensing, cell homeostasis, pathogenicity)?

      Certainly, ASE might also occur in genes related to these processes. However, the processes mentioned by the Reviewer are likely associated with very conserved genes (except pathogenicity, which we can’t examine here) and our results suggest that ASE is more typical of young genes that are under weak selection. We detected ASE in 931/343 (S2/S4) genes expressed in the vegetative mycelium stage of P. ostreatus. We also note that, by the definition of developmentally regulated genes, we do not expect very basic fungal processes, like hyphal growth, to be among the functions of the genes we identified. Genes related to such basic (housekeeping) processes usually (exceptions exist) show flat expression profiles (because they are equally important in mycelia and all fruiting body stages) and will not be picked up by our pipelines for identifying shared developmentally regulated genes.

      1. As stated by the authors, "the goal of this study was to systematically tease apart the components and driving forces of transcriptome evolution in CM fungi". What drives the interesting ASE pattern discovered, however, remains an open question at the end of the ms. The authors appropriately discuss that these patterns could be either adaptive or neutral, but there is no direct evidence for any scenario in P. ostreatus. Is the expression of (some of) the young genes showing ASE required for CM? One or two case studies would provide support for such scenarios.

      We respectfully disagree. We provide evidence that the driving force of ASE is promoter divergence (and consequently differential transcription factor binding) in genes in which it is tolerated (see conclusions, lines 708-712). Our results suggest that ASE is mostly a neutrally arising phenomenon. To get to the mechanistic bases of how promoter divergence can cause ASE (following the suggestion of Reviewer 1), we analysed putative, inferred transcription factor binding motifs in P. ostreatus and found that ASE genes had more divergence in putative TF binding sites. However, it is important to emphasise that all TF motifs we analyzed are inferred motifs and therefore these results are indicative at best.

      Reviewer #4 (Public Review):

      This work develops a comparative framework to test genes which support complex morphological structures and complex multicellularity. This expands beyond simple gene sharing and phylogenomics by incorporating comparison of gene expression profiling of development of multicellular structures during sexual reproduction. This approach tests the hypothesis that genes underlying sexual reproductive structure formation are homologous and the molecular evolutionary processes that control transcriptome evolution which underlie complex multicellularity.

      The approaches used are appropriate and employ modern comparative and transcriptome analyses to examine allele-specific expression and evaluate the evolutionary ages of genes. This work produced additional new RNAseq data to examine developmental processes and combined it with existing published data to contrast fungi with either complex morphologies or yeast forms.

      The strengths of the work are the well-selected comparison organisms and the efforts to include developmental stages that are appropriate comparisons.

      We appreciate the Reviewer’s positive comments.

      A weakness is that, although the NAT descriptions are interesting, it isn't clear how they link directly to morphology variation or development. I am unclear if these arise from new de novo promoters or are ferried by transposable elements, or if any other understanding of their genesis indicates that they are more than very recent gains in a species for the most part and not part of any conserved developmental process (outside a few exemplars).

      Originally, we assayed natural antisense transcripts (NATs) based on the assumption that they regulate developmental processes (e.g. Kim et al 2018 https://doi.org/10.1128/mBio.01292-18). Our analyses showed that although NATs are abundant in CM transcriptomes of fungi, they show no homology across species and so are unlikely to drive the conserved developmental processes that we are after in this ms. Rather, our data are compatible with most (but likely not all) NATs being transcriptional noise, arising from novel or random promoters. We therefore shortened this section and moved much of it to Appendix 1.

      The impact of this work will reside in how gene age intersects with variability and relative importance in CM. It will be interesting to see future work examine the functions of these genes and test how allele-specific expression and specific alleles contribute to the formation of these tissues and growth forms. I am still not sure how high variability in gene expression still produces relatively uniform morphologies; or, if it does not, a quantification of morphological variation would be nice, to link to whether ASE underlies it.

      We agree that allele-specific expression could influence morphologies significantly, but investigating that is beyond the scope of the current work (it would require a population genomics project). More direct evidence on allelic differences can be seen in monokaryon phenotypes, which only express one of the parental alleles. Phenotypic differences are obvious in the mycelium of the two parental monokaryons: the mycelium of PC9 is fluffier and grows faster than that of PC15. This was reported recently by Lee et al 2021 (https://doi.org/10.1093/g3journal/jkaa008). We agree with the Reviewer that this is a very exciting future research direction.

      To my read of the work, the authors achieved their goals and confirmed hypotheses about the age of genes and the variability of gene expression. I still feel there is some clarity lost in whether the findings across the large number of species compared here help inform predictions or classifications of types of genes which either have ASE or are implicated in CM. This is really work for the future, as the authors have provided a detailed analysis and approach that can fuel further direction in this research area.

      To address this issue, we reworked the ms to make the connections between ASE and CM clearer. Because our results suggest that ASE (mostly) arises neutrally, predictions for other species are expected to be hard. On the other hand, we think we can make confident predictions on what types of genes are implicated in CM in other species, at least for conserved aspects of fruiting body development.

    1. Author Response:

      Reviewer #2 (Public Review):

      [...] The strength of such an impressive and comprehensive analysis of large collection of nanobodies lies in the comparison with existing nanobodies. To fully benefit from the publication of this latest collection of nanobodies, the authors should publish all the sequences and have to make the best efforts to provide comparisons with existing nanobodies described in the literature:

      -Values obtained for neutralization potency differ substantially between different techniques and labs. A good reference point for neutralization data is the use of ACE2-Fc, which is commercially available and widely used in earlier publications. It is hard to compare the described nanobodies with the control nanobodies from the literature mentioned (Wrapp et al., Xiang et al.), as they are not identified in detail. Differences between the potency described in the original description and the values determined here are not discussed.

      -Epitope mapping defines a new list of epitope groups, although similar efforts had been undertaken earlier. It would make the comprehensive list of SARS-CoV-2 nanobodies even more helpful if information about representative existing nanobodies targeting the same epitopes is included throughout the paper (in particular taking advantage of determined nanobody-Spike structures). Comparison to nanobodies with structural information would permit more meaningful predictions with regards to the mechanism of action and help explain the synergistic behavior observed.

      -The manuscript substantially refers to previous antibody publications (with important contributions from the institution of the last author). Yet, important earlier publications on SARS-CoV-2 nanobodies are barely mentioned/discussed. Many of these publications defined structures and epitopes, contributed to an understanding of spike activation, and determined modes of neutralization. These publications should be appropriately acknowledged.

      A comprehensive review of this topic would certainly be extremely valuable, but is beyond the scope of this work. However, we have added additional nanobody citations as requested where appropriate throughout the text.

    1. Author Response:

      Reviewer #1:

      The manuscript entitled, "Early evolution of beetles regulated by the end-Permian deforestation" by Zhao et al. is a strong, interesting, and well-written study worthy of publication after revision.

      The authors met their goal of documenting and analyzing the diversity of Paleozoic beetle taxonomy, morphological disparity, ecosystem roles, and phylogeny. This, in my opinion, is the strongest portion of the paper as it brings several lines of evidence to show the high diversity of xylophagous beetles, up until the EPME, followed by a distinct extinction of xylophagous beetles and the expansion of ecological roles into a more modern component of beetles.

      A distinct weakness of the paper is the reliance of correlation between biochemical cycling and the evolution of beetles. To address this, we would ideally see isotopic data associated with these statements. My overall suggestion is to make clear that this is speculative and bring other hypotheses to the table, and hopefully rule them out. It isn't very helpful to say something like xylophagous beetles were the main source of nutrient cycling in the Permian without discussing fungus at greater length. Or similarly, implying a drop in O2 was caused by beetles, without describing any of the other biotic/abiotic things going on at that time.

      We really appreciate these insightful comments, and completely agree that the correlation between carbon cycling and Permian beetles is speculative. We followed the reviewer’s suggestion and toned down the discussion of this correlation. The role of ancient insects in the deep-time forest carbon cycle is unclear, partly because the contribution of extant insects to the decomposition of deadwood is poorly understood. Fortunately, a paper published last month reveals the functional importance of insects in the decomposition of deadwood and the forest carbon cycle (Seibold et al., 2021), and thus provides further support for our conclusion. We added this reference to our paper. Moreover, regarding the change in Permian biochemical cycling, we have also added an introduction to two other hypotheses (reduction in the extent of coal swamps and the evolution of lignin-consuming fungi) and ruled them out in the Discussion.

      Please see lines 237–249:

      “The oxygen concentration of the atmosphere began to rise in the early Palaeozoic, probably with a peak in the Carboniferous and large decline from the beginning of the Permian (Dahl et al., 2010; Berner, 2009; Krause et al, 2018). The reason for this plunge was attributed to a tectonic- or climate-driven reduction in the extent of coal swamps (Berner and Canfield, 1989) or to the evolution of lignin-consuming fungi (Floudas et al., 2012). However, global recoverable coal is only equivalent to a few percent of the oxygen budget in the atmosphere, and thus cannot account for the large drop of atmospheric oxygen (Nelsen et al., 2016). Furthermore, lignin-consuming fungi may have been present before the Carboniferous (Nelsen et al., 2016). Recently, a new geochemical model proposed that the development of Permian terrestrial herbivores may have limited transport and long-term burial of terrestrial organic compounds in marine sediments, resulting in less organic carbon burial and attendant declines in atmospheric oxygen (Laakso et al., 2020).”

      Please see lines 261–264:

      “In extant forest ecosystems, insects may account for 29 percent of the total carbon flux from deadwood and thus they have a functional importance in the decomposition of deadwood and the carbon cycle (Seibold et al., 2021).”

      Please see lines 266–269:

      “Permian beetles had probably evolved close interactions with various microorganisms especially lignin-consuming fungi (Nelsen et al., 2016), which also accelerated the decomposition of deadwood.”

      We have added 7 references.

      Berner RA. 2009. Phanerozoic atmospheric oxygen: new results using the GEOCARBSULF model. American Journal of Science 309: 603–606.

      Dahl TW, Hammarlund EU, Anbar AD, Bond DPG, Gill BC, Gordon GW, Knoll AH, Nielsen AT, Schovsbo NH, and Canfield DE. 2010. Devonian rise in atmospheric oxygen correlated to the radiations of terrestrial plants and large predatory fish. PNAS 107: 17911–17915.

      Krause AJ, Mills BJW, Zhang S, Planavsky NJ, Lenton TM, Poulton SW. 2018. Stepwise oxygenation of the Paleozoic atmosphere. Nature Communications 9: 4081.

      Berner RA, Canfield DE. 1989. A new model for atmospheric oxygen over Phanerozoic time. American Journal of Science 289: 333–361.

      Nelsen MP, DiMichele WA, Peters SE, Boyce CK. 2016. Delayed fungal evolution did not cause the Paleozoic peak in coal production. PNAS 113: 2442–2447.

      Floudas TD, Binder M, Riley R, Barry K, Blanchette RA, Henrissat B, Martínez AT, Otillar R, Spatafora JW, Yadav JS, Aerts A, Benoit I, Boyd A, Carlson A, Copeland A, Coutinho PM, Vries RPD, Ferreira P, Findley K, Foster B, Gaskell J, Glotzer D, Górecki P, Heitman J, Hesse C, Hori C, Igarashi K, Jurgens JA, Kallen N, Kersten P, Kohler A, Kües U, Kumar TKA, Kuo A, Labutti K, Larrondo LF, Lindquist E, Ling A, Lombard V, Lucas S, Lundell T, Martin R, Mclaughlin DJ, Morgenstern I, Morin E, Murat C, Nagy LG, Nolan M, Ohm RA, Patyshakuliyeva A, Rokas A, Ruiz-Dueñas FJ, Sabat G, Salamov A, Samejima M, Schmutz J, Slot JC, John FSt, Stenlid J, Sun H, Sun S, Syed K, Tsang A, Wiebenga A, Young D, Pisabarro A, Eastwood DC, Martin F, Cullen D, Grigoriev IV, Hibbett DS. 2012. The Paleozoic origin of enzymatic lignin decomposition reconstructed from 31 fungal genomes. Science 336: 1715–1719.

      Seibold S, Rammer W, Hothorn T, Seidl R, Ulyshen MD, Lorz J, Cadotte MW, Lindenmayer DB, Adhikari YP, Aragón R, Bae S, Baldrian P, Varandi HB, Barlow J, Bässler C, Beauchêne J, Berenguer E, Bergamin RS, Birkemoe T, Boros G, Brandl R, Brustel H, Burton PJ, Cakpo-Tossou YT, Castro J, Cateau E, Cobb TP, Farwig N, Fernández RD, Firn J, Gan KS, González G, Gossner MM, Habel JC, Hébert C, Heibl C, Heikkala O, Hemp A, Hemp C, Hjältén J, Hotes S, Kouki J, Lachat T, Liu J, Liu Y, Luo YH, Macandog DM, Martina PE, Mukul SA, Nachin B, Nisbet K, O’Halloran J, Oxbrough A, Pandey JN, Pavlíček T, Pawson SM, Rakotondranary JS, Ramanamanjato JB, Rossi L, Schmidl J, Schulze M, Seaton S, Stone MJ, Stork NE, Suran B, Thygeson AS, Thorn S, Thyagarajan G, Wardlaw TJ, Weisser WW, Yoon S, Zhang NL, Müller J. 2021. The contribution of insects to global forest deadwood decomposition. Nature 597: 77–81.

      Below are some more detailed suggestions:

      -It would be helpful to address the evolution of lignin-consuming fungi. Whether or not you can tie fungal symbiosis into the evolution of these beetles, fungal decomposition may (or may not) have accelerated in the Early Permian due to the timeline of particular clades of fungi. Worth a quick sentence or two. See relevant references below.

      Nelsen, M.P., DiMichele, W.A., Peters, S.E. and Boyce, C.K., 2016. Delayed fungal evolution did not cause the Paleozoic peak in coal production. Proceedings of the National Academy of Sciences, 113(9), pp.2442-2447.

      Floudas D, et al. (2012) The Paleozoic origin of enzymatic lignin decomposition reconstructed from 31 fungal genomes. Science 336(6089):1715-1719.

      Kohler A, et al., Mycorrhizal Genomics Initiative Consortium (2015) Convergent losses of decay mechanisms and rapid turnover of symbiosis genes in mycorrhizal mutualists. Nat Genet 47(4):410-415.

      Thank you very much for pointing out this issue. We completely agree with the reviewer that Permian beetles had probably evolved close interactions with fungi, which also accelerated the decomposition of deadwood. We have revised the text and added the references based on the reviewer’s comments and suggestions. Please see comment 1.

      In the concluding paragraph of the Discussion, there is no acknowledgment of modern studies of xylophagous beetles in relation to climate change. There are many studies of the effects of climate change on xylophagous beetles, e.g. the North American pine bark beetles. It might be worth saying that the diversity and abundance of xylophagous beetles are extremely sensitive to climate change and can cause forest collapse too.

      Thank you. We have added a sentence and two new references about extant xylophagous beetles and climate change in the concluding paragraph. Please see lines 328–330:

      “In particular, the diversity and abundance of xylophagous beetles are extremely sensitive to climate change and can also cause forest collapse and carbon cycle disturbance (Kurz et al., 2008; Fei et al., 2019; Šamonil et al., 2020).”

      Kurz WA, Dymond CC, Stinson G, Rampley GJ, Neilson ET, Carroll AL, Ebata T, Safranyik L. 2008. Mountain pine beetle and forest carbon feedback to climate change. Nature 452: 987–990.

      Šamonil P, Phillips JD, Pawlik Ł. 2020. Indirect biogeomorphic and soil evolutionary effects of spruce bark beetle. Global and Planetary Change 195: 103317.

    1. Author Response:

      Reviewer #1:

      This study aims to find the genetic mechanisms underlying sex-ratio distortion through male-killing in Drosophila melanogaster flies infected with the endosymbiont Wolbachia. The endosymbiont carries the prophage WO, which is the center of interest in this study. The key result of this study is that a synonymous mutation in a prophage gene can explain the differences between sex-ratio distorting and non-distorting symbionts. The study uses transgene technology to modify phage genes and to investigate which changes in the gene are involved in the phenotype. The finding that a synonymous SNP plays a key role is not entirely novel in biology, but only a few examples of this type of genotype-phenotype association are known. The study does not include experiments to show that the main finding is not limited to one particular background of the fly line used. An experiment including multiple genotypes would be needed to show this.

      We agree that recapitulating the results in other backgrounds is intriguing and important for establishing a broader role of these findings. We thank the Reviewers and Editor for allowing us to pursue this line of investigation separately from this work, and we now discuss what experiments can be completed to answer these and other questions. We also edited the manuscript to tone down any conclusions that would imply generalizability of the findings at this point. For example:

      "For example, we cannot conclude that the particular codon tested here is responsible for phenotype alterations in other host genetic backgrounds or species. It is possible that this codon plays a functional role only in a singular host genetic context. Here, we changed wmk sequences while holding the host genetic background fixed, but the reverse is required to conclude whether or not the particular codon plays a general role in other genotypes or natural contexts. Second, due to possible coevolution, various codons may or may not yield similar functional effects across different host backgrounds, and additional synonymous sites may contribute to the male-killing phenotype. Thus, the results here illuminate a previously unrecognized need for future research on the functional impacts of synonymous substitutions in endosymbionts. Future work may focus on determining if there is one specific synonymous codon that affects the male-killing function in all cases, if a more general feature exists where alteration of any or a subset of N-terminal or other wmk codons affects function, or if the effect of synonymous changes is specific to this background.”

      Text summarizing the 06/21/2021 query to the Editor and Reviewers for further clarification: We believe there are several reasons why the results can stand on their own, while appropriately acknowledging caveats. First, we note the lack of genetic background testing on previous transgene experiments driving the major discoveries of Wolbachia genes involved in reproductive parasitism. This requirement would therefore hold the current work to a novel bar not previously applied by the field. In addition, the genetic background here is the same as used in previous work on these phenotypes, making it the most pertinent to test and inform previous and ongoing studies by many research groups. Second, the results shown here would still stand no matter the results of genetic background testing and would demonstrate that it is possible for synonymous changes to have functional relevance in the transgenic wmk phenotype. The major findings are still novel in the field, relevant to ongoing studies of reproductive parasitism, and informative regarding one of the most common genetic backgrounds. Finally, we note that two different lines with unique synonymous codon changes (the final experiment) independently created the same result that a synonymous codon change ablates phenotype, providing additional robustness to our findings. Doing additional experiments would be logistically difficult. Barriers include the relocation of the first author of the work to another lab for a postdoctoral position, completion of the funding for the project, remaining institutional COVID-19 restrictions, and lack of replacement personnel in the lab to continue the work. Notably, there is also the non-trivial requirement to create and test new transgene lines that would be costly and take nearly a year to complete (the experiments in the manuscript already took several years and the new fly lines would cost thousands to make).

      The study is mostly clear and easy to follow, but requires a lot of attention. The authors choose to build up the story as I guess it was carried out in the lab. Thus, the reader is guided through every step of the process. While I see that this is appealing from the way the study was carried out, it results in a very long manuscript with a lot of material that would be much better placed in a supplement.

      We thank the reviewer for pointing this out. We shortened the manuscript by removing redundant information and transferring some parts of the results to the supplement. We also removed about three pages of text from the discussion (before adding in new sections as requested by reviewers).

      The introduction seems unfocused. It meanders around, jumping from topic to topic and does not give the reader a sense of where things will go.

      We added a few topics into the Introduction as recommended in other comments, and we edited various portions of the Introduction to connect the ideas together more clearly. We hope the changes are now satisfactory, and we are of course happy to consider further feedback.

      Fig. 1 gives an overview about the different aspects addressed here, but it is not used to guide the reader through the different lines of thought addressed in the introduction. If Fig. 1 will stay (I actually think it is not needed) it should be introduced earlier and used as a road map for the paper. Alternatively, the introduction could stay more general and only in the last paragraph the different ways the system is studied will be summarized.

      We edited the final paragraph of the Introduction to more comprehensively cover the content of the figure and full direction of the paper. For readers not familiar with the biological system or questions, we believe this figure will serve as a gateway to the genetic alterations conducted in the experiments.

      Along these lines, it would be good to have a better reasoning for the combination of experiments conducted. It is left to the reader to understand why certain types of experiments have been done.

      It was not clear to us at the outset of these experiments what results would ultimately emerge and what follow-up experiments would be necessary as our initial hypotheses were proven wrong with many of the surprises from the work. So, there was no a priori reasoning for why experiments were done until we had the results of the previous experiments. We agree that this makes the reading a bit confusing. As such, we clarified the logic flow in the results section as the narrative progresses from experiment to experiment, and we reorganized some of the introduction to improve transition statements and offer a roadmap to readers earlier on.

      On the other hand, the introduction lacks a section on the biology of the phage and its interaction with the host(s). It is hard to understand the biology of the system without an understanding of the insect - Wolbachia - phage interactions. For non-specialists, understanding the role of the three players is essential for the system.

      Thank you for the suggestion. We now add a section introducing phage WO and its relevance to the phenotypes tested here.

      “The wmk gene and two cytoplasmic incompatibility factor (cif) genes that underlie cytoplasmic incompatibility (a parasitism phenotype whereby offspring die in crosses between infected males and uninfected females) occur in the eukaryotic association module (EAM) of prophage WO, which refers to the phage WO genome that is inserted into the bacterial chromosome. The EAM is common in WO phages across several Wolbachia strains and is rich in genes that are homologous to eukaryotic genes or annotated with eukaryotic functions. As such, the expression of reproductive parasitism genes from the EAM and tripartite interactions between phage WO, Wolbachia, and eukaryotic hosts are central to Wolbachia’s ability to interact with and modify host reproduction.”

      The result section could be easily shortened by focusing on the essential experiments. Experiments that do not contribute to the final result can go into the supplement.

      We removed redundant sentences and made some figures supplemental.

      Also the discussion is much too long. I suggest to reduce it to half and focus on the important points and the take-home messages. Currently the discussion follows the way the results are presented in the result section. However, this is not needed. The important finding should be discussed first. Findings that are important in the development of the project, may not be important for the biology of the system overall. And they may not be important for the reader.

      We reordered the discussion to cover the biggest findings first, and removed about a third of the original writing in the discussion.

      Reviewer #2:

      This study aims to unravel the genomic basis of wmk-induced male killing by transgenically expressing homologs of varying relatedness, with synonymous nucleotide changes, and predicted alternative start codons in D. melanogaster flies. The study builds on previous work showing that expression of wmk in fly embryos recapitulates several aspects of male killing. While more distantly related homologs did not induce male killing when expressed in D. melanogaster, more closely related wmk homologs induce either killing of both sexes or male killing only. However, the male-killing phenotype was not due to amino acid differences, but associated with RNA structural differences of the different wmk homologs. In addition, a single synonymous nucleotide change was sufficient to ablate the killing phenotype. These findings suggest that minor and even silent nucleotide differences impact the expression of male killing in D. melanogaster. It is concluded that a new model incorporating the impacts of RNA structure and post-transcriptional processes in wmk-induced male killing needs to be developed.

      The strength of the study lies in the systematic and carefully controlled approach to quantify the phenotypic effects of both sequence and structural changes to various wmk homologs for inducing the male-killing phenotype. The phenotypic impact of minor changes to the wmk homologs, including sequence variation, silent nucleotide changes, and RNA structural differences, was dissected and quantified in detail. This approach reveals a complex genotype-phenotype relationship, but highlights the importance of including post-translational processes. The data are novel in that previous work has largely ignored structural changes and assumed that synonymous differences in codons have no effect on protein function, whereas the current study, based on updated codon optimization algorithms, reveals that this assumption is incorrect. The finding highlights the importance of also considering structural genetic variation for phenotypic expression differences. This suggestion is further corroborated by the lack of difference in wmk homologue expression levels, indicating that the functional differences are due to post-translational effects.

      We thank the reviewer for the thoughtful comments.

      There are limitations to the findings of this complex genotype-phenotype relationship. The current study only examined the phenotypic impact by expressing the different homologs in one D. melanogaster genetic background. Given the variability of the phenotypic pattern revealed by minor changes to the wmk homologs, it will be critical to repeat some of the main findings in other D. melanogaster genotypes to determine the importance of the variation in the wmk homologs more generally. It is entirely plausible that the observed changes in the effect and strength of killing are due to an interaction between host and wmk genotype. This has implications for unravelling the underlying genetic basis of the male-killing phenotype more widely. It has yet to be demonstrated whether wmk is involved in male killing in natural populations, and to what extent there are shared patterns and mechanisms of male killing induced by other bacterial endosymbionts such as Spiroplasma.

      We addressed this point in more detail above in the first response to the comments from Reviewer 1.

    1. Author Response:

      Reviewer #1 (Public Review):

      Garcia-Souto, Bruzos, and Diaz et al. analyzed hemic neoplasia in warty venus clams at multiple sites throughout Europe. They identified cases of disease in two locations, in Galicia and in the Mediterranean. They then use Illumina sequencing to discover that the samples with cancer DNA had reads which mapped to the mtDNA reference sequences from a different clam species in the same family, suggesting a cross-species transmissible cancer. By mapping reads to both the V. verrucosa and C. gallina mitogenomes they showed that more reads mapped to C. gallina in cancer samples compared to matched host tissue samples, and this was consistent across the whole mitogenome. Phylogenetic analysis of mtDNA genes of the host and cancer samples as well as identification of SNVs at a short region of one single-copy nuclear locus suggest that all cancer samples come from a single C. gallina transmissible cancer clone. All data agree that a single lineage of cancer from C. gallina is responsible for all identified cancers in V. verrucosa.

      There are a few sections where there are either unclear methods or the methods do not quite match the descriptions of the results.

      1. Regarding mapping of reads to different reference Cox1 sequences (for Figure 2a): "Then, we mapped the paired-end reads onto a dataset containing non-redundant mitochondrial Cytochrome C Oxidase subunit 1 (Cox1) gene references from 137 Venerid clam species." I do not see where this is explained anywhere in the methods, where this list of references comes from, or what is in it.

      Answer: We retrieved a dataset of 3,745 sequences comprising all the barcode-identified venerid clam Cox1 fragments available from the Barcode of Life Data System (BOLD, http://www.boldsystems.org/). Redundancy was removed using CD-HIT (Fu, et al. 2012), applying a cut-off of 0.9 sequence identity, and sequences were trimmed to cover the same region. Whole-genome sequencing data from both healthy and tumoral warty venus clams were mapped onto this dataset, containing 118 venerid species-unique sequences, using BWA-mem, filtering out reads with mapping quality below 60 (-q60) and quantifying the overall coverage for each sequence with samtools idxstats. PCR primers were designed with Primer3 v2.3.7 (Koressaar, et al. 2018) to amplify a fragment of 354 bp from the Cox1 mitochondrial gene of V. verrucosa and C. gallina (F: CCT ATA ATA ATT GGK GGA TTT GG, R: CCT ATA ATA ATT GGK GGA TTT GG). PCR products were purified with ExoSAP-IT and sequenced by Sanger sequencing.

      Action: We have included this new information in the methods section.
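
      As an illustration of the final quantification step described in the answer above, the following minimal sketch parses `samtools idxstats` output (reference name, length, mapped reads, unmapped reads) and ranks the Cox1 references by mapped reads per kilobase. The reference names and counts in `EXAMPLE` are invented for illustration and are not data from the study. Normalising by reference length is also an assumption, since the response only states that idxstats was used to quantify coverage.

```python
# Hypothetical idxstats output (tab-separated): name, length (bp), mapped, unmapped.
# These rows are made up for illustration; they are not results from the study.
EXAMPLE = (
    "Venus_verrucosa_Cox1\t500\t40\t0\n"
    "Chamelea_gallina_Cox1\t500\t100\t0\n"
    "Dosinia_exoleta_Cox1\t400\t0\t0\n"
    "*\t0\t0\t12\n"
)


def parse_idxstats(text):
    """Return {reference: (length, mapped_reads)} from idxstats text."""
    counts = {}
    for line in text.strip().splitlines():
        name, length, mapped, _unmapped = line.split("\t")
        if name == "*":  # summary row for reads without a reference
            continue
        counts[name] = (int(length), int(mapped))
    return counts


def reads_per_kb(counts):
    """Normalise mapped read counts by reference length (reads per kilobase)."""
    return {
        name: 1000.0 * mapped / length
        for name, (length, mapped) in counts.items()
        if length > 0
    }


def best_reference(counts):
    """Name of the reference with the highest length-normalised read count."""
    rpk = reads_per_kb(counts)
    return max(rpk, key=rpk.get)


if __name__ == "__main__":
    parsed = parse_idxstats(EXAMPLE)
    for name, value in sorted(reads_per_kb(parsed).items(), key=lambda kv: -kv[1]):
        print(f"{name}\t{value:.1f}")
```

      In practice the input would be the idxstats table produced after mapping and filtering at mapping quality 60, one table per sample, so that healthy and tumoral tissues can be compared reference by reference.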

      1. Regarding de novo assembly of mitogenomes: "Hence, we employed bioinformatic tools to reconstruct the full mitochondrial DNA (mtDNA) genomes in representative animals from the two species involved....Then, we mapped the paired-end sequencing data from the six neoplastic specimens with evidence of interspecies cancer transmission onto the two reconstructed species-specific mtDNA genomes." In contrast to this, the methods say, "Then, we run MITObim v1.9.1 (Hahn, Bachmann, & Chevreux, 2013) to assemble the full mitochondrial genome of all sequenced samples, using gene baits from the following Cox1 and 16S reference genes to prime the assembly of clam mitochondrial genomes." It is unclear which method was used.

      Answer: In total, we performed whole-genome sequencing on 23 samples from 16 clam specimens, comprising eight neoplastic and eight non-neoplastic animals, using Illumina paired-end libraries with a 350 bp insert size and 150 bp reads. First, we assembled the mitochondrial genomes of one V. verrucosa (FGVV18_193), one C. gallina (ECCG15_201) and one C. striatula (EVCS14_02) specimen with MITObim v1.9.1 (Hahn, et al. 2013), using gene baits from the following Cox1 and 16S reference genes to prime the assembly of clam mitochondrial genomes: V. verrucosa (Cox1, with GenBank accession number KC429139; and 16S: C429301), C. gallina (Cox1: KY547757, 16S: KY547777) and C. striatula (Cox1: KY547747, 16S: KY547767). These draft sequences were polished twice with Pilon v1.23 (Walker, et al. 2014), and conflicting repetitive fragments from the mitochondrial control region were resolved using long-read sequencing with Oxford Nanopore Technologies (ONT) on a set of representative samples from each species and tumours. ONT reads were assembled with Miniasm v0.3 (Li 2016) and corrected using Racon v1.3.1 (Vaser, et al. 2017). Protein-coding genes, rDNAs and tRNAs were annotated on the curated mitochondrial genomes using the MITOS2 web server (Bernt, et al. 2013), and manually curated to fit ORFs as predicted by ORF-FINDER (Rombel, et al. 2002). Then, we employed the entire mitochondrial DNAs of V. verrucosa (FGVV18_193) and C. gallina (ECCG15_201) as “references” to map reads from individuals with neoplasia, filter reads matching either mitogenome, and assemble and polish their two (healthy and tumoral) mitogenomes individually as above. Further healthy individuals were later sequenced and their mitogenomes assembled, to further investigate the geographic and taxonomic spread of this neoplasia.

      Action: We have included this information in the methods section (page 21-22), and in the results (pages 7 and 8). mtDNA annotations are now shown in Supplementary Figure 3. Nucleotide data for the mitochondrial DNA assemblies has been uploaded to GenBank under accession numbers MW662590-MW662611 and will be released upon publication or request.

      There is one minor claim which may not be fully supported by the data: the statement that, "The analysis of mitochondrial and nuclear gene sequences revealed no nucleotide divergence between the seven tumours sequenced." If I am understanding the filtering of the SNVs from the nuclear gene correctly, only the presence or absence of the 14 SNVs that were fixed within each of the two species were analyzed. Therefore, it is unclear whether the authors looked for any additional somatic mutations within the cancer lineage that would have occurred at other positions. For mitochondria, the authors state that sequences were "extracted from paired-end sequencing data," but it is not explained how this was done. The data suggest that there are no differences between cancer samples in the 13 coding genes and 2 rDNA genes, but data on possible SNVs in the intergenic regions is not shown.

      Answer: We obtained a preliminary nuclear assembly using short reads only. As expected, the resulting assemblies are fragmented and incomplete. This has limited the identification of candidate regions shared by the three genomes (V. verrucosa and both Chamelea clams). Out of the 44 candidate nuclear fragments we tested, only two (DEAH12 and TFIIH) turned out to give good PCR products, adequate for Sanger sequencing. As mentioned above, we now provide additional data on a second gene (TFIIH), identified and selected on the same basis as DEAH12. We find 14 and 15 sites, respectively, for the DEAH12 and the TFIIH loci, with fixed SNVs (allele frequency >95%) that allowed us to discriminate between the three relevant species (V. verrucosa, C. gallina and C. striatula) and the tumour. These diagnostic nucleotides were then used to filter the reads from individuals with neoplasia harbouring both DNAs. Variation within the host lineage but not within the tumour was found along the nuclear DNA fragments employed in the ML phylogenies (see figure below).

      Figure. Molecular phylogenies based on the two selected nuclear markers. (a) DEAH12 gene and (b) TFIIH gene, and diagnostic loci discriminating among species and tumour. Bootstrap support values (500 replicates) from ML analyses above 50 are shown above the corresponding branches. Note all diagnostic nucleotides are identical between tumours (black dots).
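
      The idea of diagnostic sites with fixed alleles (allele frequency >95%) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' pipeline: the toy alignments, the function names, and the rule that a site is diagnostic when every group has a fixed allele and at least two groups differ are all hypothetical choices made for this example.

```python
from collections import Counter


def fixed_allele(column, min_freq=0.95):
    """Return the allele whose frequency exceeds min_freq in a column, else None."""
    calls = [base for base in column if base in "ACGT"]
    if not calls:
        return None
    base, count = Counter(calls).most_common(1)[0]
    return base if count / len(calls) > min_freq else None


def diagnostic_sites(alignments, min_freq=0.95):
    """Find positions where every group has a fixed allele and the groups differ.

    alignments: {group_name: [aligned sequences of equal length]}
    Returns {position: {group_name: allele}} using 0-based positions.
    """
    length = len(next(iter(alignments.values()))[0])
    sites = {}
    for pos in range(length):
        alleles = {}
        for group, seqs in alignments.items():
            allele = fixed_allele([seq[pos] for seq in seqs], min_freq)
            if allele is None:
                break  # polymorphic or missing in this group: not diagnostic
            alleles[group] = allele
        else:
            if len(set(alleles.values())) > 1:
                sites[pos] = alleles
    return sites


def assign_sequence(seq, sites):
    """Assign a sequence to the group matching most diagnostic alleles (None on a tie)."""
    scores = Counter()
    for pos, alleles in sites.items():
        for group, allele in alleles.items():
            if seq[pos] == allele:
                scores[group] += 1
    ranked = scores.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None
    return ranked[0][0]


# Toy alignments, invented for illustration only.
TOY = {
    "V_verrucosa": ["ACGTA", "ACGTA", "ACGTT"],
    "C_gallina": ["ATGTA", "ATGTA", "ATGTA"],
}

if __name__ == "__main__":
    sites = diagnostic_sites(TOY)
    print(sites)
    print(assign_sequence("ATGTA", sites))
```

      In the toy data, position 1 separates the two groups, while position 4 is polymorphic within one group and is therefore not treated as diagnostic.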

      Regarding the mtDNA, we first assembled the mitochondrial genomes of one V. verrucosa (FGVV18_193), one C. gallina (ECCG15_201) and one C. striatula (EVCS14_02) specimen with MITObim v1.9.1 (Hahn, et al. 2013). Then, we employed the entire mitochondrial DNAs from V. verrucosa (FGVV18_193) and C. gallina (ECCG15_201) as “references” to map reads from individuals with neoplasia, filter reads matching either mitogenome, and assemble and polish their two (healthy and tumoral) mitogenomes individually as above. Further healthy individuals were later sequenced and their mitogenomes assembled, to further investigate the geographic and taxonomic spread of this neoplasia. Despite the usefulness of the mitochondrial control region (CR) to detect differences among lineages, we refrained from using it for two reasons. (1) The CR shows considerable variation in both length and sequence among the three species, making their alignment difficult (in fact, previous phylogenetic studies based on whole mitochondrial DNA sequences in Veneridae excluded the CR: https://doi.org/10.1111/zsc.12454), and (2) the CR contains quasi-but-not-identical tandem repeats, as in other mollusks (i.e., the Venerid Dosinia clams https://doi.org/10.1371/journal.pone.0196466 or the Littorina marine snails https://doi.org/10.1016/j.margen.2016.10.006). In our case, repeats are larger than the short-read insert size, and even though we could infer them by means of long-read sequencing, polishing the resulting consensus sequences to overcome the intrinsic error rate of those reads would yield inconclusive results, hindering the comparison between normal and tumoral haplotypes.

      Action: We updated the methods for the mitochondrial DNA analyses (pages 21-22, 24) and the nuclear DNA analyses (page 23). We now include new data in the results and discussion (pages 9-10).

      Reviewer #2 (Public Review):

      In rare but well-documented instances, certain types of cancers can transmit horizontally. These transmissible cancers have a clonal origin and have adapted to bypass allorecognition. A form of marine leukemia (hemic neoplasia or HM) belongs to this class of transmissible cancers and has been detected in several bivalve species (oysters, mussels, cockles and clams). Although HM mostly propagates within the same bivalve species, instances of cross-species transmission have been reported. To better understand the mode of transmission of HM, Garcia-Souto et al. analysed mitochondrial DNA (mtDNA) by next generation sequencing in different bivalve species collected in the Mediterranean Sea and the Atlantic Ocean. The authors found that HM isolated in Venus verrucosa contained mtDNA that actually matched Chamelea gallina. Analysis of the nuclear gene DEAH12 also showed single nucleotide polymorphisms (SNPs) matching C. gallina DNA. Based on mtDNA and DEAH12 sequences, the authors use Bayesian inference to generate phylogenetic trees showing that HM found in V. verrucosa is much closer to C. gallina than the host species. They conclude that HM propagated from C. gallina to V. verrucosa.

      Overall, the study is well performed with enough samples analysed. The results are quite convincing but there are also some concerns.

      1. Transmissible cancers are known to split into clades based on mtDNA differential rate of evolution and also to incorporate mtDNA from exogenous sources, so one has to be extra careful that the results prove cross-species transmission and not HM divergence into two clades and/or exogenous acquisition. Samples HM ERVV17-2997 and EMVV18-376, both at the N1 stage, appear devoid of C. gallina mtDNA and do not appear to have been screened for DEAH12. One explanation for this result is that there are too few HM cells in the samples (but supplementary Figure 1 shows some HM cells in ERVV17-2997). However, a different explanation is that these samples contain V. verrucosa mtDNA. ERVV17-2997 and EMVV18-376 could have been analysed in greater depth to verify that they also contained C. gallina mtDNA and typical DEAH12 SNPs.

      Answer: Despite the high sequencing coverage obtained for the sequenced individuals, we did not find foreign reads in the N1 tumours (ERVV17-2997 and EMVV18-73) at either the mitochondrial or the nuclear (i.e., DEAH12, TFIIH) level. This is most likely due to a very low proportion of neoplastic cells in their tissues.

      Action: We have added a sentence on page 8 that discusses this issue.

      1. To strengthen their argument, the authors could have analysed a few more nuclear genes for specific SNPs, although the sensitivity of this approach will depend on the depth of sequencing.

      Answer: We obtained a preliminary nuclear assembly using short reads only. As expected, the resulting assemblies are fragmented and incomplete. This has limited the identification of candidate regions shared by the three genomes (V. verrucosa and both Chamelea clams). Out of the 44 candidate nuclear fragments we tested, only two (DEAH12 and TFIIH) turned out to give good PCR products, adequate for Sanger sequencing. As mentioned above, we now provide additional data on a second gene (TFIIH), identified and selected on the same basis as DEAH12. Individual ML phylogenies for these two fragments showed that tumours cluster together and separately from the host species and, in the case of DEAH12, closer to C. gallina. The MSC phylogeny was rebuilt including this new nuclear fragment. In addition, we conducted a comparative screening of tandem repeats on the genomes of C. gallina and V. verrucosa. Two DNA satellites, namely CL4 and CL17, of, respectively, 332 and 429 bp monomer size, were very abundant in C. gallina and in the tumoral animals, but absent from all healthy V. verrucosa specimens. FISH probes designed for these satellites mapped on the heterochromatic regions, mainly in subcentromeric and subtelomeric positions, of both C. gallina and the neoplastic metaphases found in V. verrucosa, but were absent from the normal metaphases of the host species V. verrucosa. These results were consistent with the genomic abundance of these satellites in the NGS data and strongly suggest that these chromosomes derive from C. gallina.

      Action: We include the analysis of one additional nuclear locus, TFIIH (pages 9-10). We have obtained new ML and MSC phylogenies including this new locus (pages 9-10, figures 3b-c). An additional FISH analysis targeting satellite DNAs CL4 and CL11 was performed (page 10, figure 3d, supplementary figure 5). The methods section has been updated accordingly (pages 20-21, 23-24).

      1. It would have been interesting to have more information in the Discussion on the potential immunological barriers that this tumour needs to overcome for cross-species transmission.

      Answer: At a glance, we could argue that this transmissibility, within or across species, is prone to occur in bivalves due to their filter-feeding system and the fact that their immune system is not entirely developed and is yet to be completely understood, as the reviewer may know. Also, it would be tempting to suggest that some genetic restrictions might be in place that allow cancer contagion only between close taxa, but unfortunately we do not have the means to state that with our current data.

      Action: At this point, no specific action has been taken for this query. However, we are happy to include something in the discussion if the reviewer still thinks this is relevant for improving the manuscript.

    1. Author Response:

      Reviewer #3 (Public Review):

      INaR is related to an alternative inactivation mode of voltage-activated sodium channels. It was suggested that an intracellular charged particle blocks the sodium channel alpha subunit from the intracellular space in addition to the canonical fast inactivation pathway. The putative particles identified were the sodium channel beta4 subunit and fibroblast growth factor 14. However, abolishing the expression of either protein does not eliminate INaR. Therefore, as recently suggested by several authors, it is conceivable that INaR is not mediated by a particle-driven mechanism at all. Instead, these and other proteins might bind to the pore-forming alpha subunit and endow it with an alternative inactivation pathway, as envisioned in this paper by the authors.

      The main experimental findings were (1) The amplitude of INaR is independent of the voltage of the preceding step. (2) The peak amplitudes of INaR are dependent on the time of the depolarizing step but independent of the sodium driving force. (3) INaT and INaR are differentially sensitive to recovery from inactivation. Based on their experimental data, the authors put forward a kinetic scheme that was fitted to their voltage-clamp patch-clamp recordings of freshly isolated Purkinje cells. The kinetic model proposed here has one open state and three inactivated states, two states related to fast inactivation (IF1, IF2) and one state related to a slower process (IS). Notably, IS and IF are not linked directly in the kinetic scheme.
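
      To make the structure of such a kinetic scheme concrete, here is a minimal sketch of the inactivation part of a Markov model with one open state (O), two fast-inactivated states (IF1, IF2) and one slow-inactivated state (IS), with no direct link between IS and the IF states. It is not the authors' model: closed states and voltage dependence are omitted, the reverse transitions are included only for illustration, and every rate constant below is a hypothetical value chosen for the example.

```python
STATES = ("O", "IF1", "IF2", "IS")

# Hypothetical rate constants (1/ms) at a single fixed voltage.
# Note that there is no direct transition between IS and IF1/IF2.
RATES = {
    ("O", "IF1"): 2.0,
    ("IF1", "O"): 0.05,
    ("IF1", "IF2"): 0.5,
    ("IF2", "IF1"): 0.1,
    ("O", "IS"): 0.2,
    ("IS", "O"): 0.02,
}


def step(occupancy, rates, dt):
    """Advance state occupancies by one forward-Euler step of the master equation."""
    change = {state: 0.0 for state in occupancy}
    for (src, dst), k in rates.items():
        flux = k * occupancy[src] * dt
        change[src] -= flux
        change[dst] += flux
    return {state: occupancy[state] + change[state] for state in occupancy}


def simulate(start, rates, dt=0.001, t_end=10.0):
    """Return the list of occupancy dictionaries from t = 0 to t_end (ms)."""
    occupancy = dict(start)
    trace = [occupancy]
    for _ in range(int(round(t_end / dt))):
        occupancy = step(occupancy, rates, dt)
        trace.append(occupancy)
    return trace


if __name__ == "__main__":
    all_open = {"O": 1.0, "IF1": 0.0, "IF2": 0.0, "IS": 0.0}
    final = simulate(all_open, RATES)[-1]
    for state in STATES:
        print(f"{state}: {final[state]:.3f}")
```

      Because every flux leaving one state enters another, total occupancy is conserved, which is a quick sanity check for any scheme of this kind.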

      In my humble opinion, the proposed kinetic model fails to explain important experimental aspects and falls short of relating to the molecular machinery of sodium channels, as outlined below. Still, it is high time to advance the concepts of INaR. The new experimental findings of the authors are important in this respect, and some ideas of the new model might be integrated in future kinetic schemes. In addition, the framework of INaR is not easy to grasp, given the many experimental findings in the literature. Likely, my review also falls short in some aspects. Discussion is much needed and appreciated.

      INaT & INaR decay: The authors stated that the decay speed of INaT and INaR is different and hence different mechanisms are involved. However, at a given voltage (-45 mV) they have nicely illustrated (Fig. 2D and in the simulation Fig. 3H) that this is not the case. This statement is also not compatible with the Markov model used. That is because (at a given voltage) the decay of both current identities proceeds from the same open state. Apparent inactivation time constants might be different, though, due to the transition to the on state.

      We apologize that the language used was confusing. Our suggestion that there is more than one pathway for inactivation (from an open/conducting state) is based on the observation that the decay of INaT is biexponential at steady-state voltages. In the revised manuscript, we point out (lines 546-549) that, at some voltages, the slower of the two decay time constants (of INaT) is identical to the time constant of INaR decay. We also discuss how this observation was previously (Raman and Bean, 2001) interpreted.

      Accumulation in the IS state after INaT inactivation in IF1 and IF2 has to proceed through closed states. How is this compatible with current NaV models? The authors have addressed this issue in the discussion. The arguments they have brought forward are not convincing to me, since toxins and mutations grossly impair channel function.

      Thank you for this comment. We would like to point out that, in our Markov model, Nav channels may accumulate in IS through either the closed state or open state. This requires, of course, that Nav channels can recover from inactivation prior to deactivation. While we agree that toxins and mutations can grossly impair channel function, we think these studies remain crucial in revealing the potential gating mechanisms of Nav channel pore-forming subunits, and how these mechanisms may vary across cell types that express different combinations of accessory proteins.

      Fast inactivation - parallel inactivation pathways: Related to the comment above, the motivation to introduce a second fast-inactivated state IF2 is not clear. Using three states for inactivation would imply three inactivation time constants (O->IF1, IF1->IF2, O->IS), which are indeed partially visible in the simulation (Fig. 3). However, experimental data of INaT inactivation seldom require more than one time constant for fast inactivation. Importantly, the authors do not provide data on INaT inactivation of the model in Fig. 3. Fast inactivation is mapped to the binding of the IFM particle. In this model, at slightly negative potentials, IF1 and IF2 change from absorbing states to dissipating states. How is this compatible with the IFM mechanism? Additionally, the statements in the discussion are not helpful: either a second time constant is required for IF (two distinct states, with two time constants) or not.

      We thank this Reviewer for this comment. We tried to develop the model based on previous data on Nav channel inactivation. Indeed, much experimental data exists for the fast inactivation pathway (O -> IF1). As we noted in the discussion, without the inclusion of the IF2 state, we were unable to fully reproduce our experimental data, which led us to add the IF2 state. As with all model development, we balanced the need to faithfully reproduce the experimental data with efforts to limit the complexity of the model structure. In addition, as noted in the Methods section, our routine is an automatic parameter optimization routine that seeks to minimize the error between simulation and experiments. We can never be sure that we have found an absolute minimum, or that the optimization did not get stuck at a local minimum when simulating without inclusion of IF2. In other words, there may be a parameter set that sufficiently fits the data without inclusion of IF2, but we were unable to find it. As a safeguard against local minima, we used multistarts of the optimization routine with different initial parameter sets. In each case, we were unable to find a sufficiently acceptable parameter set.
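
      The multistart safeguard mentioned above can be illustrated with a toy problem. The sketch below is not the authors' optimization routine: the one-parameter error function, the simple derivative-free local search, and the evenly spaced starting points (random draws are an equally common choice) are all assumptions made to show why a single start can settle in a local minimum while several starts recover a lower one.

```python
def toy_error(x):
    """Hypothetical error surface with a local minimum near x = 0.96
    and a lower (global) minimum near x = -1.04."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def local_descent(f, x0, step=0.1, tol=1e-6, max_iter=10000):
    """Derivative-free local search: move to an improving neighbour,
    otherwise halve the step, until the step falls below tol."""
    x, fx = x0, f(x0)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for candidate in (x - step, x + step):
            fc = f(candidate)
            if fc < fx:
                x, fx = candidate, fc
                improved = True
                break
        if not improved:
            step /= 2.0
    return x, fx


def multistart(f, starts):
    """Run the local search from every start and keep the best result."""
    return min((local_descent(f, x0) for x0 in starts), key=lambda result: result[1])


if __name__ == "__main__":
    single_x, single_err = local_descent(toy_error, 2.0)
    starts = [-3.0 + 6.0 * i / 11 for i in range(12)]
    best_x, best_err = multistart(toy_error, starts)
    print(f"single start: x = {single_x:.3f}, error = {single_err:.3f}")
    print(f"multistart:   x = {best_x:.3f}, error = {best_err:.3f}")
```

      As the response notes, multistarts reduce the risk of reporting a local minimum but cannot prove that the best parameter set found is the global one.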

      We agree with this Reviewer that at slightly negative potentials (compared to strong depolarizations), channels exit the IF1 state at different rates, although we would point out that channels dissipate from the IF1 state (accumulating into IS1) under both conditions (see Figure 8B-C). This requires the binding and unbinding of the IFM motif to occur with some voltage sensitivity. We believe this to be a possibility in light of evidence that suggests IFM binding (and fast inactivation) is an allosteric effect (Yan et al., 2017) and evidence showing that mutations in the pore-lining S6 segments can give rise to shifts of the voltage-dependence of fast inactivation without correlated shifts in the voltage-dependence of activation (Cervenka et al., 2018). However, it remains unclear how voltage-sensing in the Nav channel interacts with fast- and slow-inactivation processes.

      Due to space constraints in Figure 3, we did not show a plot of INaT voltage dependence. However, below, please find the experimental data (points), and simulated (line) INaT in our model.

      Differential recovery of INaT & INaR: Different kinetics for INaT and INaR are a very interesting finding. In my opinion, these data are not compatible with the proposed Markov model (and the authors do not provide data on the simulation). If INaT1 and INaT2 (Fig. 5 A) have the same amplitude, the occupancy of the open state must be the same. I think there is no way to proceed differentially to the open state of INaR in subsequent steps unless, e.g., slow inactivated states are introduced.

      Thank you for bringing up this important point. The differential recovery of INaT and INaR indicates there are distinct Nav channel populations underlying the Nav currents in Purkinje neurons. We make this point on lines 632-635 of the revised manuscript. Because our Markov model is used to simulate a single channel population, we do not expect the model to reproduce the results shown in Figure 5. We have now added this point to the Discussion section on lines 637-640.

      Kinetic scheme: Comparison with the Raman-Bean model is a bit unfair unless the parameters are fitted to the same dataset used in this study. However, the authors have an important point in stating that this model could not reproduce all aspects of INaR. A more detailed discussion (and maybe analysis) of the states required for the models would be ideal, including recent literature (e.g., J Physiol. 2020 Jan;598(2):381-40). Could the Raman-Bean model perform better if an additional inactivated state is introduced? Are alternative connections possible in the proposed model? How ambiguous is the model? Given my statements above, is a second open state required? Finally, a better link of the introduced states to the NaV structure-function relationship would be beneficial.

      These are all excellent points. We absolutely agree; it was/is not our intention to “prove” that the Raman-Bean model does not fit our dataset (as you mention, with proper refinement of the parameters, some of the data may be well fit). In fact, qualitatively we found the Raman-Bean model quite consistent with our dataset (which is an excellent validation of both the model, and our data). It was our intention to show (in Figure 7) that there is good agreement between the Raman-Bean model and our experimental data for steady state inactivation (C), availability (D), and recovery from inactivation (E). While we find the magnitude of the resurgent current (F) to be markedly different than the Raman-Bean data, we now note this to likely be due to the large differences in the extracellular Na+ concentrations used in voltage-clamp experiments (lines 440-444). Our models, however, specifically differ in our parallel fast and slow inactivation pathways (Figure 7H). As seen in the Raman-Bean model, in response to a prolonged depolarizing holding potential, there is negligible inactivation, as the OB state remains absorbent until the channel is repolarized. This is primarily because the channel must transit through the Open state on repolarization. We find distinctly different behavior in our data. As seen in the experimental data shown in 7H, despite a prolonged depolarization, Nav channels begin to inactivate and accumulate in the slow inactivated state without prerequisite channel opening. This behavior is impossible to fit in the Raman-Bean model, given the topological constraint of the model requiring a single pathway through the open state from the OB state.

      To that point, it is also unlikely that the addition of inactivated states to the Raman-Bean model would help fit this new dataset. Indeed, the Raman-Bean model contains 7 inactivated states. If there were a connection between OB ->I6, it is possible that direct inactivation (bypassing the O state) may help. Again, however, it is not our intention to discredit the Raman-Bean model, nor is it our intention to improve the Raman-Bean model. With new datasets, a fresh look at model topology was undertaken, which is how we developed our proposed model.

      This Reviewer astutely points out a known limitation of Markov (state-chain) modeling; it is impossible to tell uniqueness, or ambiguity of the model (both with parameters as well as model topology). Following the results of Menon et al. 2009 (PNAS vol. 106 / #39 / 16829 – 16834), in which they used a state mutating genetic algorithm to vary topologies of a Markov model, our group (Mangold et al. 2021, PLoS Comp Bio) recently published an algorithm to distinctly enumerate all possible model structures using rooted graph theory (e.g. all possible combinations of models, rooted around a single open state). What we found (which is not entirely surprising) is that there are many model structures and parameter sets that adequately fit certain datasets (e.g., cardiac Nav channels).

      Therefore, the goal is never to find the model (indeed we don’t propose that we have done so), but rather to find a model with acceptable fits to the data and then use that model to hypothesize why that model structure works, as well as to hypothesize higher dimensional dynamics. We make these points in the revised manuscript (lines 591-597).

      We did not specifically explore the impact of a second open state in our modeling and simulation studies, but we would certainly agree that a model with a second open state may recapitulate the dataset.

    1. Author Response:

      Reviewer #1:

      The study is elegantly done, with the outstanding questions clearly laid out and the results presented in a clear and informative fashion. I have only a few suggestions to strengthen some of the results.

      1) Determination of layers: The CSD based method used to determine the layers seems a bit ad hoc, although other studies have often used a similar approach. Some histological evidence would be great. If that is not possible, the authors should provide some more details to determine the layer specificity. For example, where were the supragranular-granular and granular-infragranular borders for different penetrations (i.e., which electrode(s) marked these boundaries)? These could be expressed as fractions of the shaft length, and from that, we would approximately know the depth. Also, were these results affected by how the CSDs were smoothed?

      We thank the Reviewer for the suggestions. Using CSD from individual penetrations to define the position of laminar compartments is a strategy that has been used by several different laboratories, not only in primary visual cortex (e.g. Mitzdorf, 1985; Poort et al., 2016; Bijanzadeh et al., 2018), but also in extrastriate areas, such as other studies in V4 (Nandy et al., 2017; Lu et al., 2018, Pettine et al., 2019; Ferro et al., 2021), and in other cortical areas, for example in medial temporal cortex (Takeuchi et al., 2011). We include a new figure that shows the similarities between the CSD profiles from different studies (Figure 1-figure supplement 3). Figure 1-figure supplement 3A shows the population average of the CSD in our data. The similarity between this panel and the individual examples shown in Figure 1 and Figure 1-figure supplements 1 and 2 further highlights the fairly consistent sink-source patterns observed across individual penetrations in our data. Figure 1-figure supplement 3B shows that this pattern is also consistent with that found in macaque V4 in another laboratory (Pettine et al., 2019). Figure 1-figure supplement 3C shows that this pattern is not specific for V4, but also occurs in other areas, such as the medial temporal cortex (Takeuchi et al., 2011).

      Importantly, in the latter study Takeuchi et al. were able to verify histologically, after applying electrolytic marks, that the prominent current sink with short latency (white star in Figure 1-figure supplement 3C) corresponds to the granular layer, thus consistent with CSD analysis in V1 (Mitzdorf, 1985). Thus, although we are not able to perform such histological verification in our study (one animal has been euthanized, and the other animal is destined to take part in another project), the very similar sink-source patterns between these different studies (Figure 1-figure supplement 3A-C), including work from others that has been verified histologically, give us confidence that we can meaningfully use them to assign electrode contacts to granular, superficial and deep layers, as other laboratories have done (e.g. Lu et al., 2018). In addition to the new Figure 1-figure supplement 3, we added language in the corresponding paragraph in Results to explain this better. We clarified in Results that interpretable CSD maps were found in 81 out of 88 penetrations and that only those were used in the laminar analysis (which was already indicated in Methods in the previous version of the manuscript).

      As suggested by the Reviewer, we have added the positions of the electrode contacts to the CSD maps in Figure 1 and Figure 1-figure supplement 2 (labels along ordinate on the right of the panels). The electrode contacts on the probe covered 3.1 mm (32 contacts with 0.1 mm distance between adjacent contacts), thus the full depth of the cortex was covered even though the vertical position of the probe varied between penetrations (also because a layer of granulation tissue develops over time between the artificial dura and the pial surface). Therefore, to aid in estimating the depth of the individual penetrations, we indicate the position of the most superficial contact on which multiunit activity was recorded (solid black triangles in Figure 1; Figure 1-figure supplements 1,2; for all cases where this contact could be identified, i.e. if the most proximal contact on the probe did not show multiunit activity). The average position of this contact is shown on the population CSD map (solid black triangle on Figure 1-figure supplement 3). The advantage of a method that assigns layers for a penetration based on data from the same penetration, as we have used here, as opposed to a method that assigns compartments based on depth derived from a population average, is that the former helps to avoid errors due to variations in probe position, and due to a variable degree of tissue compression that may occur for different penetrations.

      To test whether the results were affected by how the CSDs were smoothed, we performed the layer assignment separately using different methods of smoothing. To ensure that we did not bias the results, we blinded ourselves to the original layer assignment when applying each method. We find that the laminar position of >97% of well-isolated units was identical to that obtained when using the standard procedure of smoothing the CSDs. This resulted in robust latency differences of border ownership signals between layers, irrespective of which smoothing method was used. These results are presented in a new supplementary figure (Figure 2-figure supplement 3).
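
      As a minimal illustration of the kind of comparison described here, the sketch below estimates a CSD depth profile as the negative second spatial difference of the field potential across contacts, with and without a three-point smoothing step. The function names, the smoothing weights and the omission of the conductivity factor are assumptions made for the example; they are not taken from the manuscript's Methods.

```python
def smooth3(values, weights=(0.23, 0.54, 0.23)):
    """Three-point weighted smoothing along depth (weights are an assumed, Hamming-like choice).

    Returns a list that is two elements shorter than the input.
    """
    w0, w1, w2 = weights
    return [w0 * values[i - 1] + w1 * values[i] + w2 * values[i + 1]
            for i in range(1, len(values) - 1)]


def csd_profile(potentials, spacing_mm=0.1):
    """Negative second spatial difference of the potential at one time point.

    `potentials` holds one value per contact, ordered by depth.
    The tissue conductivity factor is omitted, so units are arbitrary.
    """
    if len(potentials) < 3:
        raise ValueError("need at least three contacts")
    return [-(potentials[i + 1] - 2 * potentials[i] + potentials[i - 1]) / spacing_mm ** 2
            for i in range(1, len(potentials) - 1)]


# Toy depth profile for a 32-contact probe with 0.1 mm spacing.
toy = [(0.1 * i) ** 2 for i in range(32)]
raw_csd = csd_profile(toy)
smoothed_csd = csd_profile(smooth3(toy))
```

      Comparing the sink positions found in `raw_csd` and `smoothed_csd` for real recordings is one way to check whether a layer assignment depends on the smoothing step.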

      2) Another important factor is the orthogonality of the penetrations. This can also be better quantified based on the variation of the RF centers with depth.

      We followed the Reviewer’s suggestion and evaluated orthogonality of the penetrations by computing the distance D between receptive field centers along the probe (Methods). We show this metric for the vertical positions of receptive field contours shown for the penetrations in Figure 1H,I and Figure 1-figure supplement 1, and describe the population data in the first paragraph of Results (median (IQR) 0.83°/mm (1.00°/mm), for all 81 penetrations that were included in the laminar analyses). This indicates that the variation is small relative to the average diameter of the receptive fields (7.36°), and that the deviation from orthogonality is limited.

    1. Author Response:

      Reviewer #2:

      The manuscript by Podinovskaia focuses on a new method to visualize and measure endosome maturation in common cell lines by enlarging early endosomes. This was achieved by producing acute insult to the cells by ionophore treatment, leading to budding of abnormally large post-Golgi vesicles that fuse with early endosomes. Endosome maturation of these enlarged endosomes containing Golgi-derived cargo (GalT) proceeds with apparently normal kinetics, ultimately leading to lysosomal delivery. Taking advantage of this assay, the authors investigate Rab5-to-Rab7 conversion, acquisition and loss of PI3P, acquisition and loss of Snx1 on apparent endosomal subdomains, interaction of early and late endosomes with Rab11-positive recycling endosomes, and lumenal pH changes. The new maturation model presented here will likely be quite useful to the field with continuing impact. The current state of the endosome field in many ways remains fragmentary, with various processes studied extensively in isolation, but with little information on their relative timing and potential interactions as endosomes mature. This new assay should help understand the relationships between these processes, some of which are investigated in this manuscript.

      Concerns:

      1) The data and conclusions related to Rab11 interaction with early endosomes in Fig 8 are not convincing. There are simply too many Rab11 endosomes in the cell to know if their short term proximity indicates meaningful interaction with the early endosomes, or if the data simply reflects random collisions of small recycling endosomes with the enlarged early endosomes. No data is presented to show that the interactions are meaningful, e.g. that recycling cargo transfer occurs during these interactions. Conclusions from this analysis are overstated.

      We now provide more evidence for the interaction of Rab11 vesicles with the enlarged endosomes. We made movies with shorter intervals (2 sec instead of 1 min) between the individual frames. These data clearly show that this is not an accidental bumping into an endosome but rather that Rab11 vesicles can circle around endosomes and stay for several minutes (Video Fig. 8A, supplement 2 and 3).

      In addition, we imaged TfR-GFP together with mApple-Rab5. These data show that TfR-GFP positive vesicles bud off from mApple-Rab5 positive endosomes and that the GFP fluorescence intensity goes down over time in enlarged endosomes. These data are consistent with recycling of TfR to the plasma membrane. Moreover, CDMPR-GFP, which cycles between the TGN and endosomes, was found to be present on Rab5-negative enlarged structures, which then turned Rab5-positive and subsequently lost the CDMPR signal. Importantly, those endosomes could regain CDMPR, which we interpret as acquisition from the TGN. These data may indicate that the TGN-endosome shuttle is intact after nigericin washout (Fig. 9).

      That TfR and CDMPR are really transported out of the enlarged endosome is also supported by the contrasting behavior of GalT-GFP, which stayed in the enlarged endosome and whose signal intensity did not drop significantly.

      2) Lack of information on endocytic cargo acquisition by the enlarged early endosomes: to really establish this endosome maturation model the authors would need to establish if the enlarged endosomes contain endocytosed cargo, as opposed to Golgi-derived cargo, and determine how long it takes to acquire such cargo. This could be accomplished using Tf, EGF, or perhaps dextran at early timepoints after nigericin washout.

      As described above, we now show that TfR-GFP is present in enlarged endosomes and is lost from these endosomes over time (Fig. 9A,D,G).

      Additionally, we performed experiments with dextran-Alexa647 and nanobody-tagged surface TfR to show that endocytosed material from the plasma membrane indeed reached the enlarged endosomes (Fig. 3, figure supplement 1 and 2). Quantification of TfR signal at the enlarged endosomes demonstrates that TfR acquisition by the enlarged endosome takes place as soon as the enlarged compartment becomes Rab5-positive. This was also observed with the nanobody-tagged surface TfR and endocytosed Dextran-AF647, representative examples of which are provided (Fig 3, figure supplement 1 and 2). The quantification for the latter experiments was not carried out due to the very short time range during which asynchronous Rab5 recruitment events needed to be captured after addition of nanobody/Dextran pulse-and-chase.

      3) Figure 7 - It was not convincing that data in panels F and G are different from each other.

      We agree with the reviewer that the difference between the data presented in panels F and G is not very big. These panels represent the average of many endosomes, and with the averaging the differences between the individual traces cancel out. The process is asynchronous and thus in this case the individual traces are more telling than the averaged traces. Nevertheless, we decided to keep the average traces in the manuscript because they highlight the asynchronous nature of the process. We modified the text to make this point clear.

      4) Figure 11 - it is unclear how we can interpret this as connected to Rab conversion when even the labeled compartments at the earliest time point in the ccz1 knockout have abnormally high pH, and during the time-course even the last timepoint for ccz1 KO is higher than that of the earliest timepoint for WT.

      We agree that the ccz1 KO cells display higher endosomal pH than WT cells throughout the time-course.

      However, the cells in which we express the rescue plasmid of Ccz1 also have apparently less acidified endosomes, even though Ccz1 can still drive Rab conversion, and the pH dropped at an intermediate rate, when comparing rescued cells to control and ccz1 KO cells. Even in ccz1 KO cells endosomal traffic down the degradation pathway is not completely blocked, similarly to what we observed for sand-1 (-/-) in C. elegans and Mon1a/b knockdown in mammalian cells (Poteryaev et al. 2010). Acidification eventually will occur, but it is massively slowed down; the molecular basis of which is still under investigation in our lab.

      We think that in the absence of Ccz1, a condition under which Rab conversion is severely impaired, acidification cannot occur at a normal rate. As pointed out by reviewers 1 and 2, the pH is already higher in the ccz1 KO cells than in the control condition. However, in the rescue condition, the YFP/CFP ratio is not that different from the knockout and yet acidification can occur at an intermediate rate. Why the YFP/CFP ratio under rescue conditions is at a similar level to that of the KO is not entirely clear. It is conceivable that too much Ccz1 also has a negative effect. Moreover, it has recently been shown that ccz1 KO cells accumulate free cholesterol in the enlarged endosomes (van den Boomen et al., Nat. Commun., 2020). The transient expression might not be sufficient to rescue this accumulation phenotype or other secondary effects. Nevertheless, the v-ATPase appears to maintain its function because lysosomes can acidify in ccz1 KO cells, albeit with a delay (Figure 13).

      5) Figure 12 - The criteria used to determine which GalT structures are Golgi or lysosomes seems questionable. Morphology alone is not sufficient to identify the compartments with high accuracy, especially after perturbation. Also, it is unclear to what extent GalT-CFP labels lysosomes without nigericin treatment.

      To address these issues, we co-labelled cells with lysotracker. GalT-CFP (pHlemon) and lysotracker showed a very high degree of co-localization. These data are included in the manuscript (Fig. 10B).

    1. Author Response:

      Reviewer #1:

      In this manuscript, the authors make use of next-generation sequencing to provide a preliminary inventory of tribe Metriorrhynchini, a hyperdiverse group of beetles with intricate systematics mainly due to likely morphological convergence of their Millerian rings. The authors provide an admirable sampling within Africa, Asia and Oceania, with about 700 successfully sampled localities and thousands of specimens.

      The main result of the manuscript is the curated database of Metriorrhynchini that will be useful in future research. In addition, different statistical methods are used to provide an idea of the undescribed species within the tribe, the astonishing species richness in New Guinea or the use of phylogenomic data to explore major phylogenetic relationships. However, some of the author's claims should be questioned:

      • Surprisingly, the authors rely on a very low threshold to identify mOTUs (2% in the manuscript). The authors refer to Hebert et al. (2003) and Eberle et al. (2020) to justify the threshold, but still, they are likely overestimating the number of mOTUs and thus treating as putative species what may be different populations. Figure S17 provides estimates of mOTUs with different thresholds (1 to 10%), under which the estimates rapidly decrease (a decrease of 25% in mOTUs is found when 6% is considered). This is still an overwhelming sampling effort, but a more realistic estimate.

      • I think the phylogenomic tree did not receive the required attention (for example, the FcLM analysis is barely mentioned).

      • It is not clear why should be important to mention the "person-months of focused field research" across the manuscript. Each study group has a unique sampling technique (also not found in the manuscript), preferred localities or traits, which make comparisons impossible. The authors' effort is remarkable, but it is not an important result/finding to be highlighted all over the manuscript.

      Many thanks for all comments and suggestions that pointed to the weak parts of our argumentation. We modified the manuscript accordingly and added some references that can be used for the justification of some claims.

      We addressed the question of thresholds for species number estimations. Now, two thresholds are considered in the manuscript as relevant for discussion: 2% and 5%. We added further information on our previous studies dealing with integrative species delimitation in Metriorrhynchini. Some of them were not referenced in the earlier version (to avoid self-citations), and we also expanded information on the evidence given in the study which we had already referenced (Bocek et al. 2019). The earlier comparison of nextRAD, mtDNA and morphology-based delimitation of species in Eniclases (the trichaline clade in the present study) showed that many well-defined species (nextRAD and morphology) have highly similar mtDNA and split only recently; possibly, some introgression or incomplete lineage sorting affects the mtDNA signal. If we applied a 5% threshold to this group, we would delimit as a single species two entities which differ in body size, coloration and the relative size of male eyes (diurnal and nocturnal activity in putative sister species). In this way, we would decrease the number of species in our analyzed sample of Eniclases by 40%, in clear contrast with the number based on morphology and nextRAD data. We found similarly rapid morphological diversification in other metriorrhynchines (Jiruskova et al., 2019; Kalousova & Bocak, 2017; and other taxonomic studies not referenced here, which have shown that closely related species have well-diversified male genitalia and often belong to different mimetic rings). To limit our discussion, we do not reference our earlier nextRAD study showing speciation in another subfamily of net-winged beetles within a single mountain range (Bray & Bocak 2016). This study also supports morphological differentiation in species with highly similar mtDNA.
      We have now noted in the manuscript that, until taxonomic revisions are produced, our claim is provisional; we therefore modified the text as proposed and present the lower numbers of species as a realistic possibility.
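
      The effect of the threshold on the number of delimited units can be illustrated with a small sketch. It assumes aligned sequences of equal length, uncorrected p-distances and single-linkage clustering; the function names and the toy sequences are hypothetical, and the sketch is not the delimitation pipeline used in the study.

```python
def p_distance(a, b):
    """Uncorrected proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def count_motus(seqs, threshold):
    """Number of single-linkage clusters at the given distance threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= threshold:
                parent[find(i)] = find(j)
    return len({find(i) for i in range(len(seqs))})


# Toy data: two sequences 3% apart and one distant sequence.
toy = ["A" * 100, "CCC" + "A" * 97, "A" * 80 + "G" * 20]
at_2_percent = count_motus(toy, 0.02)
at_5_percent = count_motus(toy, 0.05)
```

      In this toy example the two sequences that differ by 3% are kept apart at the 2% threshold and merged at 5%, which is the kind of lumping discussed for Eniclases above.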

      Phylogenetic relationships: We added additional information on the congruence with earlier studies to Results and Discussion, but we still do not describe details. The main reason is that morphology must be studied to delimit and formally name new taxa, and morphology is out of the scope of this work (except for some information provided in Supplementary Text: description of delimited generic groups and subtribes). The FcLM analysis addressed only the relative position of the leptotrichaline and procautirine clades. Both clades are monophyletic and morphologically distinct, and no conclusion is based on their relative position. We noted that without further data we are unable to robustly resolve their positions. Provisionally, we prefer the deeper position of the leptotrichalines (61%, a not very convincing phylogenomic signal).

      Quantification of sampling effort: As proposed, we excluded the consideration of person-months as a measure of relative collecting effort in various regions and added justification for the field research methods.

      Reviewer #2:

      Conservation efforts must be evidence-based, so rapid and economically feasible methods should be used to quantify diversity and distribution patterns. The principal objective of this study is to demonstrate how biodiversity information for a hyperdiverse tropical group can be rapidly expanded via targeted field research and large-scale sequencing. The authors have attempted to overcome current impediments to the gathering of biodiversity data by using integrative phylogenomic and three mtDNA fragment analyses. As a model, they sequenced the Metriorrhynchini beetle fauna, sampled from ~700 localities in three continents. The species-rich dataset included ~6,500 terminals, >2,300 putative species, more than a half of them unknown to science. It is an amazing finding. Their information and phylogenetic hypotheses can be a resource for higher-level phylogenetics, population genetics, phylogeographic studies, and biodiversity estimation. At the same time, they want to show how limited the taxonomical knowledge is and how this lack is hindering biodiversity research and management.

      Thanks for your comments on our study. We agree with your specific recommendations and have modified the manuscript accordingly.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors propose several ways of leveraging single-particle tracking experiments to distinguish between intracellular phase separation and an alternative model of clustered binding sites. The first proposed scheme is particularly intuitively appealing: in the binding site scenario, the local density of binding sites both increases particle density and slows effective particle diffusion, leading to a definite relationship between these two quantities, while the phase separation scenario would not necessarily couple these two quantities. The additional schemes based on particle movement near a cluster boundary, angles between consecutive steps, and search times add to the arsenal of potential analysis tools. Overall, the work is timely, rigorous, and generally clearly presented and given the growing list of reported observations of phase separation, will appeal to a broad audience.

      We thank the referee for the positive attitude towards our work and are happy for the insightful comments that have increased the value of this paper.

      Reviewer #3 (Public Review):

      Membraneless condensates have recently become a central focus of the molecular and cellular biophysics communities. While the dominant paradigm for their formation, liquid-liquid phase separation (LLPS), has been well established in a number of cases for large, optically resolved droplets, there are significant concerns regarding the generality of this mechanism for smaller foci or puncta, and other mechanisms have been proposed to explain their formation. The problem is that it is very difficult to distinguish experimentally between these mechanisms for sub-optical resolution condensates. In this article, Heltberg et al propose a novel method, based on the analysis of single molecule tracks, that allows discriminating between the liquid phase model (LPM) and one of the challenger mechanisms, the "polymer bridging model" (PBM). This method relies on the statistics of individual displacements - diffusion, radial displacements, angular changes - which are shown theoretically to exhibit different signatures for the two models. With realistic data this is sufficient to discriminate between the models: for instance in the case of double strand break foci (DSB), building on a recent work by some of the same authors, this article convincingly rules out the PBM in favor of the LPM. The authors also investigate the influence of these two models on the search time to reach a specific small target - a commonly invoked role of condensates - and show that only the LPM substantially accelerates this, which could provide additional means to experimentally discriminate between the mechanisms, on top of the intrinsic interest of this finding.

      This article is a welcome addition to the literature in this field, as it will help clarify the nature of these condensates, in particular below the optical resolution. It is well-written, interesting and the conclusions are justified. I particularly appreciate the effort to employ simulated data that are realistic for actual experiments, which strengthens the claims of applicability. Some aspects of the data analysis and of the modeling, however, are insufficiently discussed and would need to be clarified / expanded.

      1) The modeling is made under the assumption of thermal equilibrium, without further discussion. The authors should comment on why this is reasonable, in particular in view of the presence of active fluctuations and of chemical reactions in these condensates.

      First of all, the experimental measurements are carried out after the formation of the foci, and the time of observation (tens of seconds) is small compared to the lifetime of foci (tens of minutes). Therefore we can assume that these measurements are not affected by the effects of formation and disruption of foci. Secondly, the data extracted to compute the results of Figure 2 (in particular for Figure 2H) are not very sensitive to the active fluctuations, since we derive an average diffusion coefficient inside and outside of the focus as well as a free energy difference between an inside and outside level. It is indeed very likely that the soup of proteins that forms the focus is active; however, Rad52 is not involved in chemical reactions at the timescales we are looking at and may therefore be considered passive. This is supported by our investigations of the experimental results, where we have not seen any statistical differences as a function of the time of measurements, and we have no reason to believe that active fluctuations affect the diffusivity of Rad52 on the observed timescales. Regarding binding sites, they may also diffuse actively along with the genome and chromatin, but we describe this by an effective description of the motion of Rad52 on short time scales, so that active effects are folded into an effective diffusivity (left as a free parameter).

      We want to highlight this issue as well as present our arguments of why this description is valid for the experiments considered in this work. We have added text between Eq. 6 and Eq. 7 summarizing the arguments outlined above.

      2) How is the diffusivity measured? Are these measures corrected for experimental error (e.g. using three-point estimators)?

      Estimates of the diffusion coefficients in Miné-Hattab et al. 2021 were obtained in different ways. Our main method is to generate the displacement histogram and then estimate the number of different diffusion coefficients in the population based on likelihood fitting and KS testing. We then take all the traces and identify those that we are certain belong to the slowest diffusion coefficient. These traces are the ones in the focus, but by doing it this way we do not depend on the position of the boundary, nor do we need to decide which traces are in the focus based on their position. Then we compute the MSD curves for this distribution of slowly diffusing molecules, and fit the diffusion coefficient based on a confined fit (which has a good p-value). This method is strong since we are fitting a slow diffusion population and typically can reject traces belonging to the fast diffusion coefficient. We also include the possibility of separating traces if the molecule goes from inside the focus to outside or the other way round. The alternative way we calculated the diffusion coefficient was based on the microscopy data, where we “cropped” all the traces that could be visibly identified as being inside the focus. This method had the strength that we could visibly follow all traces, but the drawback that we could mistakenly identify molecules as being inside the focus when they could be under or above the focus, as discussed in the section above. However, both methods yielded similar results. It is also based on these methods that we extract the size of the focus.

      In order to clarify this important point, we have added two sentences in the caption to Table I, describing how the diffusion was measured in the experimental paper, and added a new paragraph about experimental measurements in Materials and Methods. In addition, we have clarified in the caption of Fig. 2F that we extract the maximum likelihood value of D̃(r) in each radial segment.
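
      For readers less familiar with MSD curves, the sketch below computes a time-averaged MSD for a single two-dimensional trace and a short-time diffusion estimate. It assumes equally spaced frames and uses the free-diffusion relation MSD(t) = 4Dt at the first lag; it does not reproduce the confined fit or the likelihood-based classification of traces described above, and the function names are hypothetical.

```python
def time_averaged_msd(track, max_lag):
    """Time-averaged mean squared displacement of one 2D trace.

    `track` is a list of (x, y) positions at equally spaced frames.
    Returns a list whose element k is the MSD at lag k + 1 frames.
    """
    if max_lag < 1 or max_lag >= len(track):
        raise ValueError("max_lag must be between 1 and len(track) - 1")
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [(track[i + lag][0] - track[i][0]) ** 2 +
                   (track[i + lag][1] - track[i][1]) ** 2
                   for i in range(len(track) - lag)]
        msd.append(sum(squared) / len(squared))
    return msd


def short_time_diffusion(msd, frame_interval):
    """Free-diffusion estimate in 2D from the first lag: D = MSD(dt) / (4 dt)."""
    return msd[0] / (4.0 * frame_interval)


# Deterministic toy trace moving one unit per frame along x.
toy_track = [(float(i), 0.0) for i in range(10)]
toy_msd = time_averaged_msd(toy_track, 3)
```

      For a confined population, the MSD curve saturates at long lags, which is why a confined fit rather than this linear relation is appropriate for molecules inside the focus.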

      3) The conditioning of the averages should be discussed, e.g. in Eq. 13: I assume that it is in the Ito convention? Similarly for the angle changes.

      We assume that the density of the binding sites follows a radial distribution, with no significant angular dependency. Thus the average displacement ⟨dr⟩ is computed as a function of the initial position of the particle and averaged over all initial displacements with similar radial positions. It is indeed formulated in the Ito convention, which is why the “spurious” term appears in the first term of the second line of equation 13.

      To clarify that we are using Ito convention, we have stated that we are using Ito convention for this paper, just before the introduction of eq. 1. We have furthermore clarified in the section related to eq. 13 and the section related to the distribution of the angles that we use the initial position when calculating the difference between the two connected points.

    1. Author Response:

      Reviewer #1:

      George Elias et al investigated the response of a cohort of individuals to Hepatitis B vaccination and analysed the role of preexisting vaccine-reactive CD4+ memory T cell receptors in the immune response. They found that the presence of these cross-reactive receptors elicits a faster and stronger response in the vaccinees. This is an extremely interesting result, as it suggests that a better understanding of the immune receptor repertoire of an individual can be used to predict and analyse its response to vaccination.

      Strengths:

      The study presents a detailed experimental analysis of the role of CD4+ T cells in the immune response to vaccination.

      The authors show clearly that the dynamics of expansion of memory CD4+ vaccine-specific clones follows the immune response, corroborating the results of previous studies that analysed effector CD4+ cells.

      The authors asked also whether the presence of preexisting vaccine-specific clones impacts the response to vaccination. They found that this is the case. They defined an estimator of a normalized number of putative vaccine-specific clones and showed that can be used to classify individuals into early or late responders. This result has the potential to be extremely impactful in the way we understand immune response to vaccination.

      We thank the reviewer for their kind comments.

      Weaknesses:

      This central result follows the definition of the R{hbs} measure. It is not completely clear how much the numerator and denominator of R{hbs} contribute to the results and how those bystander and putative receptor sequences have been chosen. Some additional explanations could help reinforce the trust to this specific analysis.

      As indicated by this and other reviewers, the original definition of the R_{hbs} measure was confusing for readers. We have attempted to clarify the definition through changes in the methods, results, figure legends and code base.

      In order to specifically address this comment, we have extended the discussion on what makes up the numerator and denominator, and how each contributes to the final metric (which is visualized in figure 3b).

      The new text reads:

      "Rhbs is the ratio of the frequency of putative peptide-specific TCRβ divided by a normalization term for putative false positive predictions due to bystander activations in the training data set. This model applied to the memory repertoire at day 60 shows that early-converters tend to have a higher frequency of putative HBsAg peptide-specific TCRβ, while late-converters tend to have relatively more putative false positives as per the normalization term (Fig. 3b)."

      It is also not clear if multiple testing correction has been performed in the presentation of the results of Fig5.

      Multiple testing correction is implemented throughout the manuscript. This has now been clarified in the manuscript text.

      The correlation between the number of putative vaccine-reactive CD4+ T cells at day 60 and antibody titers is an interesting and robust result. This however does not support the claim of the authors that preexisting vaccine-specific CD4+ memory cells are associated to stronger immune response. This could be the case only if a similar correlation would be observed at day 0.

      The reviewer raises an important point. The relationship between the vaccine-reactive CD4+ T-cells and the antibody titer is already a very interesting finding. However, we do already confirm that this signal is also present at day 0 in the CD4+ T-cell memory repertoire. Thus, our results seem to indicate the presence of preexisting vaccine-specific CD4+ memory cells.

      We have clarified this section in the manuscript in the hopes of avoiding similar confusion in other readers. The new section reads:

      "Furthermore, searching for HBsAg peptide-specific clonotypes in the memory repertoires prior to vaccination (day 0) results in a Rhbs with a similar difference (one-sided Wilcoxon-test P value= 0.0010, Fig. 3d). In this manner, the presence of HBsAg peptide-specific clonotypes as represented by the ratio Rhbs can be used as a classifier to distinguish early from late-converters prior to vaccination (Fig. 3e), with an AUC of 0.825 (95% CI: 0.657 – 0.994) in a leave-one-out cross validation setting."

      Reviewer #3:

      This manuscript presents a comprehensive study of CD4 memory T cell receptor beta repertoire response to hepatitis B vaccination, including repertoire correlates of early, late, and non seroconversion, identification of antigen specific and epitope specific clones, and a statistical classifier to potentially predict early Vs late seroconverters based on their pre-vaccination bulk repertoire. The major strengths are a unified experimental and computational analysis of bulk TCR repertoire data with antigen and epitope specific sorted T cells from the same individuals, allowing them to track personalized dynamics of vaccine specific clones, as well as translate across individuals to predict vaccine-induced seroconversion outcomes from pre-vaccination repertoires. The experimental data and reproducible analysis code are publicly accessible, and represent a useful resource that will likely be of interest beyond this study to other immune repertoire researchers.

      The results seem to support the authors conclusions, however several reported findings based on statistical analysis are less convincing, and would benefit from improved validation, clarification, or reworking. I next detail these aspects ordered by results sections.

      Section beginning line 128:

      The reported finding of this section is that early-converters (and not late-converters) undergo repertoire remodeling by day 60 post vaccination that decreases repertoire clonality. The evidence presented to support this is a computation of Shannon entropy for day 60 Vs day 0 in each individual, and a paired sample statistical test that is nominally significant for early and not for late converters. However, this nominally significant p value 0.042 is quite marginal, and the associated plots (Fig 2a) indicate only a very modest visual difference, and the presence of a distant outlier. The p value for late converters is not shown, however the marginally significant p value for early converters may not be nominally significant (at alpha 0.05) after multiple test correction (two tests). Additionally, the range of possible entropy values depends on the total sample size, so part of this difference may be driven by sample size. It may be more appropriate to use the Shannon equitability index (normalizing by the maximum possible entropy given the sample size, which is the log of the richness).

      We wish to thank the reviewer for the suggestion of using the Shannon equitability index, which we had not considered before and is indeed highly appropriate for the analysis. This analysis has therefore been rerun with the Shannon equitability index, which has indeed resolved the visual outliers that appeared before. As could be expected, the marginal result that was found before was not sufficiently robust, and the P-value for the late converter increase in Shannon equitability index was now found to be 0.0822 with a Wilcoxon test. These results have therefore been removed; they had no impact on the main conclusions of the paper.
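
      For reference, the index suggested by the reviewer divides the Shannon entropy of the clonotype frequencies by its maximum possible value, the logarithm of the richness. The sketch below is a minimal illustration of that definition; the function name, the input format (one count per clonotype) and the handling of repertoires with fewer than two clonotypes are assumptions of the example, not the code used in the study.

```python
import math


def shannon_equitability(clone_counts):
    """Shannon entropy of clonotype frequencies divided by log(richness).

    `clone_counts` holds one read or cell count per clonotype.
    Returns a value between 0 (one dominant clone) and 1 (all clones equal).
    """
    counts = [c for c in clone_counts if c > 0]
    richness = len(counts)
    if richness < 2:
        # The index is undefined for fewer than two clonotypes; 0.0 is an assumed convention.
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return entropy / math.log(richness)


even_repertoire = shannon_equitability([10, 10, 10, 10])
skewed_repertoire = shannon_equitability([97, 1, 1, 1])
```

      Because the normalization uses the observed richness, two samples of different size can be compared on the same 0 to 1 scale, which addresses the sample-size dependence raised by the reviewer.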

      Section beginning line 147:

      Ag specific T cells were isolated from day 60 samples and sequenced, allowing the authors to track the dynamics of these clones in the bulk repertoire data across time points. In all vaccinee groups these Ag specific clones are found to increase from day 0 to day 60 in the bulk repertoires. A marginal p value (0.04909) is presented to support early-converters showing more increase in these Ag specific clones. However, statistics comparing early to non or late to non converters are not mentioned (and these would require a multiple test correction on the p value that is discussed).

      This is a valid concern raised by the reviewer. This specific analysis as performed in the original study was lacking power, in large part due to the lack of a concrete null model. Due to this lack of power, we had opted not to include the non-converters in this analysis (as these are only three samples). However, thanks to this reviewer’s suggestion, we were able to rework this analysis with a novel null model (detailed in the next response). This section has therefore been removed and replaced with the new analysis (where a Bonferroni multiple testing correction has been applied for the three responder categories).
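
      For readers less familiar with the correction mentioned above, a Bonferroni adjustment multiplies each P-value by the number of tests and caps the result at 1. The sketch below is illustrative only: the function name is an assumption and the three P-values are hypothetical placeholders, not results from the study.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted P-values: p * number_of_tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw P-values for three responder categories (placeholders, not study data).
raw = {"early": 0.01, "late": 0.04, "non": 0.30}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))

for group, p in adjusted.items():
    # A result stays significant at alpha = 0.05 only if the adjusted value is below 0.05.
    print(group, round(p, 4), p < 0.05)
```

      With three tests, a raw P-value must fall below roughly 0.0167 to remain significant at the 0.05 level.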

      A more general difficulty I have with this section is that the null hypothesis isn't made clear, and is probably more subtle and complicated than it appears. Cells are sorted from day 60, then their prevalence is compared between day 0 and day 60. Don't we expect to see more of them in day 60 even if there is no specific expansion for these clonotypes, but just random repertoire churn? My concern here is that double dipping from day 60 is affecting the analysis, since this time point is initially used to define the marker clones in the first place. If you take a random set of day 60 TCRs as null marker clones do you also see they are more prevalent in day 60 Vs day 0, or are you assuming that there should be no difference under the null?

      This is a valid point of criticism raised by the reviewer, and one that was not adequately explored in the previous version of the manuscript. In the prior version, the assumption was that there would be no difference under the null, even though the vaccine-specific clonotypes are derived from samples taken at day 60. While these do originate from a different cellular compartment (memory versus activated cells) and several weeks of stimulation experiments separate the two, it can be argued that the clonotypes will always have more in common with the day 60 samples than with the day 0 samples.

      The establishment of a null model is difficult in this sense. Selecting random clonotypes from day 60, as the reviewer suggests, would establish a baseline based on the overlap between the two time points. This would not be a good comparison, as the impact of day 60 clonotypes would be overinflated (as no experimental steps separate them) and any increase in epitope-specific clonotypes could never exceed this value.

      Therefore, we have added additional experimental data to establish a null model against which to compare. Samples from day 60 were treated in an identical manner as described before, but in the presence of epitope lysates derived from the varicella zoster virus (VZV) instead of the hepatitis B surface antigen. The prevalence of VZV in our study population is near absolute, so it can be expected that all individuals have built up T-cell immunity against the virus. Moreover, as this is a childhood disease, one can expect the T-cell immunity against varicella, on average, not to change between day 0 and day 60 in our cohort. We postulate that this additional experiment is perfectly suited to establish a null model for the vaccine-specific expansion.

      Using the VZV-specific clonotypes in the comparison between day 60 and day 0 shows a difference of 1.021 [95% CI: 0.934-1.124]. Thus our assumption that the null model should show no increase seems to be valid, as here too the clonotypes were derived from the day 60 samples. In addition, all reported increases were found to be highly significant when compared to the VZV-derived baseline (e.g. P-value of 6.3e-05 for the HBsAg-specific increase of 2.080).

      The last analysis in this section (presented in Fig 2d) does group-level comparisons of Ag specific clone fractions at day 60. I don't follow why normalizing by the number of Ag-specific clones detected in each individual is correct (i.e. would result in no differences under the null). Here again it could be helpful to see if null marker clone sets (the same size as the true Ag specific sets for each individual) indeed show no significant differences between groups.

      This is an important remark made by the reviewer. In essence, we need to compare the set of marker TCR clonotypes identified in the expansion experiment with the TCR clonotypes found in the CD4+ memory compartment. Thus, we are considering the overlap between two sets of TCR clonotypes for each individual.

      When one has two sets A and B and wishes to compare their overlap across different comparisons, one has to consider the impact of the sizes of A and B. The larger either A or B is, the larger the expected overlap (even by chance).

      As a specific example tuned to the problem we are addressing in this study: imagine two donors, D1 and D2. Each donor has a sequenced TCR repertoire with 50,000 unique clones (which is on par with the observed values). D1 has 10 Ag-specific clonotypes derived from the stimulation and D2 has 200 Ag-specific clonotypes. These numbers can vary widely due to sampling bias, differences in clonal expansion and differences in immunoprevalence or immunodominance. This is also not considering any bystander (false positive) clonotypes. In any case, consider that the overlap in both cases is 5. This means that we find half of the Ag-specific clonotypes in the memory repertoire of D1, but less than 3% for D2. Thus, despite the equal overlap, we would argue that the overlap for D1 is more relevant than the overlap for D2. When comparing between individuals, we therefore argue that one must take into account the size of the TCR sets that are being used for the overlap calculation.

      As our sets are not equal in size, i.e. |A| >> |B|, we applied the Szymkiewicz–Simpson coefficient (also known as the Overlap Coefficient), wherein one divides by the size of the smallest set. In our case, the smallest set is always the set of the Ag-specific marker TCR clonotypes. Therefore, in practice, we always normalize by the number of Ag-specific clones.
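
      The D1/D2 example above can be reproduced with a short sketch of the Szymkiewicz–Simpson coefficient, |A ∩ B| / min(|A|, |B|). The helper `make_donor` and the synthetic clone labels are assumptions made purely for illustration; real input would be sets of TCR clonotype identifiers.

```python
def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson overlap coefficient: |A & B| / min(|A|, |B|)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0  # convention assumed here for empty sets
    return len(a & b) / min(len(a), len(b))


def make_donor(n_repertoire, n_specific, n_shared):
    """Build a synthetic memory repertoire and Ag-specific set sharing n_shared clones."""
    repertoire = {f"mem{i}" for i in range(n_repertoire)}
    shared = {f"mem{i}" for i in range(n_shared)}
    private = {f"stim{i}" for i in range(n_specific - n_shared)}
    return repertoire, shared | private


# D1: 10 Ag-specific clonotypes, 5 of them found in a 50,000-clone memory repertoire.
rep_d1, ag_d1 = make_donor(50_000, 10, 5)
# D2: 200 Ag-specific clonotypes, also 5 found in the memory repertoire.
rep_d2, ag_d2 = make_donor(50_000, 200, 5)

print(len(rep_d1 & ag_d1), overlap_coefficient(rep_d1, ag_d1))  # raw overlap 5, coefficient 0.5
print(len(rep_d2 & ag_d2), overlap_coefficient(rep_d2, ag_d2))  # raw overlap 5, coefficient 0.025
```

      Both donors share the same raw overlap of 5 clones, but dividing by the size of the smaller (Ag-specific) set separates them into 0.5 and 0.025.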

      This has now been clarified in the paper and the new text reads:

      "In this case, to allow for a between-vaccinees comparison (in contrast to the within-vaccinees timepoint comparison), we calculate the Overlap Coefficient, where HBsAg-specific sequences in the CD4 T-cell memory repertoire are normalized by the number of HBsAg-specific TCRβ found for each vaccinee."

      Section beginning line 185:

      In this section, a peptide pool approach is used to identify epitope specific TCRs from each individual at day 60, and a classifier is constructed to discriminate between early and late converter bulk repertoires, using a quantity R_hbs that measures the relative fraction of peptide specific TCRs in the repertoire according to Hamming distance similarity to the peptide specific TCRs. Importantly (as stated in the methods) a cross validation procedure is employed where TCRs from a given individual are not used for classification of that same individual. Since d is Hamming distance on CDR3 sequences, presumably comparisons are only made for TCRs with identical CDR3 length differences. This seems like a limitation, since clones with identical V and J gene, and CDR3 that differ by only one in CDR3 length could very well bind the same epitope. A more TCR-specific distance function, such as the TCRdist of Dash et al., may significantly increase classifier performance.

      The suggestion raised by the reviewer is valid, and one that we had considered when designing the study. There are several methods available for making epitope-specific TCR annotations and our choice was informed by several considerations:

      • Our data is primarily beta-chain TCR sequences. Despite the high performance of TCRdist on paired alpha-beta chain TCR data, it has been shown that these approaches do not outperform Hamming distances on beta-chain-only data (Meysman et al., Bioinformatics, 2019). Note that the length restriction may seem like a large restriction, but in practice the length of the CDR3 sequence is known to be a strong predictor of epitope preference (De Neuter et al., Immunogenetics, 2018; Meysman et al., Bioinformatics, 2019; Valkiers et al., Bioinformatics, 2021).

      • The TCR repertoire data and the epitope-specific TCR data were extracted using different kits (Adaptive vs QIAGEN) due to differences in starting samples (millions of cells vs thousands of cells) and used different processing pipelines. It is well established that such processing differences can induce a bias in the TCRs that are reported. Thus we opted for the simplest method, as it was deemed to be most robust against any such bias. Methods such as TCRdist have been designed to translate findings from one sample derived from an experimental setup to another sample with the same setup.

      • The epitope-specific sequences are few in number. We wished to have as high a coverage of the HBs antigen as possible, so we opted not to use more advanced methods, which usually place a restriction on the minimum number of input TCRs that are required to build an annotation model. In addition, we wished to use an equivalent model for the bystander sequences (the denominator of the R_hbs metric), which likely involve a set of TCRs targeting multiple epitopes. This invalidates many approaches that require the assumption of an epitope-specific data set.

      Thus we opted for the straightforward Hamming distance approach, which has been applied in several prior studies (notably, the look-up functionality of VDJdb uses Hamming distance).
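
      To make the length restriction concrete, the sketch below shows a Hamming-distance match between CDR3 sequences in which sequences of different length are simply not comparable. It is an illustration under stated assumptions, not the study's implementation: the function names, the toy CDR3 strings and the cutoff value of 1 are all hypothetical, and the code computes only the fraction of repertoire sequences near a reference set, not the R_hbs metric itself.

```python
def hamming(a, b):
    """Hamming distance between equal-length strings; None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def fraction_within(repertoire, reference, cutoff):
    """Fraction of repertoire CDR3s within `cutoff` mismatches of any reference CDR3."""
    hits = 0
    for cdr3 in repertoire:
        for ref in reference:
            distance = hamming(cdr3, ref)
            if distance is not None and distance <= cutoff:
                hits += 1
                break  # one matching reference sequence is enough
    return hits / len(repertoire)


# Toy sequences (hypothetical): an exact match, a one-mismatch neighbor,
# and two sequences whose lengths differ from the reference.
reference = ["CASSLGTDTQYF"]
repertoire = ["CASSLGTDTQYF", "CASSLGADTQYF", "CASSPGQGNTEAFF", "CASSL"]

print(fraction_within(repertoire, reference, cutoff=1))  # 0.5
```

      The two sequences of different length are never matched, which is exactly the limitation the reviewer raises and the trade-off the response defends.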

      The origin of this choice was indeed obscure in the previous version of the paper, and has now been clarified.

      There is a distance cutoff parameter c required to define R_hbs. How was this parameter chosen? In particular, if it was tuned to produce the best AUROC, then the cross validation procedure is not legitimate (nested cross validation would be needed, or separate held out test set).

      The reviewer is correct. The distance cutoff c cannot be informed by tuning the AUROC as it would invalidate the cross validation procedure due to information bleed.

      The cutoff was set based on prior research done several years ago on several independent data sets (Meysman et al., Bioinformatics, 2019). This cutoff was kept so as not to bias the current results. Furthermore, the functions used within the code were already highly optimized towards this cutoff (as it allowed a hashing dictionary to be constructed for fast look-up). As the reviewer rightfully points out, the prior version of the code base did not even allow c to be set; its value was baked into the search algorithm. This has been clarified in the revision.
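
      One common way a fixed cutoff enables hash-based look-up is sketched below. This is an assumption about the general technique, not a description of the authors' code: the cutoff of 1 is chosen only for illustration (the value of c is not stated here), and the function names and toy sequences are hypothetical. Each reference sequence is stored under every variant with one position replaced by a wildcard, so two equal-length sequences share a key exactly when they differ in at most one position.

```python
def masked_variants(cdr3):
    """Yield every copy of the sequence with one position replaced by a wildcard."""
    for i in range(len(cdr3)):
        yield cdr3[:i] + "*" + cdr3[i + 1:]


def build_index(reference):
    """Hash all masked variants of the reference sequences (cutoff fixed at 1)."""
    index = set()
    for ref in reference:
        index.update(masked_variants(ref))
    return index


def has_neighbor(cdr3, index):
    """True if some reference sequence has the same length and at most 1 mismatch."""
    return any(variant in index for variant in masked_variants(cdr3))


index = build_index(["CASSLGTDTQYF"])

print(has_neighbor("CASSLGADTQYF", index))  # one mismatch
print(has_neighbor("CASSLGAETQYF", index))  # two mismatches
print(has_neighbor("CASSLGTDTQY", index))   # different length
```

      Each query costs a number of hash look-ups proportional to the sequence length rather than to the size of the reference set, which is why hard-coding the cutoff can be attractive; the price is that the cutoff cannot be changed without rebuilding the index logic.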

    1. Author Response:

      Reviewer #1 (Public Review):

      Sanchez et al investigate how the complex morphogenetic movements that drive epithelial tube formation are patterned to occur with the correct spatiotemporal dynamics by upstream transcription factor expression, a fundamental question in developmental biology. In prior work, the authors defined the cell behaviors that drive tube formation in the Drosophila salivary gland, demonstrating that localized apical constriction induces epithelial bending to form a pit and circumferential cell intercalations narrow the primordium. In this study, they show that as more peripheral cells flow into the deepening pit they switch behaviors to constrict apically, promoting continued morphogenesis of the structure. These behaviors, as well as correlated patterns of myosin pathway activation, require transcription factors Fkh and Hkb suggesting the expression pattern of these TFs may drive the dynamic changes in cell behavior. Using endogenously tagged protein reporters of Fkh and Hkb, they show the two TFs display dynamic expression patterns that initiate roughly where apical constriction will predominate and spread outward to cells that will later constrict into the pit. Fog appears to be a key downstream target of Fkh and Hkb, and interfering with the radial pattern of Fog expression disrupts tube formation. Strengths of the study include the high quality, quantitative morphometric analysis in both time and space, the use of endogenously tagged reporters of Fkh and Hkb together with time-lapse analysis, and the multiscale nature of the study encompassing and connecting upstream patterning events to intermediate regulators of cell shape to downstream cell and tissue-level behaviors. A minor weakness of the study in its current form is a lack of cell tracking data that connect cell identity with the stated changes in behavior, something that should be straightforward to address.

      These data are inherent in our quantitative analysis, but were not explicitly shown.

      Reviewer #2 (Public Review):

      The authors analyze the cellular dynamics responsible for the formation of a tubular structure, taking as a model the formation of salivary gland in Drosophila embryo, which is initiated by the asymmetric invagination of cells from a circular placode. They observed a regionalized cellular behavior, dependent on Hkb and Fkh dynamic expression in the placode. Both these transcription factors are required to ensure the correct expression of Fog, which drives localized apical constriction and ensures correct morphogenesis of the salivary gland.

      Strengths: This is a detailed analysis of cellular dynamics during salivary gland morphogenesis. This study highlights the regionalized behavior of cells from the presumptive gland (or placode) with a region close to the invagination pit where Hkb and Fkh drive Fog expression, leading to medio-apical myosin accumulation and apical constriction and a more distant region where cells mostly intercalate, a process driven by a junctional Myosin polarity. Although mainly descriptive, these data are precise and convincing. The conclusions fit with the observations.

      Weaknesses: Although this work is interesting, it raises a lot of unanswered questions. How is the timing of apical constriction in the placode controlled?

      The question about the timing of apical constriction and how it evolves across the placode is exactly what we are trying to address in our study. As we show and discuss, it is the patterned and dynamic expression of the transcription factors Fkh and Hkb, starting at the future pit position, that initiates the apical constriction (via downstream expression and action of Fog). Part of the later maintenance and continuation of apical constriction, as cells move into a position in proximity to the pit, is then through the continued expansion of Fkh expression.

      What is responsible for the delay in apical constriction observed in Hkb mutant?

      We explain this in the discussion of the manuscript (line 570 onwards): We conclude from our data that the initial constriction at the eccentric position where the pit forms depends on both Fkh and Hkb. In the hkb-/- mutant, the central cells in the placode that manage to constrict in the absence of Hkb likely only require Fkh activity to initiate Fog expression and function (and hence constriction). These cells therefore undergo their normal apical constriction once Fkh expression has reached the central position (as Fkh expression is unaffected in the hkb-/- mutant, as we show). As they are now the first cells to constrict and invaginate, the hkb-/- mutants show a central and delayed constriction and invagination.

      How does apical constriction propagate? How is the switch between apical contraction and intercalating domains regulated?

      Once apical constriction is initiated at the pit position in the dorsal-posterior corner due to the action of both Hkb and Fkh (and downstream Fog), the constriction spreads across the placode, as we describe and analyse, mostly due to the expanding expression of Fkh, which drives the expansion of Fog expression in its wake. The cell intercalation behaviour we observe and describe here and in (Sanchez-Corrales et al., 2018) leads to a convergence and extension of the tissue that feeds more cells into a position near the pit. Once Fkh expression has reached cells that are now in a closer position to the pit, they also start to apically constrict.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript provides an interesting analysis of the evolution of the IR75a protein across the Drosophila phylogeny. As reported by the authors previously, IR75a in D melanogaster and several related species preferentially recognize the odor acetic acid, while IR75a in D sechellia instead prefers butyric acid. Here, the authors report that more distant Drosophila species, as well as the putative ancestral IR75a, also prefer butyrate, suggesting that IR75a changed its preference from butyrate to acetate within the melanogaster/obscura group, with reversion to butyrate subsequently occurring in D sechellia. Moreover, the authors identify a key site (position 538) whose identity as Phe or Leu tracks with odor preference. They also identify other secondary lineage-specific mutations that presumably provide structural support to help optimize ligand preference. Interestingly, different solutions for secondary optimization were observed across lineages, suggesting multiple evolutionary and structural paths for tweaking the ligand pocket. These data are generally solid and expertly generated, but I do note that there is substantial speculation based on molecular modeling (which the authors acknowledge) as well as speculation of mutational timeline which should be trimmed or removed. There are many ways to extend these findings, such as linking odor recognition properties to behavior, which would substantially increase the impact of this study.

      We fully agree that understanding the behavioral significance of the changes in odor recognition of these receptors is of high interest, but as we do not yet clearly know the behavioral function(s) of these receptors even in D. melanogaster, such an effort is likely to take months (if not years) and goes well beyond the current study. We hope that our identification of tuning changes of these Irs across the Drosophila genus might provide some additional clues to the contributions of these receptors to the odor-guided behaviors of these species.

      Reviewer #2 (Public Review):

      [...] Weaknesses: The authors attempt to link the characterized molecular events to the ecological needs that might have played a significant role in the lineage's evolution, and to the structural aspects of receptor-ligand interaction. These two aspects are on the more speculative side, which the authors themselves acknowledge as limitations of the study.

      1. In terms of the ecological context, this study focuses on a narrow set of ligands that are probed at concentrations that are quite high and thus of unclear physiological relevance. While their results are very exciting, they are restricted to the inverse relationship of the responses to C2-C4 carboxylic acids. Although data in the literature shows that these ligands, in particular acetic acid, are likely relevant, other data support that ligands from the same series may also be relevant. In particular, noni volatiles are dominated by octanoic acid (Auer et al., 2020; Pino et al., 2010), and Ir75a responds very strongly to propionic acid as well as acetic acid (Prieto-Godino et al., 2016; Silbering et al., 2011). While the C2-C4 relationship captures most of the variance in PCA (which leads the authors to focus on these ligands in the first place), perhaps at other more physiological concentrations the relationship between other ligands in the series becomes more prominent, which would be interesting to explore.

      It seems unlikely that odors that do not evoke robust responses at 10^-2 will stimulate biologically meaningful responses if tested at higher concentrations. Regarding the abundant noni volatile octanoic acid, we have previously shown that this odor (surprisingly) does not evoke strong responses in any acid-sensing Ir neuron, nor does it provoke strong olfactory behavioral attraction of D. sechellia (PMID 28111079, Figure 1). These results contrast with the physiological and behavioral responsiveness to hexanoic acid, and suggest octanoic acid may act principally via the gustatory system.

      The universe of odors is enormous, so for practical reasons we focused in this study on a simple series of ligands to extract principles of molecular evolution of olfactory receptor specificity. Acknowledging the limitations of our study, we have taken care to state in the Discussion that our findings may capture only part of the changes in the response properties of individual receptors, and that we can only speculate on the behavioral/ecological significance of at least some particular odor tuning properties.

      Regarding the quantitative analyses in our manuscript, the vast majority of the results are not restricted to C2-C4, but rather the whole linear series (C1-C6). In Figure 1E, we present the clustering of species on a C2 vs C4 response plot because we show that these two odors are the major contributors of variance in the data by PCA (Figure 1D) and because this allows easy visualization in two dimensions. When we analyze the statistical significance of the epistatic interactions between different mutations (Figure 3F), we use PC1 from the dataset, to encompass all of the main variance in the data and avoid biasing towards specific odors.

      1. The authors don't discuss whether the proposed polymorphisms are found in population genomic data, which is available at least for Dmel and Dsec. Mining these datasets and looking at intraspecific variation (or lack thereof) has the potential to support their speculations on the evolutionary trajectories of mutations with empirical data and offer complementary insight.

      We mentioned this briefly in our original submission (there is no informative population genetic variation) and have now added additional information to the manuscript on the results of our survey of intraspecific genetic variation in these drosophilids.

      1. A more critical limitation is the use of docking onto homology models. Modeling techniques are incredibly powerful as they can provide solid hypotheses for how protein-ligand interactions might occur. However, much caution should be taken to interpreting modeling results without experimental validation. The reliability of homology models scales substantially with sequence identity, which turns this protein family into rather poor substrates for extracting atomic-scale conclusions from these models. In this case, homology models are combined with docking and very little support is offered. For example, the docking scores presented for the homology models are relatively low, and there is no significant difference between the docking score of acetic and butyric acid onto Dmel Ir75a. Although it is well known that docking results will in general only qualitatively match the behavior of a receptor-ligand pair, in the absence of alternative validation of the modeling procedures, these results fail to convince the reader that the homology models and docking results are reasonably likely. It should be noted that the proposed mode of action is entirely plausible and an interesting possibility, but as it is it appears too speculative and without validation.

      We now present what we consider to be better protein models to indicate the relative position of different residues in the LBD, but have removed all docking analyses.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] Some weaknesses of the manuscript design include the prolonged continuous optical inhibition, lack of optical control and the possibility of a state-dependent effect on neural manipulation.

      The prolonged optical inhibition would conceivably produce tissue damage at the optic fiber tip. Accordingly, controls for the optic stimulation experiments comprised the experimental groups injected with AAV5-hSyn-mCherry—not coding halorhodopsin (HR-; controls)—that were continuously stimulated for 5 min with a yellow laser (589 nm) during either the acquisition or the expression phase. Moreover, histological analysis of the tissue at the optic fiber’s tip after the behavioral procedure showed no tissue damage in the areas surrounding the optic fiber’s tips (see Appendix Figure 11).

      The possibility of a state-dependent effect on neural manipulation was also discussed. It is important to note that for a given pathway to generate a condition to produce a state-dependent effect on memory, the pathway would be expected to be activated to the same degree during both encoding and retrieval. However, in our case, all ACA paths involved in the acquisition or expression of contextual fear responses showed a differential activation between acquisition and expression of fear memory and are unlikely to provide a condition to produce a state-dependent effect on contextual fear memory (see Discussion).

      It is unclear if the similarity in the effects obtained in some cases is due to similarity in the underlying behavioral process that is disrupted (e.g. failing to acquire an accurate representation of the context vs. associating the context with the threat).

      Our findings collectively support the idea that the ACA integrates contextual and predator-related cues during the acquisition phase to provide predictive relationships between the context and the threatening stimuli and influence memory storage. As discussed in the manuscript, silencing the AM > ACA pathway during the acquisition phase of contextual fear memory would disrupt ACA integration of the predator-related cues and impair the ability of this cortical field to provide predictive relationships between the context and the predatory threat. Conversely, if the ACA outputs were silenced, this cortical region would nonetheless be able to generate predictive relationships between predatory threats and contextual landmarks but would be unable to influence memory storage sites in the BLA and ventral hippocampus or in brain regions involved in the expression of contextual fear responses, such as the PAGdl.

      Reviewer #2 (Public Review):

      The authors' main goals were to identify key circuits for the acquisition and expression of contextual fear conditioning in a paradigm that uses a live cat as the unconditioned stimulus. They used neuroanatomical tracing, opto-, and chemo-genetic techniques to observe and manipulate activity in anterior cingulate area (ACA) afferents and efferents at either the acquisition or retrieval stages. The strengths include a thorough characterization of multiple circuits and robust behavioral effects. Weaknesses include the confound of the experimenter being in the room for the acquisition, but not retrieval phase, a lack of characterization of escape-like behaviors, and the exclusive use of male animals, which reduces the potential impact.

      In the revised version, we clarified that the experimenter was present in the experimental room during the habituation phase, cat exposure, and the context phase; in all conditions, the experimenter’s position in the experimental room remained consistent (see Methods).

      We also clarified that our protocol conditions yielded only occasional escape responses. Thus, only a few escape episodes were noted during cat exposure—mostly when the animals noticed the cat and fled back to Box 1—and during exposure to the predatory context, the animals presented only occasional escape episodes (see Appendix Behavioral Protocol).

      Finally, we discussed that the basic circuits organizing these responses should be similar in both genders. However, gender differences are expected in terms of the responsivity of the components within these circuits, and further studies are needed to investigate gender differences in the responsivity of circuits mediating contextual fear memory to predatory threat (see Discussion).

      Reviewer #3 (Public Review):

      In this study, de Lima et al. examined the function of ACA in acquisition and expression of contextual fear memory to predator threat. The authors found that ACA is necessary for both processes. Using optogenetic terminal inactivation, the authors further demonstrate a necessary role of AM input to ACA in the contextual fear acquisition phase. At the output level, the projections from ACA to BLA and PERI are necessary for contextual fear acquisition while the projection from ACA to PAG is essential for contextual fear expression. Overall, the study is interesting and the results are straightforward. The presented data largely support the conclusions. The paper will provide new insight into the neural circuit underlying contextual fear learning and expression to predator threat. Some limitations of the study include the lack of controls to demonstrate how specific the ACA response to threat and threat-paired context is, and the lack of validation of the terminal inhibition. Further characterization of the projections would also help to understand these results.

      In the revised manuscript, to demonstrate how specific the ACA response is to threat and a threat-paired context, we performed DAPI and Fos quantification in the ACA under four conditions: exposure to the cat, the predatory context, the novel non-threat stimulus (plush cat), and the context paired with the non-threat stimulus. We were able to show that ACA Fos expression in response to the cat and cat-related context was significantly higher relative to its expression level in response to exposure to a novel non-threat stimulus and to the context paired with a non-threat stimulus (see in Appendix, Comparison of ACA activity and Figure 3; and Discussion).

      We have also validated the terminal inhibition by performing patch clamp experiments that tested the efficiency of presynaptic inhibition. These showed clear inhibition of EPSCs in postsynaptic cells after illumination of halorhodopsin-positive fibers with 585 nm light, and no rebound excitation after halorhodopsin activation (see Appendix Figure 6).

      Finally, to further characterize the ACA projections, first we performed the quantification of ACA terminal fields in the BLA, PERI, PAGdl and POST, revealing the densest ACA projection field in the BLA, followed by the projections to the PERI, PAGdl, and POST, which contained the weakest projections from the ACA in all cases examined (see in Appendix, Quantification of ACA terminal fields and Figure 8).

      Next, we performed triple retrograde tracing in the same animal to investigate the distribution of ACA neurons projecting to the BLA, PERI and PAGdl, the ACA projections involved in the acquisition or expression of contextual fear memory (see in Appendix, Triple retrograde tracing and Figure 9). The results revealed a layering pattern of the ACA neurons projecting to BLA, PERI and PAGdl. ACA projecting neurons to PAGdl are located in infragranular layers (layers V and VI) and do not overlap with the neurons projecting to the BLA or the PERI. Conversely, neurons projecting to PERI and BLA are located in the supragranular layers (layers II and III) and present a nearly 50% overlap.

    1. Author Response:

      Reviewer #2 (Public Review):

      The authors used an operant task in voles to assess preferences for social relationships and whether these preferences differed by sex and species. They also correlated some outcomes with oxytocin receptor binding.

      A strength of the paper is the use of the vole models which allow comparisons between socially monogamous (prairie voles) vs. promiscuous breeders (meadow voles). Because prairie voles show a stronger preference for peers and mates than many other rodent species, they are a great model to assess selective relationships. The other major strength of this paper is the use of the operant procedure to assess preference. To my knowledge, this is the first time this procedure has been used to assess social reward in a monogamous species, and it has advantages over place preference procedures.

      A weakness of the paper is the omission of groups from certain manipulations. Male meadow voles were excluded from the operant procedure and all male data was excluded from oxytocin receptor analysis with little rationale. This limits some of the conclusions that can be made about males in the study. Additionally, some experimental design details were missing. A subset of voles went through extinction. However, it was unclear whether those used for oxytocin receptor analysis went through extinction or not (or were from both conditions). The biochemical analysis also assessed the effect of oxytocin receptor genotype, replicating the effect that C allele carriers have higher oxytocin receptor binding in specific brain regions. However, the analysis of this genotype's effect on behavior was limited due, presumably, to power issues with most measures. Thus, conclusions regarding genotype were limited. Finally, there was a missed opportunity to do a progressive ratio test to better assess motivation for the partner vole.

      The operant task and the subsequent behavioral results will be useful for the field, but the design issues somewhat limit impact. However, assessing the formation of selective relationships and using the vole model is innovative.

      Regarding testing protocol, we exclusively used a progressive ratio schedule (PR-1) in all testing phases of our study. This has been emphasized throughout the manuscript, and hopefully obviates any concerns about limits to the interpretation of fixed ratio studies.

      Excellent point about the need for more detail on inclusion of each sex. Information has now been added as to why males were not tested in two aspects of the study. Males were not included in the meadow vole portion of the study because only females of this species show pronounced seasonal differences in affiliative behavior. We have added the text: “Because the seasonal transition from solitary to social is most pronounced in female meadow voles in the field and laboratory (Madison and McShea, 1987; Beery et al., 2009), only females of this species were used.” We initially planned to include males in oxytocin receptor assays, but when early results suggested males did not work harder to access familiar females, we retained later study males to test a two-choice variant of the social operant setup (instead of collecting their brains at the conclusion of this study). That pilot led us to conduct a new study using the two-choice operant apparatus (in review), and the tissues from that second study could enable a similar analysis in both male and female brains.

      Additional details have been added regarding which voles underwent extinction, and we have removed the genotype data from the manuscript as requested.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript by Gilbert et al., the authors found that MLC1 is required for postnatal maturation of perivascular astrocyte coverage. Through various detailed experiments, they further found that Mlc1 KO mice showed a number of defects, including the reduced VSMC contractility, neurovascular coupling and parenchymal CSF flow. The data is well presented, however there are several points that need to be addressed to strengthen the manuscript.

      1) Since many PvAP proteins showed normal expression after P60 in Mlc1 KO mice, it is also possible that many of the phenotypes that the authors presented, such as the reduced VSMC contractility, neurovascular coupling and parenchymal CSF flow, can be recovered after P60. This would be important as well to understand the prime pathological cause of megalencephalic leukoencephalopathy induced by MLC1 deletion.

      Although some protein levels are normal in Mlc1 KO mice, PvAP coverage and gliovascular unit morphology are altered at P60. We also now show (using TEM) that PvAPs are swollen in 1-year-old Mlc1 KO mice (the new Fig. S4) - indicating that edema develops progressively in Mlc1 KO mice. Myelin vacuolation starts at 3 months and progressively worsens (Dubey et al., 2015). Taken as a whole, these data show that MLC is a degenerative disease. Under these conditions, the recovery of gliovascular unit function after P60 is very unlikely.

      All the molecular morphological and functional changes in gliovascular unit described in our manuscript precede myelin degradation, which starts at the age of 3 months in Mlc1 KO mice (Dubey et al., 2015). Moreover, our previous study (Gilbert et al., 2019) demonstrated that MLC1 expression starts around P5 and that the MLC1/GlialCAM complex is only mature at P15. Taken together, these results strongly suggest that gliovascular unit alterations are the primary pathological events in MLC.

      Dubey M, Bugiani M, Ridder MC, Postma NL, Brouwers E, Polder E, Jacobs JG, Baayen JC, Klooster J, Kamermans M, et al. 2015. Mice with megalencephalic leukoencephalopathy with cysts: a developmental angle. Ann Neurol 77: 114-131. doi:10.1002/ana.24307.

      Gilbert A, Vidal XE, Estevez R, Cohen-Salmon M, and Boulay AC. 2019. Postnatal development of the astrocyte perivascular MLC1/GlialCAM complex defines a temporal window for the gliovascular unit maturation. Brain Struct Funct 224: 1267-1278. doi:10.1007/s00429-019-01832-w.

      2) Is CBF altered only after neuronal stimulation in Mlc1 KO mice? It is unclear whether the basal CBF/neurovascular coupling level is disrupted as well in Mlc1 KO brains and how this defect is related to the reduced vasoconstriction in these mice.

      The baselines of the functional ultrasound experiments were aligned prior to stimulation. This technique measures the percentage increase in blood flow after neuronal stimulation (here, whisker movement) but does not measure the basal flow and does not enable one to distinguish between an abnormal basal cerebral blood flow (as suggested by the reduction of the arterial diameter) and the loss of vascular contractility - both of which probably contribute to the defect in neurovascular coupling.

      3) The reduced cohesiveness of PvAPs and the associated neuronal fibers to the vessel in Mlc1 KO brains should be validated with additional experimental approach.

      To strengthen our analysis, we now give the results of a parallel quantitative immunofluorescence analysis of purified brain vessels (presented in a new figure, Fig.5). The results show that part of the Aqp4 and NF-M perivascular immunolabeling is absent in the Mlc1 KO. Taken as a whole, our data demonstrate that PvAPs and the associated neuronal fibers (which normally remain attached to brain vessels during mechanical purification) are lost during the purification process in Mlc1 KO mice but not in the WT. In conclusion, the absence of MLC1 reduces the mechanical cohesiveness of PvAPs and the associated neuronal fibers.

      4) The defective polarity of astrocytes should be better described by using other markers other than GFAP. The distribution of Aquaporin4, Cx43 or several glutamate transporters in the specific compartment of astrocytes can be examined.

      GFAP is the marker typically used to analyze the astrocytes’ overall morphology and polarity. Nevertheless, we agree that it is of interest to study the molecular polarity of PvAPs. Indeed, morphological changes in the PvAPs and astrocytes and changes in polarity in Mlc1 KO might all influence the localization of molecules in PvAPs. To address this question, we performed a quantitative stimulated emission depletion (STED) analysis of protein localization in PvAPs. Our results indicate that the perivascular localization of aquaporin 4 was not affected. However, the density and size of Cx43 puncta were greater - indicating that the gap junctions in PvAPs are not organized in the same way in the Mlc1 KO as in the WT. This observation is consistent with our electron microscopy observations of perivascular astrocytic processes stacked on the top of each other and linked by extended gap junctions.

      We have also added results for Kir4.1, a potassium channel that is expressed preferentially in PvAPs. The Kir4.1 expression level in Mlc1 KO was lower at all stages of development, indicating that perivascular potassium homeostasis was probably perturbed. These results are interesting because (i) epilepsy is a significant component of megalencephalic leukoencephalopathy (Dubey et al., 2018; Yalcinkaya et al., 2003), and (ii) Kir4.1 deletion or downregulation is associated with greater susceptibility to epilepsy (Sibille et al., 2014). These points are now discussed.

      Dubey M, Brouwers E, Hamilton EMC, Stiedl O, Bugiani M, Koch H, Kole MHP, Boschert U, Wykes RC, Mansvelder HD, et al. 2018. Seizures and disturbed brain potassium dynamics in the leukodystrophy megalencephalic leukoencephalopathy with subcortical cysts. Ann Neurol 83: 636-649. doi:10.1002/ana.25190.

      Sibille J, Pannasch U, and Rouach N. 2014. Astroglial potassium clearance contributes to short-term plasticity of synaptically evoked currents at the tripartite synapse. J Physiol 592: 87-102. doi:10.1113/jphysiol.2013.261735.

      Yalcinkaya C, Yuksel A, Comu S, Kilic G, Cokar O, and Dervent A. 2003. Epilepsy in vacuolating megalencephalic leukoencephalopathy with subcortical cysts. Seizure 12: 388-396. doi:10.1016/s1059-1311(02)00350-3.

      5) The authors provide interesting observations such that the formation of perivascular astrocyte coverages is required for the dissociation of the contacts between neuronal components and the vessel during development. The authors need to discuss more about potential regulation and implication of this phenomenon.

      This is indeed a fascinating phenomenon. The postnatal period is also an intense synaptogenic phase in the mouse brain (Chung et al., 2015), during which astrocytes and neurons might compete for the perivascular space. In the absence of MLC1 and thus PvAPs, the neurons might expand into the free space. We now comment on this point.

      Chung WS, Allen NJ, and Eroglu C. 2015. Astrocytes Control Synapse Formation, Function, and Elimination. Cold Spring Harb Perspect Biol 7: a020370. doi:10.1101/cshperspect.a020370.

      6) It is interesting that DOTA-Gd tracer shows different traces in Mlc1 KO brains. However, it is unclear how MLC1 deletion affects the glymphatic system. Does the tracer enter the perivascular spaces normally in Mlc1 KO brains? Does the tracer leak out more from the perivascular spaces in Mlc1 KO mice? Is the general clearance or drainage of the tracer impaired in Mlc1 KO mice? Do these defects originate from the reduced perivascular astrocyte coverage or from the reduced vasoconstriction itself?

      Paravascular transport (as revealed by the injection of a tracer into the CSF) depends mainly on dispersion of the tracer in the subarachnoid space (SAS), the cisternae, and the parenchyma (including the interstitial and perivascular spaces). The uneven, slow dispersion of the tracer within the SAS (compared with dispersion in the blood) means that the tracer’s kinetics in the parenchyma are region-dependent. These differences can be accentuated by regional differences in the anatomy of the brain’s vasculature, i.e. the presence or absence of a perivascular space and the vessel’s topology. Lastly, the amount of DOTA-Gd available for diffusion within the parenchyma depends directly on its local concentration in the SAS. This can be seen on our contrast concentration maps (see Fig. 8), where the highest DOTA-Gd concentrations are found near the injection site (the cisterna magna), in line with previous reports (Iliff et al., 2012). In the Mlc1 KO model, dispersion of DOTA-Gd is presumably affected in the SAS and the parenchyma.

      With regard to tracer dispersion in the SAS and the cisternae, our anatomical MRI showed that the brain volume is greater in the Mlc1 KO mouse than in the WT (see Fig. 1). These variations in the geometry of the SAS may account for much of the difference between the Mlc1 KO mice and WT mice. Although tracer concentrations appear to be similar in the cerebellum (close to the injection site), they are much lower in the more distant septal area of Mlc1 KO mice - suggesting that tracer transport within the SAS is restricted.

      With regard to parenchymal dispersion, we showed that MLC1 is essential for the position of the astrocytes’ perivascular endfeet. Thus, in Mlc1 KO mice, the formation of the perivascular space (as a conduit for solute distribution) is likely to be deficient. This aspect is revealed by the slope of the tracer’s concentration-time curve, which indicates slower kinetics in Mlc1 KO mice; this might be due to poor integrity of the perivascular space. The higher volume of fluid in the Mlc1 KO parenchyma (reflected by the increased apparent diffusion coefficient (ADC); Fig. 1 and S1) might also be involved in this phenotype.

      The heartbeat is the main driver of CSF circulation in the perivascular space (Iliff et al., 2013). The heart rate is very rapid, so the heart exerts a much greater driving force on the CSF than the vasodilation induced by neuronal activity. The alterations in vascular contractility observed in Mlc1 KO mice might contribute to the impaired CSF flux, but this is unlikely.

      All these points are now discussed in the revised version of the manuscript.

      Iliff JJ, Lee H, Yu M, Feng T, Logan J, Nedergaard M, and Benveniste H. 2013. Brain-wide pathway for waste clearance captured by contrast-enhanced MRI. Journal of Clinical Investigation 123: 1299-1309. doi:10.1172/jci67677.

      Iliff JJ, Wang M, Liao Y, Plogg BA, Peng W, Gundersen GA, Benveniste H, Vates GE, Deane R, Goldman SA, et al. 2012. A paravascular pathway facilitates CSF flow through the brain parenchyma and the clearance of interstitial solutes, including amyloid beta. Sci Transl Med 4: 147ra111. doi:10.1126/scitranslmed.3003748.

      Reviewer #2 (Public Review):

      This very interesting manuscript by Gilbert and colleagues uncovers that the astrocyte-specific membrane protein MLC1, the mutation of which causes a rare disease called megalencephalic leukoencephalopathy with subcortical cysts (MLC), plays a fundamental role in the postnatal development of the gliovascular unit and the organization of the perivascular astrocyte processes, in particular. To reach this conclusion, the authors used an elegant multiscale approach including in vivo MRI, in vivo functional ultrasound, ex vivo analysis of vascular constriction, anatomical approaches at the light and electron microscopic level, and molecular characterization of the gliovascular unit from isolated microvessels. The manuscript is very well written, although it uses too many (unnecessary) abbreviations, which prevents a fluid reading of the manuscript. The results are well illustrated and convincing, and the discussion is reasonable.

      I have a major concern regarding the results reported in Figure 4D, which seem somewhat contradictory to those shown in Figure 6A-F. Indeed, the authors report in Figure 4D that there is less Neurofilament-M protein around isolated microvessels in MLC1 KO mice, whereas Figure 6A-F shows that these animals have more neuronal processes in contact with the vessels than wild types do. How can the authors explain this?

      The two situations are not comparable. On one hand, we observed the structure of the gliovascular unit in situ in fixed tissues. On the other, we mechanically purified microvessels. The detachment of astrocytic processes and associated neuronal fibers (linked to the mechanical dissociation of microvessels in the Mlc1 KO mouse) was clearly not counterbalanced by the presence of neuronal fibers contacting the vessels.

    1. Author Response:

      Reviewer #2 (Public Review):

      The study by Ronzano et al. set out to reveal whether the spinal cord contains motor circuits that can support co-activation and co-inhibition of diverse flexor and extensor motor neuron pools in the mouse spinal cord. For this they use modified rabies virus in a mouse model, with a set-up that allows selective mono-synaptically restricted labeling of premotor neurons projecting to functional synergist or antagonist ankle motor neuron pools along the entire spinal cord. They show that a minor percentage of premotor neurons projecting to either synergist or antagonist pairs of motor neuron pools diverge. Divergent premotor neurons were seen both close and rostral to the target lumbar motor neuron pools, but with an increased proportion with distance from the lumbar cord. In the cervical spinal cord the largest proportion of divergent neurons were commissural excitatory neurons with molecular characteristics of the V0 class. The study provides important, new and convincing data on the spinal anatomical landscape of distributed motor networks that may coordinate synergistic activity as well as mediate co-contraction of antagonist muscles across multiple motor pools in the same limb or across limbs. Overall the claims are well supported by the data. Some aspects of the methods need clarification and some aspects of the claims are weakened by the lack of identification of premotor neuron populations. The discussion of the data could perhaps be made stronger by linking the present data to functional studies.

      1) Transsynaptic method. The authors use a different model for trans-synaptic tracing than most previous studies in the spinal cord: namely the RGT mouse line crossed with ChAT-cre mice and combined with a retrograde labelling of motor neurons. The distribution of transsynaptic flexor and extensor related premotor neurons in this model is different from previously reported. Data for this are presented in (Ronzano et al. 2021 BioRxiv). But it will be useful to mention this here as well.

      In this manuscript we show maps of single and double labelled neurons throughout the thoracic and cervical cord, for which, to the best of our knowledge, there are no published studies. Within the lumbar cord, we have focused only on double labelled neurons and we refer to our preprint for maps of single labelled interneurons. The issue of the different distributions obtained using different tracing methods is currently being addressed more extensively in the work that will follow that preprint. We feel that its details (some of them contentious with existing literature) would deviate the attention from the main point of the current manuscript, which is the presence and distribution of divergent premotor interneurons.

      The authors discuss why the labeling is not caused by a second jump from V0C neurons or motor neurons labelled by collaterals. Another source of contamination is at the level of the muscle. All injections are done in newborn mice with tiny muscles. It would be useful to know how the authors ensured that there is no viral spread peripherally.

      A paragraph describing how we assured that there was no viral spread to other muscle(s) than the targeted one has been added in the methods. Briefly, before processing the tissue, the injected leg was dissected and muscles below and above the knees were examined under a fluorescence microscope to make sure the virus had not spread to any of the adjacent muscles. This was particularly critical for injection of synergist muscles, that are located side by side along the leg and are not in separate anatomical compartments separated by the fibula, as is the case for the antagonist GS and TA.

      2) Identity of neurons. A limitation of the study is that there is no transmitter phenotyping of the divergent premotor neurons in the lumbar and thoracic region. The divergent neurons can be either excitatory or inhibitory and cause co-activation or co-inhibition, respectively, of synergists or antagonists. Except for the cervical CNs there is no evidence for transmitter phenotype in the data. Perhaps the authors could just mention that this would have required in situ hybridization in the double injected animals or a third color RabV together with the GlyT2-GFP mice. The identification of V0 interneurons is suboptimal. The use of a specific antibody (Evx) for the V0 population could have provided a better characterization (see Crone et al. 2008). Maybe also mention that the commissural neurons could be SIM1.

      We agree with the referee that the neurotransmitter phenotype is a crucial point. Our data show that while the cervical divergent interneurons are certainly not glycinergic or cholinergic (and therefore presumably excitatory), many of the thoracic and lumbar divergent interneurons are indeed inhibitory. Detailed analysis of divergent interneurons by phenotype would have required an a priori different study design, and we agree with the comment of reviewers #1 and #3 that the present work will drive further studies aimed at understanding the nature and function of divergent interneurons. In the new supplementary figures, we show examples of both glutamatergic and non-glutamatergic terminals apposed to motoneurons in the lumbar and thoracic region to highlight the finding that divergent premotor interneurons can be inhibitory or excitatory (as well as cholinergic, see new supplementary figures S3 and S6, in agreement with Stepien et al, 2011). This is now explicitly mentioned in the corresponding text (lines 206-208). We agree that the identification of cervical divergent interneurons as belonging to the V0 family is not definitive. We have made a further attempt at identification by using an Evx1 antibody, as suggested by both reviewers #1 and #2, but unfortunately the downregulation of Evx1 prevented reliable labelling in test tissue of the same age as that used for the paper, a problem we also had with the Lhx1 antibody, though to a lesser extent (see Supplementary figure S7). Another approach would be to use genetic labelling by crossing the ChAT-Cre mice with a (non-cre) reporter line for V0 interneurons and performing injections in the progeny of their cross with the RGT mice. This would require acquiring/rederiving a colony of reporter mice and performing the experiments on third generation animals. Also, the breeding of the RGT mouse line that we had at UCL had to be suspended during the closure of the labs due to the pandemic, and the line would have to be re-derived from frozen embryos. Thus, these would be prohibitively long experiments. We agree that we cannot conclude that the cervical divergent interneurons belong to the V0 class and, in line with the similar concern of Reviewer 1, we have further toned down the part of our manuscript related to the identification of descending interneurons (see above).

      3) The discussion is insightful but should perhaps link the data more directly to functional studies, for example by considering how synergies are bound together across limbs during locomotion, which could involve either co-activation of synergists or co-inhibition of antagonists from commissural neurons (see Bellardita and Kiehn 2015; Butt and Kiehn 2003) or ipsilateral neurons (Levine et al. 2014; among others). It would also be useful to discuss how co-inhibition of synergists leads to functional movements.

      As mentioned also by the other reviewers, this manuscript does not have any functional data. Determining the function of divergent interneurons is our next project, one that could not have been conceived without the findings we present here. Co-inhibition of synergists (Ia inhibitory interneurons for instance) is a well described phenomenon. On the other hand, co-inhibition of antagonists has received little to no attention. We could speculate on its functional role, for instance in movements in which limbs need to passively follow inertia (arm follow-through after the throw of an object, or foot movements during flamenco dancing) rather than maintain fine control of joint angles, but we feel that in the absence of functional data, such considerations would add too much speculation to this manuscript.

      Reviewer #3 (Public Review):

      The study by Ronzano et al uses monosynaptically-restricted transsynaptic labeling to reveal populations of "divergent" premotor interneurons, neurons that are presynaptic to multiple motor neuron pools. Various pairings of muscle injections are analyzed, including hindlimb synergists and antagonists, and hindlimb-forelimb combinations. They show that divergent neurons innervating both synergist and antagonist motor pools are similarly located and found in similar numbers. These neurons exist throughout the cord, in decreasing number but higher proportions of the total labeled neurons with more rostral locations (lumbar to thoracic to cervical cord). A subset of the population of the long descending divergent neurons is identified as part of the V0 class, with some similarly located (and possibly overlapping) neurons projecting to the forelimb muscles in addition to hindlimb muscles. Other studies have shown distributions of premotor interneurons from single motor neuron pools. The novelty here is the focus on interneurons with divergent innervations of multiple motor pools.

      One of the major differences (and advantages) of this study from prior muscle injections of transsynaptic virus is that rather than co-injecting the AAV containing the G glycoprotein, necessary for transsynaptic transfer, into the muscle with the modified Rabies virus, they genetically specify it to motor neurons (and other cholinergic neurons) using Chat-Cre. The advantage of this is that the potential confounds of transfer via the sensory neuron and/or selective AAV transfection/G expression by a particular subpopulation of motor neurons are removed. Although 'double jumps' (MNs to ChAT interneurons to upstream neurons) are possible, they are less likely to occur in sufficient numbers in the time of the experiment.

      The data presented are comprehensive for the motor pools examined. The location analyses are extensive. Divergent neurons demonstrated by the dual virus strategy are further supported by the demonstration of terminals of hindlimb premotor interneurons synapsing on thoracic and cervical motor neurons. The manuscript is well written. Overall, data are clearly presented and limitations are fully discussed. This study can form the basis for future studies regarding specific identity and function of the divergent populations. Main limitations of the study are related to the tracing technique used. The authors are fully transparent about these limitations in the Discussion. As pointed out, this technique is not optimal for quantifying double labeled neurons. Conclusions regarding the existence and location of neurons projecting to at least two motor pools can be made. However, these are likely to be severely underestimated and it is not possible to determine if these neurons are more broadly presynaptic to other motor pools in the limb (or beyond). The reduced efficiency of infection by two viruses due to viral interference is mentioned and relevant sources demonstrating the limitation are cited. Therefore, the data are solid but the functionally-related interpretations and conclusions are somewhat limited and speculative. The other related limitation inherent in the technique is the efficiency of transfer of even a single virus. Analysis is presented regarding a comparison with a more efficient virus, suggesting transsynaptic efficiency may be ~25%, but this does not fully get at the issue. The efficiency of the starter or seed cell labeling is not mentioned. Quantification of motor neurons taking up the RabV would be helpful as this will be directly related to the potential number of presynaptic neurons. This is especially crucial with the forelimb injections, in which 0-6 muscles were injected.

      The MNs infected from the injection have been quantified in every second section across the lumbar cord; these data are now reported in table S3, together with the number of infected interneurons. However, these numbers are most likely underestimated due to the toxicity of RabV. For the forelimb injections, the aim of the experiment was to address whether at least some of the long descending premotor interneurons were also premotor to any of the forelimb muscles. We therefore performed the injections without aiming at muscle selectivity (these are much smaller muscles), but at widespread infection from multiple muscles. As a consequence, we did not count and map cervical motoneurons or interneurons, since this measure would not have reflected the premotor interneuron population at the level of the single muscle, contrary to what we achieved for the hindlimb injections.

      The percentage of divergent interneurons is also underestimated in that the denominator is single labeled neurons presynaptic to either motor neuron pool. Information may be gained by determining whether different portions of premotor neurons to one over another are more likely to be divergent. This is particularly the case with antagonist injections but also for comparisons of relative proportions of dual labeled neurons premotor to synergists and antagonists. This would likely need to be combined or controlled by the percent (or number) of motor neurons from each pool that are labeled to indicate potential differences in starter cells as mentioned above. Counterbalancing is mentioned in the Methods but it is not clear that is fully possible with an n=2 or 3.

      As shown in our supplementary figure 10, working out the proportion of double infected neurons is affected by many confounding factors, not only the (unknown) efficiency of transsynaptic labelling and the viral interference, but also the inevitable differences in the number of starter cells within and across experiments. It is difficult to draw conclusions with so many underlying unknowns, and we agree that the confounding factors would cause an underestimate of the level of divergence. In this new version of the manuscript, we have added Venn diagrams (Figure S1-3.1C-D, S1-3.2C-D, S1-3.3C-E) with the raw numbers of single and double infected cells per muscle in each experiment. Note that despite the large (up to 5-fold) differences in the number of single infected neurons, the proportion of divergent cells is very similar across experiments and muscle pairs.

    1. Author Response:

      Reviewer #1 (Public Review):

      This research follows up on prior work showing that human visual pattern discriminability is closely related to the statistical features of natural scenes. The present work developed a behavioral choice paradigm to test whether rats could discriminate between patterns, and then measured their sensitivity to different spatial correlation structures. This allowed them to test whether rats possess the same sensitivity to spatial correlation patterns as had been observed in human psychophysical experiments. The experiments found that the ordering of the sensitivity to spatial correlation patterns matched that measured in humans and follows the frequency of different structures in natural scenes, in accordance with an efficient coding hypothesis.

      This work has a strong theoretical grounding. The behavioral experiments are well executed and the results show convincingly that rat behavior follows the same pattern as measured in humans. One strength of these data is that they show that the order of presentation of these patterns during training did not matter for the eventual relative sensitivity measured in the rats.

      This research opens up the opportunity to test whether the correlational sensitivities can be altered by changing visual environments, and if so, what neural substrates might be plastic in these cases.

      We thank the reviewer for their enthusiastic support. In our revised manuscript, we have redesigned Figure 4 (i.e., our former Figure 3) to highlight even better the fact that the relative sensitivity to different patterns was largely independent of the training of the animal.

      Reviewer #2 (Public Review):

      Caramellino et al., investigated whether rat sensitivity to multipoint correlations show a similar rank order as observed in humans. They show that rat sensitivity to multipoint correlations exhibits a rank order similar to what was shown in Hermundstad et al., 2014. Interestingly, they also show that such rank order is robust within-group and within-subject. The authors further claim that this similarity indicates that rat sensitivity to multipoint correlations follows efficient coding of natural scenes.

      The main conclusion of the paper is mostly supported by the data. However, the presentation of results may benefit from some points of clarification:

      1) In Hermundstad et al., 2014, the degree of variation in images themselves (Figure 1E) as well as perceptual sensitivity comparison (Figure 2D and 3A) was done among: 2nd-order horizontal/vertical edges, 2nd-order diagonal edges, L-shaped 3rd-order correlations, and 4th-order correlations. However, the comparisons here are: first-order, all 2nd-order correlations including ALL horizontal/vertical/diagonal edges, L-shaped 3rd-order correlations, and 4th-order correlations. It is unclear how these two rank-order results are parallel given that Hermundstad et al., 2014 did not include 1st-order at all.

      First of all, let us clarify some confusion that seems to exist about the identity of the texture patterns that we tested in the experiment. We did not, as the Reviewer seems to imply, test (and pool over) ALL 2nd order and ALL 3rd order correlations. What we did was to choose one pattern for each order, namely the horizontal 2-point pattern and the “bottom right pointing” 3-point pattern, to build our stimulus set. We apologize for not explaining this clearly enough in the initial version of our text. We now state this more explicitly (lines 88-114), and we discuss at length the rationale behind our choice of experimental patterns (beyond the aforementioned passage in the Results, also in the Discussion, lines 190-256, and the Methods, lines 304-333).

      Regarding the inclusion of the 1st order statistic, it is true that it was only studied in Victor and Conte 2012, which only contained psychophysics data, and not in Hermundstad et al 2014, which connected psychophysics with natural image statistics. Indeed, it is not possible to analyze the variability of γ in natural images with the method established by Hermundstad et al 2014, because each image is binarized in such a way as to guarantee that γ=0 by construction. In this sense, like the use of qualitative ranking discussed more at length below, gamma was included to better reflect the approach in Victor and Conte. Moreover, we wanted to include a sensory stimulus condition that we were sure the animals could detect well, in order to ensure that any failure to learn or perform the task was due to limitations in sensory processing and not in the learning or decision-making process. Before performing our experiments, the only statistic that we were confident the rats could be trained to distinguish from noise was gamma [Tafazoli et al 2017, Vascon et al 2019], and therefore it made sense to include it in the experimental design. We have modified the Results (lines 90-93, 104-108), the Methods (316-321) and the Discussion (212-218) to express this point more clearly.
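
      The point about γ being fixed by construction can be illustrated with a small sketch. It assumes, purely for illustration, that binarization thresholds each patch at its median intensity and that γ is the mean of pixels coded as +1 (white) and -1 (black); the function names and the toy patch are hypothetical and are not taken from either paper.

```python
from statistics import median

def binarize_at_median(pixels):
    """Code pixels above the median as +1 (white) and the rest as -1 (black)."""
    m = median(pixels)
    return [1 if p > m else -1 for p in pixels]

def gamma(binary_pixels):
    """One-point statistic (assumed definition): mean of the +/-1 coded pixels."""
    return sum(binary_pixels) / len(binary_pixels)

# Toy patch with an even number of distinct intensities (made-up values).
patch = [12, 40, 7, 33, 25, 18, 51, 29]
print(gamma(binarize_at_median(patch)))
```

      With an even number of distinct intensities, half of the pixels fall on each side of the median, so γ is zero for every patch and has no variance left to analyze.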

      2) In Hermundstad et al., 2014, the paper emphasized that the difference of perceptual sensitivity between horizontal/vertical edges and diagonal edges is not merely an "oblique effect": Horizontal and vertical pairwise correlation share an edge, while pixels involved in diagonal pairwise correlation only share a corner. One wonders whether rats show any sensitivity difference between horizontal/vertical edges and diagonal edges. The manuscript in its current form misses this important comparison. Without showing this, the rat sensitivity does not fully reproduce the trend previously observed in humans.

      When designing our experiment, we prioritized collecting data for the other statistics as they were closer to the extremes of the measured sensitivity values, therefore offering a clearer signal for a comparison with rat data. For instance, had we found better sensitivity to 3- or 4-point statistics than to (horizontal) 2-point statistics, this would have been a very clear sign that perceptual sensitivity in rat is organized differently than in humans. Conversely, we reasoned that a comparison based on 2-point diagonal instead of 2-point horizontal would have been more easily muddled and made inconclusive by the experimental noise that we expected to observe in rats. We agree that, given the high precision of the quantitative match between rats, humans and image statistics now highlighted by the new Fig. 3, it would be interesting to test rats also for their sensitivity to diagonal 2-point correlations and check whether they matched the pattern exhibited by humans. However, as the editor rightly surmises, acquiring new data at this stage would indeed be exceedingly time consuming. Therefore, we have modified the text to better highlight that we did not seek to replicate this particular result in Hermundstad et al 2014 (as well as that we could not test as many correlation patterns as in Hermundstad et al 2014 more generally, due to practical and ethical constraints). We also note that, since we did not test 2-point diagonal, we can’t draw conclusions similar to those in Hermundstad 2014 about the difference of an effect due to efficient coding and one due to a hypothetical oblique effect for the specific 2-point horizontal vs. diagonal comparison. These points are now all brought up in the Discussion of our revised manuscript (lines 189-208). It is also worth noting that the oblique effect was a minor point of the Hermundstad et al. paper and the main arguments did not hinge on it.

      3) Combining 1) and 2), it is unclear why the ranking in rat sensitivity is evidence for efficient coding. In Hermundstad et al., 2014, efficient coding was established by comparing the image-based precision matrix with the human perceptual isodiscrimination contours. There is no such comparison here.

      4) If the authors would like to hypothesize that the rat sensitivity shows efficient coding simply because its ranking is similar to humans, more needs to be done to shore up the quantitative comparison between the two.

      In response to points 3 and 4: We thank the reviewer for underscoring the difference between, on the one hand, a quantitative comparison of the sensitivity to the variance of the statistics in natural images, and on the other hand a more qualitative comparison of their rank ordering. Besides our answers to point 1 and 2 above, we wish now to address specifically this important distinction. In our initial submission, we built our argument based on the rankings in order to better connect not only with Hermundstad et al 2014, but also with earlier human psychophysics results on the same task (Victor and Conte 2012), where there was no comparison with natural image statistics and therefore only the qualitative ranking among sensitivities was examined. We also note that Hermundstad et al do, in fact, make ample use of the rank-ordering agreement between natural image statistics and human sensitivity in order to support their argument (“rank-order”, or similar locutions, are used six times between the results and the discussion). In this sense, while it is true that “In Hermundstad et al., 2014, efficient coding was established by comparing the image-based precision matrix with the human perceptual isodiscrimination contours”, it is also true that the rank ordering was presented as part of the evidence for efficient coding.

      Having said this, we nevertheless agree that our argument can be strengthened by presenting both approaches, qualitative and quantitative. We have now added a new figure (Fig. 3), where we compare our estimates of psychophysical sensitivity in rats with the corresponding values for human psychophysics and natural image statistics reported in Hermundstad 2014 (note that we were only able to compare three out of the four statistics that we tested, because, as the reviewer themselves noted in a previous comment, Hermundstad et al. did not consider 1-point correlations). The comparisons in Fig. 3 (and the related quantitative measures reported in the text, lines 146-160) reveal a strong quantitative match, similar to that between the human psychophysics and the image statistic data.

    1. Author Response:

      Reviewer #1 (Public Review):

      Gastfriend et al present an analysis of the properties and gene expression patterns elicited by pharmacologically activating the Wnt signaling pathway in human pluripotent stem cells (hPSCs) that have been cultured under conditions that favor their differentiation to naïve endothelial cell (EC) progenitors. The result of Wnt activation is partial induction of blood-brain barrier-like properties, as judged by immunostaining for several well characterized marker proteins, trans-epithelial resistance, and RNAseq. Most of the experiments, and the largest effects, were obtained with CHIR99021, a small molecular weight inhibitor of GSK3-beta, the kinase that phosphorylates beta-catenin, leading to its ubiquitinylation and proteasomal degradation. Interestingly, low-passage ("naïve") hPSC-derived ECs were more susceptible to CHIR99021-induced BBB-like conversion than higher passage ("mature") hPSC-derived ECs. The experimental data is of uniformly high quality.

      By way of context, several publications describe the use of hPSC or other starting cell sources for the generation of ECs with BBB-like properties, and several have described the BBB-enhancing effects of activating different signaling pathways (retinoic acid, TGF-beta, Wnt). The effect on BBB properties of manipulations that would be predicted to increase Wnt signaling varied among published studies - it was detectable in some studies (e.g. Paolinelli et al., 2013, Laksitorini et al., 2019) and it was undetectable in another (Sabbagh and Nathans, 2020). The present work adds to this literature and presents additional information on this attractive experimental system for dissecting Wnt signaling and potentially other signaling systems that affect the BBB-like differentiation of CNS ECs.

      One potential question raised by the present study is the interpretation of the response to CHIR99021. GSK3-beta has many substrates, not just beta-catenin. Could the bioactivity of CHIR99021 for the BBB-like conversion reflect a combination of beta-catenin stabilization and reduced phosphorylation of other GSK3-beta substrates?

      We thank the reviewer for this important comment and have addressed this possibility with expanded analysis of RNA-seq data and expanded discussion.

      A minor point on page 8, lines 180-185. It might be appropriate to note that neural-rosette-conditioned and astrocyte-conditioned media may contain factors in addition to Wnt7a.

      Because we have removed the neural rosette- and astrocyte-CM approaches from the Results section of the manuscript, we have instead included the following statement in the Discussion:

      “Importantly, neural progenitor cells and astrocytes likely would also contribute other yet-unidentified ligands important for acquisition of CNS EC phenotype.”

    1. Author Response:

      Reviewer #1:

      Chen et al. trained male and female animals on an explore/exploit (2-armed bandit) task. Despite similar levels of accuracy in these animals, authors report higher levels of exploration in males than in females. The patterns of exploration were analyzed in fine-grained detail: males are less likely to stop exploring once exploring is initiated, whereas female mice stop exploring once they learn. Authors find that both learning rate (alpha) and noise parameter (beta) increase in exploration trials in a hidden Markov model (HMM). When reinforcement learning (RL) models were fitted to animal data, they report females had a higher learning rate and over days of testing, suggesting higher meta-learning in females. They also report that of the RL models they fit, the model incorporating a choice kernel updating rule was found to fit both male and female learning. The results do suggest one should pay greater attention to the influence of sex in learning and exploration. Another important takeaway from this study is that similar levels of accuracy do not imply similar strategies. Essential revisions include a request to show more primary behavioral data, to provide a rationale for the different RL models and their parameters, to clarify the difference between learning and 'steady state,' and to qualify how these experiments uniquely identify latent cognitive variables not previously explored with similar methods.

      We appreciate the reviewer’s thorough reading of the paper and hope that the changes we detail below will address these concerns.

      Reviewer #2:

      The authors investigated sex differences in explore-exploit tradeoff using a drifting binary bandit task in rodents. The authors tried to claim that males and females use different means to achieve similar levels of accuracy in making explore-exploit decisions. In particular, they argue that females explore less but learn more quickly during exploration. The topic is very interesting, but I am not yet convinced on the conclusions.

      Here are my major points:

      1) This paper showed that males explore more than females, and through computational modeling, they showed that females have a higher learning rate compared to males. The fact that males explore more and have lower learning rates compared to females can be an interesting finding, as the paper tried to claim, but it can also be that female rats simply learn the task better than male rats in the task used.

      We have revised the manuscript to better demonstrate that male mice did not acquire fewer rewards than females, and included all analyses and plots requested in this review. Ultimately, there was no evidence that they learned the task any less well than the females did. We appreciated this comment because it has strengthened the evidence we were able to present that males and females take different paths to the same outcome. Completing these analyses has also allowed us to clarify the relationship between RL learning rates and performance in this classic dynamic decision-making task.

      (a) First, from Figure 1B, it looks like p(reward, chance) are similar between sexes, but visually the female rats' performances, p(reward, obtained), look slightly better than males. It would be nice if the authors could show a bar plot comparison like in Figure 1C and 1E. A non-significant test here only fails to show sex differences in performance, but it cannot be concluded that there are no sex differences in performance here. Further evidence needs to be reported here to help readers see whether there are qualitative differences in performances at all.

      The requested bar plot has been added in as Figure 1C and illustrates our central point: male mice did not acquire fewer rewards than females, so there is no evidence that they learned the task any less well than the females did. The t-test result we originally reported suggests that we can discard the hypothesis that males and females have different mean levels of percent reward obtained, but we take the reviewer’s point that the male and female distributions may differ in other, more subtle ways. Therefore, we conducted a better statistical test here. The Kolmogorov-Smirnov (KS) test takes into account not only the means of the distributions but also the shapes of the distributions. The null hypothesis is that both groups were sampled from populations with identical distributions. It tests for any violation of that null hypothesis -- different medians, different variances, or different distributions. The KS test suggested that males and females are not just not significantly different in their reward acquisition performance (Kolmogorov-Smirnov D = 0.1875, p = 0.94), but that males and females have the same distribution of performance.

      New text from the manuscript (page 5, line 119-128):

      “There was no significant sex difference in the probability of rewards acquired above chance (Figure 1C, main effect of sex, F(1, 30) = 0.05, p = 0.83). While the mean of percent reward obtained did not differ across sexes, we consider the possibility that the distribution of reward acquisition in males and females might be different. We conducted the Kolmogorov-Smirnov (KS) test, which takes into account not only the means of the distributions but also the shapes of the distributions. The KS test suggested that males and females are not just not significantly different in their reward acquisition performance (Kolmogorov-Smirnov D = 0.1875, p = 0.94), but that males and females have the same distributions for reward acquisition. This result demonstrates equivalently strong understanding and performance of the task in both males and females.”
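
      For readers unfamiliar with the test, the two-sample KS statistic D is the largest vertical gap between the two empirical cumulative distribution functions. The following is a minimal sketch of that computation only (no p-value); the function name and the example values are hypothetical and are not the study's data.

```python
from bisect import bisect_right

def ks_two_sample_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The empirical CDFs only change at observed values, so checking those suffices.
    for v in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d

# Made-up reward rates for two small groups, for illustration only.
group_1 = [0.61, 0.64, 0.66, 0.70]
group_2 = [0.60, 0.65, 0.67, 0.71]
print(ks_two_sample_statistic(group_1, group_2))
```

      Because D compares whole distributions, it is sensitive to differences in spread or shape as well as to shifts in the mean, which is the property the response relies on.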

      (b) The exploration and exploitation states are defined by fitting a hidden Markov model. In the exploration phase, the agent chooses left and right randomly. From Figure 1E and 1F, it looks like for male rats, they choose completely randomly 70% of the times (around 50% for females). The exploration state here is confounded with the state of pure guessing (poor performance).

      This comment seems to confuse our descriptive HMM with a generative model. The HMM does not imply that choices are being made randomly. Instead, exploratory choices are modeled as a uniform distribution over choices. This was done only because this is the maximum entropy distribution for a categorical variable -- the distribution that makes the fewest assumptions about the true underlying distribution and thus does not bias the model towards or away from any particular pattern of choices during exploration. For example, (Ebitz et al., 2019) have shown that the HMM can recover periods of exploration that are highly structured and information-maximizing, despite being modeled in exactly this way.

      Because the model does not imply or require that exploratory choices are random, we could, in the future, ask whether these choices reflect random exploration or instead more directed forms of exploration. However, for various reasons, this task is not the ideal testbed for isolating random and directed exploration, though this is a direction we hope to go in the future. To clarify our model and address these issues for future research, we have added the following text (page 31, line 745-756):

      “The emissions model for the explore state was uniform across the options.

      This is simply the maximum entropy distribution for a categorical variable - the distribution that makes the fewest number of assumptions about the true distribution and thus does not bias the model towards or away from any particular type of high-entropy choice period. This doesn’t require, imply, impose, or exclude that decision-making happening under exploration is random. Ebitz et al. 2019 have shown that exploration was highly structured and information-maximizing, despite being modeled as a uniform distribution over choices (Ebitz et al., 2020, 2019). Because exploitation involves repeated sampling of each option, exploit states only permitted choice emissions that matched one option.”
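
      The emissions structure described in this passage can be written down as a small table. The sketch below assumes a two-option task with one explore state and one exploit state per option; the state labels and variable names are hypothetical, and only the emissions (not the transition structure or fitting procedure) are shown.

```python
import math

OPTIONS = ["left", "right"]

# Probability of emitting each choice, given the latent state (assumed layout).
EMISSIONS = {
    "explore":       [0.5, 0.5],  # uniform across the options
    "exploit_left":  [1.0, 0.0],  # only emits the matching option
    "exploit_right": [0.0, 1.0],
}

def entropy_bits(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

for state, probs in EMISSIONS.items():
    print(state, entropy_bits(probs))
```

      The uniform row has the largest entropy a two-option distribution can have (1 bit), which is the sense in which it makes the fewest assumptions about how choices are distributed during exploration.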

      (c) Figure 2 basically says that you can choose randomly for two reasons, to be more "noisy" in your decisions (have a higher temperature term), or to ignore the values more (by having a learning rate of 0, you are just guessing). It would be nice to show a simulation of p(reward, obtained) by learning rate x inverse temperature (like in Figure 2C). From Figure 2B, it looks like higher learning rates means better value learning in this task. It seems to me that it's more likely the male rats simply learn the task more poorly and behave more randomly which show up as more exploration in the HMM model.

      This is an important comment and addressing it gave us a chance to show the complicated, nonlinear relationship between the learning rate term and performance in this task. Per the reviewer’s request, we now include a plot showing how learning rate (ɑ) and inverse temperature (β) affect reward acquisition (Figure 3F). However, this figure demonstrates that a higher learning rate does not mean better performance in this task. Performing well in this task requires both the ability to learn new information and the ability to hang onto the information that has already been learned. That can only happen when learning rates are moderate, not maximal. When the learning rate is maximal, behavior is reduced to a win-stay lose-shift policy, where only the outcome of the previous trial is taken into account for choice. This actually results in a lower percent of the reward obtained. We have addressed the difference between the learning rate parameter in the reinforcement learning (RL) model and actual learning performance in the comment above. We believe that this new figure illustrates an essential point that different strategies could result in the same learning performance.

      This result shows that the male strategy was a valid one that doesn’t perform worse than the female strategy. Not only did they have identical performance (Figure 1C), but their optimized RL parameters put them both within the same predicted performance gradient in this new plot (Figure 3F). That’s exactly why we believe that it is important to understand differences in how individuals approach the same task, even as they may achieve the same overall levels of performance.

      New text from the manuscript (page 14, line 368-385):

      “While females had a significantly higher learning rate (α) than males, they did not obtain more rewards than males. This is because the learning rate parameter in an RL model does not equate to the learning performance, which is better measured by the number of rewards obtained. The learning rate parameter reflects the rate of value updating from past outcomes. Performing well in this task requires both the ability to learn new information and the ability to hang onto the previously learned information. That occurs when the learning rate is moderate but not maximal. When the learning rate is maximal (α = 1), only the outcome of the immediate past trial is taken into account for the current choice. This essentially reduces the strategy to a win-stay lose-shift strategy, where choice is fully dependent on the previous outcome. A higher learning rate in an RL model does not translate to better reward acquisition performance. To illustrate that different combinations of learning rate and decision noise can result in the same reward acquisition performance, we conducted computer simulations of 10,000 RL agents defined by different combinations of learning rate (α) and inverse temperature (β) and plotted their reward acquisition performance for the restless bandit task (Figure 3F). This figure demonstrates that 1) different learning rate and inverse temperature combinations can result in similar performance, and 2) the optimal reward acquisition is achieved when the learning rate is moderate. This result suggested that not only did males and females have identical performance, but their optimized RL parameters also put them both within the same predicted performance gradient in this plot.”
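
      The kind of simulation described here can be sketched as follows. This is not the authors' code: it assumes a simple delta-rule value update with a two-option softmax, and the random-walk settings for the reward probabilities (step size, change probability, bounds), the initial values, and all function names are hypothetical choices made for illustration.

```python
import math
import random

def make_restless_bandit(n_trials, seed, step=0.1, p_change=0.1, lo=0.1, hi=0.9):
    """Reward probabilities for two arms that drift independently over trials."""
    rng = random.Random(seed)
    probs = [0.5, 0.5]
    schedule = []
    for _ in range(n_trials):
        schedule.append(tuple(probs))
        for arm in range(2):
            if rng.random() < p_change:
                delta = step if rng.random() < 0.5 else -step
                probs[arm] = min(hi, max(lo, probs[arm] + delta))
    return schedule

def simulate_agent(schedule, alpha, beta, seed):
    """Fraction of rewarded trials for a delta-rule learner with softmax choice."""
    rng = random.Random(seed)
    q = [0.5, 0.5]  # initial value estimates (assumed)
    total = 0
    for trial_probs in schedule:
        # With two options, softmax reduces to a logistic of the value difference.
        p_arm1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        choice = 1 if rng.random() < p_arm1 else 0
        reward = 1 if rng.random() < trial_probs[choice] else 0
        q[choice] += alpha * (reward - q[choice])  # value update at rate alpha
        total += reward
    return total / len(schedule)

schedule = make_restless_bandit(20000, seed=1)
for alpha in (0.0, 0.3, 1.0):
    print(alpha, simulate_agent(schedule, alpha=alpha, beta=5.0, seed=2))
```

      Sweeping `alpha` and `beta` over a grid and plotting the returned reward rate would give a performance surface of the kind the quoted passage describes; with `alpha=0.0` the values never change, so the agent chooses at chance regardless of `beta`.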

      (d) From Figure 3E, it looks like female rats learn better across days but male rats do not, but I am not sure. If you plot p(reward, obtained) vs. time (days), do you see an improvement in female rats as opposed to males? Figure 4 also showed that females show more win-stay-lose-shift behavior and use past information more; both are indicators of better learning in this task.

      Taken the above together, I am not convinced about the strategic sex differences in exploration, it looks more like that the female rats simply learn better in this task.

      Unfortunately, there was no change in performance across days in either males or females. At the reviewer’s request, we now include a new plot illustrating p(reward, obtained) over days in Supplemental Figure 1. Ultimately, this resonated with the points we clarified above and demonstrated in this figure: males and females had identical performance in this task.

      To the other points raised here, about sex differences in win-stay lose-shift and mutual information: these are the strategic differences at the heart of the paper, but again did not alter overall performance for the reasons detailed above. Figure 4 did show that females were doing more win-stay. However, after further examining win-stay behavior by explore-exploit states, we found that females were only doing more win-stay during exploratory trials (Figure 5E). There was no difference in win-stay during the exploitative trials. Figure 5F also demonstrated that females did more win-stay lose-shift in the exploration state, indicating that females only learned better during exploration. Although males learned more slowly during exploration, they compensated for that by exploring for longer. Both male and female strategies are equally effective and may be differentially advantageous in different tasks.
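
      As a reminder of what these measures capture, win-stay is the probability of repeating a choice after a rewarded trial, and lose-shift is the probability of switching after an unrewarded one. A minimal sketch of the computation follows; the function name and the toy sequences are hypothetical, and the authors' exact definitions (for example, how trials are split by explore-exploit state) may differ.

```python
def win_stay_lose_shift(choices, rewards):
    """Return (P(stay | previous win), P(shift | previous loss)) for one session."""
    wins = win_stays = losses = lose_shifts = 0
    for t in range(1, len(choices)):
        stayed = choices[t] == choices[t - 1]
        if rewards[t - 1]:
            wins += 1
            win_stays += stayed
        else:
            losses += 1
            lose_shifts += not stayed
    p_win_stay = win_stays / wins if wins else None
    p_lose_shift = lose_shifts / losses if losses else None
    return p_win_stay, p_lose_shift

# Toy session: 0 = left, 1 = right; rewards are 1 (rewarded) or 0 (unrewarded).
print(win_stay_lose_shift([0, 1, 1, 1], [1, 1, 0, 0]))
```

      Restricting the loop to trials labeled as exploratory or exploitative would give state-specific versions of the same two probabilities.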

      Finally, to address the meta-learning: in developing our response to this comment and looking for any other signs of adaptation across days (whether differing by sex or not), we did revisit this result and decided to rewrite some passages to be more circumscribed about our interpretations. Figure 3E showed increased learning rate parameters across days in females. We were initially excited about this idea of meta-learning; however, we found no other evidence of adaptation over time in multiple behavioral measures, including reward acquisition, response time, and retrieval time (Supplemental Figure 1). Changes in learning rate parameters over sessions from the RL model were marginally significant and we feel that it’s worth mentioning for completeness, but it was only a small contributor to the overall sex differences in the behavioral profile. As a result, we have toned down the conclusion we drew from this result accordingly.

      New text from the manuscript (page 4, line 93-113):

      “It is worth noting that unlike other versions of bandit tasks such as the reversal learning task, in the restless bandit task, animals were encouraged to continuously learn about the most rewarding choice(s). There is no asymptotic performance during the task because the reward probability of each choice constantly changes. The performance is best measured by the amount of obtained reward. Prior to data collection, both male and female mice had learned to perform this task in the touchscreen operant chamber. To examine whether mice had learned the task, we first calculated the average probability of reward acquisition across sessions in males and females (Supplemental Figure 1A). There were no significant changes in the reward acquisition performance across sessions in either sex, demonstrating that both males and females had learned to perform the task and had reached an asymptotic level of performance across sessions (two-way repeated measure ANOVA, main effect of session, p = 0.71). We then examined two other primary behavioral metrics across sessions that are associated with learning: response time and reward retrieval time (Supplemental Figure 1B, C). Response time was calculated as the time elapsed between the display onset and the time when the nose poke response was completed. Reward retrieval time was measured as the time elapsed between nose-poke response and magazine entry for reward collection. There was no significant change in response time (two-way repeated measure ANOVA, main effect of session, p = 0.39) or reward retrieval time (main effect of session, p = 0.71) across sessions in either sex, which again demonstrated that both sexes had learned how to perform the task. Since both sexes had learned to perform the task prior to data collection, variabilities in task performance are results of how animals learned and adapted their choices in response to the changing reward contingencies.”

      page 14, line 386-390:

      “One interesting finding is that, when we compared learning rate across sessions within sex, females, but not males, showed increased learning rate over experience with the task (Figure 3G, repeated measures ANOVA, female: main effect of time, F (2.26,33.97) = 5.27, p = 0.008; male: main effect of time, F(2.5,37.52) = 0.23, p = 0.84). This points to potential sex differences in meta-learning that could contribute to the differential strategies across sexes.”

      2) I do like how the authors define exploration states vs exploitation states via HMM using choices alone. It would be interesting to see how the sex differences in reaction time are modulated by exploration vs exploitation state. As the authors showed, RT in exploration state is longer. Hence, it would make a conceptual difference whether the sex difference in reaction times is due to different proportions of time spent on exploration vs exploitation across sex.

      That is a very interesting idea. We tested for this possibility by calculating a two-way ANOVA (with interaction) between explore-exploit state and sex in predicting RT. There was a significant main effect of state (RT is longer in explore state than exploit state, main effect of state: F (1,30) = 13.07, p = 0.0011), but males were slower than females during both exploitation and exploration (main effect of sex, F(1,30) = 14.15, p = 0.0007) and there was no significant interaction (F (1,30) = 0.279, p = 0.60). Unfortunately, this means that we cannot interpret the response time difference between males and females as a consequence of the greater male tendency to explore. Response time is a fairly noisy primary behavior metric, especially in the males, and a lot of other factors might be at play here, some of which we plan to follow up on in the future. We report this result as follows (page 10, line 248-254):

      “Since males had more exploratory trials, which took longer, we tested the possibility that the sex difference in response time was due to prolonged exploration in males by calculating a two-way ANOVA between explore-exploit state and sex in predicting response time. There was a significant main effect of state (main effect of state: F (1,30) = 13.07, p = 0.0011), but males were slower than females during both exploitation and exploration (main effect of sex, F(1,30) = 14.15, p = 0.0007) and there was no significant interaction (F (1,30) = 0.279, p = 0.60).”

      Reviewer #3:

      In the manuscript 'Sex differences in learning from exploration', Chen and colleagues investigated sex differences in decision making behavior during a two-armed spatial restless bandit task. Sex differences and exploration dysregulation have been observed in various neuropsychiatric disorders. Yet, it has been unclear whether sex differences in exploration and exploitation contribute to sex-linked vulnerabilities in neuropsychiatric disorders.

      Chen and colleagues applied comprehensive modeling (model free Hidden Markov model (HMM), and various reinforcement learning (RL) models) and behavioral analysis (analysis of choice behavior using the latent variables extracted from HMM), to answer this question. They found that male mice explored more than female mice and were more likely to spend an extended period of their time exploring before committing to a favored choice. In contrast, female mice were more likely to show elevated learning during the exploratory period, making exploration more efficient and allowing them to start exploiting a favored choice earlier.

      Overall, I find the question studied in this work interesting and compelling. Also, the results were convincing and the analysis thorough. However, the assumptions in the proposed HMM are not fully justified, and additional analyses are needed to strengthen the authors' claims. To be more specific, the effect of obtained reward on state transitions and biased exploitations should be further explored.

      Thank you for your feedback. We have included two more complex versions of the Hidden Markov models (HMMs) that account for the effect of obtained reward on state transitions and biased exploitations. Although the additional parameters slightly improve the model fit, model comparison tests suggested that such improvement was not significant. We decided to use the original HMM from the original manuscript because it’s the simplest and best fit model that provides the best parameter estimation with the amount of data we have. We do appreciate the comments and believe that the inclusion of two new HMMs and justification of the original HMM has strengthened our claims.

    1. Author Response:

      Reviewer #2 (Public Review):

      1. Presentation, analysis, and discussion of calcium imaging results

      a) As the authors correctly pointed out, having a water control is indeed essential for interpreting calcium imaging results. As such, I would recommend having a water control panel (currently in Figure S3) in the main figure.

      Thank you for this suggestion. We now provide a new figure, which shows the change in fluorescence of lratd2a right dHb neurons first exposed to water (control) and then to cadaverine or alarm substance (refer to Figure 2A-D).

      b) The current presentation and analysis of calcium imaging data in Figure 2B does not seem appropriate and can be improved - since the dynamics of olfactory responses are likely highly variable across neurons and fish, rather than comparing responses across time, it would be better to compare the summed response over a longer time window (as already done in Figure S2, but also including water flow control data). Do also mention the time window over which the calcium responses were integrated.

      As noted above, we have added a new figure to represent the change in fluorescence of lratd2a right dHb neurons averaged over a 5 min time window. This trace also includes the standard error of the mean (shaded) to represent the variability in responses among all neurons that were imaged (refer to Figure 2A-D).

      c) Discussion Line 393: "From calcium imaging, we validated that the right dHb appears more responsive than the left when larval zebrafish are exposed to aversive odors such as cadaverine or chondroitin sulfate" - this conclusion cannot be drawn from the existing presented data, unless calcium imaging was also performed in the left habenula.

      Thank you for pointing out this error. Indeed, we were unable to monitor calcium signaling in the left dHb owing to barely detectable levels of GCaMP labeling in the lratd2a cells on the left. We have now corrected this and added the following sentence to the Results:

      “We monitored the responses of individual cells in the right dHb, as GCaMP6f labeling was weakly or not detected in neurons on the left (data not shown).”

      d) It would be good to include in the methods section more detail on how the odor was delivered, volume delivered etc, and whether control experiments were done on the same day / clutch of fish etc.

      We added the requested information in the “Calcium imaging in larval zebrafish” section of the Materials and Methods.

      2. Presentation, analysis, and discussion of c-Fos results and comparison with calcium imaging

      a) Figure 2C-D: The difference / overlap between blue and brown is difficult to make out in the images, especially at this resolution and magnification. Is there a way to specifically quantify the % of lratd2a neurons that are activated by c-fos, rather than just neurons in the dorsal habenula as a whole? This would be necessary to support the claim in line 278: "Thus, the lratd2a subpopulation in the right dHb responds to cadaverine in both larvae and adults".

      We have shown magnified images of double-labeling with fos and lratd2a probes in Figure 2E to G’ to help with visualization of the overlap in colorimetric double in situ hybridization. We quantified the % of the lratd2a expression domain where fos is expressed and provided this information in the results (page 7, lines 128-129).

      b) In larvae (using calcium imaging), the effects of cadaverine and chondroitin sulfate were compared, whereas in adults (using c-Fos), the comparison is between cadaverine and alarm substance. Is there a reason why the alarm substance was not used in larvae, or chondroitin sulfate in adult fish? Perhaps the authors can elaborate on their rationale.

      We based our experimental approach on results that had been published by others. For example, Krishnan et al. (2014) had shown that dHb neurons respond to chondroitin sulfate in 6-9 day old larvae. Previous studies had reported that the earliest responses to alarm substance can be seen around 42 days post fertilization in zebrafish (Waldman, 1982) and 48-57 days post-hatching in fathead minnows (Carreau-Green et al., 2008). Jetti et al. (2014) examined the response to alarm substance in 25 dpf zebrafish.

      3. Presentation, analysis, and discussion of behavioral results

      a) The presentation of the alarm substance behavior results could be improved. The authors could include the words "alarm substance" somewhere in the panels so it is clear to the readers that they are looking at responses to that rather than to cadaverine which is described in the preceding panels. Similarly, to avoid confusion and to facilitate comparison, the same parameters should be presented for Figures 3-5 (currently distance in top is not shown in Figure 3 or 5, onset of fast swim and interval time not shown in Figures 4-5).

      As suggested, we added the words “Cadaverine” and “Alarm substance” to Figures 3, 4, 5 and Supplementary figure 4. We also now show the same parameters for the response to alarm substance in BoTx-GFP and intersectional BoTx-GFP transgenic lines, and in tcf7l2 and bsx mutants (refer to Figures 3, 4, 5 and Supplementary figure 4).

      b) Does cadaverine induce changes in swim speed and other kinematic parameters? Similarly, does the alarm substance induce avoidance of one side of the tank like cadaverine? It can be difficult for the reader to compare the effects of genetic manipulations on responses to both odorants since different behavioral parameters are being quantified, hence some means of direct comparison could be helpful.

      As we had described in the discussion (page 14, lines 279-280): “Despite both being aversive cues (Hussain et al., 2013; Mathuru et al., 2012), cadaverine and alarm substance elicit different behavioral responses by adult zebrafish.” Alarm substance triggers immediate (within 1 min) erratic behavior such as rapid swimming, darting and prolonged freezing. Thus, it is not feasible to measure the same behavioral responses to the two aversive cues.

      c) The effect of cadaverine on control groups seems to be quite variable. In Figures 3 and 4 the avoidance effect persists for the entire duration of the experiment. In Figures 5 and S4 the effect is only significant in 2 time bins. The authors' conclusions are still valid since the correct comparisons are indeed to their respective sibling controls; however, it does make it a bit difficult to compare results across genotypes. For example, non-botox-expressing lratd2a:QF2 fish appear to have about the same degree of cadaverine avoidance as lratd2a:QF2, slc5a7a:Cre, QUAS:Botox fish. Similar to point (b), are there other parameters that can be measured that are more consistent in controls across genotypes? Or at the least, some discussion of the behavioral variability in the text would be helpful.

      As correctly pointed out by the reviewer, fish of different genetic backgrounds differ in the degree of their response to odorants. However, behavioral measurements were reproducible over 2 to 3 trials testing the same groups of fish on different days. We now show the aversion index for all individuals tested for the response to cadaverine in a new figure (refer to Supplementary figure 7).

      d) tcf7l2 mutants (like bsx mutants) also have a significantly lower swim speed than controls; this is also worth mentioning / discussing in the text.

      We have now mentioned this in the results section of the main text (Page 10, lines 202-203).

      e) The link between habenular LR asymmetry and aversive behavior is indeed interesting - in the discussion, one proposal was that this asymmetry could promote directed turning and escape. From the existing data (particularly for the lratd2a:QF2, slc5a7a:Cre, QUAS:Botox fish), is there any evidence of differences in turning behavior (LR asymmetry, or probability of turns in general)?

      We did not observe any correlation between habenular L-R asymmetry and the direction of turning in response to alarm substance, although this is an interesting point. We added a sentence to the discussion (page 16, lines 323-324) to reference a recent study on the neural basis of this lateralized behavior.

      f) As a related point, it is not clear to me that one would expect an enhancement of cadaverine avoidance in bsx mutants, especially if the argument is that asymmetry is important for aversive behavior. Perhaps the discussion on this point could be framed less as a negative result but as a notable observation.

      We agree with the reviewer’s interpretation and have added the following sentences to the Results (page 11, lines 218-220): “Despite the symmetric activation of dHb neurons, bsx homozygotes and heterozygotes both showed reduced responsiveness to cadaverine,” and to the Discussion (page 15, lines 303-304): “We did not observe enhanced or prolonged aversion to cadaverine in bsxm1376 homozygotes relative to controls.”

      4. Statistical analyses: Unless data are normally distributed, non-parametric tests should be used to compare calcium imaging and behavioral data (such as Kruskal-Wallis for the time course of the calcium / behavioral data, Wilcoxon Rank-Sum Test for others).

      As correctly pointed out by the reviewer, we have corrected our statistical analyses (refer to Materials and Methods and the Figure legends). We used two-way ANOVA followed by Bonferroni's post hoc test and an unpaired t-test for analyzing calcium imaging data. For analyzing the response to cadaverine within groups, we used the Wilcoxon signed-rank test and cited publications using comparable approaches (Koide et al., 2009; Wakisaka et al., 2017). For analyzing the data between groups, we used two-way ANOVA followed by Bonferroni's post hoc test.

    1. Author Response:

      Reviewer #1:

      In this manuscript titled "The LRRK2 G2019S mutation alters astrocyte-to-neuron communication via extracellular vesicles and induces neuron atrophy in a human iPSC-derived model of Parkinson's disease", Jacquet and colleagues investigated the role of Parkinsonism gene mutation LRRK2 G2019S in hiPSC-differentiated astrocytes. By isolating extracellular vesicles from ACM and examining astrocytes with various electron microscopy techniques, the authors found that LRRK2 G2019S affects the morphology and distribution of MVBs and the morphology of secreted EVs in hiPSC-differentiated astrocytes. Furthermore, the authors observed that astrocyte-derived EVs can be internalized by dopaminergic neurons and that such EVs support neuronal survival. However, LRRK2 G2019S EVs lost the ability to promote neuronal survival. This is an interesting study showing a non-cell autonomous contribution to dopaminergic neuron loss in PD.

      The proposed idea of how LRRK2 G2019S dysregulates EV-mediated astrocyte-to-neuron communication is novel and exciting. However, the authors present some conflicting data that are not addressed in the discussion: they first conclude upregulated exosome biogenesis by RNAseq in G2019S vs WT astrocytes, but later show a decrease in the number of <120nm particles in G2019S mutants, suggesting a decrease in the classical exosome-sized vesicles secreted compared to WT. Lastly, their MVB images show fewer CD63 gold particles in G2019S compared to WT control (though this was not quantified). Do the authors suggest an increase or decrease in exosome biogenesis in G2019S mutants? How do they reconcile these seemingly contradictory data? Several experiments, controls and additional analyses are needed to fully demonstrate the validity of the proposed mechanism.

      The RNA-sequencing data of LRRK2 G2019S astrocytes showed an enrichment in genes associated with the “extracellular exosome” gene ontology term but not with the MVB/EV trafficking or secretion pathways. While we found CD82 and Rab27b to be upregulated, the classical biogenesis markers of MVB/EV trafficking and secretion (e.g. VTA1, VPS4, ALIX) were not dysregulated. Instead, the gene list shows an overwhelming dysregulation of genes coding for EV-enclosed proteins which do not have known roles in MVB/EV biogenesis or function (we now discuss this point in the main text, see highlights in italics below). As a result, we do not believe that exosome biogenesis is upregulated but instead propose the working hypothesis that the EV pathway may contribute to LRRK2 G2019S astrocyte dysfunction. To complement the sequencing data, our study provides a characterization of this pathway by (i) describing the cellular distribution of CD63+ structures in astrocytes, (ii) measuring the size of secreted EVs, and (iii) analyzing the neurotrophic potential of control and LRRK2 G2019S astrocyte-secreted EVs. We have not characterized the cellular biology of exosome/EV biogenesis in depth, and we do not propose a mechanism by which the LRRK2 G2019S mutation dysregulates these pathways. These questions are beyond the scope of our study, which is focused on the role of astrocytes in neurodegeneration.

      The reviewer also referred to the CD63 immunogold staining used in Figures 4C and 6A to localize MVBs. After careful quantification of the number of CD63 gold particles in WT and LRRK2 G2019S MVBs, we conclude that there are no differences between the two genotypes and we apologize for selecting non-representative images. We have now replaced these with representative images. Regarding the shift in the size of WT vs. LRRK2 G2019S vesicles, we complemented our cryo-EM analysis with new data generated using Nanoparticle Tracking Analysis (NTA) (Figure 3C,D). The NTA analysis enabled the quantification of a greater number of particles, and we found that both WT and LRRK2 G2019S astrocytes secrete a significant number of particles in the 0-120 nm range. The cryo-EM data suggested that mutant astrocytes secreted fewer particles in this size range, but this is not observed in the NTA analysis. This discrepancy could be explained by the following: (i) in contrast to cryo-EM, NTA does not distinguish EVs from cell debris, which could bias the quantification and increase the number of small particles quantified (Noble et al., 2020), and (ii) studies showed that the size distributions between NTA and cryo-EM differ, the latter enabling the identification of larger particles (Noble et al., 2020). These two techniques are therefore complementary in the study of secreted EV and our manuscript now presents data generated using these two approaches (Figure 3C-G) (see italicised text below).

      Results

      Expression of exosome components in iPSC-derived astrocytes is altered by the LRRK2 G2019S mutation

      Gene ontology (GO) analysis revealed that components of the extracellular compartment are up-regulated in LRRK2 G2019S astrocytes – these include GO terms corresponding to the extracellular region, extracellular matrix and extracellular exosomes (Figure 1D,F). The exosome component is one of the most significantly up-regulated GO terms in both isogenic and non-isogenic astrocytes, and is comprised of a total of 67 (isogenic pair) or 95 (non-isogenic pair) genes (Supplementary Tables 1 and 2). The large majority (~ 98 %) of these gene products are described to be enclosed in exosomes (e.g. CBR1) but do not perform specific functions related to EV formation or secretion. Only a few genes are associated with exosome biogenesis (e.g. CD82) and trafficking (e.g. Rab27b) (Andreu & Yanez-Mo, 2014; Chiasserini et al., 2014; Ostrowski et al., 2010) and we did not detect differences in the expression of canonical factors that regulate MVB formation (e.g. VTA1, VPS4 or ALIX).

      Profiling WT and LRRK2 G2019S EVs secreted by iPSC-derived astrocytes

      The astrocyte-derived EV pellet is enriched in exosomes, as demonstrated by the expression of 8 exosomal markers and the absence of cellular contamination (Supplementary Figure 3D). NTA quantification showed that the number of secreted EVs does not differ between LRRK2 G2019S and isogenic control (Figure 3C), and it appears that LRRK2 G2019S particles have a slightly different size distribution compared to WT particles (Figure 3D). It should be noted that TEM and NTA are methods traditionally used to estimate the size distribution of EVs, but their accuracy is often challenged by sample processing artifacts and technical biases (Pegtel & Gould, 2019). To overcome these limitations, we complemented the NTA results with cryo-EM analysis of the size of EVs secreted by WT and LRRK2 G2019S isogenic astrocytes. EVs mostly displayed a circular morphology (as opposed to the cup-shaped morphology observed by TEM) (Figure 3E), but a variety of other shapes were also observed (Supplementary Figure 3E). Cryo-EM analysis confirmed that WT astrocyte-secreted EVs display a large range of sizes, from 80 nm to greater than 600 nm in diameter, with differences between WT and mutant populations (Figure 3F). The cryo-EM data suggested that mutant astrocytes secreted fewer particles in the 0-120 nm size range, and the discrepancy with the NTA results could be explained by the following: (i) in contrast to cryo-EM, NTA does not discriminate EVs from cell debris, which could bias the quantification and increase the number of small particles quantified (Noble et al., 2020), and (ii) studies showed that the size distributions between NTA and cryo-EM differ, the latter enabling the identification of larger particles (Noble et al., 2020). However, cryo-EM is a low throughput methodology that limits data collection to a small sample size and has therefore a lower statistical power than NTA. Quantification of the number of simple vs. multiple EV structures did not reveal differences between the two lines, and represent up to 16% of the EV population (Figure 3G). We then sought to complement our EV profiling experiments with an analysis of secreted CD63+ particles, which form one of the known exosomal sub-populations. We previously showed that WT and LRRK2 G2019S MVBs contain similar levels of the CD63 tetraspanin (Figure 2E, Supplementary Figure 3A,B), and an ELISA-based quantification confirmed that the number of CD63+ EVs remained unchanged between the two genotypes (Supplementary Figure 3F). We conclude from these results that the total number and morphology of EVs produced by WT and LRRK2 G2019S astrocytes are similar, but mutant EVs may have a different size distribution compared to WT vesicles.

      Major concerns:

      1) In Figure 1A, the authors demonstrate characterization of iPSC-derived astrocytes. Since there is no one unified and validated method for astrocyte differentiation, there is a need for more accurate characterization of iPSC-derived astrocytes. The authors should demonstrate the percentage of cells positive for astrocytic markers and prove that the obtained astrocytes are functional (able to promote synaptogenesis and take up glutamate). I would also recommend analyzing the iPSC-derived astrocyte cultures for expression of more specific astrocytic markers such as GLT1 and SOX9 in addition to those which have been analyzed. Moreover, it is highly important to know the proportion of astrocytes derived from the LRRK2 G2019S line and its isogenic control in order to be able to compare their effect on neurons.

      We thank the reviewer for these suggestions. It is true that there exist many different astrocyte differentiation protocols, and this study uses a protocol developed by TCW et al. that has been further optimized by our lab to derive astrocytes from a midbrain-patterned population of neural progenitor cells (NPCs) (de Rus Jacquet, 2019; Tcw et al., 2017). The protocol is published, and shows that these astrocytes are functional – they respond to inflammatory factors and alter secretion of the IL-6 cytokine. Furthermore, Supplementary Figure 2D shows a whole transcriptome analysis (by RNA-seq) of the cell populations produced for this study and demonstrates that iPSC-derived astrocytes cluster with human primary midbrain astrocytes and away from iPSCs or NPCs in an unsupervised cluster analysis. However, we agree that in-depth characterization of iPSC-derived astrocytes is essential, and the updated manuscript now shows that (i) the astrocyte differentiation protocol yields 100 % GFAP+ cells with both WT and mutant lines (Supplementary Figure 2B), (ii) expression of six astrocyte markers (GLT1, SOX9, APOE, BHLHE41, CD44, GLUD1) (Supplementary Figure 2Aii, B), as well as (iii) transient intracellular calcium signaling (Supplementary Figure 2E), and (iv) synaptosome uptake (Supplementary Figure 2F) in both WT and LRRK2 G2019S astrocytes. We also updated the text as follows (italicised):

      Results section

      Midbrain-patterned NPCs carrying the LRRK2 G2019S mutation or its isogenic control were differentiated into astrocytes as described previously (de Rus Jacquet, 2019; Tcw et al., 2017). As expected, astrocytes expressed the markers GFAP, vimentin, and CD44 as demonstrated by immunofluorescence (Figure 1A) and flow cytometry analyses (Supplementary Figure 2A). Differentiation was equally effective in WT and LRRK2 G2019S cells, with 100 % of the differentiated astrocytes expressing GFAP (Supplementary Figure 2Bi). To further demonstrate the successful differentiation of iPSCs into astrocytes, we analyzed gene expression using RNA-sequencing analysis (RNA-seq), including primary human midbrain astrocyte samples in the RNA-seq study to serve as a positive control for human astrocyte identity. iPSC-derived and human midbrain astrocytes expressed similar levels of gene markers of astrocyte identity, including SOX9 and GLUT1 (Supplementary Figure 2B). In addition, principal component and unsupervised cluster analyses separated undifferentiated iPSCs, iPSC-derived NPCs and iPSC-derived astrocytes into independent clusters, demonstrating that our differentiation strategy produces distinct cell types (Supplementary Figure 2C-D). Importantly, the transcriptome of iPSC-derived astrocytes showed more similarities to fetal human midbrain astrocytes than to NPCs or iPSCs, further validating their astrocyte identity (Supplementary Figure 2D). Lastly, control and LRRK2 G2019S astrocytes showed classic astrocytic functional phenotypes such as spontaneous and transient calcium signaling and synaptosome uptake (Supplementary Figure 2E-F).

      2) In Figure 1, the authors found a significant upregulation of exosome components in astrocytes, demonstrating an important role of LRRK2 G2019S in the EV signaling pathway. In the discussion, the authors briefly mentioned 'sub-populations of CD63- EVs may be differentially secreted in mutant astrocytes'. Since the authors have obtained the RNA-seq data, it would be nice to dig deep into the data and comment on potential EV sub-populations which can be differentially secreted. This information can be very beneficial for follow-up studies in the PD and LRRK2 field. Furthermore, the authors should assess the expression of Rab27a and CD82 in WT and LRRK2 G2019S astrocytes by western blots to verify RT-qPCR data. The authors should also present specifically which exosome biogenesis or secretion genes are altered to provide further insight into the stage of exosome biogenesis that is affected (ESCRT0-3, VPS4, ALIX, etc).

      In the first comment, the reviewer refers to the observation that the number of total and CD63-positive EVs secreted by astrocytes is unchanged between the WT and LRRK2 G2019S genotypes. The classification of different EV sub-populations based on marker proteins is an evolving field of research, and an important study by Kowal et al. defined generic and sub-population-specific EV markers (Kowal et al., 2016). Our RNA-seq dataset revealed five upregulated genes identified in the Kowal study, namely actin, GAPDH, actinin, complement and fibronectin, but unfortunately there is no clear pattern correlated with specific EV sub-populations. For example, actin and GAPDH are two upregulated proteins that can be found in multiple types of EVs, actinin is enriched in large and medium-sized EVs, and complement and fibronectin are enriched in high density but small EVs (Kowal et al., 2016). The majority of dysregulated genes identified in our sequencing experiment do not code for proteins classically used to categorize EVs, so unfortunately our data do not allow us to address the reviewer’s question. To make sure that the data are readily accessible to the scientific community, we have prepared a supplementary table with a list of extracellular exosome-related genes identified in the RNA sequencing study. To respond to the reviewer’s comment on a specific stage of EV biogenesis/secretion altered in LRRK2 G2019S, the sequencing data presented in this manuscript do not allow us to conclude that there is such a dysregulation. Our gene list corresponding to the “extracellular exosome” gene ontology term contains a large majority of genes coding for proteins enclosed within EVs that do not play a role in biogenesis/secretion. For example, the gene list does not contain ESCRT0-3, VPS4, ALIX or other classical markers involved in EV biogenesis and we cannot conclude anything about the alteration of MVB/EV biogenesis or defects in specific stages of MVB trafficking or EV secretion. In addition, we thank the reviewer for suggesting the validation of RT-qPCR data by western blot. The purpose of the RT-qPCR experiment was to validate the gene expression data collected by RNA-seq. Given that our objective was to confirm gene expression levels, and that we do not further study CD82 and Rab27b, we think that collecting protein expression levels is not necessary in the context of this study.

      3) In Figure 2A and B, data show that both WT and LRRK2 G2019S astrocytes produce MVBs and that MVBs in LRRK2 G2019S astrocytes are smaller than in WT astrocytes. In Figure 2E, the authors showed the abundance of CD63 localized within MVBs in WT astrocytes but did not show the CD63 localization in MVBs in G2019S astrocytes. However, it is important to show CD63 localization in MVBs in G2019S astrocytes to fully support the conclusion that CD63+ MVBs are present in LRRK2 G2019S astrocytes. In addition, CD44 is a marker for astrocyte-restricted precursor cells. Although CD44+ cells are committed to give rise to astrocytes, it is crucial to include another astrocyte marker to ensure these cells are indeed mature astrocytes. -Related, authors should consider citing some of the MVB maturation literature to guide the readers.

      We agree with the reviewer’s suggestion, and we added images showing the subcellular localization of CD63 in both WT and LRRK2 G2019S MVBs by immunogold staining (Figure 2E). Validation experiments available in Figure 1A and Supplementary Figure 2A and 2B confirm that our astrocytes express CD44 as well as markers of mature astrocytes (BHLHE41, SOX9, GLUT1, APOE, GLUD1). The reason for showing CD44 instead of a more mature marker such as GFAP in Figure 2 of the manuscript is because CD44 is a membrane marker, and it therefore enables a clear visualization of the astrocyte surface area. We also note that, as shown in Supplementary Figure 2Aii, iPSC-derived and human fetal astrocytes express CD44, but iPSCs and NPCs do not significantly express this marker gene. In addition, as suggested by the reviewer, we added information related to MVB maturation in the introduction and the new text reads as follows (changes in italics):

      Introduction section

      The sorting and loading of exosome cargo is an active and regulated process (Temoche-Diaz et al., 2019), and the regulatory factors involved in EV/exosome biogenesis are just beginning to be identified. Among the well-known factors, Rab proteins are essential mediators of MVB trafficking and they regulate endosomal MVB formation/maturation as well as microvesicle budding directly from the plasma membrane (Pegtel & Gould, 2019; T. Wang et al., 2014). In addition, membrane remodeling is an essential aspect of MVB/EV formation that appears to be regulated, at least in part, by the endosomal sorting complex required for transport (ESCRT) machinery (Pegtel & Gould, 2019; Schoneberg, Lee, Iwasa, & Hurley, 2017).

      4) In Figure 3, it is impressive that the authors are able to image EVs using a cryo-EM approach and analyze their sizes. The authors also observed different shapes of EVs. Is there any shape difference between WT EVs and G2019S EVs? Is there a way that the authors could categorize these shapes and do a detailed analysis of EV shapes? Also, in Figure 3D, both WT EV and G2019S EV images should be presented side by side for comparison. -Related, the size frequencies of EVs presented suggest a difference in the types of EVs released. Interestingly, exosomes are classically known to range from ~50-120nm and this population is significantly decreased in G2019S compared to WT. What does this suggest?

      As suggested by the reviewer, we classified the two main EV shapes as “simple” and “multiple” EVs, and found no quantitative differences between WT and LRRK2 G2019S. These new data and side-by-side images of WT and LRRK2 G2019S EVs are available in Figure 3E-G, and the text has been updated accordingly (see text in italics below). One of the observations of Figure 3 is that there exist genotype-specific differences in the size distribution of EVs, which suggests that different classes of vesicles may be preferentially produced by WT vs. LRRK2 G2019S astrocytes. This could be the result of differences in dynamics related to cargo loading, or a shift from MVB-released exosomes to membrane budding and microvesicle production. These observations are of great interest and we added a short discussion (in italics below), but they are beyond the scope of this study, which is focused on EV neurotrophic properties, and we do not currently have evidence to support these hypotheses.

      Results - LRRK2 G2019S affects the size of EVs secreted by iPSC-derived astrocytes

      EVs mostly displayed a circular morphology (as opposed to the cup-shaped morphology observed by TEM) (Figure 3E), but a variety of other shapes were also observed (Supplementary Figure 3C). (…) Quantification of the number of simple vs. multiple EV structures did not differ between the two lines, and represent up to 16 % of the EV population (Figure 3G).

      Discussion – Dysregulation of iPSC-derived astrocyte-mediated EV biogenesis in Parkinson’s disease

      The observation that LRRK2 G2019S MVBs are less frequently located in the perinuclear area suggests that they may spend less time loading cargo at the Trans-Golgi network, which could in turn produce smaller MVBs and EVs with a different size range compared to WT (Edgar, Eden, & Futter, 2014; Pegtel & Gould, 2019). We did not observe a difference in the number of secreted EVs (total and CD63+ subpopulation) between WT and LRRK2 G2019S astrocytes (Figure 3C,H), suggesting that the secretion of at least one population of EVs is independent of the astrocyte genotype.

      5) In Figure 3C, SBI ELISA claims to quantify CD63+ vesicles; the authors should present more standardized particle quantification data (either by CD63 FACS for isolated EVs in WT vs G2019S or ZetaView/QNano particle tracking). The authors should also directly quantify the total number of EVs secreted in WT vs G2019S conditions (not only CD63+).

      The updated manuscript now contains the NTA analysis of WT and LRRK2 G2019S EVs (Figure 3C,D) which provides the total number of EVs secreted by WT and LRRK2 G2019S astrocytes.

      6) In Figure 4, the authors quantify LRRK2+/CD63+ particles by imaging. Importantly, it appears that there are less CD63 "large gold" particles in MVB of G2019S compared to control. This CD63 baseline quantification in MVB of WT vs. G2019S should be presented in this figure. These data are not convincing and should be quantified by FACS in secreted EV. Supplementary figure 3 should be brought into this figure.

      As suggested by the reviewer, we quantified the number of CD63 large gold particles per MVB in WT and LRRK2 G2019S lines (Supplementary Figure 3A,B), and we re-introduced Supplementary Figure 3 into the main text (Figure 4E). We also updated the text (see in italics below). Additionally, we present extensive quantification of LRRK2 levels in MVBs and secreted EVs via imaging and biochemical analysis (ELISA), two different but complementary analytical methods.

      Results - LRRK2 G2019S affects the size of MVBs in iPSC-derived astrocytes

      Tetraspanins are transmembrane proteins, and the tetraspanin CD63 is enriched in exosomes and widely used as an exosomal marker (Escola et al., 1998; Men et al., 2019). However, cell type specificities in the expression of exosomal markers such as CD63 have been documented (Jorgensen et al., 2013; Yoshioka et al., 2013). We therefore confirmed the presence of CD63-positive MVBs in iPSC-derived isogenic astrocytes by immunofluorescence (Figure 2D) and immunogold electron microscopy (IEM) (Figure 2E). Analysis of IEM images showed an abundance and similar levels of CD63 localized within MVBs in WT and LRRK2 G2019S astrocytes (Figure 2E, Supplementary Figure 3A,B), confirming that CD63 can be used as a marker of MVBs and exosomes in iPSC-derived astrocytes.

      7) In Figure 5, using CD63 as an MVB marker is not the most accurate approach. ESCRT markers should be co-stained in these experiments to truly show MVB localization (CD63 can localize to MVBs but is known to have a wider distribution throughout the cell compared to TSG101 or other ESCRT complex proteins). Additionally, the authors must show their Supplemental Figure 3 ELISA quantification of p-aSyn in this main figure, and comment on why they conclude higher p-aSyn content in MVBs based on their IEM but then find no differences in aSyn in secreted EVs in WT vs. G2019S by ELISA.

      We thank the reviewer for the suggestion to use ESCRT proteins as MVB markers. We decided to use CD63 because it is recognized in the literature as an MVB and EV marker (Beatty, 2008; Edgar et al., 2014), and we now refer to these two studies in the manuscript to support this choice (see text in italics below). Using ESCRT complex proteins as MVB markers is an interesting alternative, but we note that proteins associated with this complex are also found to regulate other biological processes such as autophagy (Takahashi et al., 2018) and plasma membrane repair (Jimenez et al., 2014), and so they can co-localize to non-MVB structures (e.g. autophagosomes or plasma membrane). Similarly, TSG101 can also localize to non-MVB structures such as the nucleus and Golgi complex (Xie, Li, & Cohen, 1998), and also lipid droplet (LD) membranes where it promotes LD-mitochondria contact (J. Wang et al., 2021). As suggested by the reviewer, Supplemental Figure 3 has been re-introduced into the main text (Figure 6C). Regarding αSyn, the immunogold staining specifically detects the phosphorylated form of αSyn (p-αSyn), while the ELISA detects all forms of αSyn (total αSyn). We observed increased p-αSyn in LRRK2 G2019S MVBs, but similar levels of total αSyn in WT vs LRRK2 G2019S EVs. This observation suggests that the phosphorylated form of αSyn, but not the total amount of αSyn, is affected by the experimental conditions. The text has been updated and reads as follows (changes in italics).

      Results - LRRK2 is associated with MVBs and EVs in iPSC-derived astrocytes

      In light of our observations that mutations in LRRK2 result in altered astrocytic MVB and EV phenotypes, we asked if LRRK2 is directly associated with MVBs in astrocytes and if this association is altered by the LRRK2 G2019S mutation. We analyzed and quantified the co-localization of LRRK2 with CD63 (Figure 4A), a marker for MVBs (Beatty, 2008; Edgar et al., 2014), and found that the proportion of LRRK2+/CD63+ structures remains unchanged between WT and LRRK2 G2019S isogenic astrocytes (Figure 4B).

      Results - The LRRK2 G2019S mutation increases the amount of phosphorylated alpha synuclein (Ser129) in MVBs

      Since the MVB/EV secretion pathway is altered in our LRRK2 G2019S model of PD, we reasoned that mutant astrocytes might produce αSyn-enriched EVs by accumulating the protein in its native or phosphorylated form in MVBs or EVs. IEM analysis revealed an abundance of p-αSyn (small gold) inside and in the vicinity of MVBs of LRRK2 G2019S iPSC-derived astrocytes, but not isogenic control astrocytes (Figure 6A). We observed that 55 % of the CD63+ (large gold) MVBs in LRRK2 G2019S astrocytes are also p-αSyn+ (small gold), compared to only 16 % in WT MVBs. LRRK2 G2019S astrocytes contained on average 1.3 p-αSyn small gold particles per MVB compared to only 0.16 small gold particles in isogenic control astrocytes, and MVB populations containing more than 3 p-αSyn small gold particles were only observed in LRRK2 G2019S astrocytes (Figure 6B). When we analyzed the content of EVs by ELISA, we found that total αSyn levels (phosphorylated and non-phosphorylated) in EV-enriched fractions are similar between isogenic controls and LRRK2 G2019S (Figure 6C). These results suggest that astrocytes secrete αSyn-containing EVs, and the LRRK2 G2019S mutation appears to alter the ratio of p-αSyn/total αSyn in MVB-related astrocyte secretory pathways.

      8) In figure 6, it is even more clear that there is a stark difference between the CD63 presence in/near MVBs between WT and G2019S conditions. Since the authors normalize several pieces of data to CD63 (MVB localization, LRRK2 co-localization, etc), it is critical to quantify the number of baseline CD63 gold particles in MVBs in WT vs G2019S.

      After careful quantification of the number of CD63 gold particles in WT and LRRK2 G2019S MVBs (available in Supplementary Figure 3A,B), we conclude that there are no significant differences between the two genotypes, and the MVB images initially selected in Figure 6 are not representative. We therefore replaced Figure 6A with new images.

      9) In Figure 7, the authors used the co-culture of astrocytes and neurons to assess astrocyte-derived EV uptake by dopaminergic neurons. Although 3D reconstitution of neurons and exosomes can be precise, the data may not be 100% clean. It would be better if the authors collect ACM containing EV fraction from WT astrocyte and G2019S astrocytes and then incubate dopaminergic neurons with ACM containing EV fraction. In this way, only dopaminergic neurons are in the culture and there will be no CD63-GFP expressed astrocytes to contaminate the CD63-GFP signal in neurons.

      We understand the concerns raised by the reviewer, and we can assure the reviewer that state-of-the-art imaging technologies and image post-processing techniques were used to prevent astrocytic CD63 signal from contaminating the neuronal signal. We performed confocal microscopy with a 63X oil objective lens (numerical aperture = 1.4), and the images were processed with a Gaussian filter (0.18 μm filter width) to reduce background noise in the MAP2 channel, and deconvolved (10 iterations) to enhance confocal image resolution in the CD63 channel. Furthermore, CD63-positive structures were detected with background subtraction enabled.

      10) In Figure 9, the authors must show their ACM control. They show untreated, EV-free, and EV-rich ACM, but do not show unmanipulated ACM control.

      The results of the dendrite length analysis for unmanipulated ACM were initially available in Figures 8E and 8F. For clarity, we prepared a new Figure 9 that shows treatment with unmanipulated ACM, EV-free ACM, and EV-enriched fractions.

      Reviewer #2:

      In this manuscript by de Rus Jacquet et al., authors present an interesting study to detect changes in extracellular vesicles in human PD patient derived iPSC-derived astrocytes carrying the LRRK2 G2019S mutation. Isogenic gene corrected iPSCs were used as controls in all experiments. Authors first performed RNA-Seq for global gene expression changes between G2019S and "WT" gene corrected astrocytes. GO analysis showed an upregulation of extracellular compartments (including exosome compartments) in LRRK2 astrocytes. Subsequent experiments focusing on extracellular vesicles (EVs) and multivesicular bodies (MVBs), showed specific differences of MVB area and the size of secreted EVs. Secreted EVs from G2019S astrocytes also contained more LRRK2 particles and G2019S EVs contained more phosphorylated aSyn particles. Co-culture of LRRK2 astrocytes with human dopamine neurons showed accumulation of CD63+ exosomes in neurites, compared to co-culture with WT astrocytes. Co-culture with LRRK2 astrocytes decreased viability of TH+ neurons and LRRK2 dendrites/neurites were also shorter. These co-culture findings were replicated using EV-enriched conditioned media. Finally, authors showed that the trophic effect of astrocytes on neurons was due both to soluble factors released into the media, and production and release of EVs. Overall, this is a well-written and systematically performed study. This reviewer has several comments as detailed below.

      1) Based on their data, authors conclude that astrocyte-to-neuron signaling and trophic support mediated by EVs is disrupted in LRRK2 G2019S astrocytes. Have authors measured the differences in trophic factors released by LRRK2 astrocytes in EVs and in conditioned media?

      This is an important question, and we have not measured the levels of various neurotrophic factors in the medium. We concluded that LRRK2 G2019S astrocytes failed to secrete neurotrophic factors based on the neuron viability data. Healthy neurons cultured with disease astrocytes displayed dendrite shortening equivalent to that of neurons cultured in basal medium lacking neurotrophic factors. Furthermore, the morphological alterations occurred over a long period of time (2 weeks) and did not recapitulate the rapid and high level of neuron death and neurite fragmentation typically observed as a result of exposure to neurotoxins (Liddelow et al., 2017). However, we performed a new analysis of our RNA-seq data and identified dysregulated trophic processes of interest in LRRK2 G2019S astrocytes.

      2) Authors differentiate cells (astrocytes and neurons) from midbrain lineage NPCs. The data show convincing effects of the LRRK2 derived astrocytes on neurons, but one question is whether this is specific to dopaminergic cells. Would this genotype specific effect also be expected in other lineages, e.g. cortical neurons? Authors should discuss this point.

      The reviewer is making an excellent point. We prepared mouse primary midbrain cultures, and co-cultured WT midbrain neurons with WT or LRRK2 G2019S astrocytes. We found that the survival of WT midbrain dopaminergic neurons was significantly affected by LRRK2 G2019S astrocytes, but the viability of non-dopaminergic midbrain neurons was not changed when co-cultured with WT or disease astrocytes. A previous study by di Domenico et al. also showed that dopaminergic neurons are more sensitive to the effect of LRRK2 G2019S astrocytes compared to non-dopaminergic cell types (di Domenico et al., 2019).

      3) Prior work has demonstrated reductions in neurite length in neurons derived from LRRK2 G2019S iPSCs (not specific to dopaminergic neurons in LRRK2 cells) (for example Reinhardt et al 2013). It is curious that the LRRK2 G2019S mutation itself can cause such a phenotype in neuron mono-cultures, and as shown in the current study, that LRRK2 G2019S astrocytes also induce a similar effect on WT neurons in co-culture. Can authors expand on this point in the Discussion?

      We thank the reviewer for this question, and we added a new point of discussion in our manuscript, which reads as follows (changes in italics):

      Evidence from this study and previous reports indicates that the LRRK2 G2019S mutation affects neurons through a variety of mechanisms. Here, we show a non-cell autonomous effect on neuronal viability via impairment of essential astrocyte-to-neuron trophic signaling, but the LRRK2 G2019S mutation can also mediate cell-autonomous dopaminergic neurodegeneration (Reinhardt et al., 2013). These observations support the idea that the LRRK2 kinase may be involved in a large number of pathways essential to maintain cellular function, cell-cell communication and brain homeostasis, and disruption of LRRK2 in one cell type has cascading effects on other neighboring cell types. In conclusion, our study suggests a novel effect of the PD-related mutation LRRK2 G2019S in astrocytes, and in their ability to support dopaminergic neurons. This study supports a model of astrocyte-to-neuron signaling and trophic support mediated by EVs, and dysregulation of this pathway contributes to LRRK2 G2019S astrocyte-mediated dopaminergic neuron atrophy.

      4) Authors should provide data on % dopaminergic neurons generated in the cultures.

      We agree that this is important information, and we updated the latest version of the manuscript with this information (see below in italics). We estimate that the neuron cultures consist of 50 to 70 % dopaminergic neurons, and they are depleted of non-neuronal cells as explained in Material and Methods.

      Material and Methods - Preparation and culture of iPSC-derived NPCs, dopaminergic neurons and astrocytes

      To isolate a pure neuronal population, the cells were harvested in Accumax medium, diluted to a density of 1 × 10^6 cells in 100 µl MACS buffer (HBSS, 1 % v/v sodium pyruvate, 1 % GlutaMAX, 100 U/ml penicillin/streptomycin, 1 % HEPES, 0.5 % bovine serum albumin) supplemented with CD133 antibody (5 % v/v, BD Biosciences, San Jose, CA, cat. # 566596), and the CD133+ NPCs were depleted by magnetic-activated cell sorting (MACS) using an LD depletion column (Miltenyi Biotech, San Diego, CA), as described previously (de Rus Jacquet, 2019). The final cultures are depleted of non-neuronal cells and contain approximately 70 % dopaminergic neurons, the remaining neurons consisting of uncharacterized non-dopaminergic populations.

      5) p7. Authors refer to phosphorylated a-synuclein as accelerating PD pathogenesis, but the references cited do not show this. In fact, Gorbatyuk et al 2008 showed that overexpression of S129 with constitutive phosphorylation eliminated a-synuclein induced nigrostriatal degeneration. The Fujiwara et al 2002 reference showed the presence of phospho a-synuclein in Lewy bodies and neurites. Authors should revise their statement that phospho a-synuclein is associated with accelerated pathology.

      The reviewer is correct. We meant to highlight that there is a correlation between phosphorylated αSyn levels and PD pathogenesis, not that phosphorylated αSyn causes an acceleration of PD pathogenesis. We rephrased the sentence as follows, and replaced the study by Gorbatyuk et al. with a study by Anderson et al. that shows presence of phosphorylated αSyn in Lewy bodies (new text in italics):

      EVs isolated from the biofluids of PD patients exhibit accumulation of αSyn (Lamontagne Proulx et al., 2019; Shi et al., 2014; Zhao et al., 2018), a hallmark protein whose phosphorylation at the serine residue 129 (p-αSyn) is correlated with PD pathogenesis (Anderson et al., 2006; Fujiwara et al., 2002).

      6) Please provide details on the number of iPSC lines used for these experiments.

      Experiments in the first version of this manuscript were performed using a single LRRK2 G2019S iPSC line and its gene-corrected control. The manuscript now presents the results collected using a second, independent non-isogenic iPSC line, as well as mouse primary cultures.

      7) Clarify whether the WT neurons used for co-culture were derived from the isogenic human neurons?

      We confirm that the WT neurons used for co-culture experiments were derived from isogenic controls. We added subtitles to our figures to clarify when data show results from isogenic or non-isogenic iPSC-derived cells.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors make juxtacellular recordings in awake mice, which should yield clear responses of action potentials, and employ a number of manipulations to silence pathways. They also record from a "non"-whisker secondary thalamic region, LP, as a null hypothesis to establish if certain effects are related to "behavior" (read: arousal or saliency). I have no major qualms.

      In light of Petersen's paper (Cell Reports 2014) on cholinergic effects on spike rates in primary whisker somatosensory cortex, I can imagine that the authors considered measuring from cholinergic neurons in nucleus basalis during whisking. I'll assume that this is easier said than done. As such, the current manuscript passes my threshold for publication modulo issues raised below that are related to anatomy.

      The cholinergic experiments are an interesting idea. However, inactivation of S1 did not change the relationship between POm and whisking, suggesting that cholinergic modulation of S1 and thereby corticothalamic output are not the key mechanism. It is conceivable that acetylcholine modulates POm directly, but the critical experiments would involve extensive manipulations of POm (a whole additional study). Nevertheless, we have added a reference to Eggermann & Petersen and discussed this issue further in the revision.

      I provide a figure-by-figure critique:

      (1) Recent work from Deschênes et al (Neuron 2016) points to a description of whisking in terms of Angle = Set-point_angle - Whisking-amplitude [1 + cosine(Phase - Phase_0)], where Phase is a rapidly varying, typically rhythmic function of time. Why not use this notation as opposed to yet another descriptive statistic and report the kinetics as the time-averaged parameters <Set-point_angle>, i.e., the most forward position, and <Whisking-amplitude>, i.e., the half-amplitude of the average whisk?

      We are not entirely sure what the reviewer means by “another descriptive statistic” as we do not introduce new approaches for analyzing whisking in this paper. (Perhaps the reviewer refers to “median angle”, which is an average of all the whisker positions on a single frame. We use this measurement because our videos contain the entire whisker field rather than just a single whisker as in our other studies, e.g. Hong et al 2018, Rodgers et al 2021). We based our parameterization of median angle on two publications: Hill et al (2011 Neuron) and Moore et al (2015 PLoS Biology). Moore et al describes whisking as a function of phase, amplitude, and midpoint:

      where 𝜃(t) is the median whisker angle at time 𝑡 , 𝜙 is the phase as computed by the Hilbert transform of the filtered whisker angle, 𝜃^Amplitude is the difference between the most protracted and retracted whisker positions over a single cycle, and 𝜃^midpoint is the central angle of a single whisk cycle. As we understand the reviewer, we are using the formulation they describe. We are happy to consider alternate formulations if we are missing something.
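The three quantities defined above (Hilbert phase, per-cycle amplitude, per-cycle midpoint) can be illustrated with a minimal Python sketch. This is not the authors' analysis code: the function names `analytic_signal` and `whisk_parameters` are hypothetical, mean subtraction stands in for the band-pass filter mentioned in the response, and cycles are assumed to begin where the phase wraps from +π to -π. The Hilbert step is written with a naive standard-library DFT so that the example has no dependencies; in practice a library routine such as `scipy.signal.hilbert` would normally be used.

```python
import cmath
import math


def analytic_signal(x):
    """Analytic signal of a real sequence via a naive O(N^2) DFT (stdlib only)."""
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = [0.0] * n
    weights[0] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    # Inverse DFT of the one-sided spectrum.
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def whisk_parameters(angle):
    """Return (phase, cycles); cycles holds one (amplitude, midpoint) pair per whisk.

    amplitude: most protracted minus most retracted angle within the cycle.
    midpoint:  central angle of the cycle.
    """
    mean = sum(angle) / len(angle)
    centered = [a - mean for a in angle]  # assumption: stand-in for band-pass filtering
    phase = [cmath.phase(z) for z in analytic_signal(centered)]
    # Assumption: a new cycle starts where the phase wraps from +pi to -pi.
    starts = [i for i in range(1, len(phase)) if phase[i] - phase[i - 1] < -math.pi]
    cycles = []
    for begin, end in zip(starts[:-1], starts[1:]):
        segment = angle[begin:end]
        cycles.append((max(segment) - min(segment),
                       (max(segment) + min(segment)) / 2))
    return phase, cycles


# Synthetic trace: 8 whisks around an 80-degree midpoint, 20 degrees peak to trough.
trace = [80 + 10 * math.cos(2 * math.pi * 8 * t / 200) for t in range(200)]
phase, cycles = whisk_parameters(trace)
```

On real video data the trace would be the median whisker angle per frame, and the filter band and cycle boundaries would need to follow the conventions of the cited publications.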

      A critical issue is to confirm where the recordings were made. Thus, the authors should supply at least a typical record of anatomy from their POm as well as VPM and LP recordings. The beauty of the juxtacellular technique is that neurons can be labeled after the recording.

      We used the juxtacellular recording technique for its superior recording quality. We did not label individual cells after recording because we recorded multiple cells per animal over several days. The number of cells would complicate matching of filled cells to recorded physiological data, and biotin filling is not stable over multiple days (beyond 36 hours). Instead, as described in the original manuscript, we tracked the relative locations of all inserted pipettes and labeled the final track with DiI. Cells were roughly localized along the tracks using relative microdrive depths. Due to the morphological homogeneity of thalamic neurons, filling individual cells would not be more informative than labelling the recording site with DiI. New Figure 1 – figure supplement 1 includes representative histology images from our recordings in POm, VPM, LP, and M1.

      (2) Did the authors make sure that the mystacial pad is not moving by imaging the pad as opposed to just the shaft of the whiskers? The top view in Figure 1A makes this hard to check.

      To address this concern, we provide new data, in which both the cut and uncut sides of the face of mice were imaged. We measured the movement of the mystacial pads as motion energy – the mean absolute difference in pixel values across video frames. The motor nerve surgery almost completely abolished movement of the mystacial pad. A new figure panel (Figure 2B) demonstrates the movement of the normal and paralyzed mystacial pads.
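As an illustration of the motion energy measure described above, here is a minimal Python sketch. It assumes that the difference is taken between consecutive frames and that each frame is a 2D grid of grayscale pixel values restricted to the mystacial pad region; the function name `motion_energy` is hypothetical and this is not the authors' code.

```python
def motion_energy(frames):
    """Mean absolute pixel difference between consecutive frames.

    frames: list of 2D lists (rows of grayscale pixel values), all the same shape.
    Returns one value per consecutive frame pair.
    """
    energies = []
    for previous, current in zip(frames, frames[1:]):
        diffs = [
            abs(c - p)
            for row_prev, row_curr in zip(previous, current)
            for p, c in zip(row_prev, row_curr)
        ]
        energies.append(sum(diffs) / len(diffs))
    return energies


# Toy 2x2 "videos": one with a static pad, one with changing pixel values.
still = [[[5, 5], [5, 5]]] * 3
moving = [[[0, 0], [0, 0]], [[4, 0], [0, 0]], [[4, 0], [0, 8]]]
still_energy = motion_energy(still)
moving_energy = motion_energy(moving)
```

A paralyzed pad would be expected to give values near zero throughout, whereas an intact pad would give larger values during whisking bouts.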

      Further, did the authors perform post-hoc anatomy to ensure that both the ramus buccolabialis inferior and ramus buccolabialis superior muscles were cut? This is critical; it is also easy to leave the maxillolabialis (external retractor) innervated if the cut is too far rostral.

      We did not attempt to cut muscles. We only cut the motor nerve. We did not examine the face post mortem, as it was obvious that both whisker and mystacial pad movement were absent (as in new Figure 2B).

      (3/4) As relevant background, the text should note that whisker primary motor cortex maintains a copy of the envelope of the whisking, i.e., an ill-defined summation of set-point and amplitudes, even if the sensory input (Ahrens & Kleinfeld J Neurophysiol 2004) or motor output (Fee et al. J Neurophysiol 1997) in the periphery are cut.

      The Results text now cites these papers as motivation for the experiments of Figure 3.

      (6/7) Same comments as in (1) on whisking parameters and anatomy.

      As we discussed in (1), we are using the conventional parameterization of others. Histological examples are now included in Figure 1 – figure supplement 1.

      Reviewer #3 (Public Review):

      Previous studies in urethane-anesthetized rats (PMID 16605304) proposed that POm cells code whisker movements. This was observed using "artificial whisking" procedures (stimulating the motor nerve to produce a whisking-like movement). It has been clear for some time now that there are substantial (obvious) differences between this procedure and natural whisking. In addition, under urethane-anesthesia animals are in a sleep-like state that is very dissimilar to waking (although some work has tested the effect of network state on artificial whisking responses in both primary thalamus and cortex; see 25505118). In the present study, the authors measured activity in POm cells during whisking in awake (head-fixed) mice to determine if they code whisking movement. However, this seems to have already been done previously. For instance, Moore et al (2015; 26393890) found that coding of whisking in the ascending paralemniscal pathway, including POm, is "relatively poor" (as stated in the abstract), which is the same conclusion reached in the present study. The authors should clarify the main differences observed in whisking coding between their study and previous work.

      The authors then focused on the idea that POm codes behavioral state. However, many studies have previously determined that state has a great impact on thalamocortical dynamics; thalamic cells are very sensitive to state including cells in primary whisker thalamic nuclei, such as VPM, and these effects can be produced by neuromodulators (see work by Castro-Alamancos' group, for example, 16306412). There is nothing special about VPM in this regard; other thalamic sensory nuclei are also sensitive to behavioral state and neuromodulators. Therefore, the observation that POm and LP cells are sensitive to state is unsurprising. It is also known that these thalamic state changes have a great impact on the state of the cortex (see 20053845), which seems very relevant to the main conclusion. The POm has to be doing something different than coding behavioral state since most thalamic nuclei do this. The study did not identify the role of POm, which certainly has to be different from LP (otherwise, why would these nuclei be differentiated?). POm is unlikely to be specialized for monitoring state since this is done by most of the thalamus -including VPM, which projects to the same cortical region. Thus, while it is interesting that most of the whisker-related activity in POm is state-dependent, the study does not clarify the role of POm.

      We have added the references we did not already include to our text and improved our discussion.

      Prior studies (such as Moore et al 2015 and Urbain et al 2015) have previously characterized the encoding of whisker motion in POm. Indeed, we note the consistency between our results and such studies in both the introduction and conclusion. Here we expand upon prior studies to directly test two prominent hypotheses about the role of the paralemniscal pathway: that it encodes sensory reafference, and that it inherits a motor efference copy from cortical and subcortical regions. We present the impact of several manipulations of the vibrissal system (facial paralysis, cortical silencing, and lesion of superior colliculus) on thalamic activity that, to our knowledge, have not been previously reported. Moreover, we leveraged a novel comparison of POm and LP to test whether movement‐correlations of POm reflected true motor modulation or rather state dependency. We have provided evidence that the coupling of POm activity to whisking reflects state rather than motor signals. We never suggested that POm is a unique monitor of behavioral state. We suggest instead that secondary thalamic nuclei may be state‐modulated and have specific impacts on response gain and plasticity in their respective cortical areas. While our work is consistent with previous studies, we believe these results are novel extensions of past work.

      The main strength of the study is that it was performed in awake mice with behavioral state monitoring, which contributes to the current understanding of active whisking coding in the complex network of the vibrissa system.

      In our opinion, the main strength of our study is its multiple manipulations to test the sources of modulation and the leveraging of a POm‐LP comparison. We have revised the text to reinforce these points.

    1. Author Response:

      Reviewer #3 (Public Review):

      The manuscript of Price et al. used TURBO-ID of multiple P granule components to identify new factors required for their assembly and function. They identify over 75 shared proteins, including 2 related TUDOR-domain proteins that they analyze in further detail through mutation and localization studies. The two proteins are necessary for the localization of several P granule components to the nuclear periphery, and they show in an ectopic tissue that EGGD-1 is sufficient to localize GLH-1 to the nuclear periphery.

      While the paper could go further to test physical interactions between EGGD-1 (and EGGD-2) with GLH-1 and the nuclear pore protein, this is not critical, although the authors should be cautious in their model that they have not proven that these associations are direct.

      We agree with the reviewer. We revised the statement and added references that support the model.

      It would be important to provide evidence that the STOP-IN cassette in eggd-2 is a true null. There may be downstream methionines that can be used. And the phenotype is weaker than eggd-1 so this could be an issue.

      This is a good point. Using two CRISPR guide RNAs, we generated a second allele of eggd-2 by deleting its full open reading frame. This allele and the allele bearing a 17-nt insertion exhibited similar phenotypes: (1) no noticeable change in PGL-1::TagRFP localization in the pachytene region, and (2) PGL-1::TagRFP failed to concentrate in the germ lineage in embryos. We updated the information in Figure 3—figure supplement 1 and in the text (lines 292-297).

      All of the TURBO-ID strains have reduced viability. Since there is some concern that P granule composition could be affected in the tagged strains, showing the localization of other known components of P granules in these mutants (GLH-1, PGL-1) would be critical.

      Thanks for this suggestion. Please see the responses above.

    1. Author Response:

      Reviewer #3 (Public Review):

      This is a well-written paper describing the OpenNeuro data archive. OpenNeuro covers diverse mesoscale brain imaging data (fMRI, PET, etc) from many projects and including multiple experimental paradigms. The focus is on human imaging, but primate and rodent data are also represented. The project rests on a standardized and relatively mature Data format (BIDS) for imaging data. This is key to implementing FAIR principles and automated checking for compliance. OpenNeuro is already producing discoveries that could not have been made without meta-analysis across diverse data sets. OpenNeuro is also used to improve data analysis pipelines. By its nature, this kind of paper always reads a bit like an advertisement.

      The paper makes a compelling case for domain-specific repositories supported by a modern cloud architecture and sound computer science. I appreciate the discussion of the challenges that have been overcome (e.g. versioning of data sets; privacy and consent) and others that are looming (e.g. long-term maintenance in the absence of obvious commercial drivers).

      I would like to see a bit more comparison to other archives, including some mature projects in other fields (e.g. astronomy, IVOA; structural molecular biology; CCDC), as well as more nascent efforts in brain research (e.g. DANDI; BIL).

      Rather than focus on direct comparisons to other archives, we have focused on the FAIR principles for data management. We also feel that, in an article highlighting OpenNeuro, it could be seen as bad form to compare critically to other databases, rather than letting the merits of OpenNeuro stand on their own.

    1. Author Response:

      Reviewer #1 (Public Review):

      Lenz et al. have shown that IP injection of atRA does not affect sEPSC amplitude or sIPSC amplitude and frequency in the dentate gyrus of both the ventral and dorsal hippocampus. Interestingly, they observed a strong promoting effect of atRA on sEPSC frequency in the dentate gyrus of the dorsal, but not ventral, hippocampus. Lastly, they did not observe a difference in I/O in vivo, but did observe enhanced in vivo LTP in the dentate gyrus of mice injected with atRA, which is abolished in synaptopodin KO mice. The effect of atRA on LTP is very interesting, as is its effect on sEPSC frequency in the dorsal dentate gyrus.

      1) I do not agree with the authors' claim that atRA does not have a major effect on excitatory synaptic transmission. It seems that the sEPSC frequency increases by ~100%. Even if the 4 outlier points are excluded, the rest of the data points still clearly indicate an increase of sEPSC frequency.

      We agree with the reviewer that this point warrants further discussion. We have highlighted our findings in the revised version of the manuscript.

      The abstract (lines 30-32) of the revised manuscript now reads: “No major changes in synaptic transmission were observed in the ventral hippocampus while a significant increase in both sEPSC frequencies and synapse numbers were evident in the dorsal hippocampus 6 hours after atRA administration.”

      Lines 392-395: “Nevertheless, the results of the present study demonstrate increased sEPSC frequencies and synapse numbers in the dorsal hippocampus of atRA-treated animals, thereby confirming that atRA targets excitatory synapses in the dorsal hippocampus.”

      What is the possible explanation of increased sEPSC frequency by atRA in dorsal region? Increased excitability of presynaptic neurons? (use TTX to decipher this?) Increased spine density? It seems that the authors did dye fill already… Count spine density? AND/OR increased glutamate release probability? (PPR measurement?) Did the authors perform I/O measurement in slice?

      It is imperative that the authors tackle this issue head on.

      We thank the reviewer for these important comments. To further address this issue, additional experiments were performed and structural properties of asymmetric synapses were assessed in the molecular layer of the dorsal hippocampus using transmission electron microscopy (see new Figure 6). Indeed, these experiments revealed no significant difference in the morphological properties of individual asymmetric synapses, i.e., PSD length and presynaptic vesicle numbers. However, an increase in the number of PSDs per area was observed, which may reflect, at least in part, increased sEPSC frequencies in our experiments.

      Lines 302-316: “Next, transmission electron microscopy was used to assess the structural properties of excitatory synapses in the outer two thirds of the molecular layer in the dorsal hippocampus which is the layer of the major excitatory input from the entorhinal cortex (Figure 6). Cross sections of asymmetric synapses, i.e., the numbers and length of postsynaptic densities (PSD) and presynaptic vesicle counts, were quantified in control and atRA-treated mice (Figure 6A). It is well-established that PSD length in synaptic cross sections correlates to synaptic strength [43]. In agreement with our electrophysiological recordings, which showed no significant difference in the sEPSC amplitude between the groups (c.f., Figure 1D), PSD lengths did not significantly change in the atRA-treated group (Figure 6B). However, a robust increase in the number of PSDs per area was detected, and presynaptic vesicle counts were not significantly different between the two groups (Figure 6B, C). These results indicate that the structural properties of synapses are not affected by atRA, and that increased synapse numbers may explain the increased sEPSC frequencies in the dorsal hippocampus of atRA-treated mice.”

      Lines 373-379: “In the present study, however, we did not observe changes in excitatory synaptic strength in dentate granule cells in either the ventral or the dorsal hippocampus. Specifically, no changes in the sEPSC amplitudes were observed [12]. Consistent with these findings no major changes in the ultrastructural properties of excitatory synapses, i.e., PSD lengths and presynaptic vesicle counts of asymmetric synapses, were observed between the two groups. Interestingly, ultrastructural analysis revealed an increase in the number of asymmetric synapses in the dorsal hippocampus of atRA-treated animals.”

      2) The authors need to specify which part of the dentate gyrus was used for their in vivo study, as they discovered a difference between ventral and dorsal sEPSC frequency in the slice preparation.

      Done! Thank you!

      Lines 183-186 now read: “Then a tungsten recording electrode (TM33B01KT, World Precision Instruments) was lowered in 0.1 mm increments while monitoring the waveform of the field excitatory postsynaptic potential (fEPSP) in response to 500 µA test pulses until the granule cell layer in the dorsal part of the hippocampus was reached (1.7-2.2 mm below the surface).”

      Lines 324-326 now read: “To test the effects of atRA on the ability of neurons to express synaptic plasticity, long-term potentiation (LTP) experiments on perforant path synapses to dentate granule cells were carried out in the dorsal hippocampus of anesthetized mice (Figure 7A).”

    1. Author Response:

      Reviewer #2 (Public Review):

      Valentini et al. explore the contribution of inexperienced homing pigeons in a pair, while finding the most efficient route back home. My comments below mostly concern the need of broadening the scope of the introduction and discussion by discussing and citing literature beyond homing pigeons as at the moment the manuscript could be characterized as too specific for the readership.

      We thank the reviewer for their suggestions which allowed us to expand the focus of our manuscript. Our answers to the reviewer’s comments are reported below together with modifications done on the revised manuscript.

      The authors use and present transfer entropy methods which regard the transmission of information from one individual to the other and the effect of this information on behaviour. I haven't used such methods myself, but I think the methodology is nicely explained and easy to follow as it's written here. However, I would still encourage the authors to avoid jargon and un-introduced terms while first presenting their methods and results in the introduction and results sections. I also think that the paragraph in the introduction (L92-104) that refers to transfer entropy (TE) has to be extended and also direct readers to reviews such as [1] that attempt to make TE accessible to a broad audience of non-physicists. Behavioural ecologists and primatologists that study leadership and influence in animals, using less data-hungry methods than TE, will probably be interested in reading this manuscript. Because eLife is a journal that attracts a very broad audience I would suggest investing more in better introducing TE to biologists and anthropologists.

      We thank the reviewer for their suggestions. In the revised version of our manuscript, we clarified the meaning of symbols and unintroduced terms and extended the introduction paragraph about transfer entropy to provide more information. We now discuss data requirements of information-theoretic approaches, point the reader towards recent literature reviews aimed at introducing these (and similar) approaches to the community of behavioural ecology, and better introduce the advantages of transfer entropy with respect to methods based on models of alignment, attraction, and repulsion.

      “Leader–follower interactions of this sort can be accurately captured using information-theoretic measures that quantify causal relations in terms of predictive information (Butail, Mwaffo, and Porfiri 2016; Kim et al. 2018; Crosato et al. 2018; Ray et al. 2019; Valentini et al. 2020). This methodological approach, which generally requires large amounts of data (but see (Porfiri and Ruiz Marín 2020)), is gaining popularity among behavioural ecologists (Strandburg-Peshkin et al. 2018; Pilkiewicz et al. 2020) as tools for automatic monitoring and extraction of the necessary volumes of behavioural data become increasingly available (Egnor and Branson 2016). One of these measures, transfer entropy, quantifies information about the future behaviour of a focal individual that can be obtained exclusively from knowledge of the present behaviour of another subject (Schreiber 2000). Transfer entropy measures information transferred from the present of the sender to the future of the receiver (Lizier and Prokopenko 2010). It explicitly accounts for autocorrelations characteristic of individual birds’ trajectories (Mitchell et al. 2019) by discounting predictive information available from the sender’s present that is already included in the receiver’s past (see Figure 1). Furthermore, it does not require a model of how sender and receiver interact, and it is well suited to study social interactions both over space and time (Lizier, Prokopenko, and Zomaya 2008; Strandburg-Peshkin et al. 2018). This aspect of transfer entropy encompasses traditional methods to quantify collective movement that are based on modelling an individual’s behaviour as a combination of three motional tendencies (Couzin et al. 2002) – alignment of direction to nearby group members, attraction towards sufficiently distant members, and repulsion from sufficiently close members – that allow an individual to maintain proximity to the group. In this context, transfer entropy is advantageous as it can capture causal interactions due not only to alignment forces (Nagy et al. 2010) but also to attraction and repulsion forces that result in temporarily unaligned states (Pettit, Perna, et al. 2013).”
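To make the definition quoted above concrete for readers new to transfer entropy, here is a minimal sketch of a plug-in estimator for two discrete symbol sequences. It is not the authors' pipeline: the function name `transfer_entropy`, the history length of one time step, and the toy sequences are all assumptions chosen for illustration.

```python
from collections import Counter
from math import log2

def transfer_entropy(sender, receiver):
    """Plug-in estimate (in bits) of transfer entropy sender -> receiver.

    Assumes equally long discrete sequences and a history length of 1:
    TE = sum p(y_next, y, x) * log2( p(y_next | y, x) / p(y_next | y) ).
    """
    # Each sample pairs the receiver's next state with both present states.
    triples = list(zip(receiver[1:], receiver[:-1], sender[:-1]))
    n = len(triples)
    c_full = Counter(triples)                             # (y_next, y, x)
    c_yx = Counter((y, x) for _, y, x in triples)         # (y, x)
    c_ny = Counter((yn, y) for yn, y, _ in triples)       # (y_next, y)
    c_y = Counter(y for _, y, _ in triples)               # (y,)
    te = 0.0
    for (yn, y, x), count in c_full.items():
        p_joint = count / n
        p_given_both = count / c_yx[(y, x)]      # uses the sender's present
        p_given_self = c_ny[(yn, y)] / c_y[y]    # receiver's own state only
        te += p_joint * log2(p_given_both / p_given_self)
    return te

# Hypothetical toy data: the receiver copies the sender with a lag of one step.
sender = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1]
receiver = [0] + sender[:-1]
print(transfer_entropy(sender, receiver))
```

In this toy case the sender's present fully determines the receiver's next state, so the estimate is positive, whereas a constant sender carries no predictive information and yields zero. Real trajectory data would first need to be discretised, and short series bias plug-in estimates upward, which is one reason the approach is described as data hungry.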

      A thought I had while reviewing this work regards the theory of the wisdom of the crowd [2]. This indicates that when a group or a collective averages the different estimates of its members, they reach a more accurate collective estimate. Studies have also shown that animals can average their movement directions to resolve conflicts of interest [3,4]. The current manuscript also shows that pooling information leads to better movement decisions. Would it thus make sense for this manuscript to discuss how its findings may support the wisdom of the crowd theory?

      We thank the reviewer for the suggestion. In the revised version of our manuscript, we included a new paragraph where we discuss a possible connection with the phenomenon of the wisdom of crowds as well as how our results might generalize to flocks of larger size.

      “The ability of groups to outperform single individuals by pooling information across their members is an aspect of collective intelligence that has long intrigued researchers. One potential mechanism underlying this phenomenon, popularly known as the wisdom of crowds (Surowiecki 2005), is averaging many individuals’ estimates independent from each other. Averaging individual decisions is expected to provide a more accurate group estimate than any individual’s guess. Previous studies have also shown that animals can average their movement decisions to reach a compromise (Biro et al. 2006; Strandburg-Peshkin et al. 2015). Although the mechanisms by which experienced and naïve individuals pool information during route development remain unknown, our study points to the importance of naïve group members within the information-pooling process. Moreover, the wisdom of crowds is known to require personal information to be independent among group members (Couzin 2018) otherwise group performance can degrade quickly for increasing group size (Kao and Couzin 2014). Experimental pairs could thus benefit from pooling information with naïve individuals that, at least at the beginning of each generation, likely provide a source of information independent from that of the experienced bird. The potentially deleterious effects of losing independence may provide another pressure to shift over time from innovative exploration to route-preserving exploitation. It remains to be explored how our results generalize to larger flock sizes. Previous experiments without generational replacement showed that, even in larger flocks, birds flying ahead of the flock had a tendency to assume leadership positions (Nagy et al. 2010). However, the repeated introduction of naïve individuals into larger flocks might complicate the dichotomy between leaders and followers by inducing turnover dynamics between the front and the back of the flock.”
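The role of independence in the averaging argument above can be illustrated with a small simulation. This is only a sketch under assumed Gaussian noise: the function `simulate`, its parameters (`shared_sd`, `private_sd`, `group_size`), and all numerical values are hypothetical and are not taken from the manuscript.

```python
import random
from statistics import mean

def simulate(n_trials=2000, group_size=10, shared_sd=0.0, private_sd=1.0,
             truth=0.0, seed=1):
    """Return (mean error of the group average, mean error of an individual).

    Each estimate is truth + shared noise (common to the whole group in a
    trial) + private noise (independent across individuals).
    """
    rng = random.Random(seed)
    group_errors, individual_errors = [], []
    for _ in range(n_trials):
        shared = rng.gauss(0.0, shared_sd)
        estimates = [truth + shared + rng.gauss(0.0, private_sd)
                     for _ in range(group_size)]
        group_errors.append(abs(mean(estimates) - truth))
        individual_errors.append(mean(abs(e - truth) for e in estimates))
    return mean(group_errors), mean(individual_errors)

# Independent information: errors largely cancel when estimates are averaged.
independent = simulate(shared_sd=0.0, private_sd=1.0)
# Mostly shared information: averaging cannot remove the common error.
correlated = simulate(shared_sd=1.0, private_sd=0.1)
print(independent, correlated)
```

With independent noise the group average is markedly more accurate than a typical individual, while a shared error term leaves the group almost no better off than any single member, which is the loss-of-independence effect the quoted paragraph refers to.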

      As briefly mentioned earlier, I think that the cited literature in this manuscript (especially in L58-138 and throughout the discussion) includes mostly studies on homing pigeons whereas relevant studies to the current manuscript have been performed on other species and by discussing and citing relevant studies on various species the manuscript would become more attractive to a broader audience and wouldn't read as homing-pigeon specific.

      We thank the reviewer for pointing us towards additional literature related to our study. We included the suggestions from the reviewer as well as further references to a broader literature to expand the scope of our manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary: In "Rapid and Sensitive Detection of SARS-CoV-2 Infection Using Quantitative Peptide Enrichment LC-MS/MS Analysis" Hober, A. et al. describe the addition of peptide immunoprecipitation by means of SISCAPA technology to the SARS-CoV-2 mass spectrometry-based diagnostics toolbox. The work shows in a straightforward way that this is a huge improvement and of great importance to the field. It shows beyond any doubt that mass spectrometry can become a clinically applied diagnostic instrument to detect (viral) infection.

      Overall remark: The main concern is the reported number of 83% sensitivity. This is not because the number is too low, but because the number is misleading. In line with "CLSI EP 12-A2 User Protocol for Evaluation of Qualitative Test Performance guidance" a summary of the sample analysis results is shown in a 2x2 contingency table. Unfortunately, I object to this representation of the results at this stage for three reasons: (i) reporting a percentage shouldn't be done on less than 100 samples because of the weight of a few misannotated samples on these numbers, be it in the qPCR or the MS results; (ii) because both assays are imperfect, it is impossible to assess the ground truth for calling patients and thus assess sensitivity and specificity; (iii) the authors still only target a single peptide, which is not conventional in MS-based assays that target proteins.

      We have changed to PPA and NPA in the new version of the manuscript. We have also included 264 RT-PCR negative samples collected in the same study. We agree that protein quantification should not be done using only one single peptide. We have updated the manuscript to clarify that we do not perform protein quantification, but rather peptide quantification.

      Rather than the proposed confusion matrix, which assumes that the ground truth is known to call it e.g. "false negatives", the authors could refer to it as an agreement matrix and not be tempted to calculate threshold values like sensitivity, which have too much of an impact on the clinical readership that is used to seeing this value in a more controlled context. This is in line with the recent Lancet manuscript from Fitzpatrick, M. et al (2021), proposing percent positive agreement (PPA) and percent negative agreement (PNA) instead (Fitzpatrick et al., 2021).

      We have decided to keep the confusion matrix but we are referring to it as PPA and NPA and rephrased sensitivity to “estimated sensitivity” based on PPA.
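For readers unfamiliar with the agreement metrics discussed in this exchange, the sketch below computes PPA and NPA from a 2x2 agreement table and attaches a Wilson score interval to show how wide the uncertainty is with small samples. The counts are hypothetical placeholders, not the study's data, and the function names are assumptions made for illustration.

```python
from math import sqrt

def percent_agreement(both_pos, ref_pos_only, test_pos_only, both_neg):
    """PPA and NPA of a test (e.g. LC-MS) against a comparator (e.g. RT-PCR).

    The comparator is not treated as ground truth, so these are agreement
    rates rather than sensitivity and specificity.
    """
    ppa = both_pos / (both_pos + ref_pos_only)
    npa = both_neg / (both_neg + test_pos_only)
    return ppa, npa

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

# Hypothetical counts: 40 concordant positives, 10 comparator-only positives,
# 0 test-only positives, 50 concordant negatives.
ppa, npa = percent_agreement(40, 10, 0, 50)
low, high = wilson_interval(40, 50)
print(f"PPA={ppa:.2f} (95% CI {low:.2f}-{high:.2f}), NPA={npa:.2f}")
```

With only 50 comparator-positive samples, the interval around the PPA spans roughly twenty percentage points, which illustrates the reviewer's concern about reporting percentages on fewer than 100 samples.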

      More specifically, as we and others have shown, qPCR Ct values rarely agree in two (consecutive) analyses, even within accredited settings (personal communication NHS). Above Ct30, patients regularly turned negative in our hands (https://doi.org/10.1021/jacsau.1c00048), even with an assay that had proven detectability of 1 plasmid at Ct40. Furthermore, we suspect that freeze-thaw cycles further inflate this uncertainty, two of which the current samples were subjected to. Undetected mRNA would then classify these patient samples as "false positives" if they did yield signal in the LCMS results. By chance, this did not happen in this manuscript, yet this could very well be the reason for the highest signal reported in Figure 3 as a green dot at log2 MRM response of -6 (see minor remarks).

      The authors already distinguished the patients in a High Pool of Ct < 30, a Low Pool of 30 ≤ Ct < 33 and the negative samples (Ct > 40). It is clear from the gap (no 34 < Ct < 39) that finding patients between Ct33 and Ct39 is challenging. Indeed, qPCR has its own "diagnostic grey zone" of LOQ negative and LOQ positive that is rarely referenced. Thus, a "sensitivity" of 95% for patients <Ct30, despite the low number of samples and considering the uncertainties in qPCR (just above or below Ct30) at least limits the comparison to samples that are positive beyond any doubt. But again, we would be thresholding against a trembling metric, in turn making the claim from the authors dangerous that "the estimated LLOQ is 3 amol/μL approximates to Ct ≤ 30". Rather, the Ct30 threshold should be set for a different reason, if one is chosen at all.

      What is needed is good thresholding for clinical diagnostics, as is done in qPCR. In the public hospital in Belgium that provided us with patient samples, the positive threshold is set to Ct33 on the first measurement and practitioners use higher Ct values only in the context of physical symptoms of the disease to come to a final conclusion. For MS, we now need to measure >1000 samples in order to decide what log2 MRM response for a given set of peptides corresponds to an LOQ positive from - say - Ct27 to Ct30 and an LOQ Negative from Ct31 to Ct33. In other words, the linearity of the correlation between qPCR and MS illustrates the intrinsic value of MS; the point up until which we can provide clinically relevant information remains to be determined on large patient cohorts. In turn, these large patient cohorts can allow to sort (clinically) validated patients according to signal intensity and set a log2 threshold at which e.g. 2% or 5% negatives are expected, in line with False Discovery calculations for target decoy strategies. At this stage however, it might be most straightforward to conclude with percent positive agreement (PPA) and percent negative agreement (PNA), as is recommended for laminar flow tests validated on <100 samples.

      Finally, realizing the importance of this pivotal moment in the implementation of MS in the clinic, I find it somewhat tricky to only focus on one peptide. In fact, the authors perform the qPCR on two genes (three genes being even more common) because of the drop-outs that can occur. I feel like the use of peptide IP with MRM for detecting pathogens has not yet matured enough to rely solely on one peptide. Still, I understand that asking for a second peptide would mean repeating all the measurements, so that is most probably not realistic. Yet, I do consider this to be yet another reason not to report % sensitivity and specificity in the current manuscript and the potential to gain robustness with more peptides should clearly be emphasized at every stage of the manuscript.

      We agree that the method would be much improved by adding another peptide to the repertoire. The method was developed using the most sensitive antibody-peptide pair and the most promising pair was used in the downstream process. We have highlighted the limitations of using only one peptide and emphasized that this is a proof-of-principle study.

      In conclusion, because patient batches in the thousands are currently unavailable to MS-oriented diagnostic labs and because of all the reasons mentioned above, we cannot report the numbers of sensitivity and specificity in this manuscript, as they are misleading and do not quantify what they are intended to do.

      Fitzpatrick, M. C. et al. (2021) 'Buyer beware: inflated claims of sensitivity for rapid COVID-19 tests', The Lancet. Lancet Publishing Group, pp. 24-25. doi: 10.1016/S0140-6736(20)32635-0.

      We agree and have changed to PPA and NPA for this reason.

      Major remarks: P3L250: "on-column amount of 60 amol." Because of the enrichment procedure, could the authors specify what initial conditions they spiked into the dilution series prior to enrichment. This would allow recalculation and avoid confusion about the correctness of the 60 amol on column claim (which in our hands is still detectable).

      We made changes to this in the updated version of the manuscript.

      P8L181: "50 μL elution buffer (0.5 % formic acid, 0.03% CHAPS, 1X PBS) and incubated for 5 min at room temperature." This minor sentence is placed under major remarks, because in our understanding the elution buffer needs to be acidic and adding PBS will reduce acidity. If this is a typo, please correct. If this is not, could the authors try and use H2O instead and see if their results improve?

      The access to the raw data was denied.

      The raw data is accessible through the provided Panorama link and can be accessed under the tab “Raw Data”. The entry in ProteomeXchange, however, is only a reserved data set identifier for now, but the data will be made available through this link after the review process.

      Reviewer #2 (Public Review):

      MS-based proteomics is currently discussed as a method for detection of viruses from clinical samples. Several studies have already shown the potential of this method on the example of the detection of SARS-CoV-2 from respiratory specimens. However, one of the major drawbacks still remains the low sensitivity of MS-based virus detection compared to real-time PCR, which is the gold-standard method. In their manuscript Hober and colleagues apply specific antibody-based enrichment of SARS-CoV-2 peptides from upper airway samples to concentrate the analyte prior to analysis by targeted MS (MRM). The authors determined the dynamic range of the method for four different SARS-CoV-2 NCAP peptides using a calibration curve. On the example of the SARS-CoV-2 NCAP peptide AYNVTQAFGR a correlation between the MS result and the cT value is shown. Furthermore, using stable isotope labelled (SIL) peptides as internal reference, a quantitative MS measurement was achieved. The presented approach is able to distinguish real-time PCR SARS-CoV-2 positive samples from negative samples in the used set of 88 samples from asymptomatic patients. Combined with a specificity of 100 % and sensitivities of up to 94.7 % for samples with cT values ≤ 30 the authors conclude that the method could be an alternative to real-time PCR.

      Strengths of the manuscript:

      I think the applied technique (SISCAPA) is highly interesting in the context of virus proteomics. This is because virus proteins are often underrepresented in relation to the host proteins, especially during early time points of infection, hampering their detection. Recently, the application of SISCAPA for SARS-CoV-2 diagnostics has been suggested in the discussion of a manuscript from Van Puyvelde and colleagues. The manuscript from Hober and colleagues presents the first study demonstrating that this technique can be applied to enrich, detect and quantify SARS-CoV-2 peptides from upper airway samples. The manuscript is clearly arranged, the data is sound and supports the main conclusions.

      Weaknesses of the manuscript:

      I think the manuscript in some points underestimates the PCR and vice versa overemphasizes the proteomics approach. For example, I don't agree that real-time PCR generally suffers from technical problems, degraded probes or non-specific amplification. Vice versa I think the LC-MS/MS approach is not inherently absolutely specific and does not outperform PCR in terms of specificity. Further, LC-MS/MS does not eliminate the problem of false positives, which could be introduced during sample preparation or by inter-run contaminations. Although in real-time PCR no internal standards analogous to isotopically labelled peptides are used, there are internal controls used to assure the quality of the extraction and the PCR reaction itself. The method presented by Hober and colleagues is clearly beneficial for the field of proteomics-based virus detection, but I suggest a more balanced discussion also including the potential drawbacks of the method.

      Another point I like to raise is that the authors conclude at the end of the results section that patient samples were collected at an infectious stage.

      We have made changes to the manuscript accordingly and removed the claim that the samples were collected in an infectious stage since this cannot be confirmed. The patients did not show any symptoms when sampled, which has been highlighted in the new version.

      However, an assessment of the infectivity cannot be drawn from the presented data. The analysis of real-time PCR results in the manuscript is based on cT values. But to draw the conclusion, that the analysed samples contained infectious virus particles, the number of viral genome equivalents has to be determined, which in turn can be correlated to infectivity.

      We have removed this section since we cannot make any claim on infectious virus-particles.

      The detection of viral proteins itself does not prove that samples were collected at an infectious stage and there is currently no correlate of the amount of NCAP protein and infectivity. Since viral proteins are likely more stable than viral RNA, they could even be detectable for a more prolonged time in patient samples.

      Reviewer #3 (Public Review):

      Major comments

      P2, l245, Figure 2: It is not completely clear to me what is represented in panels A and B. Is this the pure SIL peptide of the endogenous peptide in a complex matrix? This may make a large difference for the determination of the LLOQ. Panel B shows a calibration curve and as these are curves for which the signal is detected based on known input amounts of sample, I assume that the input is pure SIL peptide here?

      In panel A, what does '3 amol/ul' in the middle chromatogram exactly mean? Is this the endogenous peptide that was calculated to be present at 3 amol/ul based on a known concentration of spiked-in SIL peptide?

      P4, l276: The authors need to explain the details of data imputation. It is unclear which data were imputed and how this was done. In Figure 3 the grey data points represent "not detected" or "inconclusively identified" samples by LC-MS, while some of the data points seem to have a higher 'response' values than others. Please explain.

      In Figure 3, how is 'response' defined? I don't understand the following sentence (p4, l277): "… for the LC-MS results the lowest response divided by three was used, mimicking….". Which variable does the data point size reflect? There seem to be clear differences in ball sizes. Please explain. For clarity, it would be advisable to keep the y-axes for panels A and B identical. Also, how could RT-PCR values be not obtained, apparently leading to missing Ct values (p5, l278)?

      Assuming that all collected samples from individuals in the test group in this study are visualized in Figure 3, the majority was tested positive for SARS-CoV-2. This is very different from the percentages observed in regular testing facilities. How was the study group composed? Were these individuals who were already admitted to the hospital?

      We have specified that the samples were selected based on RT-PCR result and have included more negative samples in the new version of the manuscript. We have also specified how individuals were enrolled into the study.

      It would be interesting to include more negatively tested individuals to see the distribution of 'MRM response' values in this group, since some of the negatively tested individuals (green data points) show higher than expected MRM response values if no viral protein is present at all. Related to this, I do not understand how a specificity score of 100 % (p5, l292) was obtained while some green data points (negative by RT-PCR) have higher associated MRM response values than some of the blue (positive by RT-PCR) samples. Can the authors explain this?

      The negative samples that show a stronger MRM response do not have the required qualifying ions, thereby failing the QC parameter of the assay. This has been clarified in the new version of the manuscript.

      I find the text from p6, l298 ("However…") onward more suited for the Discussion section, since this is about the interpretation of the results presented here and the use of the described methodology in diagnostics; no results are shown in this part.

    1. Author Response:

      Reviewer #3 (Public Review):

      This paper reports that levodopa administration to healthy volunteers enhances the guidance of model-free credit assignment (MFCA) by model-based (MB) inference without altering MF and MB learning per se. The issue addressed is fascinating, timely and clinically relevant, the experimental design and analysis strategy (reported previously) are complex, but sophisticated and clever and the results are tantalizing. They suggest that ldopa boosts model-based instruction about what (unobserved or inferred) state the model-free system might learn about. As such, the paper substantiates the hypothesis that dopamine plays a role specifically in the interaction between distinct model-based and model-free systems. This is really a very valuable contribution, one that my lab and I expect many other labs had already picked up immediately after it appeared as a preprint.

      Major strengths include the combination of pharmacology with a substantial sample size, clever theory-driven experimental design and application of advanced computational modeling. The key effect of ldopa on retroactive MF inference is not large, but substantiated by both model-agnostic and model-informed analyses and therefore the primary conclusion is supported by the results.

      The paper raises the following questions.

      What putative neural mechanism led the authors to predict this selective modulation of the interaction? The introduction states that "Given DA's contribution to both MF and MB systems, we set out to examine whether this aspect of MB-MF cooperation is subject to DA influence." This is vague. For the hypothesis to be plausible, it would need to be grounded in some idea about how the effect would be implemented. Where exactly does dopamine act to elicit an effect on retroactive MB inference, but not MB learning per se? If the mechanism is a modulation of working memory and/or replay itself, then shouldn't that lead to boosting of both MB learning as well as MB influences on MF learning? Addressing this involves specification of the mechanistic basis of the hypothesis in the introduction, but the question also pertains to the discussion section. Hippocampal replay is invoked, but can the authors clarify why a prefrontal working memory (retrieval) mechanism invoked in the preceding paragraph would not suffice. In any case, it seems that an effect of dopamine on replay would also alter MB choice/planning?

      In sum, we agree with this criticism and have now revised the relevant intro paragraph (p. 3/4).

      We now discuss DAergic manipulation of replay in particular (p. 24). We infer that a component of a MB influence over choice comes from the way it trains a putative MF system (something explicitly modelled in Mattar & Daw, 2018, and a new preprint from Antonov et al., 2021, referencing data from Eldar et al., 2020) – and consider what happens if this is boosted by DA manipulations. The difference between the standard two-step task and the present task is that in our task there is extra work for the MB system in order to perform inference so as to resolve uncertainty for MFCA. We later suggest that the anticorrelation we found between the effect of DA on MB influence over choice and MB guidance of MFCA arises from this extra work.

      The broader questions raised about (prefrontal) working memory and (hippocampal) replay pertain to recent and ongoing work, and we feel this should be part of the discussion, which we have rewritten to detail more clearly the different possible mechanistic explanations, pointing to how they might be tested in the future (p. 23/24).

      A second issue is that the critical drug effects seem somewhat marginally significant and the key plots (e.g. Fig3b and Fig 44b,c, but also other plots) do not visualize relevant variability in the drug effect. I would recommend plotting differences between LDopa and placebo, allowing readers to appreciate the relevant individual variability in the drug effects.

      We have now replotted the data in the new Figures 4 and 5 to reflect drug-related variability.

      Third, I do wonder how to reconcile the lack of a drug x common reward effect (the lack of a dopamine effect on MF learning) as well as the lack of a drug effect on choice generalization with the long literature on dopamine and MF reinforcement and newer literature on dopamine effects on MB learning and inference. The authors mention this in the discussion, but do not provide an account. Can they elaborate on what makes these pure MB and MF metrics here less sensitive than in various other studies, and/or what are the implications of the lack of these effects for our understanding of dopamine's contributions to learning?

      Regarding a lack of a drug effect on MF learning or control, we now elaborate on this on p. 22/23:

      “With respect to our current task, and an established two-step task designed to dissociate MF and MB influences (Daw et al., 2011), there is as yet no compelling evidence for a direct impact of DA on MF learning or control (Deserno et al., 2015a; Kroemer et al., 2019; Sharp et al., 2016; Wunderlich et al., 2012). A commonality of our novel and the two-step task is dynamically changing reward contingencies. As MF learning is by definition incremental, slowly accumulating reward value over extended time-periods, it follows that dynamic reward schedules may lessen a sensitivity to detect changes in MF processes (see Doll et al., 2016 for discussion). In line with this, experiments in humans indicate that value-based choices performed without feedback-based learning (for reviews, see Maia & Frank, 2011; Collins and Frank, 2014), as well as learning in stable environments (Pessiglione et al., 2006), are susceptible to DA drug influences (or genetic proxies thereof) as expected under an MF RL account. Thus, the impact of DA boosting agents may vary as a function of contextual task demands. This resonates with features of our pharmacological manipulation using levodopa, which impacts primarily on presynaptic synthesis. Thus, instead of necessarily directly altering phasic DA release, levodopa impacts on baseline storage (Kumakura and Cumming, 2009), likely reflected in overall DA tone. DA tone is proposed to encode average environmental reward rate (Mohebi et al., 2019; Niv et al., 2007), a putative environmental summary statistic that might in turn impact an arbitration between behavioural control strategies according to environmental demands (Cools, 2019).”

      As pointed out by the reviewer as well, in the present task we did not find an effect of levodopa on MB influences per se and now discuss this on p. 22:

      “In this context, a primary drug effect on prefrontal DA might result in a boosting of purely MB influences. However, we found no such influence at a group level – unlike that seen previously in tasks that used only a single measure of MB influences (Sharpe et al., 2017; Wunderlich et al., 2012). Our novel task systematically separates two MB processes: a guidance of MFCA by MB inference and pure MB control. While we found that only one of these, namely guidance of MFCA by MB inference, was sensitive to enhancement of DA levels at a group level, we did detect a negative correlation between the DA drug effects on MB guidance of MFCA and on pure MBCA. One explanation is that a DA-dependent enhancement in pure MB influences was masked by this boosting in the guidance of MFCA by MB inference. In this regard, our data is suggestive of between-subject heterogeneity in the effects of boosting DA on distinct aspects of MB influences.”

      Another open question remains as to why different task conditions (guidance of MFCA by MB vs. pure MB control) apparently differ in their sensitivity to the drug manipulation. We discuss this (p. 22) by proposing that a cost-benefit trade-off might play an important role (Westbrook et al., 2020).

      Fourth, the correlation with WM and drug effect on preferential MBCA for non-informative but not informative destination is really quite small, and while I understand that WM should be associated with preferential MBCA under placebo, it does not become clear what makes the authors predict specifically that WM predicts a dopa effect on this metric, rather than the metric taken under placebo, for example.

      Our initial reasoning was that MFCA based on reward at the non-informative destination should be particularly sensitive to WM, on the basis that the reward is no longer perceptually available once state uncertainty can be resolved by the MB system. However, we agree with the reviewer that this reasoning does not indicate why it should specifically affect the drug-induced change. In light of this critique, we have removed this part from the abstract, introduction and the main results but still report this relation to WM in Appendix 1 (p. 44/45, subheading “Drug effect on guidance of MFCA and working memory”, Appendix 1 - Figure 11) as an exploratory analysis as suggested in the editor’s summary.

      A fifth issue is that I am not quite convinced about the negative link between dopamine's effects on MBCA and on PMFCA. The rationale for including WM, informativeness as well as DA effects on MBCA in the model of DA effects on PMFCA wasn't clear to me. The reported correlation is statistically quite marginal, and given that it was probably not the first one tested and given the multiple factors involved, I am somewhat concerned about the degree to which this reflects overfitting. I also find the pattern of effects rather difficult to make sense of: in high WM individuals, the drug-effects on PMFCA and MBCA are negatively related for informative and non-informative destinations. In low WM individuals, the drug-effects on PMFCA and MBCA are negatively related for informative, but not non-informative destinations. It is unclear to me how this pattern leads to the conclusion that there is a tradeoff between PMFCA and MBCA. And even if so, why would this be the case? It would be relevant to report the simple effects, that is the pattern of correlations under placebo separately from those under ldopa.

      The reviewer’s critique is well taken. In connection with the working memory finding reported in the previous section of the initial manuscript, we had reasoned that it would be necessary to include WM in the model as well. We still consider this analysis of inter-individual differences in drug effects across task conditions important because it connects our current work to previous work linking DA to MB control. However, we now perform a simplified analysis in which we leave out WM and instead average PMFCA across informative and non-informative destinations (since we had no prior hypothesis that these conditions should differ, p. 19/20). This results in a significant negative correlation between the drug-related change in average PMFCA and in MB control (Figure 6A; r=-.31, p=.02; Pearson r=-.30, p=.017; Spearman r=-.33, p=.009). In addition, we ran extended simulations to verify that this negative correlation does not result from correlations among model parameters (see Appendix 1 - Figure 10 for a control analysis verifying that this negative correlation survives control for parameter trade-off).

      Figure 6. Inter-individual differences in drug effects in MBCA and in preferential MFCA, averaged across informative and non-informative destinations (aPMFCA). A) Scatter plot of the drug effects (levodopa minus placebo; ∆ aPMFCA, ∆ MBCA). Dashed regression line and r Pearson correlation coefficient. B) Drug effects in credit assignment (∆ CA) based on a median on ∆ MBCA. Error bars correspond to SEM reflecting variability between participants.

      As suggested by the reviewer, we unpack this correlation further (p. 19/20) by taking the median of Δ MBCA (-0.019) and splitting the sample into lower- and higher-median groups. The higher-median group showed a positive (M= 0.197, t(30)= 4.934, p<.001) and the lower-median group showed a negative (M= -0.267, t(30)= -7.97, p<.001) drug effect on MBCA, respectively (Figure 6B). In a mixed effects model (see Methods), we regressed aPMFCA against drug and a group indicator of lower/higher median Δ MBCA groups. This revealed a significant drug x Δ MBCA-split interaction (b=-0.17, t(120)=-2.05, p=0.042). In the negative Δ MBCA group (Figure 6B), a significantly positive drug effect on aPMFCA was detected (simple effect: b=.18, F(120,1)=10.35, p=.002) while in the positive Δ MBCA group a drug-dependent change in aPMFCA was not significant (Figure 6B, simple effect: b=.02, F(120,1)=0.10, p=.749).
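
      For readers who wish to follow the logic of this analysis, the correlation and median split described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the analysis code used in the study: the per-participant values are invented, the helper names (`drug_effect`, `pearson_r`, `median_split`) are hypothetical, and the mixed effects model is not reproduced.

```python
import math
from statistics import mean, median


def drug_effect(levodopa, placebo):
    """Per-participant drug effect, defined as levodopa minus placebo."""
    return [l - p for l, p in zip(levodopa, placebo)]


def pearson_r(x, y):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def median_split(values):
    """Label each value as above ('higher') or not above ('lower') the median."""
    cut = median(values)
    return ["higher" if v > cut else "lower" for v in values]


# Invented values for six hypothetical participants (NOT data from the study).
mbca_placebo = [0.50, 0.40, 0.30, 0.20, 0.10, 0.00]
mbca_levodopa = [0.20, 0.20, 0.20, 0.25, 0.25, 0.25]
apmfca_placebo = [0.00, 0.00, 0.00, 0.00, 0.00, 0.00]
apmfca_levodopa = [0.25, 0.20, 0.10, 0.05, -0.05, 0.00]

delta_mbca = drug_effect(mbca_levodopa, mbca_placebo)
delta_apmfca = drug_effect(apmfca_levodopa, apmfca_placebo)

# Step 1: correlate the two drug effects across participants.
r = pearson_r(delta_mbca, delta_apmfca)

# Step 2: split participants by the median drug effect on MBCA and
# summarise the drug effect on aPMFCA within each group.
groups = median_split(delta_mbca)
group_means = {
    label: mean(d for d, g in zip(delta_apmfca, groups) if g == label)
    for label in ("lower", "higher")
}
print(f"r = {r:.2f}", group_means)
```

      In this toy example the correlation is negative by construction; the sketch is meant only to make explicit what is correlated with what, and how the groups in Figure 6B are formed.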

      We have changed the respective section of the results accordingly (p. 19/20). Further, we have motivated this exploratory analysis more clearly in the introduction (p. 3/4) in terms of it providing a link to previous relevant studies (Deserno et al., 2015a; Groman et al., 2019; Sharp et al., 2016; Wunderlich et al., 2012). Lastly, we have endeavoured to improve the discussion on this (p. 21/22).

      More generally I would recommend that the authors refrain from putting too much emphasis on these between-subject correlations. Simple power calculation indicates that the sample size one would need to detect a realistically small to medium between-subject effect (that interacts with all kinds of within-subject factors) is in any case much larger than the sample size in this study.

      We agree with this and have, as mentioned above, substantially adjusted the section on inter-individual differences. We have moved the WM analysis to Appendix 1 (p. 44/45, subheading “Drug effect on guidance of MFCA and working memory”, Appendix 1 - Figure 11) and greatly simplified the analysis of inter-individual differences in drug effects (see previous paragraph). We also mention the overall small to moderate effects in the limitations section (p. 25/26).

      Another question is how worried should we be that the critical MB guidance of MFCA effect was not observed under placebo (Figure 3b)? I realize that the computational model-based analyses do speak to this issue, but here I had some questions too. Are the results from the model-informed and model-agnostic analyses otherwise consistent? Model-agnostic analyses reveal a greater effect of LDopa on informative destination for the ghost-nominated than the ghost-rejected trials and no effect for noninformative destination. Conversely model-informed analyses reveal a nomination effect of ldopa across informative and noninformative trials. This was not addressed, or am I missing something? In fact, regarding the modeling, I am not the best person to evaluate the details of the model comparison, fitting and recovery procedures, but the question that does rise is, and I would make explicit in the current paper how does this model space, the winning model and the modeling exercise differ (or not) from that in the previous paper by Moran et al without LDopa administration.

      A detailed response to this was provided in reply to point 6 as summarized by the editor, and we provide a summary here as well.

      Firstly, we clearly indicate discrepancies between our model-agnostic and computational modelling analyses and acknowledge that such discrepancies may be expected when effects of interest are weak to moderate (p. 25/26, limitations).

      Secondly, the results from the computational model are generally statistically stronger, which is not surprising given that they are based on influences from far more trials. We now include a discussion of this in more detail in the section on limitations (p. 25/26).

      Thirdly, although the computational model uses a slightly different parameterization from that reported in Moran et al. (2019), it is a formal extension of that model, allowing the strength of effects for informative and uninformative destinations to differ. We now include a reference to this change in parameterization in the limitation section (p. 25/26), and include a more detailed description in Appendix 1 (p. 45-47).

      Finally, to test if the current models support our main conclusion from Moran et al. (2019) that retrospective MB inference guides MFCA for both the informative and non-informative destinations, we reanalysed the Moran et al. (2019) data using the current novel models and found converging support, as we now report (Appendix 1 – Figure 8).

      Finally, the general story that dopamine boosts model-based instruction about what the model-free system should learn is reminiscent of the previous work showing that prefrontal dopamine alters instruction biasing of reinforcement learning (Doll and Frank) and I would have thought this might deserve a little more attention, earlier on in the intro.

      The reviewer is indeed correct and we now reference this line of work (Doll et al., 2009, 2011) in the intro (p. 4).

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors have conducted an investigation into the impact of evolution on the endometrium.

      A major strength of this work is the use of published single cell RNA seq and ChIPseq data sets to support their findings. The major weakness is the use of an algorithmic approach to deconvolve a specific endometrial signal.

      The authors have broadly achieved their aims and the results support the conclusions. The weaknesses of the algorithmic approach to determining a specific transcriptomic signal have been carefully addressed and the data presented in figure 1 is persuasive. I would be still slightly concerned that issues with the comparison of a receptive and non-receptive endometrium have not been fully accounted for. It would be nice to see this. It would also be of interest to understand the impact of diapause on this analysis. Can the authors comment?

      The evolutionary impact on the developing endometrium is of major importance to translational investigation of adverse events during human pregnancy. The methods presented are well described and straightforward for a computational group to follow.

      We note that all RNA-Seq datasets used in the evolutionary analyses were from pregnant endometria, thus there is no need to account for or way to compare receptive and non-receptive endometrium samples or diapause (but this is a very interesting question for future studies!).

      Reviewer #3 (Public Review):

      Strengths:

      Mika and colleagues used a comparative transcriptomics approach to identify genes (based on binary ± expression calls) that were recruited or eliminated in the evolutionary biology of the human endometrium. The recruited genes were then analyzed for potential roles in pregnancy pathophysiology using bioinformatic approaches. The study contributes to ongoing interest in the effect of human evolution on the pathophysiology of human pregnancy, and it is proposed that evolutionary studies of this kind, in combination with traditional methods, can be used to better characterize the genetic architecture of disease.

      Weaknesses:

      The conclusions of the paper are mostly supported by the analyses. However, it is unclear how the evolution of endometrial cell gene expression would contribute to adverse pregnancy outcomes since such conditions would compromise reproduction and therefore be selected against.

      We apologize if this reasoning was unclear. Our hypothesis is that genes that gained (or lost) endometrial expression in the human lineage will be important for the establishment, maintenance, and cessation of pregnancy. Their contribution to adverse pregnancy outcomes would be through mis-expression leading to dysfunction of the pathways they regulate. Variants that lead to mis-expression will then be selected against, although the efficacy of that negative selection will depend on numerous factors such as population size, drift, penetrance, and the effect size of the mutation.

      It is stated that hundreds of genes that gained or lost endometrial expression in the human lineage were identified but these are not listed.

      Genes that gained and lost expression were given in Figure 1 – Source data 2. However, we have added the files Figure 2 – Source data 2, which lists the HUGO gene names for genes that gained expression (BPP ≥ 0.80) in the Hominoid (human) lineage, and Figure 2 – Source data 3, which lists the HUGO gene names for genes that lost expression (BPP ≥ 0.80) in the Hominoid (human) lineage. We hope this makes these results more accessible to a broad audience.

      Three genes were examined in detail for their roles in pregnancy and human-specific maternal-fetal communication but the rationale for selecting these genes is lacking.

      We selected HTR2B and PDCD1LG2 because they had not previously been implicated in pregnancy, demonstrating the importance of explicit evolutionary studies for discovery of genes important for tissue and organ function, and CORIN because it had previously been shown to be important for pre-eclampsia but has restricted endometrial expression across species, demonstrating the importance of evolutionary studies for understanding the conservation of gene expression in tissue and organ systems across species.

      The uncertain quality of the source transcriptome data is a weakness. The level of transcriptome "noise" in the data sets is unclear. It appears that the transcriptome data from most species was from bulk tissue total RNA and stage of pregnancy and anatomical site (e.g., over the placenta or at the fetal membranes) is not specified. Dissecting and isolating pregnancy endometrium is not trivial and as such this is a likely source of significant variation. Data on placenta-specific gene expression is provided to demonstrate a lack of trophoblast cell contamination, however, this does not mean that the RNA was exclusively from endometrial cells since numerous non-endometrial cells are present at the maternal-fetal interface. Consequently, binary gene expression as on/off based on a 2 TPM threshold is problematic since it may be affected by the proportion of endometrial cells in the sample rather than gene expression in endometrial cells. In addition, although the application of binary encoding is understandable, important biology may be missed because gene function extends beyond on/off state.

      We agree that there is likely significant “noise” in the cross-species gene expression data for all of the reasons the reviewer indicates. Indeed, the MDS plot shown for gene expression levels (expressed as TPMs) in Figure 1 – Figure 1A indicates there is significant noise in the data that overwhelms phylogenetic signal. However, binary encoding reduces the noise and unmasks phylogenetic signal as shown in Figure 1 – Figure 1B. Thus, binary encoding at least partly alleviates the noise problem as well as variation from different sampling locations.

      We also agree that binary encoding will be affected by the proportion of different cell-types in each sample and that it is almost certain that the proportion of different cell-types will vary between species. Again, however, this is an argument in favor of binary encoding because it will reduce noise in quantitative gene expression estimates that arise from proportional cell-type differences between species. We note, however, that we do not claim all or even most of the gene expression changes we have identified are in endometrial cells. The data do, however, suggest that gene expression changes are enriched in endometrial cells, which guided our focus on gene expression in this cell-type. This is also one of the reasons we followed up with scRNA-Seq data, to identify which recruited genes were expressed in which cell-types, and confirmed expression of HTR2B, PDCD1LG2, and CORIN in endometrial cells.

      Finally, we also agree that binary encoding is guaranteed to miss quantitative gene expression differences between species that are important for the biology of pregnancy. However, our goal in this study was to focus on more gross gene expression changes which may have larger effect sizes than quantitative gene expression changes (although that is an assumption that itself requires validation).
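
      To make the trade-off discussed in this exchange concrete, binary encoding at a fixed TPM cutoff can be sketched as below. This is an illustrative sketch only, not the pipeline used in the study: the 2 TPM threshold is taken from the reviewer's comment, whereas the use of "greater than or equal to", the gene names, the TPM values, and the crude "dilution" of a sample by a constant factor are all assumptions made for the example.

```python
TPM_THRESHOLD = 2.0  # cutoff named in the text; ">=" (rather than ">") is an assumption


def binarize(tpm_by_gene, threshold=TPM_THRESHOLD):
    """Encode each gene as 1 (expressed) or 0 (not expressed)."""
    return {gene: int(tpm >= threshold) for gene, tpm in tpm_by_gene.items()}


def dilute(tpm_by_gene, fraction):
    """Crude stand-in for a sample in which the expressing cell-type makes up
    only `fraction` of the tissue (a simplification, not real TPM renormalisation)."""
    return {gene: tpm * fraction for gene, tpm in tpm_by_gene.items()}


# Hypothetical genes and values, chosen only to illustrate the behaviour.
sample = {"GENE_HIGH": 50.0, "GENE_LOW": 3.0, "GENE_OFF": 0.4}

full = binarize(sample)
half = binarize(dilute(sample, 0.5))

# Genes whose binary call changes when the cell-type proportion is halved.
flipped = [gene for gene in sample if full[gene] != half[gene]]
print(full, half, flipped)
```

      Under these assumptions, a robustly expressed gene keeps its call when the proportion of expressing cells changes, which is the sense in which binary encoding reduces noise, while a gene close to the cutoff can flip, which is the concern the reviewer raises.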

      Use of the Vento-Tormo scRNAseq data set (Nature 2018, 563:347-353) to establish the first trimester endometrial cell transcriptome is a strength. The study would be improved, however, if those data were compared with the term maternal-fetal interface scRNAseq data set produced by the Gomez-Lopez group (Pique-Regi et al. eLife 2019;8:e52004).

      We have attempted several different methods to directly compare the Vento-Tormo and Pique-Regi scRNA-Seq datasets, and agree that this would be a potentially interesting comparison. Unfortunately, however, while the Vento-Tormo dataset is publicly available without restriction, the Pique-Regi dataset is only available through dbGaP. The data restrictions imposed by dbGaP require requesting (and being approved for) data access, which was not granted before submission of our revised manuscript. Therefore, we were unable to compare/contrast these datasets.

      In Caveats and Limitation, the authors admit that they are unable to identify truly human-specific gene expression changes in pregnancy endometrium, yet sweeping conclusions are made about the changing transcriptome of the human endometrium and how the presumed changes in gene expression contribute to extant pathophysiology. The claim that the comparative transcriptomic approach (based on binary gene expression) provides an insight into human pathophysiology is therefore questionable.

      We agree with the reviewer that we use "human" and "human lineage-specific" interchangeably when the taxon sampling only allows us to identify changes in apes (Hominoidea). However, we were writing for a more general audience than evolutionary biologists, one that might be unfamiliar with primate taxonomic nomenclature. Indeed, this is why we included the statement in the Caveats and Limitations section.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors provide evidence for the following key points:

      • that low and likely biologically relevant levels of oxidized phospholipids (OxPLs) can induce macrophage death and interleukin-1-beta release
      • that the pro-inflammatory activities of OxPLs can be tempered by acyloxyacyl hydrolase (AOAH) which deacylates oxPLs in vitro
      • that AOAH deficient mice exhibit exacerbated inflammation in vivo in response to exogenously delivered OxPLs, but interestingly, also in response to HCl, which presumably induces the release of endogenous OxPLs

      In general the data are a nice combination of in vitro and in vivo observations and are supportive of the conclusions. A few points should be addressed:

      • how do the authors reconcile their results with others' apparently contradictory results in the field?

      We thank the reviewer for raising this important question. We think the oxPL species used and their concentrations, the routes of MAMP and oxPL delivery, and the order of addition of MAMP and oxPLs may contribute to the observations made in different laboratories. We have added a paragraph in the Discussion and another in the Methods, lines 447-474 and lines 495-506 (highlighted).

      • which inflammasome is activated by OxPLs?

      We found that NLRP3 specific inhibitor MCC950 reduced PGPC or LPC-induced inflammasome activation and IL-1β release. To our surprise, using inhibitors we found that in addition to caspase 1, caspase 8 was also indispensable, suggesting that caspase 8 may cleave caspase 1 and activated caspase 1 cleaves pro-IL-1β (Chi et al., 2014; Philip et al., 2014). Please see lines 94-105, new Fig. 1E, F and new Fig. 3B, C.

      • can the possible effects of AOAH on the priming stimulus (Pam) be more cleanly distinguished from its effects on OxPLs?

      Because AOAH does not regulate acute responses to LPS (Lu et al., 2008) or Pam3 (Fig. 4C, IL-6) in vitro or in vivo (Lu et al., 2008; Zou et al., 2017), we do not expect AOAH to modulate the priming effects of Pam3 or LPS. To exclude this possibility, we tested CpG, which can also prime macrophages for oxPL-induced inflammasome activation. We found that when AOAH WT and KO macrophages were primed with CpG, PGPC induced more cell death and IL-1β release from AOAH KO macrophages. Please see lines 220-225 and new Fig. 4E.

      • a few other experimental controls could be provided

      We have added actin controls to all Western blots.

      Reviewer #2 (Public Review):

      Zou et al. investigated the function of acyloxyacyl hydrolase (AOAH) in inflammation caused by oxidised lipids. Using cell culture models (murine BMDMs) the authors first show that oxidised lipids such as oxPAPC, POVPC and PGPC induce inflammasome activation. Focusing on AOAH, they then demonstrate that AOAH, which can act as a phospholipase A1/2 or B, can remove sn-2 oxidised fatty acyl chains and sn-1 palmitate from pro-inflammatory oxidised lipids, thereby modulating their ability to activate the inflammasome and induce cell death and inflammation (IL-1b production). Release of sn-2 acyl chains from PGPC or POVPC results in the formation of LPC (lysophosphatidylcholine), which also has pro-inflammatory properties. The authors demonstrate that LPC also activates inflammasomes, and that LPS, PGPC or POVPC-induced inflammasome activation is enhanced in BMDMs from AOAH-deficient mice. Moving to mouse models of inflammation, the authors find that AOAH-deficient mice have higher levels of lung inflammation and injury after nasal instillation of LPS+oxPLs, and that AOAH regulates inflammation after nasal instillation of HCl.

      The conclusions of this paper are mostly well supported by data, but some aspects need to be clarified and extended.

      1) what inflammasome/s is/are activated by PGPC, POVPC and LPC?

      Zanoni et al. found that PGPC or POVPC but not oxPAPC can induce IL-1β release from primed bone marrow derived macrophages (BMDM) in a NLRP3-, Caspase 1/Caspase 11-dependent manner (Zanoni et al., 2017). Yeon et al. also found that POVPC induced IL-1β and processed caspase 1 release from primed BMDM, which required NLRP3 (Yeon et al., 2017). In contrast, Muri et al. found that caspase 8 but not caspase 1 or NLRP3 was required for cyclo-epoxycyclopentenone-induced IL-1β release in primed bone marrow-derived dendritic cells or macrophages. We found that the NLRP3-specific inhibitor MCC950 reduced PGPC or LPC-induced inflammasome activation and IL-1β release. Using other inhibitors we found that in addition to caspase 1, caspase 8 was also indispensable, suggesting that caspase 8 may cleave caspase 1 and activated caspase 1 cleaves pro-IL-1β (Chi et al., 2014; Philip et al., 2014). Please see lines 94-105, new Fig. 1E, F and new Fig. 3B, C.

      2) how does AOAH affect the anti-inflammatory functions of oxPLs which have previously been reported (PMID:29520027, 32234476 )

      It is a very intriguing question. In this study, we focus on studying the role that AOAH plays in preventing oxPL-induced inflammasome activation. We will study whether AOAH alters the anti-inflammatory functions of oxPLs in the future. We have added a sentence in Discussion, lines 471 - 474.

      3) additional controls need to be provided to increase confidence into the immunoblot analysis

      Thanks. We have added actin loading controls.

      4) experimental procedures need to be better explained and justified

      dPGPC/dPOVPC means PGPC/POVPC treated with AOAH. AOAH can release both sn-2 and sn-1 fatty acyl chains from PGPC/POVPC. In addition, AOAH deacylates LPC. Please see Fig. 2A, B and Fig. 3A. We have clarified the definition of dPGPC/dPOVPC, line 144. The samples were frozen after treatment. Freezing in the absence of glycerol inactivates AOAH. We added a sentence to make it clear, lines 568, 569.

    1. Author Response

      Reviewer #2 (Public Review): Osteoblasts are highly anabolic cells that display a high proliferation rate and secrete ample amounts of extracellular matrix, indicating that these cells have a specific metabolic profile. Here, using a set of in vivo and in vitro experiments, Sharma et al describe that SLC1A5-mediated glutamine and asparagine uptake is critical to sustain osteoblast anabolism. While the experimental setup is robust, this concept has already been put forth, questioning therefore the novelty of the results. In addition, some of the authors' claims are insufficiently supported by the presented data. Especially the metabolic role of asparagine in regulating osteoblast differentiation remains enigmatic. The main concerns are detailed below.

      1. Based on their data, the authors propose that the main mechanism whereby SLC1A5 regulates osteoblast proliferation and differentiation is via glutamine uptake, while asparagine only contributes to a lesser extent. Importantly, the concept that glutamine metabolism regulates proliferation and differentiation of osteogenic cells by sustaining anabolic processes has already been described recently, even by the same research group (Yu Y. Cell Metab. 2019; Stegen S. JBMR 2021), questioning the novelty of the present study. Moreover, no metabolic rescue experiments were performed to unequivocally demonstrate that the defect in amino acid/protein synthesis in SLC1A5-deficient cells was causing the decrease in osteoblast proliferation and differentiation.

      We appreciate the reviewer’s thorough and thoughtful review and we thank the reviewer for helping us to improve this manuscript. To address this, we evaluated proliferation or osteoblast marker genes in Slc1a5 deficient cells cultured in media supplemented with 10 times the normal concentration of the reduced amino acids (excluding Gln and Asn, Fig. 4B). There was no effect on EDU incorporation; however, exogenous amino acids did rescue the induction of Ibsp and Bglap to a lesser extent (Fig. S6D-E). Interpretation of these types of experiments is tricky, as the uptake of NEAA may be inherently limited in osteoblasts and, due to time constraints, we were unable to quantify intracellular amino acid levels in the rescued cells. Regardless, we interpret these data as affirming the necessity of Slc1a5 to provide Gln and Asn used to synthesize amino acids for osteoblast differentiation. In addition, these data indicate other metabolites (e.g. alpha-ketoglutarate, glutathione, nucleotides etc) derived from Gln and/or Asn are required for proliferation. We have modified the discussion to address this uncertainty.

      In addition, Gln and Asn tracing (carbon and nitrogen) in SLC1A5-deficient cells would confirm that Gln and Asn uptake via SLC1A5 is important for osteoblast functioning.

      We did not perform tracing experiments in the Slc1a5 deficient cells. We directly evaluated amino acid uptake using radiolabeled amino acids in Slc1a5 deficient cells (Figure 4). Slc1a5 ablation reduced the uptake of Gln and Asn. To test if Gln and Asn uptake was important for osteoblast function we directly compared the cellular effects of Slc1a5 ablation to Gln or Asn withdrawal. From these experiments we concluded that Gln and Asn uptake is essential for osteoblast differentiation.

      1. Using isotopic labeling experiments, the authors demonstrate that asparagine-derived carbon and nitrogen label several amino acids that are critical for protein synthesis, albeit at a lower level compared to glutamine. Based on these observations, they claim that the decrease in osteoblast differentiation upon asparagine depletion also occurs via a defect in protein synthesis. However, proliferation, EIF2a phosphorylation and COL1A1 levels were not affected in asparagine-deprived conditions, questioning that the decrease in differentiation is resulting from impaired protein synthesis. Further experiments to decipher the metabolic role of extracellular asparagine are therefore warranted to avoid overinterpretation of the data, including protein/matrix synthesis, analysis of amino acid levels in Asn-deprived conditions and rescue with Asn-derived metabolites.

      Again, the reviewer raises a very important point. Our data indicate that Asn does contribute to amino acid biosynthesis, chiefly Asp; however, we did not evaluate the requirement of Asn for protein synthesis directly. We think it is probable that the contribution of asparagine to osteoblast differentiation is multifaceted. Thus, we have softened the conclusions about asparagine and the regulation of protein synthesis to reflect this uncertainty.

      1. To inactivate SLC1A5 in vivo, the authors use the Tet-off Osx-GFP::Cre mouse line. Importantly, newborn Osx-Cre mice display severe craniofacial abnormalities, which may complicate correct interpretation of the in vivo data, especially when analyzing at embryonic stages. Do the authors observe a similar defect in osteoblast function when SLC1A5 was deleted postnatally? This might be especially relevant because the phenotype seems to wane off over time, as knockout mice at P0 only display a craniofacial phenotype, whereas long bones appear to be normal.

      The reviewer raises a very important point regarding the Sp7tTA;tetoCre line we used in this study. As mentioned, the Sp7tTA;tetoCre mice do have a partially penetrant craniofacial bone phenotype. To control for this, we only use Sp7tTA;tetoCre as “wild type” controls. In addition to the early embryonic endochondral ossification and persistent calvarial phenotypes, the Sp7tTA;tetoCre;Slc1a5^fl/fl mice have additional bone phenotypes compared to the Sp7tTA;tetoCre controls. These included a calvarial phenotype at both birth and 2 months of age (Figures 1 and S2). Likewise, we observe similar changes in osteoblast differentiation and bone development in the developing limbs at birth and in femurs at 2 months of age (Figure S4). Due to time constraints, we have not been able to generate sufficient numbers of mice with postnatal deletion of SLC1A5 to include here. These experiments are ongoing and will be published later.

      Reviewer #3 (Public Review):

      This work by Sharma et al studied the role of aa transporter, ASCT2, encoded by Slc1a5 gene, that transports mostly Glmn and Asn, in osteoblasts (OB). They use gene targeting in vitro and in vivo using Sp7-Cre driven cKO. They found that ASCT2 deletion impairs OB differentiation in vitro as well as mostly intramembranous ossification in vivo by interfering with proliferation and protein synthesis. Mechanistically, they show that Glmn uptake via ASCT2 is important for aa synthesis in OBs. This group has shown before that Glmn is essential for OB metabolism. The current work further investigates this phenomenon and identifies ASCT2 as the key mechanism of Glmn uptake into OBs. The work is logically structured and carefully done with appropriate in vivo and in vitro controls. A variety of methods is used to confirm their findings, such as in vivo immunodetection and in situ hybridization and in vitro metabolic tracing. The conclusions are well justified by the data.

      Minor comments are:

      - MicroCT methodology is not adequately described and needs to be expanded

      We appreciate this positive review of our work. We have modified the methods to adequately describe µCT methodology. We modified the methods as follows:

      “Micro computed tomography (µCT) (VivaCT80, Scanco Medical AG) was used for three-dimensional reconstruction and analysis of bone parameters. Calvariae were harvested from either newborn mice or 2-month-old mice. All muscle and extraneous tissue were removed and the isolated calvariae were washed in PBS, fixed overnight in 10% NBF and dehydrated in 70% ethanol. The calvariae were immobilized in 2% agarose in PBS for scanning. A fixed volume surrounding the skull was used for 3D reconstructions. In newborn calvariae, bone volume was quantified from a fixed number of slices in the occipital lobe. The threshold was set at 280. For quantification of bone mass in the long bone, 2-month-old femurs were isolated, fixed, immobilized and scanned. Bone parameters were quantified from 200 slices directly underneath the growth plate with the threshold set at 333.”

    1. Author Response:

      Reviewer #1 (Public Review):

      Kursel et al. examined the evolution of synaptonemal complex proteins in C. elegans. While the sequence of the SC proteins evolved rapidly, analysis of the structure of SC central region proteins from Caenorhabditis, Drosophila and mammalian species revealed that the length and placement of the coiled-coil domains, as well as overall protein length, were highly conserved across species. This conservation in the structure of coiled-coil proteins within the SC led to the proposal that the conserved structural parameters of the SC proteins and their coiled-coil domains could be used to identify central region components of the SC in species where components could not be identified on sequence conservation alone. Kursel et al demonstrated their parameters could be used to identify a transverse filament protein of the SC in the organism Pristionchus pacificus.

      Due to high sequence divergence, identifying SC proteins in new model systems has been challenging. The identification by Kursel et al. of potential search parameters to identify these diverged proteins will be useful to those who work on the synaptonemal complex. This approach has the potential to be applicable to other types of proteins that show rapid sequence divergence. As the mammalian, fly, and worm SC proteins all displayed different lengths and placements of their coiled-coil domains within their SC proteins, this approach is limited by the availability of identified sequences related to the model organism of interest. Additionally, this approach may still yield multiple candidates that fit the structural parameters, which will require additional means to ultimately identify the protein of interest. The data in the manuscript support the authors' claims of structural conservation within SC proteins, but only additional applications of their search methods will reveal how useful it is to search for other types of proteins based on structural features.

      We thank the reviewer for their summary and feedback. We hope that with the ever-lowered costs of genome assembly and the expansion of CRISPR/Cas9 gene-editing capabilities, the pipeline we developed will be applicable to more clades and species. We agree that it will be interesting to expand our method beyond the SC. Going forward, we are excited to test whether it will enable us to identify other types of proteins, especially those that are part of condensates. In this light, our finding that centrosomal proteins are also enriched in the same evolutionary class as SC proteins is especially intriguing.

      Reviewer #2 (Public Review):

      In this article, Kursel and colleagues sought to identify evolutionary features of components of the SC that are evident in the absence of strict amino-acid conservation. After identifying three joint evolutionary properties of SC proteins - conservation of coiled-coil architecture, conservation of length and significant amino acid divergence - they show that these properties can be used to identify unknown SC proteins in divergent species. Overall, their general conclusion is very well supported and they do an excellent job functionally testing their approach by showing that one identified candidate for a novel SC protein in Pristionchus is in fact a component of the SC. In addition to providing new insight into the evolutionary forces that shape the evolution of SC proteins, this article provides new insight into how one might generally identify functionally similar or homologous proteins despite very deep divergence. Thus, this work has broader relevance to molecular evolution and evolution of protein structure.

      There are some places where smaller conclusions need more support. In particular, it is not entirely clear that this triple pattern - conservation of coiled-coil architecture, conservation of length and significant amino acid divergence - is broadly applicable to SC components beyond Dipterans and Nematodes. In particular, the pattern is weaker in Eutherian mammals. Some further investigation is needed to claim that the pattern is similar in mammals. In addition, it is not clear if coiled-coil conservation, rather than simply having a coiled-coil domain, is important as a mark of SC proteins. A comparison of coiled-coil conservation among proteins that have coiled-coil domains would be needed for this conclusion. Finally, there should be some additional clarification that not all nematode SC proteins have a pattern of insertion and deletion that is limited to regions outside of the coiled-coil domains.

      We thank the reviewer for their appreciation of the broader impacts of our work to molecular evolution and for their suggestions for providing more support for our conclusions. We have addressed each of these points below (1. the evolutionary pattern in mammals, 2. the value of the coiled-coil conservation score, and 3. clarification of the indel analysis).

      1) As suggested, we have added dot plots comparing mammalian SC proteins to all other mammalian proteins for the three metrics central to this manuscript - amino acid substitutions per site, coiled-coil conservation scores and coefficient of variation of protein length. The plots (shown here) can be found in Figure 3 – figure supplement 4.

      These plots provide additional evidence that the evolutionary pattern of mammalian SC proteins is similar to (although weaker than) that of Caenorhabditis and Drosophila.

      In panel (A), we show the median amino acid substitutions per site of SC proteins is higher than other proteins in mammals, although the difference is not significant. We discuss two reasons why the divergence trend is weaker for mammalian SC proteins in the results. Briefly, they are: (1) the overall divergence of the mammalian proteome is less than that of the Caenorhabditis or Drosophila proteome, and (2) mammalian SC proteins may face additional evolutionary constraints due to novel functions, including mammalian-specific protein interactions.

      In panel (B), we show that mammalian SC proteins have a significantly higher coiled-coil conservation score than other proteins.

      In panel (C), we show that the coefficient of variation of protein length for mammalian SC proteins is not significantly different from that of other proteins. We hypothesize that this could be due to gene annotation errors, which plague even very high-quality genomes. For example, we found annotation errors in 23 (18%) of the 125 Caenorhabditis SC proteins examined in this study. Uncorrected, these errors often read as large insertions or deletions, and produce an artificially large coefficient of variation. We use L. africana SYCE3 to demonstrate how potential annotation errors could impact our measure of length variation in mammalian SC proteins. L. africana SYCE3 has conspicuous N- and C-terminal extensions not found in any other SYCE3. Excluding that single protein, L. africana SYCE3, reduces the average length variation from 29% to 4% in the SYCE3 orthogroup, below the median of other proteins. Correspondingly, the median SC coefficient of variation of protein length drops from 20% (unfilled black circle) to 12% (dashed, unfilled circle). While systematic manual annotation of the eutherian mammal proteomes is beyond the scope of this manuscript, we added to the Discussion an explicit reference to the implications of annotation errors for our ability to systematically address evolutionary pressures affecting indels.
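
      The sensitivity of the coefficient of variation to a single mis-annotated protein can be illustrated with a minimal sketch. The protein lengths below are invented for illustration and are not the SYCE3 data, and the use of the population standard deviation is an assumption; the response does not state which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(lengths):
    """Return the standard deviation of the lengths as a percentage of their mean.

    Assumption: population standard deviation (pstdev); the source does not
    specify the estimator.
    """
    return 100.0 * pstdev(lengths) / mean(lengths)


# Hypothetical protein lengths (amino acids) for one orthogroup.
# "species_e" stands in for a protein with annotation-derived extensions.
orthogroup_lengths = {
    "species_a": 88,
    "species_b": 90,
    "species_c": 86,
    "species_d": 89,
    "species_e": 210,  # suspected annotation error (hypothetical)
}

# Coefficient of variation with every protein included.
cv_all = coefficient_of_variation(list(orthogroup_lengths.values()))

# Coefficient of variation after excluding the suspected mis-annotation.
cv_trimmed = coefficient_of_variation(
    [length for name, length in orthogroup_lengths.items() if name != "species_e"]
)

print(f"CV, all proteins:       {cv_all:.1f}%")
print(f"CV, outlier excluded:   {cv_trimmed:.1f}%")
```

      In this toy orthogroup a single long outlier dominates the spread, which is the same qualitative effect described above for one protein with terminal extensions.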

      2) We thank the reviewers for this important suggestion. Indeed, the inclusion of the few examples in Figure 2 was meant as a demonstration rather than a statistical analysis. To create a group of proteins that would serve as an appropriate control for conservation of the length and organization of the coiled-coils, we selected orthogroups in which 90% of the proteins in the group had a coiled-coil domain of 21 amino acids or longer. This left 916 Caenorhabditis orthogroups, including all SC proteins. We found that the median coiled-coil conservation score of SC proteins was significantly higher than that of the other coiled-coil proteins, confirming our comparisons to the entire proteome. We have included this analysis as a figure supplement to figure 2 (dot plot shown here and Figure 2 – figure supplement 1) and added text to the results and methods describing the analysis.

      More broadly, this result suggests that our coiled-coil conservation score is more informative than a binary measure of coiled-coil domain prediction (i.e. presence/absence of coiled-coil). The additional information contained in the coiled-coil conservation score likely comes from the fact that we take into account whether or not the coiled-coil domains are aligned across species, which reflects a higher degree of secondary structure conservation. We believe that future work to develop better measures of conservation of secondary structures will hone our ability to identify conservation of other protein classes.
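
      The orthogroup filter described above can be sketched as follows. This is only an illustration under stated assumptions: the input layout (each protein represented by a list of predicted coiled-coil segment lengths), the function names, and the reading of "90%" as "at least 90%" are all hypothetical and are not taken from the authors' pipeline.

```python
def has_long_coiled_coil(segment_lengths, min_len=21):
    """True if any predicted coiled-coil segment is at least min_len residues."""
    return any(length >= min_len for length in segment_lengths)


def select_control_orthogroups(orthogroups, min_fraction=0.9, min_len=21):
    """Return names of orthogroups in which at least min_fraction of proteins
    carry a coiled-coil segment of min_len residues or longer.

    orthogroups: dict mapping an orthogroup name to a list of proteins, where
    each protein is a list of coiled-coil segment lengths (hypothetical layout).
    """
    selected = []
    for name, proteins in orthogroups.items():
        if not proteins:
            continue  # skip empty groups to avoid dividing by zero
        n_with_cc = sum(has_long_coiled_coil(p, min_len) for p in proteins)
        if n_with_cc / len(proteins) >= min_fraction:
            selected.append(name)
    return selected


# Toy input: segment lengths are invented for illustration.
example_orthogroups = {
    "og_mostly_coiled": [[35, 60]] * 9 + [[]],   # 9 of 10 proteins qualify
    "og_half_coiled": [[40]] * 5 + [[12]] * 5,   # 5 of 10 proteins qualify
    "og_short_only": [[15], [20], [18]],         # no segment reaches 21
}

control_groups = select_control_orthogroups(example_orthogroups)
print(control_groups)
```

      Restricting the comparison to groups selected this way means SC proteins are compared against other coiled-coil proteins rather than against the whole proteome.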

      3) We have clarified this point in our revised manuscript, highlighting that when analyzed as a group, indels are excluded in coiled-coils of Caenorhabditis SC proteins, and that significance is also observed for specific SC proteins where enough indels are present to perform statistical tests. Two of the SC proteins, SYP-2 and SYP-3, had only two indels each, preventing us from performing tests of significance. We have also added text to the discussion directly addressing the limitations of automatically-assigned gene annotations on the ability to test evolutionary pressures on indels genome-wide.

      Reviewer #3 (Public Review):

      The manuscript "Unconventional conservation reveals structure-function relationships in the synaptonemal complex" by Kursel, Cope, and Rog, describes a novel bioinformatics analysis of proteins in the eukaryotic synaptonemal complex (SC). The SC is a highly conserved structure that links paired homologs in prophase of meiosis, and in most organisms is required for the successful completion of interhomolog recombination. An enigmatic feature of SC proteins is that they are highly diverged between organisms, to the point where they are nearly unrecognizable by sequence alone except among closely related organisms. Kursel et al show that within the Caenorhabditis family of nematodes, SC proteins show a reproducible pattern of coiled-coil segments and highly conserved overall length, while their primary sequences are extremely diverged. They use these findings to develop a method to identify new SC candidate proteins in a diverged nematode, Pristionchus pacificus, and confirm that one of these candidates is the main SC transverse filament protein in this organism. Finally, the authors expand their analysis to SC proteins in flies (Drosophila melanogaster and relatives) and eutherian mammals, and show similar findings in these protein families. In the discussion, the authors describe an interesting and compelling theory that the coiled coils of SC proteins directly support phase separation/condensation of these proteins to aid assembly of the SC superstructure.

      Overall, this work is well done, the findings are well-supported, and are of interest to meiosis researchers; especially those working directly on the SC. The manuscript is also well put-together: I could barely find a typo. From a broader perspective, however, I'm not convinced that the work provides a new paradigm for thinking about "conservation" in protein families and how to best detect it. Methods that use structural information to detect homology between highly diverged proteins beyond the capabilities of BLAST or even PSI-BLAST are well-developed (e.g. PHYRE2, HHPred, and others). The use of coiled-coil length as a metric for conservation, while it works nicely in the case of SC proteins, is likely to not be generalizable to other protein families. Even within SC proteins, the method does not seem to scale past specific families to, say, allow identification of homology between distantly-related eukaryotic groups (e.g. between Caenorhabditis and Drosophila or Caenorhabditis and eutherian mammals). To be fair, this failure to scale is not because of any limitation with the method; rather, simply that SC proteins diverge quickly through evolution. Overall, however, these limitations seem to limit the application of this method to the specialized case of SC proteins, thus limiting the audience and scope of the work.

      We appreciate the reviewer’s consideration of possible limitations of our study. However, we disagree that this method, and the insights gained from it, will be limited to SC proteins. A clear demonstration is that the centrosomal protein SPD-5 (Centrosomin in Drosophila, CdkRap2 in mammals) cannot be identified across clades using sequence homology despite performing a conserved and fundamental cellular function. We hypothesize that similar forces have shaped the evolution of SPD-5 and other centrosomal proteins that are enriched in the same evolutionary class as SC proteins (Figure 3 – figure supplement 1). Functional tests of these predictions will be an exciting area of future research.

      As this review notes, an exciting hypothesis stemming from our work is that proteins with diverged primary sequence and conserved secondary structures (coiled-coils, disordered protein domains or others) will be over-represented in condensates. Anecdotally this is indeed true, as both the SC and the centrosome were shown to be condensates. The burgeoning interest in condensates, and the development of tools to study them in vivo and in vitro, are bound to test the broad applicability of this hypothesis.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, the authors present evidence that mouse blood meals containing Lyme disease spirochetes induce upregulation and activation of an adiponectin receptor (ISARL) in the midguts of Ixodes scapularis ticks. Activation of the receptor initiates transcriptional changes, not seen with blood meals from uninfected mice, that give rise to metabolic alterations in the midguts required for replication of spirochetes. Using RNA-seq, they trace these critical alterations to genes encoding enzymes for synthesis of phospholipids. Although mouse adiponectin induced transcriptional changes in engorged tick midguts related to glucose and energy metabolism, it did not influence B. burgdorferi colonization. The authors conclude by presenting evidence that tick complement C1q-like protein (C1QL3), also upregulated in response to a blood meal containing spirochetes, is an ISARL ligand and that knockdown of C1QL3 impairs spirochete colonization.

      This work extends our understanding of the complex and intimate physiologic interactions between Borrelia burgdorferi and Ixodes ticks that sustain the spirochete's enzootic cycle. It builds upon prior work by others showing that feeding ticks provide spirochetes with glycerol, an alternative carbohydrate energy source and essential building block for phospholipid biosynthesis. It also appears relevant to previously published work showing that B. burgdorferi can extract lipids from the membranes of eukaryotic cells to which they are attached.

      The strength of the study is that it uses state-of-the-art genetic, bioinformatic, and transcriptional approaches to garner novel insights into the unique transcriptional/metabolic changes that occur in ticks when they ingest blood from B. burgdorferi-infected mice. It enhances the now well-established, but still far from well understood, viewpoint that ticks are not mere biologic syringes for injection of spirochetes. And it demonstrates, probably more than any preceding study, the extent to which tick midgut interactions with Lyme disease spirochetes re-configure metabolic responses/adaptations to the blood meal. Viewed from these contexts, the major outcomes of this study - identification of ISARL as a midgut metabolic regulator and a tick derived ISARL ligand - are groundbreaking. On the other hand, the main conclusions of the paper, though consistent with the data, are less than definitive. The authors can only infer that spirochetes take up phospholipids produced within the tick midgut following ISARL stimulation, and they stop short of showing that C1QL3-ISARL interactions mediate the transcriptional/metabolic changes involving phospholipid biosynthesis attributed to activation of ISARL during an infection blood meal.

      We agree that the studies on glycerol and lipid uptake by Borrelia from the host support our findings. We have added all the suggestions to the Discussion section of our manuscript.

      Reviewer #2 (Public Review):

      The authors searched for human and murine Adiponectin and Adiponectin receptors homologous sequences in the I. scapularis NCBI database. They found one homologous sequence for Adiponectin receptor 1 and 2, called I. scapularis Adiponectin receptor-like (ISARL) and none for Adiponectin. ISARL showed 71% homology with AdipoR1 and 2 human and murine, 384 amino acids long, and 87% homology with the D. melanogaster ortholog.

      Then the authors characterized ISARL functionally in the tripartite interaction between vector (I. scapularis, deer tick), mammal (mice) and Lyme disease spirochete (B. burgdorferi, bacteria). They used an elegant paradigm in which they intervened in the interaction of B. burgdorferi with its vector I. scapularis by injecting siRNAs or adiponectin into the nymphal tick guts to silence or activate ISARL and other proteins of interest. They observed that the blood meal from mice infected with B. burgdorferi increased the expression of ISARL in the tick guts, and by silencing ISARL they were able to reduce the colonization of the bacteria in the tick gut without affecting the feeding habits. The silencing of ISARL, however, did not prevent or reduce the ability of the spirochete to infect mice after being bitten by the tick.

      The authors then screened for genes related to B. burgdorferi in the tick guts by comparing the RNAseq profiles of tick guts fed on uninfected and infected mice, with and without ISARL silencing. The comparisons shown in figure 3 were clean and follow a logical line of reasoning. On one hand, the comparison between silenced and non-silenced ticks fed a blood meal from uninfected mice showed 17 differentially expressed genes, one of which was the silenced ISARL. On the other hand, the comparison between silenced and non-silenced ticks fed an infected blood meal showed 35 differentially expressed genes, one of which was the silenced ISARL. The two sets had no genes in common except for ISARL. The GO analysis showed that several metabolic pathways were modified by B. burgdorferi. From those, the authors chose 18 genes that were robustly represented and confirmed their expression using qPCR; 17 of them passed the analysis and four of them changed significantly. They chose Phosphatidylserine synthase 1 (PTDSS1) as a paradigm to silence because it is involved in the synthesis of phospholipids (PE and thus PC) and B. burgdorferi lacks the machinery to synthesize them. The silencing of PTDSS1 effectively reduced the content of the spirochete in the tick guts without affecting its feeding behavior, and moreover, silencing of PSD, another enzyme downstream of PTDSS1 also involved in PE synthesis, induced the same effect. This was an elegant demonstration that the pathway involved downstream of ISARL was the phospholipid synthesis pathway.

      Because ISARL resembles AdipoR1 and 2 and the blood meal may contain its natural ligand adiponectin, the authors investigated the influence of Adiponectin on the effects of B. burgdorferi on tick guts. They injected adiponectin or fed ticks a blood meal from wild type and Adiponectin KO mice and found in both cases that the presence of Adiponectin decreases the expression of G6P1 and 2 (2 isomers found in ticks) just as it does in mammalian systems, but the injection of Adiponectin only reduced the expression of two of the three phosphoenolpyruvate carboxykinases (PEPCKs) found downstream in the tick glycolytic and gluconeogenic pathways (PEPCK2 and 3 but not 1). In contrast, the blood meal does not reduce any of them. More importantly, Adiponectin and glucose metabolism do not have any effect on the infectivity or colonization of B. burgdorferi.

      Because ISARL responds to ligands, the authors searched the tick database for one using the C1Q motif of human Adiponectin that interacts with the receptor. They found a match of 181 aa (pre-protein) and 157 aa for the mature protein (without the signal peptide). The expression of the proposed ligand, called C1QL3, was increased when the tick was fed a blood meal from infected mice (as was ISARL), and when it was silenced, feeding behavior did not change but the content of B. burgdorferi in the tick gut decreased. They demonstrated in a heterologous system (human HEK cells expressing ISARL) that recombinant C1QL3 interacted with ISARL by immunocytochemistry and pull-down assay.

      After silencing C1QL3, ISARL expression was decreased and the blood meal from infected mice lost the ability to increase the receptor expression level in the tick's gut.

      We are grateful for the comments to improve our manuscript. We have carefully considered each point and addressed them all. Please see below for a point-by-point response.

      Critique:

      Overall, the authors have done an impeccable job in the demonstration of the pathways involving ISARL in the tripartite interaction of the mammalian-insect-bacteria system. However, the medical relevance of the interaction portrayed in the present manuscript, although very interesting from the biological-evolutionary point of view, remains to be demonstrated, and it is an opportunity that was not taken in the discussion. In the discussion the authors just described the systems in the three species, which is usually done in the introduction; instead, they repeated all the conclusions drawn in the results, which, apart from a couple of paragraphs, is a waste of space given the good scientific work presented in the results section. I would suggest the authors concentrate on areas where their work would be elevated by challenging the reader with new ideas. For instance, there is literature about the role of Adiponectin receptors in lipid metabolism and uptake that was not mentioned. ADIPOR1 is expressed in the retina and retinal pigment epithelial cells. Mutation of the Adipor1 gene in these cells results in the inability to take up and retain the essential fatty acid family member docosahexaenoic acid (DHA, 22:6,n-3). Therefore, phospholipids in those cells display a selective shortage in DHA, not in arachidonic acid. In addition, the elongation products 32:6,n-3 and 34:6,n-3 are depleted. A consequence is photoreceptor cell death and retinal degeneration (Rice DS, Calandria JM, Gordon WC, et al. Adiponectin receptor 1 conserves docosahexaenoic acid and promotes photoreceptor cell survival. Nat Commun. 2015;6:6228-6242.).

      ADIPOR1 has recently been shown to be critical for retinal degenerative diseases; for example, a single amino acid mutation of Adipor1 occurs in different forms of retinitis pigmentosa. Also, polymorphisms of this receptor have been found in age-related macular degeneration (AMD). AdipoR1 has an adiponectin-independent role. This was demonstrated by the fact that adiponectin KO does not change DHA and does not result in retinal degeneration. Therefore it seems that it is a regulatory switch of DHA uptake, retention, conservation, and elongation in retinal cells, to sustain photoreceptor cell integrity.

      There is also literature regarding PEs and survival pathways involving ceramides, a route that was not taken by the authors. It would be especially interesting to analyze this aspect from the point of view of therapy against Lyme disease. Other aspects would be the control of reproduction of B. burgdorferi in the tick population and strategies to control the disease.

      Thank you so much for the suggestion. We have deleted the repetitive paragraphs, such as RNA-seq portion, and added a more meaningful discussion that reflects a combination of the reviewers’ comments.

      Therapeutics for Lyme disease are a complicated topic, as there is both early- and late-stage disease, and the pathogenesis is likely different. We agree that it is important to consider this issue and have added comments in the revised manuscript (Page 19-20, Line 564-592).

      Reviewer #3 (Public Review):

      Following up on an initial observation that the genome of the black-legged tick encodes an adiponectin-like receptor (ISARL), but lacks an obvious cognate adiponectin homolog, Tang et al. uncover the interesting finding that ISARL is important for colonization of the tick by the Lyme disease agent, Borrelia burgdorferi. Spurred by compelling data that silencing of the ISARL gene significantly attenuates tick acquisition of B. burgdorferi from infected mice, the authors link ISARL production to the differential expression of tick genes involved in metabolism. They show that ISARL mediates regulation of tick phospholipid metabolic pathways and that this phenotype is unique to bloodmeals taken from B. burgdorferi infected mice. Data are presented that support the contention that tick metabolic pathways linked to phosphatidylserine synthase I are critical to spirochete colonization. To investigate potential ligands for ISARL, the authors first examine mammalian adiponectin using knock-out mice. They show that adiponectin regulates glucose metabolism pathways in an ISARL-dependent manner, but has no impact on B. burgdorferi colonization. Instead, a homology search of the Ixodes genome using the C1q globular domain of adiponectin as a query led to the identification of tick C1q-like protein 3 (C1QL3). The authors show that tick C1QL3 regulates ISARL expression and, like ISARL, is critical for B. burgdorferi colonization. The authors conclude that B. burgdorferi influences tick C1QL3 expression through an undefined mechanism, leading to increased ISARL-mediated signaling effects on metabolic pathways that aid B. burgdorferi colonization of the tick.

      Strengths:

      This is a well written and carefully designed study that lays the foundation for asking many new questions about the complex interplay between the Lyme disease spirochete, its tick vector, and its vertebrate hosts. I agree with the authors that these findings are also likely relevant to other important arthropod-borne pathogens.

      The extensive use of an in vivo system that relies on tick acquisition from the blood of infected mice is a significant strength of the study. By silencing a series of genes in the tick the authors develop a convincing case for the mechanistic relationship of ISARL to B. burgdorferi colonization.

      Thank you for appreciating the significance of this work. We agree with the comments and have addressed all of them. Please see below for a point-by-point response.

      Weaknesses:

      1) Potential mechanisms of B. burgdorferi influence on C1QL3 expression are not addressed. While outside the scope of the current work, the manuscript would be improved by some consideration of this issue in the discussion.

      Thank you for your suggestion. We have added this consideration in the revised manuscript (Page 19, Line 559-560).

      2) Adiponectin and C1QL3 are shown to be ISARL ligands that cause differential regulation of tick metabolic pathways. B. burgdorferi infection does not alter adiponectin concentrations in the blood of mice (Fig. 3H). Presumably tick C1QL3 competes with mammalian adiponectin for ISARL-binding, but this is not addressed. Similarly, the homology of murine or human C1QL3 (i.e. CTRP13) is not shown and its potential relevance, along with other C1Q/TNF-related proteins are not discussed in the context of ISARL, but are instead discussed in their known host associated roles only. An improved discussion of how adiponectin, CTRP13, and other C1Q/TNF-related vertebrate proteins may act (or not act) in the tick C1QL3/ISARL pathway should be provided.

      We agree that it is important to discuss if, or how, C1QL3 homologs in mammals may influence tick C1QL3/ISARL pathway. We have added the discussion in the revised manuscript (Page 19, Line 560-562).

      3) The authors conclude that mammalian adiponectin/ISARL-mediated glucose metabolism changes have no impact on B. burgdorferi colonization. However, the authors report a significant difference in engorgement weights between the GFP controls and G6p1/2 knock downs. Furthermore, a majority of the samples evaluated for G6P1 exhibited flaB/actin ratios near zero, indicating low colonization, including for the GFP control. The authors should clarify how these factors potentially influenced the claim that glucose metabolism changes (particularly G6p1) did not cause statistically significant differences in B. burgdorferi acquisition.

      Thank you for this comment. It is interesting that significantly increased engorgement weights were observed after silencing tick G6p genes. We hypothesized that the low glucose level probably makes ticks “hungrier”, which could lead to more feeding. However, this doesn’t result in a greater Borrelia burden in the tick gut. For clarification, we have split the Y-axis into two segments to display the flaB/actin ratios.

    1. Author Response:

      Reviewer #2 (Public Review):

      In this manuscript, the authors use lattice light sheet microscopy and custom made soft micro-particles to examine the forces generated during phagocytosis and assess the molecular functions and localization of various components of the system. The imaging is truly fantastic and the discovery of phagocytic 'teeth' that exert force normal to the bead surface is a real advance to the field.

      However, the functional studies using pharmacological inhibitors are more problematic. Specifically, the authors use pharmacological agents to test the roles of NMII (Blebb), the Arp2/3 complex (CK666) and supposedly formins (SMIFH2). The formin inhibitor is particularly problematic since it has known off target effects such as NMII (Sellers et al) and has never really been validated in terms of specificity or potency. I realize this drug has been used a lot in the literature, but so was BDM before it was finally discredited. It doesn't really give much of an effect and, combined with the fact the two formins checked are not in the cup, this data should just be removed from the paper.

      We thank the Reviewer for pointing out the issue with SMIFH2. To address the reviewer’s comment, we have moved all data regarding SMIFH2 treatment to the SI (Figure 3 – supplement 5), and made a note in the manuscript on its potential off-target effects, including citation of the work by Nishimura et al. (pg. 8, line 217). While we are familiar with the work by Nishimura et al, we would also like to point out that we see very different effects of SMIFH2 treatment as compared to the direct myosin-II inhibition by blebbistatin, with the drug concentrations that we are using (see Figure 3).

      As for the Arp2/3 inhibition, the data showing that this complex is important for generating force at the 'teeth' is convincing, however, the only real straightforward test of whether the complex is required for phagocytosis (Fig. 3g) shows that this drug has no effect on the fraction of engulfed particles. Doesn't this mean that the forces generated by branched actin are irrelevant for the kinetics of phagocytosis? That would be consistent with the published literature showing that genetic deletion of the Arp2/3 complex has only a partial inhibitory effect on FcR phagocytosis of rigid particles (a point the authors avoid discussing). Perhaps the forces generated by Arp2/3-branched actin are important under more challenging conditions such as where shear flow was affecting the cells/particles, but this part of the paper is problematic.

      We thank the Reviewer for pointing out the potentially complex role of Arp2/3 in phagocytosis. It is indeed true that there may not be a one-to-one correlation between phagocytic teeth (and force exertion) and uptake efficiency, and this could further depend on the context in which phagocytosis is taking place. We note that for our experiment in Figure 3g (now figure 3h) we solely measure cups that are undergoing phagocytosis, and hence conclusions about the uptake efficiency and fraction of engulfed particles cannot be drawn from this data. Instead, this data implies that there is no specific stage in phagocytosis that is affected more by Arp2/3-inhibition than other stages, and this analysis would for example not show any significant differences if phagocytic cup formation is approximately uniformly slowed during pseudopod progression. We have added a clarification in the text to make this point clearer:

      “The strong effect of Arp2/3-inhibition on phagocytic efficiency and target deformations throughout phagocytosis, and the lack of effect on the distribution of cups-in progress, suggests that this complex has an important role throughout phagocytic progression.”

      We have now performed additional experiments that show that Arp2/3 inhibition leads to a strong reduction in uptake in our phagocytosis assays. We present this data in a new figure panel (Fig. 3g). We observe a similar effect of Arp2/3 inhibition on uptake of 4 times stiffer beads (Figure 1 – supplement 2f). This is consistent with previous work with rigid particles comparable in size (~10 µm) to the particles in our study (Rotty et al., 2017), where a significant reduction in phagocytic uptake efficiency upon Arp2/3 inhibition was also reported. We now also discuss the role of the Arp2/3 complex in more depth in our discussion section, including the results of our new uptake efficiency experiments and previous results in published literature (pg. 14, starting at line 419).

      We agree that the relation of force exertion to phagocytic efficiency may depend on the context of the phagocyte and target. They could be relevant in the case of external flow, and we discuss in our manuscript how we believe that these forces may also be more critical during partial target eating of targets that are hard to engulf fully (too large, or hard-to-reach) as well as when multiple phagocytes approach a single target.

      The blebbistatin data are interesting, but also somewhat contradictory with the literature showing this drug does not affect the uptake of rigid particles. It would be helpful if the authors could compare soft and much more rigid particles with this treatment to test this idea. The localization of myosin at the end stage of phagocytosis is very nice.

      To address this concern, we have compared uptake of 1.4 kPa and 6.5 kPa particles, and potential differential effects of inhibitor treatments depending on target rigidity (Fig. 3g, Figure 1 – supplement 2f). We opted for making this comparison over a comparison with much stiffer (e.g. polystyrene) targets, as it will be non-trivial to ensure that these targets are identical in other chemical properties (ligand density, surface charge, etc.) and therefore that any potential difference can be solely attributed to target rigidity. Importantly, we have previously shown that there is a strong mechano-dependent difference in uptake efficiency for the 1.4 and 6.5 kPa beads (Vorselen et al, 2020). In these experiments, we see no clear detrimental effect of myosin-II inhibition on uptake efficiency, independent of target rigidity, similar to some past literature. We believe that this may be because the cup stages in which we saw high enrichment were at very late stages of engulfment (> 90% engulfment). That such cups aren’t fully closed was clear when performing high-resolution 3D imaging, but may be missed in lower resolution 2D imaging that we, and others, used for evaluating phagocytic efficiency, and hence these cups may be hard to distinguish from fully internalized particles. We now discuss these results and this possible explanation for the apparent discrepancy starting on page 16 line 476.

      Reviewer #3 (Public Review):

      Having revealed the role of class 1 myosins myosin 1e and myosin 1f during phagocytosis and having recently developed an innovative method to reveal the forces generated in this process, Daan Vorselen and colleagues studied the molecular mechanisms involved in force production during macrophage phagocytosis. In particular, they documented the involvement in force generation at the phagosome of Arp2/3-based actin protrusions, called « teeth », which assemble into a ring-like superstructure, and of myosin-II, which presumably plays a role in phagosome closure. Finally, they document phagocytosis failures in the form of popping mechanisms. The imaging and mechanical analyses of this article are particularly impressive, and these data allow the authors to propose a new model for the generation and balance of forces during phagocytosis.

      This precise work thus paves the way to understanding the generation and transmission of forces during phagocytosis. However, some information could be added to strengthen the impact of the manuscript. In particular, current knowledge about the role of the Arp2/3 complex during phagocytosis could be detailed in the introduction, and what this article adds to the literature on this subject could be discussed better.

      We thank the Reviewer for their kind words about our work. We now discuss the role of the Arp2/3 complex in more depth in our discussion section, including the results of our new uptake efficiency experiments upon inhibitor treatment, as well as previous results in published literature.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors present a system that allows the measurement of OCR on diverse tissues. Using two optodes, one before the tissue under examination, and one after, allows the OCR to be measured as the difference between the concentration of O2 in the in-flow gas and the concentration of O2 in the out-flow gas. The system maintains the tissue at a set concentration of dissolved O2 so that experiments can be performed over a long period of time. The authors have provided ample data and full methods and their conclusions are most likely reliable.
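
      The inflow/outflow principle summarized above can be illustrated with a minimal sketch. It assumes a steady flow rate through the tissue chamber; the function name, parameter names, and units are hypothetical and are not taken from the manuscript, and the numbers are made up.

```python
def oxygen_consumption_rate(o2_in, o2_out, flow_rate, tissue_amount=1.0):
    """Return OCR as (inflow O2 - outflow O2) * flow rate, per unit of tissue.

    Units are assumptions for illustration only: O2 in nmol/mL and flow in
    mL/min, giving nmol/min per unit of tissue (for example per 100 islets).
    """
    if flow_rate <= 0 or tissue_amount <= 0:
        raise ValueError("flow_rate and tissue_amount must be positive")
    return (o2_in - o2_out) * flow_rate / tissue_amount


# Made-up example values, not data from the manuscript.
example_ocr = oxygen_consumption_rate(
    o2_in=200.0, o2_out=180.0, flow_rate=0.05, tissue_amount=1.0
)
```

      The sketch ignores sensor drift, delay between the two sensors, and O2 exchange with tubing, all of which a real system would need to account for.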

      Currently, we know that O2 is critical for diverse physiological processes, however it is rarely controlled for as well as non-gas solutes such as glucose, as we lack methods to control its delivery and infer its consumption. By addressing this need, the authors contribute something valuable to the field, which will hopefully be built on by others. The authors have already begun to show the utility of their system by exploring the complicated biology of H2S. As delivering this gas in a controlled manner is hard, often people use NaHS instead. In line with previous studies (well cited by the authors), differences are observed.

      Specific points

      1) The gas control system is used with islets, INS-1 832/12 cells, retinas, and liver tissue, demonstrating its broad applicability.

      2) The system as a platform can have diverse extra measurement modalities attached to it, for example visible-wavelength absorbance and fluorescence. Metabolite concentrations in the tissue culture outflow could also be measured.

      3) The reduction state of cyt c and cyt c oxidase are measured from the second derivative of absorbance at 550 and 605 nm. Ideally, to reliably decompose these signals full spectra around 550-605 nm would be collected. As the authors are only using cytochrome reduction state as a qualitative measure and appear careful to avoid over-interpretation this method should be fine. However, the authors ought to show a representative time course including the fully oxidised and reduced states demonstrating this approach as making these measurements is demanding and will depend on the exact spectroscopic set-up. Without this information it is hard to judge the reliability of the paper.

      We appreciate the reviewer giving us latitude for a less robust measurement. However, we did in fact do what was suggested. That is, with the Ocean Optics spectrophotometer, we measure the full light spectrum from 400 to 650 nm. Using this spectral data, we calculate the first and second derivatives of the absorption. We have previously published our approach to spectral analysis, as well as the inclusion of the fully oxidized and reduced states (Sweet IR, G Khalil, AR Wallen, M Steedman, KA Schenkman, JA Reems, SE Kahn, JB Callis. Continuous measurement of oxygen consumption by pancreatic islets. Diabetes Tech. Ther. 4: 661-672, 2002; Sweet IR, Cook DL, DeJulio E, Wallen AR, Khalil G, Callis JB, Reems JA: Regulation of ATP/ADP in pancreatic islets. Diabetes 53:401-409. 2004), so we did not include all the details. In order to ensure that our description is clear, we have added a more thorough explanation that we used spectral analysis and not just data obtained at single wavelengths.
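
      As an illustration of what a second-derivative analysis of a full spectrum involves, here is a minimal sketch using central finite differences. It is not the published analysis pipeline: the function name is hypothetical, evenly spaced wavelengths are assumed, and the toy spectrum is a made-up Gaussian band rather than measured data.

```python
import math


def second_derivative(absorbance, step_nm=1.0):
    """Central-difference second derivative of an evenly sampled spectrum.

    Returns a list two samples shorter than the input, because the two
    endpoints have no neighbour on one side and are dropped.
    """
    if len(absorbance) < 3:
        raise ValueError("need at least three samples")
    h2 = step_nm * step_nm
    return [
        (absorbance[i - 1] - 2.0 * absorbance[i] + absorbance[i + 1]) / h2
        for i in range(1, len(absorbance) - 1)
    ]


# Toy spectrum: one made-up absorbance band centred at 550 nm, sampled
# every 1 nm from 400 to 650 nm.
wavelengths = list(range(400, 651))
toy_spectrum = [math.exp(-(((w - 550) / 10.0) ** 2)) for w in wavelengths]
d2 = second_derivative(toy_spectrum)

# Index shift of 1 because the first sample was dropped.
d2_at_550 = d2[wavelengths.index(550) - 1]
```

      At the centre of an absorbance band the second derivative is negative, which is why a band shows up as a trough in the second-derivative spectrum while a flat or linearly sloping baseline contributes nothing.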

      Reviewer #2 (Public Review):

      The present project is an extension of prior work from this work group in which they describe a technological advancement to their published flow-culture system. Such improvements now incorporate technology that allows for metabolic characterization of mammalian tissues while precisely controlling the concentration of abundant gases (e.g., O2), as well as trace gases (e.g., H2S). The present article demonstrates the utility of this system in the context of hypoxia/re-oxygenation experiments, as well as exposure to H2S. Although the methodology described herein is clearly capable of detecting nuanced metabolic changes in response to variations in O2 or H2S, the lack of a head-to-head comparison with other techniques makes it difficult to discern the potential impact of the technology.

      We understand the benefit of comparing a new method with the currently utilized methods. However, the novelty of our methodology is that it is able to control the exposure of tissue to levels of both abundant and trace dissolved gas composition, functions that neither of these existing instruments provide. In addition, continuous flow of media allows maintenance and assessment of tissue models that cannot be accommodated by static or spinner systems. Since we are the first to report an entirely novel technology, the direct comparison to benchmarks is not possible. In the past, however, we have tested liver slices and retina in a Seahorse and the tissue died within 120 minutes presumably due to the lack of flow/reoxygenation in the tissue. In addition, islets placed in spinner systems such as the Oxygraph become fragmented and broken very rapidly. So, a head to head comparison on the tissue OCR response to changes in gas composition cannot be meaningfully carried out for the facets of our method that we highlighted. The methodology we present has capabilities that do not exist in any other commercially available system. We have stated this latter point in the last line of the second paragraph of the Introduction. Regarding the general reliability of the O2 consumption measurement: the unprecedented accuracy and stability of the O2 detectors and the ability of our flow system to maintain tissue for days while generating accurate and reproducible measurements of O2 consumption has previously been established (Sweet IR, Gilbert M, Sabek O, Fraga DW, Gaber AO, Reems JA. Glucose Stimulation of Cytochrome c Reduction and Oxygen Consumption as Assessment of Human Islet Quality. Transplantation 80: 1003-1011, 2005; Neal AS, Rountree AM, Philips CW, Kavanagh TJ, Williams DP, Newham P, Khalil G, Cook DL, Sweet IR. Quantification of low-level drug effects using real-time, in vitro measurement of oxygen consumption rate. Toxicological Sciences 148: 594-602, 2015).

      In addition, diffusion gradients both in the bath, as well as the tissue itself likely impact the accuracy of the metabolic measurements. This is likely relevant for the liver slices experiments.

      We agree that there are certainly concentration gradients within tissue, and these are increased in the absence of capillary flow. Nonetheless, the gradients will certainly be less than what occurs in static systems. In general, the optimal size of tissue pieces is a trade-off between potential for hypoxia if the tissue is too large, and a lack of untraumatized tissue if it is too small. We have added text to address this concern that these effects are to be considered when choosing the size and shape of the liver slices or other tissue models to place into the flow system.

      Following resection, liver tissue can be mechanically permeabilized (PMID: 12054447). In the present experiments, no controls were put in place to discern if the tissue was permeabilized. This could be checked by adding in adenylates and additional carbon substrates and assessing the impact on OCR. Similar controls likely need to be implemented for the islet and retina experiments.

      We have used flow systems in the past to maintain islets and liver for 24 hours and more (Neal AS, Rountree AM, Kernan K, Van Yserloo B, Zhang H, Reed BJ, Osborne W, Wang W, Sweet IR. Real time imaging of intracellular hydrogen peroxide in pancreatic islets. Biochem. J. 473:4443-4456, 2016; Neal AS, Rountree AM, Philips CW, Kavanagh TJ, Williams DP, Newham P, Khalil G, Cook DL, Sweet IR. Quantification of low-level drug effects using real-time, in vitro measurement of oxygen consumption rate. Toxicological Sciences 148: p. 594-602, 2015), and based on stable OCR we concluded that the tissue is viable. However, it is possible that the membranes of some of the tissue would become permeabilized which would affect the responses to test compounds. We considered this issue from two perspectives. 1. Whether established models that we used to test the BaroFuse were prone to high cell permeability; and 2. Whether loading and maintenance of the tissue models in the fluidics system resulted in increased permeability. We did do experiments measuring the ADP responses in OCR by islets and retina within the fluidics system. Effects were observable but small. However, these results are not definitive, because it was difficult to know what the response in permeabilized tissue was (and permeabilizing tissue slices was difficult). We then used Propidium Iodide staining to visualize and quantify the level of permeability. In islets, the fluorescence in isolated islets before and after perifusion was negligible compared to that in islets permeabilized by H2O2 treatment (see below).

      Fig. 1. Staining of isolated rat islets with the indicator of cell membrane integrity propidium iodide. Islets were stained either before or after a 3-hour perifusion. As a positive control for PI staining, islets were treated with 500 µM H2O2 for 30 minutes and incubated overnight. Each data point was the average +/- SE for an n of 3.

      There was some fluorescence in retina and liver, but it was difficult to interpret this data in terms of a fraction of the tissue that is permeabilized, because dye close to the surface of the tissue is preferentially imaged. So, we finally assessed the amount of permeabilized tissue in retina and liver by comparing uptake of 3H-labeled H2O and the extracellular marker 14C-labeled sucrose.

      Fig. 2. Fraction of tissue water space that is accessible to the extracellular marker sucrose. Left: Mouse retina. Right: Rat liver slice. Each data point was the average +/- SE for an n of 3.

      Extracellular water in liver and retina is well established to be about 25%, close to the volume of distribution of sucrose. Thus, we cannot rule out that a small percentage of cells are permeabilized, but the vast majority are not.
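
      For readers unfamiliar with dual-label space measurements, the following minimal sketch shows one common way such a fraction can be computed. It is an assumption about the general approach, not a description of the exact calculation used here; the function and parameter names are hypothetical and the counts are made up.

```python
def sucrose_accessible_fraction(tissue_14c, medium_14c_per_ul,
                                tissue_3h, medium_3h_per_ul):
    """Fraction of tissue water space that the extracellular marker reaches.

    Each space is the tissue radioactivity divided by the radioactivity per
    microlitre of medium, which gives an apparent volume in microlitres.
    The returned value is sucrose space divided by total water space.
    """
    values = (tissue_14c, medium_14c_per_ul, tissue_3h, medium_3h_per_ul)
    if any(v <= 0 for v in values):
        raise ValueError("all counts must be positive")
    sucrose_space_ul = tissue_14c / medium_14c_per_ul
    water_space_ul = tissue_3h / medium_3h_per_ul
    return sucrose_space_ul / water_space_ul


# Made-up counts for illustration only: 5 uL sucrose space in 20 uL water.
example_fraction = sucrose_accessible_fraction(
    tissue_14c=500.0, medium_14c_per_ul=100.0,
    tissue_3h=4000.0, medium_3h_per_ul=200.0,
)
```

      A fraction well above the expected extracellular water space would suggest that sucrose is entering cells, which is the sense in which this ratio serves as a permeabilization check.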

      Additional comments are detailed below:

      -The experiments with H2S are particularly interesting, as this system does seem well suited to investigate the metabolic effects of H2S.

      Thanks! We are excited by the potential for this method to assess the effects of H2S and other trace gases.

      -The authors state the transient rise in O2 consumption was surprising; however, accumulation of succinate during ischemia and rapid oxidation upon reperfusion has been previously demonstrated (PMID: 32863205).

      This is an interesting paper which describes findings that speak to the role of succinate in supplying fuel that could drive the transient changes in O2 consumption observed following hypoxia. It would be an interesting experiment to perform our hypoxia-reoxygenation experiment in the absence and presence of the permeable malonate to see if the spike in O2 consumption following reoxygenation was absent in the presence of the drug. We have removed the word surprising and cited this paper.

      -In the paper, Zaprinast was used to block pyruvate uptake. However, the rationale to use this compound, as opposed to the more specific MPC inhibitor UK5099 is unclear.

      We could have used UK5099, but we had used Zaprinast in past studies (Du J, Cleghorn WM, Contreras L, Lindsay K, Rountree AM, Chertov AO, Turner SJ, Sahaboglu A, Linton J, Sadilek M, Satrústegui I, Sweet IR, Paquet-Durand F, Hurley JB. Inhibition of mitochondrial pyruvate transport by Zaprinast causes massive accumulation of aspartate at the expense of glutamate in retinas. J Biol. Chem, 288:36129-40, 2013) and so we knew that in our hands it blocked pyruvate mitochondrial uptake and would therefore be a good test of the rapid transfer of pyruvate across the plasma membrane.

      -Throughout the paper, the authors list 'COVID-19' as a potential application. It is not clear how this technology could be used in the context of COVID-19.

      Reference to COVID-19 has been removed.

    1. Author Response:

      Reviewer #2 (Public Review):

      The authors evaluated whether their previously published model-based predictions of strategies to take an uneven step during walking agree with new experimental observations. Predictions were obtained under the assumption of optimal control based on a simple mechanical model of walking (rimless wheel). The optimality criterion was minimization of mechanical work. Experimental observations supported the following key model outcomes: (1) compensation steps occurring around the uneven step (as opposed to either after or before), (2) pattern of speed fluctuations before and after the uneven step, (3) scaling of speed fluctuations with gait speed. The paper thereby provides additional support for the optimal control hypothesis. The claim that 'humans compensate with an anticipatory pattern of forward speed adjustments, with a criterion of minimizing mechanical energy input' might be somewhat strong, given that the model relied on a set of limiting assumptions (see also below), reasonable alternative modeling assumptions have not been tested, and only a subset of the model predictions published earlier have been evaluated. The conclusions of the paper could be strengthened by:

      1) demonstrating that the predictions also hold for a model with variable step length;

      We have added predictions for step lengths that increase with speed, according to the preferred step length relationship (Fig. 5c). The optimization shows that “A basic pattern for optimal speed fluctuations retains approximately the same shape across different overall walking speeds, fixed step lengths, or even step length changing according to the human preferred step length relationship (Fig. 5d)… we expect a single basic pattern, treated as a sequence of discrete speed fluctuations (Fig. 5a) to predict optimal responses regardless of an individual’s average speed, step length, or the preferred speed and step length relationship.” (Methods, Model)

      2) demonstrating that optimal control also predicts no time lost due to stepping up as compared to walking on even terrain;

      As discussed above, we have clarified that the optimal control was constrained to conserve time (rather than predicting it). (And that subjects were asked to approximately conserve time without feedback.) We have also added a figure demonstrating how the model loses time from the Up-step if it is not constrained to conserve time (Fig. 1b).

      3) demonstrating that the number of compensation steps N that minimizes work corresponds to the observed number of compensation steps;

      As discussed above, we have added explanation that N was not a hypothesis, and was a parameter chosen large enough to capture compensations local to the perturbation. We chose N to be more than large enough to encompass the compensation, for both model and human. We do not hypothesize a particular value for N, because the model very smoothly departs from and rejoins steady walking, making it difficult to identify a precise N from noisy human data.

      4) demonstrating that minimal work leads to better agreement between simulations and observations than other plausible optimality criteria;

      We have tested three other plausible criteria that did not match human experiment: No compensatory reaction (constant push-offs, Fig. 1b), reactive feedback after the perturbation (Fig. 1c), and tight speed regulation to avoid speed fluctuations altogether (Fig. 1d). These alternative strategies are rejected in Results.

      5) demonstrating that the predicted dependence of speed fluctuations on step height is in agreement with experimental observations.

      We have reworked the scaling test, and instead of simultaneously testing both speed and step height scaling, these are now performed separately, and the methods explained in more detail. The average Up-step response is shown to approximately predict the fluctuations observed for the two other step heights (0 cm and -7.5 cm).

    1. Author Response:

      Reviewer #2 (Public Review):

      Previous studies showed reduced structural and functional connectivity of the two hemispheres in autism spectrum disorder (ASD), but little is known about its cellular mechanism. This paper tried to fill this knowledge gap using a mouse model of ASD. By combining optogenetics, slice electrophysiology, and Fmr1 gene knockout (KO) approaches (a leading monogenic cause of ASD), the paper demonstrated that callosal inputs to L2/3 pyramidal neurons in whisker somatosensory cortex are reduced in Fmr1 KO mice/neurons compared to wild-type mice/neurons. This reduction was due in part to the selective reduction in AMPA receptor-containing synapses and was restored by sensory input deprivation. The local circuit connection was unaffected in a sparse Fmr1 KO mouse. The paper also recapitulated the previously reported reduced coherence between the two hemispheres in vivo. The data is a welcome new addition to the previous studies concerned with circuit abnormality in Fmr1 KO mice and supports the paper's main claim.

      The strength of the paper is the simultaneous comparison of the callosal inputs to L2/3 neurons with or without the Fmr1 gene in the same brain slice. This directly demonstrated the role of the Fmr1 gene on the formation of the callosal synapse, a key piece of information that could guide future basic, translational, and/or therapeutic studies.

      The weakness of the paper is the sparse Fmr1 KO model. An apparent delay in onset of Fmr1 KO effect and unclearness in the extent of Fmr1 KO achieved in one hemisphere makes comparison with global Fmr1 KO model (in this study or previous studies) and assessment of data interpretation made in the paper difficult.

      We include a new supplementary figure (Figure 2 – figure supplement 1) of representative images and quantification of the sparseness of the Cre-GFP expression in our experimental paradigm. We calculate that 3-5% of cells are virally transfected with Cre-GFP. Given the sparseness of Cre-GFP expression, we expect the effects of sparse Fmr1 deletion in other cells in the cortex or other brain regions to be minimal and, importantly, to affect inputs onto the recorded postsynaptic Fmr1 KO neurons and their WT neighbors equally. We therefore think our observation reliably reflects the function of postsynaptic Fmr1/FMRP.

    1. Author Response:

      Reviewer #1:

      A role for integrins in lowering the threshold for B cell activation was first observed over 15 years ago, but the mechanism has remained elusive. In this paper, Wang et al. investigate the role of LFA-1:ICAM-1 ligation in B cell synapse formation using live-cell super-resolution fluorescence microscopy in both primary B cells and the A20 B cell line. The use of super-resolution imaging is critical to the investigation as it reveals a level of organisation of the actomyosin network that is not visible with conventional microscopy approaches such as TIRF microscopy. They find that LFA-1:ICAM-1 ligation promotes the formation of actomyosin arcs that regulate various activities in the B cell synapse including BCR signalling, BCR:antigen microcluster transport, and the centralisation of antigen. In agreement with earlier studies, they show that LFA-1:ICAM-1 ligation is required for B cells to centralise antigen that is present at very low density. They also demonstrate that myosin IIa contractility is required for the formation of the actomyosin arcs and promotes the exertion of strong traction forces on the antigen- and ICAM-1-presenting substrate. Using a small molecule inhibitor of formin activity in combination with miRNA knockdown of the formin mDia1, the authors show that the actomyosin arcs originate at the outer edge of the synapse and that their generation is formin dependent. These data provide a much-needed advance to our understanding of the role LFA-1 plays in the earliest events in B cell responses to antigen.

      The conclusions of the paper are mostly well supported by the data, but there are a few points that would need to be clarified.

      1) The requirement for LFA-1:ICAM-1 ligation in the formation of the actomyosin arcs is not clear. The authors observe that ~30% of B cells form actomyosin arcs with anti-IgM stimulation only (Figure 1). Does LFA-1:ICAM-1 ligation simply stabilise the arcs and therefore make their appearance more likely, or does it promote the formation of a distinct actomyosin network with unique functions? The images and videos selected to represent cells stimulated with anti-IgM only (Figure 1; Movies 1A and 1B) seem to form a highly branched actin network throughout the synapse, but it would be informative to see cells having the actomyosin arcs for comparison. Since B cells stimulated with anti-IgM alone are capable of signalling and centralising antigen, it would be interesting to know whether and how these two populations (with and without arcs) differ.

      We thank the reviewers for their questions regarding this central aspect of our study. In response to the reviewers’ statement “The requirement for LFA-1:ICAM-1 ligation in the formation of the actomyosin arcs is not clear”, our results state that “Consistently, scoring B cells for the presence of a discernable actin arc network showed that the addition of ICAM-1 increases the percentage of such cells from ~30% to ~70% (Fig. 1G).” Importantly, we then state that “dynamic imaging showed that the arcs in cells engaged with anti-IgM alone are typically sparse and transient (Movies 1A and 1B), while those in cells engaged with both anti-IgM and ICAM-1 are dense and persistent (Movies 2A and 2B).” To emphasize this point, which we think is clear when comparing Movies 1A/1B to Movies 2A/2B, we have now added the following two sentences to the text: “In other words, when B cells receiving only anti-IgM stimulation do form discernable arcs (see, for example, those marked by magenta arrows in Fig. 1A and 1B), they are much sparser and less robust than those formed by cells also receiving ICAM-1 stimulation. Moreover, we never saw even one B cell receiving anti-IgM stimulation alone that possessed a robust actin arc network.” Please note that the magenta arrows in Fig. 1A and 1B were added upon revision. In summary, the cell shown in Fig. 1E, which lacks discernable arcs, is representative of ~70% of anti-IgM stimulated cells, while the cell shown in Fig. 1F, which possesses a robust arc network, is representative of ~70% of anti-IgM+ICAM-1 stimulated cells.

      We would also like to address what we think is a misunderstanding regarding our images in Figure 1, as reflected in reviewer 1’s statement: “The images and videos selected to represent cells stimulated with anti-IgM only (Figure 1; Movies 1A and 1B) seem to form a highly branched actin network throughout the synapse”. The outer, Arp2/3-generated, branched network comprising the dSMAC/lamellipodium in primary B cells is really quite thin under both stimulation conditions (please see Fig. 1, E1, E2, F1 and F2). In other words, we would not characterize the region between this thin, outer, canonical branched actin network and the central actin hypodense area (i.e. the region corresponding to the pSMAC) in B cells engaged with anti-IgM alone as “a highly branched actin network throughout”. We described it in the text as “a highly disorganized mixture of short actin filaments/fibers and actin foci”. While it likely contains some branched filaments, it is not a canonical branched actin network like the one comprising the dSMAC. Indeed, it is a lot like the mixture of actin asters, actin foci, branched actin and linear filaments described in HeLa cells using the same imaging technique (Fritzsche et al., 2017; we have now cited this paper). Of note, A20 B cells make a much bigger branched actin/dSMAC/lamellipodium than do primary B cells (compare the image of the representative A20 B cell in Fig. 1J to the various images of primary B cells in this figure). Interestingly, this difference between immortalized cells and primary cells is conserved in T cells, as Jurkat T cells make a much bigger branched actin/dSMAC/lamellipodium than do primary T cells (Murugesan et al, JCB 2016).

      Although the reviewers did not specifically comment on why only ~70% of primary B cells engaged with both anti-IgM and ICAM-1 make actomyosin arcs, we note that this is also the case for both Jurkat T cells and primary T cells (Murugesan et al, JCB 2016). We do not know why the number does not go to 100%, but the ~70% limit is the case for both B cells and T cells. Of note, in unpublished work we see that LFA-1 ligation also promotes actomyosin arc formation in T cells.

      With regard to the reviewers’ question “Does LFA-1:ICAM-1 ligation simply stabilize the arcs and therefore make their appearance more likely, or does it promote the formation of a distinct actomyosin network with unique functions?”, we think that ICAM-1 engagement likely leads to the strong activation of RhoA, which then serves to drive both the formation of actin arcs by recruiting, unfolding, and activating mDia at the plasma membrane, and the stabilization and concentric organization of these arcs by activating myosin 2A filament assembly and contractility. In other words, we think ICAM-1 engagement leads simultaneously to the creation and stabilization/organization of the arcs. While it is true that BCR stimulation alone activates RhoA signaling to some extent (see Saci and Carpenter, Mol Cell 2005 and Caloca et al, J Biol Chem 2008), and that this may account for the sparse actin arcs seen in cells stimulated with anti-IgM alone, it is likely that RhoA signaling is more robust with the addition of integrin co-stimulation (Lawson & Burridge, 2014) and that this would promote the creation of the actomyosin arcs seen in these cells. That said, without independent measures of the creation and stabilization/turnover of the arcs, we cannot gauge the relative significance of creation versus stabilization/turnover in determining the steady state amount of arcs. To address this limitation, we have added the following sentence to the section of the Discussion dealing with integrin-dependent signaling pathways leading to actomyosin arc formation: “Finally, future studies should also seek to clarify the extent to which integrin ligation promotes the formation of actomyosin arcs by driving their creation versus stabilizing them once created.”

      With regard to the reviewers’ comment that “B cells stimulated with anti-IgM alone are capable of signalling and centralising antigen”, we would like to emphasize that our study focuses on B cell immune synapse formation under limiting antigen conditions, where a previous study (Carrasco et al. Immunity 2004) and our data in Fig. S5 show that the impairments in BCR signaling and antigen centralization seen under this condition are rescued by integrin co-stimulation. We expand upon these findings by showing in Figures 5 and 6 that this integrin-dependent rescue of antigen centralization and BCR signaling requires actomyosin. In other words, the actomyosin arc network described here is required for integrin co-stimulation to promote antigen centralization and signaling under limiting antigen conditions. We agree with the reviewer that under non-limiting antigen conditions B cells can signal and centralize antigen in the absence of ICAM-1. That said, these high levels of BCR stimulation are probably not as physiological as limiting BCR stimulation. Finally, our data in Figure S7 show that antigen centralization in primary B cells receiving non-limiting anti-IgM stimulation alone is also significantly impaired when myosin is inhibited. This suggests that cells receiving high levels of BCR stimulation employ myosin in some fashion to drive antigen centralization. We now close the section describing these results with the following statement: “That said, additional experiments should help define exactly how myosin contributes to antigen centralization in B cells receiving only strong anti-IgM stimulation.”

      Finally, and most generally, we avoided the use of the word “requirement” as in the reviewer’s statement “the requirement for LFA-1:ICAM-1 ligation in the formation of the actomyosin arcs is not clear”. Given that some B cells receiving only anti-IgM stimulation create arcs (albeit sparse and transient), we were careful to say throughout the text that ICAM-1 engagement “promotes” actomyosin arc formation. We think our evidence for this is compelling.

      2) The authors propose that the contractile actomyosin network formed in the presence of LFA-1:ICAM-1 interactions promotes B cell activation especially at low antigen concentrations; however, their data focus only on early signalling (pCD79a and pCD19) and it would be helpful to know whether LFA-1:ICAM-1 interactions impact signalling further downstream.

      We thank the reviewer for this important suggestion, which we will address in a future study.

      3) The observation that some GC B cells centralise antigen is very interesting, but there are a few aspects of this investigation that should be expanded upon. The authors show that with LFA-1:ICAM-1 interactions, GC B cells are about equally likely to organise BCR:antigen complexes into peripheral clusters and centralised clusters. It would be informative to have, in the same study (Figure 7), a comparison with GC B cells stimulated with antigen alone. The reason is that other studies investigating GC B cell synapse architecture did not quantify antigen organisation in this way, so it is difficult to make comparisons with previous work. It would also be very useful to see how the actomyosin network is organised in GC B cells exhibiting different synaptic architectures (i.e. peripheral versus central clusters), especially given the critical role of myosin IIa activity in GC B cell antigen affinity discrimination. Additionally, while it is a very interesting observation that LFA-1:ICAM-1 interactions may affect GC B cell synapse organisation, it is not clear whether this has an impact on cellular function. For instance, does antigen and actomyosin organisation in GC B cell synapses contribute to differences in signalling or traction force generation? In the introduction the authors suggest that actomyosin arcs contribute to antibody affinity maturation (line 87-88), but without functional studies to support this claim I think it is too speculative.

      We thank the reviewer for their comments and suggestions regarding our GC data. Our sole purpose in performing the experiments in Figure 7 was to see if GC B cells can also make actomyosin arcs. We did this because recent papers and reviews state that the organization and dynamics of actin at GC B cell synapses are completely different from the organization and dynamics of actin at naive B cell synapses. As such, these initial observations are meant to add to previous work on GC B cells rather than generate direct comparisons. The reviewers appear to agree that the data in Figure 7 show convincingly that a subset of GC B cells can make actomyosin arcs that are indistinguishable in appearance from those formed by naive B cells (so the specific claim we are making does not “require additional supporting data”). Rather, the reviewers request that we expand on the data in Figure 7 in several ways, some of which we had already mentioned in the Discussion (“While additional work is required to prove that the subset of GC B cells with actomyosin arcs are the ones that centralize antigen, this seems likely given our evidence here that actomyosin arcs drive antigen centralization in naïve B cells.”, and “Future work will also be required to understand why GC B cells vary with regard to actomyosin organization and the ability to centralize antigen (e.g. dark zone versus light zone GCs)”). In addition to these statements, we now end the section describing the results in Figure 7 with the following statement: “We note, however, that our conclusions regarding actomyosin arcs in GC B cells require additional supporting data that include testing the ICAM-1 dependence of actomyosin arc formation and quantitating the contributions that this contractile structure makes to GC B cell traction force, signaling, and antigen centralization.”

      With regard to the reviewer’s concerns indicated by their comment “In the introduction the authors suggest that actomyosin arcs contribute to antibody affinity maturation (line 87-88), but without functional studies to support this claim I think it is too speculative”, we have changed the relevant sentence to “Finally, we show that germinal center (GC) B cells can also create this actomyosin structure, suggesting that it may contribute to the functions of GC B cells as well”.

      Reviewer #2:

      The manuscript utilizes elegant imaging tools to describe the contractile actomyosin arcs, induced by integrin-ligation, and their involvement in antigen gathering in B cells. The findings are important and have the potential to make a considerable impact in the field. The main conclusions are well supported by strong data and the manuscript convincingly brings across the need of integrin-ligation to induce generation of the arc network and the role of this structure in antigen gathering. The methods and the quality of imaging are state-of-the-art and provide an important example for future studies in B cell immune synapse. Some aspects of the study would benefit from clarification and extended experimentation or analysis.

      1) In addition to cultured B cells, the work includes naïve primary B cells as well as isolated germinal center B cells. While the use of primary cells adds value to the study, in most cases the cells are activated first with LPS prior to transfection with F-Tractin constructs. Such a treatment is likely to alter the cytoskeletal features of the naïve B cells and, thus, it would be informative to provide an analysis of this effect.

      We thank the reviewer for commenting on this. To clarify, we treated primary B cells with LPS to promote cell survival during the harsh nucleofection/electroporation conditions that otherwise kill these fragile cells. The cells were then rested for 24 hours post-nucleofection in the absence of LPS to promote return to a resting state, as previously described (see Freeman et al., 2011). Moreover, only those primary B cells used for live cell imaging of F-actin using the F-actin reporter F-Tractin were LPS treated. The majority of our experiments employed non-treated ex vivo B cells that were fixed, stained and imaged for quantitation. Importantly, under conditions of ICAM-1 co-stimulation, the actomyosin arcs formed by ex vivo B cells and by LPS-activated cells were indistinguishable. For example, compare the F-Tractin-expressing cell in Fig. 3A to the non-treated cells in Fig. 3D and Fig. 7A. To summarize, then, only live-cell imaging experiments that required F-Tractin to visualize F-actin dynamics were performed using LPS-activated B cells. Finally, we clarified in the Methods that we refer to all primary B cells as “naïve” B cells because they had not been previously activated by antigen at the time of antigen stimulation.

      Reviewer #3:

      The work 'A B cell actomyosin arc network couples integrin co-stimulation to mechanical force-dependent immune synapse formation' by Wang et al. describes the importance of integrin mediated B-cell co-stimulation for IS formation in B-cells by fostering the formation of myosin II A driven actin arcs that are essential in the transport of IgM clusters towards the IS center.

      The work presented here, i.e. experiments and analysis, is very thoroughly done and includes tests and controls using different labelling strategies and constructs of myosin II A, multiple cell types including primary cells and a range of chemical inhibitors to rule out artefacts.

      The authors claim that the observation of actin arcs in B-cells co-stimulated by ICAM-1 - LFA-1 interaction is important for the efficient activation of B-cells in the presence of limiting levels of anti-IgM and this is very well supported by the experiments. However, it was a bit surprising that the paper did not draw many parallels between the observed phenomenon and the reported actin arcs in activated T-cells even though some of the authors were very much involved in such work on T-cells. If there is a good reason to believe there is no ground to draw comparisons, this would then also need to be highlighted by the authors.

      We thank the reviewer for their comments. We have now added the following two sentences to the Discussion: “It is also important to note that the contractile actomyosin arcs described here in B cells and the actomyosin arcs described previously in T cells (Murugesan et al., 2016) share much in common as regards formation, organization and dynamics (Hammer et al., 2019; Wang & Hammer, 2020). Going forward, it will be vital to define how these two immune cell types harness the same contractile synaptic structure to accomplish different goals (i.e. antibody production by B cells and target cell killing by T cells).”

      The work on establishing the drivers of actin arc formation and dynamics is well done, but it is important to note that previous work has analyzed actin arc formation in other cell types. Work by Bershadsky has already established many 'ground rules' for the formation of actin arcs and the role of integrin adhesion, formin activity and myosin II in the process (Tee YH, Shemesh T, Thiagarajan V, Hariadi RF, Anderson KL, Page C, Volkmann N, Hanein D, Sivaramakrishnan S, Kozlov MM, Bershadsky AD. 2015. Cellular chirality arising from the self-organization of the actin cytoskeleton. Nat Cell Biol 17:445-457. doi:10.1038/ncb3137). It might be very instructive if the authors could put their findings in relation to this work.

      The formation of actin arcs is also well studied in U2OS cells and the results presented here could highlight interesting general features of this process observed in very different cell types (Tojkander S, Gateva G, Husain A, Krishnan R, Lappalainen P. 2015. Generation of contractile actomyosin bundles depends on mechanosensitive actin filament assembly and disassembly. Elife 4:1-28. doi:10.7554/eLife.06126; Burnette DT, Shao L, Ott C, Pasapera AM, Fischer RS, Baird MA, Der Loughian C, Delanoe-Ayari H, Paszek MJ, Davidson MW, Betzig E, Lippincott-Schwartz J. 2014. A contractile and counterbalancing adhesion system controls the 3D shape of crawling cells. J Cell Biol 205:83-96. doi:10.1083/jcb.201311104).

      In this regard, the findings about the importance of myosin II A activity, integrin adhesion and mDia1 in the formation of actin arcs are not that surprising and the authors might rather highlight the important role of these newly studied structures for co-stimulation in B-cells as this is the more novel and insightful bit of the work.

      We thank the reviewer for their comments. Indeed, our prior work in T cells (Murugesan et al., 2016; Yi et al., 2012) also linked formin activity and myosin 2 contractility to the formation of actin arcs and the generation of integrin-based adhesion. We now cite the papers highlighted by the reviewer using the following sentence in the revised Discussion: “It is important to note here that several earlier studies performed using other cell types have also linked formin activity and myosin 2 contractility to the formation of actin arcs and the generation of integrin-based adhesions (Burnette et al., 2014; Tee et al., 2015; Tojkander et al., 2015).” As for highlighting the relevance of our results for the B cell field, we think we have done that by demonstrating the existence of this contractile network in B cells, and by showing that it provides mechanistic insight into how integrin co-stimulation promotes synapse formation and B cell activation when antigen is limiting. Given that many recent studies of actin cytoskeletal dynamics in B cells were performed in the absence of LFA-1 ligation, we think our findings invite a critical “reset” for the way in which future B cell studies should be approached by highlighting the need for integrin co-stimulation when examining the roles of actin and myosin in B cell activation.

    1. Author Response:

      Reviewer #1:

      In this manuscript, Angela Kim et al. use a combination of in vitro and in vivo studies to determine how glucose-control of central AVP release controls pancreatic alpha-cell calcium influx and glucagon secretion to modulate blood glucose homeostasis. The manuscript clearly shows that activation of AVP release from magnocellular AVP neurons stimulates pancreatic islet glucagon secretion. Furthermore, the manuscript finds AVP (measured by circulating Copeptin) is elevated in plasma following insulin induced hypoglycemia, which also activates AVP neuron electrical excitability and calcium entry. To confirm that AVP release stimulates glucagon secretion via islet alpha-cell Avpr1b activation, both Avpr1b antagonists and an Avpr1b-/- mouse model were utilized. Finally, the manuscript looks at plasma AVP in humans undergoing a hypoglycemic clamp; while this results in AVP release in non-diabetic controls, AVP release is blunted following hypoglycemia in type-1 diabetic patients. Based on an extensive amount of high-quality data, the authors conclude that AVP release from magnocellular AVP neurons is involved in regulating glucagon secretion in response to hypoglycemia. The manuscript is well written and easy to follow. As the exact mechanism that controls glucagon secretion is still unknown, this manuscript adds important information for the diabetes research community detailing the importance of CNS control of islet glucagon secretion through glucose regulated AVP release. Overall, this is an excellent manuscript that will be very useful to the diabetes research community.

      We are grateful for the reviewer's encouraging remarks and constructive feedback on our study.

      Reviewer #2:

      The authors cover a lot of ground, physiologically, by expanding from the islet up through multiple regions of the brain, but they do so in a manner that is stepwise and logical. And in the end, their efforts in probing further and further up the pathway results in a clean model of hypoglycemia sensing through to glucagon release. How AVP fits into counterregulation has been unclear, but Briant and colleagues are filling that gap. The paper is well written, the data are of high quality and well presented.

      We are grateful for the reviewer's encouraging remarks and helpful feedback on our study.

      Specific comments:

      • Lines 137-139 state that reducing glucose from 8 to 4 mM does not stimulate glucagon from ex vivo islets, but the experiment does not appear to show glucose being reduced. Rather, islets were incubated in separate glucose concentrations and the glucagon from the separate wells was then measured. Methods indicate that islets were incubated at 3 mM prior to each treatment, so glucose was actually raised from 3 to 4 mM and separately from 3 to 8 mM. Suggest either changing the wording or show a perfusion secretion experiment demonstrating the drop from 8 to 4 mM.

      The reviewer is correct in that the isolated islet experiments in Figure 1 are static secretion experiments. We have now reworded this section to make this clear (Line 140-145).

      • Lines 195-196 & Figure 3f: If YM254890 blocks AVP-induced calcium, please indicate with statistics comparing frequency in AVP and AVP + YM

      Thank you for pointing this out. The p-value for this comparison (p=0.99) has now been added to the figure legend (Lines 552-553). However, the important point we tried to make here is that AVP has no effect in the presence of the inhibitor (YM vs. AVP + YM).

      • Adrenergic signaling as a method of physiological glucagon stimulation is dismissed multiple times, yet is not tested/compared with AVP. The known and robust activation of calcium by AVP in alpha cells notwithstanding, epinephrine is a strong activator of alpha cell calcium responses and glucagon secretion. In multiple panels of the paper, blockade or deletion of Avpr1b reduces, but does not prevent hypoglycemia-induced glucagon secretion, which demonstrates that AVP is not the only signal stimulating alpha cells under these conditions.

      We were certainly not suggesting that adrenaline is not important. The point we tried to make was that circulating levels of adrenaline are too low to stimulate glucagon secretion. In light of this comment, we have softened this section and we now acknowledge that adrenaline may contribute (Lines 371-377).

    1. Author Response:

      Reviewer #1 (Public Review):

      Summary of what the authors were trying to achieve

      Background: Myopia (short- or near-sightedness) is an ocular disorder of increasing concern to human individuals and health-care systems; these days one speaks of a "myopia epidemic" in developed countries. Usually it is due to excessive elongation of the optic axis of the eye during the ages of most rapid growth (ca. 5-16 years in humans), causing images of distant objects to be blurred at the retinal photoreceptors. The optical error can be corrected with lenses or corneal surgeries, but this does not reduce the risk of continued progression and vision loss. Despite extensive epidemiological and animal studies in the past several decades, the underlying causal mechanisms remain poorly known, and therapeutic options are limited. Therefore, further discovery of new candidate mechanisms, drug targets and drugs for inhibiting the onset and progression of myopia is urgently needed.

      Rationale: The axial length of the eye is regulated mainly by qualities of the visual environment, including light intensity, spectrum, and spatiotemporal characteristics of images on the retina. Thus the retina encodes and integrates visual information over time, and ultimately sends regulatory "grow" or "stop" signals via the choroid - a vascular plexus behind the retina - to the sclera, the fibrous outer coat of the eye. Changes in size (area) of the sclera are responsible for changes in axial length, and thereby, refraction. The choroid is in a critical position, not only to relay "stop" or "go" messages to the sclera, but also potentially to critically modify those signals (or generate signals of its own) and further modulate ocular elongation and refraction. Importantly, very little is known about how the choroid fulfills either of these roles.

      Aims of the Study: The authors' purpose was to test, in juvenile chicken models, whether the 'pro-inflammatory' cytokine, interleukin-6 (IL-6) - synthesized and released in the choroid - might play a key role in the developmental regulation of axial elongation and refraction of the eye.

      Major strengths and weaknesses of the methods and results

      Strengths:

      1) The studies are focused on the choroid, which must be important in regulating ocular growth and refraction, but whose role is still not well understood

      2) Expert use of front-line tools for quantifying mRNA and protein (microarray, RT-PCR, ELISA)

      3) Immunohistochemistry: Good choice of antibody (raised to chicken IL-6), appropriate specificity control (preabsorption with chicken antigen)

      4) IL-6 mRNA in choroid was impressively increased during recovery from form-deprivation myopia (FDM) (preliminary results, Fig. 2) - i.e., during strong positive (myopic) defocus - a defocus-dependent effect confirmed by a similar effect of lens-induced myopic defocus (Fig. 5).

      5) Good data for the time-course of IL-6 mRNA content in choroid, with some confirmation of protein levels (though at only 2 treatment intervals) (Fig. 3)

      6) Choroidal IL-6 mRNA also shown convincingly to increase, going from darkness to light (Fig. 4).

      7) It's clever to compare the growth- and myopia-inhibiting effects of positive defocus, with those of other treatments known to do the same - in this case, atropine and nitric oxide (NO). The evidence shows that the effects of these agents on choroidal IL-6 mRNA are similar to the effect of positive defocus, with an NO-donor increasing the amounts of IL-6 mRNA and protein in isolated choroid (Fig. 7), and a NOS-inhibitor decreasing the mRNA levels at an intravitreal dose that inhibits scleral growth (Fig. 6).

      8) If my calculations are correct, 0.1% atropine sulfate solution has a molarity of something like 1.3 mM. Since alpha-2A adrenoreceptors are present in the choroid, of mammals at least (e.g., Wikberg-Matsson et al., 1996, Exp Eye Res, 63(1):57-66), it might be interesting to explore the possibility that atropine is stimulating IL-6 production in the choroid by acting as agonist via these receptors (cf. Carr et al., 2018, IOVS, 59:2778-2791). The isolated choroid, with IL-6 mRNA and protein synthesis as read-outs, should be an exceptional (and novel) model for testing this and other possible signalling pathways in the choroid.

      Weaknesses:

      1) Immunolocalization of IL-6: The images (Fig. 1) are not good enough to identify cellular localization of immunoreactive structures; identification of RPE is questionable (no DAPI+ nuclei in labeled 'RPE'); nucleated erythrocytes should be visible in vessel lumina.

      We have increased the magnification and resolution of images in Figure 1 to better distinguish immunoreactive cells. Additionally, we have included as Figure 1 - figure supplement 1, both H&E stained and immunolabelled images from adjacent serial sections (both longitudinal and cross sections) of control choroids in order to compare immunopositive cells with the histoarchitecture of the choroid. From these images, one can see that nucleated erythrocytes are located in some of the vessel lumina. The nuclei of the RPE label weakly as compared with those of choroidal fibroblasts or nucleated blood cells. In order to visualize the RPE nuclei, we had to increase the intensity of the DAPI channel (blue, 405 nm) to a level that is not optimal for viewing IL-6 immunolabelling (green). Therefore, we included an additional supplemental figure (Figure 1 – figure supplement 2) in which RPE nuclei are readily visible.

      2) Many important details of methods have been left out. Spectral peaks of LED light-sources need to be given, lines 409-412; that's just one of many examples.

      We have included graphical and tabular data describing the frequency spectra for each of the three LED light sources used in the present study (Figure 4 - figure supplement 1), and in Sheet 3 of the source data file for Supplementary figures. Additionally, we have added the method of obtaining the frequency spectra in the methods section (lines 499 – 501).

      We have also included details of the antigen retrieval procedure performed during histological processing of ocular tissues and details on the methodology and analysis of microarray data (lines 544-547).

      3) Intensity (illuminance) of "red" and "blue" lights seems unnecessarily low (58 and 111 lux, respectively, far below the "medium" and "high" intensities of white lights that were used; Fig. 4).

      We agree with this comment. The intensities of the red and blue LED lights were limited by the red LED lights. We set them at their maximum intensity setting which registered at 58 lux. To compare their effect with that of the blue light, we felt we should set the blue LED lights at a similar setting, which required us to use its lowest setting, which registered at 111 lux. We realize that these intensities are low, compared with the medium and high settings of the white LED lights. Perhaps at higher intensities we might have observed a differential effect on IL-6 mRNA with red light compared to blue light. We have added this explanation and possibility in the Discussion section (lines 333 – 353).

      4) Also, given that red and blue lights have been found to have opposite effects on FDM in chicks (e.g., Wang et al., 2018, IOVS, 59(11):4413-4424), the similarity of IL-6 responses to red and blue in the present study strikes me as a point against a role for IL-6 in regulation of eye growth.

      We respectfully disagree with this statement. The Wang et al. paper reported that continuous exposure to red LED lights for five days had no significant effect on refraction or axial length in either control eyes or form deprived (myopic) eyes. In contrast, continuous exposure to blue LED light caused a significant hyperopic shift in refraction in both control and form deprived eyes, but had no significant effect on axial length in either control or form deprived eyes (although a trend toward a decrease in axial length was observed after five days in both control and form deprived eyes). Since refraction was significantly affected (in blue light-reared chicks), but axial length was only minimally affected, we suspect that continual exposure to blue light may have affected other ocular parameters (such as corneal curvature) that would have a significant impact on refraction. We predict that choroidal IL-6 expression is involved in the choroidal and scleral remodeling processes at the posterior pole of the eye that result in changes in vitreous chamber elongation, as opposed to having effects on the anterior segment of the eye. Our data show that short term (6 hr) exposure to red or blue LED light had no effect on IL-6 gene expression. If IL-6 gene expression is involved in the regulation of eye growth, we would expect that exposure to red or blue LED light would have no effect on scleral remodeling, vitreous chamber depth, or axial length, which, in fact, is consistent with the results of the Wang et al. paper. We have added this interpretation in the Discussion section of the paper (lines 333 – 353).

      5) I admire the thoroughness of confirming that some of the treatments did in fact have the predicted effects on ocular enlargement, by performing assays for scleral proteoglycan synthesis. This might not be essential to this work, although it is well done, and the scleral data won't detract from the value of the paper if retained. But the induction of opposite effects on eye (scleral) growth by such manipulations is well established, and much simpler (cheaper and faster) refraction and/or caliper measurements would have served the same purpose.

      We elected to use scleral proteoglycan synthesis as a “read out” for axial elongation, since we can detect significant changes in scleral proteoglycan synthesis much earlier (within 6 hrs) than we and others can detect changes in axial length or vitreous chamber depth in chick eyes (≥2 days). Since our studies involved very short term exposure to myopic defocus (6 hrs), we felt measurements of scleral proteoglycan synthesis would be more likely to establish causal relationships between IL-6, nitric oxide and scleral remodeling.

      6) I don't buy the argument that the source of NO is not in the choroid (lines 337-340), based on the failure of L-Arg to change significantly the amount of choroidal IL-6 mRNA (Fig. S1). Several thoughts come to mind here: (a) It is solidly established that the choroid is richly innervated with NO-synthesizing nerve fibres, and that its content of NOS is very high [e.g.: "NOS activity is widely distributed in the eye, (choroid > retina > CP > TM) …"; Geyer et al., 1997, Graefes Arch, 235(12):786-93; also (among others): Wu et al., 2007, Brain Res., 1186:155-63; Hashitani et al., 1998, J Physiol, 510(1):209-223; Fischer & Stell, 1999, cited in the present MS.]. So, there clearly are sources of NO within the choroid, in chicks as in mammals. (b) It's extremely unlikely that "NO, released from the retina … diffuses to the choroid to stimulate IL-6 synthesis", because NO is highly reactive and has a short half-life, restricting its diffusion. But yes, NO generated by iNOS in the RPE certainly could reach choroidal targets; is there anything in the literature to indicate that iNOS mRNA and protein are increased in the RPE, under conditions or treatments that inhibit axial elongation? (c) The critical experiment to test this idea - treating the isolated choroid with a NOS-inhibitor, to block synthesis of NO by cells in the choroid - was not performed here.

      That would be a complicated and difficult exercise, however, requiring the invention of a way to stimulate NOS activity to a new base-level, and then being able to detect effects due to the inhibition of NO-synthesis. It would be good to discuss the issues raised in this point, but acceptable to suggest this as another of the questions that would be suitable to address by further experimentation beyond the scope of this paper.

      Based on these comments (and the comments below) by Reviewer #1, we carried out additional experiments on isolated choroids using 50 mM KCl to depolarize the plasma membranes of choroidal cells in the presence of L-arginine (0.05 mM – 5 mM). We found that in the presence of KCl, treatment of choroids with L-arginine caused a significant increase in IL-6 mRNA. In contrast, L-arginine in the absence of KCl had no significant effect on IL-6 mRNA (as we found in our original experiments). We interpret these new results to indicate that choroidal IL-6 can be upregulated by endogenous sources of nitric oxide. We have included this data in new Figure 8 of the revised manuscript. We thank the reviewer for providing this insight!

      7) Since it's overwhelmingly likely that NO is synthesized and released locally in the choroid, alternative explanations must be considered for why the NO-donor, PAPA-NONOate, caused increases in IL-6, while L-Arg didn't. Might it have been the case, for example, that the NOS-containing choroidal cells already were fully loaded with L-Arg, under these particular experimental conditions? or that the administered concentration of L-Arg was sub-optimal? or that the proportions of cellular mass to fluid volume in the choroidal samples were highly variable, causing high variance of the individual values? or that the parent compound PAPA-NONOate, however attractive 'his' name, had destinations in mind (molecular targets, actions) in addition to or other than sGC? Any one of these hypotheses might account for the fact that L-Arg reduced the mean level of IL-6 mRNA by almost 50%, but with p=0.14 despite the sample size n=16.

      Please see explanation under item 6 above.

      8) The results of the bulk assays - of whole choroids - are a good beginning, starting to build a map of largely uncharted territory; but they will never be completely satisfactory for constructing signalling pathways or networks for visual regulation of scleral expansion, and will leave one struggling to make sense of it all (cf. lines 345-347). Better immunolabeling, with better image definition and resolution, the addition of single-label images and bright-field images (to locate the RPE securely), and possibly FISH would be helpful for this. If you're rich, have great local resources, and/or are well connected with others who do, scRNA-seq of dissociated choroidal tissue (with RPE and sclera as controls) would have great potential here. If the tissue has been perfused intravascularly or well washed and drained, to get rid of blood cells, there shouldn't be very many cell types to characterize (but, my, wouldn't it be exciting and illuminating if there were!)

      As stated under point 1 above, we have increased the magnification and resolution of images in Figure 1 to better distinguish immunoreactive cells, and we have included both H&E stained and immunolabelled images from adjacent serial sections (both longitudinal and cross sections) of choroids in order to compare immunopositive cells with the histoarchitecture of the choroid in supplemental Figure 1 - figure supplement 1. We agree that scRNA-seq would yield valuable insights into gene expression changes among individual cell populations within the choroid. We feel those studies are beyond the scope of this paper, but are ones that we are currently undertaking.

      9) The relationships between the studies and outcomes reported in this manuscript, and the possible role of choroidal IL-6 and other inflammation-signalling molecules in myopia, are hardly touched upon at all - just a short, very general statement near the end of the Conclusions (lines 368-374).

      We have added additional discussion regarding the possible role of inflammation in ocular growth regulation (lines 358-365, 370 – 372).

    1. Author Response:

      Reviewer #1:

      This manuscript reports theoretical and experimental analyses of a meiotic drive element in the yeast Schizosaccharomyces pombe, to understand whether the outcrossing rate is high enough in this species, long thought to undergo mostly same-clone mating, to explain the spread of multiple meiotic drive elements. The topic is of general interest, the experiments and analyses are clever and sound, and provide interesting answers. The experiments indeed show that the outcrossing rate in the laboratory varies among natural isolates and density conditions, and can be substantial; the theoretical model shows that the estimated outcrossing rates do allow meiotic drive to spread.

      However, the outcrossing rates measured in the laboratory may be really different from those in nature, and population genomic data are available that could allow estimation of actual outcrossing rates in natural populations.

      We fully agree that rates measured in the laboratory may be different than those in nature, especially given our observations on outcrossing rates varying under different cell densities. We now note that explicitly in the introduction and the discussion to make that point more clear.

      We disagree, however, that the population genomic data are sufficient to estimate actual outcrossing rates in nature for S. pombe. Our position stems from empirical analyses of what happens when S. pombe isolates outcross. Traditional population genetics models assume Mendelian allele transmission and a baseline recombination rate. These assumptions are strongly violated in S. pombe, so applying traditional population genetics models to S. pombe to determine outcrossing rate is problematic (Hu et al., 2017; Nuckolls et al., 2017; Zanders et al., 2014). The causes of these violations are so complex that amending existing models to adequately predict outcrossing would be a major undertaking in population genetics that is well beyond the scope of this work. We added to the introduction and to the discussion to illustrate the complexity of the situation and explain why we did not provide estimates.

      Indeed, outcrossing rates depend mostly on where and when in nature dispersal and clonal multiplication occur, while laboratory experiments typically use high densities of clonemates on plates. Overall this study brings support to the idea that homothallic fungi probably do not undergo mostly same-clone mating in nature, in contrast to the most accepted view in the fungal literature, but in agreement with evolutionary considerations; the study and the findings would thus benefit from being placed in the right evolutionary context (doi: 10.1111/j.1420-9101.2012.02495.x; 10.1111/j.1469-185X.2010.00153.x; 10.1128/EC.00440-07; 10.1038/hdy.2014.37; 10.1111/nph.17039).

      We have added citations and statements clarifying that homothallic fungi, and specifically, S. pombe can outcross to the introduction. We also extensively revised the introduction to better contextualize our study within the current assumptions about S. pombe biology and its capacity to harbor selfish genes.

      The terms selfing and outcrossing as used in the manuscript do not correspond to the diploid selfing and outcrossing that occur in plants and animals, and the terms can thus be misleading.

      To address this confusion, we now more explicitly define the terms we used in the paper in the introduction. We also have added a supplemental figure to help illustrate S. pombe’s mating process and the terms we use (Figure 1-figure supplement 1).

      Reviewer #2:

      The authors combine cytological, genetic, mathematical, and experimental evolution techniques to connect variation in mating behavior with variation in the population dynamics of meiotic drive in the yeast Schizosaccharomyces pombe. First, the authors use cytological and genetic methods to document variation across strains of pombe in their (i) propensity to inbreed, (ii) efficiency of mating, (iii) rate of mate type switching, and (iv) variability of ascus morphology. These results will be of major standalone interest to the yeast community, and will likely find experimental use in many settings. Then the authors use population genetic modeling to study the theoretical implications of this variation in mating behavior for the spread of meiotic drivers (which have recently been shown to be pervasive in pombe). Finally, the authors use cytological techniques to track the spread of introduced drivers in experimental populations of pombe, and show that the drivers follow frequency trajectories that agree well with predictions from the theoretical analysis. These results will be of major interest to geneticists working on meiotic drive, as well as workers in the currently burgeoning field of synthetic gene drives for population control.

      The analysis is carefully done, and I am confident in the results as presented (with one minor exception detailed below). My only major suggestion for improvement concerns the scope of the population genetic modeling. As it stands, this modeling is primarily used to generate predicted frequency trajectories of meiotic drivers against which the trajectories observed in the evolution experiments can be compared. The fact that the experimental and theoretical trajectories match well is impressive, and very promising for the future of pombe as an experimental system in meiotic drive research. However, substantively, as the authors recognize, this agreement tells us mainly that the population genetic model that they use to generate the predicted trajectories takes into account all relevant parameters and is well calibrated. Thus, from the population genetic modeling and evolution experiments, we get only an indirect picture of how variation in mating behavior has actually impacted the natural spread of drivers in this species.

      I believe that the population genetic modeling, with minor modifications, could in fact be used to make more direct predictions about the natural history of drive in pombe. For example, should strains with less inbreeding harbor more fixed drivers? And in strains with more inbreeding, should drivers---because they have very long fixation times---be more likely to be observed as polymorphisms? Such questions are, I believe, well within reach of the authors' population genetic modeling.

      We agree that this is a very interesting line of inquiry. Populations with more than one driver operating have not yet been considered by population genetic modeling. Given that it has become apparent that genomes housing multiple drivers are not rare, this is also a very important question to address. Your comment emboldened us to attempt to tackle this problem, which we previously considered beyond our reach. Thank you for this push!

      An additional unpublished caveat to reconstructing the natural history of drive in S. pombe is that our lab and Li-Lin Du’s lab have found that wtf meiotic drivers are quite ancient. Fission yeast lineages have been harboring multiple meiotic drivers for over 100 million years and some species have even more drivers than S. pombe! Because of this, we are interested in exploring how multiple drivers are maintained for long periods in addition to addressing how multiple drivers can arise within a lineage.

      We are a long way away from accomplishing our goals on this project and consider this ongoing work to be beyond the scope of this paper.

      A minor concern: To track the spread of a driver introduced in their experimental populations, the authors linked the driving allele to one fluorescent marker (GFP/mCherry) and the non-driving allele to the other (mCherry/GFP), and compared the spread of the one marker relative to the other. To use their model to generate expected frequency trajectories for these experiments, the authors needed to measure, in controlled settings, the intrinsic fitness costs of GFP vs mCherry; they estimate that GFP is relatively costly in sexual (but not vegetative) reproduction. However, their estimates of the relative fitness cost of GFP are based on frequency trajectories across just 6 generations, and assume additive dominance, so that the fitness cost to a GFP homozygote is twice that to a heterozygote. It is unclear how statistically noisy the estimation procedure is given the small number of generations used, and whether it is justified to assume additive dominance (which is especially relevant since the dominance of fitness costs is known to be a critical factor in determining the frequency dynamics of meiotic drive).

      Thank you for this comment. We did use only 6 generations of data for these calculations, but we pooled data from 4 distinct control experiments, each of which had 3 independent replicate populations. After 6 generations, fluorescent marker loss becomes a bigger factor in our results and the populations behave less predictably.

      We did not, however, have a good justification for using additive dominance. Because of this, we reran the maximum likelihood analysis and allowed both the fitness cost and dominance to vary. We found that the parameters that best fit our data were a fitness cost of 0.234 and a dominance of 0.083. This fitness cost is similar to our previous value, but this revealed it was incorrect for us to assume additive dominance. We have since updated the paper and Figures 4 and 5 to reflect the use of these values.
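      To make the difference between the additive assumption and the fitted values concrete, the following minimal sketch computes relative genotype fitnesses under a common cost/dominance parameterization: the homozygote pays the full cost s and the heterozygote pays h·s, so additive dominance corresponds to h = 0.5. This parameterization and the function name `genotype_fitness` are assumptions made for illustration; the likelihood model in the manuscript may define these terms differently.

```python
def genotype_fitness(cost, dominance):
    """Relative fitnesses under an assumed cost/dominance parameterization.

    Returns (non-carrier homozygote, heterozygote, carrier homozygote), where
    the carrier homozygote pays the full cost and the heterozygote pays
    dominance * cost. This is a textbook convention, not taken from the paper.
    """
    return 1.0, 1.0 - dominance * cost, 1.0 - cost


# Fitted values quoted in the response: cost 0.234, dominance 0.083.
fitted = genotype_fitness(0.234, 0.083)

# Same cost with additive dominance (h = 0.5), shown for comparison only.
additive = genotype_fitness(0.234, 0.5)

print("fitted heterozygote fitness:  ", fitted[1])
print("additive heterozygote fitness:", additive[1])
```

      Under this convention, a dominance far below 0.5 means the heterozygote carries only a small share of the homozygote cost, which is why the additive assumption would overstate the cost to heterozygotes.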

      Reviewer #3:

      Weaknesses

      Even though the experiments find some important parameters for meiotic-driver spread in fission yeast, the results are not sufficient to explain the apparent "success of meiotic drivers in this species". The links that the authors suggest between mating type switching efficiency, the amount of outcrossing, the speed of invasion of the driver and the cost associated with the driver cannot explain the success of drivers. Furthermore, the causality of the different factors is not explained.

      We agree we do not offer a complete explanation of wtf genes in S. pombe, but we claim our work ‘helps explain,’ which we feel is a well-supported claim. Our revised introduction, we hope, better contextualizes our study. The current understanding of S. pombe is that the species inbreeds. This has been hard to understand because one wouldn’t expect the species with the most known meiotic drivers to be an inbreeding species. Our data show that S. pombe mating phenotypes in the lab are highly variable and include considerable outcrossing. We also show how the range of parameters we observe is consistent with the spread of meiotic drivers under specific conditions.

      That outcrossing increases the speed of invasion is true (see also Durand et al 1997 PMID:9093861), but the argument that 'reduced levels of mating type switching could lead to less inbreeding' is not supported. There are two problems with this statement. First, it is not clear to me if this is theoretically true. If switching occurs infrequently but consistently, the chances of a cell to be positioned next to another cell of the opposite mating type either from self or opposite type will probably not be that different. Only in a narrow range of cell density will this probably play a role, however, this should be properly modelled in a structural environment or tested experimentally.

      Thank you for pointing this out. Our modified text makes the support for the argument that switching rates can affect inbreeding more clear. In short, the experiments we present in Figure 1-figure supplement 1 show that inbreeding is increased within an h90 population at low density. The simplest explanation of this observation is that the cells plated at low density have fewer opportunities to mate outside their clonal lineage, so instead they more frequently rely on intra-lineage mating following switching. Cells at higher density have the opportunity to mate outside their lineage, even before a switching event has enabled mating within a lineage.

      Extending that logic, if switching happens less frequently within a clonal lineage, cells within that lineage will need to undergo more generations of mitosis before a switching event enables intra-lineage mating. Those extra divisions will make it more likely that cells of a given lineage will encounter cells of the opposite mating type from a distinct lineage. This would lead to more outcrossing.

      We acknowledge that this considers that mating occurs on a surface and that the cell populations are stationary (unmixed) immediately prior to mating. These are the conditions used for all analyses in this work. These conditions likely do not recapitulate all matings that occur in nature, but we argue they are feasible in nature.

      A comparison between heterothallic and homothallic strains is - contrary to what the authors argue in line 138 - not appropriate for this test, as the first cannot reproduce by selfing. Using strains that have intermediate amounts of mating type switching (e.g. using h90 Sp strains mutant in the switching pathway; Maki et al. PMID:29852001) could give more insight in this. Reduced switching will lead to reduced spore production, because fewer of the cells will be located next to a cell of the opposite mating type (as shown in Nieuwenhuis et al. 2018 PMID:29691402 and by the authors in Fig. 1-S3), but this does not have to affect outcrossing efficiency. This also becomes apparent from the data presented in Figs 1D and 1E, which do not seem correlated.

      Thank you for this comment. We have modified the text to highlight our (previously poorly-expressed) intent of using these heterothallic controls. Briefly, these heterothallic cell mixtures model a randomly mating population in that they have an equal mix of h+ and h- cells of both colors, but cells cannot mate within a clonal lineage. This should necessarily lead to random mating in our assays. This is what we observe, which we interpret as support that our assays work as intended. We then later use this same control to model the effect of random mating in our experimental evolution analyses.

      Second, the authors have not measured mating-type switching, but used the amount of mother daughter matings as a proxy for mating type switching. This method introduces a bias towards a correlation between switching and selfing, because the latter is used as a proxy for the first. Expressing fluorescent proteins under control of a mating-type specific promoter is an established method (e.g. Jakočiūnas et al 2013 PMCID:PMC6420890, Vještica et al 2021 PMID:33406066), which will give direct observations of the mating type. The observation that the shmoo length is associated with outcrossing is very interesting, and - without changing switching frequency - appears to affect outcrossing.

      We acknowledge that we did not assay mating type switching directly in Sk. To our knowledge, no one has ever reported an experiment that assayed mating type switching directly in S. pombe. Lineage tracing paired with mating assays, like those presented in Figure 2B, were done to establish the currently accepted mating-type switching model in S. pombe.

      We agree that this is unsatisfying and that factors other than mating type switching could affect the behavior of cells in this assay. We did not, however, exclusively rely on sibling cell matings as support of our argument in support of a reduced rate of mating-type switching in Sk. The foundation of our hypothesis is the discovery of Singh and Klar (2002) that there are fewer switching-inducing DSBs in Sk. We also observed that Sk cells divided more times than Sp prior to mating at low cell density, suggesting they required more divisions to have mating competent cells within a clonal lineage. Still, because we were unable to measure switching directly, we explicitly state that less switching in that strain is an unproven model consistent with the available data.

      There are mating-type reporters in the papers mentioned, but the reporters were not used for lineage tracing of mating type switching in the papers cited or any other papers we could find. Lineage tracing is required to assay switching frequency as a clonal population founded by a cell with reduced switching frequency will produce a population with a balanced number of h+ and h- cells after a limited number of divisions, as long as the two types of switches (h+ to h- and vice versa) occur at the same rate.
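      The point that a population snapshot cannot reveal switching frequency can be illustrated with a deliberately simplified sketch. It assumes that, at each division, a cell's mating type flips with the same probability in both directions, and it tracks only the expected fraction of h+ cells. This symmetric two-state model and the function name `fraction_h_plus` are assumptions for illustration; they ignore the asymmetric, lineage-dependent switching pattern of S. pombe.

```python
def fraction_h_plus(switch_prob, divisions, start=1.0):
    """Expected fraction of h+ cells after a number of divisions.

    Assumes (for illustration only) that each cell flips mating type with
    probability `switch_prob` per division, in either direction, starting
    from a population with h+ fraction `start`.
    """
    x = start
    for _ in range(divisions):
        # h+ cells that stay h+ plus h- cells that switch to h+
        x = x * (1.0 - switch_prob) + (1.0 - x) * switch_prob
    return x


for s in (0.05, 0.25):
    print(s, [round(fraction_h_plus(s, n), 3) for n in (0, 10, 50, 100)])
```

      In this toy model both a low and a high switching probability approach a balanced population; only the number of divisions needed differs. That is consistent with the argument that lineage tracing, rather than bulk h+/h- ratios, is needed to estimate switching frequency.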

      Similar to the reviewer, we were frustrated with our inability to assay switching rate directly. We previously attempted to use the markers described in Jakočiūnas et al 2013 for lineage tracing in Sp (lab isolate) cells. In agreement with the published work, we saw in initial snapshots of the cell population a roughly equal number of yellow (h-) and blue (h+) cells. When we imaged cells over time, however, they did not behave as expected. Most strikingly, all mating events were not between yellow and blue cells, as would be expected if they were absolute markers of h+ and h- cells. Instead, many of the matings were between two blue cells. This could be due to fluorophore carryover and/or delayed accumulation of the new fluorophore after switching. In addition, the fluorophore switching pattern we observed generally did not follow the expectation that 1 out of 4 cells derived from a single progenitor should have a different mating type than the other three cells. These observations were sufficient to convince us that the markers were not suitable, under our experimental conditions, for lineage tracing to assay switching patterns. We have now included a figure documenting our attempts to use this assay as other readers may also be curious why we relied on indirect measures.

      Finally, the authors argue that meiotic drivers are evolving rapidly, can invade fast and that this can occur even when selfing is prevalent. The model seems to contradict this. Let's start with the claim that novel drivers can invade in a population. Novel alleles arise at a frequency of 1/N (N = population size, bottom left corner Fig 3A, not at 5% as used in the analyses) and as drive is as strong as the inverse of the population size the fitness difference is initially extremely low giving plenty of time for drift (when driver is neutral) or selection (when driver is deleterious) to remove the novel allele.

      We have added to the analysis in Figure 3A to show that under our model (lacking drift) drivers can invade a population that is not exclusively inbreeding with any initial frequency greater than zero (Figure 3-figure supplement 1). We have also extended our analysis to include simulations of drift (Figure 3-figure supplement 1). Note that in our models, drive is never neutral as it kills ~half the meiotic products (i.e. the progeny) made by heterozygotes.
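      The qualitative claim that a driver can increase from any nonzero frequency unless inbreeding is complete can be reproduced with a textbook-style deterministic recursion. The sketch below is not the model used in the manuscript: it assumes genotype frequencies set by an inbreeding coefficient F, a killer driver that destroys a fraction `kill` of the non-driver spores made by heterozygotes, no other fitness costs, and no drift. All names are hypothetical.

```python
def next_driver_frequency(p, inbreeding, kill=1.0):
    """One sexual generation of an assumed deterministic killer-drive model."""
    q = 1.0 - p
    # Zygote genotype frequencies with inbreeding coefficient F
    hom_driver = p * p + inbreeding * p * q
    het = 2.0 * p * q * (1.0 - inbreeding)
    hom_sensitive = q * q + inbreeding * p * q
    # Heterozygotes make half driver spores; a fraction `kill` of the
    # non-driver half is destroyed.
    driver_spores = hom_driver + 0.5 * het
    sensitive_spores = hom_sensitive + 0.5 * het * (1.0 - kill)
    return driver_spores / (driver_spores + sensitive_spores)


def trajectory(p0, inbreeding, generations, kill=1.0):
    """Driver frequency over successive sexual generations."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_driver_frequency(freqs[-1], inbreeding, kill))
    return freqs


print(trajectory(0.05, 0.5, 5))   # partial inbreeding
print(trajectory(0.05, 1.0, 5))   # complete inbreeding: no heterozygotes
```

      In this simplified model the update reduces to p' = p / (1 - kill·p·q·(1 - F)), so the driver frequency rises whenever 0 < p < 1, kill > 0 and F < 1, although the rise is very slow when p is small. Adding a fitness cost or drift, as the manuscript's analyses do, changes the quantitative picture.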

      In addition, we note that the frequency of the driver at the time of mating and meiosis is the essential parameter. A local population of S. pombe could be founded by relatively few individuals. If a mutation generating a novel driver occurs during the clonal expansion of this population, it could rise to relatively high frequency within that population before the cells starve and mate. This effect of clonal expansion is the reason microbiologists must do fluctuation analyses to assay mutation rates: there are jackpot cultures (analogous to local populations, founded by a limited number of individuals) with a high frequency of mutants and others with few to no mutations.
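      The jackpot effect described above comes down to simple arithmetic. Under the idealized assumptions of synchronous doubling and no growth difference between mutant and non-mutant cells (both assumptions made here for illustration), a mutation that arises in a single cell keeps the frequency it had at the moment it arose. The function name `mutant_frequency` is hypothetical.

```python
def mutant_frequency(founders, doublings_before_mutation):
    """Frequency of a mutation that arises in one cell during clonal expansion.

    Assumes synchronous doubling and equal growth of all cells, so the
    frequency at the time of the mutation persists until mating.
    """
    cells_at_mutation = founders * 2 ** doublings_before_mutation
    return 1.0 / cells_at_mutation


# A mutation arising early in a small founding population stays common;
# the same mutation arising late is rare.
for doublings in (1, 5, 20):
    print(doublings, mutant_frequency(10, doublings))
```

      For example, with ten founders a mutation arising after the first doubling is present in 1 of 20 cells, whereas the same mutation arising after twenty doublings is present in roughly one cell in ten million.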

      In order for drivers to increase to levels that will give 'rapid wtf gene evolution' (line 112) a prolonged level of mostly drift is probably necessary. It is difficult to make statements about the speed of wtf evolution in the fission yeast system, without having a better description of the variation of the paralogs and their ages in fission yeast. The speed of wtf evolution is not clear, as shown in earlier findings from this group that show very old wtf loci; Eickbush et al. 2019. Comparing wtf evolution relative to neutrally evolving loci might give more insight in wtf evolution speed. Especially when drive is costly (as suggested by the authors, though not shown or quantified) the time to substantial frequencies is large. It could also be possible that drive itself is beneficial (e.g. resources from the killed spores made available to the killers or through released local competitive pressure), which will lead to increased fitness through combined drive and increased viability, even at low frequencies.

      Eickbush et al 2019 demonstrates rapid evolution of the wtf gene family. We did find in that study that most wtf loci are shared between different isolates of S. pombe. The critical point is, however, that the wtf genes that are at a given locus are generally dramatically different due to rapid evolution. Even when two strains share a driver at the same locus, they generally have distinct sequences and are thus expected to be mutually killing. We have explained this situation more clearly in the revised discussion.

      Because of this rapid evolution and the functional consequences of this variation, which we have demonstrated in cited studies, very subtle changes in wtf gene sequence lead to the birth of a novel driver. Therefore, generation of wtf driver heterozygosity does not require mating between significantly diverged, previously isolated populations; a single point mutation can generate a novel driver that self-selects via drive.

      We have directly measured the fitness cost of Sk wtf4 heterozygosity when expressed in Sp, precisely the scenario assayed in this paper (Nuckolls et al 2017). We use that fitness cost in our modeling studies. In addition, the observed changes in our experimental evolution analyses were quite close to expected trajectories. This provides additional support that the parameters we used in these analyses were appropriate for our experimental conditions. We do, however, acknowledge that there are other ecological conditions under which the fitness costs of drivers could differ.

      Minor comments

      The loss of mCherry alleles due to reversion of ura+ occurs more rapidly than that in GFP. It is likely that this variable change in reversion affects the observed change in frequency. This should be corrected for in the raw data.

      We agree that the loss of the fluorescent alleles makes our data noisy, but we do not have precise measurements of these rates. We judged this phenotype did not affect the conclusions in this paper, so we did not invest in correcting this limitation of our system.

      Inbreeding is a term generally used in population genetics, where it refers to the amount of mating between related individuals. Even though it is fundamentally correct, a more appropriate term would be haploid selfing or intra-clonal mating, as mating in these strains and experiments is actually between clones. Inbreeding in this context is confusing to people who are not familiar with facultatively sexual species.

      We have provided additional guidance and explanation to avoid confusion with our use of terms.

      The effect of inbreeding on driver alleles has been studied theoretically before, showing qualitatively similar results (e.g. see Durand et al 1997 PMID:9093861; Martinossi-Allibert et al. 2021 PMID:33764512, Ament-Velásquez).

      Reference to driver systems in other fungal species (Neurospora and Podospora) that are highly selfing is completely missing (Svedberg et al. 2018, 2021; Vogan et al 2019, 2020; Martinossi-Allibert et al. 2021)

      Thank you for pointing out these omissions, we have added citations.

      There seems to be quite some variation between the different replicate experiments (Fig 1E vs Fig 2-S3 for example).

      We agree, but were satisfied that the data support our claims.

      Line 76: This paragraph is a bit misleading and internally contradicting. The data from Farlow et al. does not take into consideration the recent hybridisation of diverged populations as shown in Tusso et al. 2018 and thus overestimates the time between outcrossing events. The estimate that 20-60 outcrossing events (underestimate due to homogenization and potential meiotic drive) occurred in the last 500 years suggests a higher number than 1 per 800,000. Citing this number is obsolete.

      We removed the mention of the out-of-date Farlow reference.

      line 730: The inbreeding coefficient in Sun et al. 2017 (probability of IBD which is between 0 and 1) is different from the one used in Hartl & Clark 2007 between -1 and 1.

      Thank you, we have corrected this.

      The speculations on the 4913bp insertion and its effect on mating type switching are not substantiated. Variation around the mating type is rampant (see for example Beach 1986 and Nieuwenhuis et al. 2018) and the authors even show that it is likely not the case that this element affects switching in FY29033. The insertion is an interesting observation, but just that.

      We decided to keep this in the paper because if we were interested in pursuing a potential cause of changed mating phenotypes, we would likely start with testing the transposon, even though the phenotype of FY29033 argues against the hypothesis. Genetic context frequently affects phenotypes and Sk and FY29033 are different strains. Although we do not plan on following this up, we wanted to present the ideas to others who may be interested in pursuing these phenotypes further.

    1. Author Response:

      Reviewer #1:

      This manuscript applies extensive simulations with Markov state modelling to describe the activation of a pentameric ligand-gated ion channel (pLGIC). The authors have generated libraries of microsecond trajectories to sample the interconversion of channel functional states. They have described different Markov states of the pH-gated GLIC channel, including conformations that resemble open and closed functional forms, as well as possible intermediates and a "pre-desensitised" state. They have illustrated channel modulation by capturing shifts in the free energies of gating with pH, and a shift in the distribution of states due to a mutation that affects a hydrophobic gate within the narrow transmembrane pore. The authors suggest a role for asymmetry in GLIC gating that may explain experimentally observed structural diversity of the closed state and suggests entropically driven channel closure. Overall, the sampling of channel dynamics is significant and the description of state interconversions sheds some light on pLGIC mechanisms.

      Appreciated, thank you!

      The manuscript could include better descriptions of the simulation methods, accessible to both experts and nonexperts, avoiding jargon and better spelling out the motivations for choices made. Clearer relation to past simulation studies is needed to avoid any misapprehensions.

      Fair point. We have rewritten the methods section, particularly the MSM construction section, to improve clarity and better motivate choices made. To explain the motivation behind hyper-parameter selection we also added Figure 2-figure supplement 3. We have clarified the meaning of terms, such as tICA or eBDIMS, when they are first introduced on page 2, at the beginning of the results section. To address the second point we have added references 10, 11, and 15, and extended the description of them on pages 1-2 in the introduction:

      “[...] several studies have been conducted on GLIC to study short-timescale motions; such as simulations of the transmembrane domain only [1-3], studies of the ion permeation pathway through potential-of-mean-force calculations [4,5], and steady-state simulations reaching 100 ns to 1 μs timescales [6-8], some also with additional ligands or modulations [9-15].”

      The manuscript should include analysis to show that the MSM approach has converged and has yielded sampling independent of the starting elastic network/Brownian dynamics model. It is important that proof of equilibrium sampling is obtained in the subsequent free MD library: that it is not sampling just within the vicinity of the initial gating model path. How far afield from the initial ENM/BD path and how converged is the MSM solution?

      The point regarding sampling and convergence is indeed important! It is also an area where we want to be careful about claims since no method can truly converge or exhaustively sample systems of this size to guarantee independence e.g. of starting paths. We previously presented plots of the implied timescales displaying convergence of the slowest timescale, which is a common way to validate the eigenvalues of the MSMs. To this figure, Figure 2-figure supplement 2, we have added Chapman-Kolmogorov tests to further assess the eigenvectors, resulting in good agreement between our propagated models at kτ and independently estimated models at the same time points. We also added plots with a measure of the symmetry of the transition probability matrix, indicating the level of reversible sampling achieved and enabling the reader to see how far simulations are from full equilibrium sampling. Although we are obviously not able to achieve full reversible sampling, we find that the gating transition of interest is generally well-sampled. To further show that our simulations were not severely restricted to the vicinity of the eBDIMS seeds, we added Figure 2-figure supplement 1 showing sampling along the two main principal components extracted from known X-ray structures. These plots display overall broad sampling of the space around the transition pathway, indicating orthogonal sampling of up to 4.9 Å away from the interpolation path, more than the difference between open and closed X-ray structures at 2.7 Å. This too is of course still limited compared to e.g. unfolding of a domain, but it should help the reader assess the magnitude of structural changes that can and cannot be captured.
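
      For readers less familiar with these validation steps, the three checks named above (implied timescales, a Chapman-Kolmogorov comparison, and a symmetry measure) can be sketched with NumPy on a toy three-state trajectory. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the function names, the toy transition matrix, and in particular the specific asymmetry measure are hypothetical choices made for the example.

```python
import numpy as np


def count_matrix(dtraj, n_states, lag):
    """Count transitions i -> j separated by `lag` frames (sliding window)."""
    dtraj = np.asarray(dtraj)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (dtraj[:-lag], dtraj[lag:]), 1.0)
    return counts


def transition_matrix(counts):
    """Row-normalize counts (simple, non-reversible estimate)."""
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    return counts / row_sums


def implied_timescales(tmat, lag):
    """t_i = -lag / ln|lambda_i| for the non-stationary eigenvalues."""
    lam = np.sort(np.abs(np.linalg.eigvals(tmat)))[::-1][1:]
    lam = lam[(lam > 0) & (lam < 1)]
    return -lag / np.log(lam)


def ck_error(dtraj, n_states, lag, k):
    """Chapman-Kolmogorov check: largest entry of |T(lag)^k - T(k*lag)|."""
    t_lag = transition_matrix(count_matrix(dtraj, n_states, lag))
    t_klag = transition_matrix(count_matrix(dtraj, n_states, k * lag))
    return np.abs(np.linalg.matrix_power(t_lag, k) - t_klag).max()


def count_asymmetry(counts):
    """Hypothetical measure: 0 if C equals its transpose, 1 if all flux is one-way."""
    off_diag = counts + counts.T
    np.fill_diagonal(off_diag, 0.0)
    total = off_diag.sum()
    if total == 0:
        return 0.0
    return np.abs(counts - counts.T).sum() / total


# Toy data: a made-up reversible three-state chain, sampled with a fixed seed.
true_t = np.array([[0.90, 0.10, 0.00],
                   [0.05, 0.90, 0.05],
                   [0.00, 0.10, 0.90]])
rng = np.random.default_rng(0)
dtraj = np.zeros(20000, dtype=int)
for step in range(1, len(dtraj)):
    dtraj[step] = rng.choice(3, p=true_t[dtraj[step - 1]])

estimated_t = transition_matrix(count_matrix(dtraj, 3, lag=1))
timescales = implied_timescales(estimated_t, lag=1)
ck = ck_error(dtraj, 3, lag=1, k=2)
asymmetry = count_asymmetry(count_matrix(dtraj, 3, lag=1))
```

      A small Chapman-Kolmogorov error and an asymmetry close to zero are what one would hope to see for well-sampled, reversible dynamics; dedicated MSM packages additionally provide reversible estimators and uncertainty estimates that this sketch omits.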

      The early results (around figure 2) could include better visualisation and description of the coordinates used for Markov state modelling. tICA1 is presumed to represent the slowest transition, and it appears to capture channel closure. But many readers may wonder what the tICA1/2 vectors represent physically. Perhaps some vector mapping onto the structure can illustrate protein movements for each vector, with relevant discussion.

      The point about the meaning of the tICA coordinates is well taken. In fact, clearly understanding these motions is something we have wrestled with ourselves. Following the advice from the reviewer, we have added Figure 2-figure supplement 4 showing the 20 largest eigenvector components projected onto the protein structure as arrows. Unlike e.g. PCA, it can be difficult to interpret exactly which motions the tICA eigenvectors represent, but we can conclude that both tIC1 and tIC2 represent complex motions involving particularly the transmembrane helices, with the tIC1 eigenvectors slightly more focused on the M2 helices and tIC2 more interspersed between multiple transmembrane helices. This is also mentioned in the text on page 4.
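
      As a rough guide to what such eigenvector components are, the sketch below implements a bare-bones tICA in NumPy and picks out the input features with the largest absolute weights in a given component. It is an assumption-laden illustration rather than the pipeline used in the study: the function names are hypothetical, the symmetrized covariance estimator is one common choice among several, and the instantaneous covariance matrix is assumed to be full rank.

```python
import numpy as np


def tica(x, lag):
    """Return tICA eigenvalues (descending) and eigenvectors (as columns).

    x has shape (n_frames, n_features). Solves C_lag v = lambda C_0 v with
    symmetrized covariance estimates, by whitening with C_0^(-1/2).
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean(axis=0)
    a, b = x[:-lag], x[lag:]
    c0 = (a.T @ a + b.T @ b) / (2 * len(a))
    c_lag = (a.T @ b + b.T @ a) / (2 * len(a))
    s, u = np.linalg.eigh(c0)
    whiten = u / np.sqrt(s)  # scales column j of u by s_j^(-1/2)
    evals, evecs = np.linalg.eigh(whiten.T @ c_lag @ whiten)
    order = np.argsort(evals)[::-1]
    return evals[order], (whiten @ evecs)[:, order]


def largest_components(vec, n):
    """Indices of the n features with the largest absolute weight in vec."""
    return np.argsort(np.abs(vec))[::-1][:n]


# Toy input: one slowly varying feature and one that flips sign every frame.
frames = np.arange(2000)
features = np.column_stack([np.sin(2 * np.pi * frames / 1000),
                            (-1.0) ** frames])
evals, evecs = tica(features, lag=1)
slowest_feature = largest_components(evecs[:, 0], 1)[0]
```

      In this toy case the leading component is dominated by the slowly varying feature; for a real system each feature would be, for example, an inter-residue distance, and the selected indices would identify the residue pairs to draw as arrows.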

      Moreover, the likely pathway through the Markov states between closed and open states could be better discussed.

      We apologize that this was not clearer in our initial manuscript. We have now clarified in the results section that the most likely transition pathway between open and closed states will follow the path of lowest free energy, generally through State III (page 7).
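
      The link between state populations and free energies that underlies this statement can be sketched as follows, assuming the usual relation F_i = -kT ln(π_i) between a state's stationary probability π_i and its free energy. The three-state transition matrix is a made-up toy example, the function names are hypothetical, and energies are reported in units of kT; none of this is taken from the manuscript's models.

```python
import numpy as np


def stationary_distribution(tmat):
    """Left eigenvector of the transition matrix with eigenvalue closest to 1."""
    evals, evecs = np.linalg.eig(tmat.T)
    idx = np.argmin(np.abs(evals - 1.0))
    pi = np.real(evecs[:, idx])
    return pi / pi.sum()


def free_energies(pi, kt=1.0):
    """Free energy of each state relative to the most populated one."""
    f = -kt * np.log(pi)
    return f - f.min()


# Made-up example: states 0 and 2 communicate only through state 1.
toy_t = np.array([[0.90, 0.10, 0.00],
                  [0.05, 0.90, 0.05],
                  [0.00, 0.10, 0.90]])
pi = stationary_distribution(toy_t)
rel_f = free_energies(pi)
```

      In this toy model the intermediate state is the most populated and therefore has the lowest free energy, so any route between the two end states passes through it.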

      The claims have been justified, but the importance of the findings could be better relayed. This includes newly identified states, where the roles of the intermediately closed forms could be better explained, and the role of any locally-closed form in the gating transition could be described. Note that in Fig2 both closed and LC are projected onto the state 1 cluster with narrow pore and wide ECD. Why was LC not one with compact ECD (by definition), or is this because ECD spreading vanished from the gating mechanism within this MSM?

      We agree that the features of intermediate conformations could be more extensively described. First, we have described more features of state III in text, particularly on pages 6-9 and 12.

      Regarding the locally closed state, we have extended the results and discussion with new simulations of the H235Q mutation (see Response 2.2) to better address the questions of ECD spread in relation to the locally closed state. Our conclusion from the discussion now reads:

      “Surprisingly, our MSMs of protonated H235Q resulted in only a modest deepening of the free energy minimum around the projected locally closed structures, but also in a heightened free energy barrier between open and closed states, potentially facilitating a single state to be captured in experiments. This, in combination with our other observation that ECD compaction seems to be pH- rather than state-dependent, means that the most probable conformation for the H235Q variant at low pH has a closed-like TMD and more open-like ECD, similarly to the locally closed state.”

      Additionally, we regret that there was a sentence in our previous manuscript describing state 1 with narrow pore and expanded ECD. This statement came from visual inspection of a few conformations but is not supported as a general feature by the probability distributions in Figure 5. We have corrected this mistake. Regarding whether ECD spreading vanished from the MSMs we refer to the longer answer to comment 1.14.

      Moreover, I do not see dots for LC near the state I-II border, as the text suggests on page 8.

      This might not have been clear in our initial figures. We have modified the colors in Figure 2 so that the projected locally closed structures are more easily distinguishable from the closed structures. We have also added labels marking closed, locally closed, and open clusters more clearly to Figure 3 and Figure 4.

      The outcome of a predominantly closed channel irrespective of pH could be better related to experiments, including electrophysiology and recent cryo-EM in Ref.33. In the discussion section the authors write that the minority of channels being open is consistent with electrophysiology, apparently in contrast to what is written in the beginning of the results section. The authors previously wrote that Po is not established by electrophysiology but that cryoEM (Ref.33) may suggest it is more closed than open, regardless of pH. How do the solved "open" states compare to the proposed closed low pH state reported in that preprint (ref.33) and how do the propensities (if any) relate?

      The reviewer raises an interesting point regarding how the pH 3 cryo-EM structure from Ref. 39 (previously 33) relates to the closed channels at low pH. First, we wish to point out that there is actually no conflict between the two statements mentioned by the reviewer since maximal conduction in electrophysiology does not necessarily require 100% of the channels to be open. We have clarified this in the results section (page 3).

      To compare the low-pH structure from Ref. 39 with our open conformations, we first added those structures (along with the other two from Ref. 39) to the set of experimental structures projected onto the tICA landscapes in Figure 2, Figure 3, and Figure 4. Additionally, we added the low-pH structures to plots in Figure 5 and Figure 5-figure supplement 1 to enable better comparison to the different macrostates. We also added panel F to Figure 5, which shows local backbone rearrangement around E35 - thought to be the main proton sensor in GLIC and whose side-chain rearrangements were identified as the main difference between the structures in ref.39. Observations and discussion were added to pages 9 and 12.

      Finally, the relationship of ECD asymmetry to published crystal structures, and the importance of this asymmetry to the functions of pLGICs could be better explained.

      We have extended the discussion of asymmetry on page 13 to include two additional references (ref. 56 and 57) describing published structures that display asymmetric features in the ECD.

      Reviewer #2:

      The authors are trying to explain fundamental and functional aspects of ligand-gated ion channels using extensive molecular dynamics simulations. In particular they examine the effect of pH on GLIC, a pH-gated ion channel, and also the effect of (one) mutation. They successfully account for energy barrier levels as well as free energy levels in GLIC wild-type open and closed states as well as in one gain-of-function mutant, mutated in one of the pore-lining residues. They also uncover a protonation-dependent symmetrisation in the subunits, which had been seen by crystallography but not clearly demonstrated by other techniques before. The approach, based on clustering and Markov state models, makes it possible to find the transition rates between the different substates and could be used for other ion channels as well.

      The study is overall well conducted and convincing. However, it suffers from the very limited scope of the mutations examined. Indeed, only one mutant is analysed, whereas dozens of mutants of GLIC have been characterised both functionally and structurally, especially some that fall in the so-called "locally-closed" (LC) state. One thus wonders how the existence of mutants that are known to adopt an intermediate conformation (LC state) fits into the scheme of this study.

      Thank you! As we wrote at the start of our response, we are indeed happy that we took the time to add a second mutant (despite initially worrying that it would mostly be related to ECD motions instead).

      The impact of this study would undoubtedly be strengthened if at least one more mutant was examined in detail, namely one that is blocked in the LC state.

      We have now run an additional 120 μs of simulations and constructed two additional MSMs of the H235Q mutant, known to crystallize in a LC state at low pH. We have performed the same analyses as in our previous submission, appended the results to all figures, and extended the results and discussion sections accordingly.

      Also, it is not entirely clear how much the results are sensitive (or not) to the protonation protocol.

      This is indeed worthwhile to cover better. We have added a paragraph comparing our protonation protocol to two experimental studies and six simulation studies on pages 13-14. Admittedly, fully resolving the question will require studies using different protonation states, or better constant-pH simulation methods, which we are working on.

      Reviewer #3:

      The gating mechanism of ligand-gated ion channels offers a challenge to both the experimentalists and the modellers; existing experimental methods lack the ability to access detailed information about conformational changes during the transient events that correspond to the opening of the channel that lets ions flow, while simulations are able to access this level of detail but do not give access to the relevant timescales of the process. At a fundamental level, this makes cross-validating the two approaches a difficult task.

      In this work, the authors tackle the second challenge by sampling the gating transition over a cumulated simulation time that exceeds 100 microseconds - thus generating very large datasets. While the analysis of these large datasets used to require a significant amount of supervised clustering (e.g. involving manual feature definitions), the authors have decided to apply the protocol of Markov State Model (MSM) construction which has matured into a semi-unsupervised approach. Indeed, it was shown that these kinetic models could be variationally optimized.

      Major strengths:

      The authors have shown a great technical expertise in showing that such simulations could be generated and analyzed, yielding results that are overall consistent with a lot of previous results, both experimental and computational. An interesting and original observation regarding the role of pH on compaction rather than gating directly is mentioned.

      Major weaknesses:

      While the intention of constructing a Markov State Model is very interesting, it does not seem to have been fully executed, owing to a lack of convergence despite a rather large computational effort. The ability to produce a (variationally) optimized kinetic model would have been a much stronger result.

      More precisely, the authors built an MSM and optimized it using the VAMP method, but were not satisfied with the result because the kinetic model obtained emphasized "exploratory behavior" rather than "convergence of a few [slowest] interesting processes". The most likely reason for this, as pointed out by the authors, is lack of convergence: their simulations might have started to explore processes that are even slower than the ones they are interested in (desensitization? artifact? something else?) but not to convergence. To test this, maybe they should try the deflation method proposed by Husic & Noe (https://doi.org/10.1063/1.5099194) and use it to show that they did sample well the processes that they intended to sample well (gating, not desensitization)?

      A demonstration of convergence (or lack thereof) and sampling would help clarify how the VAMP approach did not work, beyond the blanket statement that optimizing MSMs is "a feasible approach for peptide-sized systems, [but the authors] find it practically unfeasible for large-scale motions in ion channels".

      We thank the reviewer for the suggestion to try the deflation method proposed by Husic et al. This is indeed something we tried, but it turned out to be challenging for our system. In the paper from Husic et al. the method was demonstrated on smaller peptide-sized systems, and scaling up, especially using distance features, makes it more difficult to deflate processes since components to be deflated may appear in many parts of the system (i.e. the basis is not so sparse). After correspondence with Husic, we were informed that deflation becomes difficult when the basis is not sparse. The point about convergence and sampling is of course an important one where we have now added more data & analysis - see our response to comment 1.2.

      Also, since they were not satisfied with the variationally optimized MSM, the authors decided to work on an un-optimized one and cluster it to extract states and transitions, in a way that appears to be more supervised than unsupervised. Here too, additional details on the methods and the motivation behind the choices made for clustering would help. Since insights are drawn from these analyses, it would seem important to give a sense of how robust the conclusions would be to slightly different choices in the clustering decisions, for example.

      The point about better motivating methodological choices is well taken and we have extended the methods section to make motivations clearer. Regarding the robustness of our results to different hyperparameter combinations, our previous submission included two figures in the SI showing how the estimation of open probabilities varies for many different combinations of hyperparameters (tICA lag time, commute or kinetic mappings, and the number of microstate clusters). We have now combined these two figures into Figure 2-figure supplement 3 and added a new panel C, which shows the slowest timescales for different hyperparameter combinations. We have selected parameters that are variationally optimal (in terms of the second largest eigenvalue) for the deprotonated conditions and used the same parameters for all models for consistency. However, we note that for the protonated conditions timescales are almost within the error margin of the optimal model. In Figure 2-figure supplement 3A-B, we already showed how the open probabilities depend on different combinations of hyperparameters. We can conclude the results are robust for hyperparameters within the ranges identified in the methods section.

      Overall, the authors have shown a method that has potential in achieving their aims, and that will yield better results as more computational effort will become possible - which realistically is a lot to ask for. Given the resources available, the results obtained support the conclusions drawn.

      Unfortunately, limitations in this respect also limit the impact on our understanding of how these molecules work. Yet, the data generated, if made available, could potentially be used beyond the aims of this paper and be made useful for drug discovery, drug design, etc.

      We strongly agree with the reviewer on the importance of making more of our data open-access. In addition to the previously added sampled states from all 5-state models and simulation parameters, we have now uploaded all MSM models and trajectories to Zenodo (doi:10.5281/zenodo.5500174).

    1. Author Response:

      Reviewer #1:

      In this study, the authors use CyTOF-based analysis to characterise spike-specific T cell responses following mRNA vaccination. They seek to understand both the breadth of responses to 'wildtype'-like and variant spikes, as well as the differences between T cell responses from convalescent and previously uninfected subjects. Consistent with other studies, they find that spike-specific T cell responses are similar across different variants, both in frequency and phenotype. In contrast, however, they identify several phenotypic differences in the T cell response elicited by infection, vaccination, or vaccination following infection.

      Despite a somewhat limited sample size, they clearly identify changes in memory phenotype and chemokine receptor expression that may affect T cell trafficking to mucosal tissues across infection and vaccination. While inclusion of additional chemokine receptors (such as CXCR3) in the CyTOF panel would have aided in characterising these cells, this data highlights how infection and vaccination may elicit distinct T cell responses.

      In fact CXCR3 and CCR4 were chemokine receptors that were considered for the panel, but could not be included as antibodies against these antigens do not stain properly on cells fixed with paraformaldehyde (PFA), and for logistical and biosafety reasons the specimens analyzed in this study had to be PFA-fixed before CyTOF staining. Although we have previously analyzed expression of CXCR3 and CCR4 on T cells by CyTOF (Cavrois et al, Cell Reports 2017 20(4):984 PMID: 28746881; Xie et al, Cell Reports 2021 35(4):109038 PMID: 33910003), those studies were exclusively performed on viable cells, and not on COVID-19 patient specimens. All our prior CyTOF phenotyping studies using COVID-19 patient specimens (Neidleman et al, Cell Reports Medicine 2020 1(6):100081 PMID: 32839763; Neidleman et al, Cell Reports 2021 36(3):109414 PMID: 34260965; Ma et al, J Immunol 207(5):1344, PMID 34389625), as well as some of our non-COVID-19 studies (Ma et al, Elife 9:e55487 PMID: 32452381; Neidleman et al, Elife 2020 9:e60933 PMID: 32990219), were performed on fixed cells, where CXCR3 and CCR4 unfortunately could not be included as parameters analyzed.

      Future studies will be required to better assess the functional impacts of these phenotypic differences on T cell recall and contribution to protective immunity.

      We absolutely agree that future studies should be pursued to better assess the functional impacts of the phenotypic differences on T cell recall, and on contribution to protective immunity. Such studies will most certainly require use of animal models, and in fact are studies that we have just begun (mouse model) or will soon begin (non-human primate model). To fully acknowledge the need for such functional studies, we have now added to multiple sections of the Discussion the need for future studies to incorporate animal models (Line 472 and Lines 488-491), including the statement “Such follow-up studies should also examine the functional outcomes of the discoveries made here (e.g., effect of chemokine receptor expression on homing of infection- and vaccine-elicited SARS-CoV-2-specific T cells), including in animal models of SARS-CoV-2 infection.”

      Reviewer #2:

      The authors address an important question: whether people who have had Covid19 and are then vaccinated with one of the mRNA Spike vaccines make better immune responses than those who had not previously been infected and have had two shots of the vaccine. They also compare responses to different virus variants and find extensive cross reactions and no differences between the groups - an important result. Their main finding is a difference in the quality of the CD4+ T cells in the 'Covid-vaccinees' compared to the 'naive double vaccinees'. They suggest that T cells in the former may home better to the respiratory tract and persist longer.

      The major strengths are:

      • The methodology used, based on Cytof multiparameter analysis of antigen responding CD4 and CD8 T cells.

      • Demonstration that the second vaccine dose in the naive group 'improves' the T cell response.

      • Demonstration that a second vaccination in the Covid19 group does not improve the T cells.

      We thank the Reviewer for the nice summary and for the positive comments.

      Weaknesses:

      Fully (and commendably) acknowledged in the manuscript:

      • The study groups are small

      • The antigen specific T cells are stimulated in vitro so may be distorted, nevertheless there were still differences

      We agree with the Reviewer about the listed weaknesses of the study. We note that we had in our original manuscript acknowledged all these weaknesses within our “Limitations” section, including the fact that we had to stimulate our samples to identify and characterize the SARS-CoV-2-specific T cells. We have now expanded the part about our having stimulated the samples, by proposing that future studies should take advantage of tetramer technology to characterize cells in their baseline (non-stimulated) states, whilst acknowledging that such studies would for the most part be limited to CD8+ T cell responses as tetramer reagents for CD4+ T cells are less robust (Lines 500-506).

      Not acknowledged but possibly outside the scope of this study:

      • The reader will wonder how this affects the antibody response, which ultimately is the main protector from reinfection, and also how the T cell responses might impact disease severity after post-vaccination (re)-infection

      Serological assays were not performed in this study; however, we fully agree with the importance of associating the in-depth phenotypes of vaccine-elicited SARS-CoV-2-specific T cells with the antibody response. In fact, just as we went very “deep” into the phenotypes of SARS-CoV-2-specific T cells in this study, we are at the moment optimizing techniques to, in an analogous fashion, deeply characterize the serological response to vaccination. This entails optimizing a flow cytometry-based approach we recently introduced and implemented on a small number of specimens (Ma et al, J Immunol 207(5):1344, PMID 34389625), to be able to simultaneously assess the levels of IgA1, IgA2, IgE, IgG1, IgG2, IgG3, IgG4, and IgM against the S1, S2, and RBD domains of the SARS-CoV-2 spike protein in a large number of patient specimens. Once we’ve optimized the assay and applied it on the vaccine specimens, we plan to associate the resulting 24-parameter serological datasets (8 isotypes of antibodies each against 3 antigens = 24 parameters total) with the high-dimensional SARS-CoV-2-specific T cell datasets from this study, but that will be its own separate (and large) study and beyond the scope of this current one. As generating such serological data will take at least 3-6 months to complete, and the focus of this study is on SARS-CoV-2-specific T cells (and all conclusions we drew were based only on the T cell data), we think it appropriate that we limit this study to deep-phenotyping of the T cells. We have now brought up in the last part of our “Limitations” section the lack of serological analysis in this current study as a limitation, and how follow-up studies should associate serological responses with the T cell responses characterized here. (Lines 506-511: “A final limitation is that serological analyses were not performed in this study. As coordination between the humoral and cellular arms of immunity are likely key to effectively controlling viral replication, future studies should assess to what extent the breadth, isotypes, and functional features of spike-specific antibodies elicited by vaccination associate with the herein described phenotypic features of vaccine-elicited SARS-CoV-2-specific T cells.”)

      With regards to how T cell responses might impact disease severity and breakthrough infections, this is an aspect we are very interested in investigating, as detailed in our final response further below.

  3. Sep 2021
    1. Author Response:

      Reviewer #2:

      The study by Butner et al. leverages a previously derived mathematical model (Butner et al. Sci Adv 2020) to predict immunotherapy response using published clinical data from immunotherapy-treated cancer cohorts. The model was fitted to a calibration cohort (meta-analysis, n=189) and then applied to a smaller validation cohort (n=62, Welsh et al., JITC, 2020). The estimated model parameters were tested for their ability to classify responders and non-responders. Using the tumour volume estimated from CT scans as input for the model, the immunotherapy response was predicted with an accuracy of 81.4% (n=62) within 2 months from treatment onset.

      Modelling of the anti-tumour response under immunotherapy is a relevant approach to understand the dynamics behind this process. The results from this study suggest that the model parameters for the tumour-cell killing rate and the ratio of cancer cells to cytotoxic cells are different on average between patients with objective responses and stable/progressive disease. The main advantage of this approach is that the estimations are derived solely using the CT scans to infer tumour volume. However, given that therapy response is characterized by a large tumour and immune heterogeneity, clonal selection over time, and importantly immune escape mechanisms, which were not considered in the model, a larger validation cohort is needed to confirm that the estimated parameters are robust predictors. Their predictive value also needs to be compared to current biomarkers of response.

      The point is well-taken. We agree that tumor response under immunotherapy is affected by a large range of heterogeneous factors, which, to our knowledge, have not to date all been included in any single response rubric. This is probably because, from a practical standpoint, characterizing the full heterogeneity of all cancer within a patient is likely not as yet possible (e.g., full characterization of clonal heterogeneity would require removal and analysis of all tumor cells or nodules, probably through highly invasive measures). Different from those modeling efforts attempting to include as many factors as possible (in a way that we do not believe could ever be fully informed in a real-world clinic), we have attempted to only focus on the key mechanisms (after extensive literature research and also extensive scientific discussion with our experimental and clinical collaborators) that we believe are essential to understanding the overarching response, and to refine the model into a form that could actually be used in clinical settings.

      The focus of this work is not to validate the predictive ability of the model, which has already been well-established (PMIDs 32426472, 33398132), but rather to demonstrate an entirely new measure that can be used to inform the model: i.e., that IHC measurements are associated with model parameters and may be used to quantify them, instead of imaging-based assessment as done previously. This goal, combined with an already-established predictive model, is why we have shown comparisons of the relationship between model parameters/IHC informed by both methods here, and focused on the tumor volume, which was the only “marker” available for all patients evaluated in this study.

      Interestingly, PDL1 positive cell percentage and CD8 T cell count were estimated based on the model parameters and compared between the different response groups. The average levels of estimated and observed T cell counts and %PDL1+ cells were comparable between the groups. However, to demonstrate correlation, the estimated and observed values need to be compared at the patient level. This has the potential to be the focus and significance of the study, as it could be relevant in the absence of biopsy data.

      We appreciate the astute insight. We agree with the Referee that this would represent a key step moving forward. Unfortunately, we have been unable to obtain any per-patient data that contains both patient response to ICI therapy and also IHC measures of PD-L1 expression or CD8+ tumor-infiltrating lymphocyte counts from the same patients that would enable this direct comparison. The results we have shown are the result of careful consideration as to how we could best address this problem in the absence of such an ideal data set in the real world. That said, we are now working towards in-house collection of these data (we have added discussion of this ongoing work on lines 548-554 of the revised text), and we believe that the present study represents an advance towards this goal.

      Reviewer #3:

      This work proposed a non-linear mathematical model with a particular ordinary differential equation to capture the dynamics of tumor size over time in response to the immune microenvironment under treatment with checkpoint inhibitors. The parameters in the model were initially trained on a time-course dataset from six clinical trials, consisting of size changes over time for six types of tumors from 189 patients in response to treatment targeting PD-1 or PD-L1, and were validated with an additional dataset from a study of 64 patients with non-small cell lung cancer. The authors further investigated the biological relevance of each parameter and found that two of them, μ and Λ, were capable of classifying patients as responders or non-responders to the treatment. However, the training procedure as well as the validation/testing of the model are not carefully evaluated, which could result in overfitting of the parameters to some datasets.

      The model fitting and validation were done as we did previously (PMID: 32426472). To help clarify, we have now added a reference to this study in the revised manuscript lines 222-224: “A unique fit (and thus set of parameter values) was obtained for the data from each individual patient, and more details on this procedure may also be found in (45).”

      Strengths:

      Instead of training a classification model to directly fit tumor characteristics to drug treatment at the bulk level, this study built a non-linear mechanistic model to capture the dynamics of tumor size over time in response to the tumor microenvironment and indirectly uses its key parameters to classify the drug effects. This approach, which integrates more intrinsic information at the cell-cell interaction level, potentially allows building a more reliable predictive model across different cancers and treatments.

      We thank the Referee for the encouraging comments and also for the detailed assessment of our work.

      Weaknesses:

      1) The predictive power of the model is highly dependent on the robustness of its performance in different tumors at different stages under different treatments; however, this study did not provide data on the effects of tumor heterogeneity.

      This is a fair criticism.

      We emphasize that our intent in this work is not to prove that the model works across all possible clinical scenarios, but instead to demonstrate how model parameters correlate with, and thus may be informed by, histological measures. By demonstrating that the model works across the 6 different tumor types included in the calibration cohort (Table 1), and by presenting methods to inform the model from IHC, we believe this work represents an important step towards that broader goal. Having said that, we have now added discussion to highlight this limitation and to represent more accurately what may and may not be concluded from our results (lines 454-456, 528-530, and 548-554). We thank the Referee for this constructive comment.

      2) The proliferation constant (α) defined in the study is coupled to, and varies with, the dataset structure of each clinical trial; it should be evaluated independently using patients without treatment or controlled data from in vitro experiments.

      We understand this concern, as our dataset collected in-house for the validation cohorts confirms that there is variation in tumor growth rates among patients prior to treatment. Unfortunately, per-patient tumor measurement data prior to treatment start were not available in the calibration cohort, so we were forced to make this approximation. Even if these data could be collected from an independent cohort, they would merely represent another, different average value, as there would be no reason to assume a relation between individual patients across these independent sets. This different average value could, in principle, shift the absolute value of the parameters shown, e.g., in Figure 3, but the absolute value of the separation between PR/CR and stable/progressive values would remain the same. Thus, this would not change the observed trends or the conclusions we have drawn. Moreover, we believe that data obtained from in vitro experiments would be vulnerable to the same criticism: they do not accurately represent per-patient growth rates, as again there would be no reason to assume a relation between experiments and individual patients. They would instead just provide a third, different estimate of the pre-treatment average growth rate, and the outcome would be the same as described above.

      3) It is unknown how the parameters of the model were trained or validated in batches, and whether the parameters were overfit to the datasets.

      We understand how this could be ambiguous as described; thank you for bringing this to our attention. We have added the necessary details in lines 222-224: “A unique fit (and thus set of parameter values) was obtained for the data from each individual patient, and more details on this procedure may also be found in (45).”

      The model parameters were not trained or validated in batches. Usually, this sort of analysis (e.g., k-fold) is used in the case of small datasets or when independent validation sets are unavailable, and is only valid when the training and validation sets are from the same population. Instead of taking this computational approach, we chose to work with our clinical collaborators to obtain additional, independent validation data for more robust validation. We agree with the assessment in Essential Revisions comment #2 that this is a “large-scale validation cohort”, and offers superior validation to cross-validation. We further note that, in the revised manuscript, we have provided additional analysis to better understand how model parameters may vary between cancer types; please see our reply to Reviewer #2, comment 8 for more details. This new analysis is discussed in the main text, lines 335-341 and shown in the new figure S3 of the revised manuscript.
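
      For readers unfamiliar with the k-fold procedure mentioned above (which, as stated, was not used in this study), the following is a minimal sketch of how such a split is constructed. The function name `k_fold_splits`, the choice of k = 5, and the fixed seed are illustrative assumptions; 189 is simply the calibration cohort size quoted by the Referee.

```python
import random


def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds.

    Hypothetical helper for illustration only; not part of the study's pipeline.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


if __name__ == "__main__":
    # 189 patients is the cohort size quoted in the review; k = 5 is arbitrary.
    for train, validation in k_fold_splits(189, 5):
        print(len(train), len(validation))
```

      Each patient index appears in exactly one validation fold, which is why the approach assumes that training and validation data come from the same population, in contrast to the independent validation cohort described above.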

      Regarding the Referee's concern about possible overfitting, it may be observed from Equation 1 that, mathematically, the time-dependent model curve may take one of three overall shapes: (i) decreasing exponential, asymptoting to a value 1>x>0; (ii) increasing exponential, asymptoting to a value ρ∞>x>1; or (iii) exponential increasing uncontrollably towards (∞,∞). Please note that, although this third case is mathematically possible, it was not observed in our analysis; it is included here for completeness. Because the functional shape cannot include many inflection points, and because the model in Equation 1 is in dimensionally reduced form, we are confident that the model is not overfit to each per-patient dataset.

    1. Author Response:

      Reviewer #1:

      This study examines the use of terahertz wave modulation (THM), a technique for transmitting terahertz wave electromagnetic energy to the cochlea with the aim of improving the sensitivity of the cochlear outer hair cells. ABR obtained with and without THM suggests that sensitivity thresholds were improved by 10 dB when using THM. Whole-cell patch clamp recordings from outer hair cells suggest that THM significantly increases both K+ and MET currents of the cochlear outer hair cells. These results are convincing and potentially important for understanding normal cochlear physiology.

      On the other hand, the numerous claims about translational applicability of this work seem overstated.

      61-65 This is incorrect. For example, optogenetics or stem cell use are not currently seen as "treatment for hearing impairment" and, in fact, the manuscript says as much later in the paragraph. Also, pharmacological treatment is rarely effective, and only in limited circumstances.

      Many thanks to the reviewer for pointing out this mistake. We have replaced the discussion with:

      “At present, treatment for hearing impairment is primarily administered through pharmacological treatment, hearing aid equipment, and electronic cochlear implantation (Wilson et al., 1991; Kipping et al., 2020; Gang et al., 2008). Optogenetics (Huet et al., 2021), stem cell differentiation and transplantation (Oshima et al., 2010; Li et al., 2003; Chen et al., 2012) are also being explored to treat hearing loss. However, pharmacological treatment is rarely effective, and only in limited circumstances.”

      283-294 The discussion of near-infrared vs THM is misguided. Near-infrared has been proposed as a possible alternative technology to stimulate spiral ganglion neurons, thus replacing cochlear implants. This is plausible, even though feasibility has not yet been demonstrated. In contrast, THM does not seem like a plausible alternative to cochlear implants. Patients who are candidates for cochlear implantation may not have enough (or any) outer hair cells, which are the target for THM.

      We thank the reviewer for pointing out the difference in principle between near-infrared auditory stimulation (NIRS) and THM. We have now modified the main text to compare the differences and similarities between THM and NIRS. Please see the revised Discussion.

      295-299 "In comparison with wearing hearing aids, stem cell differentiation and transplantation (Oshima et al., 2010; Li et al., 2003; Chen et al., 2012), optogenetics (Huet et al., 2021) and electronic cochlear implantation (Wilson et al., 1991; Kipping et al., 2020; Gang et al., 2008), THM requires no traumatic surgery, cumbersome equipment, or genetic manipulation, and is thus more suitable for use in human subjects." In the described experiment, optic fibers had to be placed close to outer hair cells. That seems to require "cumbersome equipment" and obviously would require surgery for use in humans.

      Many thanks to the reviewer for pointing out this inappropriate statement. We completely agree and have revised it in the revised manuscript.

      The data show that sensitivity was improved by 8.75 dB. In practical terms this is a very small change. Sensitivity improvement of 10 dB (and much more than that) can be obtained noninvasively and on a frequency-dependent basis using traditional amplification.

      Any neural stimulation technology would require not only spatial selectivity but also temporal responsiveness. It seems that THM could meet the former criterion, but the latter is unknown. In other words, for any practical application it would be necessary to show that modulation of a THM signal can be perceived by listeners. However, this criticism is moot if the claims about clinical applicability of THM are removed.

      We thank the reviewer for the constructive comments. We completely agree, and the claims about clinical applicability of THM have been removed.

      Reviewer #2:

      This manuscript uses mid-infrared light to enhance the currents from natural stimuli (mechanical and voltage) of hair cells. The authors show increased voltage-gated K+ current and MET currents while being illuminated with mid-infrared light. Based on molecular dynamics simulations, the authors hypothesize that the augmented voltage-gated K+ currents are due to stimulation of C=O groups in the selectivity filter which allows K+ ions to pass through the pore more quickly to increase conductance; there was no hypothesis as to why MET currents were augmented. The authors also demonstrate improved ABR thresholds when the cochlea was illuminated with the mid-infrared light, demonstrating a potential therapeutic application. The enthusiasm for the novelty of this work is reduced because other work has shown that neurons can be excited by near-infrared (~2 microns) wavelength due to thermal stimulation and changes in cell capacitance, so this work mainly differs in their proposed mechanism and the longer wavelength of light (8.6 microns). Additionally, the Hudspeth group (Azimzadeh et al, 2018, PMC5805653) has shown thermal gating of MET channels using ultraviolet light and infrared light (1.47 microns). If the THM mechanism is indeed different from thermal stimulation, this would be a novel therapeutic mode, however, the data are not yet convincing that thermal stimulation is not the mechanism of action.

      We thank the reviewer for the suggestions, which are essential for improving our manuscript, and in particular for pointing out the important literature on thermal gating of MET channels. We have now cited and discussed this paper and other related papers.

      Since the structure of the MET channels has not been resolved, we cannot study the mechanism at the atomic or chemical bond level by molecular dynamics.

      Infrared stimulation is emerging as an area of interest for neuromodulation and potential clinical application. While most studies on infrared stimulation have been conducted at near-infrared wavelengths, whether mid-infrared wavelengths can affect neuronal function is unknown. Many studies have shown that the threshold for action potentials evoked by infrared neural stimulation (INS) correlates with the absorption coefficient of the solution at the stimulation wavelength: the higher the absorption coefficient, the lower the threshold. Therefore, the mechanism by which INS induces action potentials is generally believed to be the rapid rise in solution temperature caused by INS, namely the “photothermal effect” [1]. However, as shown in Figure R1, the absorption by water at the 8.6 μm wavelength we use is very weak.

      How does near-infrared light affect the excitability of cells or nerves through the “photothermal effect”, so as to promote or inhibit the generation or propagation of action potentials in neurons? In other words, what is the target of the “photothermal effect”? Currently, there are few studies on the mechanisms, and the possible biophysical mechanisms include the following three:

      (1) After INS is absorbed by the solution, the solution temperature increases rapidly, the membrane capacitance changes, and an inward current is induced, which leads to depolarization of the membrane potential and the generation of an action potential [2]; (2) INS activates temperature-sensitive TRP ion channels, which causes an action potential [3]; (3) INS enhances inhibitory postsynaptic transmission by acting on GABA receptors, thus producing an inhibitory effect [4].

      At present, INS mainly uses near-infrared light (1-3 microns), the parameters used are not consistent, and many factors affect whether INS excites or inhibits (such as the diameter of the fiber, the energy of the infrared light, pulse width, and repetition frequency). On the one hand, the photothermal effect is difficult to control: some studies have found that overheating blocks the generation and propagation of action potentials, and can even cause irreversible inhibition of action potentials and tissue damage [5]. On the other hand, it is difficult to determine the target of photothermal action, which hinders the safe and effective adoption of INS as a neuromodulation tool in the clinic or in research. Therefore, new modulation strategies with more explicit mechanisms are needed in the field of optical neuromodulation.

      References:

      1. Wells, J., Kao, C., Konrad, P., Milner, T., Kim, J., Mahadevan-Jansen, A., Jansen, E.D.: Biophysical mechanisms of transient optical stimulation of peripheral nerve. Biophysical Journal. 93, 2567-2580 (2007).

      2. Shapiro, M.G., Homma, K., Villarreal, S., Richter, C.P., Bezanilla, F.: Infrared light excites cells by changing their electrical capacitance. Nature Communications. 3, (2012).

      3. Albert, E.S., Bec, J.M., Desmadryl, G., Chekroud, K., Travo, C., Gaboyard, S., Bardin, F., Marc, I., Dumas, M., Lenaers, G., Hamel, C., Muller, A., Chabbert, C.: TRPV4 channels mediate the infrared laser-evoked response in sensory neurons. Journal of Neurophysiology. 107, 3227–3234 (2012).

      4. Feng, H.J., Kao, C., Gallagher, M.J., Jansen, E.D., Mahadevan-Jansen, A., Konrad, P.E., Macdonald, R.L.: Alteration of GABAergic neurotransmission by pulsed infrared laser stimulation. Journal of Neuroscience Methods. 192, 110–114 (2010).

      5. Walsh, A.J., Tolstykh, G.P., Martens, S., Ibey, B.L., Beier, H.T.: Action potential block in neurons by infrared light. Neurophotonics. 3, 040501 (2016).

      The authors hypothesize that the increase in K+ current through voltage gated channels is due to increasing the speed of movement of the K+ ions through the selectivity filter, which they modeled with molecular dynamics simulations. However, the simulations are not validated with experimental manipulations.

      We thank the reviewer for pointing this out. As shown in Figure R1, we overlaid the vibration spectra of the modeled channels on the attenuation of infrared light in water.

      Figure R1. Comparison of the absorption intensity of water molecules (green curve), the Na+ channel (orange curve), and the K+ channel (black curve) from our MD simulation, and the values from other molecular dynamics calculations [1] (purple star).

      As shown in Figure R1, the K+ channel absorbs THz waves strongly at a frequency of 49.86 THz, but this frequency falls in the strong absorption region of water molecules. As a result, THz wave modulation (THM) at this frequency would be subject to interference from the thermal effect caused by the large absorption by water molecules.

      For Na+ channels, the strongest absorption peak is located at 48.20 THz, which is consistent with the calculation results reported in reference [1] (PNAS 118, e2015685118 (2021)). Nevertheless, it falls in the absorption region of water molecules and would be preferentially absorbed by them. In theory, a frequency of 39.82 THz can avoid absorption by water molecules and regulate the carboxyl (-COO-) groups of Na+ channels in a non-thermal way, thus promoting or inhibiting the Na+ current. Unfortunately, these results are difficult to confirm experimentally because no light source of sufficient intensity is available at this frequency, so the laser cannot be effectively coupled into the optical fiber to focus on nerve cells, which prevents measurement of ion channel currents under terahertz stimulation [2]. We believe that the regulation of Na+ channels by terahertz waves of specific frequencies will be studied further once light sources and coupling technology at the relevant frequencies are well developed.

      References:

      1. Xi Liu†, Zhi Qiao†, Yuming Chai†, Zhi Zhu†, Kaijie Wu, Wenliang Ji, Daguang Li, Yujie Xiao, Junlong Li, Lanqun Mao, Chao Chang, Quan Wen, Bo Song, Yousheng Shu, Non-thermal and reversible control of neuronal signaling and behavior by mid-infrared stimulation. Proc. Natl. Acad. Sci. U. S. A. 118 (10): e2015685118, (2021).

      2. Seddon, Angela B. "Mid-infrared (MIR) photonics: MIR passive and active fiberoptics chemical and biomedical, sensing and imaging." Emerging Imaging and Sensing Technologies. International Society for Optics and Photonics, 9992, 999206, (2016).
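
      Because this response quotes frequencies in THz while the stimulation light is described by its wavelength (8.6 μm), a short conversion sketch may help when comparing the two. It assumes vacuum wavelengths and uses only the relation λ = c/f; the function names are illustrative. Under that assumption, 49.86 THz corresponds to roughly 6.01 μm and 8.6 μm to roughly 34.9 THz.

```python
C_M_PER_S = 299_792_458  # speed of light in vacuum (exact by SI definition)


def thz_to_um(freq_thz):
    """Vacuum wavelength in micrometres for a frequency given in terahertz."""
    return C_M_PER_S / (freq_thz * 1e12) * 1e6


def um_to_thz(wavelength_um):
    """Frequency in terahertz for a vacuum wavelength given in micrometres."""
    return C_M_PER_S / (wavelength_um * 1e-6) / 1e12


if __name__ == "__main__":
    # Frequencies quoted in the response above.
    for f in (49.86, 48.20, 39.82):
        print(f"{f:.2f} THz -> {thz_to_um(f):.2f} um")
    # Stimulation wavelength quoted in the response above.
    print(f"8.6 um -> {um_to_thz(8.6):.2f} THz")
```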

      It was unclear to this reviewer whether the temperature effect would be measurable with the technique used. It appears that the temperature measuring system is rather large as compared to the cell, therefore it would likely measure changes in bulk solution temperature and not necessarily a local or micro-scale change in temperature that the cell may be responding to. Additionally, Littlefield and Richter have suggested that temperature changes on the order of 0.1 degrees Celsius are sufficient to evoke action potentials (Littlefield & Richter, 2021, PMC8035937), which is well within the temperature changes observed by the authors. At the longer wavelengths used in this study, the absorption of water is generally even higher as well, suggesting even greater temperature changes with the same power. In vestibular hair cells a 10 deg Celsius increase in temperature led to a 50-60% increase in peak MET current (Songer & Eatock, 2013, PMC3857958).

      We thank the reviewer for pointing out this issue. Indeed, the temperature measuring system is rather large compared to the cell. We performed the temperature measurement protocol with an ADInstruments acquisition system (PowerLab 4/35) coupled to a T-type hypodermic thermocouple (MT 29/5, Physitemp); the diameter of the thermocouple is 100 μm. However, our new experiment measuring tissue temperature in vitro showed that the maximum temperature elevation was less than 4 °C with 75 mW stimulation, which is much lower than the temperature change in the reference paper (10 °C; Songer & Eatock, 2013, PMC3857958). The other paper mentioned by the reviewer (Littlefield & Richter, 2021, PMC8035937) also proposes in its introduction that light stimulation arouses neural responses due to photons rather than heat. In addition, when the power is 10 mW, the temperature rise is no more than 1 °C. Two studies have found that light illumination commonly used for optogenetics increases the temperature by ~2 °C [1-2]. This temperature elevation is associated with the inhibition of neuronal spiking in different brain areas and cannot explain the excitatory effect of THM observed in our experiment. We now mention this point in the main text. In addition, we also mention in the main text that the wavelength of 8.6 μm falls in the strong absorption region of water.

      References:

      1. Owen, S. F., Liu, M. H. & Kreitzer, A. C. Thermal constraints on in vivo optogenetic manipulations. Nat. Neurosci. 22, 1061–1065 (2019)
      2. Ait Ouares, K., Beurrier, C., Canepari, M., Laverne, G. & Kuczewski, N. Opto nongenetics inhibition of neuronal firing. Eur. J. Neurosci. 49, 6–26 (2019).

      In figure 1, when THM is on, there appears to be an increase in the inward current without any mechanical stimulation. There is no discussion of this, and this could be a baseline effect that is not aimed at simply enhancing existing conductances. The increase in K+ conductance seen in the voltage-gated K channel cannot account for this increased inward current, since K+ conductance is outward. THM itself could also activate a small amount of MET current, maybe via the thermal effect demonstrated by Azimzadeh et al. This increased conductance could also be from the Tmc1 leak conductance that the authors have published on previously.

      We thank the reviewer for pointing out this issue, and in particular for suggesting several possible reasons for the increase in the inward current. We have now discussed this effect and cited related papers. In addition, the increase in MET currents caused by THM was far greater than the baseline offset, indicating that THM has a non-thermal effect.

      Line 232-233: With regard to the ABR data, data is not shown about whether an OABR can be elicited. The data show that once the THM is turned on and then a click stimulus is presented, there is no response; however, this experiment does not really test whether the THM can evoke an OABR since many repetitions are required to get the ABR waveform out of the noise. If THM is on and the stimulus is below threshold, then there is unlikely to be an evoked response since the THM stimulus is not synchronized with the ABR recording. The authors need to show that THM onset stimulation that is synchronized with the ABR recording does not result in an ABR waveform.

      We thank the reviewer for suggesting this very important experiment. Following this suggestion, we tested whether THM onset stimulation that is synchronized with the ABR recording can evoke an OABR. We now present the new data in Figure S5.
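
      The reviewer's point about synchronization can be illustrated with a toy simulation of signal averaging. This is only a sketch under stated assumptions, not the ABR analysis used in the study: the response template (a Gaussian bump of amplitude 1), the noise level, the epoch length, and the number of sweeps are all arbitrary illustrative values. A small response that is time-locked to the recording survives averaging, whereas the same response occurring at a random latency in each sweep is smeared out.

```python
import math
import random


def averaged_response(n_sweeps, synchronized, n_samples=200, noise_sd=3.0, seed=1):
    """Average n_sweeps noisy epochs, each containing one small evoked bump."""
    rng = random.Random(seed)
    peak_index = n_samples // 2
    # Hypothetical evoked-response template: Gaussian bump, amplitude 1 (arbitrary units).
    template = [math.exp(-0.5 * ((i - peak_index) / 5.0) ** 2) for i in range(n_samples)]
    total = [0.0] * n_samples
    for _ in range(n_sweeps):
        # Time-locked stimuli evoke the bump at the same latency in every sweep;
        # unsynchronized stimuli shift it by a random (circular) offset.
        shift = 0 if synchronized else rng.randrange(n_samples)
        for i in range(n_samples):
            total[i] += template[(i - shift) % n_samples] + rng.gauss(0.0, noise_sd)
    return [value / n_sweeps for value in total]


if __name__ == "__main__":
    locked = averaged_response(2000, synchronized=True)
    unlocked = averaged_response(2000, synchronized=False)
    print("time-locked average at expected latency:", round(locked[100], 2))
    print("unsynchronized average at expected latency:", round(unlocked[100], 2))
```

      In this sketch the single-sweep noise is three times larger than the response, so the bump is invisible in any one sweep and only emerges in the time-locked average.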

    1. Author Response:

      Reviewer #1:

      The eukaryotic mitochondrial acyl carrier protein (mACP) has been shown to have two functions: as the acyl-chain carrier for FASII lipoic acid biosynthesis and as a chaperone for the heterodimeric cysteine desulphurase complex (Isd11-Nfs1) involved in the synthesis of Fe-S complexes. Previous studies have shown that the evolutionarily divergent protist, Plasmodium falciparum, lacks a mitochondrial FASII pathway but retains a putative mitochondrially located ACP. In this study, the P. falciparum mACP is shown to be essential for Fe-S complex formation, the assembly of the mitochondrial respiratory chain Complex III and the viability of red blood cell parasite stages. Using a conditional TetR knock-down system, pull-down experiments and homology modelling the authors demonstrate that mACP binds to the LYR protein Isd11 via a novel interface and stabilizes Nfs1, which in turn is required for expression of the Rieske protein and Complex III function. The conclusions are well supported by the data which are of very high quality. This study is important in identifying new mitochondrial processes that are essential for Plasmodium infectivity. More broadly, the study highlights the important and evolutionarily conserved role that mACP has in assembly of Fe-S complexes in eukaryotic cells and the extent to which this function can be decoupled from FASII fatty acid biosynthesis.

      We thank the reviewer for these positive comments.

      Reviewer #2:

      Major weaknesses:

      Throughout the result and discussion the authors conclude that mACP is essential for Fe-S cluster biogenesis (importantly, lines 480-486, and figure 6, are extrapolating). However, while the interaction with Fe-S cluster biosynthesis pathway component is established, a role of mACP in Fe-S cluster biosynthesis or its control is implied from indirect evidence. It is possible that the depletion of complex III subunits and defect in mETC functions are an outcome of other mitochondrial defects. One example would be a defect in mitochondrial translation that leads to complex disassembly with an outcome on the abundance of nuclear encoded complex components.

      We note that we formally show that mACP directly binds Isd11 and pulls down with Nfs1. We also show that loss of mACP is accompanied by loss of Nfs1, which is expected to ablate Fe-S cluster assembly. Consistent with this expectation, loss of mACP is accompanied by loss of the Rieske Fe-S protein. We agree with the reviewer that it remains an important future challenge to test the impact of mACP knockdown on other Fe-S proteins and thus more broadly study the impact of Nfs1 loss on other Fe-S client proteins and processes within and outside the mitochondrion. These studies are in progress.

      Nfs1 and Rieske instability is used as support for a defect in assembly of the Fe-S cluster biosynthesis pathway and as indirect support for a defect in Fe-S cluster biogenesis. However, the data are not presented with independent repetitions and statistical analysis, nor with quantification of the EF1 control. Moreover, it is not specified whether the EF1 protein used as a control is mitochondrial. An unrelated mitochondrial protein that is not down-regulated is essential to support the conclusion about specific instability of Nfs1 and Rieske.

      We have included biological replicates of these experiments and used densitometry to quantify the levels of knockdown relative to loading controls. We have also added additional western blot analyses to Fig. 4 that show that levels of the mitochondrial chaperone Hsp60 and the ETC complex III protein, cyt c1, are unaffected by loss of mACP. The results strongly support our conclusion that mACP knockdown specifically reduces stability of Nfs1 and Rieske.
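
      As a generic illustration of the kind of normalization that densitometry relative to a loading control involves, the sketch below divides each target band intensity by its loading-control intensity and then expresses the knockdown lane relative to the control lane. The function name and the intensity values are hypothetical placeholders, not data from the study, and the authors' exact quantification procedure may differ.

```python
def relative_level(target, loading, target_ctrl, loading_ctrl):
    """Loading-normalized band intensity, expressed relative to the control lane."""
    if min(loading, loading_ctrl, target_ctrl) <= 0:
        raise ValueError("intensities used as denominators must be positive")
    return (target / loading) / (target_ctrl / loading_ctrl)


if __name__ == "__main__":
    # Hypothetical integrated band intensities (arbitrary units), NOT data from the study.
    level = relative_level(target=300.0, loading=1000.0,
                           target_ctrl=1200.0, loading_ctrl=1000.0)
    print(f"target protein level relative to control: {level:.2f}")
```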

      The characterisation of the mACP phenotype is driven by the elegant hypothesis that it performs the same alternative roles that other mACPs perform in addition to FASII in other organisms. However, the work ignores other possibilities - what is the effect of depletion on other mitochondrial functions (e.g. biogenesis pathways such as protein import, division and translation) and is the effect on mETC primary or secondary. Likewise, what is the effect on other cellular functions, is the mitochondrial defect primary?

      The critical point here is that the essential interaction of mACP with Isd11-Nfs1 identified in our manuscript is sufficient to explain the observed phenotypes. We acknowledge that mACP may have other key interactions in parasites (that also contribute to mACP essentiality). However, any such interactions will be divergent relative to other studied eukaryotes and independent of known LYR-motif protein homologs that mediate conserved mACP functions in yeast and humans. As noted in our responses above, mACP knockdown does not result in mitochondrial depolarization and thus is not expected to inhibit import of mitochondrial-targeted proteins (which depends on the transmembrane potential), nor do observed phenotypes provide any evidence for impacts on mitochondrial translation.

      A direct role in Fe-S cluster pathway assembly and Fe-S cluster biosynthesis is not directly established, and other mitochondrial functions are not examined. Finally, it is also not clear whether mitochondrial functions are the primary defect, since other cellular functions are not tested.

      Please see our responses above. A full investigation of mitochondrial Fe-S cluster pathway assembly (which has not previously been studied in Plasmodium) is well beyond the scope of the present manuscript. We agree that future studies of broader Fe-S metabolism will test and extend the conclusions of the present study. The new data added to Figure 4 strongly suggest that loss of mACP results specifically in loss of Nfs1 and Rieske and has no detectable impact on general mitochondrial proteins and functions that include retention of transmembrane potential, levels of the Hsp60 chaperone and the core complex III subunit cyt c1, and import of nuclear-encoded proteins into the mitochondrion and processing (e.g., Hsp60 and cyt c1) that require an intact transmembrane potential.

      Reviewer #3:

      Targeting of mACP to the parasite mitochondrion is nicely confirmed, as is essentiality via conditional knockdown. Some effort was undertaken to home in on leucine 51 as a likely point of cleavage for the removal of the mitochondrial targeting leader (Figure 1C & Figure 1 supplement 2). Is it possible to make a targeted search through the peptide hits and look for a peptide commencing with leucine 51 as a sort of poor man's N-terminome?

      We thank the reviewer for this suggestion. We have on-going experiments to more precisely define the N-terminal processing site for mACP.

      Binding of mACP to Isd11 is clearly demonstrated, as is further linking to Nfs1 to create a likely iron sulfur complex forming machine. I'm no structural biologist but it strikes me that a single protruding hydrophobic residue (F113) docked into a hydrophobic pocket on Isd11, plus a little cooperation from mACP V117, would make for a very weak interaction. Is that the sole binding interface? The mutagenesis (mACP F113A) abrogating pull down by nickel chromatography when expressed heterologously in bacteria is compelling. Are there data to show this pull down fails in the presence of detergent? Are there comparable examples of weak hydrophobic interactions generating such good binding?

      To clarify, the binding interface of the Plasmodium Isd11-mACP complex spans many intermolecular interactions beyond the predicted role for F113. This predicted interface includes electrostatic interactions between conserved Asp/Glu residues on mACP and conserved Lys/Arg residues on Isd11. The residues involved in these predicted intermolecular electrostatic interactions appear to be conserved between the parasite proteins and Isd11 and mACP from yeast to humans.

      The key molecular difference in the parasite binding interface is due to specific loss of a negatively charged phosphopantetheine group on mACP and its replacement with a hydrophobic F. In parallel, the parasite Isd11 R6I modification has replaced the positively charged group that canonically interacts with the phosphopantetheine oxyanions with a hydrophobic isoleucine. These adaptive interactions change the local environment of the mACP-Isd11 binding interface near mACP residue 113 to one that is more hydrophobic. Our F113A mutagenesis results highlight the important contribution of F113 to the stability of this interface, even though the interface spans a much larger molecular surface with additional electrostatic interactions. We have on-going structural and biochemical studies to fully understand the intermolecular interactions that stabilize association of Isd11 and mACP in Plasmodium.

      We have modified the text to clarify that additional conserved electrostatic interactions also contribute to stability of the mACP-Isd11 interface. We have also added Figure 3- figure supplement 1 to explicitly show these electrostatic interactions in the parasite complex and now include the PDB file of the Rosetta model as source data so readers can view the low-energy model themselves.

      Line 900 - Figure 2 supplements 4 & 5. There appear to be many spectral counts in the table of mass spec hits for mACP/Isd11 complex retrieved from bacteria by nickel chromatography, but only one peptide (being the largest possible) is displayed in supplement 5. Is there a reason that smaller, sub-peptides were not observed?

      To clarify, Figure 2- figure supplement 4 (figure supplement 5 in the revised manuscript) is data from the IP/MS analysis of parasites expressing mACP-HA2 or aACP-HA2 that establishes similar detection of both bait proteins despite only substantial detection of Nfs1 in the IP sample for mACP-HA2. Figure 2- figure supplement 5 (figure supplement 6 in the revised manuscript) is data based on recombinant expression of Isd11 and mACP in E. coli. For IP/MS-MS of the recombinant proteins, we did indeed observe smaller tryptic peptides. The underlined region was simply meant to convey the overall sequence coverage spanned by the observed peptides. We have modified the legend of the revised Figure 2- figure supplement 6 to clarify the overall sequence coverage from IP/MS-MS analysis and now include lines for individual peptides.

      On line 548 the authors speculate that a small molecule inhibitor of the protein/protein interaction between mACP and Isd11 might be a pan-apicomplexan drug. To better substantiate this speculation, it would be nice to include an alignment of the chromerid and apicomplexan mACP proteins to illustrate the apparent switch from a 4-phosphopantetheine prosthetic group attached via serine to a phenylalanine and the postulated hydrophobic binding interaction.

      We have followed the reviewer’s suggestion and now include this alignment as Figure 6- figure supplement 1.

      Why was 1 µM proguanil used for the 7 day growth assay (Fig 5A) and 5 day assay (Fig 5B) but 5 µM for the MitoTracker imaging?

      For growth assays spanning several days, we used the lower proguanil concentration to avoid any toxicity from proguanil alone, which can inhibit parasite growth at IC50 ~10 μM via targets that are poorly defined but appear to be independent of the parasite electron transport chain (discussed in ref. 46). Because the imaging experiment in Figure 5C involved only two days of proguanil treatment and was primarily concerned with impacts on MitoTracker staining rather than parasite growth, we used a higher 5 μM proguanil concentration. This proguanil concentration by itself had no impact on MitoTracker accumulation in the mitochondrion but resulted in dispersed signal when combined with mACP knockdown -aTc.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, Akaki et al. describe a new mechanism by which the activity of Regnase-1, an endonuclease that degrades mRNAs encoding inflammatory mediators, can be regulated. By determining the interactome of Regnase-1 in IL-1b or TLR-ligand stimulated cells, they found that Regnase-1 binds to bTRCP (as previously described) as well as to 14-3-3 proteins, which is novel. The authors further identify the phosphorylation sites on Regnase-1 that are required for the Regnase-1:14-3-3 interaction, and show that the interaction is mediated by the activity of IRAK1/2. By generating knock-in mice carrying a phosphodeficient mutant of Regnase-1, the authors demonstrate that the interaction with 14-3-3 blocks the ability of Regnase-1 to degrade its target mRNA IL-6, as it can no longer bind to the target mRNA. Finally the authors show that binding to 14-3-3 prevents nucleocytoplasmic shuttling of Regnase-1 and therefore target mRNA recognition.

      General comment:

      This is an important study that describes a new mechanism by which Regnase-1 is inhibited upon immune activation, which mediates efficient synthesis of inflammatory mediators whose mRNAs are normally degraded by Regnase-1. The interaction with 14-3-3 presented here was not known before, and the authors describe the interaction and its consequences in great detail. In general, the study is well conducted and the results are both clear and convincing. The analysis of phosphodeficient Regnase-1 knock-in mice is a major strength of the study. However, there are some smaller points that the authors could address to further strengthen the manuscript, e.g. the mutually exclusive binding of Regnase-1 to bTRCP or 14-3-3, and the possibility that IRAK1/2 may directly phosphorylate Regnase-1. In addition, they should more directly measure the effect of phosphodeficient Regnase-1 on IL-6 mRNA decay, and generalize their observation that 14-3-3 binding prevents Regnase-1 mRNA binding and decay.

      Specific comments:

      The data suggest that IRAK1/2 may directly phosphorylate Regnase-1 (Fig.2G-I), although the authors do not address this question either experimentally or in the discussion. Do the authors have evidence that Regnase-1 is a direct target of IRAK1/2? Minimally, the authors should discuss this point and assess whether the identified phosphorylation sites conform to consensus IRAK target motifs.

      We thank the reviewer for the suggestion. A previous kinome study that comprehensively identified kinase substrates suggests that the sequence motifs of the IRAK1 target phosphorylation site are pSxV and KxxxpS (Sugiyama et al., 2019; PMID: 31324866). However, these motifs do not match the sequences at S494 and S513 of Regnase-1 (Figure 2F). We speculate that other kinases are activated by IRAK1/2 and phosphorylate Regnase-1 at S494 and S513, although we cannot exclude the possibility that Regnase-1 is directly phosphorylated by IRAK1/2 at S494 and S513. We added this point in the Discussion section.
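
      The motif comparison described above can be sketched in a few lines. This is only an illustration: the function names are hypothetical, the serine position is a 0-based index chosen for the example, and the peptides are made-up strings, not the Regnase-1 sequence around S494 or S513.

```python
def matches_psxv(seq: str, i: int) -> bool:
    """pSxV: the phospho-serine at index i is followed, two residues later, by V."""
    return seq[i] == "S" and i + 2 < len(seq) and seq[i + 2] == "V"


def matches_kxxxps(seq: str, i: int) -> bool:
    """KxxxpS: a K sits four residues before the phospho-serine at index i."""
    return seq[i] == "S" and i - 4 >= 0 and seq[i - 4] == "K"


def matches_irak1_motif(seq: str, i: int) -> bool:
    """True if the serine at index i fits either reported IRAK1 motif."""
    return matches_psxv(seq, i) or matches_kxxxps(seq, i)


# Made-up peptides, used only to exercise the two patterns.
example_psxv = "AASGVAA"    # S at index 2, V at index 4
example_kxxxps = "KAAASAA"  # K at index 0, S at index 4
example_none = "AAASAAA"    # S at index 3, neither motif
```

      A site is reported as a candidate direct target only if one of the two patterns fits; a negative result, as in the response above, does not rule out direct phosphorylation.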

      The evidence for mutually exclusive binding of Regnase-1 to bTRCP or 14-3-3 is rather indirect, through the analysis of Regnase-1 phosphorylation status and phosphomutants (Fig.3). This point could be strengthened by competition assays, in which expressing increasing amounts of one protein should weaken the interaction with the other.

      We thank the reviewer for the criticism. As the other reviewers point out, the wording of "mutually exclusive" might mislead readers. As shown in Figure 3C and 3D, βTRCP recognizes 14-3-3-free Regnase-1 but not slowly migrating Regnase-1, which is the binding target of 14-3-3. In addition, Regnase-1-S513A mutant is unstable after IL-1β or LPS stimulation (Figure 4A, 4B, 4C, and 5A). These results suggest that 14-3-3 stabilizes Regnase-1 by preventing the formation of Regnase-1-βTRCP complex. However, we have not shown the data indicating βTRCP inhibits Regnase-1-14-3-3 association. We therefore corrected the sentences about the relationship between Regnase-1-14-3-3 complex and Regnase-1-βTRCP complex throughout the manuscript. These binding events occur independently and not sequentially, and 14-3-3 inhibits Regnase-1-βTRCP binding. We did not investigate whether βTRCP affects Regnase-1-14-3-3 interaction or not because once proteins (substrates of SCF complex) bind to βTRCP, they get ubiquitinated and degraded via proteasome system.

      We agree with the reviewer that a competition assay would help to clarify the detailed mechanism by which 14-3-3 inhibits Regnase-1-βTRCP binding. However, we feel the in vitro assay is beyond the scope of this study. 14-3-3-mediated abrogation of the nuclear-cytoplasmic shuttling of Regnase-1 might be one of the clues to answering this question.

      Reviewer #2 (Public Review):

      The authors used immunoprecipitation followed by mass spectrometry to identify proteins interacting with Regnase-1 before and after stimulation with IL-1β. IL-1β treatment induced a previously unknown interaction between Regnase-1 and 14-3-3 proteins. 14-3-3 bound predominantly to phosphorylated Regnase-1 and specific phosphorylation sites were identified. 14-3-3 binding to Regnase-1 was mutually exclusive with βTRCP, binding of which is known to induce ubiquitination and degradation of Regnase-1. 14-3-3 binding prevented Regnase-1 degradation, but also inactivated it by blocking mRNA binding. 14-3-3 binding also prevented translocation of Regnase-1 from the cytoplasm to the nucleus. This study has identified a second mechanism by which Regnase function can be blocked to increase expression of inflammation-related mRNAs.

      Overall, the authors' conclusions are supported by the data. The results of this study significantly advance the understanding of the regulation of Regnase-1 activity in inflammatory gene expression. The data are likely to be of interest to those investigating the intracellular signaling pathways that control gene expression in response to inflammation. The authors identified important sites for Regnase-1 regulation and created several mutant Regnase-1 constructs that will be of use to the research community. In addition, the transcriptomic and proteomic datasets generated in this study are likely to be of further benefit.

      We thank the reviewer for finding our manuscript interesting.

      Reviewer #3 (Public Review):

      Here, Akaki and colleagues set out to identify how Regnase1 is regulated upon cells being stimulated with IL-1Beta or TLR ligand stimulation. To do this they stimulated cells and then carried out a proteomic analysis to identify proteins that specifically interact with Regnase1 in stimulated cells. They identified Regnase1 interacting with the Beta-transducin-repeat containing complex (TRCP), a previously published interaction, which leads to Regnase1 ubiquitination and degradation. Interestingly, they also identify 14-3-3 proteins. Based on other data, they conclude that TRCP and 14-3-3 interact with Regnase1 in a mutually exclusive manner. They go on to show that the interaction between 14-3-3 and Regnase1 is mediated in IL-1B/TLR-stimulated cells by IRAK1/2 through an uncharacterized C-terminal domain. Two phosphorylation sites (S494 and S513) regulate 14-3-3 interaction with Regnase1, while different sites are required for Regnase1 interaction with TRCP and proteasomal mediated degradation. Finally, they conclude based on their data that 14-3-3 binding to Regnase1 stabilizes Regnase1 but prevents nuclear-cytoplasmic shuttling of Regnase and also Regnase1-mRNA association.

      The manuscript is interesting and presents another layer with respect to how Regnase-1 activity is regulated during the immune response. However, several points should be addressed in this reviewer's opinion that would help strengthen the manuscript.

      We thank the reviewer for considering our manuscript interesting.

    1. Author Response:

      Reviewer #2:

      Cai & Padoa-Schioppa recorded from macaque dorsal anterior cingulate cortex (ACCd) while requiring animals to choose between different juice types offered in variable amounts and with different action costs. Authors compared neural activity in ACCd (present study) with previous, directly comparable, findings on this same task when recording in macaque orbitofrontal cortex. The behavioral task is very powerful and the analyses of both the choice behavior and neural data are rigorous. Authors conclude that ACCd is unique in representing more post-decision variables and in its encoding of chosen value and binary outcome in several reference frames (chosen juice, chosen cost, and chosen action), not offer value, like OFC. Indeed, the encoding of choice outcomes in ACCd was skewed toward a cost-based reference frame. Overall, this is important new information about primate ACCd. I have only a few suggestions to enhance clarity. Figures 5 and 7 are maximally informative, but it is not clear that Figure 6 adds much to the reported Results. It is also suggested to abbreviate the comparison with Hosokawa et al. as it presently takes up 3 paragraphs in the Discussion: it is clear the methods and task designs were different enough to not be so easily compared with the present study. An additional suggestion would be to include mention of the comparison with OFC in the abstract and possibly also in the title, since the finding and direct comparison in Figure 7 are some of the most novel and interesting effects of the paper. Other suggestions are minor, and have to do with definition of time windows, variables, and additional papers that authors may cite for a well-rounded Discussion.

      Please refer to Essential Revisions point #4. We also added “In contrast to the OFC” to the abstract to highlight the difference between these two regions.

      Essential Revisions Point #4 Response:

      We shortened the discussion from 3 paragraphs to 1 paragraph as follows.

      "In another study, Hosokawa, Kennerley et al. (2013) compared the neuronal coding in ACCd and OFC in a choice task involving cost-benefit tradeoff. Our findings differ in two aspects. First, Hosokawa et al. (2013) reported contralateral action value coding in ACCd while we did not discover significant offer value coding in either spatial- or action-based reference frames in our ACCd recordings. Second, they reported that there was no action-based value representation in the OFC and therefore concluded that OFC does not integrate action cost in economic choice. Two elements may help explain the discrepancies between our findings in ACCd and OFC (Cai and Padoa-Schioppa 2019) and those of Hosokawa et al. (2013). First, we recall that Hosokawa et al. (2013) only tested value-related variables such as the benefit, cost and discounted value in action-based reference frame. Most importantly, they did not test the variable that is related to the saccade direction, which is highly correlated with the spatial value signal. As a consequence, contralateral value signal may not be significant if chosen target location was included in their regression analysis. Indeed, in our analysis, saccade direction (or chosen target location) was identified as one of the variables that explained a significant portion of neuronal activity in ACCd (Cai and Padoa-Schioppa 2012, Cai and Padoa-Schioppa 2019). The second and often overlooked aspect is that value may be encoded in schemes other than the action-based reference frame. In their study, each unique combination of reward quantity and cost was presented by a unique picture. Thus, information on good attributes was conveyed to the animal with an “integrated” visual representation. Accordingly, a distinct group of neurons may have been recruited to encode the reward and cost conjunctively represented by a unique fractal, which would result in 16 groups of offer value coding neurons."

      Reviewer #3:

      Cai and Padoa-Schioppa present a paper titled 'Neuronal Activity in Dorsal Anterior Cingulate Cortex during Economic Choices under Variable Action Costs'. They used a binary choice task where both offers indicated the reward type, reward amount, and the action cost (but not the specific action.) Variable action costs were then operationalized by placing targets on concentric circles of different radius. Here, and in a previous study that included OFC recordings (Cai and Padoa-Schioppa, 2019), monkeys integrated action costs into their decisions. Single-unit recordings in ACCd revealed that neurons predominantly coded for post-decision variables, such as cost of the chosen target and the juice type of the chosen offer, but not pre-decision variables, such as offer values. Given this finding, the authors compared the percentage of neurons in OFC and ACCd that coded for decision variables. In OFC neurons, the activity was mostly restricted to the offer presentation phase, whereas ACCd neurons showed sustained coding of chosen value and costs that lasted until the appearance of the saccade targets. Overall, this is an interesting study that provides evidence that decision-related signals evolve from coding offer values in the OFC to representing chosen costs in the ACC. This finding could highlight the roles of ACC neurons in learning and decision making. We have only a few questions.

      1) Do any of the variables used in this study correlate with a conflict? When the authors previously studied ACC, they discarded the conflict monitoring hypothesis - a hypothesis that is well established for ACC hemodynamic responses - for ACC single cell activity based on neural data from 'difficult' decisions (Cai and Padoa-Schioppa, 2012). The definition of difficulty they used, then, was descriptive and based on reaction times (RTs). They defined the most difficult trials as those trials with the longest RTs and discovered that those trials had options with similar offer values. This definition of choice difficulty appears to be contrived from evidence accumulation models/tasks, where normatively harder judgments elicit longer RTs. However, there is no normative economic reason that trials with similar offer values are more difficult or should cause conflict. After all, according to theory, choosing between two options with the same value is as easy as flipping a coin. Here, it seems like the authors could have a more fitting definition of conflict. For example, conflict can be operationalized by considering trials when the animal must choose between a high value/high-cost option and a low-value/low-cost option. In that case, the costs and benefits are in conflict. What do the RTs look like? Do the RTs indicate conflict resolution? If so, is this reflected in neuronal responses?

      We thank the reviewer for raising this important point. First, we would like to clarify that both in this study and in our previous study of ACC (Cai and Padoa-Schioppa 2012) we imposed a delay between offer presentation and the go signal. Such delay is critical to disentangle value comparison from action selection. However, the delay effectively dissociates reaction times from the decision difficulty. Normally, we operationalize the decision difficulty (or conflict) with the variable value ratio = chosen value / unchosen value. In an early behavioral study conducted in capuchin monkeys, where no delay was imposed between offer presentation and the go signal, we found that reaction times were strongly correlated with the value ratio, as one would naturally expect (Padoa-Schioppa, Jandolo et al. 2006). In the previous study of ACC (Cai and Padoa-Schioppa 2012) we referenced that earlier result but, again, we did not analyze reaction times.

      Coming to the present study, we addressed this question by including in the variable selection analyses the two variables value ratio and cost/benefit conflict = cost of A * sign(offer value A – offer value B) (see also Table 2). The results of the updated analysis are illustrated in the new Figure 4, which we include here below. In essence, including these two variables did not affect the results of the variable selection analysis. That is, both the stepwise and best-subset methods selected the variables chosen value, chosen cost, chosen juice, chosen offer location only and chosen target location only.
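
      The two added variables follow directly from the definitions given above and can be written out as a short sketch. The function and argument names are illustrative assumptions, not taken from the authors' analysis code, and the case of an unchosen value of zero is deliberately left unhandled because the response does not say how it was treated.

```python
def sign(x: float) -> int:
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def value_ratio(chosen_value: float, unchosen_value: float) -> float:
    """Value ratio = chosen value / unchosen value (decision difficulty proxy)."""
    return chosen_value / unchosen_value


def cost_benefit_conflict(cost_a: float, offer_value_a: float,
                          offer_value_b: float) -> float:
    """Cost/benefit conflict = cost of A * sign(offer value A - offer value B)."""
    return cost_a * sign(offer_value_a - offer_value_b)
```

      With these definitions, the conflict variable is positive when offer A carries the higher value, negative when offer B does, and zero when the two offer values are equal.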

      Figure 4. Population summary of ANCOVA (all time windows). (A) Explained responses. Rows and columns represent, respectively, time windows and variables. In each location, the number indicates the number of responses explained by the corresponding variable in that time window. For example, chosen value (juice) explained 34 responses in the post-offer time window. The same numbers are also represented in gray scale. Note that each response could be explained by more than one variable and thus could contribute to multiple bins in this panel. (B) Best fit. In each location, the number indicates the number of responses for which the corresponding variable provided the best fit (highest R2) in that time window. For example, chosen value (juice) provided the best fit for 40 responses in the late-delay time window. The numerical values are also represented in gray scale. In this plot, each response contributes to at most one bin.

      2) The authors claimed that the ACCd neurons integrated juice identity, juice quantity and action costs later in the trial. As they acknowledge, the evidence for this claim is marginal. The conclusion the authors made in line 211, therefore, could be moderated. Given that the model containing cost-related variables is more complex, it is equally valid and more appropriate to write '… we cannot reject the null hypothesis that action cost was not integrated by chosen value responses later in the trial.'

      We acknowledge the complexity of this claim. However, results from previous studies (Kennerley, Dahmubed et al. 2009, Kennerley and Wallis 2009, Hosokawa, Kennerley et al. 2013) are in favor of establishing a null hypothesis of integration rather than non-integration. Therefore, we feel that it is more appropriate to keep the null hypothesis of cost integration while in the meantime acknowledging that in our study the evidence for cost integration is rather weak.

    1. Author Response:

      Reviewer #1:

      Stem cells in the Drosophila ovary provide a great model to understand cell behavior and regulation due to its genetic tractability, organized morphological pattern, and ease to perform live imaging. Compared to the adult ovary which have been studied quite extensively, the pupae ovary is a much less explored stage. Here the authors extensively studied the cell lineage in the pupae ovary, which helps understanding the development of early cell fates and the formation of the first set of egg chambers. They first described the different stages of pupal ovary development, followed by several different lineage tracing experiments that conclude a subset of Intermingled Cells (ICs) as Escort Cell (EC)/ Follicle Stem Cell (FSC) common precursors. Then they described Extra-Germarial Crown Cells (EGCs) and basal stalk cells and showed by live imaging that they contribute to the first budding cyst.

      Strength:

      Several lineage tracing experiments and statistical calculations are performed to conclude that a subset of ICs function as EC/FSC common precursors. The methods are written in great detail to help understand the calculation.

      The finding that EGCs and basal stalk cells contribute to the first budding cyst is new and intriguing. This initial developmental process, different from what happens in the adult ovary, might provide insight into how germline and somatic cells are coordinated.

      Weakness:

      The authors showed in their 2017 NCB paper that FSCs contribute to ECs in adult ovary. Here they showed that there is a common precursor of EC/FSC. Are these two cell types the same? It has been shown in single cell analysis during third instar that the ICs and FSCPs present Con and bond as their specific markers (Slaidina et al. 2020). Both give rise to EC/FSC/FC in lineage tracing experiments. Therefore, the novelty of this finding is weakened.

      ECs and FSCs in adults have important different properties. ECs do not divide; FSCs do divide. ECs interact with developing germline cysts to support their progressive differentiation. FSCs are not known to have an analogous role. FSCs all have long processes that span the germarium. Most ECs have shorter processes, especially in anterior regions. Quite likely there are also many similarities between FSC and ECs, for example in gene expression profiles, since FSCs can readily become ECs.

      In adults, new ECs are continually produced from FSCs. There is evidence that the production of marked ECs from marked adult FSCs saturates over only a few days, suggesting that adult-produced ECs turn over by dying or returning to FSC status [1]. The same studies suggest that ECs produced during development do not turn over at a comparable rate. So, while there may be some conversion of adult-born ECs to FSCs it is unlikely that a significant proportion of ECs present at eclosion later become FSCs. So, overall, the relationship between adult FSCs and ECs appears to be a typical one of stem cell and maintained derivative cell.

      Our work here shows that the attribution (quoted by the reviewer) by Slaidina et al. (2020) of “FSCP” status to cells specifically expressing bond is incorrect on two counts. First, there is no such thing as an “FSCP”, dedicated to produce just FSCs and FCs at the start of pupation. Almost all FSC-producing precursors at that stage also produce ECs. Second, all FSC precursors are within the IC population at the start of pupation and remain within the developing germarium throughout pupation. They are not posterior to ICs and the germarium, like the population of bond-expressing cells noted by Slaidina et al. in late third instar larvae. Our results refute the earlier conclusions that inappropriately assumed (i) bond-GAL4 labeled only cells posterior to ICs in lineage experiments (it does not) and (ii) ovarioles with marked ECs and FSC/FCs derive from two or more distinct precursors rather than a common precursor (which we deduced by looking at lineages derived from single cells).

      Our deduction of a common precursor of ECs and FSCs is new and opposite to the conclusions of Slaidina et al., who proposed separate precursors for each at the start of pupation.

      While the finding that EGCs and basal stalks contribute to the first budding egg chamber is intriguing, the definition of EGCs and basal stalks are quite vague. Do EGCs have a distinct feature that is worth noting as separate cell types, or are they simply early FCs locating posterior of the first germline cyst? Do basal stalks express mature stalk markers, or are they simply accumulating FCs that are not fully differentiated yet?

      We have clarified EGC and basal stalk definitions in the text. Prior to budding of the first egg chamber, there are some partially intercalated Fas3-positive cells posterior to the germarium previously termed the “basal stalk”. We noticed that the most anterior of those cells expressed Traffic-Jam, which was expected to be important for all ovariole cell types (ECs, FSCs and FCs), and therefore worthy of highlighting with a different name. Further analysis showed that not only cells in the EGC but also many in the basal stalk subsequently became FCs on the first egg chamber (acquiring Traffic-jam expression at some point along the way). Thus, EGC and basal stalk designations from 0-48h APF do not define different outcomes but the separate names are still useful descriptors of cell populations in specific locations and with slightly different expression profiles during the first half of pupation.

      We did not examine whether, or exactly when, pupal basal stalk cells express markers seen in the stalks between egg chambers. Most cells in the pupal basal stalk population contribute to the FC epithelium and, with the exception of polar cells, lose Fas3 expression by adulthood. The basal stalk cells that remain posterior to the first egg chamber form the basal stalk of a newly eclosed adult and retain Fas3 expression. The stalks between egg chambers, like the FC epithelium, express Fas3 at early stages but lose Fas3 expression at later stages.

      Is the process of EGCs and basal stalk contributing to the first budding egg chamber similar to posterior FCs contributing to the budding egg chamber? Is it because the budding of the first germline cyst takes longer than normal, there are more FCs accumulating at the posterior, thus making this region look like EGCs and basal stalks?

      The two processes appear to be substantially different. In the adult, FCs are already associated with a germline cyst when it leaves the germarium. In pupae, the first germline cyst leaves the germarium without associated somatic cells and enters an accumulation of somatic cells, moving into and through the EGC and most of the basal stalk. Neither process is understood well enough to explain the underlying reasons for this difference. However, we believe that one important difference may be the absence of a posterior signaling center, provided by polar FCs of the previously budded egg chamber in adults, prior to budding of the first egg chamber in pupae. That could affect several germline and somatic cell properties relevant to adhesion among cells, accumulation of cells and cell movements.

      The presentation is too lengthy. A more concisely written paper would help the audience to get the key points that the authors hope to convey.

      We have trimmed and summarized substantially, spurred by specific comments of both reviewers.

      Reviewer #2:

      Reilein, Kogan and co-workers chart the origin and fate of the different cell types in the Drosophila ovary throughout pupal stages using a combination of mosaic analysis, live imaging and immunohistochemistry. Their results challenge some of the assumed lineage and niche relationships between adult progenitors and support cells. The authors identify progenitors that can give rise to more cell types than previously thought (e.g. precursors that yield follicle stem cells and their adult niche and product cells) and revise some cell interactions/lineage relationships (for example, by providing evidence for separate precursors of follicle cells in the first-formed egg chamber, or by observing that germline progression can be supported by developing escort cells precursors rather than differentiated escort cells). Collectively, their data suggest a gradual and flexible adoption of these cell types according to the position of specific precursors during development.

      Although mosaic/clonal analyses do not always provide clear-cut answers, the authors are fully aware of possible caveats, and have done everything in their genetic power to ensure that their interpretations are as sound as they can possibly be. This includes, for example, extensive quantifications and statistical analyses/predictions based on clone frequency, the use of multicolour marking strategies to ensure that lineages are derived from single cells, and consideration of the effects of temperature shifts on developmental times. The resulting data are invariably comprehensive, have been documented and quantified extensively, and are often accompanied by stunning images.

      In a way, the comprehensive nature of this manuscript is also its Achilles heel: it is VERY long and dense. Any readers unfamiliar with the Drosophila ovary may not take the time to digest the data. This would be a pity as the manuscript's main messages are timely. They also resonate with observations in other systems such as the mouse gut field which, collectively, are beginning to challenge concepts such as hardwired "stemness" and "genetic programs", and to rediscover the importance of positional/mechanical cues in specifying cell fate.

      The power of description is somewhat underestimated in our post-genetic revolution era, and there is a lot to be learned by carefully observing and documenting "what does happen" - as opposed to exclusively relying on genetic loss/gain-of-function experiments. This manuscript is a good illustration of this. That said, the authors' observations make a number of predictions which could be genetically tested (e.g. through temporal ablation experiments to confirm flexibility/temporal requirements, or experiments targeting specific pathways to confirm their contribution as spatial organisers). These experiments would make their revised models much more compelling.

      We appreciate these comments, especially the value placed on studying how cells behave, the difficulty of ascertaining that through lineage studies, and the call to place findings in a general context accessible to a broad group of scientists. We have tried to edit the manuscript further to achieve the last aim. However, the key virtues of thoroughness and the surprising difficulty of deducing “what does happen” from a combination of fixed and live imaging together with multiple lineage studies require careful and thorough explanations with accessible documentation. The result of our revisions is, I believe, analogous to keeping the qualities and achievements that earned Achilles’ reputation, while making the inevitable Achilles heel less obvious, though still present.

      Experiments testing the effects of altering specific signaling pathways are already underway. Some results certainly support the concept that cell locations and outcomes remain flexible, subject to external influences, through pupal development. However, such experiments, results and interpretations demand careful scrutiny and cannot practically be included in this already lengthy manuscript.

    1. Author Response:

      Reviewer #1:

      The authors address an interesting but neglected issue in pigment cell biology, concerning the developmental origin of red erythrophores, especially in relationship to yellow xanthophores, and the genetic basis for their differing pigmentation. Red-yellow colouration in vertebrates usually arises from accumulation of dietary carotenoids, and often has significant behavioural importance, e.g. as an honest signal of individual quality. This and the biochemistry of carotenoid colour variation is nicely covered in the Introduction, providing helpful background to a broad audience.

      The authors document the widespread presence of erythrophores in Danio, highlighting the unusual nature of Zebrafish within the genus as lacking them. They then develop some quantitative and objective measures of the xanthophores and erythrophores based upon Hue and Red:Green autofluorescence ratios, allowing clear distinction of the mature cell-types, and note the often binucleate nature of the erythrophores.

      The authors then use a variety of tools to assess, with differing degrees of certainty, the lineage relationships of the erythrophores; together these provide a consistent and convincing picture of shared lineage between the two cell-types. This is consistent with the observed gradual shift in properties of proximal cells from xanthophore-like to erythrophore. A more direct test of the conversion of early xanthophores to erythrophores comes from the clonal analysis of aox5:nucEosFP cells (Fig. 4). They then use a fin regeneration assay to assess the plasticity of these cells in the mature adult. This is a neat experiment, but I am struggling with the interpretation of Figure 5A: which cells are being used as landmarks to justify the conclusion that the cells shown are clonally derived from that single cell in the 5 dpa image? It may be that the full series of images could be provided in a supplementary figure and might make this clear, but the current images do not seem convincing to me. The experiment in Fig. 5B is convincing, so the conclusion seems sound.

      We added a supplementary figure (Figure 5—figure supplement 1) to show more context and nearby landmarks, including the amputation plane. We additionally swapped out the images in Figure 5A with an example that more clearly makes our point that cells seem to both lose red coloration and increase in number. Cells of both the original and the new example are visible in the new supplemental figure. Given the concern expressed we additionally modified the salient portion of the text, to make it clearer that the brightfield-only analyses were intended merely to see if a transformation is plausible, based on overt cell colors and behaviors in the absence of formal clonal analysis. The revised text reads:

      “We first assessed the possibility that transfating occurs by repeatedly imaging individual fish in brightfield, to learn whether cells near the amputation plane might lose their red color during regenerate outgrowth. Individual erythrophores could often be reidentified using other cells as well as distinctive features of fin ray bones and joints as landmarks (Figure 5A; Figure 5—figure supplement 1). As regeneration proceeded, small groups of cells having paler red or orange coloration, were sometimes observable where individual cells of deep red coloration had been found, suggestive of proliferation and dilution of pre-existing pigments. Later, only yellow cells were found in these same locations. These observations were consistent with the possibility of erythrophore → xanthophore conversion, and so to test this idea directly we marked nucEosFP+ erythrophores by photoconversion prior to amputation (Figure 5B; Figure 5—figure supplement 2A).”

      The authors then use a transcriptomic comparison to identify candidate genes influencing erythrophore vs. xanthophore differentiation. They study 3 with mutant phenotypes affecting these cell-types, identifying likely roles of 3 erythrophore genes. Whilst most of this analysis is beautifully presented, I am confused by Fig. 7, in which I think panels D and F as described in the legend are inverted.

      We fixed the relative ordering of panels and legends. We also changed the Y axis label in Figure 7F to indicate cells per 40 μm² rather than density, which might be misinterpreted to mean cells per mm.

      As is expected from this lab, the manuscript is generally very carefully and clearly written and includes thorough data presentation and statistical analysis. Conclusions drawn are appropriately nuanced, and justified by data presented. The manuscript provides an important first step in understanding the developmental relationship of erythrophores to xanthophores, and a number of genetic resources for the further exploration of this question.

    1. Author Response:

      Public Review:

      Li and colleagues used data from 2000 to 2014 in 54 low and middle-income countries (LMICs) to study the association between exposure to landscape fire smoke PM2.5 and birthweight, including very low birthweight. While there is a relatively robust epidemiological literature that supports an association between non-biomass fire smoke PM2.5 and low birthweight, there are relatively few studies that are specific to biomass smoke PM2.5 and birthweight. The authors of this paper conducted their study to specifically address this data gap. They took advantage of satellite data which provide estimates of PM2.5 levels that are now available for most locations in the world at a high geographic resolution (0.5 x 0.5 km). They enhanced the satellite exposure data using a chemical transport model to distinguish fire-sourced PM2.5 from non-fire PM2.5. The exposure modeling approach is sophisticated as is the statistical analysis of the association between the fire-sourced PM2.5 exposure estimates and birthweight outcomes.

      The study has multiple strengths, including the first study of the association between fire-sourced PM2.5 and birthweight to use a sibling-matched case-control design, a large sample size (227,948 births born to 109,137 mothers), the focus on LMICs, the exposure modeling, a careful statistical analytic approach with alternate non-linear regression and sensitivity analyses, and the outcome of very low birthweight that is one of the World Health Organization targets to reduce the global burden of disease. Limitations notwithstanding, this is an impactful study. The results of the authors' analyses provide strong support for the concept that exposure to biomass smoke -- whether from a landscape fire set by farmers, a wildfire, or cooking with solid fuels -- can lead to low birthweight. This concept is especially important for LMICs that have large portions of their populations engaging in slash and burn agriculture and/or cooking with solid fuels. Given that reducing the incidence of low birthweight is necessary to meet the 2025 United Nations Sustainable Development Goals, it is incumbent that policies to reduce landscape fires and household air pollution from cooking with solid fuels be considered by governments of LMICs. Such policies would also have a climate change mitigation benefit through reduction of greenhouse gases and aerosols.

      Future research efforts to actually measure landscape fire smoke PM2.5 in different locations to provide ground-truthing for the chemical transport model exposure estimates used by the authors would be useful as would a study that could obtain gestation duration data.

      We thank the reviewer for pointing out the strengths of this study. We agree with the reviewer on its shortcomings, particularly the exposure misclassification that can arise for multiple reasons. We have revised the manuscript accordingly and expanded the discussion of the study's limitations.

    1. Author Response:

      Reviewer #2 (Public Review):

      This manuscript details an investigation into whether blinding NIH grant reviewers to the applicant's name and institution may affect their review scores. They demonstrate that unblinded grants lead to slightly higher scores for White applicants than for Black applicants; however, a deeper dive demonstrates that grantsmanship and history of prior funding can be even greater predictors of scores regardless of race.

      Overall the manuscript touches on a topic currently in vogue, namely equality in outcomes and systemic racism. The major limitations of the study, however, ironically demonstrate the very topic that the manuscript tries to address. There are no considerations in the manuscript or mention of applications from Asian, Hispanic or Native American applicants, as the authors distill the problem literally down to only Black and White.

      We now incorporate more of this perspective.

      We rewrote the introduction, adding information about funding rates for Hispanic and Asian PIs (the 2 largest groups of minority applicants), and provided a stronger explanation for why this study focused on Black-White differences only (lines 86-95). Our aim was to provide a broader context while keeping the intro reasonably focused. Demographic differences in patterns of application numbers, review outcomes, and funding success are a complex topic, not easily presented concisely. More importantly, we think that this information, while no doubt of interest to some, is not relevant background to the experiment at hand. We tried to strike a balance between context and focus.

    1. Author Response:

      Joint Public Review:

      In this manuscript, the authors analyze multiunit data recorded from macaque motor cortex, and compare the data with theoretical results of a network model that is close to a critical point. Their analysis uncovers two main features of the data: (1) Covariances between spike counts of pairs of neurons depend only weakly on distance, while one would expect a much stronger dependence given the scale of local axonal and dendritic arborizations; (2) Patterns of covariances are dynamic, and differ significantly between different epochs of the behavioral task.
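      The first feature above, pairwise spike-count covariance as a function of the distance between neurons, can be illustrated with a minimal sketch. This is not the authors' analysis pipeline: the function names, the dictionary layout of the inputs and the use of electrode coordinates in millimetres are all assumptions made for illustration.

```python
import math
from itertools import combinations


def covariance(x, y):
    """Unbiased sample covariance of two equally long count sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def covariance_by_distance(counts, positions):
    """Pair each neuron pair's spike-count covariance with its distance.

    counts:    dict mapping neuron id -> list of spike counts, one per trial
               (hypothetical input layout)
    positions: dict mapping neuron id -> (x, y) electrode position in mm
               (hypothetical input layout)
    Returns a list of (distance, covariance) tuples sorted by distance.
    """
    pairs = []
    for i, j in combinations(sorted(counts), 2):
        distance = math.dist(positions[i], positions[j])
        pairs.append((distance, covariance(counts[i], counts[j])))
    return sorted(pairs)


# Toy data: three neurons, three trials each.
toy_counts = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 2, 1]}
toy_positions = {"a": (0.0, 0.0), "b": (0.4, 0.0), "c": (0.0, 1.2)}
toy_result = covariance_by_distance(toy_counts, toy_positions)
```

      Plotting the covariance against the distance for all pairs would then show whether the magnitude of the covariances falls off with distance or, as reported here, depends on it only weakly.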

      To understand these findings, they turn to a spatially extended network model. The analysis of this model is performed using an extension of tools introduced by a subset of the authors in a recent publication, that analyzed a network with no spatial structure. The authors show that the first feature can be obtained in their model provided the network is close to a critical point, and that the second feature is also observed in their network when external inputs to the network are epoch-dependent.

      The recordings are from a standard Utah array and reveal correlations across millimeters during either rest or the task. While the heavy-tailed distribution of both positive and negative correlation is striking, it is not unexpected. Long-range anatomical connections cannot be completely ruled out.

      The modeling and analytical results reveal how a network with spatially heterogeneous connections can give rise to a heavy-tailed scaling in the correlation. While long-range correlations arising from a disordered model near a critical point are not surprising, the analytical results obtained here are thorough and show how to obtain rigorous approximations even with heterogeneous 2D models.

      The result that the long-range covariance structure in the primate cortex changes during different stages of a reach-to-grasp task is the most intriguing finding in the paper. While more needs to be done to reveal the "why" of this change in network structure and its impact on neural computation, this work shows that this kind of careful dissection of network state should be explored further.

      The generality of the result beyond motor cortex is argued for and reasonable, though other data would be needed to substantiate this claim.

      We thank the reviewers for this summary, with which we agree overall. We would only like to comment on two points:

      1.) The long-range correlations are found with either sign and similar magnitude, independent of the involved neuron types (excitatory / inhibitory). In the revised version we discuss that one would expect i) different long-range correlations for excitatory and inhibitory neurons if correlations were predominantly driven by long-range connections; ii) more static correlation patterns if they were caused by direct connections. Still, we of course agree that long-range connections should have an effect on the investigated measures, and that their analysis is highly interesting but also challenging.

      2.) The positive and negative long-range correlations of equal magnitude are a specific feature of the critical point generated by the disorder in the studied system; this is why our work required the development of new theory. This feature also distinguishes the proposed mechanism of critical dynamics from criticality in homogeneous systems, where long-range correlations are typically positive.

    1. Author Response:

      Reviewer #1:

      The paper describes the development of a mechano-chemical model for plant root development that incorporates mechanical aspects of cell growth and division, chemical aspects of (amongst other auxin processes) polar auxin transport and auxin patterning, and the feedbacks between these processes. As such it presents a significant advance relative to other root models that have focussed predominantly on either the mechanical or auxin patterning aspects of root development, as evidenced by the potential of the model to reproduce a series of hormonal and mechanical perturbation experiments. Additionally, the efficient flexible manner in which mechanics are incorporated make the model potentially suitable for studying e.g. conditions in which aberrant division planes lead to additional cell file formation, or for studying tissue shape regeneration after root tip excision.

      Still, the claim that this study reveals a set of minimal principles for self-organized root tip patterning in which interplays between mechanics, growth and auxin patterning are essential is less strongly substantiated. By superimposing an auxin source in the middle vasculature cells and an auxin sink in the topmost outer cells the authors effectively impose the polar auxin transport directions of the tissue outside the simulated domain, which likely causes the simulated domain to align with this rather than displaying truly self-organized patterning.

      We thank the Reviewer for recognizing our work and for suggestions that helped to improve this manuscript.

      Most plant organ models include some sort of boundary conditions to mimic the connection to the rest of the organism (e.g. Mahonen et al., 2014; Grieneisen et al., 2007; Band et al., 2012). In fact, the root is connected to the hypocotyl, and auxin must flow in and out to reflect this natural phenomenon. We have added clarification on that issue in the revised manuscript and supplemental section (Lines: 215-222 and 497-503). We therefore believe this assumption is biologically sound and necessary to retain continuity of the model. Furthermore, cell polarity is not established by sources or sinks but by local mechanisms that act at the level of each individual cell and interpret incoming signals (flux or concentration). Without that local interpretation cells are blind: they do not sense global gradients, and everything is local. Again, as suggested by the Reviewer, we have weakened the claim of entirely self-organizing phenomena in the manuscript.

      Also, in the model mechanics has been made to impact PIN polarity, but it has not been demonstrated if in absence of PIN polarity dependence on mechanics PIN patterning would be different, i.e. if the mechanical feedback on PIN patterning is necessary or rather that the source and sink prepattern dominate.

      Both components are necessary. Mechanics constrains where PINs can be deposited, whereas auxin defines which side is preferred (either flux or regulator-based) and controls the rate of cell growth. When we remove the mechanical feedback on PIN polarity, growth is less robust (as auxin alone then defines growth rates and final PIN polarity; see Fig. 2 - supplement 7). Results of severely perturbed mechanics are presented in Fig. 5E. We added clarification on that matter in the revised manuscript (Lines: 267-274).

      Similarly, the feedback of auxin on mechanics is I believe limited to cellular auxin levels determining cellular growth rates and does not appear to control cellular growth anisotropy (i.e. predominant longitudinal growth of cells), which arose from the initial symmetry breaking of differential growth rates. As such the actual coupling between mechanics and auxin patterning and the extent of self-organization is less than suggested.

      Indeed, we thank the reviewer for spotting this potential confusion. We revised the text to explain that particular matter better (Lines: 226-230). However, it is important to mention that the effect of auxin on cell growth speed (although non-polar) will have consequences for overall organ shape and thus its mechanical properties. In other words, there is always coupling between the mechanical and biochemical layers in our model.

      Some matters are rather unclear in the manuscript in its current form. It is for example unclear how exactly auxin translates into cellular growth rate and how this results in stable, coordinated growth across cell files. In a previous study it was shown that since auxin levels differ significantly across cell files (e.g. much higher in vasculature than in neighboring cell files), problems in coordinated cell growth may occur (https://pubmed.ncbi.nlm.nih.gov/25358093/). It is unclear how such problems are avoided here.

      We thank the reviewer for raising this matter. We now explain the relation between auxin levels and growth rates (Lines: 226-230). The suggested study seems to work differently (for instance, cells tend to slide against each other), and it is therefore difficult to compare with our approach. In our model cells share a common wall with their neighbors (as in real plant tissues), so cell sliding is impossible. Moreover, cells are considered almost incompressible objects (as they are filled with water). This is mimicked in our model by the internal meshing of the cell, which provides resistance to compression and shearing through the action of the distance constraint. This model feature is described in bullet point 1 of the supplemental section: Position-based dynamics implementation. To conclude, we have never seen that problem in our simulations, as the biomechanics layer always compensates for overgrowing cells.
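      For readers unfamiliar with position-based dynamics, the distance constraint mentioned above can be illustrated with the textbook projection step for a single pair of mesh vertices. This is a minimal sketch of the generic technique, not the authors' implementation: the function name, the 2D tuple representation, and the `inv_mass` and `stiffness` parameters are assumptions made for illustration.

```python
import math


def project_distance_constraint(p1, p2, rest_length,
                                inv_mass1=1.0, inv_mass2=1.0, stiffness=1.0):
    """One position-based dynamics projection of a distance constraint.

    Moves two 2D points along the line joining them so that their
    separation approaches rest_length. Corrections are weighted by the
    inverse masses, so a point with inv_mass == 0 stays pinned.
    All parameter names are hypothetical.
    """
    dx = p1[0] - p2[0]
    dy = p1[1] - p2[1]
    dist = math.hypot(dx, dy)
    total_inv_mass = inv_mass1 + inv_mass2
    if dist == 0.0 or total_inv_mass == 0.0:
        # Direction undefined or both points pinned: nothing to correct.
        return p1, p2
    # Signed violation: positive when stretched, negative when compressed.
    violation = dist - rest_length
    nx, ny = dx / dist, dy / dist
    scale = stiffness * violation / total_inv_mass
    new_p1 = (p1[0] - inv_mass1 * scale * nx, p1[1] - inv_mass1 * scale * ny)
    new_p2 = (p2[0] + inv_mass2 * scale * nx, p2[1] + inv_mass2 * scale * ny)
    return new_p1, new_p2


# Two vertices stretched to twice their rest length are pulled back together.
a, b = project_distance_constraint((0.0, 0.0), (2.0, 0.0), rest_length=1.0)
# With the first vertex pinned, only the second one moves.
pinned, moved = project_distance_constraint((0.0, 0.0), (2.0, 0.0),
                                            rest_length=1.0, inv_mass1=0.0)
```

      Applying such projections repeatedly over all edges of an internal mesh is what makes a meshed cell resist compression and shearing in this family of methods; a stiffness below 1 gives a softer response.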

      Similarly, in the discussion the authors mention possible uses of the model such as studying tropisms. However, the latter requires incorporating an elongation zone in which cells undergo rapid and extreme cell elongation. It seems that the current model only incorporates slow cytoplasmic cell growth and division occurring in the meristem, and it is unclear whether the model formalism used would be capable of simulating these far more anisotropic growth processes in a numerically stable and efficient manner.

      Although this is outside the scope of this study, we are exploring those possibilities in the current framework. The current implementation of the model will require the addition of new features in order to allow modelling of the elongation zone. Most notably, it would be necessary to dynamically remesh the elongating cells in order to avoid the appearance of extremely thin and long triangles, which would cause numerical instabilities in the system solutions. Therefore, extending the current model to incorporate alternative biological processes such as rapid cell elongation is feasible but requires the implementation of new computational procedures, and is thus a matter of ongoing research.

      In summary, the authors have generated a highly valuable combined mechano-chemical modeling framework for root tip development that can be used for various applications, but that was somewhat oversold.

      Reviewer #3:

      Marconi, M et al. developed a new mechano-biochemical computational framework to study plant morphogenesis. Positional information is self-organized by a diffusing substance that regulates the acquisition of cell polarity and modulates cell growth by changing the cell's biomechanical properties. The model for the root meristem functioning in Arabidopsis thaliana is composed of a minimal set of experimentally derived principles for self-organization of organ patterning. This study is an excellent methodological achievement that also brought some new biological results. Although all the results are there, the manuscript requires major revision to present the framework better to avoid miscommunication.

      We thank the Reviewer for the positive assessment of our work and for valuable comments on how to improve this manuscript.

      Framework:

      Good: The framework looks very promising to study tissue morphogenesis (not necessarily in plants). So it makes sense to make it available for the scientific community. Now the code can be downloaded from Google Drive using a password; I encourage the authors to add a permanent link and the tutorial to the framework in the final version of the paper.

      Not good: The manuscript structure does not allow us to see all the advantages of the framework; the information about it is spread throughout the text. The main text misses essential details, while the materials and methods section contains a lot of discussion. E.g., CTM are mentioned several times in the results section without explaining how you modeled them. Another example is on line 188, where the reference about "put together growth biomechanics" leads us to Figure 1 without any details about that. I would suggest adding a first section of the Results and the main figure about the framework, its basics, advantages, and limitations. Also, to add a bit more about PBD technology.

      We very much value the Reviewer's suggestions; however, our intention was to build up the story starting from initial symmetry breaking through pure mechanics and later add the polar transport to complete the framework. We believe revealing all aspects of the model at once would overwhelm the reader, and we would very much like to avoid that. Otherwise, as suggested, we revised the manuscript to clarify assumptions and model elements.

      Root model:

      Good:

      • The application of the model delivered some new and important results for plant biologists. E.g., the authors confirmed the hypothesis that the start of anisotropic root growth (elongation) results from the differential expansion of neighboring tissues.

      • They also showed self-organization of a complex auxin distribution (PIN polarization) map, which reproduces tiny properties like the switch in PIN polarity from rootward to shootward in the cortex.

      • The authors also showed that auxin reflux in the meristem is not that important for auxin maximum maintenance under normal conditions but gives an advantage in shoot-independent growth.

      • The fact that the authors were able to "grow" a "whole root" from the embryonic-like structure under a limited number of predefined rules is inspiring and promising.

      Can be better:

      1) I am not sure that I understand the visualization of PINs in the figures. The authors do not distinguish between PIN levels and PIN polarities. It is clear that they managed to model PIN polarities correctly, but I am not sure about PIN levels. If PIN levels are shown by red rectangles, then columella does not have any PINs (which is incorrect). Is it so?

      Figures in the PDF may have been of lower quality due to conversion. We now provide high resolution figures with the revised version of the manuscript. Columella PINs, although weakly expressed (low auxin), are clearly visible under magnification as thin rectangular elements.

      2) It sounds great that you simulated an oscillating behavior in the root growth, but actually, you did not. Simulating periodic application of auxin treatment to simulate oscillation is a trivial solution, that only sounds great, but disappoints when you look into detail. Instead, I would suggest simulation of restoration of root growth after it is inhibited by auxin application.

      We apologize for the misunderstanding. Experiments by Fendrych et al., 2018 use external auxin applications to modulate growth inhibition in a periodic fashion. We intended to replicate these particular experiments using our framework. We have clarified this description in the revised manuscript (also pointed out by Reviewer 2) (Lines: 327-333).

      3) Looking at the resulting solution, it looks like the auxin maximum in QC emerges because one QC cell adopts auxin from two vascular cells. Or maybe because columella does not have PINs. If it is so, it has to be stated; otherwise, there is an impression that the auxin maximum self-organizes due to flux pattern only.

      In fact, auxin does come from the vasculature and the columella does have PINs; the auxin maximum is an emergent property of auxin transport in our model.

      4) It also looks like the QC does not grow during calculation; if so, it has to be stated, because then it is one of the factors that "eventually reproducing the non-trivial shape of the root" (line 214). The rules to get "the non-trivial shape" should be explicitly stated. E.g., how did you get that only columella stem cells have divided and not other columella cells that are bigger in size?

      Indeed, we now specify that the QC neither grows nor divides, and that columella cells likewise no longer divide because they are differentiated. All these assumptions are supported experimentally, as noted in the revised text (Lines: 236-237).

      5) I noticed the formation of left-to-right asymmetry in auxin distribution in the root meristem that was self-organized. It will be great if you elaborate on that more. Especially considering that you simulated pin2 mutant where this asymmetry got lost.

      We checked that left-right asymmetry is not a prevalent pattern; it may change in WT-like simulations.

      6) I was also quite excited looking at the bending root (Figure 2 - supplement 8) correlated with this left-to-right auxin distribution asymmetry. It will be great if you elaborate on the root bending more in the manuscript.

      The root bending is just a spurious event; roots in the model tend to bend after a while (currently there is no gravity response integrated in the model, so roots are not forced to grow straight).

      Not good:

      1) There is a mess with anatomical terminology usage in the manuscript. The authors name the basal part of the embryo as the root or radicle, which is incorrect. There is no root or hypocotyl at the heart stage and even at the torpedo or bending-cotyledon stages. Consider starting growing the root from the mature embryo stage and not from the heart one. Otherwise, you study the formation of "apical-basal polarity" in embryo development and not root development.

      We thank the Reviewer for this suggestion and have modified the text accordingly. We now refer to the organ as the basal part of the embryo (BPE) (Line: 131).

      2) Another problem with the terminology is that the definitions are sometimes given later than they were used, and there is a lot of introduction in the results and material and methods section, which disturbs the comprehension of the text. E.g., QC is introduced in the last results chapter (line 324).

      We have inspected that issue and found that the QC is introduced at Line: 236, together with its full name.

      3) The only experimental data given in the manuscript is an analysis of hypocotyl and root growth in the seedling soon after germination. Using this data to confirm the results on "growing the root" from the heart-stage embryo is confusing. The results of the experimental data analysis are also quite obvious and did not require an experiment :). Having Eva Benkova among the co-authors, the manuscript should be supplemented with better experimental data that confirm or demonstrate the modeling results' appropriateness. The authors refer to confirmations from the published data in the rest of the text instead of explicitly comparing computational and experimental results. The manuscript will certainly benefit from such a direct comparison.

      Actually, it would not be bad if the authors did not use the experimental data at all, because all the facts discussed are well known; the authors can give just references. It just looks strange why only one (and not very relevant) experiment is shown from a vast collection of Prof. Eva Benkova.

      As indicated by the Reviewer, the majority of the work is computational and is tested against experimental observations. An additional experiment was done to support the model predictions regarding the symmetry-breaking part, which had not been published before and, in our opinion, strengthens the manuscript.

    1. Author Response:

      Reviewer #1:

      This work demonstrates that functional KATP channels exist in most neuronal cell types in the mouse somatosensory cortex. While the transcriptomic profiling of electrophysiologically characterized neurons is only indicative of the existence of the Kir6.2/SUR1 KATP channel, the acute slice pharmacological/electrophysiological experiments convincingly support this notion.

      The uncertainty of single-cell RT-PCR is likely due to a small amount of starting material inherent to the sample collection method. As the authors discuss, low copy numbers of target transcripts may also have contributed to the negative/uncertain results.

      We fully agree that scRT-PCR analysis underdetected Kir6.2 (kcnj11) and SUR1 (abcc8) mRNAs. This is likely due to their low abundance at the single-cell level, the sample collection method and the low efficiency of the reverse transcription (RT).

      As requested by reviewer 2, we now report the low detection rate of these subunits in neurons responsive to diazoxide and tolbutamide and acknowledge the limitation of scRT-PCR (pages 7,8, lines 34,1-6).

      We have also improved the discussion by providing the copy number of these mRNAs detected by single cell RNAseq (Zeisel et al. 2015, DOI: 10.1126/science.aaa1934, data available online https://linnarssonlab.org/cortex/) and the estimated sensitivity limit of the scRT-PCR (page 13, lines 29-33).

      Next, the authors demonstrate that lactate is taken up by neurons and elevates the discharge rate via an increased ATP production due to the oxidative metabolism downstream of lactate, which is in line with earlier studies including Ivanov et al. (2011, doi: 10.3389/fnene.2011.00002).

      We thank the reviewer for pointing out this reference, which we have added to the discussion (page 17, line 16).

      The authors showed this by introducing 15 mM lactate, and discuss a possibility that extracellular lactate can be elevated by a systemic increase of lactate. However, such an increase is likely more modest in the brain (Carrard et al., 2018, doi: 10.1038/mp.2016.179). So, the lactate-enhanced firing might occur in extreme conditions such as during anoxia or ischemia; however, intracellular ATP would most probably decrease and hence KATP channels would open in this case. A discussion on extracellular lactate levels in physiological conditions would be helpful.

      We have improved the discussion on the physiological extracellular level of lactate which can be as high as 5 mM at rest. Since during neuronal activity lactate levels are almost doubled (i.e. up to 10 mM), lactate-enhanced firing might occur under physiological conditions (page 18, lines 9-13). We agree that a systemic lactate increase modestly elevates its extracellular concentration to a level with little or no effect on firing rate. Accordingly we now also quote references reporting this observation. Nonetheless, peripheral lactate could represent an additional source facilitating lactate-sensing when both the brain and the body are active, as during physical exercise (page 18, lines 13-19).

      Overall, this is a rigorous study that confirms the existence of functional KATP and dominant oxidative metabolism in most types of juvenile somatosensory cortical neurons.

      Thank you.

      Reviewer #2:

      The authors present an impressive array of experiments testing the effect of lactate on a number of neocortical cell types. They uncover a mechanism by which lactate might enhance neuronal firing although direct physiological relevance needs further support for CSF lactate concentrations. Most of the experiments are sound and interesting and the remaining experiments have limitations inherent to the methodology and presented accordingly in the discussion. The results are convincing, however a number of specific points need to be addressed.

      We thank the reviewer for the specific points raised that helped us to improve and clarify the manuscript.

      Specific points:

      • Page 6 line 21 onwards. The authors state consistent expression of Kir6.2 and SUR1 in various cortical cell types. Data presented in Fig1 challenge this statement, showing that Kir6.2 and/or SUR1 was expressed in the minority of cells tested regardless of cell type. For example, out of the 10 intrinsically bursting cells shown in the Ward cluster plot on Fig1A-B, only two were positive for Kir6.2 according to Fig1D. Surprisingly, Fig1F shows that 10% of intrinsically bursting cells express Kir6.2, which is clearly not the case (it is 20%).

      We thank the reviewer for pointing out this apparent inconsistency. Indeed, Fig. 1D showed two intrinsically bursting cells that appeared positive for Kir6.2. However, one of them was also positive for the genomic control and was discarded from the calculation of the detection rate, as already discussed (pages 13,14 lines 34,1-5). For the sake of clarity, Fig. 1D now depicts potential Kir6.2 false positives as shaded colored rectangles.

      Amplification was used for the detection of mRNAs by the authors, thus it is unlikely that detection threshold plays a role in having Kir6.2 or SUR1 negative cells.

      We agree that PCR amplification can detect a single DNA molecule (e.g. Li et al 1988, DOI: 10.1038/335414a0). However, the low reverse transcription (RT) efficiency is an important limiting factor for the mRNA detection by scRT-PCR. In addition, dendritic mRNAs are almost inaccessible to the harvesting from a somatic patch pipette, thereby decreasing the detection rate. Similar issues of mRNA detection by scRT-PCR have been reported for neuropeptide receptors despite a functional expression in a majority of recorded pyramidal cells (Gallopin et al. 2006, DOI: 10.1093/cercor/bhj081). scRT-PCR detection limit was estimated to be around 25 molecules of mRNA in a previous study quantifying at the single-cell level AMPA receptor mRNAs harvested in the patch pipette (Tsuzuki et al. 2001, DOI: 10.1046/j.1471-4159.2001.00388.x).

      We have now improved the discussion by providing the copy number of Kir6.2 (kcnj11) and SUR1 (abcc8) mRNAs detected by RNAseq from single isolated cells (Zeisel et al. 2015, data available online https://linnarssonlab.org/cortex/). The estimated sensitivity limit of the scRT-PCR is also now provided (page 13, lines 29-33).

      Along the same vein, amplification makes it difficult to understand what the authors mean by "low copy number at single cell level". Specifically, the sentence (p6l22-25) is self-conflicting suggesting reliable detection of KATP subunits yet downplaying the significance of moderate single cell detection rates.

      Since the point on the "low copy number" is now discussed in more detail the sentence has been removed from the results section. To avoid confusion between detection and expression we now use only "detection" for scRT-PCR data and "expression" for functional data. Accordingly, in Figures 1F, 3B, 6A and S5, "Occurrence" was changed to "Detection rate".

      I think a moderate statement with percentages of expression would adequately describe the findings with an emphasis on potential variability between individual cells regardless of cell type. Throughout the text, the authors should avoid the use of uniform expression of KATP channels in neurons.

      • Page 6 line 30. The authors conclude co-expression of Kir6.2 and SUR1 subunits. Fig1D shows that out of approximately n=71 Kir6.2 positive cells and n=28 SUR1 positive cells only n=16 express Kir6.2 and SUR1 together, and the evidence presented shows that n=83 cells do not co-express Kir6.2 and SUR1. Again, the conclusion in the manuscript seems biased towards the minority of cases and does not reflect the overall dataset. Accordingly, the suggestion that neurons and beta cells use the same KATP channel is not supported (p6l32).

      The statement has been toned down as follows (page 6, lines 21-27): "Apart from a single Adapting NPY neuron (Figure 1D), where Kir6.1 mRNA was observed, only the Kir6.2 and SUR1 subunits were detected in cortical neurons (in 25%, n=63 of 248 neurons; and in 10%, n=28 of 277 neurons; respectively). The single-cell detection rate was similar between the different neuronal subtypes (Figure 1F). We also codetected Kir6.2 and SUR1 in cortical neurons (n=14 of 248, Figure 1D) suggesting the expression of functional KATP channels."

      We have also avoided describing KATP channel expression as uniform throughout the text, and we no longer refer to pancreatic beta-cell-like KATP channels in the results section.
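
      The percentages in the quoted passage follow directly from the reported counts. As a quick arithmetic check, here is a minimal Python sketch that uses only the numbers quoted above; the dictionary and function names are illustrative and are not part of the study's analysis code.

```python
# Counts quoted in the revised Results text: (positive cells, cells tested).
counts = {
    "Kir6.2": (63, 248),
    "SUR1": (28, 277),
    "Kir6.2 + SUR1 (codetected)": (14, 248),
}


def detection_rate(positive: int, tested: int) -> float:
    """Return the scRT-PCR detection rate as a percentage."""
    return 100.0 * positive / tested


rates = {name: detection_rate(pos, n) for name, (pos, n) in counts.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f}%")
```

      Rounded to whole percentages, the first two rates match the 25% and 10% given in the quoted text; the codetection rate is lower than either single-subunit rate.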

      • KATP channel presence in neurons. With respect to the points above, it would be helpful to see in the results section and possibly on Fig2 whether there is an electrophysiological indication of pharmacologically unresponsive cells. This would help in assessing the relative sensitivity of the two approaches. Fig.2G is helpful here, however signal to noise is hard to assess in the current version in individual experiments. Please state if single cell PCR was performed on any pharmacologically examined cells.

      We now clearly report that all neurons pharmacologically analyzed in voltage clamp were responsive to diazoxide and tolbutamide. We also mention the range of the effects of these KATP channel modulators on membrane resistance and whole-cell current (page 7, lines 12-15).

      We thank the reviewer for suggesting that we state whether scRT-PCR was performed on pharmacologically examined cells, which helps to evaluate the relative sensitivity of scRT-PCR and pharmacological/electrophysiological experiments. We now report the number of neurons pharmacologically characterized and successfully analyzed by scRT-PCR (pages 7,8, lines 34,1-6). All these neurons were found to express functional KATP channels, but Kir6.2 and SUR1 subunits were detected in only a minority of them. We thus conclude that scRT-PCR underdetects these mRNAs.

      Fig3B recapitulates the results of Fig1 that only a small fraction of RS cells express Kir6.2 and SUR1.

      Since scRT-PCR is less sensitive than electrophysiological investigations, as just discussed above, the absence of detection of mRNAs does not mean an absence of functional expression of KATP channels. The absence of outward ATP-washout current in Kir6.2 KO neurons, in marked contrast with neurons from wild-type mice, supports the notion of a widespread functional expression of Kir6.2-containing KATP channels in cortical neurons. To avoid the confusion between detection and expression, we have reformulated the sentence (page 8, lines 11-12) as follows: "We first verified that Kir6.2 and SUR1 subunits can be detected in pyramidal cells from wild type mice by scRT-PCR".

      In spite of the clever pharmacological design, due to limitations inherent to spatially nonspecific drug application methods, one cannot exclude that the results measured on individual cells could also reflect network interactions with astrocytes and/or neurons; this should be discussed.

      We agree with the reviewer that bath applications of drugs can induce network effects leading to potentially confounding results. However, the kinetics and biophysical properties of the whole-cell currents recorded during pharmacological manipulations do not support such a network effect. This possibility is nonetheless now discussed (page 13, lines 18-23).

      We have also discussed the possibility that the blockade of lactate transport by 4-CIN could reflect an impairment of lactate uptake by neurons but also of lactate release by astrocytes. However, under our conditions the contribution of astrocyte-derived lactate is expected to be negligible (page 16, lines 10-18).

      • Lactate concentration in blood vs CSF. As the authors point out, there is a discrepancy in glucose concentration between the blood and CSF, yet they use lactate concentrations measured in the blood (and not in the CSF) during exercise in their experiments. The physiological relevance of these experiments is unclear unless there is evidence that lactate concentration in the CSF is indeed in the range found effective here.

      We thank the reviewer for pointing out the discrepancies between plasma and extracellular levels of glucose vs. lactate. Although surprising at first, and in contrast to glucose, the extracellular lactate level is higher than its plasma level. Such a difference most likely reflects the ability of the brain to produce lactate but not glucose.

      As also requested by reviewer 1, we have improved the discussion on the physiological extracellular lactate level, which can be as high as 5 mM at rest. Since during neuronal activity lactate levels are almost doubled (i.e. up to 10 mM), we believe that lactate-enhanced firing might occur under physiological conditions (page 18, lines 9-13).

      We have clarified the rationale for the lactate concentration used, which was chosen to be isoenergetic with 10 mM glucose by providing the same number of carbon atoms (page 10, lines 4-5).

      We also discuss the possibility that peripheral lactate could represent an additional source facilitating lactate-sensing when both the brain and the body are active, as during physical exercise (page 18, lines 13-19).

      • MCT1 and MCT2 expression and widespread lactate effects. Here, the authors admit that relatively low single cell detection rates were observed for MCT1 (19%) and MCT2 (28%). It seems consistent (and a bit worrisome) throughout the manuscript that expression of mRNAs additionally tested functionally have a limited range of PCR detection yet (again) ubiquitous presence was found when tested pharmacologically.

      Similar to KATP channel subunits and as reported by single cell RNAseq data (Zeisel et al. 2015, DOI: 10.1126/science.aaa1934, data available online https://linnarssonlab.org/cortex/), MCT1 (slc16a1) and MCT2 (slc16a7) are expressed in cortical neurons at a copy number below the detection limit of scRT-PCR.

      We now discuss the discrepancy between MCT1 and MCT2 detection and the widespread lactate effects, which is most likely due to the low abundance of these transcripts at the single cell level (pages 15,16, lines 32-34, 1-6). We also provide a counterexample with LDH subunits, which are expressed at higher single-cell levels, and for which a higher scRT-PCR detection rate was found to match the functional data (page 16, lines 6-9).

    1. Author Response:

      Reviewer #2:

      This paper investigates cell size-dependent regulation of G1/S cell cycle transition in budding yeast, with a focus on the relationship between the activator Cln3 and the inhibitor Whi5. A prominent 2015 paper proposed that cell growth dilutes the inhibitor Whi5 while Cln3 levels remain constant. This 'inhibitor dilution' model has been challenged by several recent papers. In the present paper, Sommer et al. perform a series of quantitative western blots of whole cell extracts from synchronized cell cultures. They show that Cln3 concentration increases 10-fold before bud emergence (i.e. G1/S) but Whi5 concentration is largely constant, at least in rich media. Similar results were obtained in poor carbon media with a smaller increase in Cln3. These data argue against the inhibitor-dilution model and indicate that Cln3 levels are tuned by carbon availability and cell growth rate. Interestingly, Cln3 increases are not dependent on actin-based growth or bud emergence, but rather depend on membrane trafficking and TORC-SGK signaling. A series of experiments altering ceramide synthesis identify a link with Cln3 synthesis, although it remains unclear how directly this ceramide-Cln3 connection occurs.

      The combination of results in this paper represent a significant contribution to the field. Major strengths include the careful quantitation of Whi5/Cln3 levels, and the clear effects on Cln3 from membrane trafficking events. I also appreciated the balanced tone of the text, which describes the strengths and weaknesses of each experiment and interpretation. I have a series of comments/concerns that could be addressed to strengthen the paper, as described below.

      1) I understand why cells were pre-grown in poor carbon media for these experiments, but it seems important to know how Cln3 and Whi5 levels change for cells pre-grown in rich media. Otherwise, each paper reporting different results for Cln3/Whi5 could be dismissed as using a unique set of growth conditions. Along these lines, it would be ideal for the authors to test Cln3/Whi5 levels in their western blot assay using the same strain background and media as the Schmoller paper. It would be very interesting if the inhibitor-dilution model were observed under these conditions, whereas alternative mechanisms like Cln3 accumulation were observed under other conditions.

      We attempted to grow cells in YPD, isolate small unbudded cells, and then release the cells back into YPD. However, we found that it was not possible to isolate a uniform population of small unbudded cells under these conditions. The problem is that very little growth occurs in G1 phase in YPD so that newly born cells are nearly the same size as mother cells (PMID: 28939614). This, combined with the normal variation in cell size observed in wild type yeast, means that elutriation yields a mix of unbudded and budded cells. Others have faced the same problem (PMID: 31685990, 10728640). The fact that so little growth occurs in G1 phase in YPD is an additional argument against the idea that dilution of Whi5 plays a substantial and general role in cell size control.

      As an alternative, we grew cells in complete synthetic medium (CSM) containing 2% glucose. Under these conditions, cells grow more slowly and are smaller because CSM is limiting for nutrients other than glucose. We isolated small unbudded cells and released them into the same medium so that there would be no shift in carbon source. We found that Cln3 levels increased 3-fold, while Whi5 levels did not change substantially, similar to the effects observed in YP medium containing poor carbon. These data are shown in a new figure (Figure 1 – figure supplement 2). In addition, we have included new text to highlight these issues and how they can influence interpretation of the results.

      We agree that it could be interesting to see how Cln3 and Whi5 behave in the mutant background and media conditions used by Schmoller et al. However, we were concerned that any behavior observed only in the bck2∆ background would say more about the effects of bck2∆ on accumulation of Whi5/Cln3 than it would about how cell size control works in wild type cells. Therefore, to limit the number of time-intensive elutriation experiments that we needed to complete the manuscript we would prefer to leave this experiment for others to complete if they are interested.

      2) The authors over-express WHI5 to test the inhibitor-dilution model. Their results dovetail with a recent study from the Murray lab (Barber et al., PNAS) suggesting that cells are not very sensitive to Whi5 levels. However, one can envision mechanisms (e.g. PTMs) that inhibit Whi5 molecules when expressed beyond their physiological concentration. Instead, it would be interesting to know what happens in WHI5/whi5 heterozygous diploid mutants that cut Whi5 levels in half. Perhaps this experiment exists in the literature, but it would be an ideal setting for the authors to perturb the inhibitor-activator ratio, and test Cln3/Whi5 protein levels along with cell size in synchronized cultures.

      We were not able to find an analysis of the size of WHI5/whi5∆ cells in the published literature. We carried out the analysis and the data are shown in a new figure panel (Figure 3C). The effect is small – deletion of one copy of WHI5 in a diploid strain caused only a 0.9% decrease in median cell size. These data nicely complement the data showing little effect of 2xWHI5 on cell size. We were surprised that we did not think to do this simple experiment, and we were also surprised that we couldn’t find it in the literature. We thank the reviewer for suggesting the experiment. Since the heterozygous WHI5/whi5∆ cells showed minimal size defects, we have not elutriated the strain to test for changes in the Cln3/Whi5 ratio.

      3) I found the result in Figure 5E very correlative and hard to interpret. For example, Ypk1 phosphorylation is lost at 2.5 min, but Cln3 levels seem unaffected at this timepoint and the next (?). I would suggest softening the (already soft) tone of explaining these results. In general, the connection between ceramide synthesis and Cln3 levels remains quite unclear to me.

      We agree that our interpretation of the data in Figure 6E was confusing in the original version. Part of the confusion may arise from a lack of clarity in our writing and in the literature about the different phosphorylation inputs into Ypk1/2. The literature suggests that changes in the electrophoretic mobility of Ypk1 could be due largely to the Fpk1/2 kinases. TORC2 also influences Ypk1/2 phosphorylation, as detected by a phosphospecific antibody, but it remains unclear whether TORC2 also influences the electrophoretic mobility of Ypk1/2. The data suggest that the phosphorylation of Ypk1/2 that can be detected via electrophoretic mobility shifts is correlated with Cln3 levels, while TORC2-dependent phosphorylation with a phosphospecific antibody is not well correlated with Cln3 levels. We have edited the manuscript to make this more clear and to clarify what can and cannot be concluded from the data.

      4) The text would need to describe a potential role for protein localization in this pathway. All the results come from cell extracts, whereas local protein concentration in the nucleus could be changing and impact the pathway.

      The last three paragraphs of the Discussion include a discussion of potential roles for protein localization in the context of data from our work and previous studies that point to a potential role for localization of Ypk1/2 and Cln3 to the endoplasmic reticulum. In addition, we added the following sentence to the Results section to highlight potential localization issues: "Population level analysis of Cln3 and Whi5 protein levels by western blotting could miss changes in Whi5 or Cln3 concentration driven by changes in localization to specific subcellular compartments."

    1. Author Response:

      Reviewer #2:

      In this study, the authors develop a novel method, called MCGA, extending from their previous gene-based methods, to detect gene-trait association removing redundant signal. They further leverage expression QTL into their model to improve the resolution of gene-trait association. The overall structure is clear, and data is presented well. I am concerned about the simulation methods, and would like the authors to present some clarifications.

      1) When comparing MCGA-eQTL and MCGA-sQTL, the authors simulate a single isoform-trait association, and the simulated gene expression is averaged among isoforms, which is somewhat unfair to the MCGA-eQTL model. Hormozdiari et al. reveal that sQTL contributes little to traits after conditioning on eQTL (Hormozdiari et al., 2018, doi: 10.1038/s41588-018-0148-2). I would suggest simulating a case in which the gene-trait association is mediated by overall expression, instead of a single isoform (transcript);

      We thank Reviewer #2 overall for the numerous insightful and helpful suggestions and comments. Thanks for pointing out this problem! We agree with the reviewer that the gene-trait association can be mediated by the overall expression instead of a single isoform. However, we think that, mathematically, the two scenarios are equivalent. We also added a scenario in which gene-trait association is mediated by the overall expression of multiple susceptibility isoforms, and its power is similar to the scenario of single isoform-trait association (see Table 1 in the revised manuscript). In the real data analysis, we did observe that MCGA based on the isoform-level eQTLs detected more significant genes than that based on the gene-level eQTLs. Besides, we noticed that the sQTL (splicing QTL) in Hormozdiari et al. is different from the isoform-level eQTL used in our manuscript.

      2) When comparing MCGA-eQTL and MCGA-sQTL, only power is considered. The authors should include an analysis demonstrating the performance in controlling false positives;

      We thank the reviewer for this comment and suggestion. In the revised manuscript, we report the results on false-positive control. Please refer to Essential Revisions point 2 (see lines 261-262 in the revised manuscript).

      3) When choosing a favorable exponent value c (1.432 chosen in the study), the authors found that the c value is robust to trait type, sample size or variant size, but the authors didn't explain what factors affect the choice of c. Considering the potential application of the MCGA method in other studies, the authors should explain what factors affect the c value, and provide guidance on how to choose an optimal c;

      We thank the reviewer for this comment and suggestion. Please refer to Question A and B of Essential Revisions point 3.

      A: "Motivated by the boundary of the chi-square correlation, we adopted simulation studies to empirically choose c for controlling the type I error of the effective chi-square test. Besides the correlation of chi-square statistics, the choice of c for the effective chi-square test may also be affected by the approximated non-negative solutions. However, the correlation of chi-square statistics is the major factor. Our simulation showed that the derived boundary and the trend of the influence of LD on chi-square statistics were also applicable to the effective chi-square test. In the revised manuscript, we showed that the correlation of chi-square statistics is affected by the non-centrality parameter of chi-square statistics (see lines 640-655 in the revised manuscript)."

      B: "As the optimal c for controlling the type I error of the effective chi-square test would be affected by the non-centrality parameter of the chi-square statistics, which is generally unknown in practice, we have to resort to a grid search algorithm to explore an empirically optimal c. In the original manuscript, we mixed the method of choosing the optimal c with the introduction of the new effective chi-squared statistics. We have written a new subsection in Materials and Methods to describe the procedure for choosing the optimal c in the revised manuscript (see lines 610-628 in the revised manuscript)."
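
      The dependence on the non-centrality parameter described in answers A and B can be illustrated with a standard identity for two correlated normal statistics. The sketch below is not the authors' derivation and does not implement the effective chi-square test; it only assumes two unit-variance normal statistics with correlation rho (a hypothetical LD value of 0.5) and a shared mean shift, whose square is the non-centrality parameter. It then reports the exponent c for which |rho|^c equals the correlation of the squared statistics.

```python
import math


def chi2_correlation(rho: float, mean_x: float, mean_y: float) -> float:
    """Correlation of X**2 and Y**2 for bivariate normal (X, Y) with unit
    variances, correlation rho, and means mean_x and mean_y.

    Cov(X^2, Y^2) = 2*rho^2 + 4*mean_x*mean_y*rho
    Var(X^2)      = 2 + 4*mean_x^2
    """
    cov = 2.0 * rho**2 + 4.0 * mean_x * mean_y * rho
    var_x = 2.0 + 4.0 * mean_x**2
    var_y = 2.0 + 4.0 * mean_y**2
    return cov / math.sqrt(var_x * var_y)


def implied_exponent(rho: float, mean: float) -> float:
    """Exponent c such that |rho|**c equals the chi-square correlation
    when both statistics share the same mean shift (requires 0 < rho < 1)."""
    return math.log(chi2_correlation(rho, mean, mean)) / math.log(abs(rho))


rho = 0.5  # hypothetical LD correlation between two variants
exponents = {mean: implied_exponent(rho, mean) for mean in (0.0, 1.0, 3.0, 10.0)}

for mean, c in exponents.items():
    print(f"mean shift {mean:>4}: implied c = {c:.3f}")
```

      Under these assumptions the implied exponent equals 2 when the statistics are central and moves toward 1 as the mean shift grows, so a single fixed c is necessarily a compromise between the two limits. This is consistent with, but does not by itself justify, the use of an empirical grid search when the non-centrality is unknown.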

      4) The mediation analysis in Yao et al. estimates that 11% of trait heritability is mediated by gene expression (Yao et al., 2020, doi: 10.1038/s41588-020-0625-2), while in the simulation section of this study, 100% of trait heritability is mediated by gene expression. Simulations mimicking real scenarios should be used;

      We thank the reviewer for this comment and suggestion and apologize for the confusion here. To our knowledge, the estimate by Yao et al. was for the entire genome. Note that many contributing variants of a trait may be far away from gene regions and beyond the scope of our approach. It is possible that some genes may have a larger share of trait heritability (>11%) mediated by gene expression. Certainly, we agree with the reviewer that it is also necessary to mimic the scenario in which gene expression mediates only part of the trait heritability. In the revised manuscript, we added a scenario in which part of the trait heritability is mediated by gene expression (see Table 1 in the revised manuscript). As expected, when the majority is mediated by factors other than gene expression, using all variants could be more powerful than using only eQTLs (see lines 247-279 in the revised manuscript).

      5) It is important to choose a background gene set when conducting GO enrichment analysis. It is not clear what kind of genes are used as control when evaluating significance;

      We thank the reviewer for this comment and apologize for the confusion here. We used g:Profiler, a web server for functional enrichment analysis, to perform the GO enrichment analyses. In the present study, the conventional GO enrichment analysis took all annotated human protein-coding genes as the background (see lines 739-743 in the revised manuscript).
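
      Why the background matters can be seen with a generic one-sided hypergeometric enrichment test. The sketch below is not g:Profiler's implementation, and all gene counts are hypothetical; it only shows that the same overlap yields very different p-values when the background changes from a genome-wide set to a smaller, restricted set.

```python
from fractions import Fraction
from math import comb


def enrichment_p_value(overlap: int, background: int, term_size: int, hits: int) -> float:
    """One-sided hypergeometric P(X >= overlap): the probability of drawing at
    least `overlap` term genes when `hits` genes are sampled without replacement
    from `background` genes, `term_size` of which carry the GO term."""
    total = comb(background, hits)
    tail = sum(
        comb(term_size, i) * comb(background - term_size, hits - i)
        for i in range(overlap, min(term_size, hits) + 1)
    )
    # Fraction keeps the ratio of very large integers exact before conversion.
    return float(Fraction(tail, total))


# Hypothetical numbers: 100 significant genes, 8 of which carry a 200-gene GO term.
p_all_genes = enrichment_p_value(8, background=20000, term_size=200, hits=100)
p_restricted = enrichment_p_value(8, background=5000, term_size=200, hits=100)

print(f"background of 20000 genes: p = {p_all_genes:.2e}")
print(f"background of  5000 genes: p = {p_restricted:.2e}")
```

      With the larger background only about one term gene is expected by chance among the 100 hits, whereas with the restricted background about four are expected, so the same overlap of 8 is far less surprising in the second case.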

      6) GTEx v8 contains samples from diverse populations, and it is crucial to handle the issue of population structure. Based on the description on https://pmg-lab-docs.readthedocs.io/en/latest/KGGSEE_doc/KGGSEE.html#id18, it seems that eQTL/isoQTL were detected ignoring population structure. The authors should explain why they applied a pipeline like that, and show that their conclusion wouldn't be affected by the choice.

      We thank the reviewer for this comment. Indeed, in the original manuscript, we estimated the gene-level and isoform-level eQTLs without considering the population structure in GTEx v8. One reason is that, although GTEx v8 contains samples from diverse populations, the majority (~85%) of the subjects are Europeans. Another reason is that the GTEx consortium article (https://www.science.org/doi/abs/10.1126/science.aaz1776) pointed out that only 178 population-biased cis-eQTLs (pb-eQTLs) for 141 unique eGenes (FDR ≤ 25%) were identified across 31 tissues, which suggested that pb-eQTLs are hard to find at current sample sizes.

      In the revised manuscript, to avoid potential population structure issues, we used only the expression profiles and genotype data of the Europeans for eQTL identification (see lines 788-801 in the revised manuscript).

      Reviewer #3:

      The manuscript, "MCGA: a multi-strategy conditional gene-based association framework integrating with isoform-level expression profiles reveals new susceptible and druggable candidate genes of schizophrenia", describes an approach to conduct gene-level association testing in GWAS data with integration of gene expression data. The authors have conducted comprehensive simulation studies for main modules involved in this framework, demonstrating the advantages of the MCGA strategy compared to established similar work. The method has also been applied to the analysis of schizophrenia GWAS, with several interesting discoveries. All methods proposed are implemented in the KGGSEE package, a command tool written in Java with good documentation, data resource and examples for the type of analysis proposed in this work.

      Overall, the framework is solid and the analyses performed are thorough. In particular, the simulation study and real data demonstration of advantages of isoQTL over conventional eQTL is novel and interesting. With the user friendly software available, I can envisage that MCGA will receive interest from the community and be adopted to many projects.

      My major reservation about the methods is the component using conditional analysis to identify gene-specific signals. Even though the MCGA framework is as solid as the methods it is based on, alternative methods are available for gene-level association analysis that take into consideration the contribution from multiple SNPs and the LD without having to rely on conditional analysis. For example, a fine-mapping approach such as SuSiE (https://github.com/stephenslab/susieR) uses summary statistics and LD, and can produce gene-level evidence of association in terms of a Bayes factor when a gene region is analyzed. Such an approach does not have a potential type I error issue and is efficient enough to analyze multiple genes in LD with each other. Most importantly, it provides inferences directly for multiple genes accounting for LD, without having to rely on conditional analysis. Conditional analysis, as a greedy algorithm, suffers from an obvious limitation: suppose genes A and B are two causal genes in weak LD with each other. A non-causal gene C physically in between A and B is correlated with both A and B. Then C may have a stronger marginal signal than either A or B. A conditional analysis may identify C, and conditional on C, the association signals of the true causal genes A and B will become weaker. I am therefore not convinced that a conditional analysis such as ECS is the best approach on which MCGA should be based.
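
      The three-gene scenario described by the reviewer can be made concrete with population-level correlations. The sketch below is a toy calculation, not an implementation of MCGA, ECS or SuSiE; the correlations between the hypothetical gene-level scores A, B and C, the unit effect sizes and the noise variance are all assumed values chosen for illustration.

```python
import math

# Hypothetical population correlations among unit-variance gene-level scores.
r_ab = 0.1  # A and B are in weak LD with each other
r_ac = 0.7  # non-causal C is correlated with A
r_bc = 0.7  # non-causal C is correlated with B
noise_var = 8.0  # trait variance not explained by A or B

# Trait y = A + B + noise: A and B are causal with unit effects, C is not causal.
cov_ay = 1.0 + r_ab
cov_cy = r_ac + r_bc
var_y = 2.0 + 2.0 * r_ab + noise_var


def corr(cov: float, var_1: float, var_2: float) -> float:
    """Correlation from a covariance and the two variances."""
    return cov / math.sqrt(var_1 * var_2)


marginal_a = corr(cov_ay, 1.0, var_y)
marginal_c = corr(cov_cy, 1.0, var_y)

# Partial correlation of A with y after conditioning on C.
partial_a_given_c = corr(
    cov_ay - r_ac * cov_cy,  # residual covariance of A and y
    1.0 - r_ac**2,           # residual variance of A
    var_y - cov_cy**2,       # residual variance of y
)

print(f"marginal corr(A, y)    = {marginal_a:.3f}")
print(f"marginal corr(C, y)    = {marginal_c:.3f}")
print(f"partial  corr(A, y | C) = {partial_a_given_c:.3f}")
```

      With these assumed values the non-causal gene C has the strongest marginal signal, and conditioning on C sharply reduces the signal of the causal gene A, which is the failure mode the reviewer describes for a greedy conditional procedure that selects genes on marginal association alone.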

      We thank Reviewer #3 overall for the numerous insightful and helpful suggestions. We are happy that the reviewer found that our work will receive interest from the community and be adopted in many projects. To the best of our knowledge, MCGA has different application scenarios from SuSiE. The former works with summary statistics, while the latter can only perform fine-mapping analysis with individual-level genotypes and phenotypes. In addition, MCGA can also handle the three-gene case posed by the reviewer. For example, if A and B are two causal genes, they may have larger selective expression scores than gene C in the phenotype-associated tissue. In the conditional analysis, A and B will then enter the conditional procedure before gene C, so that gene C is no longer significant when conditioning on genes A and B.

    1. Author Response:

      Reviewer #1 (Public Review):

      This study is well-written and well-presented. The conclusions are clear and robustly supported by the data. The figures provide useful visualizations for the major findings. Virophage are an important and underappreciated component of global viral diversity, and they likely play important roles in eukaryotic genome evolution; this work is therefore quite timely. Relatively few studies focus on virophage or giant viruses compared to other viral lineages, so studies like this are highly valuable.

      Strengths of this work include the high quality of the reference genomes, which were constructed using both short-read and long-read sequencing, as well as the diverse locations and isolation times of the host genomes.

      We thank the reviewer for his encouraging and constructive comments!

      I found no major weaknesses in this study. One minor issue is that the details of how EMALEs were delineated and initially detected seem a bit unclear to me. Based on my reading I am curious if some divergent or degraded EMALEs could have been missed. This may be important for assessing the consequences of possible retrotransposition-mediated EMALE inactivation.

      Thank you for pointing this out. We added two sections to Materials and Methods called “Detection and annotation of EMALEs” and “Detection of Ngaro retrotransposons” where we describe the procedure in detail.

      Based on our approach of visually screening the entire genome assemblies for GC anomalies, combined with blast searches of Cafeteria genomes using as input manually annotated EMALEs as well as databases of all available virophage sequences, we are quite confident that we have not missed any obvious virophage genomes. We would only have missed putative virophage sequences if their GC-contents were similar to that of the host (~70% GC) and if these sequences bore no detectable similarity to known virophage genes/proteins.
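As a rough illustration of what screening for GC anomalies involves, the sketch below computes GC content in fixed windows along a contig and flags windows that deviate from the host background. The authors describe a visual screen, so this automated version is only an assumed analogue; the function names, window size, step, and deviation threshold are hypothetical, and only the ~70% host GC figure comes from the text.

```python
def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases; None if no A/C/G/T present."""
    seq = seq.upper()
    gc = sum(1 for base in seq if base in "GC")
    acgt = sum(1 for base in seq if base in "ACGT")
    return gc / acgt if acgt else None


def gc_anomalies(seq, window, step, host_gc, min_dev):
    """Return (start, end, gc) for windows deviating from host_gc by >= min_dev."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        frac = gc_fraction(seq[start:start + window])
        if frac is not None and abs(frac - host_gc) >= min_dev:
            hits.append((start, start + window, frac))
    return hits


# Synthetic contig (assumption): 70% GC background with a 50-bp low-GC insert.
host_unit = "GGCCGGCAAT"    # 7 of 10 bases are G or C
insert_unit = "AATTAGATTC"  # 2 of 10 bases are G or C
contig = host_unit * 10 + insert_unit * 5 + host_unit * 10

hits = gc_anomalies(contig, window=50, step=50, host_gc=0.70, min_dev=0.25)
```

On this synthetic contig, only the window covering positions 100 to 150 is flagged. A sequence whose GC content matched the host background would pass unflagged, which is the limitation the authors acknowledge for elements that also lack detectable sequence similarity.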

      In contrast, our sequencing and assembly strategy probably did not result in a complete account of all EMALEs in these host genomes, as is evident from the large number of partially assembled EMALEs. However, partial does not equal degraded, but simply means that contig assembly stopped somewhere within the EMALE, resulting in an artificially truncated sequence. We therefore do not think that our approach introduced any relevant bias towards addressing the question whether retrotransposon insertion may lead to EMALE inactivation.

      These points are now included in the discussion.

    1. Author Response:

      Reviewer #2 (Public Review):

      There is now a considerable body of knowledge about the genetic and cellular mechanisms driving the growth, morphogenesis and differentiation of organs in experimental organisms such as mouse and zebrafish. However, much less is known about the corresponding processes in developing human organ systems. One powerful strategy to achieve this important goal is to use organoids derived from self-renewing, bona fide progenitor cells present in the fetal organ. The Rawlins lab has pioneered the long-term culture of organoids derived from multipotent epithelial progenitors located in the distal tips of the early human lung. They have shown that clonal cell "lines" can be derived from the organoids and that they are capable of not only long-term self-renewal but also limited differentiation in vitro or after grafting under the kidney capsule of mice. Here, they report a strategy to efficiently test the function of genes in the embryonic human lung, regardless of whether the genes are actively transcribed in the progenitor cells. The strengths of the paper are that the authors describe a number of different protocols (workflows), based on CRISPR/Cas9 and homology-directed repair, for making fluorescent reporter alleles (suitable for cell selection) and for inducible over-expression or knockout of specific genes. The so-called "Easytag" protocols and results are carefully described, with controls. The work will be of significant interest to scientists using organoids as models of many human organ systems, not just the lung. The weaknesses are that the authors do not show that their lines can undergo differentiation after genetic manipulation, and therefore do not provide proof of principle that they can determine the function in human lung development of genes known to control mouse lung epithelial differentiation. It would also be of general interest to know whether their methods based on homologous recombination are more accurate (fewer incorrect targeting events or off-target effects) than methods recently described for organoid gene targeting using non-homologous repair.

      We thank Reviewer #2 for capturing the key advances of our toolbox for understanding gene function using a tissue organoid system and the constructive suggestions for the manuscript.

      We agree with the Reviewer that it would strengthen the current manuscript if we could differentiate the genetically targeted organoids. Therefore, as a proof of concept, we have successfully differentiated the SOX9 reporter organoids into the alveolar lineage (New figure: Figure 2-figure supplement 1g, shown above). We have also tested the dual SMAD inhibition approach recently reported for basal cell differentiation (Miller et al., 2020). However, this has led to massive cell death even in WT organoids (data not shown). We reason that this might be because our organoids are ~8 pcw, whereas in the literature ~12 pcw organoids were used. We believe that efficient airway differentiation will take a long time to optimise for our organoids and is therefore beyond the scope of this manuscript.

      In regard to the Easytag workflow in comparison with the recent CRISPR-HOT method using non-homologous end joining (Artegiani et al., 2020), we consider our approach as a complement to the CRISPR-HOT approach. This can be reflected in the following points: (1) The Organoid Easytag workflow allows precise N-terminal tagging of endogenous genes, exemplified by N-terminal tagging of ACTB. This is not possible using CRISPR-HOT as large pieces of plasmid DNA would disrupt the targeted gene; (2) The Organoid Easytag workflow is based on HDR and the efficient insertion sites for exogenous genes are within a ~30-bp window of the gRNA cleavage sites (Kwart et al., 2017), which gives more flexibility for choosing gRNAs compared with CRISPR-HOT tagging; (3) The Organoid Easytag workflow gives researchers more control of where and how the targeted sites can be modified, and offers a minimal change to the targeted genomic region, whereas CRISPR-HOT introduces large pieces of backbone plasmids, which potentially increases the risk of gene dysregulation. However, HDR requires cells to be at the G2/M phase of the cell cycle, therefore heavily relying on fast cycling cells to gain the most efficient targeting. CRISPR-HOT has the great advantage of not depending on a specific cell cycle stage and therefore being more efficient in slow cycling cells. With this said, we do believe that the efficiency would very much rely on the context, including the cell type used and locus targeted, as a recent report suggested targeting efficiency is influenced also by genomic context (Schep et al., 2021).

      In summary, when N-terminal tagging, minimal changes and precise control of targeting are desired, Organoid Easytag is more favourable, whereas when targeting slowly cycling cells, CRISPR-HOT has its strengths. Therefore, we consider these two methods to be complementary approaches that will both benefit organoid-based research. We have summarised this comparison in a simple table (New table: Figure 2-figure supplement 5f).

      Figure 2-figure supplement 5(f). A comparison of Organoid Easytag and CRISPR-HOT methods (Artegiani et al., 2020).

      Reviewer #3 (Public Review):

      Sun et al have assembled, modified, and applied a series of existing gene editing tools to tissue-derived human fetal lung organoids in a workflow they have termed "Organoid Easytag". Using approaches that have previously been applied in iPSCs and other cell models in some cases including organoids, the authors demonstrate: 1) endogenous loci can be targeted with fluorochromes to generate reporter lines; 2) the same approach can be applied to genes not expressed at baseline in combination with an excisable, constitutively active promoter to simplify identification of targeted clones; 3) that a gene of interest could be knocked-out by replacing the coding sequence with a fluorescent reporter; 4) that knockdown or overexpression can be achieved via inducible CRISPR interference (CRISPRi) or activation (CRISPRa). In the case of CRISPRi, the authors alter existing technology to lessen unwanted leaky expression of dCas9-KRAB. While these tools have previously been applied in other models, their assembly and demonstrated application to tissue-derived organoids here could facilitate their use in tissue-derived organoids by other groups.

      Limitations of the study include:

      1) demonstrated application of these technologies to only a limited set of gene targets;

      2) a lack of detail demonstrating the efficiency and/or kinetics of the approaches demonstrated.

      While access to human fetal lung organoids is likely not available to many or most researchers, it is probable that the principles applied here could carry over to other organoid models.

      We thank the Reviewer for accurately summarising the details of our manuscript and positive comments on its potential to facilitate tissue-derived organoid related research. We are very grateful for the Reviewer’s detailed and constructive comments to help strengthen our manuscript.

      In regard to the limitations pointed out by Reviewer #3, we have systematically tested the kinetics of the inducible CRISPRi knockdown effect and its reversibility using CD71 and SOX2 (New figure: Figure 3-figure supplement 2). We have also generated SOX9 reporter human foetal intestinal organoids using the Easytag workflow to further demonstrate that it can be applied to another organoid system. As suggested by Reviewer #3, we also attempted to implement the inducible CRISPRi system in HBECs. However, due to their sensitivity to lentiviral transduction, infected HBECs died shortly after transduction with gRNA lentivirus. We believe that further optimisation of the DNA delivery approach (perhaps nucleofection and PiggyBac-based vectors) is required to implement the inducible CRISPRi/CRISPRa systems in HBECs.

    1. Author Response:

      Reviewer #1 (Public Review):

      The key question that the authors were addressing was how ethnicity differentially affects the microbiota of subjects living in a particular area (in this case East Asians and Caucasians living in San Francisco who have been enrolled in an 'Inflammation, Diabetes, Ethnicity and Obesity' cohort, although inflammatory disease was apparently excluded in these subjects).

      The existence of differences between populations potentially allows the underlying factors to be discriminated, such as host genetics, diet, lifestyle, physiological parameters, body habitus or other environmental influences. In this case body habitus has been selected as a stratification factor between the two ethnicities. Immigration potentially allows environmental and host genetic influences to be distinguished.

      The strength of the study is in the level of robust analysis of the microbiotas by a very experienced group of researchers, distinguishing the microbiota differences, especially in lean subjects, with analysis of associations that may be driving the differences. It is interesting that diet is not one of the apparent associations in this study, yet the relationship of microbiota diversity to body habitus is strong in Caucasian subjects. These associations cannot easily be extrapolated to causation or mechanism, a fact well recognized in the paper, but they remain important observations that rationalize in vivo modeling with experimental animals or in vitro analyses of microbial interactions between different taxa simulating the context of differences in the intestinal milieu. The paper includes work showing that differences of the microbiota can be recapitulated after transfer to germ-free mice, at least over the short term: this is important to provide tools to model the reasons for differences in consortial composition.

      A very large amount of work was required to assemble the samples and the clinical phenotypic metadata set, making the data an important and definitive contribution for the subjects studied. Of course, this is one sample of extremely variable human conditions and lifestyles that will help build the overall picture of how differences in our genetics and environment shape our intestinal microbiota.

      We appreciate the reviewer's positive summary of our manuscript and agree with the reviewer's assessment of the need for both mechanistic follow-on studies and extensions to larger and more diverse cohorts.

      Reviewer #2 (Public Review):

      The study's primary aim is to test for differences in the microbiome between self-identified East Asian and White subjects from the San Francisco area in the new IDEO cohort. The study builds on a growing literature that describes variation among ethnic groups. The major conclusion, to "emphasize the utility of studying diverse ethnic groups", is not novel to the literature.

      It was not our intention to imply that our study is novel in studying two distinct ethnic groups, but rather to emphasize that differences exist between ethnicities with regard to the gut microbiome and to provide a systematic analysis of this including gnotobiotic mouse models along a key health disparity in Asian Americans. We include references of prior examples of this work in our introduction (including several references in our introductory paragraph). We have modified our abstract to clarify this point further:

      “Taken together, our findings add to the growing body of literature describing variation between ethnicities and provide a starting point for defining the mechanisms through which the microbiome may shape disparate health outcomes in East Asians.”

      Overall, the strength of the results is that they confirm patterns from different cohorts/studies and demonstrate that ethnic-related differences are common. The results are subject to sample size concerns that may underpin some of the conflicting or lack of significant results. For instance, there is no overlap in highlighted species-level taxonomy differences between 16S and metagenomic analyses, which precludes a clear interpretation of the meaning of those differences and whether taxa should be highlighted in the abstract; there are low AUC values for the random forest modelling; and there is a lack of significance in correlations between BMI and East Asian subjects in F4a where there may be a correlation. While a minor point, it serves to highlight the sample sizes as the range of the variation in East Asian subjects is not as substantial as the White subjects because there are fewer East Asian data points above a 30 BMI (~N=5) relative to those of White subjects (~N=11).

      We agree that our study was limited by sample size and that future studies increasing sample size would be valuable to assess the intersection of metabolic health in colocalized EA and W subjects. We include this in our discussion:

      “Due to the investment of resources into ensuring a high level of phenotypic information on each cohort member, and due to its restricted geographical catchment area, the IDEO cohort was relatively small at the time of this analysis (n=46 individuals). This study only focused on two of the major ethnicities in the San Francisco Bay Area; as IDEO continues to expand and diversify its membership, we hope to study a sufficient number of participants from other ethnic groups in the future.”

      The microbiome transfers from humans to mice also demonstrate that certain features of interpersonal or ethnic-related differences can be established in mice. This is useful for future studies, but it is not unexpected in and of itself given the robustness of transferring microbiome differences in other human-to-mouse studies. If the phenotype data were more compelling, then the utility of these transfers could be valuable.

      We respectfully disagree with this point. To our knowledge, this is the first study demonstrating that ethnicity-associated differences in the gut microbiota are stable following transplantation, which is certainly not guaranteed given the marked and currently unpredictable variations between donor and recipient microbiotas shown here and in prior studies by us (Nayak et al., 2021; Turnbaugh et al., 2009b) and others (Walter et al., 2020).

      We state this rationale in our results section:

      “Taken together, our results support the hypothesis that there are stable ethnicity-associated signatures within the gut microbiota of lean EA vs. W individuals that are independent of diet. To experimentally test this hypothesis, we transplanted the gut microbiotas of two representative lean W and lean EA individuals into germ-free male C57BL/6J mice…Next, we sought to assess the reproducibility of these findings across multiple donors and in the context of a distinctive dietary pressure. We fed 20 germ-free male mice a high-fat, high-sugar (HFHS) diet for 4 weeks prior to colonization with a gut microbiota from one of 5 W and 5 EA donors....”

      Furthermore, while the phenotypic data may not be as dramatic as the reviewer had hoped, this is to our knowledge the first demonstration that ethnicity-associated differences in the gut microbiota play a causal role in host phenotypes, as highlighted in our discussion:

      “Our results in humans and mouse models support the broad potential for downstream consequences of ethnicity-associated differences in the gut microbiome for metabolic syndrome and potentially other disease areas. However, the causal relationships and how they can be understood in the context of the broader differences in host phenotype between ethnicities require further study.”

      However, in the current state, I am concerned with the experimental design since the LFPP experiments used N=1 donor per ethnicity for establishing the mice colonies and are resultantly confounded by mice pseudo-replication with recipient mice derived from one donor of each ethnicity. This concern is relevant to interpreting results back to interpersonal or interethnic variation. Are phenotypic differences due to individual differences or ethnic differences? It's not clear.

      We presented our data in summary form integrating the results from 3 independent experiments across two figures. To account for pseudoreplication as the reviewer suggests, we have restricted permutational space to account for one donor for multiple recipient mice using the parameters outlined in the adonis software package. Analyzing our results from 3 separate experiments, our results are statistically significant, which we mention in the revised text:

      “In a pooled analysis of all gnotobiotic experiments accounting for one donor for multiple recipient mice, ethnicity and diet were both significantly associated with variations in the gut microbiota (Fig. S9), consistent with the extensive published data demonstrating the rapid and reproducible impact of a HFHS diet on the mouse and human gut microbiota (Bisanz et al., 2019).”

      Figure S9. Combined analysis of recipient mice reveals significant associations with donor ethnicity and recipient diet. A PhILR PCoA is plotted based on 16S-Seq data from all gnotobiotic experiments. Individual mice are colored by (A) donor ethnicity or (B) the recipient’s diet. Both ethnicity and diet were statistically significant contributors to variance (ADONIS p-values and estimated variance displayed using blocks restricted by donor identifiers to account for one donor going to multiple recipient mice). We also observed a trend for interaction between diet and ethnicity in this model (p=0.068, R2=0.047, ADONIS).
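The pseudo-replication concern comes down to which unit is exchangeable under the null hypothesis. The sketch below illustrates one way to respect the donor as that unit: a PERMANOVA-style pseudo-F is computed on toy one-dimensional data, and the group labels are reassigned at the donor level, so that mice from the same donor always move together. This is an assumption-laden toy, not the adonis implementation and not necessarily the blocking scheme the authors used; all names and data are hypothetical.

```python
from itertools import combinations


def pseudo_f(points, labels):
    """PERMANOVA-style pseudo-F from squared pairwise distances (1-D toy data)."""
    n = len(points)
    groups = sorted(set(labels))

    def ss(idx):
        # Sum of squared pairwise distances within idx, divided by its size.
        return sum((points[i] - points[j]) ** 2
                   for a, i in enumerate(idx) for j in idx[a + 1:]) / len(idx)

    ss_total = ss(list(range(n)))
    ss_within = sum(ss([i for i in range(n) if labels[i] == g]) for g in groups)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def donor_level_test(points, donors, donor_group):
    """Exact two-group test that relabels whole donors, never single mice."""
    donor_ids = sorted(donor_group)
    names = sorted(set(donor_group.values()))
    k = sum(1 for d in donor_ids if donor_group[d] == names[0])
    observed = pseudo_f(points, [donor_group[d] for d in donors])
    hits = total = 0
    for chosen in combinations(donor_ids, k):
        relabel = {d: names[0] if d in chosen else names[1] for d in donor_ids}
        stat = pseudo_f(points, [relabel[d] for d in donors])
        total += 1
        if stat >= observed * (1 - 1e-9):  # tolerance for float ties
            hits += 1
    return observed, hits / total


# Toy data (assumption): 6 donors, 2 recipient mice each, strong group effect.
points = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 5.0, 5.1, 5.2, 5.3, 5.4, 5.5]
donors = ["d1", "d1", "d2", "d2", "d3", "d3", "d4", "d4", "d5", "d5", "d6", "d6"]
donor_group = {"d1": "EA", "d2": "EA", "d3": "EA", "d4": "W", "d5": "W", "d6": "W"}

f_obs, p_value = donor_level_test(points, donors, donor_group)
```

With three donors per group there are only 20 donor-level relabelings, and the observed split and its mirror image give the same statistic, so the smallest attainable p-value in this toy scheme is 0.1, however many mice each donor colonizes. With a single donor per group, the same scheme has only two relabelings and cannot produce a p-value below 1, which is one way to see why the number of donors, rather than the number of recipient mice, limits inference about group differences.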

      The HFHS experiment also used N=5 donors that somewhat mitigates these concerns, but mixed sexes were used here and there can be sex-specific human microbiome differences.

      Our study was designed to evaluate ethnicity and metabolic health. As we report in our original and updated analysis, we found no significant associations between the gut microbiota and biological sex (Figs. 2E and S4) in the IDEO cohort, perhaps due to the small effect size of sex reported in prior studies by other groups (Arumugam et al., 2011; Ding and Schloss, 2014; Schnorr et al., 2014; Zhang et al., 2021) coupled to the limited size of the current IDEO cohort.

      The Turnbaugh and Koliwad labs use mixed sexes as donors for studies in conventionally raised and gnotobiotic mice due to our active funding from the NIH, which has clear guidelines meant to prevent continued discrimination against studies in females. The following link has additional information for your consideration: https://orwh.od.nih.gov/sex-gender/nih-policy-sex-biological-variable.

      Importantly, our study was not confounded by sex due to the use of similar numbers of male and female donors (2 males and 2 females in the LFPP experiments and 3 females and 2 males for both ethnicities in the HFHS experiment). All of our recipient mice were male, as specified in our methods section and our revised main text:

      “To experimentally test this hypothesis, we transplanted the gut microbiotas of two representative lean W and lean EA individuals into germ-free male C57BL/6J mice…Next, we sought to assess the reproducibility of these findings across multiple donors and in the context of a distinctive dietary pressure. We fed 20 germ-free male mice a high-fat, high-sugar (HFHS) diet for 4 weeks prior to colonization with a gut microbiota from one of 5 W and 5 EA donors....”

      To further investigate any potential sex-specific signal, we have stratified our analysis of the HFHS experiment by the sex of the donors (Reviewer Figure 2). This reveals that the significant difference between ethnicities in the microbiota transplantation experiments is preserved in mice that received stool from male donors (Reviewer Fig. 2A) but not female donors (Reviewer Fig. 2B). In Reviewer Fig. 1 above, LFPP1 and LFPP2 were conducted using different donors of different biological sex. Splitting up our LFPP experiments revealed the consistent signal for ethnicity in microbial community composition that we report above. The small sample sizes in this stratified analysis make it difficult to conclude that there are reproducible sex-specific differences in the microbiome transplant experiments, but we agree with the reviewer that this question should be more thoroughly explored in future work.

      We have added a brief note to the discussion to emphasize this important point:

      “...differences between the human donor and recipient mouse microbiotas inherent to gnotobiotic transplantation warrant further investigation as do differences in the stability of the gut microbiotas of male versus female donors”

      Reviewer Figure 2. (A,B) Principal coordinate analysis of PhILR Euclidean distances of stool from germ-free recipient mice transplanted with stool microbial communities from (A) male (n=2 EA and n=2 W donors) or (B) female (n=3 EA and n=3 W) donors of either ethnicity and fed a HFHS diet. Significance was assessed by ADONIS. Pairs of germ-free mice receiving the same donor sample are connected by a dashed line (n=2 recipient mice per donor). Experimental designs are shown in Fig. S7.

      Finally, experimental results are not always consistent and sometimes show opposite trends that may be related to the sampling sizes. For instance, fat and lean mass increased and decreased respectively in LFPP, but there were no statistically-similar differences in HFHS. Moreover, the metabolic fat mass outcomes in mice do not match the expected human donor data. For instance, in LFPP1, White subjects had lower fat mass in humans but recipient mice on average gained more fat. It is difficult to reconcile these differences to a biological or sampling scheme reason.

      We wholeheartedly agree with this point and were also surprised that the recipient mouse phenotypes did not match our original hypothesis based upon the observed health disparities between EA and W individuals. These surprising and perhaps counter-intuitive results demand further study and mechanistic dissection. We have tried to capture potential explanations for these findings while highlighting the limitations of our current study in our expanded discussion. With respect to the glucose tolerance data, the lack of a microbiome-driven phenotype might be due to the use of genetically identical mice that are not prone to metabolic illness without significant perturbation. If we had used mice prone to metabolic disease, such as non-obese diabetic (NOD) germ-free recipient mice, in which the microbiome is known to impact the development of diabetes, we might have seen differences in glucose tolerance between ethnicities.

      Our revised discussion, with key points underlined, is copied below for your convenience:

      “Our results in humans and mouse models support the broad potential for downstream consequences of ethnicity-associated differences in the gut microbiome for metabolic syndrome and potentially other disease areas. However, the causal relationships and how they can be understood in the context of the broader differences in host phenotype between ethnicities require further study. While these data are consistent with our general hypothesis that ethnicity-associated differences in the gut microbiome are a source of differences in host metabolic disease risk, we were surprised by both the nature of the microbiome shifts and their directionality. Based upon observations in the IDEO (Alba et al., 2018) and other cohorts (Gu et al., 2006; Zheng et al., 2011), we anticipated that the gut microbiomes of lean EA individuals would promote obesity or other features of metabolic syndrome. In humans, we did find multiple signals that have been previously linked to obesity and its associated metabolic diseases in EA individuals, including increased Firmicutes (Basolo et al., 2020; Bisanz et al., 2019), decreased A. muciniphila (Depommier et al., 2019; Plovier et al., 2017), decreased diversity (Turnbaugh et al., 2009a), and increased acetate (Perry et al., 2016; Turnbaugh et al., 2006). Yet EA subjects also had higher levels of Bacteroidota and Bacteroides, which have been linked to improved metabolic health (Johnson et al., 2017). More importantly, our microbiome transplantations demonstrated that the recipients of the lean EA gut microbiome had less body fat despite consuming the same diet. These seemingly contradictory findings may suggest that the recipient mice lost some of the microbial features of ethnicity relevant to host metabolic disease or alternatively that the microbiome acts in a beneficial manner to counteract other ethnicity-associated factors driving disease.

      EA subjects also had elevated levels of the short-chain fatty acids propionate and isobutyrate. The consequences of elevated intestinal propionate levels are unclear given the seemingly conflicting evidence in the literature that propionate may either exacerbate (Tirosh et al., 2019) or protect from (Lu et al., 2016) aspects of metabolic syndrome. Clinical data suggests that circulating propionate may be more relevant for disease than fecal levels (Müller et al., 2019), emphasizing the importance of considering both the specific microbial metabolites produced, their intestinal absorption, and their distribution throughout the body. Isobutyrate is even less well-characterized, with prior links to dietary intake (Berding and Donovan, 2018) but no association with obesity (Kim et al., 2019). Unlike SCFAs, we did not identify consistent differences in BCAAs, potentially due to differences in both extraction and standardization techniques inherent to GC-MS and NMR analysis (Cai et al., 2016; Lynch and Adams, 2014; Qin et al., 2012).

      There are multiple limitations of this study. Due to the investment of resources into ensuring a high level of phenotypic information on each cohort member coupled to the restricted geographical catchment area, the IDEO cohort was relatively small at the time of this analysis (n=46 individuals). The current study only focused on two of the major ethnicities in the San Francisco Bay Area. As IDEO continues to expand and diversify its membership, we hope to study a sufficient number of participants from other ethnic groups. Stool samples were collected at a single time point and analyzed in a cross-sectional manner. While we used validated tools from the field of nutrition to monitor dietary intake, we cannot fully exclude subtle dietary differences between ethnicities (Johnson et al., 2019), which could be interrogated through controlled feeding studies (Basolo et al., 2020). Our mouse experiments were all performed in wild-type adult males. The use of a microbiome-dependent transgenic mouse model of diabetes (Brown et al., 2016) would be useful to test the effects of inter-ethnic differences in the microbiome on insulin and glucose tolerance. Additional experiments are warranted using the same donor inocula to colonize germ-free mice prior to concomitant feeding of multiple diets, allowing a more explicit test of the hypothesis that diet can disrupt ethnicity-associated microbial signatures. These studies, coupled to controlled experimentation with individual strains or more complex synthetic communities, would help to elucidate the mechanisms responsible for ethnicity-associated changes in host physiology and their relevance to disease.”

      Reviewer #3 (Public Review):

      The authors aimed to characterise how the gut microbiota differs between ethnic groups in terms of bacterial richness and community structure. They also wanted to address how this is associated with ethnic group within a defined geographical location. They began by comparing the fecal microbiota of a relatively small cohort consisting of 46 lean and obese East Asian and White participants living in the San Francisco Bay Area, using 16S and shotgun metagenomics. They demonstrated that ethnicity-associated differences in the gut microbiota are stronger in lean individuals, whereas obese individuals did not show a clear difference in gut microbiota profile between ethnic groups, suggesting either that established obesity or its associated dietary patterns can overwrite long-lasting microbial signatures or, alternatively, that there is a shared ethnicity-independent microbiome type that predisposes individuals to obesity. The authors also showed metabolic differences between these ethnic groups, with the major differences in the branched-chain amino acids and the short-chain fatty acids. To support this point, they also used a different metabolomic methodology. Although some aspects of the work are not very novel, the work does provide additional insights into the effect(s) of ethnicity, current living location and diet on shaping the microbiota. While reading through the manuscript, I had several questions where I believed that clarification was needed, but I felt as if the authors had been reading my mind every step of the way: whatever I questioned at the end of each section was addressed in the next paragraph. There are, however, a few points on which I would like to hear the authors' clarification.

      • The authors pursued the story using 16S data. However, they also have shotgun metagenomics data, which give more power and resolution to the microbiota profile. Is there any specific reason why the story was not built on the shotgun metagenomic data? If so, it would be nice to justify this and to state in the text or legend which dataset each figure was built with.

      As discussed above, 16S rRNA gene and metagenomic sequencing both have strengths and weaknesses. For example, 16S-seq is inexpensive and allows analysis of low abundance species, whereas metagenomics permits analysis of gene and pathway abundances of abundant taxa. As requested, we have now expanded Figure 2 (metagenomics) to better match Figure 1 (16S-seq). The type of technology is defined within each legend and the relevant text within our results.

      • Even though the authors mention in the discussion that they have not used the same donor inocula across different diets, it would be nice if the authors commented further on whether they would expect the same or slightly different results with each different inoculum.

      As requested, we have modified the text in our discussion to include these comments:

      “Additional experiments are warranted using the same donor inocula to colonize germ-free mice prior to concomitant feeding of multiple diets, allowing a more explicit test of the hypothesis that diet can disrupt ethnicity-associated microbial signatures. These studies, coupled to controlled experimentation with individual strains or more complex synthetic communities, would help to elucidate the mechanisms responsible for ethnicity-associated changes in host physiology and their relevance to disease.”

      Overall, the study is well executed and claims and conclusions seem relatively well justified by the provided evidence. The findings are interesting for a broad audience of biologists.

    1. Author Response:

      Reviewer #1:

      In this work, the authors present a model for double nerve transfers in the forelimb of the rat. The authors provide a detailed description of how the model is developed and they characterize neuromuscular regeneration through nerve crush, neurotomy, behavioral analysis, and retrograde labeling. The peripheral innervation of muscle with a double nerve transfer is compared to that with a single nerve transfer.

      Major strengths:

      • Strong motivation for the necessity of this model is given.
      • Experimental design and surgical techniques are clearly described. The authors include methodologies, materials used, figures, and supplementary videos to support the discussion of how the experimental model is developed.
      • A large number of animals is used for both the double nerve transfer and single nerve transfer, and results appear to be consistent within these populations.

      Thank you for your comments.

      Weaknesses:

      • The work assumes specialized knowledge of peripheral nerve anatomy and some surgical techniques. The article may be less accessible to someone without a background in these areas who seeks to learn more about nerve transfer models.

      Thank you for your feedback, which helped us to improve our manuscript! We agree that the work assumes some knowledge of peripheral nerve anatomy and some surgical techniques. For better understandability, we included more information on nerve transfers in the manuscript to make it more accessible to a wider readership. Please see p11 line 342-343.

      The authors do a rigorous job of describing the techniques used to develop the double nerve transfer model. The experimental design and surgical methods provide detailed accounts of how the model is realized, including descriptions of the techniques as well as highlighting materials that are necessary for the procedures. This is particularly valuable for a reader who desires to replicate this model. The efficacy of the nerve transfer is examined in multiple ways and compared to a single nerve transfer model. These results, which are statistically verified, demonstrate that the double nerve transfer more effectively reinnervates muscles and, in some measures, that there is no difference between animals with double nerve transfers and healthy comparisons. This provides confidence and excitement for how this model may be used in the future for studies involving therapeutics.

      We thank reviewer #1 for the thorough summary and for pointing out major strengths and room for improvement in our work. Indeed, our study proposes a model for single nerve and double nerve transfer to a single target muscle and aims to describe the model in enough detail for optimal reproducibility.

      Reviewer #2:

      The paper is well-written with sufficient sample sizes. The figures are generally clear and easy to understand.

      The clinical utility of multiple nerve transfers should be better delineated in the introduction. Current limitations, difficulties, and strategies to mitigate such limitations should also be overviewed to provide better context to the reader as to the importance of this work.

      Thank you for this helpful comment. We agree that the reader benefits from more information on the clinical utility of multiple nerve transfers and their limitations. Therefore, we revised the discussion (p8 line 280-284) and the introduction (p3 line 85-89) to better inform the reader and included current limitations.

      What were the relative proportions of innervation between the two nerves in the dual-innervation model based on the retrograde labeling?

      We agree that this is a very interesting aspect of double reinnervation nerve transfers. In this project, the experimental setup did not allow such an investigation yet. However, we are already planning to investigate this aspect via sequential double retrograde labeling (Katada et al., 2006). In addition, we are currently developing an EMG system to reliably quantify relative proportions of innervation. This project is currently in development and will be started following ethical approval and funding.

      Section 3.4.1: Please present the raw data and the mentioned scatterplot. What was the correlation coefficient for the linear regression?

      We are pleased to fulfill this request. The raw dataset will be made available on Dryad. Please see Figure 2 – figure supplement 1 for the scatterplot. The correlation coefficients are as follows: SNT: R² (linear) = 0.390; DNT: R² (linear) = 0.516.

      Section 3.2. The distribution of scores from each group should be presented in a graphical or tabular format. What was the course of recovery? Were the evaluations performed anytime before 12 weeks?

      We included the data in Table 1, as per your suggestion. The evaluation was conducted in all animals once at 12 weeks to see if any restrictions in motion persisted, which was not the case. For testing, evaluations were also performed in some randomly selected animals within the first weeks after surgery (all of which achieved the maximum score). We only see a brief reduction in function during the initial healing phase, but none that can be attributed to the denervation of the target and donor nerve.

      Section 3.3: SNT vs DNT should be evaluated in a comparative fashion.

      Thank you for the suggestion. We included this information in essential revision 1).

      Section 3.4 How is 'adequate' muscle fibrillation determined?

      Thank you for this important query. We defined an ‘adequate’ response as a clearly recognizable macroscopic response similar to that on the control side. An inadequate fibrillation was defined as an absent or barely perceptible contraction upon crushing. As judged by the two staff members grading the response independently, we observed reproducible and unmistakable responses in all cases. Please see Video 2 for an adequate muscle fibrillation in control animals.

      Was any electrophysiology performed to assess the quality of reinnervation and nerve conduction velocity? Compound muscle action potentials or motor unit counts would help identify the proportion of the muscle that was reinnervated.

      In the current study, we performed preliminary EMG analyses in selected animals to test a novel electrophysiology setup. Here, we did see proper EMG responses to stimulation. Because of the limited N and the further improvements needed for thorough analyses, we did not include these data here. We are currently working on a follow-up trial to properly assess the EMG response and CMAP upon stimulation of either donor nerve or both simultaneously. We look forward to these results, which we hope to publish in a follow-up study in the near future.

      Can the authors please comment on the way in which the DNT was adopted by the animals as opposed to the SNT? Was there any noticeable difference in the functional recovery of the muscle or retraining process? The neuroplastic adaptation would be an interesting characterization.

      Thank you for the interesting inquiries. We did not observe any differences in behavior or in functional recovery between the SNT and DNT groups. Two motor nerves innervating the same muscle did not result in noticeable differences. Neuroplastic adaptation was not investigated in this work, and future research focusing on it is necessary to characterize potential neuronal changes. In previous models with single nerve transfers to the lateral head of the biceps, little burden to the animal was noted, even within the denervation phase. We believe that the animal can easily compensate for the loss of function and that the double innervation may therefore not be evident in functional analyses.

      In the discussion, the authors suggest that "hindlimb models do not adequately represent the physiology of upper extremity nerve transfers and targeted muscle reinnervation procedures." based on outcomes for lower vs upper transfer. A number of additional factors, including usage of the limbs, weight bearing, sensorimotor circuitry etc. play a role and should be accounted for.

      We gratefully accept these proposals and have included these additional factors in the discussion (p6 line 188-189).

      In the DNT model, was the distance between entry points of the coaptation held constant between animals or optimized? The increased muscle mass observed in the DNT group is likely a result of a better axon : myocyte ratio and spatial distribution. This could be studied and optimized to improve the outcomes and utility of DNT.

      We tried to keep the coaptation sites of the two nerves constant, approximately 2–3 mm apart, with the UN placed proximal and the AIN placed distal. We also believe that the increase in muscle mass observed in the DNT group is probably due to a better ratio between axons and muscle fibers. We look forward to incorporating these considerations into our next projects.

      The discussion should more thoroughly explore the limitations of this model and experimental constraints.

      We agree that a thorough discussion on the limitations of this model and the experimental constraints will improve the manuscript and have included and discussed relevant aspects in more detail. Please see p8 line 280-284.

      What remains to be optimized prior to clinical translation? What types of scientific questions can this help answer? In which types of clinical cases would DNT not be appropriate?

      We agree: many things about nerve transfer physiology are unknown but at the same time utterly fascinating. Before clinical translation, a clear understanding of the quality and quantity of innervation must be acquired. Furthermore, it remains to be determined which portions of the muscle are innervated by each of the two nerves and whether factors that modify this reinnervation process can be identified. It must also be investigated whether a patient can actually control the muscle voluntarily with two nerves after double reinnervation, which is crucial for prosthetic interfacing. We believe that a thorough EMG analysis and the assessment of neuromuscular junctions by imaging may help answer these questions. In addition, double innervation may be important for prosthetic interfacing to acquire EMG signals rather than for functional reconstruction of a muscle. This is another fascinating topic to investigate in the future.

      Reviewer #3:

      The authors aimed to establish a rodent upper limb model to test double vs. single nerve transfers, and provided base results for histological retrograde labeling, topographic findings, functional (behavioural) analysis and outcomes for reinnervated vs. control muscle mass. The manuscript is well balanced, and contains a detailed description to reproduce the experimental model.

      The authors demonstrated equal functional outcomes for both types of transfers and have, in this reviewer's opinion, succeeded in establishing a novel model for future (experimental) tests before (clinical) application.

      Thank you for acknowledging the good balance and detailed description in our study, which we hope will help other researchers to use this model for their investigations. For this purpose, we provided diligent photo documentation in addition to an extensive description of the surgical nerve transfer procedures.

    1. Author Response:

      Reviewer #1 (Public Review):

      Watanabe presents a set of EEG-TMS experiments to show that brain stimulation in specific frontal regions affects both perception and brain states during bistable perception. The patterns of results appear interesting and potentially significant. The work uses relatively idiosyncratic methodologies in terms of data analysis and modelling, which makes the work harder to relate to extant literature. This situation requires authors to "go the extra mile" in explaining their approach and ensuring that readers can easily understand the findings in the light of what they're likely to already know - and here, I find steps could be taken.

      We apologise that the original manuscript was unclear and required extra effort to read. As shown in Responses to the essential revisions, we have now modified it in line with the reviewer's helpful and thoughtful suggestions. We hope that these modifications address the reviewer's concerns.

      Specific comments;

      • The author has an idiosyncratic definition of DLPFC; especially pDLPFC seems to coincide with iPCS retinotopic regions as found by Mackey et al, and with rIFJ from earlier work such as that by Sterzer and Kleinschmidt. When you look up DLPFC on wikipedia, this shows a region more 'superior' than both regions designated DLPFC in the present work, closer to aDLPFC than pDLPFC. Perhaps the author could try to more explicitly connect his nomenclature to the literature?

      We understand the reviewer's concern about this terminology issue. In accordance with the reviewer's suggestion, we have now changed aDLPFC to DLPFC and pDLPFC to IFC. We have also added references to support these anatomical labels as follows:

      Methods section (lines 8-15 on page 21).

      "As in our previous work, the seven ROIs consisted of the right FEF (x = 38, y = 0, z = 60 in MNI coordinates), DLPFC (x = 44, y = 50, z = 10), IFC (x = 48, y = 24, z = 9), anterior superior parietal lobule (aSPL; x = 36, y = –45, z = 44), posterior superior parietal lobule (pSPL; x = 38, y = –64, z = 32), lateral occipital complex (LOC; x = 46, y = –78, z = 2) and V5 (hMT/V5; x = 47, y = –72, z = 1) (Fig. 1b). These coordinates are based on the following previous studies: a study by Sterzer and colleagues23 for the FEF; one by Knapen and colleagues24 for the DLPFC; one by Kleinschmidt and colleagues25 for the IFC; three studies by Kanai, Carmel and their colleagues26–28 for the aSPL and pSPL; one by Freeman and colleagues29 for the LOC and V5."

      • The possible relation of the individual brain state dynamics with the ongoing sequence of bistable perception apart from the TMS manipulation is not treated. This may feel self-evident to the author perhaps because this is the topic of previous studies, but it's confusing to a novice reader. To me, linking the sequence of bistable perceptual states to the sequence of brain states as found using the author's methodology is a fundamental step to allow interpretation of all of the subsequent results, because it speaks to the meaning and significance of the existence of these brain states. Without this step, I find it difficult to interpret figures 2a and 2b (which, I have to say, do indeed look like enticing patterns in the data). So, specifically, does the author replicate the brain-state vs behavior correlations that he reported in his earlier (2014) work on this topic? And, because this publication reported mainly across-observer correlations, what about relations between brain states and their transitions and perceptual events on a within-subject basis?

      Thank you for giving us the opportunity to show detailed results that validate the application of our method to the EEG data. We have now inserted all such results into the first part of the Results section with new figures.

      • The exact analysis procedure that leads up to the brain state designation is not very transparent. What, for example, is the brain state that is "Frontal"? I would appreciate seeing state-transition-triggered time-frequency plots to be able to understand what exactly in the EEG the procedure picks up. The same holds for the TMS-triggered changes; is there any pattern in terms of TMS-induced time-frequency changes?

      In accordance with the reviewer's suggestion, we have now added time-frequency plots during brain state transitions as well as those before/after a TMS administration.

      • It would be a valuable addition if the author could clarify what he means by a brain state; this term means different things in different fields. The concept of brain state is now primarily based on high frequency EEG signatures, but there are likely many other possible measurements that could produce an estimated brain state. How would the findings change if other measures were used as a basis for the same methods?

      As the reviewer stated, a "brain state" in this study is different from a so-called "microstate" in EEG research; it indicates an activity pattern of multiple (here, seven) brain regions or a group of such activity patterns. This concept of the brain state is based on the energy landscape analysis. In terms of the generalisability of this analysis method, previous studies have demonstrated that such brain states are identifiable in both task-related and resting-state functional MRI data [1,2,30,31].

      Hidden Markov models (HMMs) can also identify similar brain states [32–35]. However, few studies have applied HMMs to task-related data, and no work has applied them to the neural signals recorded during bistable visual perception.

      Given this, this study used the energy landscape analysis to identify the brain states and dynamics between them.

      We clarified this point by adding the following descriptions into the Methods section.

      Methods section (lines 14-18 on page 26)

      "Note that a “brain state” in this study is not a so-called “miscrostate” in conventional EEG research; it indicates an activity pattern of multiple (here, seven) brain regions or a group of such activity patterns. Although other analyses, such as hidden Markov model (HMM), can also identify brain states78–81, we adopted the energy landscape analysis in this study because it was previously used to identify the brain states underpinning the bistable visual perception20. "

      • Figure 1l. From methods and explanations it's not really clear how this figure is produced. If it is created from single-subject surface locations that were explicitly targeted, and these locations are then transformed into an average-subject surface, that would be correct. But weren't these locations targeted based on MNI coordinates? In this case one would expect more of a spread in specific locations because of the across-subject variability in surface folding. So, could the author please explain in more detail how this figure is generated?

      We are sorry for the insufficient description of this issue. We have now added the following explanation to the Methods section and the legend of the figure (Fig. 4a):

      Methods section (line 33 on page 22 – line 2 on page 23)

      "We confirmed that the coil did not substantially move throughout the experiment by re-measing the location with the neuro-navigation system at the end of each experiment day. The green circles in Fig. 4a show such end-of-the-day locations averaged across the four-day sessions in the main experiment."

      Fig. 4. a. We administered inhibitory TMS over the three PFC regions. "In one TMS condition, we placed the TMS coil over one of the PFC areas using a stereoscopic neuro-navigation system based on the MNI coordinates at the beginning of each experiment day. At the end of the day, we re-measured the coordinates of the TMS coil using the navigation system. Finally, we averaged the coordinates across the four-day sessions in the main experiment. The green circles represent such mean MNI coordinates of the stimulated brain site for each participant. The green circles were mapped closely onto the original coordinates (the centres of the yellow circles)."

      • Page 8, I appreciate the logic that "barrier heights are associated with the dwelling time in the brain states and inversely correlated with the transition frequency between them", but this needs to be fleshed out more. What are the numerical simulations here? These aren't described in the text and as a reader, I'm left having to believe the accuracy of the 'numerical simulations' without being given the opportunity to understand them. This explanation would be a nice opportunity to go into detail about how the author does (and, consequently, the audience should) understand and interpret both the brain states, and their transitions.

      We are sorry for our insufficient description of this issue. In accordance with the suggestion, we have now rewritten the methods of the numerical simulation. Please see our response to (1)-(iii) in Responses to the essential revisions.

      Reviewer #2 (Public Review):

      The author tested the hypothesis that the causal influence of the PFC on bistable perception is dynamic and depends on the (fluctuating) state of the cortical networks. Using offline and online EEG measurements and a sophisticated analysis procedure, the author characterized the dynamic brain states of observers who perceived a bistable rotating sphere defined by Structure-from-Motion with alternating perceived direction of rotation. TMS applied to different regions of the frontal, parietal, and visual areas had different effects on observers' perceptual dynamics, depending on the dynamic state of the cortical networks. It's quite impressive to see the large effect size from TMS to the aDLPFC, and the opposite direction of the effects observed from aDLPFC and pDLPFC/FEF stimulation makes it more convincing that the PFC has specific and robust roles in bistable perception.

      We thank the reviewer for the positive evaluation.

      Although the effect on bistable perception from state-dependent TMS of DLPFC is robust and very interesting, the functional mechanism of how different regions of DLPFC contribute to the perceptual dynamics remains unclear. I find it surprising that the author did not address the potential role of attention in mediating DLPFC's contribution to observers' perceptual dynamics. Given that attention does play a role in the dynamics of many forms of bistable perception, it is important to distinguish between an intrinsic contribution of DLPFC to bistable perception vs. an effect mediated by changes in attentional state. It is also useful for the author to discuss how and why certain brain states are linked to certain perceptual states.

      We agree with the reviewer about the effects of attention on the DLPFC and IFC functions in bistable perception. We also acknowledge that we should have stated our inference on the neuropsychological functions of each brain state. To address these concerns, we have now added detailed discussions to the Discussion section.

      There are many forms of bistable perception, and their dynamics are controlled or influenced by shared as well as independent mechanisms (e.g., Cao T et al, Frontiers in Psychology 2018). It would be useful to discuss the generalizability and limitations of the current results in relation to different types of bistable stimuli. The methodological approach developed by the author will be quite useful in researching the neural mechanisms of other types of bistable perception.

      We appreciate the reviewer pointing us to this nice behavioural study. We have now extended our discussion of the generalisability of the current observations as follows:

      Discussion section (line 30 on page 17 – line 2 on page 18)

      "These findings may not be directly applicable to other types of multistable visual perception, such as binocular rivalry, which is linked to lower-level brain architectures such as the visual cortex 32–38. In fact, a comprehensive behavioural study reported the relative dissimilarity in perceptual switching rate between the current SFM-induced bistable perception and the binocular rivalry61. In contrast, the same study found the similarity in between the SFM-induced bistable perception and other fluctuating perception triggered by spinning dancer62 and Lissajous-figure63. Given this, the current observations might be more applicable to types of bistable perception that requires construction of a 3D image from 2D motion compared to the other types such as the binocular rivalry."

      Reviewer #3 (Public Review):

      This is an ambitious study by a competent single author who has previously published highly innovative work on this topic. The study incorporates real-time closed-loop EEG-TMS and computational modeling to causally test the role of PFC in perceptual switching of bistable perception triggered by ambiguous visual input. While the work is technically impressive and involves a substantial amount of work spread over multiple experiments (especially notable in the context of a single-author manuscript), I have some major concerns as described below.

      1) The author's previous work on energy landscape in the context of bistable perception was conducted using fMRI. This current study employs EEG, and records time series from 7 ROIs (Fig. 1a-b). Some of these ROIs are very close together, less than a few centimeters (e.g., a-p SPL; a-p DLPFC; LOC-V5). The conventional thinking is that scalp EEG does not have the spatial resolution to separate signals from such closely spaced areas. While the author employs a Laplacian montage, validation data suggesting that the resulting signal had high SNR and could differentiate between neighboring regions is missing.

      To address this concern, we probed the data and found results to support the sensitivity and specificity of the current EEG system. We have added these new observations into the Methods section.

      2) The EEG analysis rests on gamma band (30-80 Hz) power. This should be explained in the main text. It is technically risky to record gamma band activity using scalp EEG, due to muscle, eye, and, most concerningly, microsaccade-related artifacts (see work by Yuval-Greenberg). Since the task employs a structure-from-motion stimulus, the effect of microsaccades is especially worrying. No control data was presented to suggest that these artifacts do not contribute to the analyses.

      We agree on the necessity of reducing the microsaccade-related artefacts in the gamma-band signals. We adopted a derivation method (i.e., Hjorth signal calculation) and ICA to reduce such artefacts (for the derivation method, see refs. 5–8; for the ICA, see refs. 9–11), but the original manuscript did not present explicit results to demonstrate the effects of these signal processing methods. To address this, we conducted a new EEG experiment, in which 30 healthy adults underwent the same psychophysics paradigm. In the additional experiment, 28 EEG electrodes were placed around the seven regions of interest (ROIs) in the same manner as in the original experiment, whereas the other four electrodes were located around the eyes for electrooculography (EOG) [5]. Based on previous literature [12–14], we used these EOG signals to infer the timings of the occurrences of microsaccades. In this experiment, we confirmed that the current preprocessing methods could reduce the artefacts induced by microsaccades, as now described in the Methods section. We also explicitly stated that we used the gamma-band EEG signals in the first paragraph of the Results section.

      3) P. 10 There is a concern here that the hypothesis testing is circular. The models were fit by using EEG data (and behavior?) to calculate the energy landscape, so is it trivially expected then that the dwell times seen behaviorally correlate with the energy barrier estimated by the model?

      The energy landscape analysis used no behavioural data to identify the brain state dynamics. We clarified this by adding a sentence into the Results section.

      4) It's not clear to me why pDLPFC's result was interpreted as "functional diversity".

      To clarify this, we have modified the description of the IFC function as follows:

      Abstract

      "Moreover, these findings indicate distinct functions of the three PFC areas: in particular, the DLPFC enhances the integration of two PFC-active brain states, whereas IFC promotes the functional segregation between them."

      Discussion section (lines 18-21 on page 15)

      "Moreover, the current findings suggest distinct functions of the PFC regions in terms of the brain state dynamics: the activation of DLPFC enhances the functional integration between the Frontal and Intermediate state, whereas the IFC activity promotes the functional segregation between the two brain states; the FEF activity stabilises Frontal state."

      5) The pDLPFC region here would be more accurately referred to as inferior frontal gyrus (IFG) or ventral frontal cortex (VFC), or inferior frontal cortex (IFC). It is not part of the classic DLPFC.

      In accordance with the reviewer's suggestion, we have now replaced pDLPFC with IFC throughout the manuscript.

    1. Author Response:

      Reviewer #3:

      Maintaining the balance between stem cell proliferation and cell differentiation is an essential challenge of all stem cell niches. In the shoot apical meristem of plants, these functions are spatially separated into the central zone and peripheral zone, respectively. How these zones communicate to give rise to proper stem cell behavior has been a research focus for many years.

      In this manuscript, the authors suggest that the small secreted peptide CLE40 and the receptor kinase like protein BAM1 form a novel pathway that contributes to meristem homeostasis by stimulating the expression of the central stem cell inducer WUSCHEL primarily from the meristem periphery. Importantly, this pathway acts antagonistically to the well-studied CLV pathway, which is only active in the center of the meristem and is molecularly highly similar to the CLE40/BAM1 system. This model is experimentally supported mainly by analysis of spatial localization patterns in the meristem using transcriptional and translational reporters and by the analysis of genetic interactions.

      The findings of the authors are novel, highly relevant and would certainly be of great interest for the plant community. However, the manuscript could be substantially improved to provide better support for the conclusions laid out.

      Of major concern are the reporter genes and imaging data: Partial colocalization and exclusion from CZ and OC are one of the main arguments of the authors to claim that CLE40/BAM1 function together and antagonistically to CLV3/CLV1 in controlling WUS expression.

      Working with reporters as proxies for endogenous gene expression needs to be backed up by proper controls. Given the central importance of the reporters for the conclusions it is essential to show that the regulatory sequences used for the CLE40 reporter are sufficient to rescue a cle40 mutant.

      We now show in a new supplemental figure (1) the expression patterns of two different CLE40 reporter lines (differing in the length of the promoter region) in the root, which are identical, and (2) that expression of CLE40 from the CLE40 promoter rescues the cle40 mutant root phenotypes, which were described in earlier work. See Fig2-SupplFig. 1

      It is essential to show that ... the observed expression of the reporter is consistent across the majority of different T1 lines and, most importantly, that the pattern reported here is consistent with in situ data for endogenous CLE40 mRNA.

      RNA in situ analysis is difficult due to the low expression level of CLE40, and the small size of the CLE40 transcript. We show in Fig2-SupplFig2 expression data for 4 independent transgenic CLE40 reporter lines, confirming the general conclusions that we present in this manuscript.

      The authors have previously published in situs for CLE40 that do not show the exclusion from the CZ and OC (Hobe et al., 2003, Figure 2a,c), which urgently needs clarification.

      The RNA expression data from Hobe et al. are displayed at low mag and low resolution, and might have suffered from high background.

      Figures 2, 4 and 5 show imaged meristems in great detail but each focuses only on a single sample. I strongly recommend to also include quantitative data on multiple samples to substantiate the claims. This could likely be done with standard software, such as MorphoGraphX.

      The data we showed before represented typical examples from a wide range of data that we analysed. All original data are being made publicly available for reanalysis. We have now added multiple examples from multiple samples, and also added quantitative data from fluorescence analysis. See new Supplementary Fig2-SupplFig. 2, Fig.4-SupplFig. 1, Fig.4-SupplFig. 2, Fig.4-SupplFig. 3, Fig.5-SupplFig. 1, Fig.5-SupplFig. 2, Fig.5-SupplFig. 3, Fig.5-SupplFig. 4

      Whereas the inhibitory effect of WUS on CLE40 is convincingly shown using ectopic WUS expression and the hypomorphic wus7 allele (Figure 2) the quantification of WUS positive cells in Figure 7 is problematic. Although it was done over multiple samples it heavily relies on manual scoring, which is prone to bias. The same is true for the width/height measurements of different meristems. An unbiased computational image analysis would certainly give more reliable results.

      We are grateful for this suggestion. We normally analyse samples in an anonymised manner. We have now also quantified the number of WUS positive cells using the Imaris software, as suggested, see Fig.7-SupplFig.1, and found that this analysis supported our previous conclusions. We also added a figure showing multiple samples from this experiment. See new Supplementary Fig7-SupplFig. 2

      One major point that the authors try to establish is that the CLE40 signal that eventually leads to reduction in meristem size is transduced via the BAM1 receptor. However, only genetic interactions, which are complicated by intricate feedbacks, are shown to substantiate this claim. For a strong statement on CLE40/BAM1 ligand/receptor interactions, advanced imaging technologies available to the authors or biochemical experiments would be necessary.

      We are currently not aware of a reliable and applicable experimental approach that would allow us to show direct interaction of the CLE40 peptide with its receptors in vivo. Biochemical experiments using purified peptides and/or receptors are, so far, contradictory: Shinohara et al. (2015) used chemically synthesized arabinosylated CLV3 peptide and photoaffinity labelling to show binding of CLV3 to a BAM1-Halo-TAG fusion protein expressed in BY-2 cells. However, using BAM1 protein purified from insect cell lines which was biotinylated in the Creoptix WAVE system, Crook et al. (2020) found no significant binding activity for synthetic CLV3 peptide. Our preliminary conclusion from these data sets is that binding of peptides to receptors should be best evaluated in vivo, since important posttranscriptional and posttranslational modifications, as well as coreceptors, can strongly modify peptide-receptor interactions.

      We have here added data showing that in the root, BAM1 receptor but not CLV1 is required for CLE40 dependent regulation of root meristem development, indicating again that CLE40 and BAM1 are likely to act in the same signaling pathway throughout development. See new Supplementary Fig6-SupplFig. 1

      Similarly, the genetic studies need some clarification: The authors show that cle40 and bam1 single mutants as well as cle40/bam1 double mutants all show a comparable reduction in meristem size, suggesting epistasis. In contrast, a reduction in meristem size can not be observed if cle40 is combined with clv1, which according to the proposed model appears to be unexpected. The interpretation of the genetic experiments is complicated by the well-known fact that BAM1 expression is regulated by the CLV pathway and loss of CLV signaling leads to ectopic expression of BAM1 in the OC which can partially compensate for the loss of CLV1, due to the molecular similarity of the two receptors. The shift of BAM1 expression from the PZ towards the OC could explain why there is no significant reduction in meristem size since CLE40 induced signaling at the PZ would be inhibited by the lack of the BAM1 receptor. To clarify the specific interaction of CLE40 with BAM1 and/or CLV1 the authors could try to restore BAM1 levels in the PZ of cle40/clv1 mutants by expressing BAM1-GFP from an appropriate promoter (e.g. RPS5 or UBQ10). This experiment would allow to distinguish between the genetic interaction of CLE40 with CLV1 from the feedback between CLV1 and BAM1 expression.

      The suggested experiment, to misexpress BAM1 from the RPS5 or UBQ10 promoter, is not feasible, since this results generally in a much higher expression level, which, in our hands, is not "tolerated" by CLV-family receptors. We found that higher level expression of RLKs generally causes mislocalisation of nonfunctional proteins.

      Overall, the manuscript could be strengthened by inclusion of additional molecular data probing the directness of WUS inhibiting CLE40 and/or BAM1 expression.

      We are planning to set up experiments for detailed studies on the transcriptional regulation of genes in the stem cell control pathways, and will in the future also investigate the feedback regulation of WUS onto CLE40 and BAM1. However, such analysis goes far beyond the scope of our current manuscript.

    1. Author Response:

      Reviewer #1:

      Strengths

      This study is a technical and analytical tour de force. The evolution experiments with barcoded lineages involved an immense amount of work and clever design, and the scale of the data challenged the authors to develop new statistical summaries. The figures are clear and results easy to interpret, even outside the evolution-experiment bubble. While the essential findings are not especially surprising, the robustness enabled by this level of replication is appreciated.

      Weaknesses

      I'm not exactly sure what I learned. I'm biased to like this work and while I'm confident that if I studied these findings more I would learn more, it wasn't obvious. For example - I want to know more about the effects of ploidy on pleiotropy, and while there are some differences e.g. in Figure 4A, I don't know what these PCs actually are saying. If particular phenotypes associate with PCs, it'd be helpful to "load" them on these axes.

      To more clearly show general trends and variation in pleiotropy, we have added a summary of the changes in fitness across all populations in Figure 2B and Figure 2–figure supplements 2–5. We have also expanded our consideration of these trends, including the effects of ploidy on pleiotropy. To supplement Figure 4, we have included the contribution of each assay environment to the principal components (Figure 4–figure supplement 5), as suggested.

      Also, do some treatments lead to faster or more complete diminishing returns than others, and does this influence pleiotropy?

      To compare changes in fitness across evolution environments and over time, we have computed the change in fitness for each population over the first 400 generations and the last 400 generations. This is plotted in Figure 2-figure supplement 6A. To assess the statistical significance of apparent diminishing returns, we compared the mean change in fitness over these time intervals using a t-test and provided the resulting p-values in Figure 2-figure supplement 6B. Overall, we see that different treatments lead to different extents of declining adaptability and note this in the Results. This declining adaptability may certainly influence pleiotropic outcomes, but unfortunately it is difficult to disentangle any potential such effects from other differences between environments (or assign any causality to correlations in the strengths of diminishing returns and differences in pleiotropy between replicates in the same environment), so we refrain from drawing any conclusions about this possibility.
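
      As an illustration of the comparison described above, the following minimal Python sketch computes a paired t statistic for per-population fitness gains in an early versus a late interval. The function name, the example numbers, and the choice of a paired (rather than unpaired) test are assumptions made for illustration; the response does not state which t-test variant was used, and this is not the authors' analysis code.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(early_gain, late_gain):
    """Return (t, degrees_of_freedom) for a paired t-test.

    early_gain[i] and late_gain[i] are the fitness changes of population i
    over the first and the last interval (hypothetical inputs).
    """
    if len(early_gain) != len(late_gain) or len(early_gain) < 2:
        raise ValueError("need two equal-length samples with at least 2 populations")
    diffs = [e - l for e, l in zip(early_gain, late_gain)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Made-up fitness gains for three replicate populations.
t, df = paired_t_statistic([3.0, 5.0, 7.0], [2.0, 3.0, 4.0])
print(f"t = {t:.3f}, df = {df}")
```

      A positive t here means larger gains early than late, which is the pattern expected under diminishing returns. Converting t to a p-value requires the t distribution, for which a statistics library would normally be used.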

      In total I think this manuscript can be improved by being presented / read by others, which is the job of peer review but here I think it's also to broaden its implications.

    1. Author Response:

      Reviewer #1:

      The submitted manuscript 'Distinct higher-order representations of natural sounds in human and ferret auditory cortex' by Landemard and colleagues seeks to investigate the neural representations of sound in the ferret auditory cortex. Specifically, they examine the stages of processing via manipulating the complexity and sound structure of stimuli. The authors create synthetic auditory stimuli that are statistically equivalent to natural sounds in their cochlear representation, temporal modulation structure, spectral modulation structure, and spectro-temporal modulation structure. The authors use functional ultrasound imaging (fUS) which allowed for the measurement of the hemodynamic signal at much finer spatial scales than fMRI, making it particularly suitable for the ferret. The authors then compare their results to work done in humans that has previously been published (e.g. Norman-Haignere and McDermott, 2018) and find that: 1. While human non-primary auditory cortex demonstrates a significant difference between natural speech/music sounds and their synthetic counterparts, the ferret non-primary auditory cortex does not. 2. For each sound manipulation in humans, the dissimilarity increases as the distance from the primary auditory cortex increases, whereas for ferrets it does not. 3. While ferrets behaviorally respond to con-specific vocalizations, the ferret auditory cortex does not demonstrate the same hierarchical processing stream as humans do.

      Overall, I find the approach (especially the sound manipulations) excellent and the overall finding quite intriguing. My only concern is that it is essentially a null result. While this result will be useful to the literature, there is always the concern that a lack of finding could also be due to other factors.

      Thank you for taking the time to carefully read our manuscript. We have done our best to address all of your questions and concerns, which has improved the paper.

      We note that our finding differs from a typical null result in two ways. First, our key finding is that responses to natural and synthetic sounds are closely matched throughout primary and non-primary auditory cortex. Unlike a typical null result, this finding cannot be due to a noisy measure, since if our data were noisy, we would not have observed any correspondence between natural and synthetic sounds. Second, we have a clear prediction from humans as to what we should observe if the organization were similar: matched responses in primary auditory cortex and divergent responses in non-primary auditory cortex. Our data clearly demonstrate that this prediction is wrong, for all of the reasons noted in our general response above. In essence, what we are showing is that there is a region by species interaction in the similarity of responses to natural vs. synthetic sounds (as reflected by a significant difference in slopes between species, see our response above). We have investigated and ruled out all of the alternative explanations we can think of for this interaction (e.g. differences in SNR or spatial resolution) and are left with the conclusion that there is a meaningful difference in functional organization between humans and ferrets. If there are any additional concerns you have, we would be happy to address them.

      Major points:

      1) What if the stages in the ferret are wrong? The authors use 4 different manipulations thought to reflect key elements of sound structure and/or the relevant hierarchy of the processing stages of the auditory cortex, but it's possible that the dimensions in the ferret auditory cortex are along a different axis than spectro/temporal modulations. While I do not expect the authors to attempt every possible axis, it would be beneficial to discuss.

      Thank you for raising this question. We now directly address this question in the Discussion (page 11):

      "Our findings show that a prominent signature of hierarchical functional organization present in humans – preferential responses for natural vs. spectrotemporal structure – is largely absent in ferret auditory cortex. But this finding does not imply that there is no functional differentiation between primary and non-primary regions in ferrets. For example, ferret non-primary regions show longer latencies, greater spectral integration bandwidths, and stronger task-modulated responses compared with primary regions (Elgueda et al., 2019). The fact that we did not observe differences between primary and non-primary regions is not because the acoustic features manipulated are irrelevant to ferret auditory cortex, since our analysis shows that matching frequency and modulation statistics is sufficient to match the ferret cortical response, at least as measured by ultrasound. Indeed, if anything, it appears that modulation features are more relevant to the ferret auditory cortex since these features appear to drive responses throughout primary and non-primary regions, unlike human auditory cortex where we only observed strong, matched responses in primary regions."

      2) For the ferret vocalizations, it is possible that a greater N would allow for a clearer picture of whether or not the activation is greater than speech/music? While it is clear that any difference would be subtle and probably require a group analysis, this would help settle this result/issue (at least at the group level).

      Below we plot the distribution of NSE values for ferret vocalizations, speech, and music, averaged across all of auditory cortex and plotted separately for each ferret tested (panel A). As is evident, we observe larger NSE values for ferret vocalizations in one animal (p < 0.01, Wilcoxon test), but no difference in the other two (p > 0.55). When we perform a group analysis, averaging across all three animals, we do not observe any significant difference between the categories (panel B) (p = 0.27). Moreover, even for ferret vocalizations, NSE values were similar throughout primary and non-primary regions, and this was true in all three animals tested (panel C). Given these data, we do not believe our study provides evidence for a difference between ferret vocalizations and other categories. Panel A is plotted in the revised Figure 4 - figure supplement 1E. The distance-to-PAC curves (panel C) and the corresponding slopes are plotted in Figure 4D-E.

      Individual and group analyses of the difference between natural and spectrotemporally matched synthetic sounds, broken down by sound category. A, The NSE between natural and synthetic sounds plotted separately for each animal and sound category. NSE values have been averaged across all of auditory cortex. Each circle represents a single pair of natural/synthetic sounds. We find that the NSE values are larger for ferret vocalizations in Ferret A, but this effect is not present in Ferret T or C (** indicates p < 0.005, Wilcoxon test). B, NSE values averaged across animals. C, NSEs for ferret vocalizations, plotted as a function of distance to primary auditory cortex (PAC). Figure shows both individual subject (thin pink lines) and group-averaged data (thick pink line).

      Below, we have reproduced the relevant paragraph of the results where we discuss these and other related findings (page 6):

      "To directly test if ferrets showed preferential responses to natural vs. synthetic ferret vocalizations, we computed maps plotting the average difference between natural vs. synthetic sounds for different categories, using data from both Experiments I and II (Figure 4C). We also separately measured the NSE for sounds from different categories, again plotting NSE values as a function of distance to PAC (Figure 4D-E). The differences that we observed between natural and synthetic sounds were small and scattered throughout primary and non-primary auditory cortex, even for ferret vocalizations. In one animal, we observed significantly larger NSE values for ferret vocalizations compared with speech and music (Ferret A, Mdvoc = 0.137 vs MdSpM = 0.042, Wilcoxon rank-sum test: T = 1138, z = 3.29, p < 0.01). But this difference was not present in the other two ferrets tested (p > 0.55) and was also not present when we averaged NSE values across animals (Mdvoc = 0.053 vs MdSpM = 0.033, Wilcoxon rank-sum test: T = 1016, z = 1.49, p = 0.27). Moreover, the slope of the NSE vs. distance-to-PAC curve was near 0 for all animals and sound categories, even for ferret vocalizations, and was substantially lower than the slopes measured in all 12 human subjects (Figure 4F) (vocalizations in ferrets vs. speech in humans: p < 0.001 via a sign test; speech in ferrets vs. speech in humans: p < 0.001). In contrast, human cortical responses were substantially larger for natural vs. synthetic speech and music, and these response enhancements were concentrated in distinct non-primary regions (lateral for speech and anterior/posterior for music) and clearly different from those for other natural sounds (Figure 4C). Thus, ferrets do not show any of the neural signatures of higher-order sensitivity that we previously identified in humans (large effect size, spatially clustered responses, and a clear non-primary bias), even for con-specific vocalizations."

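      The slope comparison quoted above can be sketched in a few lines of Python: fit a least-squares slope to each NSE vs. distance-to-PAC curve, then ask with a sign test how often the human slopes exceed a ferret slope. The function names and the example values are hypothetical, and the sketch is only meant to illustrate the logic of the comparison, not to reproduce the authors' code.

```python
from math import comb


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def sign_test_p(n_greater, n_total):
    """Two-sided sign-test p-value under a fair-coin null (ties excluded)."""
    k = max(n_greater, n_total - n_greater)
    tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)


# Hypothetical curves: distance to PAC (arbitrary units) vs. NSE.
distance = [0.0, 1.0, 2.0, 3.0]
flat_curve = [0.05, 0.05, 0.05, 0.05]   # NSE does not change with distance
rising_curve = [0.0, 0.5, 1.0, 1.5]     # NSE grows with distance

print(least_squares_slope(distance, flat_curve))
print(least_squares_slope(distance, rising_curve))
print(sign_test_p(12, 12))
```

      If all 12 human slopes exceed a given ferret slope, the two-sided sign-test p-value is 2 × 0.5^12 ≈ 0.0005, which is consistent with the p < 0.001 quoted above.
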
      3) Relatedly, did the magnitude of this effect increase outside the auditory cortex?

      We did not record outside of auditory cortex. Unlike fMRI, it is not easy to get whole-brain coverage using current fUS probes. Since our goal was to test if ferret auditory cortex showed similar organization as human auditory cortex, we focused our data collection on auditory regions. We have clarified this point in the Methods (page 13):

      "fUS data are collected as a series of 2D images or ‘slices’. Slices were collected in the coronal plane and were spaced 0.4 mm apart. The slice plane was varied across sessions in order to cover the region-of-interest which included both primary and non-primary regions of auditory cortex. We did not collect data from non-auditory regions due to limited time/coverage."

      4) It would be useful to have a measure of the noise floor for each plot and/or species for NSE analyses. This would make it easier to distinguish whether, for instance, in 2A-D, an NSE of 0.1 (human primary) vs. an NSE of 0.042 (ferret primary) should be interpreted as a bit more than double, or both close to the noise floor (which is what I presume).

      All of our NSE measures are noise-corrected such that the effective floor is zero (noise-correction provides an estimate of what the NSE value would be given perfectly reliable measurements). The only exceptions are cases where we plot the NSE values for example voxels/ROIs (Figure 2A-D, Figure 2 - figure supplement 1), in which case we plot the raw NSE values along with the noise floor, which is given by the test-retest NSE of the measurements. To address your comment, we have included a supplemental plot (Figure 2 - figure supplement 3) that shows the median uncorrected NSE as a function of distance to primary auditory cortex, along with the noise floor given by the reliability of the measurements. The figure is reproduced below.

      Figure 2 - figure supplement 3. Uncorrected NSE values. This figure plots the uncorrected NSE between natural and synthetic sounds as a function of distance to primary auditory cortex (PAC). The test-retest NSE value, which provides a noise floor for the natural vs. synthetic NSE, is plotted below each set of curves using dashed lines. Each thin line corresponds to a single ferret (gray) or a single human subject (gold). Thick lines show the average across all subjects. Format is the same as Figure 2F.

      We have clarified this important detail in the Results (page 4):

      "We used the test-retest reliability of the responses to noise-correct the measured NSE values such that the effective noise floor given the reliability of the measurements is zero."

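      To make the idea of a noise floor and a noise-corrected NSE concrete, here is a minimal Python sketch. It assumes that the NSE is the mean squared difference between two response vectors, normalized so that identical responses give 0 and uncorrelated responses give about 1, and it assumes that measurement noise is zero-mean and independent across two repeated measurements of each condition. Both the formula and the correction are labeled assumptions for illustration; the response does not spell out the exact procedure, and the function names are hypothetical.

```python
def _mean(values):
    return sum(values) / len(values)


def nse(x, y):
    """Raw (uncorrected) normalized squared error between two response vectors."""
    num = _mean([(a - b) ** 2 for a, b in zip(x, y)])
    den = _mean([a * a for a in x]) + _mean([b * b for b in y]) - 2 * _mean(x) * _mean(y)
    return num / den


def nse_noise_corrected(x1, x2, y1, y2):
    """NSE estimate from two repeats per condition (x1, x2 and y1, y2).

    Assumption: noise is independent across repeats, so the product of two
    repeats estimates signal power without the noise-power inflation.
    """
    power_x = _mean([a * b for a, b in zip(x1, x2)])
    power_y = _mean([a * b for a, b in zip(y1, y2)])
    cross = _mean([_mean([a * b for a, b in zip(x, y)])
                   for x in (x1, x2) for y in (y1, y2)])
    mean_x = _mean([_mean(x1), _mean(x2)])
    mean_y = _mean([_mean(y1), _mean(y2)])
    num = power_x + power_y - 2 * cross
    den = power_x + power_y - 2 * mean_x * mean_y
    return num / den


# Constructed example: natural (x) and synthetic (y) share the same signal,
# and each repeat adds a different, mutually orthogonal noise pattern.
x1 = [6, 2, 4, 0, 6, 2, 4, 0]
x2 = [6, 0, 4, 2, 6, 0, 4, 2]
y1 = [6, 2, 6, 2, 4, 0, 4, 0]
y2 = [6, 0, 6, 0, 4, 2, 4, 2]

print("raw natural vs. synthetic NSE:", nse(x1, y1))
print("test-retest NSE (noise floor):", nse(x1, x2))
print("noise-corrected NSE:", nse_noise_corrected(x1, x2, y1, y2))
```

      In this constructed example the raw natural vs. synthetic NSE (0.2) equals the test-retest NSE (0.2), so the whole raw difference is attributable to noise and the corrected value is 0. This is the sense in which the effective floor after correction is zero.
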
      Reviewer #2:

      Landemard et al. compare the response properties of primary vs. non-primary auditory cortex in ferrets with respect to natural and model-matched sounds, using functional ultrasound imaging. They find that responses do not differentiate between natural and model-matched sounds across ferret auditory cortex; in contrast, by drawing on previously published data in humans where Norman-Haignere & McDermott (2018) showed that non-primary (but not primary) auditory cortex differentiates between natural and model-matched sounds, the authors suggest that this is a defining distinction between human and non-human auditory cortex. The analyses are conducted well and I appreciate the authors including a wealth of results, also split up for individual subjects and hemispheres in supplementary figures, which helps the reader get a better idea of the underlying data.

      Overall, I think the authors have completed a very nice study and present interesting results that are applicable to the general neuroscience community. I think the manuscript could be improved by using different terminology ('sensitivity' as opposed to 'selectivity'), a larger subject pool (only 2 animals), and some more explanation with respect to data analysis choices.

      Many thanks for your thoughtful critiques and comments. We have attempted to address all of them, which has improved the manuscript.

    1. Author Response:

      Reviewer #1:

      In this manuscript by Rankin et al., the authors propose a model of reciprocal mesoderm-endoderm interactions involving Tbx5 activation of retinoic acid (RA) production in the posterior second heart field (pSHF) that activates endoderm expression of patterning ligands such as Shh, which feeds back to activate the pSHF to coordinate cardiopulmonary development. This is a nice model that bridges previous work from the same authors, which had shown that Tbx5 can alter Shh expression in a non-cell autonomous manner (Steimle et al, 2018), along with prior experiments showing that RA induces Shh expression (Rankin et al, 2016). As such, the novelty here lies in the mechanistic portion of the paper that describes how Tbx5 induces Aldh1a2, a gene responsible for RA production in the pSHF, along with the interactions of Tbx5 with putative enhancers of Aldh1a2. The use of the Xenopus model system allows the authors to perform elegant epistasis experiments using morpholinos and CRISPR/Cas9 excisions in whole embryos, which nicely illustrate the role of Tbx5 in inducing Aldh1a2 expression and the role of Tbx5 in the downstream pathways described. The experiments are well presented and follow a clear logic, and they are mostly supportive of the experimental models presented at the end of the manuscript. However, a potential weakness of this manuscript is the reliance on pharmacologic methods of modifying RA pathways rather than using a genetic/RNA targeting approach, which would confer more specificity related to the functional importance of the Tbx5 transcriptional target described (in this case, Aldh1a2). Furthermore, more precise colocalization of Tbx5 and Aldh1a2 within the developing cardiopulmonary tissues is important, with some clarification as to why there appears to be broad Aldh1a2 expression independent of Tbx5.

      We thank the reviewer for appreciating the mechanistic novelty and for the fair constructive criticisms. We have addressed your concerns with additional data and revisions to the text, which we agree have greatly improved the study.

      Reviewer #2:

      In this manuscript Rankin et al. combined mouse and frog genetic models to study the gene regulatory network orchestrated by the transcription factor Tbx5. The authors demonstrated that Tbx5 regulates expression of Aldh1a2 which catalyzes the production of retinoic acid (RA) in the lateral plate mesenchyme, thereby activating RA signaling which then signals to the foregut endoderm and induces Shh expression there. In turn Shh activates Hedgehog signaling in the mesenchyme where it works with Tbx5 to promote expression of Wnt2/2b. Wnt2/2b then initiates lung specification from the anterior foregut endoderm. Biochemistry assays were used to assess how Tbx5 and RA regulate the transcription of Aldh1a2 and Shh, respectively. Two evolutionarily conserved enhancers were identified for the regulation of the transcription of Aldh1a2 and Shh. The authors in the end suggested that their work provides a knowledge basis for better understanding the pathogenesis of the human birth defects DiGeorge Syndrome and Holt-Oram Syndrome. The findings help to fill the knowledge gap, connecting several observations made by previous studies. Moving forward, Tbx5 and other Tbx genes (e.g. Tbx4) continue to be expressed in the developing lungs. Whether a similar regulatory axis is present to modulate lung epithelial and mesenchymal development remains to be explored.

      We thank the reviewer for appreciating the value of the work and for the helpful suggestions.

      Reviewer #3:

      Previously the Zorn lab has published that retinoic acid-hedgehog signaling is a key step in lung specification. (Rankin et al, Cell Rep 2016, 66-78.) Previously, molecular networks have been proposed for the early heart/lung differentiation. Examples include: (Xie, L. et al. (2012). Dev. Cell 23 (2), 280-291; Steimle, J. D., et al. (2018). Proc. Natl. Acad. Sci. USA 115 (45), E10615-E10624; Peng T, et al. (2013) Nature 500(7464):589-92). Although many pieces of these signaling and transcription factor activities have been described, this manuscript demonstrates additional information. The strengths are the use of an in vivo system to tease out the transcriptional elements regulated by Tbx5 and RA. The authors perform sufficient experiments to support their claims although some of these readouts are qualitative rather than quantitative. The authors include relevant controls where possible. The authors were also rigorous by providing a time window for when Tbx5 control of Raldh2 occurs. One weakness is that the manuscript is difficult to follow for individuals who are not familiar with past published data in these networks. Another weakness is that some of the data is drawn from whole mount images and bulk sequencing which could lead to overstatements. A third weakness is that the manuscript does not have a clear focus. Its main concept is filling in the gaps for some of the gene transcription networks that have been described previously. An additional weakness is that almost all of the gene manipulations are global either by morpholino or chemical treatment (inhibition and activation). Finally, it is unclear what the outcomes of the signaling disruptions are in the embryos. We see a snapshot of gene expression but not how this affects organ development in the long run.

      We thank the reviewer for appreciating the value of elucidating the molecular mechanisms by which Tbx-RA interactions regulate cardiopulmonary development; we agree that this is the main value of our study. We also appreciate the reviewer’s perspective that we did not describe the work as clearly as we could, particularly for a non-expert, and that we may have overstated some of the conclusions. We have tried to modify the text to address these issues and have tempered our claims. In addition, we added new experiments to complement the inhibitor studies and to test our hypothesis more rigorously, all of which support our model. We also provide additional data and a better description of previous publications on the final anatomical outcome of Tbx5-RA deficiency.

    1. Author Response:

      Reviewer #3:

      Sordillo and Bargmann report a detailed study of mechanisms by which RIM interneurons control foraging by C. elegans. A comparison of the effects of knocking out the vesicular glutamate transporter in RIMs to the effects of knocking out synthesis of the monoamine transmitter tyramine leads the authors to conclude that glutamate/monoamine cotransmission is required for RIM function. The authors further find that acute perturbation of RIMs by chemogenetics has a surprising effect. Manipulation of RIMs with HisCl - a histamine-gated ion channel that permits rapid and reversible inhibition of target cells - had the opposite effect of RIM ablation or mutations that affect RIM neurotransmission. The effects of HisCl-mediated perturbation require gap junctions, leading the authors to conclude that gap-junction connectivity between RIMs and their targets promotes specific foraging behaviors while neurochemical signaling from RIMs to their targets promotes antagonistic behaviors.

      This study has several strengths. Sophisticated genetic tools are developed to perturb RIMs. Measurements of RIM-dependent foraging behaviors are made using high-resolution video tracking systems, and these rich datasets are clearly presented and rigorously analyzed. The manuscript is clearly written and beautifully illustrated. And the overarching hypothesis that RIM interneurons support distinct behavioral programs when depolarized and hyperpolarized is provocative and significant. The study also has weaknesses, some of which significantly impact the strength of the authors' conclusions. These weaknesses are listed below.

      1) One major conclusion is that RIMs use both glutamate and tyramine as co-transmitters. This conclusion is based on the observed effects of VGLUT knockout in RIMs. It is known, however, that VGLUT facilitates the loading of monoamine neurotransmitters into vesicles, raising the possibility that the observed effects of VGLUT knockdown are via effects on tyraminergic signaling. The authors discuss this point and argue that an observed difference between the effects of tdc-1 mutation and RIM-specific VGLUT mutation indicates separable functions of glutamate and tyramine, i.e. co-transmission. However, most of the data are also consistent with a model in which VGLUT facilitates VMAT function. This point is critical for one of the study's main conclusions and should be resolved. For example, a clear role for a known glutamate receptor in RIM-mediated behavior would support the authors' conclusions.

      We have not examined receptor mutants; there are many excitatory and inhibitory glutamate receptors in this circuit, and each is expressed in multiple neurons, so we believe the only rigorous approach will be single-cell receptor knockouts like those presented here, possibly combined with single-cell eat-4 knockouts. With that said, Li et al. (2020) have identified a glutamate receptor subunit, avr-14, that (1) acts within command neurons and motor neurons to affect spontaneous reversals and (2) shows a genetic interaction with RNAi knockdown of eat-4 in RIM. These results support the suggestion that RIM uses glutamate as a transmitter. We now cite that result on page 14.

      2) The surprising observation that HisCl-silencing of RIMs has the opposite effect to ablation of RIMs or mutation of the monoamine biosynthetic pathway is the basis for the other major conclusion of the study. The authors conclude that this difference reflects a signaling function for hyperpolarized RIMs that is eliminated by ablation. This difference might also reflect differences between chronic and acute perturbations. Methods exist to chronically silence neurons by expressing hyperpolarizing conductances, and the authors' model suggests that these manipulations would cause effects similar to those caused by acute inhibition via HisCl.

      We acknowledge this point. We added new data to Figure 5 to address it, showing that we saw similar behavioral results upon acute or chronic (48 hours) silencing of RIM.

      3) HisCl silencing of RIMs was performed using a tdc-1::HisCl transgene, which supports expression in RIMs and RICs. The authors should be certain that RICs have no role in the effects they see using this transgene. Similarly, perturbation of RIM gap junctions uses a tdc-1-based transgene, which should be paired with a control that allows the authors to rule out any contribution of RICs.

      The key difference between RIC and RIM(+RIC) chemical synapses is the large effect of RIM on reversal frequencies and length. To ask if that distinction applies to gap junctions, we have now expressed the unc-1(dn) transgene in RIC. This transgene did not affect reversal frequencies and length, unlike unc-1(dn) in RIM(+RIC). It did cause a decrease in reversal speed, forward run speed, and forward run length, results matching those of RIC chemical synapses. These results have been added as Figure 6-figure supplement 3.

      4) The authors' final model proposes that the constellation of chemical and electrical synapses endows RIMs with a kind of 'inertia.' In the absence of any data that report how perturbation of RIMs affects dynamics of the foraging circuit (AIB/RIB/AVA/AVB), it is difficult to assess this model.

      We added calcium imaging data demonstrating that silencing RIM reduces AVA activity as the new Figure 5 – figure supplement 3.

      Minor comments:

      The authors use variants of 'RIM glutamate KO' when referring to the strain carrying the RIM-targeted allele of eat-4/vglut. They should be consistent.

      We have addressed this point.

      A critical set of control experiments for the eat-4 conditional allele is presented in Figure 2S1 but not mentioned in the manuscript until data from Figure 3 are being discussed. These controls should be clearly described earlier in the results section - they are beautiful experiments and establish confidence in the method.

      Thank you -- we added that information to the text on page 5, as it is also relevant to some of the questions about variability between assays addressed in Essential Revisions point 3.

      Line 192 refers to 'a synaptobrevin-dependent transmitter.' This is correct, but it might be more clear to simply say 'another neurotransmitter.'

      Changed.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors bring compelling biochemical evidence that MatP and ParC compete to interact with the hinge of MukB. In addition, the authors made an effort to support their hypothesis with the description of the phenotype of a mutant of the hinge, mukBKKK, which fails to interact with both ParC and MatP.

      Thank you for your positive response!

      Reviewer #2 (Public Review):

      In the manuscript entitled "Competitive binding of MatP and Topoisomerase IV to the MukB hinge modulates chromosome organization-segregation", Fisher and colleagues characterized in vitro the interaction between MukB and MatP or between MukB and TopoIV using different biochemical approaches. First, they identified that a dimer of MatP or a dimer of ParC interacts in vitro with the hinge domain of MukB but failed to form a tripartite complex ParC/MatP/MukB. Second, they observed that the interaction of full-length protein MatP with MukB hinge is competed out by ParC, suggesting that MatP and ParC share overlapping binding sites on the MukB hinge. Third, using a MukB mutant (MukB-kkk) known to be deficient in ParC binding, the authors did not detect any major topo-IV defective phenotype, questioning the requirement of direct interaction between TopoIV and MukBEF for chromosome unlinking by decatenation.

      Several major difficulties regarding this manuscript are indicated below:

      1) Although the authors have thoroughly and elegantly analyzed the in vitro interactions of the MukB hinge with either MatP or ParCCTD, there is no strong evidence revealing the significance of these interactions in vivo. While the interaction between MukB hinge and ParC has been reported by several groups over the years using two-hybrid screens and pull-down assays, the data presented in this manuscript reveal that the absence of such interactions in vivo does not cause major defects in decatenation/chromosome segregation resulting from an impaired action of TopoIV and does not substantially affect MukBEF activity.

      The above statement is largely correct, but in our opinion does not detract from the ‘elegant’ and ‘thorough’ work in the manuscript! We note that, as here, most scientific advances are incremental. Using the methodologies available, we have not uncovered the detailed mechanism or significance of the hinge-MatP interaction in vivo and how it relates to the recently discovered second binding site for MatP-matS in the MukB coiled-coils. The relationship between the different MatP-bound states revealed by Bürmann and colleagues and ourselves sets the scene for future work by others! Furthermore, the new analyses clearly show that MukBKKK-expressing cells have a phenotype more closely related to that of ΔmukB cells than ΔmatP cells, consistent with the second MatP binding site (in the presence of matS DNA in the hydrolysis-impaired ‘locked’ state of MukBEQ). In relation to the significance of the TopoIV/ParC-MukB hinge interaction, it is already clear from the literature that in the complete absence of MukBEF, there is sufficient TopoIV activity in cells to allow relatively normal chromosome segregation during growth in minimal medium or in rich medium at low temperature. At restrictive temperature, ΔmukB cells are filamentous and have unsegregated nucleoids. Prior to this work, we had proposed that much of the ΔmukB phenotype, including temperature-sensitive growth in rich medium as well as delayed decatenation of newly replicated oris, was related to the knock-on effects of mis-localising and mismanaging TopoIV recruitment to chromosomes, and in particular ori. Our analyses show that MukBKKKEF-expressing cells may have a modest defect in decatenation at ori, as measured by an increased fraction of cells with unsegregated oris, intermediate between WT and ΔmukB cells, but this does not lead to temperature-sensitivity (Figure 5 and its Supplementary Figure).

      2) A robust readout is required to demonstrate the significance of the MatP-MukBEF interaction in vivo. The readout used in this study (morphology of MukBEF foci) to reveal the absence of interaction between MatP and MukBEF is not easily quantifiable. Different methods have been used before that revealed the inhibition/displacement of MukBEF by MatP. Such methods (Mäkelä and Sherratt, 2020; Lioy et al., 2018; 2020) should be used to estimate the MukBEF positioning/activity in the absence of interaction with MatP.

      In the revised manuscript, we have extended our analyses to explore further the properties of the MukBKKK mutant and the significance of the MatP-MukBEF interaction in vivo, and state what conclusions arise from the new analyses. Elsewhere in this response, we explain why Hi-C, ChIP-Seq and SIM would not be helpful in our opinion. We did not have the reagents to undertake PALM by the time our group disbanded. In our opinion, the information from PALM would be helpful in ascertaining more about the behavior of MukBKKKEF in cells, but that is beyond the scope of this work.

      3) The in vivo interaction of MatP with the MukB hinge revealed by the bacterial two-hybrid assay (Nolivos et al., 2016) seems to be weak. A recent study (Bürmann et al. BioRxiv) has revealed the CryoEM structure of MukBEF complexed with MatP. In the structure obtained, MatP bound to its target matS interacts with MukE and the "joint" region of MukB, not with the hinge. Based on the structure, Bürmann and colleagues propose a very attractive unloading model enhanced at matS sites, explaining how MatP prevents MukBEF activity in the Ter region. In the present manuscript, it is not addressed whether the interaction of MatP with the hinge corresponds to a subsequent stage during unloading or contributes to another aspect of MukBEF activity.

      We agree with the above. At the time of completing our experimental work we became aware of the work of Bürmann and colleagues, but it was not in the public domain and therefore not suitable for discussion in the original manuscript. Since that work has now been deposited on bioRxiv, we now discuss it and how it relates to our own work. There is nothing inconsistent between the two complementary sets of work. Together they support the idea, proposed in the original manuscript, that the MatP binding to the hinge that we characterize is just one stage in a multi-step reaction, and provide the platform for future studies.

      4) DNA entrapment by MukBEF requires ATP hydrolysis (Bürmann et al. BioRxiv). It is therefore not obvious how MukBEFEQ is bound in vivo to matS sites. Also, as matS sites compete with the MukB hinge for MatP binding in vitro, it is not clear how MukBEF could interact with MatP bound to matS sites in vivo.

      EQ mutants of multiple SMC complexes studied load onto chromosomes in vivo (imaging, ChIP-Seq, etc.), where they remain stably associated, turning over slowly. In line with this, MukBEQ mutants stably associate with chromosomal DNA (and accumulate at MatP-matS sites), where they have very slow off rates (e.g. Science, 338, 528–31). The DNA entrapment assay of Bürmann will trap some types of stably bound MukBEF complexes (‘topologically entrapped’), but not others, as has been shown for several different SMC complexes. Importantly, ATP hydrolysis is required to release EQ SMC complexes from chromosomes (whether it be MukBEF or others), as assayed in many different systems and published in many places. With respect to the second sentence of the reviewer, the second binding site for MatP-matS provides a means by which MukBEF interacts with MatP-matS in vivo. The revised manuscript clarifies all of this.

      5) It is not clear how matS sites could prevent MatP-MukB in vitro interaction. This is not consistent with the observation that MatP must be bound at matS sites to unload MukBEF or prevent its activity.

      This is fully discussed in the revised manuscript.

      6) The title is misleading as no clear evidence is reported concerning an effect of the interactions on chromosome organization/segregation.

      We have modified the title to take account of this point: ‘Competitive binding of MatP and topoisomerase IV to the MukB hinge domain’.

      Reviewer #3 (Public Review):

      This paper focuses on protein-protein interactions of the E. coli Structural Maintenance of Chromosomes (SMC) complex MukBEF through the hinge domain of its MukB subunit. Previous work has demonstrated that the MukB hinge interacts with the ParC subunit of topoisomerase IV (ParC2E2), which decatenates sister chromosomes, and the MatP homodimer, which preferentially binds chromosomal matS sites near the replication terminus (ter) and excludes MukBEF from ter. This paper expands on previous studies by using a wide range of in vitro assays (isothermal titration calorimetry, native mass spectrometry, fluorescence correlation spectroscopy, analytical size exclusion chromatography, etc.) to characterize MukB-MatP and MukB-ParC interactions qualitatively and quantitatively. One major finding is that MatP and ParC compete for MukB hinge binding, rather than forming a MukB-MatP-ParC ternary complex. Additionally, this study reports that a ParC-binding deficient MukB mutant (MukBKKK) is also deficient in MatP binding, suggesting that ParC and MatP have overlapping binding sites on MukB. Further, MatP-matS binding prevents MatP from binding MukB, suggesting that MukB and matS have overlapping binding sites on MatP. Live-cell fluorescence imaging of WT MukB and the binding-deficient MukBKKK mutant confirms that MukBKKK does not colocalize preferentially with ori or bind ParC, in contrast to WT MukB, although it does not show some of the expected Muk˗ phenotypes such as temperature-sensitive growth.

      Strengths of this study:

      1) Using a range of experimental techniques to study binding interactions between MukB, ParC/topo IV, and MatP helps increase confidence in the findings. For example, multiple lines of evidence (analytical size exclusion chromatography, native mass spectrometry, and fluorescence correlation spectroscopy) all indicate that there is no or minimal formation of a MukB-MatP-ParC ternary complex and that instead MatP and ParC bind competitively to MukB.

      2) The in vitro assays use only partially reconstituted complexes, including a substantially truncated form of MukB containing the hinge and a portion of the coiled-coil domain. The inclusion of in vivo imaging experiments showing that the binding-deficient MukBKKK mutant is impaired in ParC binding and proper localization at ori helps to support the relevance of the in vitro results.

      3) Experiments are well-controlled and the use of a number of MukB, ParC, and MatP mutants helps to support the conclusions regarding the interactions between these proteins.

      Thank you for the above.

      Weaknesses of this study:

      1) In some cases, different experimental techniques are used to measure binding interactions between WT and mutant proteins, and no explanation is given for the choice of technique. For example, isothermal titration calorimetry and native mass spectrometry are initially used to measure the MukB hinge binding to MatP, but these techniques are not used to characterize the binding of the MukBKKK hinge mutant to MatP. Comparisons between WT MukB and MukBKKK binding to MatP are provided by other methods (native PAGE, fluorescence correlation spectroscopy, analytical size exclusion chromatography), so the difference in binding between the WT and mutant is clearly demonstrated, but the lack of these corresponding experiments creates some confusion and makes it more challenging to interpret some of the results.

      We have clarified the text to ease the reader through the assortment of biochemical and biophysical techniques utilised. Our choice of techniques in part reflected the different personnel and equipment available over the several years that this project spanned and also the properties and sizes of the complexes under study; these influence the analytical technique chosen. We believe that the quantitative data regarding stoichiometries, KDs, etc. are directly comparable between the different analyses. We would argue that the range of techniques and comparisons between analytical tools is a strength rather than a weakness.

      2) Although the experiments and analyses are for the most part rigorous and well-controlled, there are a few minor experiments or analyses with some weaknesses. In the analysis of cell size in the in vivo imaging experiments (Figure 5D), the median cell lengths are reported for WT MukB and different mutants. The authors conclude that the cell size is essentially the same for WT MukB and the binding-deficient MukBKKK mutant, suggesting no major chromosome segregation defects for the mutant. However, the difference in median cell length between WT and MukBKKK is the same as that between WT and ΔmukB cells. Thus it is not clear how the authors draw their conclusion from these data. Further discussion and analysis (perhaps the presentation of the full distribution of cell lengths and/or statistical analysis) might support these claims. Further, the demonstration that the MukBS781F mutant is not growth-sensitive (Supplementary Figure 3C) appears to represent a single experiment, whereas even a second replicate would help increase confidence in the results.

      We now present the distributions of median cell lengths (Supplementary Figure 5A). It is true that there is no significant difference in cell length distributions of WT, mukB and mukBKKK cells grown at 30 °C in minimal medium, leading to the conclusion that none of these strains have a major cell division defect under these conditions. This is all clarified in the revised text.

      The complementation assay presented in Supplementary Figure 3C is indeed from a single repeat. Use of a Phe substitution at position S718 was intended to give an indication as to whether substitution of residue S718 with 4-azido-L-phenylalanine for unnatural amino acid (UAA) labelling impeded MukB activity (something not tractable to unequivocally test for in vivo). Following purification of the UAA-containing protein, subsequent labelling and analysis of its ATP hydrolysis activity and its in vitro binding to MukEF, ParC and MatP, it was clear this protein retained its assayable functions. We are happy to remove this in vivo complementation if the reviewer wishes - it does not impact the quality of the experiments with the fluorophore-labelled protein.

      3) Although this study provides a more in-depth characterization of the ParC/topo IV and MatP interactions with the MukB hinge than in previous work, it is not clear that the results lead to substantial changes in our understanding of the role of these protein-protein interactions in coordinating chromosome segregation.

      Prior to this work, the binding interface of MatP upon MukB had not been mapped other than at the domain level (Nolivos et al. 2016). The work presented here builds upon this and pinpoints a refined patch upon the MukB hinge domain essential for MatP interaction. Moreover, this same surface is bound by ParC indicating likely competition for binding, something previously only alluded to by in vivo studies but not yet demonstrated in vitro. Both of these revelations, alongside the important demonstration that a single topoIV heterotetramer binds a dimeric hinge, reinforce a model in which MukBEF activity is regulated in a spatial and temporal manner by binding of its partner proteins, MatP and TopoIV, which is reliant upon the competitive nature of their binding to MukB.

    1. Author Response:

      Reviewer #2 (Public Review):

      In this work, authors investigated the versatility of the beta-proteobacterium Cupriavidus necator from the proteome perspective. For this purpose, they cultivated the microorganism in a chemostat using different limiting substrates (fructose, fructose with limited ammonia, formate and succinate) and under different dilution rates. Integration of experimental proteomic data with a resource balance analysis model allowed to understand the relation between enzyme abundances and metabolic fluxes in the central metabolism. Moreover, the use of a transposon mutant library and competition experiments, could add insights regarding the essentiality of the genes studied. This shed light on the (under)utilization of metabolic enzymes, including some interpretations and speculations regarding C. necator's physiological readiness to changes in nutrients within its environmental niche. However, several parts of C. necator metabolism are not yet well analyzed (PHB biosynthesis and photorespiration) and some conclusions are not well reported.

      Strengths:

      1) The manuscript is well written, easily understandable also for (pure) experimentalists, and adds a novel layer of comprehension in the physiology and metabolism of this biotechnologically relevant microorganism. Therefore, it is likely to raise attention and be well-cited among the metabolic engineering community of this organism.

      2) More generally, the scope of the study is broad enough to potentially attract experts in the wider field of autotrophic/mixotrophic metabolism, especially regarding the metabolic difference in the transition from heterotrophic to autotrophic growth modes and vice versa.

      3) Findings from different experimental techniques (chemostat cultivation, proteomics, modelling, mutant libraries) complement each other and increase the level of understanding. Consistency of the results from these different angles increases the roundness of the study.

      Weaknesses:

      1) A main conclusion of this paper is that CBB cycle operation under heterotrophic conditions (fructose and succinate) is not useful for biomass growth. However, Shimizu et al., 2015 claim that the CBB cycle has a benefit, in that at least PHB production is increased in the presence of the CBB cycle (as demonstrated by a decrease in PHB production when Rubisco or cbbR are knocked out). In this work the authors do not analyze PHB production, but they do analyze fitness in mutant libraries. They claim not to see this benefit in this study; however, in their data (Figure 5 F) small fitness drops are also seen for cbbR mutants on fructose, as well as on succinate. So I think the authors have to revisit this conclusion. The type of modelling they use (RBA/FBA) may not explain such re-assimilation as 'a theoretically efficient' route, as this type of modelling assumes 'stoichiometric' metabolic efficiency with a maximum growth objective, which does not seem to be fully what happens in reality.

      We agree that a minor decrease in fitness is visible for cbbR transposon mutants in heterotrophic conditions (Figure 5F). However, we have noticed that small changes in fitness can occur, particularly at a late stage of cultivation, as an artifact of the sequencing method (fast-growing mutants displacing slow-growing ones). A replication of the experiment with pulsed instead of continuous feed showed a slightly increased instead of decreased fitness on succinate for cbbR (Figure 5-figure supplement 1). We therefore conclude that the resolution of the transposon library experiments is not sufficient to decide if the cbbR KO mutant conveys a small fitness benefit or loss. As the reviewer correctly points out, Shimizu et al. do not show a general fitness benefit but only increased PHB yield from CO2-refixation. We have rewritten our conclusions to account for the fact that our results do not contradict the findings from Shimizu et al., but that both increased PHB production and slightly decreased fitness (= growth rate) are possible at the same time. We also toned down our conclusions such that the question of a potential small fitness burden/benefit of the CBB cycle in heterotrophic conditions remains open.

      2) The authors focus a lot on readiness as a rationale, but actually cannot really prove readiness as an explanation of the expression of 'unutilized' proteome. In the manuscript they also mention that it may be a non-optimized, recent evolutionary trait, especially for the Calvin Cycle (especially because of the observed responsiveness to PEP of the cbbR regulator). The authors should discuss this and not present readiness as if it were the proven hypothesis. It would be interesting (and challenging) if the authors could come up with some further suggestions for how to research and potentially prove readiness or 'evolutionary inefficiency'.

      We rephrased the respective sections to highlight readiness as one potential explanation among others. We added a suggestion for an experimental strategy to test this hypothesis (laboratory evolution of lean-proteome strains).

      3) C. necator is well-known for the production of the storage polymer polyhydroxybutyrate (PHB) under nutrient-limited conditions, such as nitrogen or phosphate starvation. Even though the authors looked at such a nitrogen-limited condition ("ammonia"), they do not report on the enzymes involved in this metabolism (phaABC), which can typically be very abundant under these conditions. This should be discussed and ideally also analyzed. The formation of storage polymers is hard to incorporate in flux balance analysis with growth as the objective; however, in real life C. necator can incorporate over 50% of carbon in PHB rather than biomass, so I suggest the authors discuss this and ideally develop a framework to analyze this, specifically for the ammonia-limited condition.

      As mentioned above to Reviewer 1, we have now performed nitrogen-limited chemostat cultivations in order to disentangle the formation of biomass and PHB. We have updated our model by incorporating separate fluxes 1) to biomass, and 2) to PHB according to the experimental results. We have also analyzed the enzyme abundance and utilization for phaA (in the model reaction ACACT1r), phaB (AACOAR) and phaC (PHAS). The first two enzymes showed high abundance that increased with degree of limitation for all substrates. PHAS showed a different pattern with much lower, constant expression. All enzymes were expressed regardless of N- or C-limitation, but the model showed utilization only during N-limitation, where PHB production was enforced. These results were summarized in the new Figure 3-figure supplement 2.

      4) The authors extensively discuss the CBB cycle and its proteome abundance. However, during autotrophic growth, photorespiration/phosphoglycolate salvage pathways are also typically required to deal with the oxygenase side-activity of Rubisco. The authors have not discussed the abundance of the enzymes involved in that key process. Recently, a publication in PNAS on C. necator showed by transcriptomics and knockout that the glycerate pathway on hydrogen and low CO2 is highly abundant (10.1073/pnas.2012288117). It would be good to include these enzymes and the oxygenase side-activity in the modelling, proteome analysis and fitness analysis. An issue with the growth on formate is that the real CO2 concentration in the cells cannot be determined well, but not feeding additional CO2 likely results in substantial oxygenase activity.

      C. necator has several pathways for 2-phosphoglycolate (2-PGly) salvage, as the reviewer points out. The key enzymes for the universal 'upper part' of 2-PGly salvage, 2-PGly-phosphatase (cbbZ2, cbbZP) and glycolate dehydrogenase GDH (GlcDEF), were all quantified in our proteomics experiments. The cbbZ isoenzymes showed the same expression pattern as the other cbb enzymes: highest on formate, lowest on succinate (Figure 1-figure supplement 2D). The GDH subunits encoded by GlcDEF showed no significant trend between growth rates or substrates, and were more than 10-fold less abundant than 2-PGly-phosphatase. This is in line with the findings from Claassens et al., PNAS, 2020, that showed only a 2.5-fold upregulation of GDH transcripts in a low versus high CO2 comparison (changes at the protein level are often less extreme than at the transcript level). The same study demonstrated that the glycerate pathway is the dominant route for 2-PGly salvage and found four enzymes extremely upregulated in low CO2: glyoxylate carboligase GLXCL (H16_A3598), hydroxypyruvate isomerase HPYRI (H16_A3599), tartronate semialdehyde reductase TRSARr (H16_A3600), and glycerate kinase GLYCK (H16_B0612). Here, these enzymes showed only slightly higher abundance on formate compared to the other conditions we tested (~2-fold). The increase was much lower than what the transcriptional upregulation in Claassens et al. would suggest; it is therefore difficult to say if 2-PGly salvage plays a role during formatotrophic growth. Moreover, we also investigated conditional essentiality and found that none of the 2-PGly salvage mutants showed impaired growth on formate (see Figure R1 below).

      Unfortunately there is, to our knowledge, no data available on the rate of Rubisco's oxygenation reaction during formatotrophic growth, and our bioreactor setup does not support measurement of pCO2. It is known though that only 25% of the CO2 from formic acid oxidation is consumed for biomass (Grunwald et al., Microb Biotech, 2015, http://dx.doi.org/10.1111/1751-7915.12149), effectively creating an excess intracellular CO2 supply. Further, the substrate specificity of the C. necator Rubisco for CO2 over O2 is very high, about twice that of cyanobacteria (Horken & Tabita, Arch Biochem Biophys, 1999, https://pubmed.ncbi.nlm.nih.gov/9882445/). This indirect evidence suggests that flux through this pathway is most likely marginal. We therefore decided to omit it from model simulations. We have added a paragraph summarizing our findings regarding phosphoglycolate salvage to the results section.

      Figure R1: Fitness of 2-phosphoglycolate salvage mutants during growth on three different carbon sources, fructose, formate, and succinate. Four genes essential for growth on formate were included for comparison (soluble formate dehydrogenase fdsABDG). Fitness scores are mean and standard deviation of four biological replicates.

    1. Author Response:

      Reviewer #1:

      There is a critical need for new methodologies to study the physical properties of biomolecular condensates in living cells under normal and pathological conditions. To address this, Schlüßler et al innovatively combined Brillouin microscopy with Optical Diffraction Tomography (ODT) and epi-fluorescence imaging. The current study can have a significant impact on the community. A major strength of the study resides in the application of Brillouin microscopy, which offers a label-free and nondestructive approach to investigate the complex viscoelastic behavior of biological materials. The study initially attempts to benchmark their new methodology using control samples, called cell phantoms. Subsequently, the authors apply their new method to study the physical properties of biological materials including nucleoplasm, cytoplasm, phase-separated organelles, and adipocytes. The results are largely convincing and offer interesting insights into the complex material properties of these subcellular fluids and organelles.

      Thank you very much for this encouraging assessment.

      Reviewer #2:

      The multimodal instrument presented here provides an independent measurement of the spatially dependent cellular refractive index, which yields a more quantitative extraction of the longitudinal modulus from Brillouin spectroscopy. To my knowledge, this instrument is unique, and its capability addresses unresolved problems in Brillouin studies. The method was judiciously validated on standard samples. The experiments and analysis were carefully performed, and the statistics seem solid. The manuscript is very well written and clear.

      The results highlight some discrepancies between the generally accepted assumptions regarding the cell density and refractive index. One striking example is the finding that the nuclear matter exhibits lower mass density but higher longitudinal modulus. Using the fluorescence channel for specificity, the authors successfully investigated other cellular compartments.

      While the objectives of the study seem to have been achieved, I wonder how large an impact this development will have in the field. At the end of the day, the method yields the longitudinal modulus at GHz frequencies. Cell mechanics is indeed very important, but at much lower frequencies. For example, actin filament lifetime is of the order of minutes. It seems very difficult to infer cell mechanics information relevant to its function, from the GHz range, as the dispersion of the material is unknown.

      Thank you very much for this positive and accurate feedback.

      Indeed, Brillouin microscopy measures mechanical properties in a fundamentally different (higher) frequency range than common methods for accessing cell mechanics, such as atomic force microscopy, microrheology, and deformability cytometry, and converting between the elastic and the longitudinal modulus is generally not possible because the Poisson’s ratio and its dispersion are unknown. However, measured differences in mechanical properties should not be considered insignificant just because the underlying model remains unknown. Hence, Brillouin microscopy results in the GHz and GPa range can still indicate differences in mechanical properties that are important to cells, even though common cell mechanics plays out on much longer time scales and at much lower frequencies. In fact, Brillouin microscopy was shown to be sensitive to e.g. actin polymerization and branching of actin fibers (see Scarcelli et al. Noncontact three-dimensional mapping of intracellular hydromechanical properties by Brillouin microscopy. Nat. Methods 12, 1132–1134 (2015)) and empirical correlations to the elastic (Young’s) modulus have been found (Scarcelli et al. In vivo measurement of age-related stiffening in the crystalline lens by Brillouin optical microscopy. Biophys. J. 101, 1539–1545 (2011); Schlüßler et al. Mechanical mapping of spinal cord growth and repair in living zebrafish larvae by Brillouin imaging. Biophys. J. 115, 911–923 (2018)).
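
      To make the link between Brillouin shift, refractive index, density, and the GPa-range longitudinal modulus concrete, the following is a minimal sketch of the standard relation M' = ρ (ν_B λ / (2 n sin(θ/2)))². The function name, the backscattering default, and the water-like input values are illustrative assumptions, not values or code taken from the manuscript.

```python
import math


def longitudinal_modulus(brillouin_shift_hz, refractive_index, density_kg_m3,
                         wavelength_m, scattering_angle_rad=math.pi):
    """Return the longitudinal modulus M' in Pa.

    Assumes the standard Brillouin relation; the default scattering angle
    of pi corresponds to a backscattering geometry (an assumption here).
    """
    # Acoustic speed from the Brillouin shift: v = nu_B * lambda / (2 n sin(theta / 2))
    sound_speed = (brillouin_shift_hz * wavelength_m
                   / (2.0 * refractive_index * math.sin(scattering_angle_rad / 2.0)))
    # Longitudinal modulus: M' = rho * v^2
    return density_kg_m3 * sound_speed ** 2


# Illustrative, water-like inputs (assumed, not from the manuscript):
# 7.5 GHz shift, n = 1.33, 1000 kg/m^3, 532 nm illumination.
modulus_pa = longitudinal_modulus(7.5e9, 1.33, 1000.0, 532e-9)
print(f"{modulus_pa / 1e9:.2f} GPa")
```

      With these assumed inputs the acoustic speed comes out near 1500 m/s and the modulus near 2.25 GPa. The sketch also shows why an independent measurement of the refractive index and density matters: both enter the result directly.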

      Reviewer #3:

      In the manuscript by Kim et al, the authors present a combined optical system, termed FOB microscopy, which brings together epi-fluorescence, optical diffraction tomography (ODT), and Brillouin microscopy. The main purpose of FOB is to establish a colocalized measurement of the Brillouin shift and the refractive index (RI) to calculate absolute densities of biological samples, especially biomolecular condensates.

      The major strength of this paper (method development) is the added measurement of ODT, which can correct for and thus provide a more precise RI and density of a given sample. If the RI and densities can be reliably measured in in vitro and cell samples, it will be a great tool that can complement other microrheological measurements. The major weakness is the lack of appropriate controls and the lack of comparisons to other conventional (microrheological) methods, which together lead to questionable outcomes from various measurements shown throughout the manuscript. Another concern is the invasiveness of the method, which involves excessive levels of illumination and vibrational excitation.

      If the work could be revised to present careful calibration with samples that are pertinent to biological systems (both in vitro and cellular) and make a comparison to other conventional methods in every possible case, the strength and limitations of the combined microscopy will be clear, making it very helpful for the researchers in the field.

      Thank you very much for this review.

      The accuracy of optical diffraction tomography (ODT) has already been shown in previous publications, and the combination with Brillouin microscopy does not affect it, since the basic working principle remains untouched. For example, in McCall et al. “Quantitative Phase Microscopy Enables Precise and Efficient Determination of Biomolecular Condensate Composition” bioRxiv, 2020 Oct; p. 2020.10.25.352823. doi: 10.1101/2020.10.25.352823, we show that the protein concentrations acquired with ODT/QPM agree well with results acquired with a volume-based approach and do not suffer from uncertainties in the fluorescent dye quantum yield, as can happen for concentration measurements based on the fluorescence intensity ratio. Further publications tested the correctness of ODT by measuring samples with known geometry and RI value, such as microspheres (Y. Sung et al., Opt. Express. 17, 266–277 (2009); K. Kim et al., Opt. Lett. 39, 6935 (2014); A. Kuś et al., J. Biomed. Opt. 20, 111216 (2015); S. Chowdhury et al., Optica. 4, 537 (2017)). Also, the hemoglobin concentration of red blood cells calculated from ODT measurements is consistent with the mean corpuscular hemoglobin concentration (MCHC) measured independently by a complete blood count (CBC) test (Y. Kim et al., Sci. Rep. 4, 6659 (2014)).

      Due to the low scattering efficiency of Brillouin scattering, Brillouin microscopy requires higher laser illumination powers (typically around 10 mW) than e.g. fluorescence microscopy. While certain illumination strategies, such as focusing on the nucleus, can substantially damage certain cell types, such as GFP-FUS HeLa cells, adjusting the illumination strategy proved to reduce the negative effect on the cells, and wild-type HeLa cells were not affected in the first place. Furthermore, longer laser wavelengths in the near-infrared, with lower photon energy, are known to be less phototoxic and would work similarly well for FOB microscopy. Hence, we think the method can be considered non-invasive, especially when realized with adjusted laser sources.

      Furthermore, the presented Brillouin microscopy makes use of spontaneous Brillouin phonons, which are, in contrast to the phonons excited in stimulated Brillouin microscopy, intrinsic to the sample. Hence, no vibrational excitation that could alter the sample occurs when using the FOB microscope.

    1. Author Response:

      Reviewer 1 (Public Review):

      'Presynaptic Stochasticity Improves Energy Efficiency and Alleviates the Stability-Plasticity Dilemma' by Schug et al moves energy efficiency questions of stochastic synaptic transmission that were asked at the level of the single synapse and the single cell to the network level. This is important since local advantages in terms of energy cost may have unknown consequences at the larger scale. And stochastic synapses may have an unknown advantage in learning paradigms at the network level.

      I have some concerns regarding this work:

      (1) The considerations are made in one or two particular network architectures with one parameter combination. The generality of the conclusions is not given and there is no reason to believe that the observations made here will hold for other network architectures or even different parameters. In this way, the current manuscript seems to describe the beginning of a project that hasn't really been worked through.

      We agree that it is an important concern that our findings are not a result of overfitting certain parameters to specific networks and tasks. We took considerable care to ensure robustness of the results and to avoid overfitting parameters to specific tasks. This information was not easily accessible in the original manuscript and we made corresponding changes to address this issue, see subsections "Metaplasticity Parameters" and "Model Robustness" in the Materials and Methods. In addition, we would like to point out that on top of the standard rate-based neural network models used for the main experiments, we test our presynaptic learning rule on a standard perceptron model, where we found qualitatively matching results. These results are complemented by a theoretical analysis of our learning rule, which further suggests robustness.

      (2) Additionally, the network architectures used here are rather artificial (multilayer perceptron) and come from machine learning. Linking a physical measure in a biological system (the metabolic cost) with task solving in a machine learning setting that does not have a biological counterpart seems far-fetched and would not be the first thing in my mind to do to study the information transmission in biological neuronal networks.

      We decided to choose models and metrics that were as simple as possible while still allowing us to isolate the effect of presynaptic stochasticity and plasticity on neuronal networks in goal-driven tasks. We believe that the rate-based neural network models we mainly study present a parsimonious choice to approach the question presented. Regarding the link between physiological measures and our model, we point out that, in rate-based models, firing rate is a common proxy for metabolic cost (see e.g. Levy & Baxter, 1996). This is one of the measures we use, see Figure 6(b). In addition, some of our results are evidence for improved metabolic efficiency, even without a 1-to-1 match between model and biological networks. For example, increased sparsity would most likely imply improved metabolic efficiency in biological neural networks as well.

      (3) A lot of different measures for efficiency of the network are all briefly addressed but not dissected properly. A more fundamental understanding of why and when stochastic synapses in the network might be useful is missing and seems rather unexplored apart from some select manipulations.

      We focus on one measure for efficiency, namely the ratio of mutual information and metabolic cost. This is a natural measure which has been employed in prior work. Subsequently, we provide detailed explanations for how the proposed mechanism operates. For example, sparsity is a natural, biologically relevant lens onto our network, as are the lesion experiments and the theoretical analysis. We believe that presenting different views strengthens rather than weakens the evidence.
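
      As an illustration of an efficiency measure of this kind, the sketch below computes mutual information from a joint probability table and divides it by a cost term. The function names, the toy joint distribution, and the cost value are hypothetical; the manuscript's actual estimator and cost definition may differ.

```python
import math


def mutual_information_bits(joint):
    """Mutual information I(X;Y) in bits from a joint probability table."""
    row_marginals = [sum(row) for row in joint]
    col_marginals = [sum(col) for col in zip(*joint)]
    info = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0.0:  # zero-probability cells contribute nothing
                info += p_xy * math.log2(p_xy / (row_marginals[i] * col_marginals[j]))
    return info


def efficiency(joint, metabolic_cost):
    """Ratio of transmitted information to metabolic cost (cost units are arbitrary)."""
    return mutual_information_bits(joint) / metabolic_cost


# Toy example (assumed values): a noiseless binary channel carries 1 bit.
noiseless = [[0.5, 0.0], [0.0, 0.5]]
print(efficiency(noiseless, metabolic_cost=2.0))
```

      In this toy setting, halving the cost at fixed information doubles the efficiency, which is the trade-off such a ratio is meant to capture.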

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The approach taken by the authors is very thorough, and the conclusions are well supported by the data. I think this is an important contribution to the field, and I have only a few specific comments:

      – The authors should sequence the mgrB gene and upstream sequence, and the rpoS gene for TMPR6-10. If these strains don't have mutations in mgrB, I think it's important to sequence their genomes to find out why DHFR levels are higher than in wt cells.

      Response: This is an important point that we had overlooked. We have now amplified and Sanger sequenced the mgrB gene and its promoter from all 10 TMPR isolates. As expected, we do indeed find mutations at the mgrB promoter in all 10 isolates. These data have been added to the revised manuscript in the Figure 1A schematic.

      – Presumably the higher number of mutations in mgrB rather than folA reflects the mutational space available, i.e. there are more possible mutations that reduce mgrB expression than there are gain-of-function folA mutations. This is worth mentioning, since it has a big impact on the evolutionary path to resistance.

      Response: We thank the Reviewer for pointing this out. We have discussed this point in the revised version of the manuscript (Page 17, Line 477-480).

      Reviewer #2 (Public Review):

      [...] 1) The authors find that mutations in the mgrB locus precede mutations in folA during E. coli's response to TMP. Why only sequence 5 of the 10 TMPR mutants? Was this subset chosen for sequencing based on any specific criteria? Below are some follow-up comments.

      Response: We thank the reviewer for this comment. Initially TMPR 1-5 were chosen since these isolates encompassed the entire range of drug IC50 values observed by us. We have now amplified and Sanger sequenced the mgrB gene and its promoter from all 10 TMPR isolates. As expected, we do indeed find mutations at the mgrB promoter in all 10 isolates. These data have been added to the revised manuscript in the Figure 1A schematic.

      a. Do any of the mutations cause growth defects relative to the wild-type strain?

      Response: This is an insightful question indeed. We have not measured growth rates of the trimethoprim-resistant isolates. However, we have measured the fitness of TMPR1-5 relative to wild type in competitive growth assays. In these experiments, all 5 isolates have measurable fitness costs (relative fitness of the isolates was between 0.7 and 0.8) when grown in drug-free media. Since mgrB mutations are found in all 5 TMPR isolates, we believe this result to be generally in line with our fitness values for the mgrB knockout strain. However, since TMPR1-5 have multiple genetic changes, attributing the measured fitness costs of these isolates to mgrB deficiency alone is not possible. We are currently in the process of dissecting the relative contributions of the various mutations in TMPR1-5 towards shaping the final fitness of the isolates. However, these will likely be reported in a later manuscript.

      b. Line 103: What are the mutations in folA promoter region? Only mutations in the coding sequence are listed in table 1 and figure 1A.

      Response: We apologise for this error. Though we have sequenced both promoter and ORF of the folA gene, we only found mutations in the coding sequence. We have made the necessary change in the revised manuscript.

      c. Line 109: The authors speculate that IS-element insertions in the mgrB promoter region reduce its expression, maybe they can provide a reference here from previous studies that have analyzed such mutations. Also, including details of the length/size of these insertion elements within table 1 would be helpful.

      Response: We have added references substantiating our claim that IS-element insertion in the mgrB promoter reduces its expression (Page 4, Line 110, ref 34, 35). The length of the insertions is indicated in Table 1.

      d. Line 111: the phrase "stop-codon readthrough" is misleading. The authors should rephrase to clarify that the single nucleotide deletion leads to a shift in the reading frame leading to an altered protein sequence at the C-terminal end.

      Response: We agree that this phrase is misleading. We have modified it in the revised manuscript (Page 4, Line 112).

      2) Based on growth assays including competitions, and measurements of folA gene expression in mgrB-deficient E. coli cells, the authors conclude that tolerance to TMP is caused by PhoP-dependent upregulation of DHFR.

      a. The authors should rewrite the text (lines 143-155) to make the experimental design of the competitions more obvious to the reader. Indicating either within the figure legend or main text what ∆mgrB/total means would definitely make analysis of the figure and results easier for the reader. The reader needs to go to the materials section to get a full understanding of how exactly this experiment was performed.

      Response: We have re-written this section for greater clarity and also changed Figure 1D accordingly.

      b. In Figure 1C, the IC50 value for ∆phoP is similar to that of wild type. If PhoP-dependent expression of folA is important for TMP tolerance/resistance, shouldn't we expect to see a lower IC50, similar to that of ∆mgrB∆phoP? Intriguingly, the data for wild type in Figure 1C appears to be in conflict with the data in Figure 3B, please clarify.

      Response: This is an important issue, and we thank the Reviewer for pointing this out. We think that phoP deletion reverses the phenotype of mgrB deletion, but has no detectable effect in an mgrB-expressing background, because of the culture medium we used. Our experiments were performed in LB, which is a low magnesium medium. Since magnesium activates the PhoPQ pathway, in LB basal activity of PhoPQ is expected to be very low. Upon deletion of mgrB, we believe that there is an elevation in ‘unstimulated’ PhoPQ activity. This elevation is due to loss of feedback inhibition by MgrB protein. As a result, the effects of PhoP deletion are most pronounced in an mgrB knockout strain. We are, however, unable to explain why the IC50 of ∆mgrB∆phoP is lower than wild type. The possibility that there may be cross-phosphorylation of other response regulators by uninhibited PhoQ cannot be ruled out; however, we do not have any data to substantiate this yet.

      The data in Figures 1C and 3B come from independently performed replicates. The mean values of IC50 of Wt in these figures are 26±13 ng/mL and 40±20 ng/mL respectively, which are not statistically significantly different.

      c. In Figure 1D, it is hard to figure out the exact strains and conditions of each competition. For instance, the ratios 10:1, 100:1 and 1000:1 needs to be clearly labeled, "wild type: mgrB" or "wild type: specific mutant" as applicable, the label on the X-axis is misplaced. Does "WmgrB" refer to ∆mgrB? If yes, change to ∆mgrB. Fitness values need a label or put into a table.

      Response: We have re-formatted this figure for better clarity as suggested. ‘w’ refers to calculated value of relative fitness and we have moved these values to the main text (Page 5, Line 149-151).
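
      For readers unfamiliar with relative fitness values such as ‘w’, one widely used definition from competition assays is the ratio of the log-fold expansion of the competitor to that of the reference strain. The sketch below assumes that definition; the response does not state which formula was used, and the function name and cell counts are hypothetical.

```python
import math


def relative_fitness(mutant_start, mutant_end, reference_start, reference_end):
    """Relative fitness w as the ratio of log-fold expansions (assumed definition)."""
    mutant_growth = math.log(mutant_end / mutant_start)
    reference_growth = math.log(reference_end / reference_start)
    return mutant_growth / reference_growth


# Hypothetical counts: the mutant expands 1000-fold while the reference expands 10000-fold.
w = relative_fitness(1e4, 1e7, 1e4, 1e8)
print(round(w, 2))
```

      Under this definition, w < 1 indicates a fitness cost relative to the reference strain and w = 1 indicates no difference.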

      d. Line 172: incorrect figure citation, replace Figure 2B with 2A.

      Response: We have made this correction.

      e. Lines 180-181: Only 5 out of the 10 TMPR isolates were sequenced and found to have mutations in the mgrB locus. In the absence of sequencing data confirming such mutations in TMPR 6-10 isolates, the increased levels of DHFR cannot be attributed to loss of mgrB.

      Response: We have now amplified and Sanger sequenced the mgrB gene and its promoter from all 10 TMPR isolates. As expected, we do indeed find mutations at the mgrB promoter in all 10 isolates. These data have been added to the revised manuscript in the Figure 1A schematic.

      f. In Figure 2C, it would be helpful to show the GFP fluorescence data for the single deletions, ΔphoP and ΔrpoS, to further support the claim that TMP tolerance via DHFR upregulation is PhoP dependent. In addition, the X-axis should specify the promoter reporter that was used.

      Response: We have added these data to Figure 2C and also specified the promoter reporter used.

      g. Lines 181-183: reference for the previous work on W30G folA is missing.

      Response: We thank the reviewer for bringing this to our notice. We have added the appropriate reference.

      h. In Figure 2, there is a discrepancy in the level of DHFR observed for both TMPR2 and 3 isolates in panels D and E - the DHFR protein levels are much higher in panel E. Can the authors explain this discrepancy, especially given the W30G mutation in TMPR3 (expected to show reduced levels of DHFR)? Is the same amount of protein loaded in both experiments? If so, why are the levels of protein different (and vastly different for TMPR3)? Better quantification of the western blots depicting the signal for the replicates would be helpful.

      Response: In order to detect the lower levels of DHFR in ΔphoP derivatives of TMPR strains, we had to overexpose the Western blots. This may explain the apparent discrepancy between Figure 2D and E. To enhance clarity and ease of interpretation, we have now quantitated all the immunoblots in the manuscript and reported fold changes in expression level.

      3) The data presented here also show that mgrB and folA mutations act in synergy in TMP resistant E. coli.

      a. It would be useful to the reader to include a table listing the MIC values in Figure 3. The plate images showing the E-tests are difficult to read and less helpful in interpreting the MICs and can be moved to the supplement.

      Response: We thank the reviewer for this suggestion. We have removed the E-test images from the figure and have included a table with the MIC values in Figure 3.

      b. In Figure 3E (and lines 234-238), what was the strain background used for DHFR overexpression? The details are missing from the paper.

      Response: The pPRO-DHFR plasmid was transformed into wild type E. coli MG1655. This information has been included in the revised Figure 3E.

      4) To follow the adaptive pathway for TMP resistance, the authors sequenced genomes of TMP-resistant isolates.

      a. Line 283: How many strains were sequenced at each time point? "3 to 5" is confusing.

      Response: The number of strains sequenced by us varied for different time points and lineages. We have rephrased this to ‘up to 5’ strains to prevent confusion. The exact number of isolates sequenced at each time point is given in the supplementary tables.

      b. In Figure 4, the data points/symbols and lines are hard to read in both panels A and B. These graphs can be replotted with open symbols or different colors to help the reader analyze the figure much more easily.

      Response: We have used different colours for clearer representation of data in the revised figure.

      c. Overall, it is still unclear how folA expression is regulated by PhoP regulation. An alternate hypothesis is that loss of MgrB may influence folA gene expression in a PhoP independent manner. Have the authors ruled out this possibility?

      Response: We agree that our study has not shed light on the precise molecular mechanism by which PhoP signalling affects folA levels, except that it is unlikely to be a direct effect. The reason we do not think that the effect is PhoP-independent is that phoP-deletion reverses the phenotype of the mgrB knockout, as well as the TMPR1-5 isolates. However, we cannot yet argue that there is no contribution from PhoP-independent mechanisms. Further genetic analyses are underway in our laboratory to determine other molecular players of this pathway.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this study, Kuppan, Mitrovich, and Vahey investigated the impact of antibody specificity and virus morphology on complement activation by human respiratory syncytial virus (RSV). By quantifying the deposition of components of the complement system on RSV particles using high-resolution fluorescence microscopy, they found that antibodies that bind towards the apex of the RSV F protein in either the pre- or post-fusion conformation activated complement most efficiently. Additionally, complement deposition was biased towards globular RSV particles, which were frequently enriched in F in the post-fusion conformation compared to filamentous particles on which F exists predominantly in the pre-fusion conformation.

      Strengths:

      1) While many previous studies have examined the properties of antibodies that impact Fc-mediated effector functions, this study offers a conceptual advance in its demonstration that heterogeneity in virus particle morphology impacts complement activation. This novel finding will motivate further research on this topic both in the context of RSV and other viral infections.

      2) The use of site-specific labeling of viral proteins and high-resolution fluorescence microscopy represents a technical advance in monitoring interactions among different components of antiviral immune responses at the level of single virus particles.

      3) The paper is well written, data are clearly presented and support key claims of the paper with caveats appropriately acknowledged.

      We appreciate the reviewer’s supportive comments. In our revised manuscript, we have focused on improving clarity regarding the minor weaknesses noted below.

      Minor weaknesses:

      Working models and their implications could be clarified and extended. Specifically:

      1) The finding that globular particles enriched in F proteins in the post-fusion conformation (Fig 3F) are dominant targets of complement activation as measured by C3 deposition by not only post-F- but also pre-F-specific antibodies (Fig 4B, left) is interesting. This is despite the fact that, as expected, pre-F antibodies bind less efficiently to globular particles (Fig 4B, right). How do the authors reconcile these observations, given that C3 deposition seems to be IgG-concentration-dependent (Fig 2E)?

      The reviewer raises an excellent point: globular particles, which accumulate as the virus ages, contain more post-F and less pre-F than particles that have recently been shed from infected cells. These ‘aged’ particles nonetheless accumulate more C3 when incubated with pre-F mAbs than ‘younger’ particles, where the proportion of pre-F is higher. We attribute this to the lower surface curvature of globular particles: they accumulate more C3 in the presence of pre-F mAbs in spite of the reduced availability of pre-F epitopes. Figure 1C and 1F help to support this point. This data shows C3 deposition driven by different antibodies bound to particles enriched in either pre-F (Figure 1C) or post-F (Figure 1F). Importantly, for this experiment the conversion to post-F was driven in such a way that virion morphology is preserved (Figure 1E). In this case, we see a clear reduction in C3 deposition by pre-F mAbs on post-F particles (e.g. for CR9501, the percentage of C3-positive particles drops from 24% on pre-F virus to 6% on post-F-enriched virus). This demonstrates that, in the absence of other changes, conversion of pre-F to post-F reduces complement deposition by pre-F specific mAbs.

      Similarly, the reviewer correctly points out that reduced levels of antibody binding lead to lower levels of C3 deposition (Figure 2E); however, as in Figure 1, this data is collected from particles with the same morphologies. Thus, in the absence of additional factors, reduction in mAbs bound to pre-F leads to a reduction in C3 deposition driven by these mAbs. The fact that we observe the opposite trend when changes in particle morphology accompany changes in post-F abundance points to an important role for particle shape in activation of the classical pathway.

      2) Based on data in Figure 5-figure supplement 2, the authors argue that "large viruses are poised to evade complement activation when they emerge from cells as highly-curved filaments, but become substantially more susceptible as they age or their morphology is physically disrupted." Could the increase in C3 deposition be alternatively explained by a higher density of F proteins on larger particles instead of / in addition to a larger potential decrease in membrane curvature?

      We agree that the density of F on a virus (the number of F trimers per unit surface area) likely contributes to the efficiency of C3 deposition. In Figure 6 – figure supplement 2 (Figure 5 – figure supplement 2 in the original submission), we control for this potential effect by comparing viruses that have the same amount of F (as measured by fluorescence intensities of SrtA-labeled F) that are either in filamentous form or globular form (induced through osmotic swelling). The total amount of F per virus is preserved during swelling, and the membrane surface area will remain constant due to the limited ability of lipid bilayers to stretch (ref. 7). As a result, the input material for these comparisons is the same in terms of F trimers per unit area, yet the C3:F ratio differs substantially. This leads us to conclude that the differences must be attributable to factors other than the density of F. Importantly, this does not mean that the amount of F per unit surface area does not matter for C3 deposition, only that this is not the effect we are observing here. We have added text (Line 299) to help clarify this point: “This effect is unlikely to arise due to changes in the abundance or density of F in the viral membrane, both of which will remain constant following swelling. Similarly, it does not appear to be purely related to size, as larger viral filaments show similar C3:F ratios as smaller viral filaments.”
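
      The geometric argument, constant membrane area but lower curvature after swelling, can be illustrated with a simple calculation. The sketch below idealizes a filament as a cylinder without end caps and the swollen particle as a sphere of equal area; the function name and the dimensions are hypothetical and chosen only for illustration.

```python
import math


def equal_area_sphere_radius(filament_radius_nm, filament_length_nm):
    """Radius of a sphere with the same area as a cylinder's lateral surface.

    End caps are neglected (an idealization), so the area is 2 * pi * r * L.
    """
    area = 2.0 * math.pi * filament_radius_nm * filament_length_nm
    return math.sqrt(area / (4.0 * math.pi))


# Hypothetical filament: 50 nm radius, 1000 nm length.
filament_radius = 50.0
sphere_radius = equal_area_sphere_radius(filament_radius, 1000.0)

# Largest principal curvature before and after swelling (in 1/nm).
print(1.0 / filament_radius, 1.0 / sphere_radius)
```

      For these assumed dimensions the sphere radius is roughly three times the filament radius, so the largest principal curvature drops by about the same factor while the area, and hence the density of F, is unchanged.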

      3) In the discussion, the authors acknowledge that the implications based on the findings are speculative. However, more clarity on the basis of these speculative models would be useful. For example, it is not clear how the findings directly inform the presented model of immunodominance hierarchies in infants.

      We agree that this was unclear in the original manuscript. We have rewritten paragraph 4 of the Discussion to clarify how our results may contribute to the changes in immunodominance that have been observed in RSV between infants and adults.

      Reviewer #2 (Public Review):

      This is an intriguing study that investigates the role of virus particle morphology on the ability of the first few components in the complement pathway to bind and opsonize RSV virions. The authors use primarily fluorescence microscopy with fluorescently tagged F proteins and fluorescently labeled antibodies and complement proteins (C3 and C4). They observed that antibodies against different epitopes exhibited different abilities to induce C3 binding, with a trend reflecting positioning of IgG Fc more distal to the viral membrane resulting in better complement "activation". They also compared the ability of C3 to deposit on virus produced from cells +/- CD55, which inhibits opsonization, and showed knockout led to greater C3 binding, indicating a role for this complement "defense protein" in RSV opsonization. They also examined kinetics of complement protein deposition (probed by C4 binding) to globular vs filamentous particles, observing that deposition occurred more rapidly to non-filaments.

      A better understanding of complement activation in response to viruses can lead to a more comprehensive understanding of the immune response to antigen both beneficial and detrimental, when dysfunctional, during infection as well as mechanisms of combating the viral infection. The study provides new mechanistic information for understanding the properties of an enveloped virus that can influence complement activation, at least in an in vitro setting. It remains to be determined whether these effects manifest in the considerably more complex setting of natural infection or even in the presence of a polyclonal antibody mixture.

      The studies are elegantly designed and carefully executed with reasonable checks for reproducibility and controls, which is important especially in a relatively complex and heterogeneous experimental system.

      We thank the reviewer for the insightful comments. We have revised the manuscript to help to clarify points of confusion and to address some of the technical points raised here.

      Specific points:

      1) "Complement activation" involves much more than C3 or C4 binding. Better to use more specific terminology relating to the observable (i.e. fluorescently labeled complement component binding)

      We agree with the reviewer. We have revised the manuscript throughout to make our language more accurate and precise.

      2) What is the rationalization for concentrations of antibodies used? What range was tested, and how dependent on antibody concentration were the observed complement deposition trends? How do they relate to physiological concentrations, and how would the presence of a more complex polyclonal response that is typically present (e.g. as the authors noted, the serum prior to antibody depletion already mediates complement activation) affect the complement activation trends? The neat, uniform display of Fc for monoclonals that were tested is likely to be quite garbled in more natural antibody response situations. This should be discussed.

      We have added discussion of antibody concentrations and possible differences between monoclonal and polyclonal responses to the revised manuscript. Below, we address the specific questions raised here by the reviewer.

      We chose to use antibody concentrations that are comparable to the concentrations of dominant clonotypes in post-vaccination serum (ref. 1). Our goal in selecting relatively high antibody concentrations for our experiments was to focus on understanding the capacity of an antibody to drive complement deposition when it has reached maximum densities on RSV particles. This is discussed starting on Line 125 of Results, and in paragraph 2 of Discussion. Experiments testing a range of antibody concentrations would be valuable, but are likely to strongly reflect differences in the binding affinities of these antibodies, which have been characterized previously.

      Although we have not performed titrations for each of the antibodies tested due to the large number of conditions needed and the limited throughput of our experimental approach, the manuscript does present a dilution series for CR9501, the IgG1 mAb with the greatest potency in driving C3 deposition among those tested here. This data (shown in Figure 3E & F in the revised manuscript) shows that as the amount of antibody added in solution decreases over a 16-fold range, C3 deposition decreases as well. The decrease in C3 deposition is roughly commensurate with the reduction in antibody binding, reaching levels that are just above background at an antibody concentration of ~0.6μg/ml (1:800 dilution). We think it is likely that other activating antibodies would show similar trends, while antibodies that do not activate the classical pathway at saturating concentrations would be unlikely to do so across a range of lower concentrations.

      We agree with the reviewer that complement deposition driven by polyclonal antibodies is more complex than the monoclonal responses studied here. As discussed in paragraph 2 of our revised Discussion, one effect that polyclonal serum might have is to increase the density of Fcs on the virus by providing antibody mixtures that bind to multiple non-overlapping antigenic sites. We speculate that this would generally increase complement deposition, provided that sufficient antibodies are present that bind to productive antigenic sites (e.g. sites Ø, II, and V).

      Finally, we note that we observe a similar phenomenon where globular particles are preferentially opsonized with C3 in our experiments with polyclonal serum where IgG and IgM have not been depleted (Figure R1). The major limitation of this data – which is resolved by using monoclonal antibodies – is the difficulty of determining to what extent this bias arises due to the epitopes targeted by the polyclonal serum versus the intrinsic sensitivity of the virus particles.

      Figure R1: RSV opsonized with polyclonal human serum. A similar bias towards globular particles (white dashed circles) is observed as in experiments with monoclonal antibodies.

      3) Are there artifacts or caveats resulting from immobilization of virus particles on the coverslips?

      As pointed out by the reviewer, a few possible artifacts or caveats could arise due to the immobilization of viruses on coverslips. These include (1) spurious binding of C1 or other complement components to the immobilizing antibody (3D3); (2) reduced access to viral antigens as a result of immobilization; and (3) inhibition of antibody-induced viral aggregation. We are able to rule out issues associated with (1), because we do not see attachment of C1 or C3 to the coverslip (i.e. outside regions occupied by virus particles). This is consistent with the fact that the antibodies are immobilized on the surface via a C-terminal biotin attached to the heavy chain, which would limit access for C1 binding and prevent the formation of Fc hexamers.

      Immobilization on coverslips could reduce the accessibility of a portion of the virus for binding by antibodies and complement proteins. This could effectively shield a portion of the viral surface from assembly of an activating complex, which we estimate requires ~35nm of clearance above the targeted epitope on F⁸. Importantly, the fraction of the viral surface area that would be shielded would vary for filaments and spheres; to determine if this could influence our results, we calculated the expected magnitude of this effect (Figure R2). To do this, we modeled the virus as being tethered to the surface via a 25nm linkage. This accounts for the length of the biotinylated PEG (~5-15nm for PEG2K, depending on the degree of extension), streptavidin (~5nm), and the anti-G antibody (~10-15nm including the biotinylated C-terminal linker). Although limited structural information is available for RSV G, the ~100 residue, heavily glycosylated region between the viral membrane and the 3D3 epitope likely extends above the height of F (~12nm). Our model assumes that a shell of thickness d surrounding the virus is necessary for antibody-C1 complexes to fit without clashing with the surface (this shell is shaded in gray in the schematic from Figure R2). Tracing the angles at which this shell clashes with the coverslip allows us to calculate the fraction of total surface area that is inaccessible for activation of the classical pathway. The results are plotted on the right side of Figure R2. The relative surface area accessible to a 35nm activating antibody-C1 complex differs between a filament and a sphere of equivalent surface area by about 15%. We conclude that this difference is modest compared to the ~5-fold difference in deposition kinetics we observe between viral filaments and spheres (Figure 4), or the 3- to 10-fold difference in relative C3 deposition we observe on larger filamentous particles after conversion to spheres (Figure 6 – figure supplement 2C).
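
      The accessibility estimate described above can be sketched numerically. The snippet below is a minimal illustration and not the calculation used for Figure R2. It assumes that the lowest point of the particle sits at height h above the coverslip, that a surface point is inaccessible when the point a distance d radially outward from it would lie below the coverslip, that filament end caps can be ignored, and that the sphere of equivalent surface area satisfies 4π·a_s² = 2π·a_f·L. The function names and the example filament dimensions (a_f = 50 nm, L = 1000 nm) are hypothetical; h = 25 nm and d = 35 nm are the values quoted in the text.

```python
import math


def blocked_fraction_sphere(a, h, d):
    """Fraction of a sphere's surface that is too close to the coverslip.

    a: sphere radius, h: gap between the coverslip and the lowest point of
    the sphere, d: clearance needed above the viral surface (all in nm).
    """
    if d <= h:
        return 0.0  # the clearance shell never reaches the coverslip
    # The shell of radius a + d is cut by the coverslip plane at polar angle
    # theta_c (measured from the downward vertical), where
    # cos(theta_c) = (a + h) / (a + d).
    cos_c = (a + h) / (a + d)
    # Area fraction of a spherical cap with half-angle theta_c.
    return (1.0 - cos_c) / 2.0


def blocked_fraction_cylinder(a, h, d):
    """Same quantity for a long cylinder lying on its side (end caps ignored)."""
    if d <= h:
        return 0.0
    theta_c = math.acos((a + h) / (a + d))
    # The blocked strip spans angles -theta_c..+theta_c out of a full 2*pi.
    return theta_c / math.pi


def equivalent_sphere_radius(a_f, length):
    """Radius of a sphere with the same lateral surface area as the filament."""
    return math.sqrt(a_f * length / 2.0)


if __name__ == "__main__":
    h, d = 25.0, 35.0           # nm, values quoted in the text
    a_f, length = 50.0, 1000.0  # nm, hypothetical filament dimensions
    a_s = equivalent_sphere_radius(a_f, length)
    acc_filament = 1.0 - blocked_fraction_cylinder(a_f, h, d)
    acc_sphere = 1.0 - blocked_fraction_sphere(a_s, h, d)
    print(f"accessible fraction, filament: {acc_filament:.3f}")
    print(f"accessible fraction, sphere:   {acc_sphere:.3f}")
```

      In this simplified picture the shielded fraction depends only on the ratio (a + h)/(a + d), so any linkage at least as long as the required clearance (h ≥ d) removes the effect entirely.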

      Finally, by performing experiments on immobilized viruses, we eliminate the possibility of antibody-dependent particle aggregation. While this was necessary for us to get interpretable results, the formation of viral aggregates could affect the dynamics and extent of complement deposition. For example, activation of the classical pathway on one particle in an aggregate could spread to non-activating particles through a “bystander effect”, as has been reported in other contexts⁹. We are interested in this question and have begun preliminary experiments in this direction; however, we believe that a definitive answer is outside the scope of this current work. To alert readers to this consideration, we have added this to paragraph 2 of the revised Discussion (Line 359).

      Figure R2: Estimating the surface accessibility of RSV particles bound to coverslips. Definition of variables: af: radius of cylindrical RSV filament; as: radius of spherical RSV particle of equivalent surface area (see Figure 6 – figure supplement 2A); d: distance needed above the viral surface to accommodate IgG-C1 activating complexes; h: height of viral surface above the coverslip; L: length of the viral filament.

      4) How is the "density of antigen" quantitated? What fraction of F or G is labeled? For fluorescence intensity measurements in general, how did the authors ensure their detection was in a linear sensitivity range for the detectors for the various fluorescent channels? Since quantitation of fluorescence intensities is important in this study, some discussion in methods would be valuable.

      We have performed this important additional characterization of our fluorescence system and our overall labeling and quantification strategy to address these concerns. The results of this characterization are now included in two new figure supplements in the revised manuscript (Figure 1 – figure supplements 2 & 3).

      5) The authors also show that the particle morphology, whether globular or filamentous, as well as relative size and resulting apparent curvature, correlate with ability of C3 to bind. Some link to the abundance of post-fusion F (post-F) is examined and discussed, but I found the back and forth discussion between morphology, C3 binding, and post-F abundance to be confusing and in need of clarification and streamlining. Is there a mechanistic link between morphology changes and post-F level increases? Are the two linked or coincidental (for example does pre-F interaction with matrix help stabilize that conformation, and if lost lead to spontaneous conversion to post-F?). Please clarify.

      To improve clarity, we have separated the discussion of pre-F versus post-F abundance and particle morphology into two different sections in Results, and we have rearranged Figures 4 and 5 (Figures 3 and 4 in the original submission).

      Regarding the question of whether changes in morphology and the pre-F to post-F conversion are coincidental or mechanistically linked: the answer is not entirely clear, although we have collected new data that suggests a connection. We first want to note that the two effects are at least partly separable: brief treatment with a low osmolarity solution causes particle shape to change while preserving pre-F (Figure 6A & B), whereas treating with an osmotically balanced solution with low ionic strength converts pre-F to post-F without affecting virus shape (Figure 1E). However, we were motivated by the reviewer’s questions to look into this further. To determine if the change in viral shape may serve to destabilize the pre-F conformation over time, we compared the relative amounts of pre-F and post-F present in particles that were osmotically swollen to those that were not at 0h and at 24h. In these experiments, particles were swollen using a brief (~1 minute) exposure to low osmolarity conditions before returning them to PBS (Figure R3, left). As expected, we observe no immediate change in pre-F abundance following the brief osmotic shock (Figure R3, right: 0h time point), consistent with Figure 6B. After incubating the particles an additional 24h at 37°C, the post-F-to-pre-F ratio is ~3.5-fold higher in osmotically swollen particles than in those where filamentous morphology was initially preserved (Figure R3, right: 24h time point). This supports the reviewer’s suggestion that interactions with the matrix may help to stabilize F in the prefusion conformation, since the conversion to post-F is faster when this interaction is disrupted. Whether or not this has any relevance for RSV entry into cells remains to be determined; however, it is worth noting that we observed no clear loss or gain of infectivity in RSV particles following osmotic swelling (Figure 6 – figure supplement 1A). Since this result may be of interest to readers, we have included this new data in Figure 6 – figure supplement 1B, and it is discussed briefly in Results (Line 250).

      Figure R3: Determining stability of pre-F following matrix detachment. Left: Experimental design. Right: Comparison of pre-F stability on untreated particles (gray) and particles subjected to brief osmotic swelling (magenta). Distributions show the ratio of post-F (ADI-14353) to pre-F (5C4) intensities per particle, combined for four biological replicates, sampled at 0h (immediately after swelling) and after an additional incubation at 37°C for 24h. Black points show median values for each individual replicate. P-values are determined from a two-sample t-test.

      6) Since their conclusion is that curvature of the virus surface is a major influence on the ability of complement proteins to bind, I feel that some effort at modeling this effect based upon known structures is warranted. One might also anticipate then that there would be some epitope-dependent effect as a result of changes in curvature that may lead to an exaggeration of the epitope-specific effects for more highly curved particles perhaps than those with lower curvature? Is this true?

      The reviewer raises two excellent points: that it may be possible to gain insight into the mechanisms through which curvature dictates C1 binding and other aspects of complement activation through structural modeling, and that such a model may help to identify specific epitope effects that could contribute to curvature dependence.

      We developed simulations based on the geometry of RSV, F, and hexameric IgG to try to better understand how curvature may influence initiation of the classical pathway. This model is described in the Methods section (Modeling IgG hexamers on curved surfaces), and the results are discussed in the final two paragraphs of the Results section. In addition, we have included a new figure (Figure 7) to summarize the model’s predictions. This model corroborates the curvature sensitivity of IgG hexamer formation and suggests a possible intuitive explanation for our findings: high curvature effectively increases the distance between epitopes that sit high above the viral membrane, decreasing the likelihood of hexamer formation (Figure 7D). Regarding epitope specific effects, this model suggests that the further the epitope is above the viral membrane, the greater the effect that decreasing curvature will have. However, we find that epitopes closer to the membrane (e.g. those bound by 101F or ADI-19425) are overall very inefficient at activating the classical pathway, potentially due to steric obstruction of the formation of IgG hexamers. Thus, there may be an inherent tradeoff between overcoming steric obstruction (by binding to epitopes distal to the membrane) and sensitivity to surface curvature.
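
      The geometric intuition in this paragraph can be illustrated with a back-of-the-envelope calculation. The sketch below is not the simulation described in Methods. It assumes only that two F trimers sit a fixed arc length apart on a membrane with radius of curvature R and that their epitopes project radially to a height z above the membrane, so that the arc separation between epitopes scales as (R + z)/R. The function name and the example numbers are hypothetical.

```python
def epitope_separation(membrane_spacing, radius, epitope_height):
    """Arc separation between two epitopes displayed on a curved membrane.

    membrane_spacing: arc length between the two anchor points on the membrane
    radius: radius of curvature of the membrane
    epitope_height: radial height of the epitope above the membrane
    All lengths must share one unit (e.g. nm).
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    # Both anchors subtend the angle membrane_spacing / radius at the centre;
    # the epitopes lie on a larger circle of radius (radius + epitope_height).
    return membrane_spacing * (radius + epitope_height) / radius


if __name__ == "__main__":
    spacing, height = 10.0, 12.0  # nm, illustrative values only
    for radius in (50.0, 100.0, 500.0):
        sep = epitope_separation(spacing, radius, height)
        print(f"R = {radius:5.0f} nm -> epitope separation {sep:.2f} nm")
```

      In this toy picture the effect vanishes for epitopes at the membrane (z = 0) and grows with both epitope height and curvature, in line with the qualitative tradeoff described above.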

      It is important to note that this model is reductionist and does not include detailed structural information. Additional factors may be important for considering epitope-specific effects. For example, antibodies that bind equatorially on F (e.g. ADI-19425, which binds to antigenic site III), show minimal complement deposition in our experiments. However, particles whose curvature approaches the diameter of hexameric IgG or IgM (~20nm) may display these epitopes in a manner that is more accessible. If the curvature necessary to observe such an effect falls outside of the biologically accessible range, it would not be observable in our experiments. Nonetheless, it is possible that a different set of antibodies may drive complement deposition on highly curved nanoparticle vaccines that are in development¹⁰. We have added this important point to the second paragraph of the Discussion.

      7) Line 265: it would be useful to confirm the increase C1 binding as a function of morphology as was done for antibody-angle of binding experiments.

      We believe that this data is shown in Figure 6B (Figure 5B in the original manuscript).

      Reviewer #3 (Public Review):

      Overall the manuscript is clearly written and the data are displayed well, with helpful diagrams in the figures to illustrate assays and RSV F epitopes. The engineering of the RSV strain to include a fluorescent reporter and tags on F and G that serve as substrates for fluorophore attachment is impressive and is a strength. The RSV literature is well cited and the interpretation of the results is consistent with structure/function data on RSV F and its interaction with antibodies. This reviewer is not an expert on the experiments performed in this manuscript, but they appear to be rigorously performed with appropriate controls. As such, the conclusions are justified by the data. One weakness is the extent to which the results regarding virion morphology are biologically relevant. Non-filamentous forms of the virion are generally obtained only in vitro as a result of virion purification or biochemical treatment. However, these results may be relevant for certain vaccine candidates, including the failed formalin-inactivated RSV vaccine that was evaluated in the late 1960s and caused vaccine-enhanced disease upon natural RSV infection.

      Thank you for these suggestions, which have helped us to better place our results regarding RSV morphology in the context of prior work. We agree with the reviewer that non-filamentous RSV particles are commonly obtained in vitro, and that this morphology does not reflect the structure of the virus as it is budding from infected cells. Our work has characterized the transition from filament to globular / amorphous form, with the finding that it can occur rapidly upon physical or chemical perturbations, as well as more gradually during natural aging: i.e. in the absence of handling or purification. We are also able to detect globular particles accumulating in cultured A549 cells, where no handling has occurred prior to observation (Figure 5 – figure supplement 1). While we do not currently know how well this reflects the tendency of RSV to undergo conversion from filament to sphere in vivo, we propose that it is plausible that such a transformation could occur. To distinguish between what we demonstrate and what we speculate, we write (Line 401): “Although more work is needed to understand the prevalence of globular particles during in vivo infection, our observations that these particles accumulate over time through the conversion of viral filaments – even under normal cell culture conditions - suggest that their presence in vivo is feasible, where the physical and chemical environment would be considerably harsher and more complex.”

      We agree with the reviewer that our results may have relevance towards understanding the failed formalin-inactivated vaccine trial. We have added this to paragraph 5 of the Discussion section.

    1. Author Response:

      Reviewer #2 (Public Review):

      Methods to characterize cell types in intact tissue using large scale analysis of molecular expression profiles are now readily available, with the best example being in situ RNA sequencing (spatial transcriptomics). However, these methods depend on separate immunohistochemical investigations to define the precise cellular and subcellular distribution of the protein products. Cole et al use iterative indirect immunofluorescence imaging (4i, Gut et al Science 2018) to compare the immunoreactivity of an impressive 18 different molecules within the same brain sections containing the dentate gyrus from young and old mice. First, they demonstrate that the method can be applied to not only adult mouse brain tissue, but also to human embryonic stem cell derived organoids and mouse embryonic tissue, which is an advance on the original report (Gut et al 2018). This demonstration is particularly important as it shows the potential for applying 4i to different biological disciplines. The rest of the manuscript focuses on the mouse dentate gyrus (DG) at 2, 6 and 12 months of age in order to map the complex changes and associations in the tissue across age. Various combinations of the 18 molecules are used to define different cell types, and it is incredibly informative to be able to view so many molecules in exactly the same area; this will advance the field. This is the greatest strength of the manuscript. They find that neurogenic, radial glia-like stem cells (R cells) and proliferating cells are reduced in aged animals, as are immature (DCX+) cells, but claim that fluorescence intensity increases for the remaining R cells in 12 month old mice. They report that the density of vasculature also decreased with age, as did the associated pericytes, but astrocytes associated with the blood vessels increased. The last part of the manuscript defines 'microniches' (random or targeted regions of interest within the DG) and attempts to show how cell types, especially Nestin+ R cells, change in their associations with vasculature within these sub-regions at 2, 6 and 12 months of age. It is a commendable approach and the authors use a variety of statistical tests to compare the different cell types. However, there are several parts of the methods, along with insufficient details of the results that prevent full interpretation of the data, meaning that it is difficult to determine whether all conclusions are supported.

      1) There are many factors that can affect the measurements of immunoreactive structures (Fritschy, Eur J Neurosci, 2008 vol 28, p. 2365-70). The main limitation is not providing sufficient detail for the immunolabelling design and imaging parameters but providing some unclear details for the imaging analysis (below).

      We understand the reviewer’s concerns (outlined below) and tried to carefully address all raised points.

      a. In terms of immunohistochemistry, with the impressive number of tested antibodies, there is potential for variation due to antibody penetration, unreported combinations of secondary antibodies, tissue quality (variations in fixation), etc. It is difficult to have confidence in the conclusions based on a total of 3 mice per age group for a single 40 µm section per mouse. Ideally, to increase confidence in individual section variability, it is recommended that measurements should be taken from at least 3 sections per mouse then averaged, before averaging for the age group.

      We have now added additional experiments testing the elution properties of the antibodies used (please refer also to point 4 of Rev#1). We have also tested the elution properties of the secondary antibodies (now included in revised extended data Figure 1). Indeed, all analyses were done in 6 dentate gyri per mouse, with the exception of the quantifications shown in Figure 3B, C. Following the reviewer’s advice, we have now expanded the analyses to include data from 3 sections of the DG per mouse per age group (please refer to revised Figure 3 and modified Supplemental Table 1).

      b. Assuming there were 3 primary antibodies with 3 secondary antibodies per cycle before elution, were the combinations used consistent for all brain sections and mice? Was the testing and elution order the same (i.e. systematic)? There is a risk of cross-excitation and mis-interpretation of true immunoreactivity if spectrally close fluorophores for the secondary antibodies were selected for primary antibodies that recognize spatially overlapping structures. Can the authors show the cycle number and fluorophore for the examples in figures 1 and 2 to determine which markers were imaged together in the same cycle? This would give confidence to the methods for colocalisation and cell type descriptions. For example, can cross-excitation be ruled out for some of the signals in the images used in Fig 2 (duplicated in Fig 4) such as intensely immunopositive Laminin-B1 cells in the MT3 and Sox2 channels (2A) and Ki67, SOX2 and phospho-histone 3 channels (2C)?

      We understand the reviewer’s point and have now added cycle combinations on page 20 of the manuscript (as we had done previously for Figure 1D). Given the fluorophores used and the settings of the laser scanning microscope (the description of which we have now expanded), there is little to no chance of cross-excitation or cross-detection. For the individual cells pointed out, cross-excitation is not possible because LaminB1, MT3, SOX2, KI67, and phospho-histone 3 were stained in separate cycles, so none of the other markers carried a fluorescent label when each was imaged. Figure 2C: indeed, that is a biological overlap, as this SOX2-labeled cell is in mitosis (Ki67 and phospho-H3 positive). The cycle order is now also provided in the revised Figure 2 supplement 1.

      c. For image acquisition, details are required on the resolution (numerical aperture of the lenses) in order to interpret colocalisation measurements in the later figures. Which beamsplitters/filters were used, and was the same laser power used for the same markers over different specimens (important for interpreting figure 4 data)?

      We have included that information in the revised manuscript. Please refer to pages 20-21 of the revised manuscript. For Figure 4 data: we have added new analyses of proteins whose expression levels do not change with advancing age (please also refer to point 2 of Rev#1).

      d. For the analysis of ROIs (figures 3-6), were the 20x or 40x images used?

      We used 20x images for analysis shown in Figure 2. This has been clarified. Please refer to page 20 of the revised manuscript.

      e. Details of the antibody specificity controls should be provided.

      All antibodies used are standard in the field and have been used in dozens of studies. None of the presented stainings is “novel” per se. The iterative approach is novel. This has been clarified. All antibody information is also available via the RRID that we provided.

      2) Numerous markers have been used to define different cells, but the proportions are not reported. For example, R cells are defined differently in figures 3 and 4. How many types of R cells (based on combinations of markers) were observed? High resolution examples of each defined cell type (neuronal and glial) would assist the reader in the confidence of the measurements (ideally as single channels side by side, with arrows indicating areas of detectable immunoreactivity that the authors would use to define each cell).

      All R cells were identified using the criteria outlined on pages 6/7 in the main text. The regions of interest created during the quantification of cell density in Figure 3 were used to measure the fluorescent intensities of HOPX, MT3, and LaminB1 in R cells (see page 22 of the manuscript). We have added further clarification of this in the main text on page 8:

      “We next used 4i to analyze expression levels of selected proteins in the same R cells identified in the quantification of cell density.”

      3) The authors use HOPX and GFAP immunoreactivity and a lack of detectable S100beta immunoreactivity to distinguish R cells from triple immunopositive mature astrocytes. In Figure 3, the images are too low power to be able to confirm this. This part would benefit from some single cell examples showing the separate channels.

      We have now added high-magnification images in the revised extended Figure 3 to show the S100beta-negativity of R cells.

      a. Furthermore, the results (paragraph 2, page 7) report changes in cell number, but rather density is reported. Please either state the numbers or refer to density.

      This has been corrected.

      b. Related to Fig 3, there are no details of the number of R cells counted in supplementary table 1. How were the density measurements obtained? How thick were the image stacks and how many R cells per section? Similarly, as stated in methods, for glial cells, 100 cells were randomly counted in each section (presumably the same count for each age), so how was it reported that specifically the numbers of astrocytes were reduced and no significant differences in other glial cell types? (bottom of p.7)

      We have clarified how cellular densities were calculated on page 23. For density measurements, all immunopositive cells in each section were counted. The subset of 100 cells was used only for the analysis of LaminB1 fluorescence intensity. All cells were counted throughout the entire images using the Cell Counter plugin in Fiji, with localization identifiers for the ML, hilus/CA3, and the supra- and infrapyramidal blades of the GCL. The areas of the hippocampal subregions were measured. Cell density was calculated by dividing the number of cells by the regional volume expressed in mm³ (region area [mm²] × tissue thickness [0.04 mm]).
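
      As a worked illustration of the density formula stated above, the following minimal sketch divides a cell count by the regional volume (area × 0.04 mm section thickness). The function name and the example count and area are hypothetical and are not taken from the manuscript.

```python
def cell_density_per_mm3(cell_count, region_area_mm2, thickness_mm=0.04):
    """Cells per cubic millimetre for one region of one section.

    cell_count: number of immunopositive cells counted in the region
    region_area_mm2: measured area of the region in mm^2
    thickness_mm: section thickness in mm (0.04 mm = 40 um, as in the text)
    """
    if region_area_mm2 <= 0 or thickness_mm <= 0:
        raise ValueError("area and thickness must be positive")
    volume_mm3 = region_area_mm2 * thickness_mm
    return cell_count / volume_mm3


if __name__ == "__main__":
    # Hypothetical example: 120 cells counted in a 0.25 mm^2 region.
    density = cell_density_per_mm3(120, 0.25)
    print(f"{density:.0f} cells per mm^3")
```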

      4) An increase in fluorescence intensity for HOPX and MT3 (also marks R cells) was observed with age (Fig 4), with methods stating that the 5 ROIs used to calculate the background intensity were measured at each [optical?] slice for where the cells were measured, to account for unequal antibody penetrance. Several clarifications are required in order to interpret these results: For the example HOPX images in Fig 4A, for the 2 month old mouse, the background is low, whereas for 12 months, the background is far higher, meaning different background ROI values. Can this difference be explained by differences in laser power, contrast adjustments, optical slice thickness, or whether these are maximum intensity projections of different z thickness? These values must be reported, and for each image presented in the manuscript, details must be included as to what type of image (z-projection or single optical slice, z thickness). Was the optical section(s) of the 12 month mouse imaged closer to the surface of the section for this example in Fig 4A? Were cells sampled at all depths of the imaged volume? Did the antibody show better penetration in the 12 month old mice than the 2 month old mice? How many optical slices would a cell soma cover? In these cases, how was the fluorescence intensity measured? If a soma covered several optical slices, which one was selected for the ROI measurement?

      It is common to have higher background in immunofluorescence in tissues from older mice. All images for each individual stain were acquired in a single continuous imaging session using identical microscope settings as we have now clarified on page 20.

      “Within each cycle, all samples were labelled with the same antibodies, and imaged with identical microscopy settings for laser power, gain, digital offset, pinhole diameter, and z-step.”

      The example images in Figure 4A are maximum intensity projections including all frames containing positive immunoreactivity spanning the entire thickness of the tissue (62 frames for 2 months and 64 frames for 12 months). There was no obvious difference in antibody penetration between ages, and cells were sampled throughout the entire thickness of the tissue. We have now included clarification of where we acquired measures of fluorescent intensity on page 23:

      “Fluorescent intensity was measured in the z-position in which it was brightest for each cell.”
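
      The measurement rule quoted above can be sketched as follows. This is a hedged illustration only: it assumes that the brightest z-position is chosen on the raw signal and that the mean of the background ROIs from that same optical slice is then subtracted. The response does not state how the background values were applied, so the subtraction step, the function name, and the data layout are all assumptions.

```python
from statistics import mean


def brightest_slice_intensity(cell_by_z, background_rois_by_z):
    """Background-corrected intensity of one cell at its brightest z-position.

    cell_by_z: mean intensity of the cell ROI in each optical slice
    background_rois_by_z: for each slice, the intensities of the background ROIs
    Returns (z_index, corrected_intensity).
    """
    if not cell_by_z or len(cell_by_z) != len(background_rois_by_z):
        raise ValueError("need one non-empty background list per slice")
    # Assumption: the brightest slice is chosen on the raw signal ...
    z_best = max(range(len(cell_by_z)), key=lambda z: cell_by_z[z])
    # ... and the background of that same slice is subtracted afterwards.
    background = mean(background_rois_by_z[z_best])
    return z_best, cell_by_z[z_best] - background


if __name__ == "__main__":
    # Hypothetical cell spanning three optical slices, five background ROIs each.
    cell = [10, 30, 20]
    background = [[1, 1, 1, 1, 1], [2, 4, 2, 4, 3], [5, 5, 5, 5, 5]]
    print(brightest_slice_intensity(cell, background))
```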

      5) The described methods for studying cellular interactions are not clear, making it difficult to interpret the associations between vasculature, cell types, and age. How was colocalisation defined, and at what resolution? For example, it is expected that GFAP would be associated with but not directly colocalized with collagen IV (Fig 5). In these cases, the manuscript would benefit from high resolution examples of this colocalization/interaction. How many ROIs were taken, how exactly were the ROIs for cell types associated with collagen IV selected, was this in 2D or 3D?

      We understand that concern and have toned down the interpretation of our findings regarding “interaction” and now rather refer to “proximity” which is indeed much more correct (true interaction would require methods going beyond light microscopy). Please also refer to point 7 of Rev#1.

      6) The methods for random microniches are difficult to follow, as are the methods for investigating the associations of other markers to radial processes of R cells. Please provide a definition of a 'spot'. Again, details of the micron per pixel resolution and optical slice thickness would help in the interpretation of results. Additionally, if possible, illustrated examples of the full procedure for niche mapping should be provided in order to follow how the measurements were collected.

      We have tried to clarify the data acquisition and analyses of the microniches and have modified the explanation (see page 10 of the main text):

      “We speculated that within the aging DG neurogenic niche, micro-environments may exist possessing distinct capacities for preservation of neurogenic processes. Spots were randomly distributed across the GCL spaced 50µm apart. Utilizing the multidimensionality of the dataset acquired with tissue 4i, volumes of 11 cell markers were measured within a 50µm radius of each spot to achieve contiguous sampling of “microniches” in the GCL and bordering areas of the hilus and molecular layer (Figure 6A).”
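
      The sampling scheme in the quoted passage can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the authors' pipeline: spots are placed on a regular square lattice with 50 µm spacing (the text says only that spots were spaced 50 µm apart), objects are reduced to 2D centroids with an attached volume, and a marker's contribution to a microniche is the summed volume of objects whose centroid lies within 50 µm of the spot. All function names and the data layout are hypothetical.

```python
import math


def grid_spots(width_um, height_um, spacing_um=50.0):
    """Spot centres on a square lattice covering a width x height region."""
    nx = int(width_um // spacing_um) + 1
    ny = int(height_um // spacing_um) + 1
    return [(ix * spacing_um, iy * spacing_um)
            for iy in range(ny) for ix in range(nx)]


def marker_volume_near(spot, objects, radius_um=50.0):
    """Summed volume of segmented objects whose centroid lies within radius_um.

    objects: iterable of (x_um, y_um, volume_um3) tuples for one marker.
    """
    sx, sy = spot
    return sum(vol for x, y, vol in objects
               if math.hypot(x - sx, y - sy) <= radius_um)


if __name__ == "__main__":
    # Hypothetical segmented objects for a single marker.
    objects = [(30.0, 40.0, 5.0), (60.0, 0.0, 2.0), (0.0, 10.0, 1.0)]
    for spot in grid_spots(100.0, 50.0):
        print(spot, marker_volume_near(spot, objects))
```

      With a 50 µm radius and 50 µm spacing, neighbouring microniches overlap, which is one way to obtain the contiguous sampling described in the quoted text.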

      We have also added the required information to the revised Methods section regarding pixel resolution and optical slice thickness (please refer to page 20).

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript was well written and interrogates an exciting and important question about whether thalamic sub-regions serve as essential "hubs" for interconnecting diverse cognitive processes. This lesion dataset, combined with normative imaging analyses, serves as a fairly unique and powerful way to address this question.

      Overall, I found the data analysis and processing to be appropriate. I have a few additional questions that remain to be answered to strengthen the conclusions of the authors.

      1. The number of cases of thalamic lesions was small (20 participants), and the maximum overlap at any site in this group is 5 cases. Finding focal thalamic lesions with the appropriate characteristics is likely to be relatively hard, so this smaller sample size is not surprising, but it suggests that the overlap analyses conducted to identify "multi-domain" hub sites will be relatively underpowered. Given these considerations, I was a bit surprised that the authors did not start with a more hypothesis driven approach (i.e., separating the groups into those with damage to hubs vs. non-hubs) rather than using this more exploratory overlap analysis. It is particularly concerning that the primary "multi-domain" overlap site is also the primary site of overlap in general across thalamic lesion cases (Fig. 2A).

      An issue that arises when attempting to separate lesions into “hub” versus “non-hub” lesions at the study onset is that there is no accepted definition or threshold for a binary categorization of hubs. The primary metric for estimating hub property, the participation coefficient (PC), is a continuous measure ranging from 0 to 1, without an objective threshold to differentiate hub versus non-hub regions. Thus, a binary classification would require choosing an arbitrary threshold for splitting our sample. Our concern is that assigning an arbitrary threshold and delineating groups based on that threshold would be equally, if not more, exploratory. However, we appreciate this comment, and future studies may be able to use the results of the current analysis to formulate an a priori threshold. Similarly, given the relative difficulty of recruiting patients with focal thalamic lesions, we did not have enough power to perform a linear regression testing the relationship between PC and the global deficit score. Weighing all these factors, we determined that counting the number of tests impaired, and defining global deficit as more than one domain impaired, is a more objective and less exploratory approach for addressing our specific hypotheses than arbitrarily splitting PC values.
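
      To make the continuous nature of PC concrete, the following is a minimal sketch of the commonly used participation coefficient formula, PC = 1 - sum over communities of (k_s / k)^2, where k_s is a node's connection weight to community s and k is its total weight. Whether the study computed PC in exactly this way is an assumption; the function name, the input layout, and the example weights are hypothetical and are not taken from the manuscript.

```python
def participation_coefficient(weights, communities):
    """Participation coefficient of one node (illustrative sketch).

    weights:     connection weights from the node to its neighbours
    communities: community (network) label of each neighbour
    Both inputs are hypothetical; they are not the study's data format.
    """
    total = sum(weights)
    if total == 0:
        return 0.0  # an unconnected node has no cross-community spread

    # Sum the node's connection weight within each community.
    per_community = {}
    for w, c in zip(weights, communities):
        per_community[c] = per_community.get(c, 0.0) + w

    # PC = 1 - sum_s (k_s / k)^2
    return 1.0 - sum((k_s / total) ** 2 for k_s in per_community.values())


# A node connected to a single network versus one spread evenly over four.
single_network = participation_coefficient([1.0, 2.0, 3.0], ["A", "A", "A"])
four_networks = participation_coefficient([1.0, 1.0, 1.0, 1.0], ["A", "B", "C", "D"])
```

      With M communities and evenly spread connections, this formula gives 1 - 1/M, so values approach 1 only as the number of networks grows. The output is a graded score with no built-in cut point, which is the difficulty with a binary hub versus non-hub split described above.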

      We agree with the reviewer that our unequal lesion coverage in the thalamus is a limitation. We have acknowledged this in the discussion section (line 561). There may very likely be other integrative sites (for example the medial pulvinar) that we missed simply because we did not have sufficient lesion coverage. We have updated our discussion section (line 561) to more explicitly discuss the limitation of our study.

      1. Many of the comparison lesion sites (Fig. 1A) appear to target white matter rather than grey matter locations. Given that white matter damage may have systematically different consequences than grey matter damage, it may be important to control for these characteristics.

      We have conducted further analyses to better control for the effects of white matter damage.

      1. The use of cortical lesion locations as generic controls was a bit puzzling to me, as there are hub locations in the cortex as well as in the thalamus. It would be useful to determine whether hub locations in the cortex and thalamus show similar properties, and that an overlap approach such as the one utilized here, is effective at identifying hubs in the cortex given the larger size of this group.

      We have conducted additional analyses to replicate our findings and validate our approach in a group of 145 expanded comparison patients. We found that comparison patients with lesions to brain regions with higher PC values exhibited more global deficits, when compared to patients that did not exhibit global deficits. Results from this additional analysis were included in Figure 6.

      1. While I think the current findings are very intriguing, I think the results would be further strengthened if the authors were able to confirm: (1) that the multi-domain thalamic lesions are not more likely to impact multiple nuclei or borders between nuclei (this could also lead to a multi-domain profile of results) and (2) that the locations of these locations are consistent in their network functions across individuals (perhaps through comparisons with Greene et al., 2020 or more extended analyses of the datasets included in this work) as this would strengthen the connection between the individual lesion cases and the normative sample analyses.

      We can confirm that multi-domain thalamic lesions did not cover more thalamic subdivisions (anatomical nuclei or functional parcellations). We also examined whether the multi-domain lesion site consistently showed high PC values in individual normative subjects. We calculated thalamic PC values for each of the 235 normative subjects, and compared the average PC values in the multi-domain lesion site versus the single-domain lesion site across these normative subjects. We found the multi-domain site exhibited significantly higher PC values (Figure 5D, t(234) = 6.472, p < 0.001). This suggests that the multi-domain lesion site consistently showed stronger connector hub property across individual normative subjects.
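
      The reported degrees of freedom (234 = 235 - 1) are consistent with a within-subject (paired) comparison of the two sites. Assuming that design, the statistic can be sketched as follows; the function name and the example values are hypothetical and do not reproduce the study's data or its reported t value.

```python
import math
from statistics import mean, stdev


def paired_t(site_a, site_b):
    """Paired t statistic and degrees of freedom (illustrative sketch).

    site_a, site_b: per-subject values (e.g. mean PC in two lesion sites),
    listed in the same subject order. Hypothetical inputs only.
    """
    diffs = [a - b for a, b in zip(site_a, site_b)]
    n = len(diffs)
    # t = mean difference / standard error of the mean difference
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat, n - 1


# Made-up values for four subjects, purely to show the calculation.
t_stat, dof = paired_t([0.6, 0.7, 0.8, 0.9], [0.5, 0.5, 0.7, 0.6])
```

      Each subject contributes one difference score, so 235 subjects yield 234 degrees of freedom, matching the form of the statistic reported above.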

      We also visually compared our results with Greene et al., 2020 (see below). We found that in the dorsal thalamus (z > 10), there was a good spatial overlap between the integration zone reported in Greene et al., 2020 and the multi-domain lesion site that we identified. In the ventral thalamus (z < 4), we did not identify the posterior thalamus as part of the multi-domain lesion site, likely because we did not have sufficient lesion coverage in the posterior thalamus.

      In terms of describing the putative network functions of the thalamic lesion sites, results presented in Figure 7A indicate that multi-domain lesion sites in the thalamus were broadly coupled with cortical functional networks previously implicated in domain-general control processes, such as the cingulo-opercular network, the fronto-parietal network, and the dorsal attention network.

      Greene, Deanna J., et al. "Integrative and network-specific connectivity of the basal ganglia and thalamus defined in individuals." Neuron 105.4 (2020): 742-758.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this paper, Judd et al. performed intersectional viral-mediated genetics to resolve a projection map from Ntsr1-positive and inhibitory neurons in the anterior interposed nucleus. They show that, in contrast to what is currently thought, inhibitory neurons that project to the inferior olive in fact bifurcate to multiple brainstem and midbrain areas. This is a thorough and timely paper, with valuable information for cerebellar scientists with implications that will be of interest to the general neuroscience audience. As a direct consequence of the vast amount of information, this paper summarizes a lot of data using acronyms and summary schematics, which makes it at times difficult to follow the core story. A bigger concern is that the main conclusion arguing that inhibitory neurons make widespread extra-cerebellar projections relies on the assumption that the Cre-lines used in the study are able to specifically/exclusively mark those inhibitory neurons – these details were not fully worked out in this study.

      We thank the reviewer for recognizing the importance of the study and pointing out important caveats. We have included a variety of validation methods to address the major concerns.

      Reviewer #3 (Public Review):

      By applying modern viral tracing methods, this paper described in detail extensive input-output connections of Gad1Cre+, VgatCre+, or Ntsr1Cre+ IntA projection neurons.

      Because diverse neurons are intermingled in a small region, it is generally challenging to isolate specific excitatory or inhibitory neurons and their circuits in the cerebellar nucleus.

      The authors focused on IntA of CN and demonstrated that 1) both inhibitory (Gad1Cre+ and/or VgatCre+) and excitatory (Ntsr1Cre+) neurons comprise extensive input-output connections with many extracerebellar regions, and 2) inhibitory circuits are functionally distinct from excitatory circuits on the basis of projection targets. This work could provide insights into diversity of inhibitory IntA neurons, and thus could be an interesting addition to the field's expanding efforts to identify cell types of CN, their input-output connections, and their functions.

      However, interpreting the data is difficult because of technical challenges. Critically, the main conclusion could be compromised by experimental artifacts, which need better characterization. In addition, the text could be revised to make it more accessible to a broad audience.

      We appreciate the reviewer’s recognition of the value of the questions addressed in this study and for raising the important technical points that are addressed in this revision.

    1. Author Response:

      Reviewer #1 (Public Review):

      This paper addresses an important question in the opioid field, whether the mu opioid receptor (MOR) and the delta opioid receptor (DOR) are likely to occur as independent receptors or whether their signaling is coupled and could be the result of interactions. The authors take advantage of a fluorescent label, NAI-A594, which binds to both receptors, to do live imaging experiments in the cholinergic neurons of striatum and test how the responses to a selective agonist to one receptor affects subsequent responses to an agonist of the other. They use receptor internalization and electrophysiological recordings to gauge the likelihood that the two receptors are independently expressed or act as a unit. The work is carefully done, and the authors conclude that the two receptors act independently in these neurons; the data support the idea that all of the receptors are not necessarily linked. However, the work cannot exclude that some of the receptors also act together. One issue is that opioid receptors and GPCRs generally can produce distinct effects: recruitment of the beta-arrestin pathway promotes desensitization and receptor internalization, while signaling via Galpha or beta-gamma produces other signaling events. While the agonists used in the present study likely target both pathways in the MOR and DOR, it remains possible that in a heterodimer, signaling might instead be biased. Moreover, the similar downstream signaling pathways make it more difficult to untangle the possible interactions between the two receptor types.

      We have included a discussion of biased signaling by ME. We agree that our work has not ruled out the possible transient formation of a subset of heterodimers, as these could not be detected by the approaches used in this study. However, at the macroscopic level and time scale, our results do not support the possibility of a stable dimer. It is not known whether the lifetime of a heterodimer association would be long enough for the activation process and to cause biased signaling. In basal conditions, no evidence of MOR-MOR homodimers was found by single molecule analyses (Moller et al., 2020; Asher et al., 2021). MOR-MOR homodimers could be induced by DAMGO with a lifetime of ~460 milliseconds, indicating the fast decay of dimers (Moller et al., 2020).

      Reviewer #2 (Public Review):

      Strengths:

      1) The authors use a nice combination of pharmacology, in situ hybridization, and fluorescence analysis to demonstrate that MORs and DORs are both expressed on ChIs.

      2) The authors specifically analyze expression patterns based on sex or dorsal vs ventral striatal zones.

      3) The authors use extracellular recordings of ChIs to demonstrate that their firing patterns are sensitive to MOR or DOR stimulation.

      4) The authors use pharmacology and extracellular recordings to demonstrate that MOR and DOR mediated inhibition is relatively independent of each other.

      5) The authors use live cell imaging and pharmacology to examine whether selective agonist administration results in receptor internalization.

      Weaknesses:

      1) It remains unclear why Met-enk pretreatment results in MOR desensitization.

      We agree this is an interesting question, and is something we are pursuing.

      Reviewer #3 (Public Review):

      This is an important study that addresses a standing question regarding whether different types of opioid receptors expressed in the same cell signal independently of one another or operate as functional units (heterodimers). This study specifically explores this question by investigating the co-expression of mu and delta opioid receptors (MORs and DORs respectively) in cholinergic interneurons of the striatum. It has been known for some time that these neurons express both MORs and DORs, but functional interactions between them in these neurons have not been explored. The study uses a variety of methodologies to investigate these interactions and the experiments were generally well designed to test this hypothesis. The results are quite striking, suggesting that this study will be of high impact to help show that while these receptors may co-exist within cells, they do not necessarily have to act in concert with one another, which is especially relevant in deciphering opioid signaling in neural function and neurotransmission. There are some concerns related to how the data were analyzed, raising some questions about how to interpret the data and whether certain conclusions are warranted, but these do not detract too heavily from a very interesting study.

      Strengths:

      1) The use of the NAI compounds in combination with receptor-specific antagonists is a nice way to measure receptor expression and internalization, especially across the whole of the striatum.

      2) Performing measures in both male and female tissue is a strength.

      3) There was a nice comparison made between receptor-specific ligands and the endogenous opioid peptide met-enkephalin.

      4) The experiments were generally thorough with proper controls and used a variety of methodologies to address the hypothesis.

      5) Even without detailed statistical reporting (see below) many of the findings appear to be robust.

      Weaknesses:

      1) The manuscript lacks detailed statistical analyses, mostly relying on descriptions of the data as interpreted by the authors. In most places there are no indications of which statistical analyses were utilized and what the outcomes of those analyses were. In the few places where analyses were indicated, it is not clear that the appropriate tests were used (e.g. using an unpaired t-test when an ANOVA, and possibly a repeated measures ANOVA should have been used). The description of the statistical analyses utilized in the paper also conflicts with what is actually used in most places. It is difficult to evaluate the data without this information. Figure 3 uses standard deviation as a measure of variability, but other figures use standard error and there is no explanation for why this is the case. In many cases there are statements regarding data being different than baseline or 100%, but these are not supported by any statistical measures as being truly different.

      The statistical analyses have been added. The SD for the labeling data was used to demonstrate the distribution of fluorescence of each labeling condition, which also showed variability depending on location of neurons in the striatum. There were no hypothetical values for comparing in this data set, therefore SE was not used.

      2) The results section makes claims about the kinetics of desensitization as a result of met-enkephalin treatment (referring to Figure 5E), but there is no indication that a time by treatment factor was significantly different. The authors cannot make claims about the rate of desensitization without an actual assessment of rate. Relatedly, the authors do not fully discuss that while MORs and DORs have different degrees of desensitization at the times they measure, the two receptors may have similar maximal extents of desensitization, just at different time scales. Figure 5D has the implication that MORs are beginning to desensitize, just at a slower rate than DORs. Essentially, the authors are trying to have it both ways: ignoring rates in most cases and implicating rates in one case without actually testing them.

      The difference between the desensitization curves resulting from ME treatment was analyzed by two-way ANOVA, and the p values of time as a treatment factor were included in the results. Regarding the second concern, although it is possible that the two receptors may have similar maximal extents of desensitization if application of agonists is prolonged, this is not the purpose of our study, which reports the different regulation processes of MORs and DORs endogenously co-expressed in a single neuron. It is clear that during a 5-minute application of agonists, desensitization of MOR occurs more slowly and to a lesser degree than that of DOR.

      3) The authors conclude that MORs do not internalize, whereas DORs do, but their time course does not align with previous experiments, involving a very long treatment followed by a long washout period. The treatment differences could play a role in their differential outcomes (MOR recycling v. DOR recycling). The authors should address this disparity either experimentally or discuss it as a limitation.

      The time courses of the previous internalization and desensitization experiments were different. We did a new set of experiments in which internalization was studied with a 5-minute application of agonists, as was used in the desensitization experiments. There was no detectable internalization of either MOR or DOR at this 5-minute time point. We added these data to the results (Figure 6C), and discussed that desensitization and internalization were separable.

      4) The staining of DORs (as inferred by CTAP treatment) in Fig 1Bc does not match the pattern of DOR expression in the literature, appearing like there is no DOR anywhere besides the most dorsolateral region of the striatum. This also conflicts with their data in Figure 3. This is curious and should be addressed/discussed. The species differences between figures could play a role in this or it could be experimental methods.

      The result showed that there was no patch-like structure that indicated the staining of MOR. The image was taken with a macroscope at low resolution, and the low fluorescence signal was difficult to acquire. In Figure 3, we used a 2-photon microscope to identify each stained neuron, and thus a high-quality image was obtained.

      5) The authors used a variety of pharmacological agents and curiously failed to discuss instances where some of the agents didn't produce expected results. For example, morphine only partially decreased firing, which was surprising, but also wasn't discussed. CTAP and naloxone did not fully reverse the effects of DAMGO (Figure 5C), but this was glossed over.

      Morphine is known to be a partial agonist, and thus our finding is not surprising given the existing literature (reviewed in Birdsong and Williams, Mol Pharmacol 2020, 98, 401-409). CTAP and naloxone reverse the action of DAMGO. The time course of blockade varied from cell to cell, most likely as a result of the location of the cell within the brain slice. Cells deeper in the slice will be more slowly affected by both agonists and antagonists.

      6) It is curious that the authors found heterologous desensitization with met-enkephalin treatment, but did not explicitly test this with their receptor-specific ligands. This relates to a larger concern, and one that is lightly touched upon in the discussion: the indication that depending on the signaling pathway (G protein v. arrestin) there could be different outcomes for receptor function and regulation (i.e. biased signaling). It would be important for the authors to discuss this given that some of the pharmacological treatments they employ have different biases in their signaling which could affect their measured outcomes.

      We include a discussion on biased signaling of each receptor with ME. We also discuss the non-biased signaling of arrestin and G protein by DAMGO and deltorphin at MORs and DORs, respectively.

      7) Experiments were performed on tissue from both male and female mice, but the proportion of each sex used in each experiment was not clear, aside from Figure 3 and its accompanying supplemental figure. While overall expression may not differ between sex, sex differences could account for variability in functional data and the sexes used should be indicated in each experiment or at least discussed as a limitation of the study.

      We pooled the data from male and female mice. The numbers of male and female mice used are now shown in Table 1. We did not investigate potential sex differences and have discussed this as a limitation. Each data point was the result from recording of one neuron from a single slice.

      8) The use of MOR knockout mice is a good control, but there are no details provided of how cholinergic interneurons were identified in these mice.

      We included in the figure legend (Supplemental Figure 3) that ChIs in MORKO were identified by morphology of neurons being larger than other neurons nearby. The staining of ChIs from MORKO was compared to the staining of ChAT-GFP using the same protocol and analysis. Finding the large cells and confirming with green fluorescence of ChAT-GFP helped in identifying and assigning ChIs without GFP.

      9) The description of the methods used to calculate desensitization (lines 236-240) did not seem to match what was actually performed and the methods did not clarify this. It is difficult to evaluate the data when it is not clear how the data were obtained.

      We have now corrected the calculation of the desensitization that is described in the results.

      10) The descriptions of MOR desensitization were muddled. It was described as having persistent inhibition (i.e. implied lack of desensitization), but the Table and Figures indicated that MORs do desensitize, just not to the extent that DORs do.

      We have now changed the description of MOR desensitization to make clearer that MORs also desensitized, but to a much lesser degree than DORs.

      11) The authors cite literature that assessed cholinergic interneuron function in dorsal and ventral striatum and their staining data show expression of opioid receptors in both dorsal and ventral striatum, but they chose to focus on cholinergic interneurons in the ventral striatum. The authors should provide a clear rationale in the results section where this decision was made.

      The rationale has been added and focuses on functional interaction between MOR and DOR in the ventral striatum. The distribution of receptors measured with NAI-594 suggest a comparable expression of MOR and DOR in this area. This is a key point as it allows possible functional studies using cells with similar expressions of the two receptors.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors generate a unique dataset profiling the transcriptomes of late-stage secretory endometrium from women with past pregnancies confirmed to be (i) with sPE or (ii) without sPE, divided into pre-term and full-term pregnancies. They divide the dataset into training and testing samples in order to identify a signature of sPE. Intriguingly, they observe long lasting anomalies in the transcriptomes of endometria from women with prior sPE, which indicates that elements of PE pathogenesis survive the end of the pregnancy. This observation is consistent with the fact that a previous preeclamptic pregnancy is one of the best predictors for subsequent preeclamptic pregnancies. The authors then perform functional enrichment analysis on the signature, which appears to be related to hormonal signaling (i.e. estrogen and progesterone signaling).

      Although this study presents a seemingly well-controlled dataset (with stratified controls at pre-term and full-term) and on a rarely assessed tissue (post-pregnancy endometrium), there are a number of concerns on the processing of the data and the statistical methods employed (for instance the use of fold-change thresholding, which is known to lead to extremely poor FDR control in standard processing pipelines). There is also not enough information about how some analyses were performed for full transparency and reproducibility of studies in the long-term.

      We would like to thank the reviewer for the feedback on our manuscript and the recommendations formulated to improve it. Regarding methodology, we supply more details to provide full transparency and reproducibility. Specifically, the statistical methods applied to obtain differentially expressed genes have been clarified to enable full understanding of the use of fold change thresholding and FDR cutoff.
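
      As a generic illustration of how an FDR cutoff and a fold-change threshold can be combined, the sketch below applies the Benjamini-Hochberg adjustment to raw p-values and then keeps genes that pass both criteria. It is not the authors' pipeline: the function names, the cutoffs (adjusted p < 0.05 and |log2 fold change| >= 1), and the example values are all assumptions chosen for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_genes(genes, pvalues, log2_fold_changes,
                 alpha=0.05, min_abs_log2fc=1.0):
    """Keep genes passing both the FDR cutoff and the fold-change threshold.

    alpha and min_abs_log2fc are hypothetical defaults, not the study's.
    """
    adjusted = benjamini_hochberg(pvalues)
    return [
        gene
        for gene, q, lfc in zip(genes, adjusted, log2_fold_changes)
        if q < alpha and abs(lfc) >= min_abs_log2fc
    ]


# Made-up example with four hypothetical genes.
selected = select_genes(
    ["A", "B", "C", "D"],
    [0.01, 0.04, 0.03, 0.005],
    [1.5, 0.2, -2.0, 0.5],
)
```

      In this form the fold-change filter is applied after the multiple-testing adjustment. That ordering is one common convention, and the reviewer's concern about FDR control under fold-change thresholding applies to it.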

      Reviewer #2 (Public Review):

      Garrido-Gomez et al. perform RNAseq on human endometrial biopsies from women with a history of sPE and women who never had sPE, to better define the signature of decidualization that may lead to sPE. The authors report on 166 differentially expressed genes in sPE compared to control, and many of which are connected to hormonal signaling through progesterone receptor-B and estrogen receptor 1 pathways. Therefore, changes in these genes could connect some of the dots between decidualization impairment and the development of sPE during pregnancy. The strengths of this manuscript include the use of human endometrial biopsies, a training and validation set of samples and the rigorous analyses to define the decidualization disorder genotype/phenotype of sPE. While this is a mostly descriptive study, the conclusions are mostly well supported by the data; however, some aspects of the patient population need to be further explained.

      The clinical definition of sPE and whether it was early or late onset is not included anywhere in the manuscript. Please clearly define these variables and discuss how these pathways may impact the timing of disease onset.

      Thank you for this observation that let us improve the clarity of the manuscript. Following this recommendation, we included the clinical definition of sPE in the “Study design” section of the Materials and Methods so that the definition is not just in the Introduction (Lines 35–37). Regarding preeclampsia classification, our work is focused on severe preeclampsia (sPE) due to the severity of its symptoms and health risks. Early onset preeclampsia (EOPE) tends to be more severe than late onset, but late onset preeclampsia could also be severe. Furthermore, all sPE cases included in the study were associated with prematurity (gestational week of <37) regardless of early or late onset. An increasing number of studies are focused on this classification, but severity is present in the clinical practice (PMID: 32443079). However, the discussion proposed is interesting, and we take it into consideration for future studies.

      The conclusion (and title) that these gene signatures could be useful for preconception or early prenatal screening is overreaching since all these biopsies were collected from women who already experienced sPE and not from women who had yet to be diagnosed with PE. As risks of many diseases (i.e. cardiovascular and metabolic disease) increase in women with a history of PE, sPE itself might have a long-lasting impact on the endometrial environment, independent from decidualization.

      There is increasing evidence to support that inappropriate endometrial maturation before pregnancy may contribute to reproductive disorders (PMID: 32521725). In addition, the role of defective decidualization in the origin of sPE was recently described (PMID: 33007270). It is worth highlighting recent work demonstrating a connection between decidual immaturity and shallow extravillous trophoblast invasion in preeclampsia (PMID: 25421975; PMID: 31356122). These studies identified DEGs in decidual tissue obtained in a chorionic villous sampling from women at ~11.5 gestational weeks who developed sPE symptoms 6 months later compared with normal pregnancies. Forty percent of these DEGs overlapped with DEGs associated with various stages of normal endometrial maturation before and after implantation, as identified by other data sets. These results reinforce the concept that a decidualization defect during the secretory phase and early pregnancy preceded the development of preeclampsia. Recently, shared molecular pathways of defective decidualization in preeclampsia and endometrial disorders, such as implantation failure, recurrent miscarriage, and endometriosis, have been further demonstrated.

      In this context, we demonstrated [Garrido-Gómez T. et al. (PMID: 28923940)] that human endometrial stromal cells isolated from women who experienced sPE in a previous pregnancy failed to decidualize in vitro. Additionally, decidua from tissue sections of the maternal–fetal interface at delivery in sPE had transcriptomic alterations, isolated decidual cells failed to redecidualize in culture, and conditioned medium from these cells failed to support cytotrophoblast (CTB) invasion.

      Our findings reinforce a maternal cause for sPE through defective decidualization, and are therefore an important step toward the development of new strategies that will enable early assessment of women’s risk of experiencing sPE and would open the door to possible new therapeutic interventions to treat this enigmatic condition. Ideally, translational efforts could be targeted to noninvasive monitoring of maternal, placental, and fetal dynamics during pregnancy, which was recently proposed by Munchel et al. (PMID: 32611681). Following the reviewer’s recommendation, the conclusion and the title have been qualified regarding preconception and early pregnancy prenatal screening.

    1. Author Response:

      Reviewer #3 (Public Review):

      [...] This work challenges the notion that LOT inputs - inputs responsible for carrying information from the olfactory bulb as well as higher brain regions and thought to be important for odor recognition - onto pyramidal neurons become "hardwired" (LOT synapse stability in adulthood) after the olfactory critical period. Importantly, their data clearly demonstrate that LOT inputs onto distal apical dendrites can undergo LTP when these inputs are co-activated with other LOT inputs capable to generate local NMDA-spikes. These data help to reconcile seemingly conflicting previous findings.

      Weaknesses:

      We thank the reviewer for his important comments. Below are our answers point by point.

      Major issues:

      – Novelty: 1) It is well established that postsynaptic depolarizations given by dendritic spikes (i.e. NMDA-spikes) can trigger LTP in several cell types (i.e. Major et al., 2013; Golding et al, 2012; Remy and Spruston, 2007; Gambino et al., 2014). 2) The finding that LOT inputs onto distal apical layer 2 pyramidal neuron dendrites from PCx do not trigger LTP when global STDP protocols are used corroborates previously published findings [Johenning et al., 2009]. 3) The demonstration that IC inputs trigger LTP via global STDP protocols in proximal distal dendrites also corroborates previous findings [Johenning et al., 2009].

      We agree with the reviewer that NMDA-spike evoked potentiation was described before including by our group (Gordon et al. 2006). The main claim of the manuscript concerns an erroneous dogma with regard to the plasticity capabilities of LOT synapses in PCx. For piriform cortex the dogma in the literature is that LOT inputs do not undergo plasticity changes in adulthood. Thus a main novelty of this manuscript is to describe for the first time that LOT synapses do undergo large and robust LTP with local NMDA-spikes. Such a focal plasticity rule can have important impacts as to how odor information is learnt and represented in pyramidal neurons of piriform cortex which we discuss in the manuscript page 11-12 lines 269-299 (in non-tracked version of the manuscript).

      – The most interesting/puzzling finding is how even applying up to 23 NMDA-spikes at 4Hz at more proximal apical locations via activation of IC synapses completely failed to induce LTP of these IC inputs. However, global STDP can effectively trigger plasticity in these synapses. Unfortunately, the authors didn't explore the mechanisms of why distally located LOT synapses can trigger strong LTP via NMDA-spikes, but those more proximally located IC synapses cannot. These mechanisms should be explored, especially since they challenged the local depolarization and calcium influx properties of LTP induction. An experiment to study the role of active conductances that can explain these puzzling results should be designed, including imaging of local calcium concentration using the same tools as those described in the Methods section (although not presented in the results section) and measurements of local voltage changes using dendritic patch recordings (a technique for which the lab is well known).

      We thank the reviewer for this comment, following which we performed additional experiments to clarify the point. As suggested by reviewer #2, the NMDA-spikes in the proximal apical dendrites that served for the induction protocol in the original version of the manuscript were indeed smaller than the full-blown NMDA-spikes that can be generated at proximal locations (average peak amplitudes of 32.7±4.6 mV and 47.4±5.8 mV and average area under the curve of 3725±461 mVms and 7741±974 mVms for small and full-blown NMDA-spikes, respectively; see also Kumar et al. 2018). The reason we used smaller NMDA-spikes was to avoid initiation of BAPs during the induction period. However, following the reviewer’s comment, we performed further experiments to clarify this point:

      1. We examined the ability of full-blown proximal NMDA-spikes to induce plasticity in IC synapses. A similar NMDA-spike induction protocol (4-10 NMDA-spikes at 4 Hz) also induced potentiation of proximal IC synapses (127 ± 4.39 microns from soma) (Figure 6A-D), but to a smaller extent than in distal LOT synapses, and also to a smaller extent than the potentiation induced by the STDP protocol in these synapses. Post induction, the proximal IC EPSP amplitude was 148.48 ± 4.1% of the control (Figure 6F; p = 0.00204; n = 9; p = 0.0013 for comparison of proximal versus distal NMDA-spike potentiation; p = 0.0127 for comparison with STDP in IC synapses). Between 4 and 10 NMDA-spike repetitions were needed for this potentiation.

      2. To control for the contribution of these BAPs, we repeated the induction protocol, but instead of using NMDA-spikes we paired BAPs with local EPSPs (1 EPSP paired with 3 BAPs at 150 Hz, repeated 5 times at 4 Hz). In this case we did not observe potentiation of these proximal IC synapses (Figure 6G); we thus concluded that NMDA-spikes were crucial for the potentiation of IC synapses with the NMDA-spike protocol.

      3. We measured the local calcium transients in active spines and neighboring shafts following STDP protocol activation (pairing BAPs and EPSPs) compared to local NMDA-spikes in proximal IC. Interestingly, we find that calcium transients evoked by NMDA-spikes, both in shafts and spines, were significantly larger than those evoked by STDP stimulation (Figure 6E and 6H; p<0.0001), even though the degree of potentiation was higher with the STDP protocol than with the NMDA-spike protocol (p=0.0127). These results indicate that the amount of calcium entry per se is not the only variable determining the degree of potentiation. See, for example, Gordon et al. 2006, where we showed that in distal basal dendrites of layer 2-3 neocortical neurons BDNF was a necessary requirement to gate plasticity in addition to calcium entry.

      These new experiments were added to the revised version and are replacing the previous experiments (page 7 lines 153-168 in non-tracked version of the manuscript and new Figure 6).

      – A demonstration that NMDA-spikes can occur in vivo in the apical and basal dendrites of PNs from PCx (i.e. during odor discrimination and plasticity task) would greatly strengthen their in vitro findings indicating that LTP can be triggered in LOT-synapses and IC synapses directed to basal dendrites when driven by NMDA-spikes. This is important since LOT synaptic contacts onto distal tuft dendrites of pyramidal neurons are few (~ 200 total contacts, Miyamichi et al, 2011) and sparse (Davison and Ehlers 2011). Hence, for the reported NMDA-spike-dependent plasticity observed in vitro to be the modus operandi for plasticity and memory formation in vivo the need for a significant amount of synchronously activated LOT inputs directed to >20 clustered spines in the apical dendrites of PNs from PCx would be required according to the presented data. Or at least, provide a more extended discussion on this issue.

      The reviewer raises an important point, following which we have extended the discussion with regard to the probability of NMDA-spikes to occur in-vivo. Our calculations are based on the following estimations:

      1) Typically, pyramidal neurons from layer 2B in adult mice have 10 terminal apical branches (Moreno-Velasquesz et al. 2021).

      2) The size of the LOT band is ~ 100 microns (Bekkers and Suzuki. 2013; Moreno-Velasquesz et al. 2021).

      3) A typical spine density of at least 1 spine/micron would result in at least ~100 spines per single terminal branch at the LOT band. This band is almost exclusively innervated by LOT axons (see our results with baclofen blockade in Kumar et al. 2018, and Bekkers and Suzuki. 2013; Giessel and Datta. 2015). Thus, each terminal branch has ample LOT synapses, given that ~10 synapses are needed to initiate a local NMDA spike. However, a critical question relates to the statistics of LOT activation during natural odor stimulation.

      4) Srinivasan and Stevens (2018) estimated that each piriform neuron receives ~0.64 synapses from one glomerulus. We assumed that 110 glomeruli are activated by a typical odor, which translates to ~70 synapses per neuron (following Moreno-Velasquesz et al. 2021; Srinivasan and Stevens. 2018).

      5) For the probability calculations we assumed that the connectivity of LOT inputs to layer 2B pyramidal neurons is random and independent (Giessel and Datta. 2015; Moreno-Velasquesz et al. 2021). We calculated the probability that at least one terminal dendritic branch will be simultaneously activated by 10 random LOT axons (the estimated number for NMDA-spike initiation) in any given neuron to be 15% (see Figure supplement 3).

      These estimations show that there is a fair chance that NMDA spikes will be initiated in dendrites of layer 2B pyramidal neurons in the piriform cortex following odor stimulation. It should be stressed that these calculations are based on multiple assumptions that were only partially tested experimentally, and thus serve only as proof of principle that dendritic NMDA spikes can serve for odor representation in piriform pyramidal neurons. We agree that ultimately one should validate the occurrence of NMDA-spikes in piriform cortex pyramidal neurons by recording from dendrites in-vivo; however, we feel this is beyond the scope of the present manuscript and will require an extensive experimental effort, which we intend to pursue in the future.

      We have added a discussion on page 12 lines 300-317 (in non-tracked version of the manuscript) along with a new Figure supplement 3 showing a graph of our calculations.
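
      The estimate above can be explored with a short simulation. The following is a minimal sketch only: it assumes that every active LOT synapse lands on one of the terminal branches uniformly and independently, with no requirement for spatial clustering within a branch. The function names and the Monte Carlo approach are ours, not the authors'. The authors' actual calculation is shown in their Figure supplement 3 and may include constraints not described in this response, so the output of this sketch need not match the 15% they report.

```python
import random


def expected_active_synapses(synapses_per_glomerulus, active_glomeruli):
    """Expected number of co-active LOT synapses per neuron (estimation 4)."""
    return synapses_per_glomerulus * active_glomeruli


def prob_any_branch_reaches_threshold(n_active, n_branches, threshold,
                                      n_trials=5000, seed=0):
    """Monte Carlo estimate of P(at least one branch gets >= threshold inputs).

    Assumption (ours, for illustration): each active synapse is placed on one
    of the n_branches terminal branches uniformly and independently.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        counts = [0] * n_branches
        for _ in range(n_active):
            counts[rng.randrange(n_branches)] += 1
        if max(counts) >= threshold:
            hits += 1
    return hits / n_trials


if __name__ == "__main__":
    # 0.64 synapses per glomerulus * 110 active glomeruli = 70.4, rounded to 70
    n_active = round(expected_active_synapses(0.64, 110))
    p = prob_any_branch_reaches_threshold(n_active, n_branches=10, threshold=10)
    print(f"{n_active} active synapses; P(any branch >= 10 inputs) ~ {p:.2f}")
```

      Varying `threshold`, `n_branches`, or the number of active synapses shows how sensitive such an estimate is to each assumption, which is the caveat the authors themselves stress.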

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] However, I also have some concerns about the main predictive model result. Although the parasite invasion/growth phenotypes are arguably simpler than an overall in vivo malaria disease phenotype, the reported 40 - 80% variance explained by the LASSO models strikes me as concerningly optimistic. Notably, the correlation in the growth phenotype for repeated samples from the same individuals (sampled weeks apart) is only rho = 0.34 (and for invasion, it is only 0.05). Given that a trait's repeatability is the upper limit to its heritability, and genetic prediction is based on a trait's heritable component, I do not understand how the trait prediction can be as strong as currently reported. Because the result is so striking, it will be crucial to perform true out-of-sample prediction to evaluate predictive accuracy and generalization error.

      We agree with the reviewer that the high values of variance explained in an earlier version of this work may have reflected overfitting of the LASSO models, even in randomized data. We have now reanalyzed the data in a k-folds cross-validation framework, as described in Essential Revisions. As expected, we observe lower predictive accuracy in smaller test datasets than in larger train datasets. Nonetheless, real data and malaria-associated genes produce models that are significantly more predictive of P. falciparum fitness in test data than expected from permutation or random RBC genes. We note that noise across repeated measurements from the same individuals, taken weeks or months apart, is likely to reflect variation from technical inconsistencies as well as environment-dependent biology.
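
      The logic of comparing cross-validated accuracy against a permutation baseline can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, and a single-predictor least-squares line stands in for the LASSO model so that the example needs only the Python standard library. All function names are hypothetical.

```python
import random


def k_fold_indices(n, k, rng):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def cv_r2(xs, ys, k, rng):
    """Out-of-sample R^2: every point is predicted by a model that never saw it."""
    preds = [0.0] * len(xs)
    for test in k_fold_indices(len(xs), k, rng):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = slope * xs[i] + intercept
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def permutation_p_value(xs, ys, k, n_perm, seed=0):
    """Compare the observed CV R^2 with CV R^2 on label-shuffled data."""
    rng = random.Random(seed)
    observed = cv_r2(xs, ys, k, rng)
    at_least_as_good = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if cv_r2(xs, shuffled, k, rng) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [rng.gauss(0, 1) for _ in range(60)]        # synthetic predictor
    ys = [2 * x + rng.gauss(0, 0.5) for x in xs]     # synthetic phenotype
    r2, p = permutation_p_value(xs, ys, k=5, n_perm=99)
    print(f"CV R^2 = {r2:.2f}, permutation p = {p:.2f}")
```

      Because the permuted phenotypes pass through the same train/test procedure, any optimism from overfitting affects the null distribution as well as the observed score.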

      Assuming out-of-sample prediction holds up, it is interesting that the genotype data add substantially to predictive accuracy even after directly considering RBC phenotypes themselves. As the authors note, this result suggests that the mechanisms through which the genetic effects act are independent of the measured phenotypes. This prediction should be further evaluated (e.g., by assessing genotype-RBC phenotype correlations).

      We agree with the reviewer that some of the observed genetic effects must be mediated through phenotypes that we did not measure, which is quite interesting given the large number of phenotypes that we did measure. Additional phenotypes of interest include quantitative proteomics, transcriptomics, and metabolomics, among others, as addressed in the revised discussion. We plan to evaluate correlations between RBC genotypes and such phenotypes in future work, as this is outside the scope of the current manuscript.

      Finally, although the results suggest no polarization of allele frequencies by European versus African ancestry, this result should be interpreted with caution throughout the manuscript, since it's unlikely that the predictive variants identified by LASSO are in fact causal.

      We agree with the reviewer that, given that our SNPs are likely to be imperfectly linked to the causal SNPs, some marginal signal of ancestry polarization of the causal SNPs could be lost. In the discussion, we agree that the predictive variants identified by LASSO may merely be linked to the true causal variants. However, since linked alleles have correlated frequencies within populations, we think this is unlikely to substantially impact our conclusions about African and European ancestry with regard to small-effect alleles. We discuss how the lack of enrichment for most protective alleles in Africans is also supported by recent GWAS for severe malaria (MalariaGEN, 2019) and patterns of RBC trait variation observed here and in other studies. We provide several possible explanations for this consistent observation, including extensive pleiotropy of small-effect alleles (see Boyle, Li, and Pritchard 2017 and correlations with other phenotypes in Figure 5-Source Data 3).

      Reviewer #2 (Public Review):

      [...] 1. The authors note that there is one family (mother and five children) whose members are not carriers of known genetic loci. Figure 5-figure supplement 4 shows that they have significantly different distributions than other non-carriers with regard to principal components and parasitic invasion and growth rate. My concern is that many of the tests in the manuscript assume independent observations and related individuals violate this assumption. The children should be removed from all analyses to test for the sensitivity of results to this structure in the data.

      We have revised the analysis after excluding the five siblings and verifying that the remaining donors are unrelated.

      2. This is also related to the increase in % variance explained in their lasso models when including genetics. It would be useful to know how much of the outcome variation was from the inclusion of the principal components specifically (capturing the family) versus the variants of interest.

      In the prior analysis, the PCs specific to the family explained up to 24% of the variation in invasion and 3% in growth in non-carriers. In the current analysis with the children excluded, PCs no longer have predictive power for growth or invasion. This change reflects the genetic uniqueness of the family, which directly produced the prior associations.

      3. It would be helpful to know some more about the variants that were included from exome sequencing. This would include their allele and genotype frequencies, as well as the comparison with reference population frequencies.

      We have added Figure 1-source data 1, which contains this information for ~160,000 exome variants that passed our quality filters.

      4. Are the frequencies of known RBC disease alleles consistent with population estimates? It would be useful to assess the representativeness of the sample.

      This information is now provided in Figure 1-source data 1. The frequencies of RBC disease alleles in our sample of African and admixed individuals are consistent with estimates from African populations.

      5. I would appreciate knowing a bit more about the difference between the two strains, one lab adapted and one clinical. Is it known how the lab strain was adapted or how representative it is of circulating strains? If so, it may be worth describing in the discussion to explain the differences in results between the strains.

      We have added more details on the two divergent strains to the results and methods. We also discuss the strong correlations between the strains, including for specific phenotypes and genotypes, which suggest that our results may be generalizable. Finally, we note the interesting differences between the strains for African ancestry and HbAC carriers.

    1. Author Response:

      Reviewer #2 (Public Review):

      [...] A potential weakness of this study could be that the tagged beaked whales were feeding in an uncontrolled setting. Relying on wild animals alone would limit the conclusions, as it is generally thought that predators use a feed-forward control system to anticipate prey movements and strike just as prey respond. Therefore, to bolster their investigation, the scientists conducted experimental trials on the trained porpoises in which the experimenters pulled on targets at varying speeds. [...] The very small sample size of this study, with just two animals of each species, could be seen as another limitation. However, it is difficult to work with live cetaceans, and this sort of sample size is not unusual for biologging research. Nonetheless, it would be helpful to know more about the specimens. The data Vance et al. analyzed suggest that control bandwidths scale inversely with body length (e.g., with longer response latencies in larger animals). There was essentially no overlap in response times of the two porpoises, leaving one to wonder if one of the porpoises was notably larger. This sort of information would be useful to include in the paper. Also, potential extension of conclusions on response latencies to a range of other odontocetes, such as large sperm whales, would be useful. [...]

      Thank you for your excellent summary of our manuscript in the public review. We are very gratified to see that the argument and key conclusions of the manuscript were transmitted clearly. The reviewer raises two limitations of the study. The first is due to the lack of control over predator motion during wild prey captures, in particular striking movements at prey. There is an inevitable trade-off between experimental control and ecological realism which we addressed in the study by contrasting data from 'natural experiments', involving wild animals, and controlled trials with trained animals. This also led us to focus on latency in click-rate adjustment as this may be more directly related to prey/target motion than is predator strike behaviour. We understand that the reviewer is satisfied with this solution.

      The second limitation raised by the reviewer pertains to sample size. It is certainly difficult to collect high resolution biologging data from wild cetaceans, as the reviewer recognizes, leading to low sample sizes. However, the reviewer misstates the sample size of wild animals used in the manuscript: while we did work with only 2 captive harbour porpoises, in the wild studies we had a sample size of 6 harbour porpoises and 8 beaked whales (not two of each species as stated in the review).

      The reviewer also asks if our conclusions could be extended to consider other odontocetes (e.g., sperm whales). This is an excellent point because the large distance between the brain and biosonar sound source in male sperm whales creates an additional source of latency. We have added mention of this to the revised manuscript.

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors carried out a post-hoc analysis of a protective gene expression signature previously observed in preclinical trials and clinical trials (RV144 and HVTN505) to identify a possible correlate of reduced risk of infection and to determine whether it can provide a potential mechanism for protection. The monocyte signature they focus on was absent in the DNA/rAd5 human vaccine trial, which did not show efficacy, and was enriched in the partially effective RV144 human trial, where the vaccine was an ALVAC/protein vaccine. Here they indicate that the signature is a correlate of reduced risk of infection.

      Identifying signatures of protection is an important issue in the development of a HIV vaccine, and signature analyses might be important to reveal a few markers that might be selected to evaluate vaccine trials. However, this analysis must be able to point to very few genes, as single cell analyses are not an option in a large clinical vaccine trial.

      We agree that scRNA-seq might not be applicable to assessing large-scale clinical trials. However, it is useful for identifying the cellular lineage of the signal, which we were previously unable to determine from bulk gene expression datasets (see discussion section, line 324). The identified signature has 200 genes, a number for which methods for economical screening of large sample numbers are increasingly available, for example as we did previously on the Fluidigm BioMark platform (Ehrenberg et al. 2019).

      It is unclear whether or how the conclusion of the previous publication by many of the same authors of this paper, including the senior author (Ehrenberg et al., 2019, identification of a gene signature in B cells that is associated with protection from SIV and HIV infection providing a new approach for evaluating future vaccine candidates) is compatible with this new one: signature primarily expressed in myeloid lineage being the one most consistently associated with vaccine efficacy. It is unclear which one of the two is correct or how they are reconciled. Was the single cell analysis done in monocytes only for this paper or simply not reported in the studies of Ehrenberg et al., 2019?

      The protective gene signature was identified initially in microarray data from total PBMCs in the RV144 study, so we did not know the cellular lineage of this signal. Although RV144 samples were depleted, we had the unique opportunity to investigate the cellular lineage of the gene signature in the RV306 trial, a vaccine trial that was performed in Thailand and used the same RV144 vaccine series, with additional boosts after the 4th vaccination. We performed scRNA-seq at the timepoints that were equivalent to the RV144 4th vaccination and concluded that the enriched genes in the signature were mostly expressed in monocytes (discussion section, lines 329-336). The current paper has 3 new datasets: HVTN 505 RNA-seq data, RV306 RNA-seq data and RV306 single cell CITE-seq data. The Ehrenberg et al. 2019 study was primarily focused on the Ad26 vaccine preclinical trials in NHP. This formed the basis of our current findings, which expand to human studies such as RV144, HVTN 505 and RV306. The CITE-seq experiment in the RV306 study was performed in 2020, only after we had confirmed using bulk RNA-seq from blood that the gene signature was associated with increased ADCP in the RV306 study. We have clarified the different studies in supplementary table 1.

      Figure 1: The gene expression score (GES) of this figure does not seem to be for a specific cell type. It is unclear how the GES reported here relates to the final GES of monocytes. What is the utility of this analysis? Can we observe here the same most significant genes that we observe in monocytes? This is important because if bulk analysis gives the same results as looking at monocytes, an eventual marker identified in monocytes could be evaluated in bulk analysis.

      The composite gene expression score (GES) in Figure 1 is computed from only the enriched genes and can be used as a continuous or categorical variable in an immune correlates analysis, as shown in Figure 1 or 2, regardless of phenotype. We see enrichment of a gene set that associates with vaccine protection and ADCP across multiple studies and species, irrespective of the methods used. We think this set of 200 genes has a coordinated expression and may not be specific to a cell type, but might mark a certain biological state, such as response to a cytokine, and may be picked up even in PBMC and blood samples. We clarify this further in the discussion section (line 348).

      Figure 2: it would be good to know whether the subset of the 63 genes can be restricted to the most significant and their GES can still retain the predictive value.

      As suggested by the reviewer, we computed a GES from the subset of the 63 genes in the RV144 signature that were most significantly associated with HIV acquisition (32 genes; p < 0.05, q < 0.1) in Fig 5. We see that the association is slightly stronger and the probability of acquiring HIV-1 is lower in individuals with high GES (OR = 0.35 and p value = 0.0001, compared to the previous OR = 0.37 and p value = 0.0002). Vaccine efficacy in individuals with high GES also increased, from 75.1% to 81.4%. The distribution of AUC and accuracy, plotted after repeating the process 1000 times, showed that the GES of the significant subset of genes is predictive of HIV-1 infection, with an AUC of 0.69 ± 0.08 and an accuracy of 0.81 ± 0.04 (compared to the previous 0.67 ± 0.08 and 0.81 ± 0.04). We agree that this smaller subset could potentially be useful and now include it in the results section (page 11) and as a new supplementary figure 2.

      Figure 3 deals with genes associated with antibody dependent cellular phagocytosis (ADCP). Can one derive a gene or a few genes that are predictive of significant ADCP?

      Thank you for this suggestion; we have been able to explore this and now include new panels in the main figures which identify genes predictive of ADCP. We computed a GES of the 93 genes enriched at the day 3 timepoint that associate with the magnitude of ADCP. A prediction model was built using the 93 genes from the day 3 time point. Internal validation gave an area under the curve (AUC) of 0.80, suggesting that this classifier was able to discriminate high ADCP from low ADCP measured 2 weeks after the last vaccination. This model, consisting of 93 expressed genes, was then tested at the week 2 time point and was also able to predict ADCP as a dichotomous variable at that time point (AUC = 0.73).

      We further examined 82 genes overlapping between the 118 enriched genes from week 2 and the 93 enriched genes from day 3 post the RV144 vaccine regimen that associated with ADCP. A GES was computed using the 82 overlapping genes for both time points. A prediction model was built using the 82 genes from the Day 3 time point. Internal validation has an AUC of 0.81, suggesting that this classifier was also able to discriminate high ADCP from low ADCP measured at week 2 after vaccination. This model, consisting of 82 expressed genes, was then tested at the week 2 time point and was also able to predict ADCP as a dichotomous variable at the week 2 time point (AUC = 0.75). We thank the reviewer for this suggestion and we have now included these data as Figures 3B and 4B.
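
      For readers less familiar with the AUC values quoted above, the following sketch shows one standard way to compute an AUC for a dichotomous outcome: the fraction of (high, low) pairs in which the high-ADCP sample receives the higher score, with ties counted as one half. The scores and labels are made-up illustrative values, not data from the study, and the function name is ours.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for the positive class (e.g. high ADCP), 0 for the negative class.
    Returns the probability that a randomly chosen positive sample scores
    higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for four samples (two low, two high ADCP).
example_scores = [0.10, 0.40, 0.35, 0.80]
example_labels = [0, 0, 1, 1]
print(roc_auc(example_scores, example_labels))  # 0.75
```

      An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation, which gives context for the reported values of 0.73 to 0.81.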

      Reviewer #3 (Public Review):

      Strong points:

      1. This provides a novel mechanism into the RV144-mediated protection of HIV acquisition.

      2. The analyses are robust and statistically sound.

      3. The flow of the paper/figures is easy to follow.

      Weak points:

      1. the RV306 trial (Figure 3 A and B) RNA-SEQ analysis vs ADCP could benefit from a little more information:

      Are the 118 / 93 genes at Wk2 / Day 3 post-vaccination overlapping a lot?

      Per the reviewer’s suggestion, we looked for overlapping genes between week 2 and day 3 post the RV144 immunization series in the RV306 study. There are 82 genes in common between the enriched genes at week 2 and day 3 ADCP data, which are now detailed in Supplementary table 2. The numbers of enriched genes in the pathway at these two timepoints are summarized in the following Venn diagram and are now included in the manuscript as Figure 4A.
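
      The counts behind such a Venn diagram follow directly from the numbers given (118 genes at week 2, 93 at day 3, 82 shared). A minimal sketch, using placeholder gene identifiers rather than the real gene lists from Supplementary table 2:

```python
def venn_counts(set_a, set_b):
    """Return (only in A, shared, only in B) for two sets."""
    shared = set_a & set_b
    return len(set_a - shared), len(shared), len(set_b - shared)


# Placeholder identifiers (hypothetical); only the set sizes come from the text.
shared_genes = {f"shared_{i}" for i in range(82)}
week2_genes = shared_genes | {f"wk2_only_{i}" for i in range(118 - 82)}
day3_genes = shared_genes | {f"d3_only_{i}" for i in range(93 - 82)}

print(venn_counts(week2_genes, day3_genes))  # (36, 82, 11)
```

      That is, 36 genes are enriched only at week 2 and 11 only at day 3, so most of the day 3 signal is retained at week 2.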

      What are those genes? Do they play a known direct role in ADCP or are they upstream regulators?

      All 82 genes are listed in supplementary table 2. There is not a lot of information about genes associated with ADCP specifically from previous publications, but when querying existing databases for genes associated with phagocytosis, we identified four of the 82 genes in the GO:0006909 phagocytosis pathway: SIRPA, SIRPB1, RAB20, and TYROBP. When using GeneMANIA, a gene function prediction tool, to investigate interaction networks of the 82 overlapping genes, we identified 44 additional genes that were connected to the 4 genes previously implicated in phagocytosis. We have included this information as a new supplementary table 4.

      We also show other canonical pathways with gene membership including the Immune System (33 genes), Innate Immune System (23), Signaling by Interleukins (9), Hallmark Inflammatory Response (7), Hallmark TNFA Signaling Via NFKB (7), Cell-Cell Communication (5), Interleukin-10 Signaling (4), Signal Regulatory Protein Family Interactions (3), and Pentose Phosphate Pathway (3). We now include this information in the results and discussion section and hope that this information will clarify the field further (new Figure 4D, new Supplementary table 3).

      Perhaps a heatmap representation with the ADCP as an annotation track would help unfamiliar readers better understand.

      We have now included a heatmap that shows gene expression of the 82 genes at both timepoints, with ADCP group status annotated (Fig. 4C). The list of the 82 genes is also available in Supplementary Table 2 – (“Yes” for enrichment in the “RV306 ADCP day3” and “RV306 ADCP wk2” columns).

      2. I would nuance that ADCP is "A" primary mechanism, not "THE" (title). There could be more potent unidentified mechanisms, so the usage of "THE" in the title is in my opinion premature.

      The title has been updated accordingly.

      3. While I agree that it is possible that ADCP is a primary mechanism with the previously identified transcriptomic signature given the evidence, we cannot exclude that the signature in fact represents an upstream regulator of ADCP, inducing a myriad of cascades contributing to vaccine-induced protection. If that were the case, ADCP could be higher in individuals with higher protection without it being directly involved in that protection (more of a collateral effect). Showing an enrichment of ADCP-associated genes from external datasets with the tested gene signature would strengthen at least partly that this is a direct phenomenon. Otherwise, I would nuance the statement and say that ADCP is a likely/potential mechanism of vaccine-induced protection.

      We agree with these nuances and have updated the title and discussion accordingly (Lines 1-2, 314-316).

      4. Observations in Figure 4 are glanced over too quickly in the Results section: this would require a more in-depth description.

      Based on the results from the current revisions, we have updated the previous Figure 4 (now Figure 5) to provide an in-depth description of gene function prediction based on networks. We have used GeneMANIA, an application that can find associated genes or pathways using its functional association data, to examine overlapping enriched genes from the different studies with either infection status or ADCP magnitude. Interestingly, TYROBP, which is associated with phagocytosis, is the gene with the most connections to other genes or pathways. We also performed a clustering analysis to identify highly interconnected sets of genes and pathways from the enrichment results of different studies. We now describe this in the results section (lines 209-219), updated Figures 5A-B and discussion section (lines 299-319).

      5. It is not clear whether the expression level per monocyte for the subset of genes tested in the CITE-seq data is different in patients with higher ADCP vs those with lower ADCP, or is the differential enrichment the result of a different number of cells that express this signature? Or both?

      For the CITE-seq data, we performed differential expression analyses transcriptome-wide and found that monocytes had a higher frequency of differentially expressed genes when comparing high versus low ADCP (Fig 5E). This effect was independent of the frequency of the different cell populations with a frequency of >1%. We now include a sentence to clarify our findings (results, last paragraph) and show the frequency differences in the supplementary section (Supplementary Figure 3).

    1. Author Response:

      Reviewer #1:

      This study set out to test whether Eurasian jays take into account the perspective and desire of an observing bird during food caching. The inclusion of multiple cues associated with different mental states is a novel and valuable approach to the field of comparative Theory of Mind and social cognition. The rigorous, elegant design over five experiments aims to pinpoint what mechanisms and cues the jays use when choosing where to cache what food and how much depending on the observer's perspective (caching location out of view or not) and current desire (being pre-fed on a single food type). The reader is guided through complex experimental procedures with clear descriptions of predictions and helpful figures. The manuscript is well written and stays on topic, while also expressing important concerns about the replicability and validity of studies in comparative cognition in general. The results go against many earlier papers by some of the authors, and it is commendable that they set out to replicate their own studies in the first place. This kind of critical assessment of earlier research and presentation of negative results (especially in a replication study of positive results) is certainly welcome after the concerns commonly expressed that this is not done sufficiently often. The results support the conclusion and no inflated interpretations are presented. Other caching corvids can be tested with this method after suitable adjustments, and the general design that includes both perspective and desire of observer has wider uses also in other taxa. The explanation that results did not replicate due to age or experience of the birds is compelling (as is the effect size argument) and would be an interesting avenue for future research if possible. Although it remains an open question why the results differ from those of earlier studies, and what mechanisms the jays employ when caching in the presence of a conspecific, this excellent study will hopefully set the example for many future studies that cover multiple lines of evidence across several experiments to critically examine the validity and replicability of the science of comparative cognition.

      We are grateful to the reviewer for their feedback and share the hope that the present study might foster future research examining the robustness of published findings in the field of comparative cognition.

      Reviewer #2:

      In the present manuscript, Amodio and colleagues investigate whether or not Eurasian jays use cues correlated with the perspective and desires of a potential competitor to shape their caching behavior. The first two studies build on past research to test whether Eurasian jays can integrate cues regarding both the competitor's perspective and desires. The authors fail to find evidence that Eurasian jays integrate these cues to cache undesired food in unobservable locations. Thus, the authors attempt to replicate their own earlier findings showing that Eurasian jays' caching behavior is impacted by these cues when the competitor's perspective and satiety are manipulated in separate tasks. The authors fail to replicate these effects and conclude with a discussion of possible reasons for these failed replications and suggestions for increasing replicability in comparative psychology.

      Strengths:

      The logic of the experiments is very clear. Beginning with the two novel studies, the authors provide convincing justification for why cue integration is an interesting question to investigate in this species. Additionally, the initial null results for Studies 1 and 2 led logically to the replication attempts for the individual effects of competitor perceptual access and desire on caching behavior.

      Notably, several of the authors on this manuscript are also authors on the studies that failed to replicate here. These authors were thus able to discuss the nuances of the past and present experiments in great detail and offer a well-reasoned perspective on why these replication attempts may have failed. This, combined with their broader discussion of the importance of replication in comparative psychology, makes the manuscript an even more impactful contribution to the literature.

      We appreciate the accurate summary and the positive evaluation of our study.

      Weaknesses:

      The authors find null results across the board on all five studies, yet they do not have a section of the discussion that explores what these results would mean for Eurasian jay cognition. (Do the current findings mean that jays cannot, in fact, track the perspectives and desires of conspecifics? Should the current findings prompt a replication attempt of the desire tracking of mates from Ostojic et al., 2014? What other cues might affect this behavior? Etc.) Obviously, the absence of evidence does not mean evidence of absence, but the authors should provide a thorough discussion of the possibility that these replication attempts may not simply be "local failures." Given the preponderance of null results (and the fact that these results could be the first null results on this topic published in a high-impact journal), the authors should at least briefly discuss what it would mean for avian cognition if these replication failures are not just isolated cases but instead indicative of a broader pattern in bird perspective taking capacities.

      We appreciate the concern of the reviewer and have edited and extended the discussion accordingly. In particular, we now clearly specify what we believe our findings mean (and what they do not mean) with regard to the capability of Eurasian jays to respond to the perspective and desires of conspecifics in the caching context (see lines 676-693 of the revised manuscript). Furthermore, we briefly discuss the reliability of the evidence about desire attribution in the food-sharing context in Eurasian jays (see lines 693-702 of the revised manuscript). Finally, we also explore the scenario in which our negative findings would turn out to be indicative of a broader pattern in the field of corvid social cognition (see lines 669-676 and 733-742 of the revised manuscript). In regard to this last point, we are cautious when it comes to generalising from our local failures to more general claims about corvids’ social cognitive abilities, and we explain and discuss the reasons for this caution in the text.

      In the authors' discussion of why these replications may be inconsistent with the rest of the literature on this topic, they cite the subjects' age and life experience as reasons why they may have failed on these replication attempts. However, these suggestions feel under-explored and therefore tenuous. The authors should provide more context for this claim. For example, are there any empirical or anecdotal data supporting the proposal that older Eurasian jays are less motivated to guard their caches (or are there any related social changes experienced by older Eurasian jays)? Similarly, to what extent does the experience of living in the aviary reflect or diverge from the caching and pilfering experiences of wild Eurasian jays? For example, do birds in the aviary frequently cache and pilfer even though they receive a maintenance diet? (Given the role of experience in scrub jay caching behavior suggested by Emery & Clayton, 2001, additional context here would be informative.)

      We thank the reviewer for the encouragement to further address these points. We have expanded our discussion about the possibility that the inconsistencies between our findings and those reported by Legg and Clayton (2014) and Ostojic et al. (2017) might have resulted from age-related processes or prior experience and learning (see lines 605-641 of the revised manuscript). We bolstered this discussion by referencing relevant data in corvids (where available) and in other taxa, to provide more context and support for both potential explanations. In addition, we included further details about the caching behaviour exhibited by the Eurasian jays in our lab (see lines 632-638 of the revised manuscript). We believe these additions have strengthened this component of our discussion.

      Impact:

      This work is a well-thought-out series of studies that performs the valuable function of attempting to replicate previously well-cited findings within the field of comparative cognition. Unfortunately, these findings are null results, but given the uniqueness of this captive population of Eurasian jays, the publication of these findings is of critical importance to our understanding of this phenomenon. Additionally, the format of the paper (two new studies, followed by replications) serves as a nice example for how replication attempts can be integrated into novel investigations of comparative cognition.

      We are grateful to the reviewer for their positive assessment of the impact of our study.

    1. Author Response:

      Reviewer #1:

      This manuscript describes a novel tool for tracking synaptic plasticity at single-synapse resolution with a SEP-tagged GluA1 receptor. The authors rather convincingly demonstrate that this tool does not disturb synaptic physiology or mouse behavior. They also show that this tool can be used to measure the distribution of synaptic weights and its variation during a plasticity protocol in barrel cortex. This tool is useful for more quantitative measurements of synaptic strength in vivo. The main weakness of the method is related to the density of marked synapses, which makes the tracking difficult with only 80% reproducibility, probably due to the resolution limits of 2P microscopy. It could, however, be improved with in vivo superresolution techniques. The other limit of the method is that it is not demonstrated to allow longitudinal studies at single-synapse resolution. The authors do not discuss this issue in detail. It seems feasible with additional markers of the dendrite and spines, but this is not developed in the manuscript. Also, it is not demonstrated to what extent this technique outperforms traditional methods for synaptic weight measurements like spine volume.

      We have included new experiments and analyses to further demonstrate the utility of these tools and expanded discussion of the limits of our approach to automatic synapse detection. We further demonstrate that, by introducing a red cytosolic fluorescent protein, longitudinal imaging of the same synapses across days is achievable with this mouse line. We further discuss how imaging SEP-GluA1 is a more accurate readout of synaptic plasticity and strength than spine volume, since synaptic strength is mediated by receptors rather than spine size. In previous studies, we have shown that imaging sparsely expressed SEP-GluA1 can reveal plasticity at synapses that was not detectable by measuring spine size (Zhang et al., 2015) or reveal larger amplitudes of changes in SEP-GluA1 than spine size (Tan et al., 2020; Roth et al., 2020). The dissociation of spine size and synaptic strength has been reported many times (Lee et al., 2012). For instance, spine number or volume is not changed at all at cerebellar Purkinje cell synapses during LTD (Sdrulla and Linden, 2007) and insulin-induced endocytosis of AMPARs is not accompanied by spine shrinkage (Wang et al., 2007). Thus, spine size, in certain conditions, is not a good indication of synaptic strength. Together, these experiments demonstrate that measuring SEP-GluA1 is a reliable and sensitive readout of synaptic strength and plasticity. We believe that our added data and discussion, as suggested by the reviewers, have improved and strengthened our demonstration of this SEP-GluA1 KI mouse line.

      Reviewer #2:

      Over the last couple of decades, the development of fluorescent transgenic mouse lines (e.g., thy1-GFP) and delivery techniques (e.g., in utero electroporation), as well as the democratization of recording methods in living animals (such as calcium imaging and high-density probes), have strengthened the link between synaptic plasticity and behavior. Nevertheless, these methods are most of the time limited to hundreds of cells at best (very few in the case of patch-clamp recordings), or fail to achieve a clear synaptic resolution without affecting the tight equilibrium of endogenous proteins.

      In this paper, Graves, Roth, Tan, Zhu, Bygrave, Lopez-Ortega et al present an additional high-resolution optical tool to overcome these limitations. They generated a new knock-in mouse line that fluorescently labels all endogenous AMPA receptors (KI SEP-GluA1). In this mouse line, the extracellular N-terminal domain of the GluA1 subunit of AMPAR is tagged with super ecliptic pHluorin (SEP), a pH sensitive variant of GFP that fluoresces at neutral pH (at cell surface) and is quenched at acidic pH (within the cell). This tool thus avoids the use of antibodies or the over-expression of exogenous tagged receptors.

      They perform a set of convincing experiments showing that synaptic transmission, homeostatic and activity-dependent synaptic plasticity in vitro (Figs. 2-3), and behavior (Fig. 4) are not affected in KI SEP-GluA1 as compared to wild-type mice. Despite the obvious quality and viability of this mouse line (this is without doubt an important tool), it is puzzling that, while the level of GluA2 remains unchanged, the global expression of SEP-GluA1 is half that of GluA1 in wild-type mice (Fig. 1). The manuscript would benefit from a clear brain-region-specific comparison between the expression patterns of GluA1 and SEP-GluA1. At the least, the authors should discuss this point, and how this might affect the formation of GluA1/A2 heteromers, the dominant form of AMPAR in pyramidal neurons.

      Then, they provide strong evidence that SEP-GluA1 receptors are mobile in vivo (Fig. 6) and can thus accurately report synaptic plasticity, at least in anesthetized animals (Figs. 7-8). The authors make a point that "this novel SEP-GluA1 knockin mouse is the first tool that enables longitudinal tracking of synaptic plasticity underlying behavior at brain-wide scale with single-synapse resolution". However, given the high density of fluorescent synapses, it remains unclear how effective this mouse line would be in awake mice, and more specifically during behavior, in which movement artifacts could preclude the tracking and registration of the same population of SEP-GluA1-containing synapses over time.

      Finally, they present a new automated analytical tool to detect and register fluorescent synapses (Fig. 7). Although the initiative is important and laudable (it is true that imaging approaches are usually plagued by the lack of user-friendly analysis tools), the method would benefit from a comparison with existing methods.

      We thank the reviewer for their detailed evaluation of our manuscript and appreciate their insightful comments. Addressing the suggestions raised by the reviewer, we have now included new data and expanded our discussion, which we believe has significantly improved our manuscript. As suggested, we now show that SEP-GluA1 expression levels are consistent across brain regions, discuss the ability to detect and track individual synapses in awake mice, and expand the description of our detection algorithm to include comparisons with existing methods.

      Reviewer #3:

      Understanding the distribution of synaptic strength and plasticity in the brain is paramount for understanding neural circuit function underlying behavior. In the manuscript by Graves et al., the authors developed a novel mouse model for optical detection of synaptic strength and plasticity in live brains. Specifically, they modified the native GluA1 AMPAR subunit gene (gria1) in the mouse genome so that GluA1 is tagged at its N-terminus with a pH-sensitive GFP (pHluorin). This sensor is nearly maximally fluorescent when the pH is neutral and quenched in acidic environments and, therefore, preferentially marks AMPARs located on the plasma membrane. Since AMPARs are known to cluster in the postsynaptic density, AMPAR number is the predominant postsynaptic determinant of synaptic strength. Specifically, the trafficking of GluA1 AMPARs is responsible for LTP in CA1 hippocampus. The use of this novel genetic tool raises the possibility of monitoring synaptic strength optically, thus providing a strategy for massively parallel assessment of the distribution of synaptic strength in bulk brain tissue. Even more promising is the use of cranial windows and 2-photon microscopy to assess synaptic strength longitudinally, for example, during learning acquisition.

      The authors performed a comprehensive and rigorous set of control experiments using electrophysiology and behavior experiments. They demonstrated that modified GluA1 acts as native receptors and does not suffer from the shortcomings of overexpression approaches. The authors convincingly demonstrate that the modified receptors generate normal wild-type synaptic physiology and no behavioral alterations. Using glutamate uncaging, they showed that fluorescence changes at synapses were highly correlated with an electrophysiological assessment of synaptic strength and plasticity. Thus the data support claims that synaptic strength and plasticity could be assessed and monitored at unprecedented parallelization.

      Whether SEP-GluA1 can be used to quantify synaptic strength and its changes is uncertain due to an unknown ratio of GluA1/2 versus GluA2/3 receptors, the differential expression of GluA1 in different cell types, and the presence of GluA1-independent plasticities. Another potential shortcoming of the study is the lack of a ground truth demonstration of true synapses in vivo. Given the high-density of synapses, low z-resolution 2P microscopy (> 2 um), and the presence of a significant extrasynaptic pool, a confirmation of their results with superresolution or EM would be essential. Moreover, the lack of cell-type-specific labeling is likely to limit the tool's use for linking behavioral and microcircuit synaptic plasticity. It is possible that control experiments in acute brain slices could circumvent some shortcomings and provide a more quantitative workflow.

      We would like to thank the reviewer for their careful reading and evaluation of our manuscript and for providing valuable comments and recommendations. We have now added new data and expanded our discussion in response to the reviewer’s recommendation and believe that this has improved our manuscript. Among other points, we have added discussion regarding our mouse line’s ability to detect changes in GluA1-containing AMPARs, included measurements of the PSF of our microscope to quantify the detection limit of individual synapses with our approach, and demonstrated how the SEP-GluA1 knockin line can be used for measuring cell-type-specific changes in GluA1.

    1. Author Response:

      Reviewer #2:

      This study investigates the important question of which afferent circuits are responsible for the rapid, presystemic regulation of AVP neurons by food and water. Using a combination of rabies tracing, CRACM, and chemogenetic silencing in conjunction with photometry, the authors conclude that the MnPO transmits presystemic fluid signals whereas food signals are relayed by unidentified neurons in the ARC. The paper presents a collection of useful data investigating the anatomic connectivity between cell types in the LT and elsewhere and tests a broad range of possible inputs that could mediate these presystemic effects. However, there are interpretational problems with the experiments that investigate functional connectivity (which pathways transmit which signals).

      Major points:

      1) The conclusion that the ARC is the source of the presystemic signal that activates AVP neurons in response to food is based on an experiment in which the effect of silencing the VMH and DMH individually on a photometry trace is subtracted from the effect of silencing DMH, VMH and ARC simultaneously. This is problematic given the broad and variable spread of such injections, the difficulty of completely and selectively hitting a single nucleus, and the lack of characterization. It also seems likely that there could be synergistic effects of inhibiting multiple adjacent nuclei. From the data in Figure 7F, it appears that most of the effect is driven by a few animals, which also raises the question of sources of variability.

      We completely agree with the reviewer. However, we want to emphasize that, given its anatomical size and location and its molecular and functional complexity, the ARC is an extremely difficult region to study, especially without knowing the identity of the neuronal population we are targeting. As mentioned in the manuscript (“This approach allowed for effective silencing of the ARC with hM4Di, which is difficult to achieve with restricted injection of cre-independent AAVs into the ARC” lines 418-420), a subtractive approach was the most reasonable and efficient strategy we could take to address the question. However, as mentioned in our response to Essential revisions #4, we have attempted, though without much success, multiple approaches to more specifically identify the circuit.

      2) The effect of silencing DMH/VMH/ARC on food intake is not reported. If these mice eat less or more slowly, this would explain the partial reduction in the presystemic activation of AVP neurons.

      We now provide food intake and latency data for DMH/VMH/ARC silencing experiments (Figure 7-figure supplement 4). Feeding behavior was not affected during the experiment. Therefore, the reduced feeding-related presystemic response in AVP neurons is not indirectly caused by slower or reduced food consumption.

      3) It is not clearly reported to what extent the chemogenetic silencing of the MnPO/OVLT in the mice used in Fig. 2 and 5 reduces the amount of water consumed and how this relates to the dynamics in each animal/trial. This is confounding in two ways. Given that the silencing does not fully block drinking, this implies that the MnPO/OVLT silencing is incomplete (based on Augustine 2018), and thus the negative photometry result in Fig. 5 is hard to interpret. Conversely, if silencing reduces drinking partially, which seems likely, then this behavioral change could account for the reduced presystemic inhibition of AVP neurons. It is hard to see how the direct effect of MnPO on AVP neural dynamics could be separated from its effects on behavior in this experiment. While there is a pre-ingestive response in the AVP neurons (which does not have this confound), in several experiments in Fig. 2 this is approx. 1% dF/F.

      Please refer to our response to Essential revisions #1.

      "It is a valid concern and we now discuss this point in our manuscript (lines 507-512). However, for the following reasons, we do not believe it is likely that reduced presystemic suppression of AVP neurons is mainly driven by behavioral changes. First, pre-ingestive, cue-induced suppression observed prior to any drinking behavior is completely blocked by silencing of MnPO/OVLTVgat neurons (Figure 5). Second, MnPO/OVLT neurons provide direct excitatory and inhibitory synaptic inputs to AVP neurons with high probability connections (~100% and ~50%, respectively, Figure 1). Finally, SON-projecting, putative AVP-regulating MnPO/OVLT neurons show water- related presystemic responses that resemble those seen in AVP neurons (Figure 3). Altogether, these factors strongly support the notion that the reduced presystemic suppression of AVP neurons upon silencing of MnPO/OVLT input is primarily caused by direct reduction of the influence of MnPO/OVLT input onto AVP neurons."

      4) MnPO neurons are heterogeneous in their dynamics, especially the MnPO-GABA neurons, and for this reason ruling out a possible mechanism based on a photometry trace is challenging. For example, compare the interpretation of the photometry recordings of MnPO-Glp1r neurons in (Augustine, 2018) with the results of single cell imaging of the same neurons in (Zimmerman, 2019).

      We completely agree with the reviewer that MnPO neurons are extremely heterogeneous. To address this issue, we specifically recorded the activity of SON-projecting MnPO/OVLT neurons that will certainly include the population that is directly involved in AVP neuron regulation. Please note that the SON is almost exclusively comprised of neuroendocrine AVP and Oxytocin neurons.

      Reviewer #3:

      This manuscript by Kim and colleagues explores the circuit that communicates food- and water-intake-related presystemic regulations to vasopressinergic endocrine output neurons. Although previous work from several labs had observed food and water intake related anticipatory signaling in cell types in several lamina terminalis (LT) nuclei, the functional significance of these remained unexplored. Here, the authors demonstrate that the neural circuits underlying food- and water-related presystemic signals are anatomically dissociable at the level of the vasopressin secreting endocrine output neurons. The authors use viral retrograde tracing to identify candidate anatomical regions that could communicate food and water intake related anticipatory signals to VP neurons in the SON and PVN. They show that excitatory neurons in LT nuclei SFO and MnPO/OVLT and inhibitory neurons in the latter make direct synaptic connections onto VP neurons in the SON and PVN. They also perform chemogenetic silencing experiments to elucidate the functional importance of LT and other brain structures for presystemic VP regulation. They show that MnPO/OVLT is important for water drinking but not food intake related presystemic regulation. Furthermore, the authors survey several brain regions that could provide the food intake-related input to VP neurons, identifying the arcuate nucleus as the likely source. The experiments in this paper are generally rigorous. Addressing the following points should improve the manuscript:

      • In Figure 2, DREADD was used to suppress the activity of the LT. However, the virus construct uses a general promoter, and no data are provided to demonstrate that CNO/DREADD works in this system or these cells. In particular, there is no behavioral effect of CNO inhibition of SFO or MnPO/OVLT. To confirm these negative data, slice ephys or a similar method should be used to confirm the efficiency of chemogenetic manipulation.

      We now provide slice electrophysiology data showing effective silencing of MnPO/OVLT and SFO neurons by CNO/hM4Di (Figure 2-figure supplement 3).

      • Although the authors revealed the anatomic sites relevant for different kinds of presystemic regulation of VP neurons, the causal role of specific cell types in these structures that provide this input remains untested/unclear. The manuscript would be significantly more impactful if this was addressed. Specifically, whether excitatory and inhibitory populations in the MnPO/OVLT indeed mediate the pre- and post-ingestive effects on presystemic VP neuronal activity as suggested by GCaMP imaging should be tested.

      We thank the reviewer for suggesting this experiment. The data is now presented in Figure 5.

      • The authors rule out two cell populations in ARC as a potential source of food-related presystemic effects on VP neurons. They do suspect a specific cell type in ARC (OXTR+ neurons), which should be tested.

      Please refer to our response to Essential revisions #4.

      "We agree that this is an extremely interesting question, and we have ambitiously attempted multiple approaches to identify an ARC neuronal population that provide feeding-related presystemic signal to AVP neurons. A main obstacle in targeting glutamatergic ARC population is that we currently do not have a specific genetic marker for these neurons. We could not use Oxtr-cre mice that were used in our original study because we saw cre expression around the SON that prevented us from using Oxtr- cre;AVP-cre mice to selectively target ARC Oxtr and AVP neurons in the same animal. We also attempted using Vglut2-flp;AVP-cre mice but achieving restricted hM4Di expression in the ARC with viral injection was extremely challenging and we decided that the experiment is too inefficient to be completed “in a timely manner”.

      As we now present in Figure 7-figure supplement 4, DMH/VMH/ARC silencing did not alter feeding behavior. Therefore, the reduced feeding-related presystemic response in AVP neurons is not indirectly caused by slower or reduced food consumption."

    1. Author Response:

      In this manuscript, Avraham et al. report their results of profiling different cell types in DRG from mice with different types of injury. In general, injuries occurring in the PNS (sciatic nerves or dorsal roots) trigger more drastic effects on nearly all DRG cell types, compared to those applied to the CNS (spinal cord), albeit with some exceptions. Among these responding cell populations is a subset of macrophages expressing a satellite glial cell (SGC) marker, and another population of SGCs, although their lineage and role remain unknown. Furthermore, fatty acid biosynthesis and PPARgamma signaling pathways are up-regulated after sciatic nerve injury, but down-regulated after dorsal root injury (again these observations are not verified and the underlying mechanisms are elusive). Application of a PPARgamma agonist is able to elevate axon regeneration after dorsal root injury. In general, this manuscript has a large amount of data from bioinformatics analysis but with limited functional verifications. Thus their biological meaning is less clear.

      Functional verifications were included in most figures. However, to address this point we have performed additional experiments and validations, as detailed below.

      Figure 3: In addition to the immunofluorescence for macrophage and proliferation markers in DRG sections from all injury conditions, we now validated with qPCR the downregulation of selected cytokines and genes involved in antigen processing and presentation and the upregulation of proliferation markers in macrophages following sciatic nerve crush injury (new Figure 3D).

      Figure 4: In addition to the immunofluorescence of DRG sections showing co-expression of the SGC marker FABP7 and the macrophage marker CD68, we added a flow cytometry experiment of genetically labeled SGC (BlbpCreER:Sun1GFP), labeled with 3 different macrophage specific markers to validate the scRNAseq analysis (new Figure 4E). These new results support the notion that a subset of macrophages express glial properties at the protein level.

      In Fig 3, the results imply different signaling involvement in DRG macrophages (cell cycle and DNA replication after SNC and steroid biosynthesis and glycolysis/gluconeogenesis pathways after DRC/SCI). Are these results due to differential resident/infiltrated macrophage in DRG after individual injury types?

      We thank the reviewer for highlighting this point. Previous studies have indicated that the number of macrophages increases in the DRG after peripheral nerve injury but not dorsal root injury (Kwon et al., 2012). This increase in macrophage number after nerve injury results in part from proliferation of macrophages (Leonhard et al., 2002; Yu et al., 2020) and may also include myeloid cell proliferation (Yu et al., 2020) and infiltration of a small number of blood-borne myeloid cells (Kalinski et al., 2020). We quantified all Ki67-expressing cells in our scRNAseq, which demonstrates that the majority of cells proliferating following all injuries are macrophages, and that the number of proliferating macrophages is highest after sciatic nerve injury (Figure 3K,L). Our results suggest that the different signaling responses result largely from proliferation of macrophages after nerve injury. Whether the proliferating macrophages originate from resident macrophages or from the infiltration of monocyte-derived macrophages remains to be determined and is beyond the scope of the current study.

      It is also intriguing to note that all injuries down-regulate genes related to antigen processing and presentation (Fig 3). This seems an interesting observation as macrophages often exhibit pro-inflammatory responses to injury. These results should be verified with independent methods.

      We agree and have performed a qPCR experiment to validate the downregulation of genes regulating antigen processing and presentation associated with class II major histocompatibility complex (MHC II): CD74, H2-Aa and Ctss (new Figure 3D). We also confirmed the downregulation of the cytokines Ccl2, Il1b and Tnf following sciatic nerve crush in qPCR experiments (new Figure 3D).

      In Fig. 4, the authors describe a subset of macrophages expressing glial markers whose numbers become increased after injury. Is it possible that these might be the macrophages with engulfed SGCs? To test this, perhaps the authors could compare the abundance of these type-specific RNAs or use other independent methods (with transgenic mice with GFP-labeled SGCs to see if any GFP signals are in these macrophages).

      We agree that it is important to exclude the possibility that immune glial cells simply result from phagocytosis of satellite glial cells by macrophages. We have performed additional experiments and additional analyses that strongly support the existence of a subset of macrophages with glial properties. We renamed these cells Imoonglia, to reflect their immune properties and their crescent-shaped morphology typical of SGC surrounding sensory neurons. First, we performed a flow cytometry experiment to show that a subset of genetically labeled SGC (BlbpCreER:Sun1GFP) express the specific macrophage markers CD11b, F4/80 and CD45 (new Figure 4E). Second, we included additional analyses showing that these immune glial cells express progenitor cell markers (Dhh, Sox2 and Foxd3), which are not expressed in macrophages (new Figure 4C). Third, we performed a trajectory analysis demonstrating that Imoonglia express a transcriptome that positions them between satellite glial cells and macrophages (new Figure 4D). Fourth, we expanded the methods section to clarify that duplicate cells are filtered out from downstream analysis and also plotted total counts in all cell types (new Figure 4-figure supplement 1B), further excluding the possibility that these cells are satellite glial cells engulfed by macrophages. We believe that these additional experiments and analyses strongly support the characterization of this Imoonglia cell population.

      The results in Fig. 5 suggest that SGCs represent different cell populations. Again, their biological meaning remains unknown. An obvious possibility is these clusters might reflect their different activation states. It might be useful to apply single cell trajectory analysis to assess their relationship.

      We thank the reviewer for this suggestion and have performed trajectory analysis to assess the activation state and the relationship of the different SGC subtypes. The results indicate a trajectory starting from cluster 3, to cluster 2, then cluster 1 and finally cluster 4 (new Figure 5E). This is very interesting in light of our comparison of SGC clusters to astrocytes and Schwann cells, showing that cluster 3 most resembles astrocytes while cluster 4 mostly resembles Schwann cells (Figure 5H,I). The trajectory analysis comparing different cell lineage genes suggests that all SGC subtypes present the same activation state (Figure 5E). The biological function of these different SGC clusters awaits further in-depth investigations that are beyond the scope of the current manuscript.

      Fig. 7, are the regeneration results after PPARa agonist comparable to those after sciatic nerve injury? Such information might provide insights as to its translational potential.

      It has been shown that dorsal root axonal growth occurs at half the rate of peripheral axons (Oblinger and Lasek 1984; Wujek and Lasek, 1983). In the experiment presented in Figure 7E-G, we observed that fenofibrate treatment almost doubled the length of dorsal root axons, suggesting that activating SGC with fenofibrate can increase axon growth. In our previous study, we showed that deleting the enzyme Fasn, which is upstream of PPARα activation, specifically in SGC, decreases axon growth in the sciatic nerve by about half (Avraham et al 2020). Altogether, these findings suggest that the lack of PPARα activation after dorsal root crush contributes to the low regeneration rates of axons in the dorsal root. We have edited the text in the discussion section (p.22) to provide insights into the translational potential of fenofibrate.

      Reviewer #2:

      Avraham et al. applied single cell RNA seq to characterize the sensory neuron microenvironment in dorsal root ganglia after sciatic nerve crush (SNC), dorsal root crush (DRC) and dorsal column transection spinal cord injury (SCI) 3 days after injury. The data revealed differentially expressed genes and pathways in endothelial cells, Schwann cells, macrophages and satellite glial cells (SGCs), etc. among the different injury models, with SNC and DRC co-clustering, and SCI and uninjured control co-clustering for the most part. While a number of cell types are implicated in the differential responses of the microenvironment to injury, the authors focused on the satellite glial cells (SGCs) in functional validation of PPARa signaling in regeneration after DRC using a PPARa agonist (fenofibrate).

      Strengths:

      1) Many strengths: contrasting injury models, scRNA seq, extensive bioinformatics analyses

      2) Many interesting pathways were found to be differentially expressed after different injuries (e.g. Arg1 in macrophages, the Hippo pathways in Schwann cells, etc).

      3) If immune glia cells prove to be a new subtype (of macrophages or SGCs?), it will be a very interesting finding indeed.

      4) It is interesting that PPARa signaling is upregulated in SNC, unchanged in DRC and reduced after SCI.

      5) The study illustrates the value of using single cell RNA seq to dissect the neuron microenvironment in response to injury and neuron extrinsic influences on axon regeneration.

      We thank the reviewer for highlighting the strengths. We agree that our approach highlights the importance of the microenvironment response and the potential extrinsic influence on axon regeneration. We have performed an additional analysis to highlight how the neuron microenvironment is affected by the different injuries. We examined the cell-cell interaction network based on ligand-receptor expression in the different cell types in injury conditions compared to naïve, which is now presented in the new Figure 1-figure supplement 1E. The molecular interactions between cells were identified based on the CellPhoneDB repository (v 2.1.6). In this analysis, nodes (circles) in the figures represent cell clusters identified by Partek, and node size correlates with the relative cell counts in the cluster. Significant cell-cell interactions were predicted by CellPhoneDB and are represented as edges (arrows) in the network. The width and transparency of the edges correlate with the number of interactions defined by CellPhoneDB, and arrows indicate the directionality of ligand/receptor interactions. SNC significantly changed the cell-cell interaction network compared to naïve, and these changes are distinct from those elicited by DRC. SCI had limited influence on cell-cell interactions compared to naïve. This analysis further highlights the distinct neuron microenvironment responses to injuries. Additionally, we provide a detailed resource for the significant receptor-ligand interaction pairs for every cell population in all injury conditions in Figure 1-Source Data 2.
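
      The network encoding described above (node size from relative cell counts, edge width from the number of significant interactions, arrows for direction) can be sketched as follows. This is only an illustration under stated assumptions: the input format, the cluster names in the usage note, and the scaling of node size and edge width are hypothetical, and the code does not reproduce the CellPhoneDB output format or the analysis actually performed.

```python
from collections import Counter


def build_interaction_network(pairs, cell_counts):
    """Summarize significant ligand-receptor pairs as a directed network.

    pairs: iterable of (sender_cluster, receiver_cluster) tuples, one per
        significant ligand-receptor pair (hypothetical input format).
    cell_counts: dict mapping cluster name to number of cells.
    """
    # Node size: fraction of all cells that belong to each cluster.
    total_cells = sum(cell_counts.values())
    nodes = {cluster: n / total_cells for cluster, n in cell_counts.items()}

    # Directed edges: count the significant pairs per (sender, receiver).
    edge_counts = Counter(pairs)
    max_count = max(edge_counts.values()) if edge_counts else 1

    # Edge width: interaction count scaled to the largest count (assumed scaling).
    edges = {
        edge: {"n_interactions": n, "width": n / max_count}
        for edge, n in edge_counts.items()
    }
    return nodes, edges
```

      For example, calling the function with two SGC-to-macrophage pairs and one macrophage-to-SGC pair yields two directed edges, the first drawn twice as wide as the second.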

      Weakness:

      1) The authors have previously shown that PPARa agonist rescues the reduced axon regeneration in fatty acid synthase (Fasn) conditional knockout mice after SNC, so the role of PPARa in SGCs to support regeneration is no longer novel.

      Yes, we agree that we previously unraveled the role of PPARα in SGC after peripheral nerve injury. However, here we show that PPARα is not activated in SGC after injury to the dorsal root, and that fenofibrate can increase axon growth in the dorsal root, which is novel and may have translational potential.

      2) This is not a weakness per se, but the two most interesting findings on immune glia cells and PPARa do not appear to be directly related.

      We agree, but as this is a resource paper that describes how the neuronal microenvironment responds to different injuries, we believe that even if unrelated, our findings are important for the community. We have revised the abstract to better highlight these two findings.

      3) Pharmacological test with fenofibrate does not address the cell type specific role of PPARa, so it cannot be firmly established that PPARa in SGCs is most important for regeneration.

      We agree that this is an important point. We previously addressed the specificity of fenofibrate by demonstrating that PPARα in the DRG is highly enriched in satellite glial cells (Avraham et al, Nat Comm, 2020). We showed in that prior study that PPARα and PPARα target genes are upregulated in SGC but not in neurons after injury. We also showed in an in vitro assay that fenofibrate does not promote growth in pure neuronal cultures, further supporting that PPARα is not expressed in neurons. Immunostaining for PPARα demonstrated that PPARα is expressed in SGC but not neurons, and that neither injury nor fenofibrate treatment led to PPARα expression in neurons. Furthermore, a transcriptional profiling study of sensory neurons at single cell resolution (Renthal et al, Neuron, 2020) confirms that PPARα is not expressed in neurons, neither in naïve conditions nor following sciatic nerve crush injury. In the current study, we performed additional analyses to examine PPARα expression in other cells. First, we plotted all the cells expressing PPARα by cell type and injury condition. This analysis reveals that the majority of PPARα-expressing cells are SGC in every injury condition (Figure 7C). Although macrophages can express PPARα and PPARγ (Rigamonti et al, 2008), we did not detect expression of PPARα in DRG macrophages under naïve or injury conditions (Figure 7C). Second, we present violin plots of PPARα and selected PPAR-specific target genes, which demonstrate higher expression of PPARα in SGC compared to all other cells in the DRG (Figure 7-figure supplement 1C). Given that fenofibrate is a selective activator of PPARα and does not target other PPAR isoforms (Lee CH, Olson P, Evans RM. Minireview: lipid metabolism, metabolic diseases, and peroxisome proliferator-activated receptors. Endocrinology. 2003 Jun;144(6):2201-7), we believe that the pharmacological manipulations presented here sufficiently address the role of PPARα signaling in satellite glial cells.

    1. Author Response:

      Reviewer #2 (Public Review):

      Plants have an amazing diversity of pollen morphologies. This study set out to determine how apertures, the gaps in the pollen exine wall, are specified during pollen development. The authors focused on the macaron (mcr) mutant that was identified in a previous forward genetic screen to have one aperture that extends around the circumference of the pollen grain, instead of the 3 equidistant apertures in normal Arabidopsis pollen. They identify the MCR gene as ELMOD_B, a member of a small gene family with domain similarity to the animal Engulfment and Cell Motility domain (ELMOD) proteins, which have been shown to be non-canonical GTPase Activating Proteins (GAPs). Genetic dosage experiments showed that increasing the expression levels of MCR and the closely related ELMOD_A gene in developing pollen also increases the number of apertures. Combined with epistasis analysis showing that MCR is upstream of INP1, INP2, and D6PKL3 (previously published regulators of aperture development), these data provide good evidence that MCR and ELMOD_A are major regulators of the number, positions, and size of pollen apertures.

      In the second part of the manuscript, the authors present a phylogenetic analysis of ELMOD proteins in plants. They show that the plant ELMOD genes likely have a common ancestor and that Angiosperms have four distinct ELMOD clades. They hypothesize that the A/B clade is necessary for aperture formation in Angiosperm pollen and that many angiosperms have more than one A/B ELMOD gene in order to provide redundancy for pollen development and/or other important functions. While intriguing, a weakness in this part of the study is that they did not test this hypothesis by checking to see if A/B clade ELMOD genes are expressed during pollen development in other Angiosperm lineages.

      We did not directly check the expression of A/B clade ELMOD genes during aperture formation in other angiosperms, but we have retrieved some expression data from public databases, such as the ones for rice and tomato. The microarray expression data from RiceXpro (https://ricexpro.dna.affrc.go.jp/index.html) suggested that LOC_Os02g43590 and LOC_Os04g46079, the A/B clade ELMODs from rice, are expressed in inflorescences and anthers at different developmental stages, including the young stages when apertures develop. Similarly, the expression data from the Tomato Functional Genomics Database (http://ted.bti.cornell.edu/) showed that Solyc10g062200 and Solyc01g089980, the A/B clade ELMODs from tomato, are also expressed in young flower buds.

      In the final experiments, the authors analyzed the predicted GAP domain for amino acids that are highly conserved in Angiosperm ELMODs and that are specific to different clades. They identified a conserved Arginine in the same position of the GAP domain as in animals. This arginine is necessary for GAP function in animals. The authors predicted that this Arginine would also be important for Arabidopsis ELMOD function and mutated this residue to Lys in MCR and ELMOD_A. Neither of these versions could complement the mcr aperture phenotype, confirming their hypothesis. One limitation of this experiment is that it only indicates that the domain might have a similar function to the animal ELMODs but does not directly test whether MCR and ELMOD_A actually have GAP activity.

      The most intriguing data comes in the final figures of the paper, where the authors compare the GAP domains in the different ELMOD clades. Sequence comparisons revealed that the A/B clade ELMODs tend to have a glycine at position 129 within the GAP domain, while clade E ELMODs have a cysteine at this position. They predicted that this amino acid position could be important for diversification of ELMOD functions. elmod-b, c, d, and e mutants did not have aperture phenotypes as single mutants nor in combination with mcr, indicating that they probably do not function in aperture development. However, when ELMOD_E was expressed in developing pollen with the MCR regulatory elements the shape of apertures changed to round instead of elongated furrows. A similar dosage study to the one described previously in the manuscript revealed that high levels of MCR protein could counteract the effects of ELMOD_E on aperture shape. When the ELMOD_E protein was mutated to be more like MCR in the GAP domain (Cys129 changed to Gly and Asn129 changed to Asp), aperture number in an mcr background was increased and some of the apertures were furrowed rather than round. A limitation in this study is that the furrowed and round apertures were counted together, thus missing an opportunity to quantify the effects of the mutated ELMOD_E on aperture shape. While the opposite experiment of changing these residues in MCR to ELMOD_E-like residues was not as striking (aperture number was complemented but they did not become round), these data are exciting because they reveal the power of small amino acid changes in one protein to dramatically change aperture number and phenotypes during pollen development.

      This manuscript will be of broad interest to scientists interested in cell polarity, patterning, and evolution of diverse morphologies. Diversification of clade A/B ELMOD genes could have played a role in generating the wide range of aperture numbers and shapes seen in Angiosperms. A mystery that remains and that can be addressed in future studies is how a protein that is localized throughout the cytoplasm and in the nucleus is able to regulate polarity during aperture formation.

    1. Author Response:

      Reviewer #1:

      Click-Seq represents a novel method of sequencing RNA viruses such as SARS-CoV-2, with evidence of successfully sequencing the SARS-CoV-2 genome and identification of recombinations and variants. This does appear to be a potential advantage that needs a direct comparison with existing methods to be fully convincing.

      Thank you for your time and comments on our manuscript and approach.

      Specific comments:

      1) The actual sensitivity in terms of number of copies would be useful to know and to compare with other methods. Here, cultures are used, not clinical samples, which makes this even more important.

      We now present results from three independent batches of Tiled-ClickSeq libraries of 60 NP swabs obtained through routine diagnostics for COVID19. We compare genome coverage and genome completeness with CT values of these samples. This presents the utility and potential application of the method with different clinical specimens and illustrates that with only 18 cycles of PCR we can obtain high quality data with most samples at a CT < 25.

      2) Is the large difference in coverage across the genome shown in Fig 2B due to methodological issues or to random variation? How would this compare to the coverage variation of the ARTIC protocol with different methods?

      If the reviewer is referring to the high-frequency and regular dips in coverage (which we refer to as ‘saw-teeth’) then this is an expected feature of the stochastic termination of the cDNA by the azido-nucleotides upstream of the tiled-primers. The sharp changes in coverage here are highly comparable to coverage in ARTIC protocols. We provide an equivalent read coverage map in the new SFig 2 when using the ARTIC approach of the same samples presented in Fig 2B.

      If the reviewer is referring to the difference in coverage from different tiled primers (e.g. at nt ~14000), then this is likely an issue with the specific primer used in the ‘v1’ set of primers initially used. The ‘v3’ primers presented in Fig 4A illustrate that these drops in coverage are removed, which indeed is an advantage of our approach that allows for multiple closely spaced tiled-primers in the same RT-PCR reaction. To further illustrate sample-to-sample variability, we now present read coverage using Tiled-ClickSeq v3 primers for 60 clinical isolates at different CT values, which gives an overview of the variation that can be expected across multiple samples with our method.

      Reviewer #2:

      The authors present a novel method of sequencing SARS-CoV-2, arguing it overcomes many limitations of other currently used methods, particularly the ARTIC protocol. Generally the method is interesting and it is encouraging to see that these limitations can be overcome. Although the authors walk through evidence that their method can successfully sequence the SARS-CoV-2 genome and use the data to identify minor variants and recombination events, the manuscript doesn't contain any direct comparisons of their method with the ARTIC protocol. Consequently, the assertions made throughout the paper of reduced bias and increased sensitivity and utility are not supported empirically.

      Thank you for your time and comments on our manuscript. To address these concerns, we have provided substantial new data comparing to ARTIC protocols and applying our methods to study clinical samples, described further below in response to your specific comments.

      Specific comments:

      For instance, in Figure 2, I think it is important to present an equivalent plot to Fig 2A for ARTIC samples with equivalent read depths using both MiSeq and Nanopore. This sequence data could be obtained from the COG-UK data deposited on NCBI SRA, and sub-sampled to match sequence depth between methods.

      Thank you for your comments. We have provided this information in Supplementary Figure 2. Using the ARTIC approach, we sequenced the 12 WRCEVA isolates described in the manuscript and presented in Figure 3. As can be seen, peaks and troughs are observed in the ARTIC data, as is expected and previously reported.

      I specifically wonder if this approach only outperforms ARTIC using Nanopore sequencing, given the frequent drops in coverage observed in the MiSeq data.

      The frequent drops in coverage observed in the MiSeq data in Figure 2 are a symptom of the first primer set we used (v1), which contained only 72 primers. Similar frequent drops in coverage are also observed in the ARTIC approach (e.g. as seen in SFig 2). The v3 primer set that we subsequently developed is presented in Figure 4. As can be seen, the drops in coverage are largely removed. We further illustrate this in the new Supplementary Figure 4, where we provide coverage plots using the v3 primers for 60 clinical samples of SARS-CoV-2 at different CT values. As can be seen, the variability in coverage is greatly reduced.

      An additional point about figure 2: I understand that this figure is based on the depth of a single run, I think readers that are interested in using this method would be interested to know about the run-to-run variability, so I think it would be a valuable addition to this manuscript to show the average read depth (relative to total nucleotides sequenced per sample) across multiple samples with confidence intervals or equivalent to visualize run-to-run variability.

      Thank you for this point. As mentioned above, we present a new Supplementary Figure 4 where we provide coverage plots using the v3 primers for 60 clinical samples of SARS-CoV-2 at different CT values. Run-to-run variability is additionally addressed in Figure 6A where we correlate genome completeness/coverage with CT values across three different NGS library preparations.
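
      The summary requested by the reviewer (average read depth relative to the total nucleotides sequenced per sample, with confidence intervals across samples) could be computed along the following lines. This is a minimal sketch under stated assumptions, not the analysis performed for the manuscript: the per-position depth lists are a hypothetical input format, and the interval uses a normal approximation (z = 1.96), which is a simplification for small numbers of samples.

```python
import statistics


def normalized_depth(depths):
    """Scale per-position read depth by the total depth of the sample."""
    total = sum(depths)
    if total == 0:
        raise ValueError("sample has no coverage")
    return [d / total for d in depths]


def mean_and_ci(samples, z=1.96):
    """Per-position mean normalized depth with an approximate 95% interval.

    samples: list of equal-length lists, one per sample, holding the read
        depth at each genome position (hypothetical input format).
    Returns one (mean, lower, upper) tuple per position.
    """
    normalized = [normalized_depth(s) for s in samples]
    n = len(normalized)
    summary = []
    # zip(*normalized) regroups the values position by position.
    for position_values in zip(*normalized):
        mean = statistics.mean(position_values)
        # Standard error of the mean; undefined for a single sample, so use 0.
        sem = statistics.stdev(position_values) / n ** 0.5 if n > 1 else 0.0
        summary.append((mean, mean - z * sem, mean + z * sem))
    return summary
```

      Normalizing each sample before averaging keeps deeply sequenced samples from dominating the mean, so the interval width reflects differences in coverage shape rather than in sequencing depth.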

      Further, the authors describe previously detecting recombinant RNA molecules in SARS-CoV-2 in another manuscript, and highlight that the method presented in this manuscript can detect recombinant RNA molecules that could be missed using the ARTIC protocol. Were any such RNA sequences observed in these samples, or was there perfect correspondence between the methods?

      As described above, in the revision, we describe the recombination analysis of multiple clinical samples of SARS-CoV-2. We provide an example of a large genome duplication (annotated as 29442^29323) found in multiple clinical samples, but not any cell-culture samples (providing support that these are not sequence artifacts). To our knowledge these have not been observed before. Our previous manuscript (Gribble et al, PLoS Path, 2021) used both random-primed RNAseq and direct RNA sequencing of poly(A)-enriched RNAs, rather than targeted approaches. Neither of these are currently feasible for clinical samples. Given the hundreds of different DVGs observed in our previous studies, it is not possible for there to be perfect correspondence. Nevertheless, the trends and distributions of RNA recombination events are very similar between our previous study and the ones presented here, as described in the manuscript.

      As well, the authors state: "Phylogenetic tree reconstruction using NextStrain (45) placed 10 of the isolates in the A2a clade (Fig 3D). Three of these isolates (WRCEVA_00506, WRCEVA_00510, WRCEVA_00515) were most closely related to European ancestors. Two isolates (WRCEVA_00508, WRCEVA_00513) were Clade B/B1 most closely related to Asian ancestors. Together, these data thus supported a model for multiple independent introductions of SARS-CoV-2 into the USA and subsequently into Galveston, Texas." This analysis seems out of place in the manuscript and not robust enough to support the claims made. How did the authors come to the conclusion that different sequences are of "European" or "Asian" origin? Given the limited amount of genetic variation present in circulating strains prior to March 2020, combined with the wide geographic range over which many genotypes were circulating, clade membership alone is not enough to conclude the geographic origin of a viral isolate.

      Thank you for this comment. We agree that this statement was not properly supported and have simply removed it in the revised manuscript.

      Reviewer #3:

      Strengths. While current NGS method(s), namely the ARTIC protocol, have made phenomenal contributions to resolving the genome of SARS-CoV-2, there is room for improvement. Towards this end, Jaworski and company have devised an alternative approach that utilizes a one-step RT-PCR that combines ClickSeq with tiled amplification of the viral genome. This negates the use of primer pairs, which may encounter problems with amplification of structural variants. The method appears to be straightforward and amenable to sequencing on Illumina and Oxford platforms. The results generated do support the claims of the authors and have the potential to contribute significantly to understanding the evolutionary dynamics of SARS-CoV-2.

      Weaknesses. The main shortcoming of the manuscript in its current form is that the samples used for sequencing as proof of concept were cell-grown viral isolates rather than primary samples. The method described has the potential to provide the field with an alternative for producing high-quality sequence, but unless the work is performed directly on nasopharyngeal swab samples, it may have limited use for public health laboratories, resource-poor environments or laboratories with little expertise in viral isolation, etc. Validation of the method would benefit if the authors compared the quality of the sequence generated with that of the ARTIC protocol using primary samples rather than cell-grown viral isolates. It is difficult to assess whether this method will provide a viable alternative to current state-of-the-art protocols.

      Thank you for your comments and time reviewing our manuscript. To address these concerns, we have provided substantial new data where we apply the Tiled-ClickSeq approach to assay clinical specimens.

      Specific comments.

      The methods should include detailed steps in the construction of the NGS library, such as whether or not cDNA input has an impact in the quality of the data output, coverage etc.

      We have previously published detailed protocols describing how to make ClickSeq libraries emphasizing issues that affect success and quality of the output data. We have emphasized this point in the methods section. Assuming we continue to utilize and improve our design, we will release updates through online freely available resources such as protocols.io.

      To address these questions here: the input RNA (not cDNA) in the RT-PCR step is addressed in Figure 1. All the cDNA generated after RT is used as input in the subsequent steps and the click-reaction. We do believe that the quality of the input RNA in the clinical specimens is very important, however, beyond CT value, we have no viable way of measuring the quantity and quality of the tiny amounts of RNA that we extract from NP swabs.

      While the authors mentioned that equimolar amounts of primers were used, there should be data to demonstrate that this results in even coverage of the whole genome. Figure 2: there is a slight dip in the coverage at around 17000 to 18000 (Figure 2A) on both the Illumina and Oxford runs. Do the authors know if it is due to the primer(s) covering that area and, if so, have they tried to address this by improving the design?

      The dip in the coverage in Fig 2 is resolved by using the v3 primers presented in Figure 4. Additional coverage maps for clinical samples in SFig 3 also demonstrate this. Even coverage over the entire genome can be seen for the low CT value samples, which begins to wane in clinical samples with CT values greater than ~25, as described in the new main text and presented in the new Figure 6A.

      The different colors of the graph (Figure 2B) should be defined in the legend. Is the read depth a representation of both Illumina and Oxford runs - either way, this should be indicated.

      Fixed. Thank you.

    1. Author Response:

      Reviewer #1 (Public Review):

      This work investigated the mechanism of inhibition of SARS-CoV-2 polymerase by multiple nucleotide analogs using a high-throughput, single-molecule, magnetic tweezers platform. There was particular focus on the remdesivir (RDV) because it is the only FDA approved anti-coronavirus drug on the market at the time of this review. The study shows that remdesivir leads the polymerase to undergo a backtrack in which it moves back as much as 30 nucleotides from the last insertion. The results also show that RDV is not a chain terminator, which is consistent with prior work. In addition to RDV, the authors characterized other nucleotide analogs such as ddhCTP, 3'-dCTP, and Sofosbuvir-TP to propose that the location of the modification in the ribose or in the base dictates the catalytic pathway used for incorporation. The authors also propose that the use of magnetic tweezers is essential towards characterizing and discovering therapeutics that target viral polymerases.

      Strengths:

      A strength of the paper is the utilization of magnetic tweezers to characterize the polymerase at the single molecule level. This provides a unique method to capture less common or difficult to observe phenomena such as backtracking. Most bulk ensemble assays would have difficulty detecting these phenomena.

      The characterization of multiple different types of nucleotide analogs to investigate the different mechanisms by which they could inhibit the polymerase is a strength of the paper. The authors elegantly utilize their system to show different pause states and backtracking of the polymerase.

      In general, the paper is well written, and the data is clearly presented.

      The authors thank the Reviewer for the strong appraisal of our work!

      Weakness:

      The experiments performed with the magnetic tweezers appear not to have included the exonuclease domain. This domain would presumably be involved in removing nucleotide analogs that have been inserted and may alter the pause states or backtracking prevalence. For example, does the prevalence of backtracking increase when the exonuclease domain is not present? This is particularly important in regard to the RDV experiments.

      To date, no laboratory has been able to couple the polymerase complex with the proofreading complex. Indeed, we have an entire five-year R01 grant to pursue this objective. As with all proofreading polymerases studied before this one, it is imperative to establish a baseline with the exonuclease-deficient state prior to adding that component. Even before we add the exonuclease, it will be important to add the helicase to determine if it can assist the polymerase with dsRNA, because the polymerase's strand-displacement activity is weak.

      A major claim for this study is the utilization of the magnetic tweezers "experimental paradigm" as being essential to the discovery and development of therapeutics to viral polymerases. In addition the authors state this approach is superior to bulk ensemble studies. This reviewer found these conclusions to be an overstatement and unnecessary. The use of magnetic tweezers is not amenable to all laboratories or an easy technique to implement within the therapeutic drug development. In general, the authors also overstate the power and feasibility of the magnetic tweezers in comparison to bulk ensemble studies. All assays have limitations, and the magnetic tweezers is no different in regards to being purified proteins, an in vitro approach, limitations in regards to feasibility for all users, ability to detect the amount of active protein, and multiple other reasons. This is a minor weakness of the paper that can be easily addressed because it detracts from the novelty of the studies.

      We feel that it is important to avoid an either-or scenario. We apologize for evoking a negative reaction with our statement, as we were only trying to emphasize how illuminating the magnetic-tweezers approach can be. It was not our intention to rule out the need for bulk methods at the bench top or using quench-flow or stopped-flow devices.

      We have edited the text in l.83-87 to convey the following:

      “Magnetic tweezers permit the dynamics of an elongating polymerase/polymerase complex to be monitored in real time and the impact of nucleotide analogues to be monitored in the presence of all four natural nucleotides in their physiological concentration ranges. Here, we present a magnetic tweezers assay to provide insights into the mechanism and efficacy of current and underexplored NAs on the coronavirus polymerase.”

      Reviewer #2 (Public Review):

      This study investigates the impact of remdesivir (RDV) and other nucleotide analogs (NAs), 3'-dATP, 3'-dUTP, 3'-dCTP, Sofosbuvir-TP, ddhCTP, and T-1106-TP, on RNA synthesis by the SARS-CoV-2 polymerase using magnetic tweezers. This technique makes it possible to directly quantify termination of viral synthesis and pausing or stalling of the polymerase, thus defining the effect of these NAs on viral synthesis. The work includes good quality data and nicely establishes an assay to follow the activity of the SARS-CoV-2 RNA-dependent RNA polymerase.

      The authors thank the Reviewer for her/his appreciation of our work!

      However, the basis of the assay and theory was largely presented before by the authors in Ref 22 and 23 (and other references therein).

      The main result here is that RDV incorporation does not prevent complete viral RNA synthesis but causes an increase in pausing and back-tracking. This contrasts with a clear signature of synthesis termination induced by 3'-dATP. The work is complemented with the characterization of other NAs. Although these results are of merit, I do not see this work as presenting a sufficient advance on our current knowledge.

      We acknowledge Reviewer #2's opinion. However, we believe that our work is highly novel and important, as noted by Reviewer #1: “This [utilization of magnetic tweezers] provides a unique method to capture less common or difficult to observe phenomena such as backtracking. Most bulk ensemble assays would have difficulty detecting these phenomena.”

      and Reviewer #3: “Overall, this manuscript constitutes a major advance in our understanding of chain termination in polymerases, and provides deep insights into the mechanism of action of remdesivir, which may contribute to further drug discovery efforts targeting this polymerase.”.

      How these results translate into more physiological conditions at zero force should be addressed.

      We show here that nucleotide analogs are incorporated via specific catalytic pathways (NAB, SNA, VSNA) depending on the nature of their modification (position and type in ribose, base). In the companion paper attached to this submission (https://doi.org/10.1101/2021.03.27.437309, currently in press), we show that the force has no effect on the probability to enter any catalytic pathways, and only affects the kinetics of a large conformational change occurring after chemistry. In conclusion, the force has no effect on nucleotide analog selection, as supported by our evaluation at both 25 and 35 pN. To clarify this, we have added in l.416-421:

      “The present study demonstrates that nucleotide analog selection and incorporation are not force-dependent (Figure 2–figure supplement 3), which further validates the utilization of high-throughput magnetic tweezers to study nucleotide analog mechanism of action. This result is in agreement with our recent SARS-CoV-2 polymerase mechanochemistry paper, where we showed that entry probability in NAB, SNA and VSNA was not force dependent, and that force mainly affected the kinetics of a large conformational change subsequent to chemistry, i.e. after nucleotide selection and incorporation.”

      The rationale of testing other NAs apart from the mere systematic characterization of other compounds is unclear.

      We compared 3’-dATP, a well-known chain terminator, with Remdesivir, which was claimed to be a delayed chain terminator, as both are ATP analogs. We monitored the incorporation of Sofosbuvir, a well-known inhibitor of HCV replication, alongside its 3’-dNTP homologue, i.e. 3’-dUTP. T-1106-TP is a compound that was recently tested against coronaviruses because it has proven efficacy against influenza. ddhCTP is an endogenously produced nucleotide analog and chain terminator, and we compared it to its 3’-dNTP homologue, 3’-dCTP. Furthermore, each of these nucleotide analogs has a modification at a specific position, i.e. either at the ribose or at the base, which helps to understand how the polymerase responds to each modification. We have added this sentence to the Introduction in l.83-84 for clarity:

      “We have therefore compared several analogs of the same natural nucleotide to determine how the nature of the modifications changes selection/mechanism of action.”

      Similarly, I do not see the benefits of adding cell experiments with three compounds and experiments with the nsp14 mutant to address proofreading because they were inconclusive.

      While we acknowledge Reviewer #2's opinion, Reviewer #3 has a different opinion and emphasizes the importance of these results:

      “Interestingly, the ddhCTP didn't actually work in infected cells. However, the authors presented a few theories on why it didn't work and said they plan to follow up to elucidate why it didn't work in cells. I think those results will be very interesting for the larger community working in this area.”

      We share the opinion of Reviewer #3 and have therefore decided to keep these results in the revised manuscript.

      Reviewer #3 (Public Review):

      This manuscript focuses on understanding the mechanism of action of remdesivir in the inhibition of SARS-Cov2 polymerase, using single molecule methods. The findings are highly original, significant and surprising. The approach is highly robust and supported by a range of orthogonal studies. Overall, these findings should help those engaged directly in drug discovery by providing a critical foundational understanding for the action of remdesivir.

      The research described in this manuscript has several findings that significantly impact the broader field of polymerase inhibition. First, the authors were able to show using single molecule methods that remdesivir-TP incorporation leads to polymerase backtrack. This is important because the pause is long enough that an ensemble assay could mistake this backtrack for a termination event. Secondly, the researchers found the effective incorporation of remdesivir-TP was determined by its absolute concentration. This suggests remdesivir-TP and similar nucleotide analogs incorporate via the SNA or VSNA pathway and would be more likely to add to the RNA chain when substrate concentration is low (independent of stoichiometry with the competing native nucleotide). Thirdly, the researchers found the effective incorporation rate of obligatory terminators was affected by the stoichiometry of their competing native nucleotide rather than their absolute concentration. This suggests that obligatory terminators are incorporated via the NAB pathway. The pausing that the researchers observed in the polymerase elongation kinetics has recently been demonstrated by two other groups. However, this study improved upon the assay conditions used by other researchers to recapitulate in vivo conditions and remove bias from kinetics measurements.

      The authors highlighted the issues with remdesivir, tested other nucleotide analogs, and proposed a better alternative based on their assays (ddhCTP). Interestingly, the ddhCTP didn't actually work in infected cells. However, the authors presented a few theories on why it didn't work and said they plan to follow up to elucidate why it didn't work in cells. I think those results will be very interesting for the larger community working in this area. It's clear that the authors made a substantial enough contribution on the mechanism of inhibition of SARS Cov2 polymerase to merit publication in eLife, independent of the work on the "improved" antiviral candidate.

      It would have been useful to clarify for the reader the pharmaceutical import of the putative delayed chain termination (or pausing) relative to actual chemical chain termination. In other words, I'm assuming that in both cases the viral genome is considered to be non-transcribed (in that a chemical agent has been incorporated into the growing strand). This is true for most compounds in this broad class of anti-virals. The issues are usually surrounding the width of the therapeutic index and the degree to which resistant mutants arise.

      Coronaviruses are unique among positive-strand RNA viruses in that they encode a proofreading exonuclease. Although it is unclear how the polymerase and exonuclease activities are coordinated, the current assumption is that errors are recognized when located at the terminus of nascent RNA. Therefore, nucleotide analogues which manifest their antiviral activity when embedded in nascent RNA should evade excision by the exonuclease.

      We have added text conveying this sentiment here in l.70:

      “The latter proofreads the terminus of the nascent RNA following synthesis by the polymerase and associated factors, a unique feature of coronaviruses relative to all other families of RNA viruses.”

      And in lines 75-77:

      “In other words, nsp14 adds another selection pressure on NAs: not only must they be efficiently incorporated by nsp12, they must also evade detection and excision by nsp14.”

      Overall, this manuscript constitutes a major advance in our understanding of chain termination in polymerases, and provides deep insights into the mechanism of action of remdesivir, which may contribute to further drug discovery efforts targeting this polymerase. Additionally, the authors have highlighted and addressed issues in the methodologies of previous mechanistic studies that led others to erroneous conclusions.

      We thank Reviewer #3 for her/his strong appraisal of our work.

    1. Author Response:

      Reviewer #2 (Public Review):

      [...] Despite the success in the primary purpose of the model, we note concerning issues that we recommend the authors address.

      There is a recurrent lack of clarity in many sections of text and how the authors' claims are supported by the evidence shown. Though we were able to fully understand the study, interpretation was needlessly difficult at times. Outstanding, but not exhaustive, examples are listed below:

      1. Overstatement of capabilities of this model that are weakly or not supported by the study. At various points throughout the article, the authors speak of the applicability of their model to nuanced conditions that we believe is either only indirectly supported, or not supported at all by their evidence.

      a. On line 62 the authors claim that this model uses non-equilibrium thermodynamics to capture the diffusion across the droplet interface. This implies that the model would be applicable to dynamic processes in which detailed balance is not preserved. While exchange of photobleached and unphotobleached fluorescently labelled components is a dynamic process, the authors explicitly assume that the volume fraction of condensate components within a droplet (Φtot) remains either at equilibrium or quasi-equilibrium when building their model.

      We agree that in our manuscript we focus on the case where the total volume fraction (composed of bleached and unbleached molecules) is at thermodynamic equilibrium, so that ∂Φtot/∂t = 0. Please note that our theory can also be applied to non-equilibrium situations, i.e., in the presence of fluxes. See, for example, Eq. (6) in Bo et al. (reference in manuscript), where we use this coarse-grained theory to derive a single molecule description also away from thermodynamic equilibrium.

      Based on the reviewer’s remark, we have revised the paragraph around Eq. (2) and now stress more clearly that we focus on the cases where detailed balance holds.

      b. On line 272, the authors claim that their model is applicable to non-spherical droplets, referencing Fig. 3 as evidence. However, Fig. 3 and the accompanying text sections starting on lines 163 and 188 describe effects of different environments on an explicitly spherical droplet. In particular, the distance to a coverslip (h) and between neighbouring droplets (d) never drops below the spherical droplet radius (r). We believe this data would constitute evidence of the model's applicability to a non-spherical droplet. Another concern is that the dynamic boundary condition could be dependent on the ratio between the bleaching spot radius and the condensate radius. Thus the authors should discuss the applicability of their theory when Rbleaching spot << Rcondensate and when Rbleaching spot >> Rcondensate.

      We were referring to the theoretical derivation of Eq. (6), which is general and does not depend on the specific droplet shape or boundaries. We revised the paragraph to improve clarity. While the fitting and imaging procedure would become more involved, the theory is still applicable to a non-spherical scenario. Regarding the bleach spot radius, we now mention the required bleach geometry in the first results section, to indicate that a full bleach is necessary if this data analysis framework is to be applied. Please note that a larger bleaching area does not alter the procedure outlined in Fig. 1, while bleaching less than the full droplet is not allowed within the framework of Eq. (1), because it breaks the spherical symmetry of the fluorescent protein distribution within the droplet. It would be allowed if we were to drop spherical symmetry in the data analysis.

      c. Having claimed that not only Din, but also Dout and P can be determined, in principle, from analyzing a single FRAP experiment, it is unclear why they do not show this capability using the experimental data they have. It is especially obscure because the cost function and how it is calculated is not described at all.

      We now describe both cost functions in the flow charts of figures 1 and 4. We have also made a clearer distinction between the theoretically possible determination of all parameters and the experimental feasibility, including a new section heading. Beyond effects that are unaccounted for in our theory, such as a potential interfacial resistance, current experiments are hampered by several further effects, such as the presence of a coverslip (which makes P appear artificially large), non-uniform bleaching in the bulk, and imaging artefacts at the droplet boundary. This means that the global minimum which we find in the cost function when comparing experimental data to our model is not reliable and cannot be used to extract both P and D_out given current limitations.

      d. In the Discussion section, the authors claimed that their model can be generally applied to study the diffusive properties of biomolecular condensates. However, recent literature (e.g. Biophys. J. 2019, 117, 1285-1300, Nature 2017, 547, 241-245, etc.) reported that the diffusion at the biomolecular condensate interface cannot be treated with local equilibrium due to interfacial resistance. This work did not take these interfacial effects into consideration, and the authors should explain if they expect these phenomenological effects to hamper the application of their theory.

      We have now included a brief discussion of a possible interfacial resistance. While there has been a discussion in the field about this possibility, this has not been shown conclusively to the best of our knowledge. Strom et al. measure diffusivity across the boundary. However, it is unclear what the resolution of their FCS derivative is, and whether these effects could potentially be explained by correlated movement near the boundary. This correlated movement is expected due to a molecule’s tendency to stay inside a droplet, which is expected even without an interfacial resistance. The only works we are aware of (now brought to our attention by Reviewer #1) that investigate resistance at the boundary are the papers by Hahn et al. and Gebhard et al., which initially found no evidence of a mass transfer resistance (J. Phys.: Condens. Matter 23 (2011) 184107 (8pp)). Later some evidence was found for a mass transfer resistance for proteins (Soft Matter, 2021, 17, 3929). However, these papers investigate a three-component system, where the investigated molecules adsorb to a liquid-liquid interface, which we find no evidence for. We thus remain cautious about the potential role of an interfacial resistance in our simple two-component set-up.

    1. Author Response:

      Evaluation Summary:

      This manuscript describes an analysis of cell type-specific alternative splicing using 10x scRNA-seq data. This work shows that in spite of the challenges associated with the analysis of such datasets, it is possible to identify alternative exons with differential splicing between tissue compartments and to some extent reveal cell types by splicing profiles of single cells. This work is informative regarding what can be done to analyse alternative splicing using 10X data and fills in a gap in the field.

      Thank you very much for this thoughtful distillation of the contributions of this paper; we are grateful that you find this work to be useful in filling the gap of how splicing analysis can be performed on 10X data.

      Reviewer #1 (Public Review):

      Olivieri, Julia Eve et al., applied their novel statistical approach, the SpliZ (detailed in a separate manuscript but it's very difficult to judge the approach since we do not really have access to it) to high-throughput single-cell RNA-seq datasets collected using the 10x platform to discover novel insights into cell-level heterogeneity of alternative splicing.

      We understand that the SpliZ paper not being published makes it more difficult to review this manuscript. It is currently in review at Nature Methods, where it is being re-evaluated by reviewers after an invitation for resubmission. We are happy to share the comments from the reviewers on that manuscript if it would help you make a decision. We have also added a thorough explanation of the SpliZ to the methods in the section called “Explanation of the SpliZ method” on page 12 and added several more sentences of explanation to the main text at the beginning of page 3: “A large negative (resp. positive) SpliZ score for a gene in a cell means that the cell has shorter (resp. longer) introns than average for that gene. In the simplest exon skipping case, the SpliZ reduces to PSI.”
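
      For readers unfamiliar with PSI (percent spliced in), the simplest exon-skipping case mentioned in the quoted text can be illustrated with a minimal sketch. This is not the SpliZ implementation: the function name and read counts are hypothetical, and PSI is assumed here in its common form, the number of reads supporting exon inclusion divided by the total of inclusion and exclusion reads.

```python
def psi(inclusion_reads: int, exclusion_reads: int) -> float:
    """Percent spliced in for one exon in one cell (illustrative only).

    Assumes the common definition: inclusion / (inclusion + exclusion).
    Both arguments are hypothetical junction read counts.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        # No junction reads cover this exon, so PSI is undefined.
        raise ValueError("PSI is undefined without junction reads")
    return inclusion_reads / total


# Hypothetical cell: 30 reads include the exon, 10 reads skip it.
example_psi = psi(30, 10)
```

      A cell with no reads spanning the relevant junctions has no defined PSI, which is one reason low-coverage droplet data yields fewer measurable genes than full-length data.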

      Previous works in the field of single-cell alternative splicing have relied on single-cell technologies that profile a much lower number of cells. The authors validate their findings using the experimental approaches of single-cell PCR and RNA FISH, and validate that their findings can also be found using Smart-seq2 data, on which the gold standard approaches for single-cell alternative splicing analysis have been developed. They demonstrate conservation analysis of single-cell alternative splicing events across species, examples of genes that are spliced in cellular compartment specific and cell-type specific patterns, cell-type specific alternative splicing changes correlated with pseudotime in spermatogenesis, and, importantly, the discovery of new cell-type subpopulations that are defined by splicing changes but are indistinguishable based on gene expression. They show that the SpliZ score correlates well across replicates on a tissue level and cell-type level, which indicates the robustness of the method.

      We are glad that you find the analyses of these datasets biologically important, robust, and informative.

      The conclusions of the paper are reasonably well supported by the data, and the authors have sufficiently proven that their approach allows for the discovery of novel biological phenomena. The authors provide examples in which the key questions that can be addressed with a single-cell splicing technology are investigated. An important question in the field of cellular heterogeneity is whether or not novel cell populations can be detected by clustering based on splicing events that cannot be detected based on gene expression. The authors convincingly demonstrate that subpopulations of the blood classical monocyte cell type can be distinguished by a single splicing event captured by their approach that do not separate by gene expression.

      Thank you for highlighting that our conclusions are reasonably well supported by the data, and that we have discovered new biological phenomena including identification of subpopulations differentiated by splicing.

      Overall this paper reports some novel biological discoveries. The weakness and limitations of the method should be elaborated to guide future usage. When introducing a new technology, it is important for researchers utilizing these findings to be aware of the known limits.

      We agree that a thorough understanding of the weaknesses of a method is important for readers considering using the method themselves. We have now added the following paragraph on page 9 to more clearly outline these weaknesses: “Although the SpliZ method enables biological discovery of splicing differences based on droplet-based sequencing data, droplet-based data still presents major challenges for splicing analysis compared to full-length data. In this study, droplet-based sequencing has much lower sequencing coverage than full-length data, resulting in only 1,416 genes with measurable SpliZ values in the first human individual based on 10X data compared to 9,802 genes with measurable SpliZ values in Smart-Seq2 data. Additionally, current droplet-based data is 3-prime-biased, meaning that some splicing events will never be sequenced by the technology and therefore cannot be analyzed. Despite these challenges, the ubiquity of droplet-based data, its utility for profiling rare cell types, and its unprecedented scale make it a powerful approach to discover regulated splicing.”

      Furthermore, evaluation of alternative splicing conservation on a transcriptome-wide scale and reproducibility of splicing change detected on a single cell level are not demonstrated, and could further strengthen the arguments claimed by the authors.

      Thank you for pointing out that a comparative analysis between organisms on a transcriptome-wide scale and at a cell-type level would improve the paper. A complete, rigorous analysis was limited by the fact that full maps between cell types of the three organisms were not complete, leaving many cell types in human without a corresponding cell type in mouse and/or mouse lemur, and that some of the gene orthologs have not been identified between the organisms. This motivated our decision to use the mouse and mouse lemur data to validate specific biological discoveries rather than perform global analyses. We anticipate that all of these difficulties will ease over time, and we hope to incorporate more thorough comparisons in future work.

      Reviewer #2 (Public Review):

      This manuscript from Salzman and colleagues described interesting attempts to study cell type-specific alternative splicing using 10x scRNA-seq data. Given the strong 3' bias, analysis of splicing using such a dataset is in general challenging. This work provided evidence that alternative exons with differential splicing between tissue compartments can be identified, and cell types can be revealed by splicing profiles of single cells, to some extent. This work is informative regarding what can be done for alternative splicing using 10X data and filled in a gap in the field in this regard.

      Thank you very much for this thoughtful distillation of the contributions of this paper; we are grateful that you find this work to be useful in filling the gap of how splicing analysis can be performed on 10X data.

    1. Author Response:

      Evaluation Summary:

      This study proposes the identification of "bivalent chromatin" in genes associated with the biosynthesis of secondary metabolites in Arabidopsis and describes an investigation into the role of chromatin states in the regulation of the major Arabidopsis phytoalexin. Perturbation of either H3K27me3 or H3K18ac levels using mutants were used to show that there were effects on the expression of these metabolic genes. It has previously been shown that H3K27me3 and H3K18ac colocalize in the Arabidopsis genome and that genes targeted by PRC2/H3K27me3 in Arabidopsis are enriched for genes that respond to the environment and/or developmental cues. Therefore, the reported changes to the regulation of these genes in defective mutants are as expected, although the finding of this study will still be of interest to those working on pathogen-induced changes to plant metabolism.

      Response: In our paper, we put forward a new model for the role of a bivalent chromatin formed by H3K18ac-H3K27me3, a form that has not yet been studied in the bivalent chromatin literature, in regulating the timing of a defense compound synthesis pathway in a whole organism. As far as we know, bivalent chromatin has not been associated with controlling metabolism in any system. We tested our model genetically using several mutant lines that affect both the activating and repressing marks. We show that both H3K27me3 and H3K18ac are required to maintain timely induction of camalexin genes upon a stress signal using ChIP, genetics, and biochemistry, which is a novel molecular mechanism and different from PRC2 complex mediated repression of a bivalent chromatin. This is the first time that a clear function of a bivalent chromatin has been demonstrated in vivo and at the organismal level in any system.

      Reviewer #1 (Public Review):

      The series of experiments to identify and examine changes in chromatin marks in response to flg22 treatments investigate clearly defined hypotheses. They first present data showing that genes in pathways involved in specialized metabolism are more likely to be associated with both repression (H3K27me3) and activation (H3K18ac) marks than expected by chance in the genome. Using antibodies against H3K27me3 and H3K18ac in pull-down experiments, they show that H3K18ac and H3K27me3 are co-localized at the camalexin biosynthesis genes. They then show that, in response to flg22, H3K18ac modifications increased and H3K27me3 modifications were depleted. In mutant lines that have defective deposition of H3K27me3, expression of camalexin biosynthesis genes in response to flg22 increased compared to wild type. Together, this progression of experiments provides convincing data of an association between chromatin state and changes in transcription levels. Overall, the model is compelling, though the authors should take care to ensure that the language used is reflective of the association or interplay between chromatin state and transcription factor availability.

      Response: We revised the manuscript to reflect the interplay between chromatin state and transcription factor availability in the Introduction: “The accessibility of target gene regions to transcription factors is determined by the dynamics of chromatin states in eukaryotic cells [16]. Chromatin states are controlled by epigenetic modifications that influence nucleosome accessibility [17]. Epigenetic modifications constitute various covalent decoration of chemical groups to histones and DNA, which are associated with promoting or repressing gene expression by altering chromatin accessibility to transcription factors. For example, trimethylation of lysine 27 of histone 3 (H3K27me3), established by the Polycomb Repressive Complex 2 (PRC2), is associated with repressing gene expression [20]. H3K27me3 represses gene expression by increasing chromatin condensation and limiting the recruitment of transcription factors and other components of the transcriptional machinery. Trimethylation of lysine 4 of histone 3 (H3K4me3) is marked at actively transcribed genes, which activates gene expression by promoting the recruitment of transcription initiation factors to promoters of target genes.”

      What is less clear is the link between chromatin states/transcriptional expression and the abundance of the metabolic pathway products that are required to limit pathogen spread. In this manuscript, Zhao et al. test camalexin content using liquid chromatography-tandem mass spectrometry (LC-MS/MS) following flg22 treatment, finding an initial accumulation, followed by further increases over 6 hours. Plants with reduced H3K27me3 marks started to accumulate camalexin earlier while those with reduced H3K18ac marks accumulated camalexin later. While the altered timings in the mutant lines do support a connection between gene expression dynamics and metabolite accumulation, they do not prove that changes in transcription are wholly responsible for the differences in metabolites. Previously reported Ribo-seq experiments mRNAs from genes, including those for camalexin biosynthesis, suggesting that PTI/RTI-triggered translational regulation has a significant role in changes in expression (Xu et al 2017 Nature 545, 487-490; Yoo et al Molecular Plant 13:1 88-98). This does not need to detract from the main findings of this paper, but the changes in metabolite accumulation should be interpreted with these data in mind and discussed appropriately.

      Response: We thank the reviewer for pointing this out and helping us improve the manuscript by providing a more holistic context of camalexin regulation. We revised the Introduction and Discussion to include the current understanding of post-transcriptional regulation of the camalexin pathway and to interpret the results in a more holistic view. Revisions in the Introduction: “Previous studies revealed complex transcriptional and translational control of camalexin biosynthesis genes. At the transcriptional regulation level, transcription factors from the MYB family, including MYB34, MYB51 and MYB122 promote camalexin biosynthesis in response to P. syringae infection. WRKY33 functions as an activator and directly binds to the promoters of camalexin biosynthesis genes. Besides transcription factors, CALCIUM-DEPENDENT PROTEIN KINASE (CPK)5/6 and MAPK3/6 can phosphorylate WRKY33 to enhance promoter binding and transactivation. At the translational level, ribosome footprinting showed that genes involved in camalexin biosynthesis, CYP79B2 and CYP79B3, also increased translational efficiency under pattern triggered immunity. Despite the rich knowledge of these upstream regulators of the camalexin biosynthetic pathway, it remains unknown how the rapid induction upon a pathogen signal is enabled.”

      In the Discussion, we added: “Our results provide new evidence for how chemical defense mediated by camalexin may be regulated at the epigenetic level. However, we cannot rule out the possibility that other known mechanisms regulating camalexin genes may also affect the transcription kinetics and metabolite accumulation. For example, H3K27me3 affects gene expression by altering chromatin accessibility to transcription factors. Removing this repression mark may create a permissive environment and facilitate transcription factors, such as WRKY33, to bind to promoters of camalexin genes. At the translational level, camalexin biosynthesis genes can alter translational efficiency controlled by a highly enriched messenger RNA consensus sequence, R-motif, during pattern triggered immunity. Additional studies are needed to unravel how different regulatory machineries work together to enable the rapid induction of camalexin genes upon stress signals.”

      Reviewer #2 (Public Review):

      The authors propose that a bivalent chromatin switch exists on genes for the major Arabidopsis phytoalexin and this helps to influence the kinetics of this compound's regulation. This suggests that developmental regulatory designs are also used for defense chemistry. This is an interesting idea and the use of serial ChIP to show that this is not developmentally delineated but in fact occurring on a single promoter at the same time is very interesting. The efforts to extend this to specialized metabolism pathways in general are less clear given some issues of over-counting pathways when a metabolic pathway is truly cyclical and applied to a hierarchical database design. The phenomenon appears limited to fewer pathways than suggested and some statistical analyses are needed to support the claim on this one pathway.

      Response: We appreciate hearing that our study showing a bivalent chromatin regulating defense metabolism is interesting. We also appreciate the point about the apparent redundancy of reactions associated with pathways and how that may affect enrichment analysis. Metabolic pathways are interconnected, and metabolic genes can be mapped to multiple pathways. In addition, more than one variant of a pathway can be represented in a single species database. Finally, some pathways can be represented as a part of a super-pathway. The reviewer's comment prompted us to explore novel strategies to organize interconnected pathways without gene redundancy in our databases, which will be helpful to the community. However, we feel that doing this analysis thoroughly, and possibly overhauling the infrastructure of our database, would be out of the scope of this manuscript. To address the comment about over-counting genes in the pathway enrichment analysis, we reported the number of specialized metabolic genes marked by both H3K27me3 and H3K18ac. We found that 37% of genes (324 out of 887) annotated to specialized metabolism have both marks.

      Regarding the comment about additional statistical analysis on the camalexin pathway, we analyzed the absolute transcript and metabolite levels using two-way ANOVA as the reviewer suggested.

      Reviewer #3 (Public Review):

      This study proposes the identification of "bivalent chromatin" at specialized metabolic gene clusters. Perturbation of either H3K27me3 or H3K18ac levels using mutants were used to show that there were effects on the expression of these metabolic genes. However, it is not novel that H3K27me3 and H3K18ac colocalize in the Arabidopsis genome. This was shown by Luo C et al in 2012 and is referenced by the authors. In fact, Luo C et al also showed these two modifications are colocalized by ChIP-re-ChIP. This current study is presented as if the H3K18ac and H3K27me3 modifications are specific to metabolic genes, but they're not. Instead, it appears H3K18ac is localized to many H3K27me3 genes of which certain specialized metabolic genes are an enriched subset. There are >4000 genes targeted by PRC2/H3K27me3 in Arabidopsis that are enriched for genes that respond to the environment and/or developmental cues. It is therefore not unexpected that specialized metabolites are a subset of this class given they're only expressed in very specific environments. In summary, the results presented in Figure 1, 2 and 4 are essentially already published by Luo C et al, making the results of this study incremental.

      Response: In our paper, we put forward a new model for the role of a bivalent chromatin formed by H3K18ac-H3K27me3, a form that is not yet studied in the bivalent chromatin literature, on regulating the timing of a defense compound synthesis pathway in a whole organism. As far as we know, bivalent chromatin has not been associated with controlling metabolism in any system. Furthermore, we tested our model genetically using several mutant lines that affect both the activating and repressing marks. By using a combination of epigenetic modification mutants, ChIP-re-ChIP, temporal transcriptomics and metabolomics on plants subjected to a pathogen signal, we showed that the bivalent chromatin is required to control the temporal gene expression kinetics of camalexin biosynthetic enzymes as well as the end product. This is the first time that a clear function of a bivalent chromatin has been demonstrated in vivo in any system. Moreover, we show that both H3K27me3 and H3K18ac are required to maintain timely induction of camalexin genes upon a stress signal, which is a novel molecular mechanism and different from PRC2-mediated repression.

      The reviewer thought that we are claiming that H3K18ac and H3K27me3 are specific to metabolic genes. To avoid this confusion, we revised the manuscript to explicitly state that H3K27me3 as well as H3K18ac targets are enriched in specialized metabolism when compared to other domains of metabolism, but not exclusively targeting specialized or metabolic genes per se.

      The paper the reviewer is referring to, Luo et al. (2012) The Plant Journal, is typical of the bivalent chromatin literature in that the authors observe certain patterns, but do not pursue them further. Luo et al. (2012) The Plant Journal reported genome-wide maps of nine histone modifications produced by ChIP-seq together with a strand-specific RNA-seq dataset to profile the epigenome and transcriptome in Arabidopsis thaliana. Combinatorial chromatin patterns were described by 42 major chromatin states with selected states validated using the re-ChIP assay. The major point of their paper is the potential synergistic effect of two repressive marks on natural antisense transcripts. However, because of the broad survey, they did find the H3K27me3-H3K4me3 bivalent chromatin. They also found a high correlation between H3K27me3 and H3K18ac, but do not go into any detail about this observation. Their only description is: “Intriguingly, a strong correlation was detected between H3K18Ac and H3K27me3 in the Arabidopsis genome (Pearson r = 0.44, Figure 1b and Figure S4), although histone acetylation is not expected to co-localize with a repressive mark such as H3K27me3. The functional relevance of the co-existence of these two marks is unclear at this time.” (Luo et al. (2012) The Plant Journal).

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] An aspect that would greatly strengthen the paper overall is to clarify whether the pyrenoids induced at high CO2 by hyperoxia or hydrogen peroxide participate in the CCM. Do these cells now have high affinities for Ci?

      Response: We found that hydrogen peroxide treated cells do have a higher affinity for Ci.

      Reviewer #2 (Public Review):

      [...] It would be helpful for readers to understand the significance of the work if the authors could put the work into context of ongoing efforts to engineer the pyrenoid into crop plants to increase yield and to highlight the global importance of pyrenoid-mediated algal photosynthesis.

      For a better understanding of their proposed model of H2O2 mediated pyrenoid/CCM induction it would be very helpful if the author added a figure of their CCM induction model to the discussion.

      Response: We have updated the introduction with all the references the reviewer suggested, and also added the following text to better close out our introduction:

      “Engineering the algal CCM into land plants is seen as a key route to improving crop photosynthesis (Fei et al., 2021; Hennacy and Jonikas, 2020; Mackinder, 2018; Meyer et al., 2016; Rae et al., 2017). If the algal pyrenoid CO2 concentration system were engineered into crops such as rice, wheat, or soya, yields could increase by up to 60% (Long et al., 2019); yet photosynthetic improvements are thought to only occur if a complete algal-like CCM is assembled in land plants (Atkinson et al., 2020; Barrett et al., 2021); such ambitions necessitate an understanding of the signals and trade-offs of pyrenoid formation, for which Chlamydomonas is an excellent model system.”

      Reviewer #3 (Public Review):

      [...] It is not quite clear why the authors included growth analyses under mixotrophic conditions and solid media and measured photosystem II efficiency only under these conditions. The results showed faster growth of cells (CC2343) that tend to accumulate fractured pyrenoid starch sheath; however, growth is based on undefined proportions of autotrophy and heterotrophy under these conditions, and changes in metabolism are not well understood or predictable, which - from my point of view - is too confounding for gaining conclusive evidence for hyperoxia tolerance from biomass accumulation. Likewise, the measurement of PSII function, which is a product of many factors concertedly impacting photosynthetic electron transport and correlates with growth only conditionally under autotrophy, is even less informative under mixotrophy. Differences in respiration rates may have to be considered as well as differential partitioning of metabolites in the two different strains, which is outside the scope of this paper.

      Response: We feel these data should be retained to guide future studies, but have moved the mixotrophic growth figure to the Supplemental Materials (SM Figure 7). We have also deleted the line “These observations support that the growth advantage on the TAP plate was related to carboxylation, at least more likely than an aspect of the light reactions.”

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] In terms of weaknesses, I did not see any technical problems. One question that remains for me is how this model would apply in a more diverse gut microbiome. Specifically, do the authors envision single species aggregates in the human gut? How would the model be applied when there are ~1000 species? Do multispecies aggregates form by the same principle? I would expect this to be addressed in future work.

      Reply, regarding multiple species: This is a great question, and one that we’re already starting to get data on! Because this reply will likely be posted online, we won’t describe the preliminary observations.

      Reviewer #2 (Public Review):

      [...] Weaknesses:

      • The authors present data from 8 different bacterial strains but do not investigate how their model explains differences in bacterial aggregate distribution between these strains. This data would provide biological intuition about the specific different strains and their modes of aggregation.

      We have added text to the discussion to clarify this, noting especially plateau differences.

      • The manuscript would gather a broader readership were the model more thoroughly explained. For example, how are the parameters considered? E.g., is growth rate constant across a single aggregate? As written, the model can be difficult to understand conceptually to non-theoretical readers and the concepts would be more accessible if key details were explicitly communicated.

      We have added a paragraph introducing the modeling approach to a more general audience, and clarified the growth rate.

      • Greater discussion and prediction of how effective the model of aggregation may be in guts of different physical or chemical conditions would be valuable towards developing general biophysical principles. For example, is the spatial dependence of fragmentation rate dependent on the type of fluid flow field that exists in the gut?

      This is definitely interesting, generally unknown, and something we wonder about (whether there is a fluid dynamical explanation for the aggregation scaling). We have raised this as an interesting future direction in the Discussion.

      Reviewer #3 (Public Review):

      [...] Overall, the model sufficiently explains the important features of the gut bacterial aggregate size distribution, namely, the initial power law and the final plateau.

      That said, a minor issue is that the initial motivation behind building the model in this way seems somewhat unnecessary. The authors motivated the basis for the model by claiming P(size > n) ∼ n^(-1), using Fig. 2. But the model seems to work for any slope (depending on fragmentation rates etc). So why is the slope of -1 special?

      We have clarified this in the Results, explaining that only the slope of -1 emerges robustly from growth and fragmentation.

      Also, in Fig. 2, since the dashed line is separated from the actual data, it is tricky to visually compare them, and some experimental plots appear to have quite different slopes. It would be helpful if the best fit slope for the small n part is also reported.

      We now include all the slope values in a table.
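
      As an illustration of how a best-fit slope for the small-n regime might be obtained, the following minimal sketch computes the empirical reverse cumulative distribution P(size > n) and fits a straight line in log-log space. The function names, the cutoff `n_max`, and the synthetic data are assumptions made for illustration; the fitting procedure actually used in the manuscript may differ.

```python
import math
import random


def ccdf(sizes):
    """Return (n, P(size > n)) for each distinct size n, skipping P = 0."""
    total = len(sizes)
    ordered = sorted(sizes)
    points = []
    i = 0
    while i < total:
        n = ordered[i]
        j = i
        # Advance past every entry equal to n so ties are counted once.
        while j < total and ordered[j] == n:
            j += 1
        greater = total - j  # number of entries strictly larger than n
        if greater > 0:
            points.append((n, greater / total))
        i = j
    return points


def loglog_slope(points, n_max):
    """Least-squares slope of log P versus log n, restricted to n <= n_max."""
    xs = [math.log(n) for n, p in points if n <= n_max]
    ys = [math.log(p) for n, p in points if n <= n_max]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic sizes drawn so that P(size > n) = 1/n for n >= 1 (illustrative
# only; real cluster sizes are integer cell counts with a plateau at large n).
rng = random.Random(0)
synthetic_sizes = [1.0 / (1.0 - rng.random()) for _ in range(20000)]
print(loglog_slope(ccdf(synthetic_sizes), n_max=50))
```

      Restricting the fit to n below a cutoff keeps the plateau at large sizes from biasing the estimate; the cutoff itself is a judgment call and would need to be reported alongside each slope.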

      Another minor issue: they claim that the decrease in size due to fragmentation is linked to cell division at the surface. However, after the cell divides, if only one daughter leaves the cluster then it shouldn't change the cluster's size (since size is measured in terms of numbers of cells rather than total volume). But if both daughters leave the surface, then what does it have to do with division?

      We have revised the text to clarify what is meant by fragmentation.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The first issue that may be a concern is the idea that MRS measures of Glutamate and GABA are appropriate proxy measures of the excitation-inhibition balance. While this is a not uncommon interpretation of the relative levels of these two neurotransmitters, a recent paper in NeuroImage presents data to refute this assertion (Rideaux et al 2021), especially in the visual cortex. However, the current study by Koolschijn et al. is different from the data reported by Rideaux, in that the current data is looking at the relative change in the Glu/GABA ratio (and the relative difference in each Glu and GABA by themselves) between two conditions, and not the overall "balance" at rest, or during activity. This dynamic nature of the data, and the temporal resolution present, may explain why a relationship is found between the change in ratio (and individual levels) and performance in this study. Either way, the authors could address this recent paper of Rideaux and the challenge it may present for their interpretation of Glu/GABA ratios as a measure of E-I balance.

      Thank you for raising this point and highlighting the findings from Rideaux Neuroimage 2021. As we discuss below, and as noted by the reviewer, the design of our experiment and analysis is different from Rideaux Neuroimage 2021, which may explain the difference in findings.

      The data presented in Rideaux 2021 considers the between-subject variance for measures of Glx (glutamate+glutamine) and GABA, averaged across time. The authors show no evidence for a correlation between GABA and Glx across participants. Using our data, we replicate this result when assessing the relationship between GABA and glutamate, quantified using all spectra (r17=0.19, p=0.433; after regressing out sex and age: r17=0.21 p=0.400). While this result is intriguing, comparing average measures of GABA and glutamate across subjects obscures temporal dynamics of these neurometabolites that may more closely relate to fluctuations in excitation and inhibition reported at the physiological level.

      Here, by using ultra-high field fMRI and fMRS, we designed our study to assess task-dependent temporal dynamics in neurometabolites. Thus, we acquired time-resolved, within-subject measures of glutamate and GABA which we compare across two conditions of interest (‘remembered’ and ‘forgotten’). This condition-dependent approach provides a means to assess subtle, task-specific changes in neurochemistry that cannot be observed when taking a bulk-average. Moreover, compared to taking the bulk-average measures used by Rideaux, our approach inherently controls for: (1) between-subject differences in average GABA and glutamate which are affected by demographic factors (e.g. age and sex); (2) between-subject differences in spectral quality; (3) between-subject differences in tissue composition; (4) between-subject differences in the effect of other neurochemicals on measures of glutamate and GABA. Overall, our time-resolved, within-subject and condition-dependent approach arguably provides a readout for fluctuations in neurometabolites that more closely approximate physiologically relevant shifts in excitation and inhibition.

      However, as noted in the Introduction and Discussion sections of our manuscript, the relationship between fMRS and physiological definitions of EI balance remains complex. The temporal resolution of fMRS remains several orders of magnitude slower than rapid changes in synaptic glutamate and GABA that accompany neurotransmitter release. Moreover, MRS fails to discriminate between different pools of glutamate and GABA and only a fraction reflects neurotransmitter release. Meaningful interpretation of MRS instead derives from the approximately 1:1 relationship between the rate of glutamine-glutamate cycling, which is necessary for glutamate and GABA synthesis, and neuronal oxidative glucose consumption, which indirectly supports neurotransmitter release among other processes (Rothman et al., 2003; Shen et al., 1999; Sibson et al., 1998). We therefore conclude that through careful experimental and analytical design, fMRS can provide a non-invasive marker for physiologically relevant shifts in excitation and inhibition, if indirect and at a coarse spatiotemporal scale. This interpretation nevertheless requires validation, by carefully combining preclinical MRI with invasive methods in animal models in future work.

      It is also worth noting that the relative nature of measures utilised may enhance error propagation between individual measures, increasing the risk of a false positive result. The authors have tried to address this through the use of Monte-Carlo permutation analysis for false errors, and it does go some way to restoring confidence.

      Our analytical approach involves assessing the ratio between glutamate and GABA (‘glu/GABA’) across two conditions (‘remembered’/‘forgotten’). As the reviewer notes, by considering this relative measure, the measured uncertainties in glutamate and GABA propagate through to the uncertainty in the functional relationship of interest (‘glu/GABA remembered’/‘glu/GABA forgotten’). In the revised manuscript we now show the full sampling-error curve for the effect size of the relative measure, where the 95% confidence interval is notably non-overlapping with zero (see newly added panel in Figure 4F showing mean, 95% confidence interval and the sampling-error distribution derived using bootstrapping). Thus, in addition to the Monte-Carlo permutations presented in Figure 4G, by presenting the sampling-error distribution we provide additional evidence to suggest our findings cannot be explained by a false positive.
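
      To make the bootstrapped sampling-error distribution concrete, here is a minimal sketch of a percentile bootstrap for a within-subject ratio-of-ratios effect. It assumes one glu/GABA value per participant and condition, expresses the effect on a log scale so that "no effect" corresponds to zero, and resamples participants with replacement. The function names, the log transform, and the number of resamples are illustrative assumptions rather than the authors' actual pipeline.

```python
import math
import random


def log_ratio_effect(remembered, forgotten):
    """Per-participant log of (glu/GABA remembered) / (glu/GABA forgotten)."""
    return [math.log(r / f) for r, f in zip(remembered, forgotten)]


def bootstrap_ci(values, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean; returns (mean, lower, upper)."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_boot):
        # Resample participants with replacement and recompute the mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(values) / n, lower, upper


# Hypothetical per-participant ratios, invented purely for illustration.
remembered = [1.30, 1.12, 1.25, 1.08, 1.19, 1.33, 1.05, 1.21]
forgotten = [1.18, 1.10, 1.09, 1.11, 1.07, 1.20, 1.02, 1.15]
effect = log_ratio_effect(remembered, forgotten)
print(bootstrap_ci(effect))
```

      Because each resample recomputes the statistic from the per-participant ratios, measurement uncertainty in the individual glutamate and GABA estimates is carried through to the width of the interval; an interval that excludes zero is the condition described in the response.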

      We also note that there are advantages to our approach, which we discuss above and reiterate here. Compared to analyses that consider individual measures, our approach controls for a number of factors that can otherwise introduce random errors. These factors include: (1) between-subject differences in average GABA and glutamate which are affected by demographic factors (e.g. age and sex); (2) between-subject differences in spectral quality; (3) between-subject differences in tissue composition; (4) between-subject differences in the effect of other neurochemicals on measures of glutamate and GABA. Finally, our approach is analogous to standard analytical methods employed for event-related fMRI.

      There are some other potential methodological caveats a reader inexperienced in fMRS (and indeed fMRI) should be aware of:

      • The first is that the MRS sequence utilised is not typically used to measure GABA (a point the authors also note), with GABA typically measured using so called "editing" techniques like MEGA-PRESS. The authors also utilise a non-standard unconstrained fit for GABA in their analysis of the MRS spectrum. While these two non-standard methodologies may weaken confidence in the GABA measures, the authors are to be commended on the use of simulations to demonstrate that even if "absolute" measures of GABA via this methodology may be slightly outside the usual norms, this methodology is able to detect changes in GABA of the size of those detected in this experiment. These simulations, coupled with the Monte-Carlo permutation steps for FWE correction are a strength of the paper, and I encourage readers to fully examine the supplementary material for this paper to get a better appreciation for the quality and validity of the data being presented.

      Thank you for this point and for highlighting that the Monte-Carlo simulations are a strength of the paper. Below we summarise the rationale for using an unedited sequence, and detail the analysis pipeline that we implement to detect dynamic changes in glutamate and GABA.

      Edited and unedited acquisition protocols each have their pros and cons. To test our hypothesis concerning dynamic fluctuations in glutamate and GABA, we considered an unedited sequence to be advantageous for the following two reasons: (1) unedited sequences permit acquisition of metabolite data for both glutamate and GABA within the shortest possible timeframe, thus minimizing motion and drift artefacts; (2) unedited sequences allow acquisition of high-quality spectra from a comparatively small (8 ml) volume of interest (VOI), while an edited method would have required a larger VOI. For these reasons, unedited sequences may be more suitable for event-related fMRS. In the Discussion we note that investigations comparing edited and non-edited sequences at 7T reveal no significant difference in the concentration of GABA measurements (Hong et al., 2019).

      Regarding our analysis pipeline, as noted by the reviewer, we do not implement default assumptions that are typically used to obtain static estimates of GABA, where metabolite values are constrained within a predefined (‘physiologically plausible’) range. Instead, we remove these constraints to optimise the analysis pipeline for detecting dynamic changes in GABA. To demonstrate the sensitivity of this approach, we use Monte Carlo simulations to generate MRS spectra while preserving the observed noise in our data. Our simulations show that the observed difference in GABA between ‘remembered’ and ‘forgotten’ conditions differs significantly from the null distribution that would be expected by chance. Moreover, these simulations show that relative to default settings, our analysis pipeline is more sensitive to detecting dynamic changes in GABA. For readers that may be interested in the more technical motivation behind our approach, we now include a supplementary note in Appendix 1.

      • The exact time locking (or not) of fMRS data acquisition to phases of the stimulus presentation and the subsequent temporal resolution and timeline of changes is not fully explained - and may be somewhat misleading. Given the MRS data were collected in step sizes of 4 secs, it may be hard to understand how a temporal resolution of 2.5 secs for the fMRS data is achieved. Likewise, given that the total Question and ITI time was allowed to vary from stimulus to stimulus, it may be hard to understand how the timeline in figures 4 and 5 are achieved. This could be ameliorated by a better explanation of data collection process in the methods, and how data was averaged to produce the timelines as presented.

      We have now edited the Results and Methods sections to clearly explain the temporal relationship between the fMRS acquisition and the task. We now also include a new Figure 4–figure supplement 1 to illustrate how we estimate a moving average for fMRS data at a higher resolution than the 4 s TR used during acquisition.
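
      One generic way a time course can be estimated on a finer grid than the acquisition TR is sketched below. It assumes that, because trial timing is jittered relative to the 4 s acquisition grid, spectra pooled across trials sample many different latencies relative to event onset, so a sliding window can be stepped in increments smaller than the TR. The window width, step size, and function name are hypothetical; the authors' own procedure is the one illustrated in their Figure 4–figure supplement 1 and may differ.

```python
def sliding_window_average(samples, window, step, t_start, t_end):
    """Average event-locked samples inside a window stepped along time.

    samples: list of (latency_relative_to_event_in_s, value) pooled over trials.
    Returns a list of (window_centre, mean_value or None if the window is empty).
    """
    half = window / 2
    out = []
    k = 0
    while True:
        centre = t_start + k * step
        if centre > t_end + 1e-9:  # small tolerance for float accumulation
            break
        in_window = [v for t, v in samples if abs(t - centre) <= half]
        mean = sum(in_window) / len(in_window) if in_window else None
        out.append((centre, mean))
        k += 1
    return out


# Hypothetical event-locked measurements pooled across trials: each trial is
# sampled every 4 s, but the offset to event onset differs from trial to trial.
pooled = [(0.3, 1.00), (4.3, 1.04), (1.7, 1.01), (5.7, 1.06),
          (2.9, 1.03), (6.9, 1.05), (0.9, 0.99), (4.9, 1.07)]
for centre, mean in sliding_window_average(pooled, window=4.0, step=2.5,
                                           t_start=0.0, t_end=7.5):
    print(centre, mean)
```

      Neighbouring windows in such a scheme overlap, so adjacent points on the resulting timeline are not statistically independent.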

      • Lastly, the fMRI technique being used is not a typical echo-planar sequence, and the data produced are by the authors' own admission impacted in quality. Given that the hippocampus, and other parts of the anterior temporal lobe are difficult to measure at the best of times, the BOLD signal changes from this data may be less robust than normal. As such, while interesting, the correlation between the Hippocampal BOLD signal and the Glu/GABA ratio changes might be considered tentative, and can only benefit from replication at a later date.

      As already noted in the manuscript, the quality of the fMRI component in the fMRI-fMRS sequence was compromised relative to EPI obtained using non-combined contemporary state-of-the-art fMRI sequences. However, we consider the BOLD effects reported in visual cortex and hippocampus to be robust as they replicate equivalent analyses conducted on a dataset previously acquired using a non-combined multiband EPI sequence on the same task (Barron et al., Cell 2020). This point is now noted in the revised Results and Discussion sections of the manuscript. Given the multiband EPI sequence in Barron et al., 2020 did not include MRS measurements, the reported correlation between hippocampal BOLD and glu/GABA ratio during successful vs unsuccessful recall can only be verified in future work.

      My expertise as a reviewer is usually in the methodological aspects of fMRS studies, and not really in the "psychology" or "cognitive neuroscience" aspects of memory recall. However, from my perspective the authors have addressed most major concerns here, and the experimental design presented would seem to be one that can indeed test the processes they are attempting to test. As such I find the information from this study interesting, further supporting the notion that information and memory is stored in the neo-cortex in some way, and not directly in the hippocampus, and that hippocampal activity works to re-instate this information through disinhibition of these circuits. What would be interesting would be to watch the formation of these memories during the training phase, and see if a similar change in the E-I balance occurs. That is, does the Hippocampus "awaken" latent stored information about the associated visual through disinhibition, or is it actually re-instating the E-I balance, and hence the processing state, of the circuits when the stimuli were presented. (I realise it may not actually be able to disentangle these two ideas - and that they may in fact be the same thing.)

      Thank you for this comment. We agree with the reviewer that in future work it would be highly interesting to observe the dynamics of glutamate and GABA during memory formation. Moreover, comparing fMRS data acquired during the learning and test phase of the inference task will likely provide new insight into the mechanisms that support memory recall.

      In all, I found this study methodologically novel, rigorous and sound, and the conclusions and results intriguing and of interest.

      Rideaux, Reuben. 'No Balance between Glutamate+glutamine and GABA+ in Visual or Motor Cortices of the Human Brain: A Magnetic Resonance Spectroscopy Study'. NeuroImage 237 (15 August 2021): 118191. https://doi.org/10.1016/j.neuroimage.2021.118191.

      We would like to thank the reviewer for their complimentary remarks and constructive comments. We are grateful to the reviewer for highlighting the novelty and rigor of our approach. As outlined above, we have addressed your comments in the revised manuscript by including new analyses and substantial changes to the text. We hope you agree that these changes have significantly improved the manuscript.

      Reviewer #2 (Public Review):

      [...] The current study had numerous strengths. The effort to better understand the mechanisms underlying cortical reinstatement in humans is important, although typically constrained by the inferential limitations of BOLD data. Here, the authors test an EI-based account of reinstatement through the application of simultaneous fMRI and fMRS. Both methodologically and conceptually, the work establishes a framework for exploring and understanding how the brain might implement reinstatement. The results were generally compelling given the relatively narrow hypothesis and supportive of the claims of the authors. One weakness was that, perhaps due to the novel nature of the imaging approach, the reliability/robustness of certain neural results was hard to ascertain, particularly for the BOLD data which the authors acknowledge might be compromised compared to a non-combined sequence. For example, the presence and location (e.g. hippocampal laterality) of some of the effects seemed to strongly depend on key preprocessing decisions (smoothing) and at least one participant was excluded due to data quality issues although the criteria for the decision were not described. Notably, this concern is offset slightly by the convergence (and correlation) of the results across fMRI and fMRS.

      We would like to thank the reviewer for highlighting the methodological and conceptual strengths of our study. We are grateful for the constructive comments and hope the reviewer agrees that the revised manuscript is much improved.

      As the reviewer notes, and as stated in the Methods section, due to the novel nature of the imaging approach, the fMRI component in the fMRI-fMRS sequence was compromised relative to non-interleaved state-of-the-art multiband EPI sequences. Nevertheless, several factors indicate the reliability and robustness of our fMRI results. First, the BOLD effect in visual cortex and hippocampus reported here replicate equivalent analyses conducted previously using data acquired on the same task with a higher quality multiband EPI sequence (Barron et al., Cell 2020). Moreover, the higher quality multiband data acquired previously permitted a searchlight Representational Similarity Analysis (RSA) which further revealed reinstatement of the associated visual cues in hippocampus and visual cortex during inference. This latter result demonstrates the involvement of these two brain regions in associative recall during the inference task, and provides the basis for investigating the underlying mechanism for recall using the data reported here.

      Second, our fMRI results are consistent with other studies investigating associative recall of visual cues, where an increase in BOLD signal is observed in the hippocampus and visual cortex (e.g. Horner et al., 2015; Wimmer and Shohamy 2012).

      Third, at a reduced threshold, the effects in hippocampus are bilateral regardless of the smoothing parameters employed. While the peak on one side did not survive whole-brain FWE correction, the t-statistic for the bilateral hippocampal effect is now reported in the revised manuscript (see revised legend for Supplementary File 4 and Figure 3–figure supplement 1D). Notably, whole-volume FWE statistical correction is more conservative than small-volume correction typically applied to regions of the medial temporal lobe, such as the hippocampus, where the BOLD signal is susceptible to distortion and signal loss.

      Finally, we explain the criteria for excluding a participant from the fMRI data. Namely, if the quality of the data was insufficient to allow co-registration between the EPI and structural scan, then data were excluded. These criteria are now clearly stated in the revised Methods section. Apologies that this was not made clear in the previous submission of the manuscript.

      In summary, despite the EPI in the combined fMRI-fMRS sequence being compromised relative to state-of-the-art multiband EPI sequences, the validity of our fMRI results is supported by a number of factors described above, including replication of previous results. We also note that the novelty of our study primarily lies with the fMRS data and the relationship between the fMRS and fMRI data. As demonstrated in Figure 4–figure supplement 3, the quality of the fMRS data is comparable with recent studies using non-interleaved MRS sequences. Moreover, the Monte-Carlo simulations and related permutation testing for false errors further illustrate the validity of our approach. We hope the reviewer agrees that together these factors demonstrate the reliability and robustness of the neural data we present.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The study tackles important questions with regards to how ripples relate to broadband LFP activity as well as single neurons in the human brain. The authors also include elegant analyses to characterize the timing of spiking activity with respect to high and low frequency activity and relevant control analyses to take into account possible artifacts and epileptic-related activity. My main concern with the study is whether the authors are truly isolating ripple activity in the human brain as claimed. Their threshold for ripple activity is quite low and it thus seems very possible that many of their "ripple" events are rather high frequency activity events that reflect spiking activity. That being said I think this is an important study to share results from in that it provides unique characterization of the relationship between high frequency activity and spiking in the human brain, as well as how it relates to human memory.

      We thank the reviewer for the positive assessment of our manuscript. We agree with the reviewer that there are important concerns regarding whether these events can be truly regarded as ripples, which we address by performing additional analyses to provide stronger support to the possibility that the identified cortical ripples in our recordings are transient and discrete events that reflect underlying bursts of spiking activity. We also acknowledge that these events more likely exist on a continuum, and that there is not always a clear separation between what constitutes one ripple event and the background activity. We think this is a fertile area of investigation and that there are several points to consider regarding this question, which we now address in the revised Discussion.

      Reviewer #2 (Public Review):

      Recordings from human patients with implanted electrodes provide high temporal resolution, localized measurements of brain activity that can reveal neural correlates underlying a wide variety of cognitive functions. These intracranial electroencephalography (iEEG) recordings are typically made with large electrical contacts, however, and thus represent a complex and poorly understood averaging of voltages from the underlying tissue. As such, it is difficult to know exactly what patterns of neural activity these signals correspond to and how to compare them to spiking and local field potential (LFP) recordings more commonly acquired in non-human animals. Tong et al. carried out simultaneous iEEG recordings from surface contacts as well as spiking and LFP recordings from implanted electrode arrays to directly address the relationship between these signals.

      They present quantifications (e.g. Figure 2B) of the relationships between the amount of spiking activity and the amplitude of events detected in the LFP and iEEG. Their results showing the relationships among these signals are very important for the field, and it is very helpful to see that there are clear correlations across scales.

      We thank the reviewer for this positive review of our manuscript.

      The context in which they present these results is problematic, however. They focus on "ripple" events, detected as periods where the power in an 80-120 Hz band exceeds an arbitrary threshold for an arbitrary length of time. To be fair, the application of similarly arbitrary thresholds is common in the human, primate, and rodent literatures, and several important results have arisen from the analysis of these events. These results can be understood as claiming that a set of high amplitude events have certain properties (e.g. they are related to memory retrieval), but should not be understood as establishing that there is some specific threshold that separates real events from others.

      We completely agree with the reviewer that, when considering ripples, there is a continuity of activity and that the central challenge for the field has been how to identify real events and separate them from others. To be clear, the purpose of our manuscript is not to claim that such a threshold exists. Our purpose instead was to offer evidence that even events that fall below arbitrarily defined thresholds still reflect underlying bursts of spiking activity, that these events are still punctate and temporally discrete, and that therefore these events are also likely functionally meaningful.
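
      To make the threshold dependence discussed above concrete, the following is a minimal sketch of duration-thresholded event detection on a band-power envelope. It is not the detection pipeline used in the manuscript; the function name `detect_events`, the toy envelope values, and both thresholds are hypothetical and chosen only for illustration.

```python
def detect_events(envelope, threshold, min_samples):
    """Return (start, stop) index pairs where the envelope stays above
    `threshold` for at least `min_samples` consecutive samples.
    `stop` is exclusive. All parameters are illustrative assumptions."""
    events = []
    start = None  # index where the current supra-threshold run began
    for i, value in enumerate(envelope):
        if value > threshold:
            if start is None:
                start = i
        else:
            # The run just ended; keep it only if it lasted long enough.
            if start is not None and i - start >= min_samples:
                events.append((start, i))
            start = None
    # Handle a run that extends to the end of the recording.
    if start is not None and len(envelope) - start >= min_samples:
        events.append((start, len(envelope)))
    return events


# Hypothetical envelope (arbitrary units), not real iEEG or LFP data.
envelope = [0.1, 0.2, 1.5, 1.8, 1.6, 0.3, 0.9, 1.0, 0.9, 0.2, 2.5, 0.1]

strict = detect_events(envelope, threshold=1.2, min_samples=3)
lenient = detect_events(envelope, threshold=0.8, min_samples=3)
print(strict)
print(lenient)
```

      In this toy example, lowering the threshold admits a second, lower-amplitude event that the stricter criterion discards, while the brief high-amplitude sample near the end is rejected by the duration criterion in both cases. This mirrors the point that the set of detected events depends on criteria that are, to some degree, arbitrary.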

      Here they go beyond these analyses and make the claim that these ripple events correspond to real, discrete events that, as their title indicates, "reflect a spectrum of synchronous spiking activity." The problem here is that they do not present any criteria for defining a real, discrete event. Indeed, they conclude that "the continuum of activity that [they] observe in [the] data ... suggests that strictly adhering to predefined criteria for what constitutes a ripple may run the risk of overlooking functionally meaningful events". Without a clear definition of what should and should not be considered to be a discrete event, we are left with the current situation where each study uses their own set of criteria, picks out a set of high amplitude events, and uses those for subsequent analyses.

      We agree with the reviewer that the amplitude and strength of the identified ripple events can be quite variable, making it challenging to distinguish these events from baseline activity. We therefore do not claim to identify specific criteria for defining real events. As the reviewer notes, without such criteria we are left with the current situation where future studies will still need to use their own set of criteria. We completely agree. The purpose of our study, however, is to just highlight the point that using these arbitrarily defined criteria risks overlooking these other events that still may be meaningful for the brain. We have addressed this concern by adding several changes to our manuscript and introducing several new analyses. First, we have tempered our claims that these are discrete events that represent separate packets of information. We acknowledge that there is certainly some variability in the size of these ripple events, that making a clean distinction between when these ripple events emerge as entities that are distinct from the background activity is challenging, and that we can never be certain whether arbitrarily small events are functionally meaningful. We have revised our Discussion accordingly to highlight these possibilities, and to discuss the larger point regarding the challenges in identifying these specific thresholds. Second, although we recognize that the data may not be absolutely conclusive, we have supplemented this discussion with additional analyses that, in our opinion, strongly suggest that these events are indeed transient in nature even when failing to meet previous thresholds.

      A second major challenge to understanding the current manuscript is the ambiguity of the physical relationships between the LFP and iEEG recording sites. While it might be obvious to human physiologists, details such as the distance between the LFP and iEEG contacts and the site areas of each type of electrode are critical for interpreting how closely the data from each could be expected to be related.

      We also agree with this very good point. As noted above in the response, we have now introduced several new analyses that examine the relation between the ripples identified using LFP and iEEG recordings.

      We would also like to highlight an instance of a common statistical error in Fig 1 I: the authors conclude that the difference between correct and incorrect is significant in true data and insignificant in the ripple-removed data, and therefore the 70-200Hz power band modulation on correct trials is significantly informed by 80-120Hz ripple events. The statistical problem is further described in Nieuwenhuis et al., Nature Neuroscience 2011.

      We thank the reviewer for pointing out this common statistical error that we have made in concluding that the true and ripple-removed data differ because the difference between correct and incorrect is significant in the true data and not in the ripple-removed data. We agree with the suggestion that a more accurate way to compare the true and ripple-removed data is to compare the effect sizes, or the difference between correct and incorrect trials for the true and ripple-removed data. We have now conducted this analysis. Specifically, we computed the true correlation between the difference in 70-200 Hz power between correct and incorrect trials and the difference in ripple rates between correct and incorrect trials across electrodes and compared this correlation to the correlation present after removing the 80-120 Hz ripples using a paired t-test across participants. We performed this analysis in the subset of six participants who had an MEA and focused on the MTL and ATL electrodes since these regions have the greatest 70-200 Hz power increase with successful retrieval. We found a significant decrease in correlation across patients with ripples removed compared to when we retained the ripples (t(5) = 3.89, p = 0.0115). We now report this new analysis in Fig. 1 – S6. We also compared the two correlations as dependent groups and found a significant difference in correlation (r_true – r_control = 0.172, 95% CI = [0.0691 0.2764], z = 3.2677, p = 0.0011). We accounted for potential interaction effects using the correlation between 70-200 Hz and 70-200 Hz with ripple removed (r = -0.031).
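
      The logic of comparing the two conditions directly, rather than comparing their significance levels, can be sketched as follows. This is a minimal illustration of a paired t statistic computed on per-participant differences; the function name `paired_t` and the six pairs of correlation values are hypothetical and are not the data reported above.

```python
import math
import statistics


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for two matched samples.

    The test is run on the per-participant differences a[i] - b[i], which
    is what makes it a direct test of the difference between conditions.
    Raises ZeroDivisionError if all differences are identical.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two matched samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical per-participant correlations (illustration only).
r_with_ripples = [0.31, 0.42, 0.53, 0.64, 0.75, 0.86]
r_ripples_removed = [0.30, 0.40, 0.50, 0.60, 0.70, 0.80]

t_stat, dof = paired_t(r_with_ripples, r_ripples_removed)
print(f"t({dof}) = {t_stat:.3f}")
```

      The resulting statistic would then be compared against a t distribution with `dof` degrees of freedom to obtain a p-value, a step omitted here to keep the sketch dependency-free.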

      Finally, we would also like to note the difficulty of characterizing a single deflection in the LFP or iEEG signal as a low frequency oscillation, given the large potential for measurement variability of the frequency of that oscillation as described in Fig. 4. This large deflection is to be expected when a concentrated amount of synaptic input drives a burst of spiking, as we would expect in the case of the increased spiking during ripples. In the hippocampus, this deflection is the sharp wave component of the sharp-wave ripple; it appears to take a similar form in cortical ripples. While unsurprising, it is well worth observing that the iEEG reflects this coincident deflection, but it should not be characterized as a 2-10Hz oscillation.

      We completely agree with the reviewer about this point. As the reviewer points out, the large deflection is often observed with bursts of spiking, which we find with ripples, and we also feel that the iEEG reflects this coincident deflection. We have therefore corrected the text and no longer characterize the deflection associated with ripples as 2-10 Hz oscillations. To illustrate this point further, we have also now added a new analysis demonstrating the average iEEG and LFP activity around each ripple, which clearly demonstrates this deflection (Fig. 1). We have, however, retained the discussion about the locking of spikes to 2-10 Hz oscillation, as our analysis includes spikes both within and outside of the spike bursts. While we expect that spikes are associated with the large deflection that reflects a concentrated amount of synaptic input, we also find spikes that are modulated by a 2-10 Hz oscillation, consistent with prior findings of theta-phase locking of spiking neurons.

      Reviewer #3 (Public Review):

      In this study, the authors systematically investigated the iEEG ripples, LFP ripples, and their relation with each other and with single units from the micro channels used to obtain the LFP. They found that the amplitude of LFP ripples reflects the sum and alignment of underlying spiking activities. Meanwhile, the amplitude of iEEG ripples reflects the number and alignment of LFP ripples. More interestingly, the amplitude of ripple events is functionally relevant. In general, I find that the data analyses and methods are sophisticated and the results are interesting. It extends our understanding of ripple events and is of interest to a wide audience.

      We thank the reviewer for the positive assessment of our manuscript.

    1. Author Response:

      We are glad that the reviewers found the manuscript comprehensive and that they think it will likely be a useful resource for the community. We have made several changes with that in mind, incorporating input from a number of fly CX researchers. In particular:

      • We have performed new analyses to improve our descriptions and characterization of phase angles in the PB and FB. Some of these changes, which have to do with the assignment of phase angles within the PB and FB columns, should help prevent any confusion or inconsistency relative to recent papers from the Maimon and Wilson labs.
      • Streamlined the Results section focused on dFB and EB sleep-wake circuits to aid in clarity.
      • Restructured and expanded the Discussion sections framing the CX as a multifunctional network to better capture previous physiological observations.
      • Added over 20 videos showing morphological renderings in 3D to help explain key results.
      • Analyzed the total number of synapses per region for the PB columnar neurons as a function of the number of neurons in each glomerulus (New Figure 24 S1).
      • Characterized the number of type-to-type connections between FB columnar neurons as a function of the percentage of total possible neurons that are actually connected (New Figure 33—figure supplement 1B).

      Reviewer #1 (Public Review):

      [...] This makes this paper not only an invaluable resource on the connectome of the Drosophila central complex, but also a most comprehensive review on the current state of the art in central-complex research. This unifying approach of the paper clearly marks a reset of central-complex research, essentially providing a starting point of hundreds of new lines of enquiry, probably for decades to come.

      We thank the reviewer for their generous comments. We are excited by the prospect of this manuscript being helpful for projects from many labs working in different insects.

      [...] The figures are equally overwhelming as the text at first sight, but when taking the time to digest each one in detail, they present the data in a rich and clear manner. The figures are often encyclopedic and will serve as reference about the central complex for years. The summary graphs that are presented in regular intervals are welcome resting places for the reader, helping to digest all the detailed information that has preceded or that will follow.

      We realize that the length and complexity of the manuscript make it difficult to get through and to process. We are glad that the reviewer found the paper’s organization helpful in that regard. Although we have made many additions and edited the text and figures, we have tried to preserve the original organization and the mini-summaries with each section that the reviewer found useful.

      The analysis performed in the paper is excellent, comprehensive and should set the standard for any future work on this topic. Also, the text is very honest about the limits of the conclusions that can be reached based on this kind of data, which is important in generating realistic and feasible hypotheses for future experiments.

      We appreciate the reviewer’s comments on the analysis. We are making all the analysis code available open source, and have tried to package it in a way that we hope will make it easy for others to use and build upon.

      Reviewer #2 (Public Review):

      [...] This is a massive work. There are 75 figures, not including supplements, and numerous region and neuron names to keep track of (not to mention visualize). It is impossible to read in a single sitting. So for the purposes of this public review, I highly recommend to any reader that they first find the region of the paper they're interested in and skip to that to view in side-by-side mode. The "generally interested" reader is best served by reading through the Discussion, which has more of the structure-function analyses in it and then referring to the Results as their curiosity warrants.

      We thank the reviewer for their comments.

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] In the model, the mitigation function is fitted; no actual data on deliberate versus randomly-varying behavior change is used. Given clear empirical signals of synchronous and deliberate response to epidemiology, modulated by social factors (Weill et al., 2020), a persuasive demonstration that consideration of random behavioral variation is necessary and/or sufficient to explain observed US COVID-19 dynamics would need to start from mobility data itself, and then find some principled way of partitioning changes in mobility into those attributable to random variation versus deliberate (whether top-down or bottom-up) action.

      As suggested by our referees and the editor, we undertook a principled analysis of the US COVID-19 data that took into account Google mobility patterns. The average mobility reflects systematic changes in social activity due to both government-imposed mitigations and knowledge-based adaptation of the population. We identified a range of dates (July 2020 to February 2021) during which there were only modest and slow changes in the average mobility. This time range allows for a direct test of our model, accounting for stochastic changes in social activity uncorrelated across the population (see Figure 6 and Appendix 5. Figure 1A).

      In the new version, we also present a direct comparison of the predictive power of our SSA model vs the traditional SIR model within this time range (see Figure 8 and Appendix 5. Figure 2).

      My other main concern is that the central result, transient epidemiological dynamics due to transient concordance of abnormally high versus low social activity, stems from the choice to model social behavior as stochastic but also mean-seeking. While I find this idealization plausible, I think it would be good to motivate it more.

      In other words, the central, compelling message of the paper is that if collective activity levels sometimes spike and crash, but ultimately regress to the mean, so will transmission. The more that behavioral model can be motivated, the more compelling the paper will be.

      We included an additional justification of our form of stochastic social dynamics and expanded the discussion of relevant prior studies. Especially revealing are the studies of burstiness in virtual communication such as e-mail (Vazquez et al. (2007); Karsai et al. (2012)). Digital communications can be easily studied over a substantial time interval, which is more problematic for field studies of face-to-face contact networks. These studies unequivocally show the regression of individual activity levels towards their long-term mean values. This regression happens over a well-defined relaxation time ranging from days to months depending on the context. Note that the value towards which the activity regresses may not be identical for different individuals. In the context of our model, such persistent heterogeneity is captured by the distribution of \alpha_i with the dispersion parameter \kappa.
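
      The kind of mean-seeking stochastic behavior described above can be illustrated with a generic discrete-time mean-reverting process. This is only a sketch under stated assumptions: it is not the model equations of the manuscript, and the function name `simulate_activity`, the update rule, and all parameter values are hypothetical.

```python
import random


def simulate_activity(mean_level, start, relaxation_time, noise_sd,
                      steps, dt=1.0, seed=0):
    """Simulate one individual's activity level as a mean-reverting process.

    At each step the activity drifts back toward `mean_level` at a rate of
    1 / relaxation_time and receives an independent Gaussian shock.
    Note: this simple form does not prevent negative activity values.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    activity = start
    path = [activity]
    for _ in range(steps):
        drift = -(activity - mean_level) / relaxation_time * dt
        shock = noise_sd * (dt ** 0.5) * rng.gauss(0.0, 1.0)
        activity = activity + drift + shock
        path.append(activity)
    return path


# Without noise, an initial burst of activity decays toward the mean.
noiseless = simulate_activity(mean_level=1.0, start=3.0,
                              relaxation_time=10.0, noise_sd=0.0, steps=200)

# With noise, activity keeps fluctuating around the same long-term mean.
noisy = simulate_activity(mean_level=1.0, start=3.0,
                          relaxation_time=10.0, noise_sd=0.2, steps=200)
print(noiseless[-1], len(noisy))
```

      In a population version of this sketch, each individual would have their own `mean_level`, which corresponds loosely to the persistent heterogeneity mentioned above, while the shocks would be drawn independently for each individual.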

    1. Author Response:

      Joint Public Review:

      Using an impressive combination of endothelial-specific knockouts, the investigators provide strong evidence for the following signaling pathway in pulmonary arteries: activation of Pannexin1 leads to the release of ATP, which activates P2Y receptors, which activate PKC, which in turn activates TRPV4 channels, anchored by caveolin1.

      The study by Daneva et al. examines the link between Cav-1, Panx1, P2Y2R and PKC in modulating TRPV4 channel activity. The authors hypothesize that activation of this signaling pathway, and specifically TRPV4, controls the reactivity of pulmonary arteries and contributes to fine-tuning pulmonary arterial pressure. To examine this hypothesis, the authors deployed an impressive number of techniques that include several endothelial-specific knockouts of key members of the signaling pathway, optical and electrical patch clamping, MRI and in situ proximity ligation assay (PLA). The data seem of high quality and, for the most part, supportive of the conclusions of the study. The results may have broad implications for pulmonary artery regulation and the potential identification of novel targets to treat pulmonary artery dysfunction.

      1) The physiological role of the proposed pathway is unclear. PAP is normally low (8 - 20 mm Hg). Are the authors proposing that this pathway is always engaged to maintain low PAP? If so, then how is Pannexin 1 being tonically activated? This would also imply that there exists a tonic constrictor pathway which Pannexin1-TRPV4 opposes. Does this exist? Or is the proposed Panx1-V4 pathway only engaged in the face of pulmonary hypertension? It is hard to envision a dilatory pathway when the system is already at low pressure, i.e., relaxed.

      We appreciate the feedback from the Reviewers and the Editor. It has generally been considered that PAs, due to low intraluminal pressure, are relaxed/low-resistance. However, this assumption results from the lack of detailed studies on pressure-induced (myogenic) constriction in small PAs under resting conditions. In this manuscript we provide evidence that small PAs (50-100 microns) show myogenic constriction (Fig. 2B, 2D and 3G), but large PAs (> 200 microns) do not (Supplemental Fig. 2A). We also show that PAs from endothelial Panx1-/-, TRPV4-/-, and P2Y2R-/- mice develop significantly higher myogenic constriction compared to PAs from the respective control mice (Fig. 2B, 2D and 3G). These data strongly support tonic activation of endothelial Panx1–P2Y2R–TRPV4 channel pathway and its dilatory effect under basal conditions. In addition to myogenic constriction, agonist-induced constriction was also higher in PAs from endothelial Panx1-/-, TRPV4-/-, and P2Y2R-/- mice compared to the control mice (Fig. 2C, 2E, and 3H). The detailed studies of myogenic constriction of PAs and mechanisms involved will be published in a separate manuscript.

      As the reviewer pointed out, it is plausible that the Panx1-dependent signaling is altered in pulmonary hypertension (PH), a possibility that has not been tested. In this regard we have shown that endothelial TRPV4 channel activity is impaired in PAs from PH patients and mouse models of PH1.

      2) The use of the term, "small, resistance-sized pulmonary arteries" is curious. Pulmonary arteries have low resistance and pressure. What is the basis of using this term?

      While the general opinion is that PAs are low-resistance or are in a completely relaxed state, there are no detailed studies showing a lack of myogenic constriction in pressurized small PAs under basal conditions. Our new PA pressure myography data show that small PAs (~ 50-100 microns) develop pressure-induced/myogenic constriction, whereas large PAs (~ 200 microns or more) do not (Fig. 2B, 2D, and 3G, Supplemental Fig. 2A). We use the term “resistance-sized PAs” to describe PAs that show myogenic constriction (~ 50-100 microns, used in this study). We present evidence that PAs develop myogenic constriction at the physiological intraluminal pressure (15 mm Hg, Fig. 2B, 2D, and 3G). We also show that PAs from endothelial Panx1-/-, TRPV4-/-, and P2Y2R-/- mice develop significantly higher myogenic constriction compared to PAs from the respective control mice. Thus, endothelial knockout of Panx1, P2Y2R, or TRPV4 channel increases PA contractility and elevates PAP.

      3) The major concern is related to conceptual significance. The reviewer appreciates that the work presented here connects Cav-1, Panx1, P2Y2R, PKC and TRPV4 into a signaling axis regulating pulmonary artery reactivity. However, this group has already published similar papers implicating a role for this axis in pulmonary arteries (and a similar axis in systemic arteries), and comparable conclusions have been reached by examining members of the pathway independently. Therefore, it is unclear what new conceptual information is gained, other than the link between all the proteins in the complex. Perhaps the authors could highlight more the major gaps in knowledge and novel aspects of their work.

      We thank the reviewers and the Editor for identifying the strengths of the manuscript and for their constructive feedback. We recently reported that endothelial TRPV4 channels decrease the contractility of small pulmonary arteries (PAs) and lower resting pulmonary arterial pressure (PAP). Moreover, exogenous ATP activated endothelial TRPV4 channels to dilate PAs. However, the regulation of TRPV4 channel activity by endogenously released ATP, the source of endogenously released ATP, and the precise signaling mechanisms for ATP activation of endothelial TRPV4 channels were not known. In the current manuscript, we present a novel signaling axis whereby ATP efflux through endothelial Pannexin 1 (Panx1) activates nearby TRPV4 channels via purinergic receptor signaling to lower PA contractility and PAP. The following key findings contribute to the high conceptual significance and novelty of the study:

      1) First evidence, using endothelial knockout mice, that ATP efflux through endothelial Panx1 lowers PA contractility and PAP. Notably, previous studies have shown that endothelial Panx1 activity does not contribute to vasodilation in systemic arteries and systemic blood pressure regulation.

      2) First direct evidence that ATP efflux through Panx1 promotes endothelial TRPV4 channel activity in PAs, but TRPV4 channel activity does not regulate ATP efflux through Panx1 under resting conditions.

      3) First evidence that ATP effluxed through endothelial Panx1 stimulates purinergic P2Y2 receptor (P2Y2R) signaling to activate TRPV4 channels and lower PA contractility and resting PAP.

      4) Earlier, we showed that endothelial caveolin-1 (Cav-1) lowers the resting PAP. In the current manuscript, we provide evidence that endothelial Cav-1 provides a signaling scaffold for Panx1, P2Y2R, and TRPV4 channels, ensuring their spatial proximity in PAs. Activation of the endothelial Panx1–P2Y2 receptor–TRPV4 channel pathway, enabled by the Cav-1 scaffold, lowers PA contractility and PAP.

      5) PAs are a high-flow vascular bed, yet flow-induced endothelial signaling is poorly understood in PAs. We provide evidence that physiological flow/shear stress increases luminal ATP release through endothelial Panx1 activation.

      We have now modified the Introduction and other sections of the manuscript to highlight the conceptual significance and novelty of the results presented in this manuscript.

      4) The major strengths of the study include the use of EC-specific conditional knockouts of Panx1, TRPV4 and P2Y2R that allowed them to focus on the role played by these proteins in the endothelium; the state-of-the-art measurement of TRPV4 Ca2+ sparklets and TRPV4 currents; the use of pressure myography to close the loop between the ex vivo studies of TRPV4 sparklets and their in vivo measurement of right ventricular systolic pressure (RVSP, as a surrogate for PAP); their measurement of right heart mass and function to exclude major effects on heart function as a cause of the observed increase in RVSP; and the use of transfected HEK293 cells to examine the role played by caveolin-1 in the signaling pathway.

      Thank you for identifying the strengths of our manuscript. We previously reported that endothelial TRPV4 sparklets dilate PAs via eNOS activation. Specifically, TRPV4 channel activation increased NO levels, an effect that was absent in PAs from eNOS-/- mice. Moreover, TRPV4 channel-induced vasodilation was abolished by the NOS inhibitor L-NNA. Also, in endothelial TRPV4-/- mice, endothelial NO levels were reduced. We have now cited these studies. We also show that physiological flow/shear stress activates ATP efflux through endothelial Panx1.

      6) The results shown support the authors' hypothesis and provide new drug and molecular targets to modulate pulmonary vascular resistance, particularly in disease states where endothelial function is compromised.

      We agree that our data provide multiple targets for lowering pulmonary artery contractility and pulmonary arterial pressure, including Panx1, P2Y2 receptors, TRPV4 channels, and Cav-1.

    1. Author Response:

      Reviewer #1:

      Yan et al. take a comprehensive look at structural variants in the 1000 Genomes Project high-coverage dataset, using recent developments that can link short- and long-read data. Combined with genomic simulations, they identify and characterize the timing and origin of a likely selected region in Southeast Asian populations. The combination of multiple data types adds depth to the interpretation.

      The study is timely, combining recently released data and methods, and has interesting biological implications. Three main areas would help interpretation and robustness of the paper:

      Thank you for sharing your enthusiasm for our work!

      1) Further context and interpretation of the original SV set found is needed, for example comparisons to previous work to identify clearer "positive controls" or sanity checks on the method, and to understand what the contribution of the method/dataset/paper is.

      Thank you for this suggestion, which was shared with those of other reviewers. We agree that the previous version of the manuscript placed too much responsibility on readers to track down the relevant content in the references and that a more direct and transparent comparison is warranted. Our set of SVs was carefully curated based on PacBio long-read sequencing data from 15 diverse samples by Audano et al. (2019). We now provide a detailed comparison of these curated SVs to two sets of SVs discovered from short-read sequencing of diverse human samples (Almarri et al., 2020; Sudmant et al., 2015) (lines 80-97). We find that this long-read-discovered SV set includes 89,979 variants (83.4% of long-read SVs) that are not represented in the 1000 Genomes Project (1KGP) or the Human Genome Diversity Project (HGDP). These long-read-specific variants include 30,229 that are “common” (AF ≥ 0.05), or 72.3% of all common SVs. We were also able to rediscover a large proportion of the short-read-discovered SVs in these two datasets, including 66.0% and 17.7% of common SVs in 1KGP and HGDP, respectively (Fig. 1 - S2 and Fig. 1 - S3). These results are consistent with reports from previous studies (Zhao et al., 2021).

      The overlap we describe above is notable given the much smaller size of the long-read sample set (15 individuals vs. 2,504 for 1KGP and 911 for HGDP), and given that the sample sets do not overlap completely (i.e., we expect that many rare or singleton SVs should not be represented in both datasets). We expect that the SVs unique to the short-read datasets reflect both differences in the discovery sample set (i.e., many of the long-read sequenced individuals are also in 1KGP, while none are in HGDP) and a high rate of false positives in short-read-based SV discovery (Nattestad et al., 2018).

      Furthermore, we have released all of our code, along with the SV genotypes (among the long-read sequenced samples [i.e. the input set], as well as based on graph genotyping of the 1000 Genomes cohort). This will enable future work based on these SV genotype calls, while also ensuring reproducibility and facilitating improvements to the genotyping methods. Indeed, we are aware that the data that we released are already being used in several other studies and that the genotyping strategy that we outlined has motivated additional studies being proposed in grant applications by other groups.

      2) The above is particularly important across ancestries/populations which differ in their LD levels. How do population-specific LD patterns impact the ability to detect these SV patterns, and therefore to make cross-population comparisons or infer differences in frequency that are central to the selection scan and the 220 highly differentiated SVs of interest? Perhaps this is in the original methods paper, but it is central to this paper so should at least be explained or analyzed.

      The graph genotyping approach does not leverage LD per se, though it is feasible that multiple linked variants could be spanned by a single long read. Instead, the Paragraph genotyping algorithm relies on an on-the-fly realignment of the primary short read sequencing data to a graph encoding the reference genome as well as the variant sequence. There are, however, some interesting implications of the differences in LD across populations for the use of SV genotypes. We quantified the population differences in LD between SVs and nearby SNPs on lines 174-184 and in Fig. 1 - S7. One implication of this result, mirroring the situation for other classes of variation, is that the accuracy of imputation of SVs based on knowledge of SNPs will be lowest in African populations. Conversely, these low rates of LD may improve fine mapping in the same populations, allowing future studies to test whether SVs are enriched for causal effects on expression and other phenotypes. While detailed investigation of imputation and fine-mapping are outside of the scope of our current study, we now discuss these implications in the section that describes patterns of LD (lines 181-184).

      3) The genomic simulations to infer the strength of selection were a nice addition, a step beyond common empirically-driven work. It would help to know how to interpret the ABC model in the context of the later finding that the region was introgressed from Neanderthals, as the model seems not to include this aspect.

      Thank you for appreciating the value of this section. We believe that introgression of the adaptive IGH haplotype from Neanderthals should not impact our ABC results within the time scale of our simulation. This is because our simulation begins after the introgression event has already occurred and the Neanderthal haplotype is segregating within the human population. A recent study showed that situations like these, in which introgressed variants persist at low frequencies and later undergo selection, may have occurred frequently in human evolutionary history (Yair et al., 2021).

      However, we agree that the impact of the introgression event on our simulation requires clarification. We now discuss this point in the simulation section (lines 489-491), and also cite the paper above. We have additionally moved this section to the end of the paper to better emphasize that it focuses on the history of the Neanderthal haplotype in humans, rather than the introgression event itself.

    1. Author Response:

      Reviewer #1:

      This work provides insight into the effects of tetraplegia on the cortical representation of the body in S1. By using fMRI and an attempted finger movement task, the researchers were able to show preserved fine-grained digit maps, even in patients without sensory and motor hand function as well as no spared spinal tissue bridges. The authors also explored whether certain clinical and behavioral determinants may contribute to preserving S1 somatotopy after spinal cord injury.

      Overall I found the manuscript to be well-written, the study to be interesting, and the analysis reasonable. I do, however, think the manuscript would benefit by considering and addressing two main suggestions.

      1) Provide additional context / rationale for some of the methods. Specific examples below:

      a) The rationale behind using the RSA analysis seemed to be predicated on the notion that the signals elicited via a phase-encoded design can only yield information about each voxel's preferred digit and little-to-no information about the degree of digit overlap (see lines 163-166 and 571-575). While this is the case for conventional analyses of these signals, there are more recently developed approaches that are now capable of estimating the degree of somatotopic overlap from phase-encoded data (see: Da Rocha Amaral et al., 2020; Puckett et al., 2020). Although I personally would be interested in seeing one of these types of analyses run on this data, I do not think it is necessary given the RSA data / analysis. Rather, I merely think it is important to add some context so that the reader is not misled into believing that there is no way to estimate this type of information from phase-encoded signals.

      • Da Rocha Amaral S, Sanchez Panchuelo RM, Francis S (2020) A Data-Driven Multi-scale Technique for fMRI Mapping of the Human Somatosensory Cortex. Brain Topogr 33 (1):22-36. doi:10.1007/s10548-019-00728-6
      • Puckett AM, Bollmann S, Junday K, Barth M, Cunnington R (2020) Bayesian population receptive field modeling in human somatosensory cortex. Neuroimage 208:116465. doi:10.1016/j.neuroimage.2019.116465

      We did not intend to give the impression that inter-finger overlap can only be estimated using RSA. To clarify this, we included a sentence in our methods section stating that inter-finger overlap cannot be estimated using the traditional travelling wave approach, but that newer methods have estimated somatotopic overlap from travelling wave data. Since our RSA approach lends itself to estimating inter-finger overlap and is currently the gold standard in characterizing these representational patterns, we opt, in accordance with the reviewer’s comment, not to include this additional analysis.

      Revised text Methods:

      “While the traditional traveling wave approach is powerful to uncover the somatotopic finger arrangement, a fuller description of hand representation can be obtained by taking into account the entire fine-grained activity pattern of all fingers. RSA-based inter-finger overlap patterns have been shown to depict the invariant representational structure of fingers better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). RSA-based measures are furthermore not prone to some of the problems of measurements of finger selectivity (e.g., dependence on map thresholds). The most common approach for investigating inter-finger overlap is RSA, as used here, though note that somatotopic overlap has recently been estimated from travelling wave data using an iterated Multigrid Priors (iMGP) method and population receptive field modelling (Da Rocha Amaral et al., 2020; Puckett et al., 2020).”

      b) The rationale for using minimally thresholded (Z>2) data for the Dice overlap analysis as opposed to the threshold used in data visualization (q<0.05) was unclear. Providing the minimally thresholded maps (in Supplementary) would also aid interpretation of the Dice overlap results.

      We followed previously published procedures for calculating the Dice overlap between the two split-halves of the data (Kikkert et al., 2016; Kolasinski et al., 2016; Sanders et al., 2019). We used minimally thresholded data to calculate the Dice overlap to ensure that our analysis was sensitive to overlaps that would be missed when using high thresholds. We clarified this in the revised manuscript. We thank the reviewer for their suggestion to add a Figure displaying the minimally thresholded split-half hard-edged finger maps - we have added this to the revised manuscript as Figure 2-Figure supplement 1.

      To ensure that our thresholding procedure did not change the results of the Dice overlap analysis, we repeated this analysis using split-half maps that were thresholded using a q < 0.05 FDR criterion (as was used to create the travelling wave maps in Figures 2A-B). We found the same results as when using the Z > 2 thresholding criterion: Overall, split-half consistency was not significantly different between patients and controls, as tested using a robust mixed ANOVA (F(1,17.69) = 0.08, p = 0.79). There was a significant difference in split-half consistency between pairs of same, neighbouring, and non-neighbouring fingers (F(2,14.77) = 38.80, p < 0.001). This neighbourhood relationship was not significantly different between the control and patient groups (i.e., there was no significant interaction; F(2,14.77) = 0.12, p = 0.89). We have included this analysis and the related figure as Figure 2-Figure supplement 2 in the revised manuscript.

      Revised text Methods:

      “We followed previously described procedures for calculating the DOC between two halves of the travelling wave data (Kikkert et al., 2016; Kolasinski et al., 2016; Sanders et al., 2019). The averaged finger-specific maps of the first forward and backward runs formed the first data half. The averaged finger-specific maps of the second forward and backward runs formed the second data half. The finger-specific clusters were minimally thresholded (Z>2) on the cortical surface and masked using an S1 ROI, created based on Brodmann area parcellation using Freesurfer (see Figure 2– figure supplement 1 for a visualisation of the minimally thresholded split-half hard-edged finger maps used to calculate the DOC). We used minimally thresholded finger-specific clusters for the DOC analysis to ensure we were sensitive to overlaps that would be missed when using high thresholds. Note that results were unchanged when thresholding the finger-specific clusters using an FDR q < 0.05 criterion (see Figure 2 – figure supplement 2).”
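
      The split-half Dice overlap described above can be sketched as follows. This is a minimal illustration only, not the pipeline used in the manuscript: the function names `threshold_map` and `dice_overlap`, the toy Z-scores, and the convention of returning 0 for two empty clusters are all assumptions made here, and the S1 ROI masking step is omitted.

```python
def threshold_map(z_values, z_min=2.0):
    """Return the set of vertex indices whose Z-score exceeds z_min."""
    return {i for i, z in enumerate(z_values) if z > z_min}


def dice_overlap(cluster_a, cluster_b):
    """Dice overlap coefficient between two sets of vertex indices.

    DOC = 2 * |A intersect B| / (|A| + |B|).
    0 means no overlap, 1 means identical clusters.
    Returns 0.0 when both clusters are empty (a convention assumed here).
    """
    total = len(cluster_a) + len(cluster_b)
    if total == 0:
        return 0.0
    return 2.0 * len(cluster_a & cluster_b) / total


# Hypothetical Z-scores for one finger on a toy 8-vertex surface,
# one list per data half (not real data).
half_1 = [0.5, 2.4, 3.1, 2.2, 1.9, 0.1, 2.6, 0.0]
half_2 = [0.2, 2.1, 2.9, 1.5, 2.3, 0.3, 2.8, 0.4]

# Minimal threshold (Z > 2) applied to each half before computing the overlap.
doc = dice_overlap(threshold_map(half_1), threshold_map(half_2))
```

      Using a low threshold keeps more vertices in each cluster, which is why the overlap stays sensitive to weakly activated vertices that a stricter criterion would discard.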

      2) Provide a more thorough discussion - particularly with respect to the possible role of top-down processes (e.g., attention).

      a) The authors discuss a few potential signal sources that may contribute to the maintenance of (and ability to measure) the somatotopic maps; however, the overall interpretation seems a bit "motor efferent heavy". That is, it seems the authors favor an explanation that the activity patterns measured in S1 were elicited by efference copies from the motor system and that occasional corollary discharges or attempted motor movements play a role in their maintenance over time. The authors consider other explanations, noting - for example - the potential role of attention in preserving the somatotopic representations given that attention has been shown to be able to activate S1 hand representations. The mention of this was, however, rather brief - and I believe the issue deserves a bit more of a balanced consideration.

      When the authors consider the possible role of attention in maintaining the somatotopic representations (lines 329-333), they mention that observing others' fingers being touched or attending to others' finger movements may contribute. But there is no mention of attending to one's own fingers (which has been shown to elicit activity as cited). I realize that the patients lack sensorimotor function (and hence may find it difficult to "attend" to their fingers); however, they have all had prior experience with their fingers and therefore might still be able to attend to them (or at least the idea of their digits) such that activity is elicited. For example, it is not clear to me that it would be any more difficult for the patients to be asked to attend to their digits compared to being asked to attempt to move their digits. I would even suggest that attempting to move a digit (regardless of whether you can or not) requires that one attends to the digit before attempting to initiate the movement as well as throughout the attempted motor movement. Because of this, it seems possible that attention-related processes could be playing a role in or even driving the signals measured during the attempted movement task - as well as those involved in the ongoing maintenance of the maps after injury. I don't think this possibility can be dismissed given the data in hand, but perhaps the issue could be addressed by a bit more thorough of a discussion on the process of "attempting to move" a digit (even one that does not move) - and the various top-down processes that might be involved.

      We thank the reviewer for their consideration and insights into the potential mechanisms underlying our results. We have now elaborated further on the possibility that attention-related processes might have contributed to the reported effects, also in consideration of comment 3.4.

      Revised text Discussion:

      “Spared spinal cord tissue bridges can be found in most patients with a clinically incomplete injury, their width being predictive of electrophysiological information flow, recovery of sensorimotor function, and neuropathic pain (Huber et al., 2017; Pfyffer et al., 2021, 2019; Vallotton et al., 2019). However, in this study, spared midsagittal spinal tissue bridges at the lesion level, motor function, and sensory function did not seem necessary to maintain and activate a somatotopic hand representation in S1. We found a highly typical hand representation in two patients (S01 and S03) who did not have any spared spinal tissue bridges at the lesion level, a complete (S01) or near complete (S03) hand paralysis, and a complete (S01) or near complete loss (S03) of hand sensory function. Our predictive modelling results were in line with this notion and showed that these behavioural and structural spinal cord determinants were not predictive of hand representation typicality. Note however that our sample size was limited, and it is challenging to draw definite conclusions from non-significant predictive modelling results.”

      “How may these representations be preserved over time and activated through attempted movements in the absence of peripheral information? S1 is reciprocally connected with various brain areas, e.g., M1, lateral parietal cortex, posterior parietal area 5, secondary somatosensory cortex, and supplementary motor cortex (Delhaye et al., 2019). After loss of sensory inputs and paralysis through SCI, S1 representations may be activated and preserved through its interconnections with these areas. First, it is possible that cortico-cortical efference copies may keep a representation ‘alive’ through occasional corollary discharge (London and Miller, 2013). While motor and sensory signals no longer pass through the spinal cord in the absence of spinal tissue bridges, S1 and M1 remain intact. When a motor command is initiated (e.g., in the form of an attempted hand movement) an efference copy is thought to be sent to S1 in the form of corollary discharge. This corollary discharge resembles the expected somatosensory feedback activity pattern and may drive somatotopic S1 activity even in the absence of ascending afferent signals from the hand (Adams et al., 2013; London and Miller, 2013). It is possible that our patients occasionally performed attempted movements which would result in corollary discharge in S1. Second, it is likely that attempting individual finger movements poses high attentional demands on tetraplegic patients. Accordingly, attentional processes might have contributed to eliciting somatotopic S1 activity. Evidence for this account comes from studies showing that it is possible to activate somatotopic S1 hand representations through attending to individual fingers (Puckett et al., 2017) or through touch observation (Kuehn et al., 2018). Attending to fingers during our attempted finger movement task may have been sufficient to elicit somatotopic S1 activity through top-down processes in the tetraplegic patients who lacked hand motor and sensory function. 
Furthermore, one might speculate that observing others’ or one’s own fingers being touched or directing attention to others’ hand movements or one’s own fingers may help preserve somatotopic representations. Third, it is possible that these somatotopic maps are relatively hardwired and while they deteriorate over time, they never fully disappear. Indeed, somatotopic mapping of a sensory deprived body part has been shown to be resilient after dystonia (Ejaz et al., 2016; though see Burman et al., (2009) and Taub et al., (1998)) and arm amputation (Bruurmijn et al., 2017; Kikkert et al., 2016; Wesselink et al., 2019). Fourth, it is possible that even though a patient is clinically assessed to be complete and is unable to perceive sensory stimuli on the deprived body part, there is still some ascending information flow that contributes to preserving somatotopy (Wrigley et al., 2018). A recent study found that although complete paraplegic SCI patients were unable to perceive a brushing stimulus on their toe, 48% of patients activated the location appropriate S1 area (Wrigley et al., 2018). However, the authors of this study defined the completeness of patients’ injuries via behavioural testing, while we additionally assessed the retained connections passing through the SCI directly via quantification of spared spinal tissue bridges through structural MRI. It is unlikely that spinal tissue carrying somatotopically organised information would be missed by our assessment (Huber et al., 2017; Pfyffer et al., 2019). Our experiment did not allow us to tease apart these potential processes and it is likely that various processes simultaneously influence the preservation of S1 somatotopy and elicited the observed somatotopic S1 activity.”

      Reviewer #2:

      The authors investigate SCI patients and characterize the topographic representation of the hand in sensorimotor cortex when asked to move their hand (which controls could do but patients could not). The authors compare some parameters of topographic map organization and conclude that they do not differ between patients and controls, whereas they find changes in the typicality of the maps that decrease with years since disease onset in patients. Whereas these initial analyses are interesting, they are not clearly related to a mechanistic model of the disorder and the underlying pathophysiology that is expected in the patients. Furthermore, additional analyses on more fine-grained map changes are needed to support the authors' claims. Finally, the major result of changed typicality in the patients is in my view not valid.

      • Concept 1. At present, there are no clear hypotheses about the (expected or hypothesized) mechanistic changes of the sensorimotor maps in the patients. The authors refer to "altered" maps and repeatedly say that "results are mixed" (3 times in the introduction).

      We thank the reviewer for highlighting to us that our introduction and hypotheses were unclear and/or incomplete to them. We have restructured our Introduction to better highlight competing hypotheses on how SCI may change S1 hand representations, the reasons for our analytical approach, and elaborate on our hypotheses.

      Revised text Introduction:

      “Research in non-human primate models of chronic and complete cervical SCI has shown that the S1 hand area becomes largely unresponsive to tactile hand stimulation after the injury (Jain et al., 2008; Kambi et al., 2014; Liao et al., 2021). The surviving finger-related activity became disorganised such that a few somatotopically appropriate sites but also other somatotopically nonmatched sites were activated (Liao et al., 2021). Seminal nonhuman primate research has further demonstrated that SCI leads to extensive cortical reorganisation in S1, such that tactile stimulation of cortically adjacent body parts (e.g., of the face) activated the deprived brain territory (e.g., of the hand; Halder et al., 2018; Jain et al., 2008; Kambi et al., 2014). Although the physiological hand representation appears to largely be altered following a chronic cervical SCI in non-human primates, the anatomical isomorphs of individual fingers are unchanged (Jain et al., 1998). This suggests that while a hand representation can no longer be activated through tactile stimulation after the loss of afferent spinal pathways, a latent and somatotopic hand representation could be preserved regardless of large-scale physiological reorganisation.

      A similar pattern of results has been reported for human SCI patients. Transcranial magnetic stimulation (TMS) studies induced current in localised areas of SCI patients’ M1 to induce a peripheral muscle response. They found that representations of more impaired muscles retract or are absent while representations of less impaired muscles shift and expand (Fassett et al., 2018; Freund et al., 2011a; Levy et al., 1990; Streletz et al., 1995; Topka et al., 1991; Urbin et al., 2019). Similarly, human fMRI studies have shown that cortically neighbouring body part representations can shift towards, though do not invade, the deprived M1 and S1 cortex (Freund et al., 2011b; Henderson et al., 2011; Jutzeler et al., 2015; Wrigley et al., 2018, 2009). Other human fMRI studies hint at the possibility of latent somatotopic hand representations following SCI by showing that attempted movements with the paralysed and sensory deprived body part can still evoke signals in the sensorimotor system (Cramer et al., 2005; Freund et al., 2011b; Kokotilo et al., 2009; Solstrand Dahlberg et al., 2018). This attempted ‘net’ movement activity was, however, shown to substantially differ from healthy controls: Activity levels have been shown to be increased (Freund et al., 2011b; Kokotilo et al., 2009; Solstrand Dahlberg et al., 2018) or decreased (Hotz-Boendermaker et al., 2008), volumes of activation have been shown to be reduced (Cramer et al., 2005; Hotz-Boendermaker et al., 2008), activation was found in somatotopically nonmatched cortical sites (Freund et al., 2011b), and activation was poorly modulated when patients switched from attempted to imagined movements (Cramer et al., 2005). These observations have therefore mostly been attributed to abnormal and/or disorganised processing induced by the SCI. 
It remains possible though that, despite certain aspects of sensorimotor activity being altered after SCI, somatotopically typical representations of the paralysed and sensory deprived body parts can be preserved (e.g., finger somatotopy of affected hand). Such preserved representations have the potential to be exploited in a functionally meaningful manner (e.g., via neuroprosthetics).

      Case studies using intracortical stimulation in the S1 hand area to elicit finger sensations in SCI patients hint at such preserved somatotopic representations (Fifer et al., 2020; Flesher et al., 2016), with one exception (Armenta Salas et al., 2018). Negative results were suggested to be due to a loss of hand somatotopy and/or reorganisation in S1 of the implanted SCI patient or due to potential misplacement of the implant (Armenta Salas et al., 2018). Whether fine-grained somatotopy is generally preserved in the tetraplegic patient population remains unknown. It is also unclear what clinical, behavioural, and structural spinal cord determinants may influence such representations to be maintained. Here we used functional MRI (fMRI) and a visually cued (attempted) finger movement task in tetraplegic patients to examine whether hand somatotopy is preserved following a disconnection between the brain and the periphery. We instructed patients to perform the fMRI tasks with their most impaired upper limb and matched controls’ tested hands to patients’ tested hands. If a patient was unable to make overt finger movements due to their injury, then we carefully instructed them to make attempted (i.e., not imagined) finger movements. To see whether patients’ maps exhibited characteristics of somatotopy, we visualised finger selectivity in S1 using a travelling wave approach. To investigate whether fine-grained hand somatotopy was preserved and could be activated in S1 following SCI, we assessed inter-finger representational distance patterns using representational similarity analysis (RSA). These inter-finger distance patterns are thought to be shaped by daily life experience such that fingers used more frequently together in daily life have lower representational distances (Ejaz et al., 2015). 
RSA-based inter-finger distance patterns have been shown to depict the invariant representational structure of fingers in S1 and M1 better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). Over the past years RSA has therefore regularly been used to investigate somatotopy of finger representations both in healthy (e.g., Akselrod et al., 2017; Ariani et al., 2020; Ejaz et al., 2015; Gooijers et al., 2021; Kieliba et al., 2021; Kolasinski et al., 2016; Liu et al., 2021; Sanders et al., 2019) and patient populations (e.g., Dempsey-Jones et al., 2019; Ejaz et al., 2016; Kikkert et al., 2016; Wesselink et al., 2019). We closely followed procedures that have previously been used to map preserved and typical somatotopic finger selectivity and inter-finger representational distance patterns of amputees’ missing hands in S1 using volitional phantom finger movements (Kikkert et al., 2016; Wesselink et al., 2019). However, in amputees, these movements generally recruit the residual arm muscles that used to control the missing limb via intact connections between the brain and spinal cord. Whether similar preserved somatotopic mapping can be observed in SCI patients with diminished or no connections between the brain and the periphery is unclear. If finger somatotopy is preserved in tetraplegic patients, then we should find typical inter-finger representational distance patterns in the S1 hand area of these patients. By measuring a group of fourteen chronic tetraplegic patients with varying amounts of spared spinal cord tissue at the lesion level (quantified by means of midsagittal tissue bridges based on sagittal T2w scans), we uniquely assessed whether preserved connections between the brain and periphery are necessary to preserve fine somatotopic mapping in S1 (Huber et al., 2017; Pfyffer et al., 2019). 
If spared connections between the periphery and the brain are not necessary for preserving hand somatotopy, then we would find typical inter-finger representational distance patterns even in patients without spared spinal tissue bridges. We also investigated what clinical and behavioural determinants may contribute to preserving S1 hand somatotopy after chronic SCI. If spared sensorimotor hand function is not necessary for preserving hand somatotopy, then we would find typical inter-finger representational distance patterns even in patients who suffer from full sensory loss and paralysis of the hand(s).”

      They do not in detail report which results actually have been reported before, which is a major problem, because those prior results should have motivated the analyses the authors conducted. For instance, two of the cited studies found that in SCI patients, only ONE FINGER shifted towards the malfunctioning area (i.e., the small finger) whereas all other fingers were the same. However, the authors do NOT perform single finger analyses but always average their results ACROSS fingers. This is even true in spite of some patients indeed showing MISSING FINGERS as is clearly evident in the figure, and in spite of the clearly reduced distance of the thumb in the patients as is also visible in another figure. Nothing of this is seen in the results, because the ANOVA and analyses never have the factor of "finger". Instead, the authors always average the analyses across finger. The conclusion that the maps do not differ is therefore not justified at present. This severely reduces any conclusions that can be drawn from the data at present.

      We apologise for the lack of clarity. We have now added additional detail regarding studies showing altered sensorimotor processing following SCI. We also clarified that we based our analysis steps on previous studies investigating hand somatotopy following deafferentation (i.e., following arm amputation; Kikkert et al., 2016; Wesselink et al., 2019) and somatotopic reorganisation. RSA-based inter-finger distance patterns have been shown to depict the invariant representational structure of fingers in S1 and M1 better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). Over the past years RSA has therefore regularly been used to investigate somatotopy of finger representations both in healthy (e.g., Akselrod et al., 2017; Ariani et al., 2020; Ejaz et al., 2015; Gooijers et al., 2021; Kieliba et al., 2021; Kolasinski et al., 2016; Liu et al., 2021; Sanders et al., 2019) and patient populations (e.g., Dempsey-Jones et al., 2019; Ejaz et al., 2016; Kikkert et al., 2016; Wesselink et al., 2019). It is believed to be the most appropriate measure to reliably detect subtle changes in somatotopy. We adjusted the text in our revised Introduction section to better highlight this.

      Please note that we do not average across fingers in our RSA typicality procedure. Instead, RSA considers how the (attempted) movement with one finger changes the activity pattern across the whole hand representation. Note that somatotopic reorganisation will change the inter-finger distance measured with this method as previously shown (Kieliba et al., 2021; Kolasinski et al., 2016; Wesselink et al., 2019).

      Still, as per the reviewer’s suggestion, we conducted a robust mixed ANOVA on the RSA distance measures with a within-subjects factor for finger pair (10 levels) and a between-subjects factor for group (2 levels: controls and SCI patients). We did not find a significant group effect (F(1,21.66) = 1.50, p = 0.23). There was a significant difference in distance between finger pairs (F(9,15.38) = 27.22, p < 0.001), but this was not significantly different between groups (i.e., no significant finger pair by group interaction; F(9,15.38) = 1.05, p = 0.45). When testing for group differences per finger pair, the BF only revealed inconclusive evidence (BF > 0.37 and < 1.11; note that we could not run a Bayesian ANOVA due to normality violations). We have added this analysis to the revised manuscript.

      Lastly, we would like to highlight that our argument is that the finger maps can be preserved in the absence of sensory and motor function, but over time they deteriorate and become less somatotopic. As such, we do not aim to state that they are unchanged overall – but rather that they can be unchanged even despite loss of sensory and motor function. We have clarified this in our abstract and manuscript to avoid confusion.

      Revised abstract:

      “Previous studies showed reorganised and/or altered activity in the primary sensorimotor cortex after a spinal cord injury (SCI), suggested to reflect abnormal processing. However, little is known about whether somatotopically-specific representations can be preserved despite alterations in net activity. In this observational study we used functional MRI and an (attempted) finger movement task in tetraplegic patients to characterise the somatotopic hand layout in primary somatosensory cortex. We further used structural MRI to assess spared spinal tissue bridges. We found that somatotopic hand representations can be preserved in absence of sensory and motor hand functioning, and no spared spinal tissue bridges. Such preserved hand somatotopy could be exploited by rehabilitation approaches that aim to establish new hand-brain functional connections after SCI (e.g., neuroprosthetics). However, over years since SCI the hand representation somatotopy deteriorated, suggesting that somatotopic hand representations are more easily targeted within the first years after SCI.”

      Revised text Methods:

      “Second, we tested whether the inter-finger distances were different between controls and patients using a robust mixed ANOVA with a within-participants factor for finger pair (10 levels) and a between-participants factor for group (2 levels: controls and patients).”
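
      To make the "finger pair (10 levels)" factor concrete, the sketch below enumerates the 10 unordered pairs of five fingers and computes one distance per pair from per-finger activity patterns. It is an illustration under stated assumptions: the toy patterns are hypothetical, and a plain squared Euclidean distance is used purely for simplicity, since this excerpt does not state which distance metric the manuscript uses.

```python
from itertools import combinations

FINGERS = ["thumb", "index", "middle", "ring", "little"]


def squared_euclidean(p, q):
    """Squared Euclidean distance between two equal-length activity patterns."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def inter_finger_distances(patterns):
    """Map each of the 10 unordered finger pairs to a pattern distance.

    The distance metric is an assumption for illustration only.
    """
    return {
        (f1, f2): squared_euclidean(patterns[f1], patterns[f2])
        for f1, f2 in combinations(FINGERS, 2)
    }


# Hypothetical activity patterns over 4 voxels (not real data).
patterns = {
    "thumb":  [1.0, 0.2, 0.0, 0.0],
    "index":  [0.6, 0.8, 0.1, 0.0],
    "middle": [0.1, 0.7, 0.6, 0.1],
    "ring":   [0.0, 0.2, 0.8, 0.5],
    "little": [0.0, 0.0, 0.3, 1.0],
}

distances = inter_finger_distances(patterns)
```

      Each participant contributes one such set of 10 distances, so the finger-pair factor is analysed without averaging across fingers.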

      Revised text Results:

      “We then tested whether the inter-finger distances were different across finger pairs between controls and SCI patients using a robust mixed ANOVA with a within-participants factor for finger pair (10 levels) and a between-participants factor for group (2 levels: controls and patients). We did not find a significant difference in inter-finger distances between patients and controls (F(1,21.66) = 1.50, p = 0.23). The inter-finger distances were significantly different across finger pairs, as would be expected based on somatotopic mapping (F(9,15.38) = 27.22, p < 0.001). This pattern of inter-finger distances was not significantly different between groups (i.e., no significant finger pair by group interaction; F(9,15.38) = 1.05, p = 0.45). When testing for group differences per finger pair, the BF only revealed inconclusive evidence (BF > 0.37 and < 1.11; note that we could not run a Bayesian ANOVA due to normality violations).”

      Revised text Discussion:

      “In this study we investigated whether hand somatotopy is preserved and can be activated through attempted movements following tetraplegia. We tested a heterogenous group of SCI patients to examine what clinical, behavioural, and structural spinal cord determinants contribute to preserving S1 somatotopy. Our results revealed that detailed hand somatotopy can be preserved following tetraplegia, even in the absence of sensory and motor function and a lack of spared spinal tissue bridges. However, over time since SCI these finger maps deteriorated such that the hand somatotopy became less typical.”

      • Concept 2: This also relates to the fact that the most prominent and consistent finding of prior studies was to show changes in map AMPLITUDE in the maps of patients. It is not clear to me how amplitude was measured here, because the text says "average BOLD activity". What should be reported are standard measures of signal amplitude both across the map area and for individual fingers.

      We apologise for the lack of clarity: “average BOLD activity” represented the average z-standardised activity within the S1 hand ROI. To comply with the reviewer’s comment, we adjusted this to the percent signal change underneath the S1 hand ROI and report this instead in our revised manuscript and in revised Figure 3A and revised Figure 3-Figure supplement 1. Note that results were unchanged.

      As per the reviewer’s suggestion, we further extracted the activity levels for individual fingers under finger-specific ROIs. To create finger-specific ROIs, probability finger maps were created based on the travelling wave data of the control group, thresholded at 25% (i.e., at least 5 out of 18 control participants needed to significantly activate a vertex for this vertex to be included in the ROI), and binarised. We then used the separately acquired blocked design data to extract the corresponding finger movement activity levels underlying these finger-specific ROIs per participant. Per ROI, we then compared the activity level between groups. After correction for multiple comparisons, there was no significant difference between groups for the thumb (U = 93, p = 0.37), index (t(30) = -0.003, p = 0.99), middle (t(30) = 1.11, p = 0.35), ring (t(30) = 2.02, p = 0.13), or little finger (t(30) = 2.14, p = 0.20). We have added this analysis to Appendix 1.
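
      The 25% probability threshold translates into a participant count as sketched below. This is a minimal illustration with hypothetical function names and toy vertex counts, not the code used for the manuscript; it shows why 25% of 18 controls corresponds to at least 5 participants (4.5 rounded up).

```python
import math


def min_participants(n_participants, proportion):
    """Smallest whole number of participants that meets the proportion."""
    return math.ceil(n_participants * proportion)


def binarise_probability_map(counts, n_participants, proportion=0.25):
    """Binarise a per-vertex count map into an ROI.

    counts[i] is the number of participants who significantly activated
    vertex i. A vertex enters the ROI (value 1) when that count reaches
    the required minimum, otherwise it is excluded (value 0).
    """
    needed = min_participants(n_participants, proportion)
    return [1 if c >= needed else 0 for c in counts]


# Hypothetical counts for six vertices across 18 control participants.
counts = [0, 3, 4, 5, 9, 18]
roi = binarise_probability_map(counts, n_participants=18)
```

      With these toy counts, only vertices activated by 5 or more of the 18 controls are retained in the ROI.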

      Note that lower or higher BOLD amplitude levels do not influence our typicality scores per se. Indeed, typical inter-finger representational patterns have been shown to persist even in ipsilateral M1 that exhibited a negative BOLD response during finger movements (Berlot et al., 2019). As long as the typical inter-finger relationships are preserved, brain areas that have low amplitudes of activity can have a typical somatotopic representation.

      Revised text in Methods:

      "The percent signal change for overall task-related activity was then extracted for voxels underlying this S1 hand ROI per participant. A similar analysis was used to investigate overall task-related activity in an M1 hand ROI (see Figure 3- Figure supplement 1). We further compared activity levels in finger-specific ROIs in S1 between groups and conducted a geodesic distance analysis to assess whether the finger representations of the SCI patients were aligned differently and/or shifted compared to the control participants (see Appendix 1)."

      Revised text in Results:

      “Task-related activity was quantified by extracting the percent signal change for finger movement (across all fingers) versus baseline within the contralateral S1 hand ROI (see Figure 3A). Overall, all patients were able to engage their S1 hand area by moving individual fingers (t(13)=7.46, p < 0.001; BF10=4.28e+3), as did controls (t(17)=9.92, p < 0.001; BF10=7.40e+5). Furthermore, patients’ task-related activity was not significantly different from controls (t(30)=-0.82, p=0.42; BF10=0.44), with the BF showing anecdotal evidence in favour of the null hypothesis.”

      Revised Appendix 1:

      “Percent signal change in finger-specific clusters

      To assess whether finger movement activity levels were different between patients and controls, we created finger-specific ROIs and extracted the activity level of the corresponding finger movement for each participant. To create the finger-specific ROIs, the probability finger surface maps that were created from the travelling wave data of the control group (see main manuscript) were thresholded at 25% (i.e., at least 5 out of 18 control participants needed to significantly activate a vertex for this vertex to be included in the ROI), and binarised. We then used the separately acquired blocked design data to extract the finger movement activity levels underlying these finger-specific ROIs. We first flipped the contrast images resulting from each participant’s fixed effects analysis (i.e., the analysis that was run to average across the 4 blocked design runs) along the x-axis for the left-hand tested participants. Each participant’s contrast maps were then resampled to the Freesurfer 2D average atlas and the averaged z-standardised activity level was extracted for each finger movement vs rest contrast underlying the finger-specific ROIs. We compared the activity levels for each finger movement in the corresponding finger ROI (i.e., thumb movement activity in the thumb ROI, index finger movement activity in the index finger ROI, etc.) between groups. After correction for multiple comparisons, there was no significant difference between groups for the thumb (U = 93, p = 0.37), index (t(30) = -0.003, p = 0.99), middle (t(30) = 1.11, p = 0.35), ring (t(30) = 2.02, p = 0.13), or little finger (t(30) = 2.14, p = 0.20).”

      Appendix 1- Figure 1: Finger-specific activity levels in finger-specific regions of interest. A) Finger-specific ROIs were based on the control group’s binarised 25% probability travelling wave finger selectivity maps. B) Finger movement activity levels in the corresponding finger-specific ROIs. There were no significant differences in activity levels between the SCI patient and control groups. Controls are projected in grey; SCI patients are projected in orange. Error bars show the standard error of the mean. White arrows indicate the central sulcus. A = anterior; P = posterior.

      • Concept 3: The authors present a hypothesis on the underlying mechanisms of SCI that does not seem to reflect prior data. The argument is that changes in map alignment relate to maladaptive changes and pain. However, the literature that the authors cite does not support this claim. In fact, Freund 2011 promotes the importance of map amplitude but not alignment, whereas other studies either show no relation of activation to pain, or they even show that map shift relates to LESS pain, i.e., the reverse argument than what the authors say. My impression is that the model that the authors present is mainly a model that is used for phantom pain but not for SCI. Taking this into consideration, the findings the authors present are not surprising anymore, because in fact none of these studies claimed that the affected area should be absent in SCI patients; these papers only say that the other body parts change in location or amplitude, which is something the authors did not measure. It is important to make this clear in the text.

      As the reviewer states, the literature is divided regarding the relationship between reorganisation and pain in SCI patients. We did not highlight this clearly enough. To improve clarity and focus our message, we have therefore removed the sentence regarding reorganisation and pain from the Introduction of our revised manuscript. Also taking comments 2.1 and 2.2 into consideration, we have restructured our Introduction.

      We respectfully disagree with the reviewer that our results are not novel or surprising. Whether the full fine-grained hand somatotopy is preserved following a complete motor and sensory loss through tetraplegia has not been considered before. Furthermore, to our knowledge, there is no paper that has inspected the full somatotopic layout in a heterogenous sample of SCI patients and shown that over time since injury, hand somatotopy deteriorates. We indeed cannot make claims regarding the reorganisation in S1 with regards to neighbouring cortical areas activating the hand area, as we have now clarified further in the revised Discussion. We now also clarify in our Discussion that our result does not exclude the possibility of reorganisation occurring simultaneously and that this is a topic for further investigation. As described in the Discussion, it is very possible that reorganisation and preserved somatotopy could co-occur.

      Revised text Discussion:

      “We did not probe body parts other than the hand and could therefore not investigate whether any remapping of other (neighbouring and/or intact) body part representations towards or into the deprived S1 hand cortex may have taken place. Whether reorganisation and preservation of the original function can simultaneously take place within the same cortical area therefore remains a topic for further investigation. It is possible that reorganisation and preservation of the original function could co-occur within cortical areas. Indeed, non-human primate studies demonstrated that remapping observed in S1 actually reflects reorganisation in subcortical areas of the somatosensory pathway, principally the brainstem (Chand and Jain, 2015; Kambi et al., 2014). As such, the deprived S1 area receives reorganised somatosensory inputs upon tactile stimulation of neighbouring intact body parts. This would simultaneously allow the original S1 representation of the deprived body part to be preserved, as observed in our results when we directly probed the deprived S1 hand area through attempted finger movements.”

      • Concept 4: There is yet another more general point on the concept and related hypotheses: Why do the authors assume that immediately after SCI the finger map should disappear? This seems to me the more unlikely hypothesis compared to what the data seem to suggest: preservation and deterioration over time. In my view, there is no biological model that would suggest that a finger map suddenly disappears after input loss. How should this deterioration be mediated? By cellular loss? As already stated above, the finding is therefore much less surprising than the authors argue.

      We did not expect that finger maps would disappear, especially given the case studies using S1 intracortical stimulation in SCI patients and the finding of preserved somatotopy of the missing hand in amputees. We are not sure which part of the manuscript might have caused this misunderstanding.

      With regards to the reviewer’s comment that there are no models to suggest that finger maps would disappear: there is competing research on this, as we now explain in our revised Introduction. Non-human primate research has shown that the S1 hand area becomes largely unresponsive to tactile hand stimulation after an SCI (Jain et al., 2008; Kambi et al., 2014; Liao et al., 2021). The surviving finger-related activity was shown to be disorganised such that a few somatotopically appropriate sites but also other somatotopically nonmatched sites were activated (Liao et al., 2021). These finger areas in S1 became responsive to touch on the face. Furthermore, TMS studies that induce current in localised areas of M1 to elicit a peripheral muscle response in SCI patients have shown that representations of more impaired muscles retract or are absent (Fassett et al., 2018; Freund et al., 2011a; Levy et al., 1990; Streletz et al., 1995; Topka et al., 1991; Urbin et al., 2019). We do not believe that this indicates that the S1 hand somatotopy is lost, but rather that tactile inputs and motor outputs no longer pass the level of injury. Indeed, non-human primate work showing immutable myelin borders between finger representations in S1 post SCI suggests that a latent hand representation may be preserved. Further hints for such preserved somatotopy come from fMRI studies showing net sensorimotor activity during attempted movements with the paralysed body part, intracortical stimulation studies in SCI patients, and preserved somatotopic maps of the missing hand in amputees. We have restructured our Introduction accordingly, also taking into consideration comments 2.1, 2.2, and 2.4.

      • Methods & Results. The authors refer to an analysis that they call "typicality" where they say that they assess how "typical" a finger map is. Given this is not a standard measure, I was wondering how the authors decided what a "typical" finger map is. In fact, there are a few papers published on this issue where the average location of each finger in a large number of subjects is detailed. Rather than referring to this literature, the authors use another dataset from another study of their own that was conducted on n=8 individuals and using 7T MRI (note that in the present study, 3T MRI was used) to define what "typical" is. This approach is not valid. First, this "typical" dataset is not validated for being typical (i.e., it is not compared with standard atlases on hand and finger location); second, it was assessed using a different MRI field strength; third, there were too few subjects to say that this should be a typical dataset; fourth, the group differed from the patients in terms of age and gender (i.e., non-matched group); and fifth, the authors even say that the design was different ("was defined similarly", i.e., not the same). This approach is therefore in my view not valid, particularly given the authors measured age- and gender-matched controls that should be used to compare the maps with the patients. This is a critical point because changes in typicality is the main result of the paper.

      We respectfully disagree with the reviewer that the typicality measure is not standard, invalid, and inaccurate. RSA-based inter-finger overlap patterns have been shown to depict the invariant representational structure of fingers better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). RSA-based inter-finger representation measures have been shown to have more within-subject stability (both within the same session and between sessions that were 6 months apart) and less inter-subject variability (Ejaz et al., 2015) than these other measures of somatotopy. RSA-based measures are furthermore not prone to some of the problems of measurements of finger selectivity (e.g., dependence on map thresholds). Indeed, over the past years RSA has become the gold standard to investigate somatotopy of finger representations both in healthy (e.g., Akselrod et al., 2017; Ariani et al., 2020; Ejaz et al., 2015; Gooijers et al., 2021; Kieliba et al., 2021; Kolasinski et al., 2016; Liu et al., 2021; Sanders et al., 2019) and patient populations (e.g., Dempsey-Jones et al., 2019; Ejaz et al., 2016; Kikkert et al., 2016; Wesselink et al., 2019). Moreover, various papers have been published in eLife and elsewhere that used the same RSA-based typicality criteria to assess plasticity in finger representations (Dempsey-Jones et al., 2019; Ejaz et al., 2015; Kieliba et al., 2021; Wesselink et al., 2019). We now highlight this in the revised Introduction.

      The canonical RDM used in our study has previously been used as a canonical RDM in a 3T study exploring finger somatotopy in amputees (Wesselink et al., 2019) and was made available to us (note that we did not collect these data ourselves). We aimed to use similar measures as in Wesselink et al. (2019) and therefore felt it was most appropriate to use the same canonical RDM. One of the strengths of RSA is that it can be used to quantitatively relate brain activity measures obtained using different modalities, across different species, brain areas, brain and behavioural measures, etc. (Kriegeskorte et al., 2008). As such, the fact that this canonical RDM was constructed based on data collected using 7T fMRI and a digit tapping task should not influence our results. We do, however, agree with the reviewer that it is good to demonstrate that our results would not change when using a canonical RDM based on the average RDM of our age-, sex-, and handedness-matched control group. We therefore recalculated the typicality of all participants using the controls’ average RDM as the canonical RDM. We found a strong and highly significant correlation in typicality scores calculated using the canonical RDM from the independent dataset and the controls’ average RDM (see figure below). This was true for both the patient (rs = 0.92, p < 0.001; red dots) and control groups (rs = 0.78, p < 0.001; grey dots).
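
      To make the typicality computation concrete, here is a minimal sketch under stated assumptions: each RDM is flattened to its 10 unique inter-finger distances (5 fingers give 10 pairs), typicality is the Spearman correlation between a participant's vector and a canonical vector, and an alternative canonical RDM is the element-wise average over controls. All function names and numbers are hypothetical illustrations, not the authors' code or data.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def typicality(participant_rdm, canonical_rdm):
    """Typicality score of one participant's flattened inter-finger RDM."""
    return spearman(participant_rdm, canonical_rdm)


def average_rdm(rdms):
    """Element-wise mean over participants' flattened RDMs."""
    return [sum(column) / len(rdms) for column in zip(*rdms)]


# Hypothetical flattened RDMs (10 finger pairs each), for illustration only.
canonical = [0.9, 1.2, 1.4, 1.5, 0.6, 0.9, 1.1, 0.4, 0.7, 0.5]
controls = [
    [1.0, 1.3, 1.5, 1.6, 0.7, 1.0, 1.2, 0.5, 0.8, 0.6],
    [0.8, 1.1, 1.2, 1.5, 0.5, 0.9, 1.0, 0.3, 0.7, 0.4],
]
patient = [0.7, 1.0, 1.3, 1.2, 0.6, 0.8, 0.9, 0.4, 0.5, 0.5]

score_independent = typicality(patient, canonical)
score_controls = typicality(patient, average_rdm(controls))
```

      Computing both scores for every participant and correlating the two sets of scores mirrors the check reported above. Because Spearman correlation depends only on rank order, two canonical RDMs with the same ordering of finger-pair distances would yield identical typicality scores.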

      We then repeated all analyses using these newly calculated typicality scores. As expected, we found the same results as when using a canonical RDM based on the independent dataset (see below for details). This analysis has been added to the revised Appendix 1 and is referred to in the main manuscript.

      Revised text Introduction:

      “To investigate whether fine-grained hand somatotopy was preserved and could be activated in S1 following SCI, we assessed inter-finger representational distance patterns using representational similarity analysis (RSA). These inter-finger distance patterns are thought to be shaped by daily life experience such that fingers used more frequently together in daily life have lower representational distances (Ejaz et al., 2015). RSA-based inter-finger distance patterns have been shown to depict the invariant representational structure of fingers in S1 and M1 better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). Over the past years RSA has therefore regularly been used to investigate somatotopy of finger representations both in healthy (e.g., Akselrod et al., 2017; Ariani et al., 2020; Ejaz et al., 2015; Gooijers et al., 2021; Kieliba et al., 2021; Kolasinski et al., 2016; Liu et al., 2021; Sanders et al., 2019) and patient populations (e.g., Dempsey-Jones et al., 2019; Ejaz et al., 2016; Kikkert et al., 2016; Wesselink et al., 2019). We closely followed procedures that have previously been used to map preserved and typical somatotopic finger selectivity and inter-finger representational distance patterns of amputees’ missing hands in S1 using volitional phantom finger movements (Kikkert et al., 2016; Wesselink et al., 2019).”

      Revised text Results:

      “This canonical RDM was based on 7T finger movement fMRI data in an independently acquired cohort of healthy controls (n = 8). The S1 hand ROI used to calculate this canonical RDM was defined similarly as in the current study (see Wesselink and Maimon-Mor (2017b) for details). Note that results were unchanged when calculating typicality scores using a canonical RDM based on the averaged RDM of the age-, sex-, and handedness-matched control group tested in this study (see Appendix 1).”

      Revised text Methods:

      “While the traditional traveling wave approach is powerful to uncover the somatotopic finger arrangement, a fuller description of hand representation can be obtained by taking into account the entire fine-grained activity pattern of all fingers. RSA-based inter-finger overlap patterns have been shown to depict the invariant representational structure of fingers better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). RSA-based measures are furthermore not prone to some of the problems of measurements of finger selectivity (e.g., dependence on map thresholds).”

      “Third, we estimated the somatotopic typicality (or normality) of each participant’s RDM by calculating a Spearman correlation with a canonical RDM. We followed previously described procedures for calculating the typicality score (Dempsey-Jones et al., 2019; Ejaz et al., 2015; Kieliba et al., 2021; Wesselink et al., 2019). The canonical RDM was based on 7T finger movement fMRI data in an independently acquired cohort of healthy controls (n = 8). The S1 hand ROI used to calculate this canonical RDM was defined similarly as in the current study (see Wesselink and Maimon-Mor (2017b) for details). Note that results were unchanged when calculating typicality scores using a canonical RDM based on the averaged RDM of the sex-, handedness-, and age-matched control group tested in this study (see Appendix 1).”

      Revised text Appendix 1:

      “Typicality analysis using a canonical RDM based on the controls’ average RDM

      To ensure that our typicality results did not change when using a canonical inter-finger RDM based on the age-, sex-, and handedness-matched subjects tested in this study, we recalculated the typicality scores of all participants using the averaged inter-finger RDM of our control sample as the canonical RDM. We found a strong and highly significant correlation between the typicality scores calculated using the canonical inter-finger RDM from the independent dataset (reported in the main manuscript) and the typicality scores calculated using our controls’ average RDM. This was true for both the SCI patient (rs = 0.92, p < 0.001) and control groups (rs = 0.78, p < 0.001).

      We then repeated all typicality analyses reported in the main manuscript. As expected, using the typicality scores calculated using our controls’ average RDM we found the same results as when using the canonical inter-finger RDM from the independent dataset: There was a significant difference in typicality between SCI patients, healthy controls, and congenital one-handers (H(2)=27.61, p < 0.001). We further found significantly higher typicality in controls compared to congenital one-handers (U=0, p < 0.001; BF10=76.11). Importantly, the typicality scores of the SCI patients were significantly higher than those of the congenital one-handers (U=2, p < 0.001; BF10=50.98), but not significantly different from those of the controls (U=94, p=0.24; BF10=0.55). Number of years since SCI significantly correlated with hand representation typicality (rs=-0.54, p=0.05) and patients with more retained GRASSP motor function of the tested upper limb had more typical hand representations in S1 (rs=0.58, p=0.03). There was no significant correlation between S1 hand representation typicality and GRASSP sensory function of the tested upper limb, spared midsagittal spinal tissue bridges at the lesion level, or cross-sectional spinal cord area (rs=0.40, p=0.15, rs=0.50, p=0.10, and rs=0.48, p=0.08, respectively). An exploratory stepwise linear regression analysis revealed that years since SCI significantly predicted hand representation typicality in S1 with R²=0.33 (F(1,10)=4.98, p=0.05). Motor function, sensory function, spared midsagittal spinal tissue bridges at the lesion level, and spinal cord area did not significantly add to the prediction (t=1.31, p=0.22, t=1.62, p=0.14, t=1.70, p=0.12, and t=1.09, p=0.30, respectively).”

      • Methods & Results: The authors make a few unproven claims, such as saying "generally, the position, order of finger preference, and extent of the hand maps were qualitatively similar between patients and control". There are no data to support these claims.

      As the sentence itself indicated, this claim was based on a qualitative inspection of the finger maps in Figure 2, and we indeed do not support it with a quantitative analysis. We have therefore removed this sentence from the revised manuscript and instead state, as per the suggestion of reviewer 1, that overall there were aspects of somatotopic finger selectivity in the SCI patients’ hand maps.

      Revised text Results:

      “Overall, we found aspects of somatotopic finger selectivity in the maps of SCI patients’ hands, in which neighbouring clusters showed selectivity for neighbouring fingers in contralateral S1, similar to those observed in eighteen age-, sex-, and handedness-matched healthy controls (see Figure 2A&B). A characteristic hand map shows a gradient of finger preference, progressing from the thumb (red, laterally) to the little finger (pink, medially). Notably, a characteristic hand map was even found in a patient who suffered complete paralysis and sensory deprivation of the hands (Figure 2, patient map 1; patient S01). Despite most maps (Figure 2, except patient map 3) displaying aspects of characteristic finger selectivity, some finger representations were not visible in the thresholded patient and control maps.”

      • Methods & Results: The authors argue that the map architecture is topographic as soon as the dissimilarity between two different fingers is above 0. First, what I am really wondering about is why the authors do not provide the exact dissimilarity values in the text but only give the stats for the difference to 0 (t-value, p-value, Bayes factor). Were the dissimilarity values perhaps very low? The values should be reported. Also, when this argument that maps are topographic as long as the value of two different fingers is above 0 should hold, then the authors have to show that the value for mapping the SAME finger is indeed 0. Otherwise, this argument is not convincing.

      We would like to clarify that a representation is not per se topographic when the RSA dissimilarity is > 0. The dissimilarity value provided by RSA indicates the extent to which a pair of conditions is distinguished; it can be viewed as encapsulating the information content carried by the region (Kriegeskorte et al., 2008). Due to cross-validation across runs, the expected distance value would be zero (although individual estimates can fall below 0) if two conditions’ activity patterns are not statistically different from each other, and larger than zero if there is differentiation between the conditions (fingers’ activity patterns in the S1 hand area in our case; Kriegeskorte et al., 2008; Nili et al., 2014). The diagonal of the RDM reflects comparisons between the same fingers, i.e., distances between the exact same activity pattern in the same run, and its values are thus 0 by definition (Kriegeskorte et al., 2008; Nili et al., 2014). This was also the case in our individual participant RDMs. Since this is not a meaningful value (a distance between 2 identical activity patterns will always be 0), we chose not to report it. We have clarified the meaning of the separability measure in the revised Methods section.
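
      The behaviour of a cross-validated distance can be illustrated with a small simulation. This is a sketch only: it uses a cross-validated squared Euclidean distance across two runs and Gaussian noise, which is a simplifying assumption and not necessarily the exact distance measure or noise model used in the study. All names and parameters are hypothetical.

```python
import random


def crossval_distance(a_run1, b_run1, a_run2, b_run2):
    """Cross-validated squared Euclidean distance between conditions A and B.

    The A-minus-B pattern difference from run 1 is multiplied with the
    difference from run 2. Noise is independent across runs, so it averages
    out to 0 instead of inflating the distance.
    """
    diff1 = [a - b for a, b in zip(a_run1, b_run1)]
    diff2 = [a - b for a, b in zip(a_run2, b_run2)]
    return sum(d1 * d2 for d1, d2 in zip(diff1, diff2)) / len(diff1)


def noisy(pattern, rng, sd=1.0):
    """One noisy measurement of a true activity pattern."""
    return [p + rng.gauss(0.0, sd) for p in pattern]


def simulate(true_a, true_b, n_sims=2000, seed=0):
    """Repeatedly measure both conditions in two runs and return the distances."""
    rng = random.Random(seed)
    return [
        crossval_distance(
            noisy(true_a, rng), noisy(true_b, rng),
            noisy(true_a, rng), noisy(true_b, rng),
        )
        for _ in range(n_sims)
    ]


# Identical true patterns: estimates scatter around 0 and are often negative.
same = simulate([0.0] * 20, [0.0] * 20)
# Distinct true patterns: estimates scatter around the true squared distance.
different = simulate([0.0] * 20, [1.0] * 20)

mean_same = sum(same) / len(same)
mean_different = sum(different) / len(different)
```

      In this toy setting the average distance for identical patterns is close to 0 with many negative single estimates, whereas distinct patterns give an average close to their true distance. This is the property that makes a test against 0 meaningful.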

      To investigate whether a representation is somatotopic, we have to take into account the full fine-grained inter-finger distance pattern. The full fine-grained inter-finger distance pattern is related to everyday use of our hand and has been shown to depict the invariant representational structure of fingers better than the size, shape, and exact location of the areas activated by finger movements (Ejaz et al., 2015). To determine whether a participant’s inter-finger distance pattern is somatotopic, one should compare it with a canonical RDM, which is done in the typicality analysis (see also our response to comment 2.6).

      One way to demonstrate the validity of an ROI is to run RSA on a control ROI in which one would not expect to find activity that distinguishes between finger conditions. Rather than comparing the separability measure against 0, one can then compare the separability of the ROI that is expected to contain this information with that of the control ROI. We created a cerebral spinal fluid (CSF) ROI, repeated our RSA analysis in this ROI, and then compared the separability of the CSF and S1 hand area ROIs. As expected, there was a significant difference between separability (or representation strength) in the S1 hand area and CSF ROIs for both controls (W=171, p < 0.001; BF10=4059) and patients (W=105, p < 0.001; BF10=279). This analysis has been added to the revised manuscript.
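
      As a reading aid for the W statistics above, the sketch below computes a paired signed-rank statistic. It assumes that W denotes the sum of the ranks of the positive S1-minus-CSF differences; this is one common convention, and some software reports the smaller of the two rank sums instead. Under this assumed convention, the largest possible value for n pairs is n(n+1)/2, which is 171 for 18 controls and 105 for 14 patients. The function names are hypothetical.

```python
def signed_rank_w(x, y):
    """Sum of the ranks of the positive paired differences x - y.

    Zero differences are dropped; tied absolute differences share their
    mean rank.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    w = 0.0
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            if diffs[order[k]] > 0:
                w += mean_rank
        i = j + 1
    return w


def max_w(n):
    """Largest possible W for n pairs: every difference is positive."""
    return n * (n + 1) // 2
```

      Read this way, the reported values would be consistent with every participant in both groups showing greater separability in the S1 hand ROI than in the CSF ROI. This interpretation depends on the assumed convention for W.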

      Individual participant separability values (i.e., distances averaged across fingers) are visualised in Figure 3D. Following the reviewer’s suggestion, we have included individual participant inter-finger distance plots for both the controls and SCI patients as Figure 3- Figure supplement 2 and Figure 3- Figure supplement 3, respectively. The inter-finger distances for each finger pair and subject can be extracted from these plots. We feel this is more readable and interpretable than a table containing the 10 inter-finger distance scores for all 32 participants. These values have instead been made available online, together with our other data, at https://osf.io/e8u95/.

      Revised text Methods:

      “If there is no information in the ROI that can statistically distinguish between the finger conditions, then due to cross-validation the expected distance measure would be 0. If there is differentiation between the finger conditions, the separability would be larger than 0 (Nili et al., 2014). Note that this does not directly indicate that this region contains topographic information, but rather that this ROI contains information that can distinguish between the finger conditions. To further ensure that our S1 hand ROI was activated distinctly for different fingers, we created a cerebral spinal fluid (CSF) ROI that would not contain finger-specific information. We then repeated our RSA analysis in this ROI and statistically compared the separability of the CSF and S1 hand area ROIs.”

      Revised text Results:

      “We found that inter-finger separability in the S1 hand area was greater than 0 for patients (t(13) = 9.83, p < 0.001; BF10 = 6.77e+4) and controls (t(17) = 11.70, p < 0.001; BF10 = 6.92e+6), indicating that the S1 hand area in both groups contained information about individuated finger representations. Furthermore, for both controls (W = 171, p < 0.001; BF10 = 4059) and patients (W = 105, p < 0.001; BF10 = 279) there was significantly greater separability (or representation strength) in the S1 hand area than in a control cerebral spinal fluid ROI that would not be expected to contain finger-specific information. We did not find a significant group difference in inter-finger separability of the S1 hand area (t(30) = 1.52, p = 0.14; BF10 = 0.81), with the BF showing anecdotal evidence in favour of the null hypothesis.”

      • Discussion. The authors argue that spared midsagittal spinal tissue bridges are not necessary because they were not predictive of hand representation typicality. First, the measure of typicality is questionable and should not be used to make general claims about the importance of structural differences. Second, given there were only n=14 patients included, one may question generally whether predictive modelling can be done with these data. This statement should therefore be removed.

      We would like to clarify that, like the reviewer, we do not believe that spared midsagittal spinal tissue bridges are unimportant. Indeed, a large body of our own research focuses on the importance of spared spinal tissue bridges in recovery of sensorimotor function and pain (Huber et al., 2017; Pfyffer et al., 2021, 2019; Vallotton et al., 2019). We have added a clarification sentence regarding the importance of tissue bridges with regards to recovery of function. We agree with the reviewer that given our limited sample size, it is difficult to make conclusive claims based on non-significant predictive modelling and correlational results. In the revised manuscript we therefore focus this statement (i.e., that sensory and motor hand function and tissue bridges are not necessary to preserve hand somatotopy) on our finding that two patients without spared tissue bridges at the lesion level and with complete or near complete loss of sensory and motor hand function had a highly typical hand representation. We present our predictive modelling results as being in line with this notion and added a word of caution that it is challenging to draw definite conclusions from non-significant predictive modelling and correlation results in such a limited sample size.

      With regards to the reviewer’s concern about the validity of the typicality measure – please see our detailed response to comment 2.6.

      Revised text Discussion:

      “Spared spinal cord tissue bridges can be found in most patients with a clinically incomplete injury, their width being predictive of electrophysiological information flow, recovery of sensorimotor function, and neuropathic pain (Huber et al., 2017; Pfyffer et al., 2021, 2019; Vallotton et al., 2019). However, in this study, spared midsagittal spinal tissue bridges at the lesion level and sensorimotor hand function did not seem necessary to maintain and activate a somatotopic hand representation in S1. We found a highly typical hand representation in two patients (S01 and S03) who did not have any spared spinal tissue bridges at the lesion level, a complete (S01) or near complete (S03) hand paralysis, and a complete (S01) or near complete loss (S03) of hand sensory function. Our predictive modelling results were in line with this notion and showed that these behavioural and structural spinal cord determinants were not predictive of hand representation typicality. Note however that our sample size was limited, and it is challenging to draw definite conclusions from non-significant predictive modelling results.”

      • Discussion. The authors say that hand representation is "preserved" in SCI patients. Perhaps it is better to be precise and to say that they are active during movement planning.

      We thank the reviewer for their suggestion and revised the Discussion accordingly.

      Revised text Discussion:

      "In this study we investigated whether hand somatotopy is preserved and can be activated through attempted movements following tetraplegia."

      "How may these representations be preserved over time and activated through attempted movements in the absence of peripheral information?"

      "Together, our findings indicate that in the first years after a tetraplegia, the somatotopic S1 hand representation is preserved and can be activated through attempted movements even in the absence of retained sensory function, motor function, and spared spinal tissue bridges."

      Reviewer #3:

      The demonstration that cortex associated with an amputated limb can be activated by other body parts after amputation has been interpreted as evidence that the deafferented cortex "reorganizes" and assumes a new function. However, other studies suggest that the somatotopic organization of somatosensory cortex in amputees is relatively spared, even when probed long after amputation. One possibility is that the stability is due to residual peripheral input. In this study, Kikkert et al. examine the somatotopic organization of somatosensory cortex in patients whose spinal cord injury has led to tetraplegia. They find that the somatotopic organization of the hand representation of somatosensory cortex is relatively spared in these patients. Surprisingly, the amount of spared sensorimotor function is a poor predictor of the stability of the patients' hand somatotopy. Nonetheless, the hand representation deteriorates over decades after the injury. These findings contribute to a developing story on how sensory representations are formed and maintained and provide a counterpoint to extreme interpretations of the "reorganization" hypothesis mentioned above. Furthermore, the stability of body maps in somatosensory cortex after spinal cord injury has implications for the development of brain-machine interfaces.

      I have only minor comments:

      1) Given the controversy in the field, the use of the phrase "take over the deprived territory" (line 45) is muddled. Perhaps a more nuanced exposition of this phenomenon is in order?

      We agree a more nuanced expression would be more appropriate. We have changed this sentence accordingly in the revised manuscript.

      Revised text Introduction:

      “Seminal research in nonhuman primate models of SCI has shown that this leads to extensive cortical reorganisation, such that tactile stimulation of cortically adjacent body parts (e.g. of the face) activated the deprived brain territory (e.g. of the hand; Halder et al., 2018; Jain et al., 2008; Kambi et al., 2014).”

      2) The statement that "results are mixed" regarding intracortical microstimulation of S1 is dubious. In only one case has the hand representation been mislocalized, out of many cases (several at CalTech, 3 at the University of Pittsburgh, one at Case Western, one at Hopkins/APL, and one at UChicago). Perhaps rephrase to "with one exception?"

      We agree that this sentence may give a wrong outlook on the literature and have changed the text per the reviewer’s suggestion.

      Revised text Introduction:

      “Case studies using intracortical stimulation in the S1 hand area to elicit finger sensations in SCI patients hint at such preserved somatotopic representations (Fifer et al., 2020; Flesher et al., 2016), with one exception (Armenta Salas et al., 2018).”

      3) The phrase "tetraplegic spinal cord injury" seems awkward.

      Thank you for highlighting this to us. We have corrected these instances in our revised manuscript to “tetraplegia”.

      4) The stability of the representation is attributed to efference copy from M1. While this is a fine speculation, somatosensory cortex is part of a circuit and is interconnected with many other brain areas, M1 being one. Perhaps the stability is maintained due to the position of somatosensory cortex within this circuit, and not solely by its relationship with M1? There seems to be an overemphasis of this hypothesis at the exclusion of others.

      Thank you for this comment. We agree we overemphasized the efference copy theory. We have adjusted this and now provide a more balanced description of potential circuits and interconnections that could maintain somatotopic representations after tetraplegia.

    1. Author Response:

      Reviewer #3:

      The authors modified a previously reported hybrid cytochrome bcc-aa3 supercomplex, consisting of bcc from M. tuberculosis and aa3 from M. smegmatis, (Kim et al 2015) by appending an affinity tag facilitating purification. The cryo-EM experiments are based on the authors' earlier work (Gong et al. 2018) on the structure of the bcc-aa3 supercomplex from M. smegmatis. The authors then determine the structure of the bcc part alone and in complex with Q203 and TB47.

      The manuscript is well written and the obtained results are presented in a concise, clear-cut manner. In general, the data support the conclusions drawn.

      We thank the reviewer for this evaluation.

      To this reviewer, the following points are unclear:

      1. The purified enzyme elutes from the gel filtration column as one peak, but there seems to be no information given on the subunit composition and the enzymatic activity of the purified hybrid cytochrome bcc-aa3 supercomplex.

      See answers to Question 1 from the major Essential Revisions and Question 1 from the minor Essential Revisions.

      "We have now shown that the purified chimeric supercomplex is a functional assembly with a (mean ± s.d., n = 4), in agreement with the previous study that shows M. tuberculosis CIII can functionally complement native M. smegmatis CIII and maintain the growth of M. smegmatis (Kim et al., 2015). The in vitro inhibition of this enzyme by Q203 and TB47 was determined by means of a DMNQH2/oxygen oxidoreductase activity assay. In the assay, 500 nM Q203 or TB47 was chosen, which is close to the median inhibitory concentration (IC50) obtained from the menadiol-induced oxygen consumption in our previous study (Gong et al., 2018). After addition of Q203 and TB47, the turnover numbers of the hybrid supercomplex are reduced to 5.8 ± 2.4 e⁻ s⁻¹ (Figure 4-figure supplement 4) and 5.1 ± 2.9 e⁻ s⁻¹ (Figure 5-figure supplement 4) respectively, from 23.3 ± 2.4 e⁻ s⁻¹. We have incorporated this new data into the text (lines 90-93, 187-189, 206-209)."

      "The subunit composition of the purified enzyme has now been provided in Figure 2-figure supplement 1."

      1. It is unclear what the conclusion of the structure comparison (Fig 6) is regarding the affinity of Q203 for M. smegmatis.

      The structural comparison indicates that Q203 should have a similar binding mechanism and a similar effect on the activity of cytochrome bcc from M. smegmatis and M. tuberculosis. This is in good agreement with previous antimycobacterial activity data and inhibition data for the bcc complexes from M. smegmatis and M. tuberculosis (Gong et al., 2018; Lu et al., 2018a). These points have now been incorporated into the revised manuscript (lines 223-227).
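
      As a rough illustration of what the turnover numbers quoted in the response to the first point imply, the residual activity after inhibitor addition can be computed from the reported means alone. This is a minimal sketch and not part of the authors' analysis: the function name `residual_activity` is hypothetical, and the reported standard deviations are ignored.

```python
# Mean turnover numbers (electrons per second) quoted in the response above.
UNINHIBITED = 23.3
WITH_Q203 = 5.8
WITH_TB47 = 5.1


def residual_activity(inhibited: float, uninhibited: float) -> float:
    """Return the activity remaining after inhibition, as a percentage."""
    if uninhibited <= 0:
        raise ValueError("uninhibited turnover must be positive")
    return 100.0 * inhibited / uninhibited


for name, value in (("Q203", WITH_Q203), ("TB47", WITH_TB47)):
    remaining = residual_activity(value, UNINHIBITED)
    print(f"{name}: {remaining:.1f}% residual, {100 - remaining:.1f}% inhibited")
```

      On the reported means, roughly a quarter of the uninhibited activity remains with Q203 and slightly less with TB47; a proper comparison would also propagate the standard deviations.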

    1. Author Response:

      Reviewer #1:

      In this manuscript Hill et al, analyze immune responses to vaccination of adults with the seasonal influenza vaccine. They perform a detailed analysis of the hemagglutinin-specific binding antibody responses against several different strains of influenza, and antigen-specific CD4+ T cells/T follicular cells, and cytokines in the plasma. Their analysis reveals that: (i) tetramer positive, HA-specific T follicular cells induced 7 days post vaccination correlate with the binding Ab response measured 42 days later; (ii) the HA-specific T fh have a diverse TCR repertoire; (iii) Impaired differentiation of HA-specific T fh in the elderly; and (iv) identification of an "inflammatory" gene signature within T fh in the elderly, which is associated with the impaired development of HA-specific Tfh.

      The paper addresses a topic of considerable interest in the fields of human immunology and vaccinology. In general the experiments appear well performed, and support the conclusions. However, the following points should be addressed to enhance the clarity of the paper, and add support to the key conclusions drawn.

      We thank the reviewer for their supportive evaluation of the manuscript, and have provided the details of how we have addressed each of the points raised below.

      1) Abstract: "(cTfh) cells are the best predictor of high titre antibody responses.." Since the authors have not done any blind prediction using machine learning tools with an independent cohort, the sentence should be rephrased thus: "(cTfh) cells were associated with high titre antibody responses."

      We agree that this phrasing better reflects the presented data. The sentence in the abstract (page 2) now reads “we show that formation of circulating T follicular helper (cTfh) cells was associated with high titre antibody responses.”

      2) Figure 1A: Please indicate the age range of the subjects.

      Figure 1 has been updated to include the age range of the subjects.

      3) Almost all the data in the paper shows binding Ab titers. Yet, typically HAI titers or MN titers are used to assess Ab responses to influenza. Fig 1C shows HAI titers against the H1N1 Cal 09 strain. Can the authors show HAI titers for Cal 09 and the other A and B strains contained in the 2 vaccine cohorts? Do such HAI titers correlate with the tetramer positive cells, similar to the correlations shown in Fig 2e?

      In this manuscript we have deliberately focussed on the immune response to the H1N1 Cal09 strain, as it is the only influenza strain in the vaccine common to both cohorts. The HAI titre for this strain is now shown as supplementary figure 4. In addition, the class II tetramers were specifically selected to recognise unique epitopes in the Cal 09 strain (J. Yang, {..} W. W. Kwok, CD4+ T cells recognize unique and conserved 2009 H1N1 influenza hemagglutinin epitopes after natural infection and vaccination. Int Immunol 25, 447-457, 2013). Because of this, we do not think it is appropriate to correlate HAI titres for the non-Cal 09 strains with tetramer positive cells. We agree that showing the correlation of cTfh and other immune parameters with the HAI titres for Cal 09 is important and have included this as supplementary figure 7. The new data and text are presented below:

      Figure 1-figure supplement 4: HAI responses before and after vaccination A) Log2 HAI titres at baseline (d0), d7 and d42 for cohort 1 (n=16) and B) cohort 2 (n = 21). C) Correlation between HAI and A.Cali09 IgG as measured by Luminex assay for cohort 1 and 2 combined. p-values determined using paired Wilcoxon signed rank-test, and Pearson’s correlation.

      Text changes. Page 4. “The increase in anti-HA antibody titre was coupled with an increase in hemagglutination inhibitory antibodies to A.Cali09, the one influenza A strain contained in the TIVs that was shared across the two cohorts and showed a positive correlation with the A.Cali09 IgG titres measured by Luminex assay (Fig. 1C, Figure 1-figure supplement 4).”

      Figure 2-figure supplement 1: Correlations between HAI assay titres and selected immune parameters. Correlation between vaccine-induced A.Cali09 HAI titres at d42 with selected immune parameters in both Cohort 1 and Cohort 2 (n=37). Dot color corresponds to the cohort (black = Cohort 1, grey = Cohort 2). Coefficient (Rho) and p-value determined using Spearman’s correlation, and line represents linear regression fit.

      Results text changes: Page 5. “Similar trends were seen when these immune parameters were correlated to HAI titres against A/Cali09 (Figure 2-figure supplement 1).”

      4) Fig 2d to i: what % of all bulk activated Tfh at day 7 are tetramer positive? The tetramer positive T cells constitute roughly 0.094% of all CD4 T cells (Fig 2d), of which 1/3rd are CXCR5+, PD1+ (i.e. ~0.03% of CD4 T cells). What fraction of all activated Tfh is this subset of tetramer positive cells? Presumably, there will also be Tfh generated against other viral proteins in the vaccine, and these will constitute a significant fraction of all activated Tfh.

      This is an important point, as the tetramers only recognise one peptide epitope of the Cal.09 HA protein, so there will be many other influenza reactive CD4+ T cells that are responding to other Cal 09 epitopes as well as other proteins in the vaccine. The analysis suggested by the reviewer shows that the frequency of Tet+ cells amongst bulk cTfh cells ranges from 0.14%-1.52% in cohort 1, and from 0.022%-2.7% in cohort 2. These data have been included as Figure 1-figure supplement 6C, D in the revised manuscript. In addition, Tet+ cells as a percentage of bulk cTfh cells were reduced in older people compared to younger adults. These data have been included in Figure 5-figure supplement 1C in the revised manuscript.
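
      The reviewer's back-of-the-envelope figure, and the conversion it asks for, can be written out as a short worked calculation. This is only a sketch: the helper names are hypothetical, and the bulk cTfh frequency used in the second step is an invented placeholder, not a value from the manuscript.

```python
def percent_of(parent_percent: float, fraction: float) -> float:
    """Scale a percentage by the fraction of that population carrying a marker."""
    return parent_percent * fraction


def share_of_bulk(subset_of_cd4: float, bulk_of_cd4: float) -> float:
    """Express a subset (% of CD4+) as a percentage of a bulk population (% of CD4+)."""
    if bulk_of_cd4 <= 0:
        raise ValueError("bulk population must be a positive percentage")
    return 100.0 * subset_of_cd4 / bulk_of_cd4


# Reviewer's numbers: 0.094% of CD4+ T cells are Tet+, about one third of
# which are CXCR5+PD-1+.
tet_of_cd4 = 0.094
tet_tfh_of_cd4 = percent_of(tet_of_cd4, 1 / 3)
print(f"Tet+ CXCR5+PD-1+ cells: {tet_tfh_of_cd4:.3f}% of CD4+ T cells")

# Hypothetical bulk cTfh frequency (% of CD4+), for illustration only.
bulk_tfh_of_cd4 = 3.0
print(f"Share of bulk cTfh: {share_of_bulk(tet_tfh_of_cd4, bulk_tfh_of_cd4):.2f}%")
```

      The first step reproduces the reviewer's estimate of about 0.03% of CD4+ T cells; the second step shows why the answer depends on each donor's bulk cTfh frequency, which is what the per-donor ranges reported above capture.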

      Figure 1-figure supplement 6: Percentage of cTfh cells that are Tet+ and CXCR3 and CCR6 expression on HA-specific CD4+ T cells. A) Representative flow cytometry gating strategy for CXCR5+PD-1+ cTfh cells on CD4+CD45RA- T cells, and the proportion of HA-specific Tet+ cells within the CXCR5+PD-1+ cTfh cell gate. B) Percentage Tet+ cells within the CXCR5+PD-1+ cTfh cell population. Within-cohort age group differences were determined using the Mann-Whitney U test.

      Results text, page 4: These antigen-specific T cells had upregulated ICOS after immunisation, indicating that they have been activated by vaccination (Fig. 1F, G). In addition, a median of one third of HA-specific T cells upregulated the Tfh markers CXCR5 and PD1 on d7 after immunisation (Fig. 1H, I). The tetramer binding cells represented between 0.022-2.7% of the total CXCR5+PD-1+ bulk population (Figure 1-figure supplement 6A, B).

      Figure 5-figure supplement 1C: Age-related differences in cytokines and HA-specific CD4+ T cell parameters. C) Percentage Tet+ cells within the CXCR5+PD-1+ cTfh cell population. Within-cohort age group differences were determined using the Mann-Whitney U test.

      Results text, page 8: Across both cohorts, the only CD4+ T cell parameters consistently reduced in older individuals at d7 were the frequency of polyclonal cTfh cells and HA-specific Tet+ cTfh cells, with the strongest effect within the antigen-specific cTfh cell compartment (Fig. 5H-J, Figure 5-figure supplement 1C).

      Reviewer #2:

      Hill and colleagues present a comprehensive dataset describing the recall and expansion of HA-specific cTFH cells following influenza immunisation in two cohorts. Using class II tetramers, IgG titres against a large panel of HA antigens, and quantification of plasma cytokines, they find that activated and HA-specific cTFH cells were a strong predictor of the IgG response against the vaccine after 6 weeks. Using RNAseq and TCR clonotype analysis, they find that, in 10/15 individuals, the HA-specific cTFH response at day 7 post-vaccination is recalled from the available CD4 T cell memory pool present prior to vaccination. Post-vaccination HA-specific cTFH cells exhibited a transcriptional profile consistent with lymph node-derived GC TFH, as well as evidence of downregulation of IL-2 signaling pathways relative to pre-vaccine CD4 memory cells.

      The authors then apply these findings to a comparison of vaccine immunogenicity between younger (18-36) and older (>65) adults. As expected, they found lower levels of vaccine-specific IgG responses among the older cohort. Analysis of HA-specific T cell responses indicated that tet+ cTFH fail to properly develop in the older cohort following vaccination. Further analysis suggests that development of HA-specific cTFH in older individuals is not caused by a lack of TCR diversity, but is associated with higher expression of inflammation-associated transcripts in tet+ cTFH.

      Overall this is an impressive study that provides clarity around the recall of HA-specific CD4 T cell memory, and the burst of HA-specific cTFH cells observed 7 days post-vaccination. The association between defective cTFH recall and lower IgG titres post-vaccination in older individuals provides new targets for improving influenza vaccine efficacy in this age group. However, as currently presented, the model of impaired cTFH differentiation in the older cohort and the link to inflammation is somewhat unclear. There are several issues that could be clarified to improve the manuscript in its current form:

      We thank the reviewer for their supportive and comprehensive summary of our work. We agree that the link between inflammation and impaired cTfh differentiation is correlative. We have added new data to address this, including mechanistic data to support chronic IL-2 signalling as antagonistic to cTfh development, and have provided new analyses to address the other points raised.

      1) It is somewhat unclear the extent to which the reduction in HA-specific cTFH in the older cohort is also related to an overall reduction in T cell expansion - cohort 1 shows a significant reduction in total tet+ CD4 T cells post-vaccination as well as in the cTFH compartment, and while this difference may not reach statistical significance, a similar trend is shown for cohort 2.

      We agree that a possible interpretation is a global failure in T cell expansion in the older individuals. To determine whether there is a relationship between the degree of Tet+ CD4+ T cell expansion and cTfh cell differentiation with age, we performed correlation analyses. There is no correlation between the expansion of Tet+ cells and the frequency of cTfh cells formed seven days after immunisation in either age group. This suggests that the impaired cTfh cell differentiation in older persons is most likely caused by factors other than the capacity of CD4+ T cells to expand after vaccination. These data have been added as Figure 5-figure supplement 1D, and included in the results text on page 8.

      Figure 5-figure supplement 1D: Age-related differences in cytokines and HA-specific CD4+ T cell parameters. D) Correlation between Tet+ cells (d7-d0, % of CD4+) and cTfh (d7-d0, % of Tet+) in both cohorts for each age group (18-36 y.o. n=37, 65+ y.o. n=39). Dot color corresponds to the cohort (black = Cohort 1, grey = Cohort 2). Coefficient (Rho) and p-value determined using Spearman’s correlation, and line represents linear regression fit.
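
      The statistics named in this legend (Spearman's rho plus a least-squares line) can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the authors' analysis script: all function names are hypothetical, the example values are invented, and the p-value calculation is omitted.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def least_squares_line(x, y):
    """Return (slope, intercept) of the ordinary least-squares fit of y on x."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return slope, my - slope * mx


# Invented example values (NOT data from the study).
tet_expansion = [0.02, 0.05, 0.01, 0.08, 0.04]  # d7-d0, % of CD4+
ctfh_change = [10.0, 25.0, 5.0, 30.0, 12.0]     # d7-d0, % of Tet+
print("rho =", spearman_rho(tet_expansion, ctfh_change))
print("slope, intercept =", least_squares_line(tet_expansion, ctfh_change))
```

      Because rho is computed on ranks, it is insensitive to the scale of either axis, whereas the fitted line is drawn on the raw values; the two can therefore tell slightly different stories in the same panel.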

      Text changes, Page 8: There was no consistent difference in the total d7 Tet+ HA-specific T cell population with age for both cohorts (Fig. 5H) and we observed no age-related correlation between the ability of an individual to differentiate Tet+ cells into a cTfh cell and the overall expansion of Tet+ HA-specific T cell population (Figure 5-figure supplement 1D). Thus, our data suggest that the poor vaccine antibody responses in older individuals are impacted by impaired cTfh cell differentiation (Fig. 5J) rather than the size of the vaccine-specific CD4+ T cell pool.

      2) Transcriptomic analysis indicates that HA-specific cTFH in the older cohort show impaired downregulation of inflammation, TNF and IL-2-related signaling pathways. The authors therefore conclude that excess inflammation can limit the response to vaccination. In its current presentation, the data does not necessarily support this conclusion. While it is clear that downregulation of TNF and IL-2 signalling pathways occur during cTFH/TFH differentiation, there is no evidence presented to support the idea that (a) vaccination results in increased pro-inflammatory cytokine production in lymphoid organs in older individuals or that (b) these pro-inflammatory cytokines actively promote CXCR5-, rather than cTFH, differentiation of existing memory T cells.

      We agree with the reviewer that the data presented in figure 7 are correlative, rather than causative. Unfortunately, we do not have access to secondary lymphoid tissues from younger and older people after vaccination to test point (a) above. In order to test the hypothesis that increased inflammatory cytokine production in lymphoid organs limits Tfh cell differentiation, we have used Il2cre/+; Rosa26stop-flox-Il2/+ transgenic mice. In this mouse model, IL-2-dependent cre-recombinase activity facilitates the expression of low levels of IL-2 in cells that have previously expressed IL-2. This creates a scenario in which cells that physiologically express IL-2 cannot turn its expression off, therefore increasing expression of IL-2 after antigenic stimulation (mice reported in Whyte et al., bioRxiv, 2020, doi: https://doi.org/10.1101/2020.12.18.423431).

      Twelve days after influenza A infection, Il2cre/+; Rosa26stop-flox-Il2/+ transgenic mice have fewer Tfh cells in the draining mediastinal lymph node and in the spleen (Fig. 8A-C); this is accompanied by a reduction in the magnitude of the GC B cell response (Fig. 8D-E). These data provide a proof of concept that sustained IL-2 production limits the formation of Tfh cells, consistent with the negative correlation of an IL-2 signalling gene signature and cTfh cell formation in humans (Figure 7). These new data support the conclusion that excess IL-2 signalling can limit the Tfh cell response. These data are presented in Figure 8, and are discussed on page 12 in the results, and pages 12-13 in the discussion.

      Figure 8: Increased IL-2 production impairs Tfh cell formation and the germinal centre response. Assessment of the Tfh cell and germinal centre response in Il2cre/+; Rosa26stop-flox-Il2/+ transgenic mice that do not switch off IL-2 production, and Il2cre/+; Rosa26+/+ control mice 12 days after influenza A infection. Flow cytometric contour plots (A) and quantification of the percentage of CXCR5highPD-1highFoxp3-CD4+ Tfh cells in the mediastinal lymph node (B) and spleen (C). Flow cytometric contour plots (D) and quantification of the percentage of Bcl6+Ki67+B220+ germinal centre B cells in the mediastinal lymph node (E) and spleen (F). The height of the bars indicates the median, each symbol represents one mouse, data are pooled from two independent experiments. P-values calculated between genotype-groups by Mann Whitney U test.

      Results text, page 12: Sustained IL-2 production inhibits Tfh cell frequency and the germinal centre response. To test the hypothesis that cytokine signalling needs to be curtailed to facilitate Tfh cell differentiation, we turned to a genetically modified mouse model in which cells that have initiated IL-2 production cannot switch it off, Il2cre/+; Rosa26stop-flox-Il2/+ mice (37). Twelve days after influenza infection, Il2cre/+; Rosa26stop-flox-Il2/+ mice have fewer Tfh cells in the draining lymph node and spleen (Fig. 8A-C), which is associated with a reduced frequency of germinal center B cells (Fig. 8D-F). This provides a proof of concept that proinflammatory cytokine production needs to be limited to enable full Tfh cell differentiation in secondary lymphoid organs.

      Discussion text, pages 12, 13: These enhanced inflammatory signatures associated with poor antibody titre in an independent cohort of influenza vaccinees. The dampening of Tfh cell formation by enhanced cytokine production was confirmed by the use of genetically modified mice where IL-2 production is restricted to the appropriate anatomical and cellular compartments, but once initiated cannot be inactivated. Together, this suggests that formation of antigen-specific Tfh cells is essential for high titre antibody responses, and that excessive inflammatory factors can contribute to poor cTfh cell responses.

    1. Author Response:

      Reviewer #1:

      The manuscript is well-written and easy to follow. The authors are thorough in their characterization, shown both through the text itself and the figures. Most of the comments relate to the narrative structure itself and are merely suggestions. Overall, this work represents an important resource for the community and especially to people working on the role of the SEZ in feeding and motor behaviors.

      Specific comments and suggestions:

      • The authors give a very nice overview of the SEZ and the split-Gal4 technique. However, they spend much less time discussing the rationale behind using the cell body numbers within subesophageal neuromeres. This to me assumes two extremely different kinds of readers, one relatively new to Drosophila research and the other relatively well-versed. Since this technique is crucial to the approach used throughout the manuscript and significant in the authors labeling about 1/3 of the region, I would suggest the authors to give a brief summary and justification as to why they decided to use this neuromere labeling technique, and spend more time in the discussion (perhaps in the paragraphs between lines 352-386) talking about the pros and cons of this technique (is it expected to label fewer than 50% of the neurons? How may this complement the EM and FAFB dataset, and what are the advantages and disadvantages using the technique employed here?).

      We now provide a brief introduction to the approach in the results section (lines 82-96) and include additional pros and cons of the approach in the discussion (lines 369-383). We expect that this approach labels the vast majority of SEZ neurons.

      Related suggestions:

      o Line 81: elaborate on deutocerebral contributions

      We have moved this to discussion (lines 374-377). We clarify that not much is known about deutocerebral contributions.

      o Lines 84-85: along similar lines, Hox gene drivers

      We altered this sentence to be clear to a general audience (lines 86-90).

      • Figure 9: having a color legend in the figure itself will facilitate understanding of this figure. I think it would be nice to have visual examples of interneurons, projection neurons, and so forth. Perhaps when the authors first describe neurons in Group 1, instead of marking "first half of the group" (line 210) the authors can explicitly name the neuron types (peep, doublescoop, etc.)

      We now include a color legend as well as a new figure with visual examples of polarity (Figure 10 – figure supplement 1). As suggested, we changed the text to explicitly name the neuron types in Group 1 that are interneurons versus projection neurons (lines 241-243).

      • In the polarity section of the discussion, it would be interesting to have additional remarks relating to how to determine whether these identified neurons are thought to be ascending and why. Since one of the authors has previously characterized some ANs, perhaps comparisons to this work would be helpful to readers new to this region of the brain.

      We now include a brief definition of ascending neurons in the results section (lines 149-150) and note that ascending neurons were not included in the collection.

      • The parallel structures used in characterizing Groups 1 through 6 are very useful. However, I think that when the authors relate each group to previous works, this might fit better in the Discussion section.

      We altered this section of the results to move speculation about group function to the Discussion (lines 421-445), as recommended.

    1. Author Response:

      Reviewer #1:

      This is a very interesting manuscript that attempts to provide evidence of a case of evolutionary interaction (i.e. natural selection) between two human pharmacogenes: ADCY9 and CETP, suggesting also interaction with sex and a case for pleiotropy. This is likely one of the few examples (or maybe the only one) of evolutionary interaction between two genes in the field of human evolutionary genetics. The authors provide several lines of genomics-based evidence to support their case of natural selection, which include: (i) a replication of population genetics results in another Peruvian cohort, (ii) evidence of epistatic effect from RNAseq data on public datasets and on ad-hoc experiments, (iii) genotype/phenotype associations in the UK Biobank. While the use of data from different sources (in opposition to a trans-omic approach in a Peruvian population) may be a weakness of the paper, it is important to recognize that performing a large set of experiments in the Peruvian cohort using the required sample sizes may be logistically prohibitive. Therefore, the authors' approach is acceptable.

      Specifically, it is interesting that some of the phenotypes found in the UK Biobank are related to adaptation to high altitude (FVC).

      The only result that is difficult to explain is the difference in the level of long-range LD observed between males and females from Peru, both for the discovery and the replication datasets. The authors should elaborate more quantitatively on the plausibility of this finding.

      We thank the reviewer for highlighting our study’s novelty in the field of human evolutionary genetics. Indeed, the lack of a large, well-powered, accessible Peruvian biobank, and generally the overrepresentation of European ancestry resources in human genomics, is a weakness in most studies trying to address evolutionary and medical genetics questions in non-European populations. We hope our work will be cited as another example of why we need additional large and diverse cohorts in human genetics. This being said, the fact that we can report significant signals between our two genes of interest even in cohorts of unmatched ancestry is an argument for the widespread biological relevance of their interaction.

      A selective signal that varies according to sex is a plausible phenomenon, but such signals are very hard to detect, which means very little is known about these evolutionary events, particularly in humans, and therefore quantitatively characterizing their likelihood is not straightforward. Such events can be explained by either (1) differences in survival between sexes or (2) by sexual selection, whereby individuals of one sex with specific genotypes would be more successful at reproducing. However, sexual selection alone is less likely here, as the favored parental combination would be equally likely to be transmitted to male or female offspring, so would not explain the preferential linkage between genotypes seen in males. Another explanation is differential survival of individuals receiving a specific combination, depending on their sex. In utero selection, whereby fetal survival chances depend on genotype combinations and sex of the foetus, or gamete selection, whereby reproductive cells with the advantageous genotype combination are more likely to give rise to a fertilized egg, are likely hypotheses. Finally, our results could be due to an ascertainment bias caused by increased mortality of individuals with a specific genotype combination in a sex-specific manner. We detail these hypotheses further in our revised manuscript, and highlight that these different scenarios will need to be investigated using simulations in future work to give a quantitative answer to this important question.

      Reviewer #2:

      This work attempts to identify possible functional links between two pharmacogenetic relevant loci, ADCY9 and CETP, by using signals of positive selection as a starting point. Starting with the 1000 Genomes (1kG) dataset, the authors identify complementary signals of positive selection in the 1kG Peruvian cohort (PEL) using iHS and PBS analyses, specifically in a LD block in ADCY9. They use these results then to support investigating possible coevolution between ADCY9 and CETP in the form of long-range linkage disequilibrium (LRLD), a clever way to identify possible cosegregation between variants that should otherwise not be present. This analysis is particularly apt since two regions have already been identified, one of which is now suggested to have experienced a rapid increase in allele frequency. The authors not only find evidence of LRLD between SNPs in ADCY9 and CETP, but they also identify these results occur in a sex-specific manner. The authors then begin investigating possible functional connections between these two loci, specifically in the form of gene expression analyses. Using both human cell ADCY9 knockdown lines and GEUVADIS/GTEx data, the authors identify that ADCY9 impacts CETP expression, both broadly and through a specific interaction between rs1967309 and rs158477. And lastly, the authors further investigate potential interactions between ADCY9 and CETP via epistatic association analyses using UK BioBank and GTEx data. Encouragingly, among a handful of relevant phenotypes, they find marginally significant, sex-specific interaction effects with CAD and Lp(a), and in both cases the direction of effects match those seen in their LRLD analyses -- eg rs1967309-AA + rs158477-GG containing a protective effect in males.
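
      To make the idea of sex-specific long-range LD concrete, one common way to quantify association between two unlinked SNPs is the squared correlation of allele dosages, computed separately in each sex. The sketch below assumes that genotype-based measure, which may differ from the exact statistic used in the manuscript; the function name and the toy genotypes are invented for illustration.

```python
from statistics import mean


def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two allele-dosage vectors (0, 1 or 2)."""
    if len(snp_a) != len(snp_b):
        raise ValueError("both SNPs must be genotyped in the same individuals")
    ma, mb = mean(snp_a), mean(snp_b)
    cov = sum((a - ma) * (b - mb) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - ma) ** 2 for a in snp_a)
    var_b = sum((b - mb) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined when a SNP does not vary")
    return cov * cov / (var_a * var_b)


# Toy dosages (NOT study data): one list entry per individual.
males = {
    "adcy9_snp": [0, 1, 2, 2, 0, 1],
    "cetp_snp": [0, 1, 2, 2, 0, 1],
}
females = {
    "adcy9_snp": [0, 1, 2, 0, 1, 2],
    "cetp_snp": [1, 0, 1, 1, 2, 1],
}

for label, group in (("males", males), ("females", females)):
    r2 = genotype_r2(group["adcy9_snp"], group["cetp_snp"])
    print(f"{label}: r2 = {r2:.2f}")
```

      In real data the observed value would be compared against a null distribution, for example r2 values from many other SNP pairs at a similar distance and frequency, since small samples produce non-zero r2 by chance.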

      Overall, the authors make a strong case that there is coevolution occurring between ADCY9 and CETP. That they are able to also continuously replicate some of their findings using an independent dataset, LIMAA, also strengthens their results. The authors do acknowledge that their sample sizes may be limiting their power at times, but they point out that finding multiple, concordant marginally significant results may represent an unlikely outcome. However, currently these are presented as just observations and not as a formal, integrated test. Therefore we cannot adjudicate whether these concordant, marginally significant results are occurring more than we would expect by chance.

      It is worth noting that by beginning with two loci, the authors are able to make use of 'pairwise' approaches such as LRLD and epistatic analyses. Normally, these types of tests come with large multiple testing burdens due to the rapid increase in test combinations. In fact, a priori one would potentially not expect such analyses to perform well with the limited sample sizes. However, by beginning with a hypothesis that focused on two loci, the authors are able to overcome this normal statistical challenge.

      It is also worth noting that these results are only possible due to the inclusion of diverse datasets. The initial selection signals would not have been identified if the datasets only contained individuals of European ancestry. Additionally, even with the limited sample sizes, the authors are still able to identify statistically significant results. Therefore this work is another example of what can be gained when incorporating more diverse human genomics datasets.

      In terms of identifying a functional or molecular link between ADCY9 and CETP, the authors have begun the work of finding some possible connections between these two loci. However, this goal was not completely met with the current work. There is a clear effect of ADCY9 on CETP gene expression, though whether this is ultimately a direct effect or indirect is unclear. And while the epistatic analyses using phenotypes such as clinical outcomes and biomarkers are encouraging, and concordant in terms of direction of effects, they still do not elucidate a mechanism by which the variants of interest in ADCY9 and CETP are functionally interacting. The finding that sex plays a role in this interaction, and may be important to the mechanistic link as well, is an important result though.

      We thank the reviewer for their careful assessment of our work. Indeed, our main finding here is an example of how population genetics (and specifically, the study of selective signals from genetic data in multiple populations) can help with understanding functional significance of association studies findings (here in the context of pharmacogenomics), which could be made possible thanks to important genomics resources such as the 1000 Genomes Project. Similarly, the functional characterization was made possible thanks to GTEx, and allowed us to explore multiple tissues to understand the association found, which would be otherwise impossible for a single group to perform for a specific research question like this one. We are therefore grateful to all researchers, institutions and funders that contributed in putting together these resources.

      We are happy that the reviewer sees the value in beginning with a well-defined two-loci hypothesis. Hypothesis-driven science using large, publicly available data resources has a very important place in biomedical research. It is true, however, that we do not provide an integrated test here to quantitatively evaluate the probability of our multi-level signals in the different datasets used. This is mainly because there’s no well-established framework to account for all confounders and ascertainment biases from different cohorts in an integrated way, and combining evidence from cell lines experiments with high-throughput data signals is not trivial. Therefore, we took the approach of reporting each line of evidence based on an appropriate experiment and statistical test, following a logical multi-step procedure, where each step is based on the hypotheses generated in the previous experiment/analysis. In the revised manuscript, we added a flowchart figure, showing for each step (natural selection, co-evolution, transcriptomics and pan-phenotypic analyses), the datasets used, and the key results obtained (needed to move forward to next steps), identifying the steps where we considered sex in the analysis. We hope this will help the reader to recognize that the likelihood of finding, by chance alone, interaction effects between our two genes in multiple independent experiments and datasets is very low, providing strong evidence for a functional genetic interaction.

      However, the reviewer is right that, despite indicating that the SNP rs1967309 in ADCY9 is involved in a functional mechanism related to CETP expression and that this mechanism implicates sex as a modulator, our study does not establish the exact molecular mechanism at play here. However, the identification of several relevant tissues in GTEx helps us focus our efforts for the next steps, and very importantly, our results establish that future experiments will now need to take sex into account. In our revised manuscript, we have added discussion points about what our results bring to this ongoing research for precision medicine.

      Reviewer #3:

      The authors have analysed genomic data from populations of South Americans to assess the genetic and functional link between ADCY9 and CETP. These two genes, which are both on chromosome 16 separated by over 50Mbp, show weak but significant long range linkage disequilibrium in some subpopulations (ie from Peru). The genetic link between these genes (and SNPs), despite being weak, is suggestive of positive selection for haplotypes that appear with higher frequency in the population compared to populations from Africa, Asia and Europe. What is surprising is the sex-specific linkage, with most if not all of the association signal being driven by male samples. The explanation behind this remains open, however, with multiple explanations from population dynamics and drift, to potential functional benefits selecting this association.

      Strengths:

      The work carefully assesses this linkage through robust statistical frameworks. Despite weak effects and low sample sizes, there is a replicable signal in two other populations. The data is also easily accessible and I appreciate the author's documentation of their work.

      Weaknesses:

      The effects are still weak, and may still be explained by multiple factors that aren't directly addressed in the work. Most of the caveats of the study have been pointed out by the authors in their discussion, yet some of these detract from their claims and findings. If this is a sexually dimorphic trait, or a sex-specific effect, most of the functional analyses are shown without this distinction.

      We thank the reviewer for their positive feedback and for bringing to our attention the critical point of functional analyses not being done in line with our sexually dimorphic findings. This comment led us to perform additional analyses that highly strengthened the manuscript.

      The preliminary sex-stratified analyses we had performed in our RNAseq discovery cohort (GEUVADIS) were not conclusive, so we did not pursue these in our replication cohorts (GTEx and CARTaGENE). We have now performed further analyses in GTEx and CARTaGENE that produced very interesting new results, in line with the sex-specific nature of the evolutionary genetics results obtained, that we are happy to report in a revised manuscript. In summary, most sex-combined results were driven by males, but stratifying by sex in GTEx samples revealed additional tissues in which only females show a significant interaction (tibial artery and heart tissues), and very intriguingly, the sign of the interaction effect is reversed in this case. We tested for a three-way interaction (SNP1 x SNP2 x sex) in tissues harboring a convincing sex-specific pattern in stratified analyses. We also note that the donor for the HepG2 cell line, used in our knockdown experiments, was a male, and so we highlight in our revised manuscript that future experiments should consider cells from both male and female donors in multiple tissues to better understand the molecular interaction.
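
      To make the three-way interaction test concrete, the following is a minimal sketch of one way such a test could be set up. It is not the model reported in the manuscript: the binary genotype and sex coding, the simulated expression values, the ordinary least squares fit, and the function name fit_three_way are all assumptions made for illustration.

```python
import numpy as np

def fit_three_way(snp1, snp2, sex, y):
    """OLS fit of y on two genotype codes, sex, and all their interactions.

    Returns the coefficient vector and the t statistic of the
    three-way term (last column of the design matrix).
    """
    snp1, snp2, sex, y = (np.asarray(a, dtype=float) for a in (snp1, snp2, sex, y))
    X = np.column_stack([
        np.ones_like(y),      # intercept
        snp1, snp2, sex,      # main effects
        snp1 * snp2,          # two-way interactions
        snp1 * sex,
        snp2 * sex,
        snp1 * snp2 * sex,    # three-way interaction of interest
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof              # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)     # coefficient covariance
    t_three_way = beta[-1] / np.sqrt(cov[-1, -1])
    return beta, t_three_way

# Simulated data (hypothetical): the SNP1 x SNP2 effect flips sign between sexes.
rng = np.random.default_rng(0)
n = 2000
snp1 = rng.integers(0, 2, n)   # assumed binary coding, e.g. one genotype vs. the others
snp2 = rng.integers(0, 2, n)
sex = rng.integers(0, 2, n)    # 0 = female, 1 = male (arbitrary coding)
y = 0.5 * snp1 * snp2 * (2 * sex - 1) + rng.normal(0, 0.5, n)

beta, t_three_way = fit_three_way(snp1, snp2, sex, y)
```

      In this simulation the two-way SNP1 x SNP2 coefficient is the effect in the reference sex, and the three-way coefficient is the difference in that effect between sexes. A reversed sign of interaction in one sex therefore shows up as a three-way term that is larger in magnitude than, and opposite in sign to, the two-way term.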

  4. Aug 2021
    1. Author Response:

      We thank the three reviewers for their feedback and insightful comments. We share Reviewer 2’s opinion that NIH’s policy on “sex as a biological variable” leaves largely open how that variable should be treated statistically, and this concern was in fact the main impetus for this study.

      In response to the concern of R1 and R3 that all of the articles were coded by just one author, we have expanded the description of how coding was done. All articles were read by both authors, and ~25% of the articles were discussed between them during the coding process. Our coding system was checked by having both authors independently read a subset (~20%) of the articles; interrater reliability exceeded 90%.
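
      The response does not state which reliability statistic was used. As a hedged illustration only, the sketch below computes simple percent agreement (one plausible reading of "exceeded 90%") alongside Cohen's kappa, a chance-corrected alternative. The category labels and the two example codings are hypothetical.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of items that two raters coded identically."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from the product of each rater's marginal proportions.
    p_expected = sum(counts_a[k] * counts_b[k]
                     for k in counts_a.keys() | counts_b.keys()) / n ** 2
    if p_expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (p_observed - p_expected) / (1.0 - p_expected)

# Hypothetical codings of ten articles by two raters.
rater1 = ["sound", "sound", "questionable", "sound", "questionable",
          "sound", "sound", "questionable", "sound", "sound"]
rater2 = ["sound", "sound", "questionable", "sound", "questionable",
          "sound", "sound", "sound", "sound", "sound"]

agreement = percent_agreement(rater1, rater2)
kappa = cohens_kappa(rater1, rater2)
```

      Percent agreement is easy to interpret but can look high when one category dominates, which is why a chance-corrected statistic is often reported next to it.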

      Regarding the fact that most, if not all, of the articles we analyzed contained multiple studies, we used a hierarchical coding method (described in the paper). Our goal was to illuminate cases in which the statistical methods potentially led to unsupported conclusions; therefore, an article that arrived at sound conclusions for one study but questionable conclusions for another was coded into the “questionable” category.

      We apologize that Reviewer 2 could not find the information in our paper on the percentage of articles that “did it right”. We have revised the text to make this information clearer.

      Regarding Reviewer 3’s concern that our method of assigning the articles into disciplines was not clear, we have now emphasized this information more in the paper. We did not assign the articles to disciplines ourselves; the assignment was done originally by Beery & Zucker (2011) on the basis of the journals in which the articles were published, and Woitowich et al. (2020) used the same categorizations, which we followed.

    1. Author Response:

      Reviewer #1 (Public Review):

      "Modality-specific tracking of attention and sensory statistics in the human electrophysiological spectral exponent," Waschke et al. This paper follows upon a recent paper by a subset of the same authors that laid out the signal-processing basis for decomposing the EEG signal into periodic (i.e., "oscillatory") and aperiodic components (Donoghue et al., 2020). Here, the focus is on establishing physiological and functional interpretations of one of these aperiodic components: the exponent term of the 1/f(to the x power) fit to the power spectrum (a.k.a., its 'slope'). This is very important work that will have strong and lasting impact on how people design and interpret the results from EEG experiments, and is also likely to trigger many reanalyses of previously published data sets. However, the manuscript could do a better job of explaining WHY this is so. In this reviewer's opinion, more linkage with elements of Donoghue et al. (2020) would help considerably.

      First, a brief summary of what this manuscript does, and why it is important. The first section reanalyzes data sets in human subjects undergoing ketamine or propofol anaesthesia, known to influence the E:I balance in the neural circuits that give rise to the EEG. This is an important step in establishing the physiological validity of the fundamental proposition that flattening of the 1/f component reflects an increase in the E:I balance whereas steepening reflects a decrease. This is because the effects of these two anaesthetic agents have been well established in several invasive studies. The second section demonstrates the functional properties of 1/f slope, in that it tracks shifts of attention between visual and auditory stimuli in an electrode-specific manner (i.e., posterior for visual, central for auditory), and it also captures aperiodic structure in these stimuli. It's not too strong to say that, after this paper, EEG-related research will never be the same again. The reason for this, however, isn't stated as clearly as it could be.

      Thank you for your positive appraisal of our work! We appreciate that you see significant benefit to this work, and also understand that you see significant room for improvement in the way results are presented, framed and discussed. We want to express our thanks for these helpful comments. Below, we elaborate on them and the changes they prompted in greater detail.

      With regard to exposition, the manuscript could be improved in terms of building on Donoghue et al. (2020). To simplify, a main take-away from Donoghue et al. (2020) is that many past interpretations of EEG signals have mistakenly attributed task- (or state-) related changes to changes in one or more oscillatory components of the signal. Perhaps most egregiously, what can appear as a change in power in the alpha band can often be shown to be better explained as no change in alpha but instead a change in either the slope or the offset of the 1/f component of the power spectrum. (E.g., the bump at 10 Hz will increase or decrease if the slope of the 1/f component changes, even though the 'true' oscillator centered at 10 Hz hasn't changed.) In this paper, the authors demonstrate that many conditions, physiological state and cognitive challenge, influence 1/f slope in ways that are systematic and that occur independent of changes that may or may not be occurring simultaneously in oscillatory alpha. Broadly, the authors should consider two modifications: first, point out for each key experimental finding how attributing everything to changes in oscillatory alpha (or sometimes other frequencies) would lead to flawed inference; second, don't stop at demonstrating that the slope effects hold when alpha dynamics are partialed out, but also report the converse -- in what ways is oscillatory alpha sensitive to aspects of physiology and/or behavior that 1/f slope is not? Even if there aren't any such cases (which seems unlikely) it would be informative for this to be tested and reported.

      We agree that a stronger focus on the differentiation between oscillatory and 1/f aspects of EEG activity can help to improve the didactic strength of our manuscript. Wherever possible, we have tried to make clear that the separation of different oscillatory activity and aperiodic signals is essential to not confuse one for the other. This is not only the case for the analysis of anaesthesia data, where changes in alpha and beta power have to be separated from changes in spectral exponent, but also applies to the proposed attention contrast, where common effects of alpha power have to be taken into account and differentiated from spectral exponents. Similarly, an alignment of stimulus spectra with EEG activity could appear as a twofold power change (e.g., increase over low, decrease over high frequencies) if no separation of oscillatory and aperiodic signal parts is performed.
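
      The confound described here can be illustrated with a small sketch. It assumes a purely aperiodic spectrum of the form 10^offset / f^exponent and a plain straight-line fit in log-log space, which is a simplification and not the spectral parameterization approach used in the manuscript (that approach also models oscillatory peaks). All function names and parameter values are hypothetical.

```python
import numpy as np

def aperiodic_power(freqs, offset, exponent):
    """Power of a pure 1/f^exponent spectrum: 10**offset / f**exponent."""
    return 10.0 ** offset / freqs ** exponent

def fit_exponent(freqs, power):
    """Estimate offset and exponent by a straight-line fit in log-log space."""
    slope, intercept = np.polyfit(np.log10(freqs), np.log10(power), 1)
    return intercept, -slope

def band_power(freqs, power, lo=8.0, hi=12.0):
    """Mean power within a frequency band (default: 8-12 Hz alpha band)."""
    mask = (freqs >= lo) & (freqs <= hi)
    return power[mask].mean()

freqs = np.arange(1.0, 41.0)  # 1-40 Hz in 1 Hz steps

# Two spectra with no oscillatory peak at all; only the exponent differs.
steep = aperiodic_power(freqs, offset=1.0, exponent=2.0)
flat = aperiodic_power(freqs, offset=1.0, exponent=1.0)

alpha_steep = band_power(freqs, steep)
alpha_flat = band_power(freqs, flat)
offset_hat, exponent_hat = fit_exponent(freqs, steep)
```

      Here the flatter spectrum has higher power in the 8-12 Hz band than the steeper one even though neither contains an alpha oscillation, so a band-power measure alone would register an apparent alpha change.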

      We agree that explicitly contrasting spectral exponents with estimates of low-frequency or alpha power is essential. The original version of the manuscript already included such a comparison for the effect of attention on EEG spectral exponents and alpha power, respectively. To expand this approach, we inverted models and used stimulus spectral exponents (auditory or visual) as dependent variables while using either EEG spectral exponents, low-frequency power or alpha power as predictors (among the same covariates as in the winning models of the original approach). In a next step, we used likelihood ratio tests to compare model fit separately at each electrode, resulting in a topography of model comparisons.

      (a) Attention contrasts

      As expected, based on decades of EEG research, and as can be seen in figure 3C, average EEG alpha power changed as a function of attentional focus, in a topographically specific manner. Importantly, the observed increase of alpha power from auditory to visual attention took place over and above the reported changes in EEG spectral exponents (as we had reported in the control analyses section). In other words, both EEG spectral exponents and EEG alpha power capture attention-related changes in brain dynamics, but are at least partially sensitive to distinct sources or mechanisms. In the updated version of the manuscript, we emphasize that changes in spectral exponents often can be mistaken for changes in alpha power (as in Donoghue et al., 2020), calling for a dedicated spectral parameterization approach. Attention-related changes in spectral exponents and alpha power might depict results of distinct modes of thalamic activity that transitions from tonic to bursty firing and shapes cortical activity to selectively process attended sensory input. In the updated version of the manuscript, we discuss the potential role of thalamic activity in greater detail. The updated parts of the discussion section are pasted below for convenience.

      “Despite these differences in the sensitivity of EEG signals, our results provide clear evidence for a modality-specific flattening of EEG spectra through the selective allocation of attentional resources. This attention allocation likely surfaces as subtle changes in E:I balance (Borgers et al., 2005; Harris and Thiele, 2011). Importantly, these results cannot be explained by observed attention-dependent differences in neural alpha power (8–12 Hz, Fig 3) which have been suggested to capture cortical inhibition or idling states (Cooper et al., 2003; Pfurtscheller et al., 1996). Also note that the employed spectral parameterization approach enabled us to separate 1/f-like signals from oscillatory activity and hence offered distinct estimates of spectral exponent and alpha power that would otherwise have been conflated (Donoghue et al., 2020).

      How could attentional goals come to shape spectral exponents and alpha oscillations? Both attention-related changes in EEG activity might trace back to distinct functions of thalamo-cortical circuits. On the one hand, bursts of thalamic activity that project towards sensory cortical areas might sculpt cortical excitability in an attention-dependent manner by inhibiting irrelevant distracting information (Klimesch et al., 2007; Saalmann and Kastner, 2011). On the other hand, tonic thalamic activity likely drives cortical desynchronization via glutamatergic projections and, with attentional focus, results in boosted representations of stimulus information within brain signals (Cohen and Maunsell, 2011; Harris and Thiele, 2011; Sherman, 2001).

      Our findings of separate attentional modulations of both EEG spectral exponents and alpha power point towards the involvement of both thalamic modes in the realization of attentional states. Recently, momentary trade-offs between both modes of thalamic activity have been suggested to give way to attention-related modulations of alpha power and E:I balance, as captured by EEG spectral exponents (Kosciessa et al., 2021). Here, task difficulty remained constant throughout the experiment and fluctuations between both modes might not follow momentary demand (Kosciessa et al., 2021; Pettine et al., 2021) but varying sensory-cognitive resources.

      Additionally, modulations of both alpha power and EEG spectral exponents appeared uncorrelated across individuals - further evidence that they reflect separate neural sources. Future studies that combine a systemic manipulation of E:I (e.g., through GABAergic agonists) with the investigation of attentional load in humans are needed to specify with greater detail how thalamic activity modes drive alpha oscillations and EEG spectral exponents. Specifying potential demand- and resource-dependent trade-offs between different modes of attention-related modulations of cortical activity and sensory processing will offer crucial insights into the neural basis of adaptive behaviour.”

      (b) Stimulus spectral exponent tracking

      We inverted all models and, instead of modelling EEG spectral exponents, used auditory or visual stimulus exponents as dependent variables. Predictors were identical to the previously reported models (see supplementary table for all details) but additionally included either single-trial estimates of alpha power, low-frequency power, or EEG spectral exponents. Note that alpha power estimates were extracted using the same spectral parameterization approach that was used to estimate spectral exponents. Trials without an oscillation in the alpha range were excluded from all models to render likelihood comparisons interpretable (11.2% ± 3.4%). Since oscillations were only seldom detected in the low-frequency range (1–5 Hz), we instead used single-trial power averaged across this range. For each electrode, 4 likelihood ratio tests were performed, one for each stimulus modality and one for each predictor (low-frequency or alpha power). Strikingly, low-frequency power resulted in worse model fits (non-positive likelihood ratio test statistics) compared to EEG spectral exponents across all electrodes and both stimulus modalities. The same was true for EEG alpha power when modelling auditory stimulus exponents. However, when modelling visual stimulus exponents, EEG alpha power displayed significantly improved model fit at one parietal electrode. In line with this observation, we observed a positive relationship between single-trial alpha power and visual stimulus exponents at this parietal site (see below).

      Figure R5. Model comparison topographies. (a) Single-trial auditory (upper row) or visual (lower row) stimulus exponents were modelled based on electrode-wise low-frequency power (left column) or alpha power (right column), among other covariates. Models were compared to a model of the same size that differed only in the main predictor, which consisted of single-trial EEG spectral exponents. Topographies display the likelihood ratio test statistic, illustrating no improvements in model fit compared to EEG spectral exponent-based models in all but one model family, illustrating the unique predictive power of aperiodic EEG activity in this context. Alpha power at one parietal electrode explained significantly more variance in visual stimulus exponents. (b) T values representing the main effect of alpha power on visual stimulus exponents. Highlighted electrode represents p < .05 after FDR correction.
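
      The sign convention of the model comparison statistic can be sketched as follows. This is an illustration under simplifying assumptions, not the analysis pipeline of the manuscript: it uses ordinary least squares with a Gaussian likelihood on simulated single-trial data, and the function names, effect sizes, and covariates are hypothetical. Two models of equal size differ only in their main predictor, and the statistic is twice the difference in maximized log-likelihood.

```python
import numpy as np

def gaussian_loglik(X, y):
    """Maximized Gaussian log-likelihood of an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n = len(y)
    sigma2 = resid @ resid / n          # ML estimate of residual variance
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)

def comparison_statistic(y, covariates, predictor_ref, predictor_alt):
    """2 * (loglik_alt - loglik_ref) for two models of equal size.

    Non-positive values mean the alternative predictor fits no better
    than the reference predictor.
    """
    ones = np.ones(len(y))
    X_ref = np.column_stack([ones, covariates, predictor_ref])
    X_alt = np.column_stack([ones, covariates, predictor_alt])
    return 2.0 * (gaussian_loglik(X_alt, y) - gaussian_loglik(X_ref, y))

# Simulated single-trial data (hypothetical effect sizes).
rng = np.random.default_rng(1)
n_trials = 500
covariates = rng.normal(size=(n_trials, 2))
eeg_exponent = rng.normal(size=n_trials)
alpha_power = rng.normal(size=n_trials)   # unrelated to the stimulus here
stim_exponent = (0.8 * eeg_exponent + 0.3 * covariates[:, 0]
                 + rng.normal(scale=0.5, size=n_trials))

stat = comparison_statistic(stim_exponent, covariates, eeg_exponent, alpha_power)
```

      In this simulation the stimulus exponent depends on the EEG exponent and not on alpha power, so the statistic is negative, which corresponds to the "worse model fit" pattern described for most electrodes.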

      (c) Behavioural relevance of spectral exponent tracking

      Given the results from (b), we refrained from re-running PLS analysis focussing on the behavioural relevance of the links between low-frequency and alpha power with stimulus exponents. In our view, the absence of a significant link between single trial stimulus input and a measure of neural activity in this case precludes any further analysis on the between-subject level.

      Reviewer #2 (Public Review):

      The paper investigates two separate studies looking at the spectral exponent of the EEG 1/f-like spectrum: one a study of the effect of anesthesia type (propofol vs. ketamine), using publicly available data, and the other a traditional study of auditory and visual processing relying on selective attention to one modality vs. the other. The authors make a strong case that the value of the spectral exponent depends on the relevant condition, in both studies, but the case for the spectral exponent's dependence on the Excitation:Inhibition balance is much weaker.

      The paper presents the two separate studies as tightly linked, but by the end of the paper it appears they may be quite separate.

      The anesthesia study is brief and compelling. With respect to the effect of anesthesia type on spectral exponent, the results are very strong, and, given the results of Gao et al. (2017) and the stated properties of propofol vs. ketamine, the connection to E:I balance follows naturally.

      The auditory and spectral 1/f tracking study suffers from some weaknesses.

      Most importantly, the design is elegant and the results presented are very compelling. 1) Modality-specific attention selectively reduces the EEG spectral exponent (for relevant electrodes reflecting cortical processing of that modality); 2) Changing the value of the spectral exponent in the stimulus results in a similar change in the value of the spectral exponent of the response, but only for the selectively attended modality (and only for relevant electrodes); and 3) the amount of modality-specific spectral-exponent tracking predicts behavior. The interactions and main effects found all support the importance of the spectral exponent as a physiologically and behaviorally important index.

      The main problem is a weakness in analysis regarding whether the mechanistic origin of the above effects may be due to temporal tracking of the stimulus waveform (visual contrast/acoustic envelope) by the response waveform. [In the speech literature this would be referred to as "speech tracking", or, sometimes, as speech entrainment (in the weak sense of "entrainment").] As pointed out by the authors, this is not a steady state response because the instantaneous fluctuation rate of the stimulus is constantly changing, and so cannot be analyzed as such (it is also distinct from the evoked responses analyzed). But it is a good match for other analysis methods, for instance Ed Lalor's VESPA and AESPA methods, and their reverse-correlation descendants. Specifically, Lalor et al., 2009 analyzed EEG responses to a non-sinusoidal envelope modulation of a broadband noise carrier and found strong evidence for robust temporal locking. The success of such linear methods there (AESPA for auditory; VESPA for visual) implies that a change in the stimulus spectrum exponent would produce a similar change in the response spectrum exponent, having nothing to do with E:I balance.

      The evoked response analysis clearly aims to go in this direction, but since it does not reflect ongoing response properties, it cannot alone speak to this.

      Because this plausible mechanism for the spectral-exponent-tracking has not been explored, it is much harder to associate the observed spectral-exponent-tracking as originating from E:I balance. The study does not then hold together well with the anesthesia study, and weakens the links to E:I balance rather than strengthening it.

      Thank you for this in-depth assessment of our work and your general positive appraisal of it. Importantly, your major point of concern seems to at least partially trace back to a regrettable misunderstanding caused by the way we presented our results in the original version of the manuscript. While the first study aimed at establishing the validity of the EEG spectral exponent as a non-invasive marker of E:I, the second study had two objectives. First, to test attention-related changes in EEG spectral exponents that we assume to depict topographically specific changes in E:I. Second, to test the link between aperiodic stimulus features and aperiodic EEG activity by comparing stimulus spectral exponents and EEG spectral exponents. We understand that the reviewer is doubtful of the link between stimulus-related EEG spectral exponent changes and E:I – and so are we.

      In the updated version of the manuscript, we have tried to make it very clear that despite the displayed and inferred links between EEG spectral exponents and E:I balance, the positive relationship between stimulus spectral exponents and EEG spectral exponents does not necessarily reflect changes in E:I. Nevertheless, we feel that study 1 and 2 integrate well as they offer a comprehensive view on 1/f-like EEG activity and its sensitivity to (1) specific anaesthesia effects, (2) attentional focus, and (3) aperiodic stimulus features in a behaviourally-relevant way. While (1) and (2) can be mapped on to one underlying mechanism, cortical E:I balance, (3) rather represents bottom-up sensory cortical effects similar to those described in SSEP or speech tracking literature. The interaction of attentional focus and stimulus tracking illustrates the connection between top-down (or anaesthesia-driven) changes in E:I as captured by the EEG spectral exponent, and bottom-up sensory-related changes in EEG activity.

      Reviewer #3 (Public Review):

      The balance between excitation and inhibition in the cortex is an interesting topic, and it has already been a focus of study for a while. The current manuscript focuses on the 1/f slope of the EEG spectra as the neural substrate of the change in the balance between excitation and inhibition. While the approach they use to analyze their data is interesting, unfortunately, for the reasons I'll outline below, the study's conclusions are not supported by the data, and the findings do not add any new insight conceptually or mechanistically to our understanding of attention, excitation or inhibition. While the study aims to "test the conjecture that 1/f-like EEG activity captures changes in the E:I balance of underlying neural populations.", ultimately the central conclusions of the work are just conjecture in that they are inferences formed without sufficient evidence.

      Anaesthesia study: EEG spectral exponents as a non-invasive approximation of E:I balance

      The authors observe that the 1/f slope was different over pre-selected central electrode sites between 4 participants undergoing ketamine and propofol anaesthesia. The rather small sample size is a cause for concern, as is the authors' rationale for looking at the central electrodes: they claim these electrodes receive contributions from many cortical and subcortical sources, but that can be said of any other electrodes at the scalp. But I believe the most critical weakness here is the authors' claim that, during anaesthesia, propofol is "known" to result in a "net" increase of inhibition, while ketamine results in an increase in net excitation. We still know very little about what is happening neurophysiologically under anaesthesia, and the concept of "net" inhibition and excitation is rather a gross simplification of what happens to the central nervous system under these two agents. Just as an example, propofol has been found to have some excitatory influence on brain function, with dosage of the anaesthetic also playing a role: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2717965/. On the other hand, ketamine has been observed to inhibit interneurons and cortical stimulus-locked responses, but cause excitation in the auditory cortex: https://physoc.onlinelibrary.wiley.com/doi/10.1113/JP279705.

      Suffice to say the interaction between anaesthetic agents and the brain is rather complex. Decades of research have shown that EEG spectra change during anaesthesia. To rather arbitrarily say one agent has a net inhibitory impact while another has an excitatory impact, then link those to qualitative changes in the EEG spectra of 4 participants, and further link that back to E:I ratio is committing the scientific fallacy of Begging the Claim.

      We thank the reviewer for their insightful comments. Of course, we do not wish to challenge the complex nature of anaesthetic effects by any means and apologize if the original version of our manuscript had left that impression. Below, we outline that despite the complex impact of anaesthesia on central nervous activity, there exists plenty of evidence justifying our assumption of differentially altered E:I balance through propofol and ketamine, at least in cortical areas.

      First of all, we agree with the reviewer that a change in E:I balance certainly is not the only change that takes place in the central nervous system during anaesthesia. As has been shown before, propofol and ketamine affect the overall level of neural activity (Taub et al., 2013) and spiking (Quirk et al., 2009; Kajiwara et al., 2020), and propofol is associated with frontal alpha oscillations and widespread changes in beta power (Purdon et al., 2012). In the updated version of the manuscript, we have added references to these common patterns and discuss the oscillatory changes we observe in the current dataset.

      Importantly, while there might not be a single identifiable mechanism behind the host of different anaesthesia-induced changes in brain activity, there is relative clarity on the fact that higher doses of propofol drive a change in excitatory and inhibitory activity towards inhibition whereas ketamine drives disinhibition and hence shifts E:I towards excitation. In fact, the study by Deane et al. (2020) reports increased excitation and disinhibition in auditory cortex during ketamine anaesthesia, accompanied by stronger (not weaker, as stated by the reviewer) evoked responses. These findings speak to the validity of the simplification of a net increase of excitation under ketamine anaesthesia. Furthermore, the modelling results by McCarthy et al. (2008) target a dose- and cell-ensemble specific effect of propofol anaesthesia: paradoxical excitation. The observation that low doses of propofol can induce a temporary increase of excitatory activity is in stark contrast to the general GABA-A-potentiating and hence inhibiting nature of propofol (Concas et al., 1991). Importantly, however, higher doses of propofol as used in the analysed dataset are widely accepted to lead to relatively increased inhibition, even after initial paradoxical excitation (Concas et al., 1991; Zhang et al., 2009; Brown et al., 2011; Ching et al., 2010). Taken together, previous invasive physiology justifies the simplification of propofol as leading to net increased inhibition and ketamine leading to net excitation. Finally, our focus on the spectral exponent does not stem from a disregard of oscillatory changes in EEG activity but rather strictly follows from previous work that demonstrated the spectral exponent as a marker of E:I balance (Gao et al., 2017; Colombo et al., 2020; Lendner et al., 2021; Chini et al., 2021). Hence, the central goal of the presented analyses and results lies in the transfer of these previous results to non-invasive EEG recordings and the parameterization approach used by us. We hope that this becomes clearer in the updated version of the manuscript and have pasted relevant parts below.

      “Both anaesthetics exert widespread effects on the overall level of neural activity (Taub et al., 2013) as well as on oscillatory activity in the range of alpha and beta (8–12 Hz; ~15–30 Hz). Importantly, however, propofol is known to commonly result in a net increase of inhibition (Concas et al., 1991; Franks, 2008) whereas ketamine results in a relative increase of excitation (Deane et al., 2020; Miller et al., 2016). In accordance with invasive work and single cell modelling (Chini et al., 2021; Gao et al., 2017), propofol anaesthesia should thus lead to an increase in the spectral exponent (steepening of the spectrum) and ketamine anaesthesia to a decrease (flattening). Based on previous results, the effect of anaesthesia on EEG spectral exponents is expected to be highly consistent and display little topographical variation (Lendner et al., 2020). For simplicity, we focused on a set of 5 central electrodes that receive contributions from many cortical and subcortical sources (see Fig 1) but report topographically-resolved effects in the supplements (see Fig 1 supplement 1). Here, propofol anaesthesia led to an overall increase in EEG power which was especially pronounced in the alpha-beta range. Ketamine anaesthesia decreased the frequency of alpha oscillations and suppressed power in the beta range. Importantly, however, EEG spectral exponents that were estimated while accounting for changes in oscillatory activity increased under propofol and decreased under ketamine anaesthesia in all participants (both p_permuted < .0009, Fig 1). These results replicate previous invasive findings and support the validity of EEG spectral exponents as markers of overall E:I balance in humans.”

      “[…] While the EEG spectral exponent as a remote, summary measure of brain electric activity can obviously not quantify local E:I in a given neural population, the non-invasive approximation demonstrated here enables inferences on global neural processes previously only accessible in animals and using invasive methods. Future studies should use a larger sample to directly compare dose-response relationships between GABA-A agonists or antagonists (e.g., Flumazenil) and the EEG spectral exponent as well as common oscillatory changes.”
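
      To make the term "spectral exponent" concrete: if aperiodic power follows P(f) ∝ 1/f^χ, then log-power is linear in log-frequency with slope −χ, so a steeper spectrum corresponds to a larger exponent. The following is only a minimal sketch on synthetic spectra, not the authors' pipeline. The helper name spectral_exponent is hypothetical, and the fit deliberately ignores oscillatory peaks, which the parameterization approach described above accounts for.

```python
import math

def spectral_exponent(freqs, power):
    """Estimate chi in P(f) ~ 1/f**chi with a least-squares line fit of
    log10(power) against log10(freq). Hypothetical helper for illustration;
    it does not model or remove oscillatory peaks."""
    x = [math.log10(f) for f in freqs]
    y = [math.log10(p) for p in power]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return -slope  # steeper (more negative) slope -> larger exponent

# Synthetic, purely aperiodic spectra between 1 and 45 Hz (assumed range).
freqs = [float(f) for f in range(1, 46)]
flat = [10.0 / f ** 1.0 for f in freqs]   # shallower spectrum
steep = [10.0 / f ** 2.0 for f in freqs]  # steeper spectrum

print(spectral_exponent(freqs, flat), spectral_exponent(freqs, steep))
```

      On these noise-free spectra the fit recovers exponents of approximately 1 and 2, matching the direction of the claim that a steepening spectrum means an increasing exponent.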

      Regarding the reviewer’s comment on our choice of electrodes, we first wish to highlight that several previous studies have revealed that anaesthesia effects commonly appear throughout the cortex of humans (Zhang et al., 2009; Lendner et al., 2020). Nevertheless, we understand that a priori choices of electrodes are always arbitrary to some degree. Hence, we performed pairwise comparisons of EEG spectral exponents between awake rest and anaesthesia (ketamine vs. propofol) at all 60 electrodes, resulting in the topographies of t-values shown below. As can be discerned from these topographies, ketamine anaesthesia entailed a reduction of spectral exponents across most areas of the scalp, peaking at frontal and central sites. Propofol led to increased EEG spectral exponents across all electrodes without a clear spatial pattern. The absence of an effect at the left mastoid likely traces back to artefactual recordings at that electrode site. In the updated version of the manuscript, we report topographies of comparisons in the supplements (figure 1 supplement 2).

      Figure R8. Topographically resolved t statistics comparing EEG spectral exponents between awake rest and different anaesthetics. Propofol leads to a widespread increase in spectral exponents that is present across the entire scalp (left). Ketamine leads to a reduction in spectral exponents that is widely distributed but appears to peak at frontal and central electrodes (right).

      We acknowledge the small sample size of study 1 and have added a more explicit note on this in the updated version of our manuscript. Nevertheless, given their consistency and the permutation-based statistics used, which are appropriate for small sample sizes, the results of study 1 can be interpreted. Furthermore, we realized that we had not included two additional participants of the publicly available dataset in our previous analysis. Both sets of recordings (ketamine / propofol) were included in the revised analyses of the data, further strengthening the reported results. Hence, despite the small sample size (now N = 5 per group), we believe that the methods used and the consistency of effects allow for a careful but clear interpretation, especially since they are in close agreement with previous invasive and modelling results as well as recent causal manipulation studies (Gao et al., 2017; Chini et al., 2021).

      Cross-modal study: EEG spectral exponents track modality-specific, attention-induced changes in E:I

      Here the authors observe a difference in 1/f slope depending on whether the participants (n=24) were paying attention to the auditory or visual stream. My central issue here is again with the authors' assumption that cross-modal attention reflects attention-induced E/I. While attention to a single sensory modality can result in decreased activity in cortical regions that process information from an unattended sensory modality, there is no basis here to say that the task-irrelevant region is actually inhibited. The authors here do observe differences in 1/f slope as a function of attentional location, and these differences do account for some of the variance in behavior in the task.

      But unfortunately, other than a purely descriptive exercise, no mechanistic insight is revealed here with regard to attentional allocation, excitation, and inhibition.

      We wish to take this opportunity to briefly elaborate on our hypotheses behind the reported attention contrasts and their interpretation. Spectral exponents of invasively recorded neural field potentials have previously been shown to reflect pronounced changes in E:I balance, including recent causal optogenetic work explicitly testing this link (Gao et al., 2017; Chini, Pfeffer & Hanganu-Opatz, 2021). In a first step, we analysed data from different anaesthetics to establish the potency of non-invasive EEG recordings to track similar changes (see above). Building on these findings, we tested whether smaller, attention-related and topographically-specific changes in E:I balance can equally be observed by means of EEG spectral exponent changes. Importantly, topographically concise changes in E:I with attention have been reported previously in non-human animals (e.g., Kanashiro et al., 2017; Ni et al., 2018). We found an attention-related topographical pattern of EEG spectral exponents in support of such an idea: spectral exponents at occipital channels decreased during visual attention, pointing towards a relative increase of excitatory activity in visual cortical areas. The same effect was reduced at central electrodes and for auditory attention. These findings demonstrate the potency of EEG spectral exponents to detect topographically-specific attention-related changes in brain activity that likely trace back to changes in E:I balance. Of note, we do not imply a role of E:I in the inhibition of unattended sensory input and activity in associated cortical areas but rather point to a potentially separate role of neural alpha power in this context. While it is generally difficult to draw strictly mechanistic insights based on correlational designs, our results at least strongly suggest a mechanistic role of modality-specific attention for EEG dynamics and E:I balance. Furthermore, by demonstrating separate effects of aperiodic activity and alpha power dynamics, we pave the way for a new line of studies (see comments by R1) on the neural dynamics of selective attention and their behavioural relevance in humans.

    1. Author Response:

      Reviewer #1 (Public Review):

      This manuscript describes the role of PMd cck neurons in the invigoration of escape behavior (ie retreat from aversive stimuli located in a circumscribed area of the environment in which testing was conducted). Further, PMd cck neurons are shown to exert their effect on escape via the dorsal PAG. Finally, in an intriguing twist, aversive images are shown to increase the functional coupling between hypothalamus and PAG in the human brain.

      The manuscript is broadly interdisciplinary, spanning multiple subfields of neuroscience research from slice physiology to human brain imaging.

      We thank the Reviewer for recognizing the interdisciplinarity of our work.

      To understand the novelty of the results obtained in the rodent studies, it is important to note that these data are a replication and elaboration of work published recently in Neuron by the primary authors of this manuscript. The current manuscript does not cite the Neuron paper.

      We apologize for this omission. At the time of the current submission the Neuron paper had not been accepted and thus we could not cite it. We now discuss this paper in the introduction and highlight how the current manuscript expands upon the data published in the Neuron paper.

      The most novel aspect of the rodent experiments presented in this manuscript is the demonstration of a role for cck PMd neurons in invigorating behavioral withdrawal from cues associated with the kind of artificial stimuli commonly used in laboratory settings (ie a grid floor associated with shock). Unfortunately, these results are made somewhat difficult to interpret by a lack of counterbalancing - all subjects receive an assay of escape from a predator prior to the shock floor assay. Certainly, research on stress and sensitization tells us that prior experience with aversive stimuli can influence the response to aversive stimuli encountered in the future. Because the role of this PMd circuitry in predatory escape has already been demonstrated, this counterbalancing issue does somewhat diminish the impact of the most novel rodent data presented here.

      Indeed, as the Reviewer states, prior exposure to aversive stimuli may influence responses to future exposures to threats. We opted to have the rat test before the shock grid test because the rat exposure is a milder experience than the shock grid test, as no actual pain occurs in the rat assay. We thus reasoned that the more intensely aversive assay (the shock assay) was more likely to influence behavior in the rat assay than vice-versa. Nevertheless, we agree with the Reviewer’s point that the lack of counterbalancing between the assays may mask potential influences of the rat assay on the shock grid assay behavior.

      To address this issue we ran a cohort of new mice, showing that behavior in the shock grid assay is not affected by prior experience in the rat assay. We now show in Figure R1 and Figure 1, figure supplement 2 that freezing, threat avoidance and escape metrics in the shock grid assay are not significantly changed by prior exposure to the rat assay.

      Figure R1. (Same as Figure 1, figure supplement 2). The order of threat exposure does not affect defensive behavior metrics. (A) Two cohorts of mice were exposed to the rat and shock grid threats in counterbalanced order, as specified in the yellow and green boxes. (B) The defensive behavioral metrics of these two cohorts were compared for the fear retrieval assay. None of the tested metrics were different between groups (Wilcoxon rank-sum test; each group, n=9 mice).

      The manuscript concludes with an fMRI experiment in which the BOLD response to aversive images is reported to covary across the hypothalamus and PAG. It is intriguing that unpleasant pictures influence BOLD in regions that might be expected to contain circuits homologous to those demonstrated in rodents. It is important to note that viewing images is passive for the subjects of this experiment, and the data include no behavioral analogue of the escape responses that are the focus of the rest of the manuscript.

      We agree with the Reviewer that there are many differences between the mouse and human behavioral tasks, and we have expanded the text highlighting these differences more clearly. One of our results, as highlighted by the Reviewer, is that inhibition of the PMd-dlPAG projection impairs escape from threats. Indeed, there is no escape in the human data, as stated by the Reviewer.

      We have now conducted new dual photometry recordings, in which we simultaneously monitor calcium transients in the PMd and the dlPAG on contralateral sides. Using these dual recordings, we show that mutual information between the PMd and the dlPAG in mice is higher during exposure to threats (rat and shock grid fear retrieval) than control assays (toy rat and pre-shock habituation) (Figure R2 and Figure 9 and Figure 9, figure supplement 1). Importantly, this analysis was also performed after excluding all time points that include escapes. Thus, the increase in PMd-dlPAG mutual information is independent of escapes, and is related to exposure to threats.

      Similarly, the increase in activity in the human fMRI data in the hypothalamus-dlPAG pathway is also related to the exposure to aversive images, rather than specific defensive behaviors performed by the human subjects. This new finding of increased mutual information in the PMd-dlPAG circuit independently of escapes provides a better parallel to the human data.

      In Figure R2 below we used mutual information instead of correlation because mutual information can capture both linear and non-linear correlation between two time-series. Figure R2E-G shows that the projection from PMd-cck cells to dlPAG is unilateral. Thus, in dual photometry recordings that were done contralaterally in the PMd and the dlPAG, the signals from the dlPAG are from local cell bodies, and are not contaminated by GCaMP signals from PMd-cck axon terminals.

      Figure R2. (Panels from Figure 9(A-D) and from Figure 9, figure supplement 1 (panels E-G)) Dual fiber photometry signals from the PMd and dlPAG exhibit increased correlation and mutual information during threat exposure. (A) Scheme showing setup used to obtain dual fiber photometry recordings. (B) PMd-cck mice were injected with AAV9-Ef1a-DIO-GCaMP6s in the PMd and AAV9-syn-GCaMP6s in the dlPAG. (C) Expression of GCaMP6s in the PMd and dlPAG. (Scale bars: (left) 200 µm, (right) 150 µm) (D) Bars show the mutual information between the dual-recorded PMd and dlPAG signals, both including (left) and excluding (right) escape epochs, during exposure to threat and control. Mutual information is an information theory-derived metric reflecting the amount of information obtained for one variable by observing another variable. See Methods section for more details. (E) Cck-cre mice were injected with AAV9-Ef1a-DIO-YFP in the PMd in the left side. (F) Image shows the expression of YFP in PMd-cck cells in the left side. (scale bar: 200 µm) (G) PMd-cck axon terminals unilaterally express YFP in the dlPAG. (scale bar: 150µm).

      *p<0.05, **p<0.01.
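
      The rationale given above for preferring mutual information over correlation can be illustrated with a toy example. The sketch below is not the authors' analysis (their estimator is described in their Methods); it uses a simple histogram plug-in estimate with an assumed 8 bins and a synthetic signal pair in which one signal is the square of the other, so the dependence is strong but non-linear. The function names pearson_r and mutual_information are hypothetical helpers.

```python
import math
import random
from collections import Counter

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of mutual information in bits.
    The bin count is an assumption chosen for illustration."""
    def discretize(v):
        lo, hi = min(v), max(v)
        width = (hi - lo) / bins or 1.0  # guard against constant input
        return [min(int((vi - lo) / width), bins - 1) for vi in v]

    bx, by = discretize(x), discretize(y)
    n = len(x)
    pxy, px, py = Counter(zip(bx, by)), Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), c in pxy.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) )
        mi += (c / n) * math.log2(c * n / (px[i] * py[j]))
    return mi

# Synthetic signals: y depends on x only through a non-linear (quadratic) map.
n = 2000
x = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
y = [v * v for v in x]
y_shuffled = y[:]
random.Random(0).shuffle(y_shuffled)  # destroys the dependence on x

print("Pearson r, y = x^2:", pearson_r(x, y))
print("MI in bits, y = x^2:", mutual_information(x, y))
print("MI in bits, shuffled y:", mutual_information(x, y_shuffled))
```

      In this toy case the Pearson correlation is essentially zero while the mutual information is clearly above the shuffled baseline, which is the property the response appeals to.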

      Reviewer #2 (Public Review):

      The manuscript by Wang et al. addresses neuronal mechanisms underlying conserved escape behaviors. The study targets the midbrain periaqueductal grey, specifically the dorsolateral aspect (dlPAG), since previous research demonstrated that activation of dlPAG leads to escape behaviors in rodents and panic-related symptoms in humans. The hypothalamic dorsal premammillary nucleus (PMd) monosynaptically projects to the dlPAG and thus could play a role in escape behavior. The authors test whether cholecystokinin (CCK)-expressing PMd cells could be involved in escape behaviors from innate and conditioned threats using mainly two behavioral paradigms in mice: exposure to a live rat and electrical foot shocks.

      Different approaches are used to test the main hypothesis. Using fiber photometry and microendoscopy calcium imaging in freely moving mice, the study finds that PMd CCK+ neurons were more active when mice are close to threats and during escape behaviors. Furthermore, PMd CCK+ activation patterns predicted escape behavior in a generalized linear model. Chemogenetic inhibition of CCK+ PMd cells decreased escape speed from threats in both behavioral paradigms, while optogenetic activation of those cells led to an increase in speed. Observation of c-fos expression after optogenetic activation revealed activation within two target areas of the PMd, the dlPAG and anteromedial ventral thalamus (AMv), in which cellular activity measured by fiber photometry also increased during escape behaviors. Interestingly, inhibition of the PMd-to-dlPAG pathway, but not PMd-to-AMv, caused a decrease in escape velocity. Lastly, the authors investigated the response of several human participants to threatening images in an fMRI scan. These results suggest that, similar to mice, an activation proportional to the threat intensity within a functional connection between hypothalamus and PAG may occur in humans.

      The authors conclude that a pathway from the PMd to the dlPAG, characterized by expression of CCK, controls escape vigor and responsiveness to threat in mice, and that a similar pathway could be present in humans.

      Overall, the comprehensive data from multiple approaches support a role of the identified pathway in escape behavior. However, an insufficient description of the used methods and experimental details makes it difficult to assess the validity and conclusivity of some findings. In addition, the strong interpretation emphasis on the functional specificity of the CCK+ PMd-dlPAG pathway appears not fully supported by the data.

      1) The rationale for selection of CCK+ cells of the PMd is missing in the current manuscript. Despite methodological considerations, a clear description of these cells' role and characteristics from the existing literature is needed.

      To address this point, we justify our choice of cck+ cells by discussing prior data showing that PMd cck cells are the major neuronal population of the PMd. Furthermore, cck is not strongly expressed in other adjacent hypothalamic nuclei, showing the high anatomical specificity of our manipulations targeting PMd-cck cells. We also discuss prior data (Wang et al., 2021) in the Introduction and Discussion about these cells.

      2) The narrowness of the conclusions of the article is unnecessary. Although CCK+ PMd cells could play a role in regulating escape vigor, some of the presented results rather support the notion of a more general role of these cells in mediating defensive states. For example, the photometry data shows correlation of activity with other active defensive behavior. To address this point, a better analysis of the relation between neuronal activity and the general locomotor behavior of the animals is lacking. In addition, the already presented relation with the measured behaviors is not taken into account when interpreting the results (e.g. Fig 7 E). This description would be relevant to more comprehensively attributing functional roles for CCK + PMd cells.

      At the Reviewer's request, we have included an analysis of the relationship between general locomotor behavior and PMd-cck df/F (Figure R3 and Figure 2, figure supplement 2). Interestingly, we found that the df/F increases monotonically with increasing ranges of speed and acceleration in the threat assays, while remaining fairly constant for matched ranges in the control assays.
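
      The speed-range analysis described above amounts to averaging the z-scored df/F within consecutive speed bins and checking whether the bin means rise. The following is a minimal sketch of that binning step only; the helper name mean_by_bin, the bin edges and the toy values are all assumptions for illustration and are not taken from the authors' data.

```python
def mean_by_bin(speed, dff, edges):
    """Mean df/F within each half-open speed range [edges[k], edges[k+1]).
    Returns None for a range that contains no samples."""
    n_bins = len(edges) - 1
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, v in zip(speed, dff):
        for k in range(n_bins):
            if edges[k] <= s < edges[k + 1]:
                sums[k] += v
                counts[k] += 1
                break
    return [sums[k] / counts[k] if counts[k] else None for k in range(n_bins)]

# Toy per-frame data (hypothetical): speed in cm/s and z-scored df/F.
speed = [1.0, 3.0, 6.0, 8.0, 12.0, 14.0]
dff = [0.1, 0.2, 0.4, 0.6, 0.9, 1.1]
edges = [0.0, 5.0, 10.0, 15.0]

means = mean_by_bin(speed, dff, edges)
# True when every bin mean exceeds the previous one (monotonic increase).
monotonic = all(a < b for a, b in zip(means, means[1:]))
print(means, monotonic)
```

      Running the same function on threat and control sessions with matched edges would give the two sets of bars that such a comparison needs.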

      We agree with the Reviewer that Figure 7E shows PMd-cck cells are activated not only during escape, but also during other behaviors. However, the chemogenetic inhibition data show that inhibiting PMd-cck cell activity impaired only escape speed, without altering freezing, approach or stretch-attend postures. Thus, the chemogenetic inhibition data indicate that the activity of these cells is critical only for escape, among the behaviors scored. Nevertheless, we discussed a “notion of a more general role of these cells in mediating defensive states” as suggested by Reviewer 2. However, Reviewer 1 provided the opposite feedback, stating that “It needs to be made clear that a specific role of PMd in quantitative measures of escape is the new result, instead of a broader role for this region”. Considering these opposing suggestions, we broadened the discussion on the role of the PMd, but did so conservatively.

      Figure R3. (Same as Figure 2, figure supplement 2). Bars show the mean PMd-cck df/F (z-scored) for increasing ranges of (A) speed and (B) acceleration. (Wilcoxon signed-rank test; n=15) *p<0.05, **p<0.01, ***p<0.001.

      3) The imprecision of the methods description, especially the behavioral analysis is contributing to the previous point. In particular, the escape criterion itself seems to include a vague classification based on movement away from the threat- this should be more concretely defined (e.g. using angle of escape direction). In any case, the different behavioral context dimensions between the two paradigms would probably affect the escape criterion itself and thus have to be taken into account when interpreting the results.

      The Reviewer makes an important point that the escape definition included in the Methods section was lacking in detail, specifying only a minimum directional speed. We had neglected to include two crucial criteria that were used as well: a minimum distance-from-threat at which escape must be initiated and a minimum distance traversed during escape. All escapes were therefore required to begin near the threat and lead to a substantial increase in mouse distance from the threat. These details are now included in the Methods section, as follows:

      “'Escapes' were defined as epochs for which (1) the mouse speed away from the threat or control stimuli exceeded 2 cm/s for a minimum of 5 seconds continuously, (2) movement away from the threat was initiated at a maximum distance-from-threat of 30 cm and (3) the distance traversed from escape onset to offset was greater than 10 cm. Thus, escapes were required to begin near the threat and lead to a quick and substantial increase in distance from the threat.

      'Escape duration' was defined as the amount of time that elapsed from escape onset to escape offset.

      'Escape speed' was defined as the average speed from escape onset to offset.

      'Escape angle' was defined as the cosine of the mouse head direction in radians, such that the values ranged from -1 (facing towards the threat) to 1 (facing away from the threat). Mouse head direction was determined by the angle of the line connecting a point midway between the ears and the nose.”
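
      The three escape criteria above can be sketched as a small detector. This is an illustration under stated assumptions, not the authors' code: it works on a one-dimensional trace of distance from the threat, takes "speed away from the threat" to be the frame-to-frame change in that distance, and takes "distance traversed" to be the net increase in distance between onset and offset. The function name detect_escapes and its parameter names are hypothetical; the default thresholds are the values quoted in the definition.

```python
def detect_escapes(dist, fs, min_speed=2.0, min_dur=5.0,
                   max_onset_dist=30.0, min_travel=10.0):
    """Return (onset_frame, offset_frame) pairs for escape epochs.

    dist: distance from the threat in cm, one sample per frame.
    fs:   sampling rate in frames per second.
    """
    # Speed away from the threat between consecutive frames (cm/s).
    away = [(dist[i + 1] - dist[i]) * fs for i in range(len(dist) - 1)]
    escapes = []
    i = 0
    while i < len(away):
        if away[i] > min_speed:
            j = i
            # Extend the run while the speed criterion holds continuously.
            while j < len(away) and away[j] > min_speed:
                j += 1
            duration = (j - i) / fs
            if (duration >= min_dur                      # criterion 1
                    and dist[i] <= max_onset_dist        # criterion 2
                    and dist[j] - dist[i] > min_travel):  # criterion 3
                escapes.append((i, j))
            i = j
        else:
            i += 1
    return escapes

# Toy trace at 10 frames/s: rest at 10 cm, move away at 4 cm/s for 6 s, rest.
fs = 10
trace = [10.0] * 20 + [10.0 + 0.4 * k for k in range(1, 61)] + [34.0] * 20
# Same movement, but starting 40 cm from the threat (fails criterion 2).
far_trace = [d + 30.0 for d in trace]
# Only 3 s of movement (fails criterion 1).
short_trace = [10.0] * 20 + [10.0 + 0.4 * k for k in range(1, 31)] + [22.0] * 20

print(detect_escapes(trace, fs))
print(detect_escapes(far_trace, fs))
print(detect_escapes(short_trace, fs))
```

      Escape duration and escape speed as defined above then follow from each returned pair as (offset − onset) / fs and the net distance gained divided by that duration.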

      Using the escape definition above, a higher number of escapes and a higher average escape speed were observed in threat assays compared to control assays (Figure 1). This finding indicates that the definitions we used are capturing defensive evasion.

      Both contexts have a length of 70 cm, so differences in the length of the contexts did not influence the definition of escape across contexts.

      In response to the Reviewer's suggestion of an escape angle criterion, we have included Figure R4 which illustrates that, using the aforementioned escape definition, the resulting escape angle is quite stereotyped. The cosine of the escape angle shows very little variation, showing that only a narrow range of escape angles is used. Given this result, we opted to not include the angle of escape as part of the escape criteria to increase simplicity.

      Figure R4. (A) Lines represent mouse position for all escapes that occurred during an example rat (top) and fear retrieval (bottom) session. Note that, while there is a diversity of escape routes, the escape angle is quite similar. (B) (left) Diagram provides a description of the escape angle metric, here calculated as the cosine of the head direction in radians. A value of 1 indicates an escape parallel with the long walls of the enclosure. (right) Bars represent the mean escape angle for all animals in Figure 1 during the rat and fear retrieval assays (n=32). As is apparent in (A), the mean escape angle cosine has little variability.
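
      The escape angle metric described above (the cosine of the head direction, with the head direction taken from the midpoint between the ears to the nose) can be written out explicitly. The sketch below assumes tracked 2-D coordinates for both ears and the nose and assumes that the axis pointing away from the threat runs along +x, parallel to the long walls; both the coordinate convention and the function name escape_angle_cosine are illustrative assumptions.

```python
import math

def escape_angle_cosine(ear_left, ear_right, nose, away_axis=(1.0, 0.0)):
    """Cosine of the angle between head direction and the away-from-threat axis.

    Head direction is the vector from the midpoint between the ears to the nose.
    Returns 1 when facing directly away from the threat, -1 when facing it.
    """
    mid_x = (ear_left[0] + ear_right[0]) / 2.0
    mid_y = (ear_left[1] + ear_right[1]) / 2.0
    hx, hy = nose[0] - mid_x, nose[1] - mid_y
    norm = math.hypot(hx, hy) * math.hypot(away_axis[0], away_axis[1])
    if norm == 0:
        raise ValueError("head direction or away axis has zero length")
    return (hx * away_axis[0] + hy * away_axis[1]) / norm

# Ears at (0, 1) and (0, -1); the nose position sets the heading.
facing_away = escape_angle_cosine((0, 1), (0, -1), (2, 0))
facing_threat = escape_angle_cosine((0, 1), (0, -1), (-2, 0))
sideways = escape_angle_cosine((1, 0), (-1, 0), (0, 3))
print(facing_away, facing_threat, sideways)
```

      Using the dot product avoids computing the angle itself, so the value is bounded between −1 and 1 as in the definition.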

      4) In line, more detailed descriptions of the animal's behavior are needed to support assessment of the results regarding the event-related fiber photometry results. Measures like frequency of escape, duration of freezing bouts and angle, duration and total speed of the escape bouts, and a better description of measures like Δ escape speed could be relevant for interpreting the results. In addition, there is no explanation of how the possible overlapping of behaviors in the broad time frame used in the experiments was regarded.

      We have now included the requested measures as a supplement to Figure 2 (see also Fig. R5 below). Regarding overlapping behaviors, we have quantified the overlap between categorized behaviors in the fiber photometry assay and found that only a small fraction of behavioral timepoints were categorized as more than one behavior, primarily during behavioral transitions. This is quantified in Figure R6 below. Moreover, as is now described in the Methods, the analyses presented in Figure 2G-I (as well as Figure 7C-E, 7G-I) were performed only on behaviors that were separated from all other behaviors by a minimum of 5 seconds.

      Figure R5. (Same as Figure 2, figure supplement 1) Behavioral metrics for the PMd fiber photometry cohort during threat exposure assays. (A) Diagram provides a description of the escape angle metric, here calculated as the cosine of the head direction in radians. A value of 1 indicates an escape parallel with the long walls of the enclosure. (B) Table shows pertinent defensive metrics during exposure to rat and fear retrieval assays for the PMd fiber photometry cohort. (n=15 mice).

      Figure R6. The behavioral overlap between classified behaviors is minimal. The colormap depicts the fraction of behavioral timepoints for each of the four classified behaviors that was categorized as each of the remaining behaviors across all PMd fiber photometry assays (n=15 mice).
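
      The overlap quantification summarized in this colormap can be sketched as follows: for each behavior, count the fraction of its frames that also carry each other label. This is a minimal illustration with toy labels, not the authors' code; the function name overlap_matrix, the per-frame boolean representation and the example values are assumptions.

```python
def overlap_matrix(labels):
    """labels maps behavior name -> list of booleans, one per frame.

    Returns result[a][b]: the fraction of frames labelled `a` that are also
    labelled `b`, for every pair of distinct behaviors.
    """
    names = list(labels)
    result = {}
    for a in names:
        n_a = sum(labels[a])
        result[a] = {}
        for b in names:
            if a == b:
                continue
            both = sum(1 for fa, fb in zip(labels[a], labels[b]) if fa and fb)
            result[a][b] = both / n_a if n_a else 0.0
    return result

# Ten toy frames; escape and freeze share a single transition frame (frame 3).
T, F = True, False
labels = {
    "escape":   [T, T, T, T, F, F, F, F, F, F],
    "freeze":   [F, F, F, T, T, T, T, F, F, F],
    "approach": [F, F, F, F, F, F, F, T, T, T],
}
overlap = overlap_matrix(labels)
print(overlap["escape"]["freeze"], overlap["escape"]["approach"])
```

      The matrix is not symmetric in general, because each row is normalized by the number of frames of its own behavior.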

      5) Part of the experimental results provide suboptimal evidence for the provided interpretation. That is, the lack of clear quantification and statistical analysis of the microendoscopy calcium imaging data on PMd-CCK+ cells makes it hard to reconcile this data with the photometry data. Furthermore, c-Fos staining after optogenetic stimulation of PMd-cck+ cells is insufficient evidence for the interpretation of broad, but functionally specific, recruitment of defensive networks. While the data on optogenetic inhibition of the PMd-CCK+ projection to the dlPAG seems to confirm the main hypothesis, both an intra-animal control and demonstration of statistical significance in the analysis are desirable to fully support that role.

      We agree with the Reviewer that clear quantification and statistical analyses are essential in interpreting the microendoscopic analysis. However, we are not sure what is being requested, as we have applied both of these approaches to this dataset. For instance, in Figure 3, we quantify the percentage of cells that significantly encode each behavior as well as implement 5-fold logistic regression to determine how well these behaviors can be predicted. This accuracy is statistically compared to chance. Further quantification and statistical comparisons of speed and position decoding accuracy between threat and control assays are included in Figure 4. Concerning the Arch experiments, we have included an intra-animal control by comparing light off and on epochs, and we statistically compare the difference between these epochs with a control group.

      Regarding the c-Fos experiment, we observe increased c-Fos expression in several nuclei known to be critical for defense, such as the bed nucleus of the stria terminalis and the ventromedial hypothalamus. This finding underlies our claim that optogenetic activation of the PMd recruits defensive networks. Nevertheless, it is entirely possible that naturalistic endogenous activation of the PMd does not recruit these nuclei. We added text addressing this caveat.

      6) The provided fMRI data only provides circumstantial evidence to support a functionally specific hypothalamus to PAG pathway especially due to the technical characteristics and limitations of the experimental setup and behavioral paradigm.

      The Reviewer makes an excellent point. Please see our response to Reviewer 1, point 6, where we provide a better parallel to the fMRI data in a new photometry analysis, as well as the added Figure 9.

      Briefly, we now have conducted contralateral dual photometry recordings of the dlPAG and the PMd, and show an increase in mutual information between the neural activity of these two regions during exposure to threats. This result was found after removing all timepoints with escapes. Thus, the increase in mutual information is related broadly to threat exposure, rather than caused by specific moments during which escape occurs. We argue that this result more closely parallels the human data, as both the fMRI and mutual information from mice data show an increase in functional connectivity in the hypothalamus-dlPAG pathway during threat exposure, independently of escapes.

      Reviewer #3 (Public Review):

      This manuscript by Wang et al extends the Adhikari lab's earlier findings of the hypothalamic dorsal premammillary nucleus' role in defensive behavior. Using cell-type specific calcium imaging, the authors show that the activity of CCK-expressing PMd neurons precedes and predicts escape from both learned and unlearned threats. Optogenetic/chemogenetic inhibition revealed that the PMd-dlPAG pathway contributes to escape vigor. Additionally, optogenetic activation of CCK PMd neurons induces Fos in numerous brain regions implicated in fear and escape behaviors. Last, an analogous hypothalamic-PAG pathway is shown to be activated by aversive images in humans.

      Although these findings are potentially impactful, additional clarification and data are needed to strengthen and streamline the manuscript, as outlined below.

      1) The results of the authors' recent publications (Wang et al Neuron 2021, Reis et al J. Neuro 2021) should be integrated into the manuscript. For example, the rationale for selectively manipulating CCK+ PMd neurons is not stated. Likewise, histological validation that the Cre-dependent GCaMP expression is restricted to CCK+ neurons should be shown or referenced. The authors should also provide discussion as to how the current results integrate with their other recent findings.

      Following the Reviewer’s suggestions, we address these concerns by referencing our previous paper. Cck+ cells were chosen because this marker is expressed in over 90% of PMd cells (Wang et al., 2021), but not in adjacent nuclei (Mickelsen et al., 2020). These cells have also been shown to be important to control escape from innate threats, such as carbon dioxide (Wang et al., 2021). These are the justifications for selecting PMd-cck cells, as discussed in this revised submission. We also reference our prior work to indicate specific expression of GCaMP in PMd cck cells.

      2) The authors used male and female mice in their experiments but there are no analyses of potential sex differences in threat responses or escape vigor. Were there any significant sex differences in the measurements presented in Figure 1? A supplementary figure showing data for male and female mice would be helpful. Also, for Figure 1, please display the individual data points so that the reader can appreciate the variability in the behavioral responses. How many approaches and escapes are observed in each test? What is the average duration of a freezing bout?

      As the results reported in Figure 1 summarize data from a rather large cohort (n=32), we decided it best for clarity's sake to show the variability in behavioral responses as a histogram of the difference scores for each animal (threat - control), now included as Figure 1, Figure Supplement 1, as well as below (Figure R7). Showing 32 individual data points may make the figure difficult to visualize (but of course, we can instead plot these individual points if the Reviewer prefers that instead of the plots shown below). At the Reviewer's request, we have also included the number of approaches and escapes in Figure 1 and the supplement. The average duration of a freezing bout is 2.03 ± 0.15 s and is now reported in the Results section. There were no significant sex differences in the Figure 1 measures, and this is stated in the text, as well as plotted below in Figure R8 (male n=17, female n=15; Wilcoxon rank-sum test, p>0.05).

      Figure R7. (Also Figure 1, figure supplement 1) Distribution of the difference scores for threat - control assays. Histograms depict the difference scores for all mice, threat - control, for each behavioral metric in Figure 1. The dotted red line indicates zero, or no difference between threat and control (n=32 mice).

      Figure R8 (Also Figure 1, Figure supplement 3). Distribution of the difference scores for threat - control assays for males and females. Histograms depict the difference scores for all mice, threat - control, for each behavioral metric in Figure 1, separately for males (green) and females (purple). The dotted red line indicates zero, or no difference between threat and control (male n=17, female n=15). No significant differences (p>0.05) were found between males and females in any of the metrics plotted.

      3) In Fig. 2, there appears to be sustained activity of CCK+ neurons after the onset of threat approach, and ramping activity preceding stretch-attend. In-depth analysis of these responses may be beyond the scope of this study, but the findings should be discussed since the representation of approach-related behaviors indicates the PMd is involved in more general representation of threat proximity, rather than simply escape vigor.

      We agree with the Reviewer that PMd-activity represents distance to both innate and conditioned threats. We also include new data showing that PMd-dlPAG mutual information increases in the presence of threats (Figure R2 and Figure 9). Taken together, these data show that PMd activity encodes more than just escape vigor. We have altered the text to emphasize these results. These dual-site recordings were done contralaterally, so that dlPAG-syn cell body GCaMP signals are not contaminated by GCaMP-expressing PMd-cck axon terminals in the dlPAG.

      4) The authors state that PMd CCK neuronal activity regulates escape vigor. Although the authors show a correlation of the calcium signal amplitude and escape distance in Fig. 2I, a correlation with escape velocity would be a more convincing measure of vigor.

      PMd-cck neural activity is related to escape speed, as shown by single cell miniaturized microscopy recordings. Figure 4D shows that PMd ensemble activity can predict escape speed from threats, but not control stimuli. These results were specific to escape, as PMd activity did not encode approach speed towards threats or control stimuli (Figure 4D). Furthermore, we performed new analysis and showed that a greater number of PMd cells show activity significantly correlated with escape from threats, compared to control stimuli. Finally, we have additionally shown that, for the cells whose activity is significantly correlated with escape speed, the mutual information between escape speed and df/F is significantly greater for threat than control. This has now been included as Figure 3I-K (same as Figure R9 below).

      Figure R9. A higher fraction of PMd-cck cells are correlated with escape speed during exposure to threats. (Also Figure 3I-J) (A) Traces show the z-scored df/F (blue) and speed (gray) for one cell classified as a speed cell in the rat exposure assay (top) and one non-correlated cell from the toy rat assay (bottom). Individual escape epochs are indicated by red boxes. (B) Bars show the percent of cells that significantly correlate with escape speed. (Fisher's exact test; toy rat: n correlated = 56, n non-correlated = 405; rat: n correlated = 100, n non-correlated = 366; pre-shock: n correlated = 50, n non-correlated = 571; fear retrieval: n correlated = 122, n non-correlated = 391) (C) Bars show the mutual information in bits between escape speed and calcium activity for cells whose signals were significantly correlated with escape speed in (J). (Wilcoxon rank sum test; toy rat n=56, rat n=100; pre-shock n=50, fear retrieval n=122). p<0.001.

      Unfortunately, the lower resolution provided by photometry did not reveal consistent correlations with escape velocity across assays. Despite this lack of single cell resolution, PMd-cck photometry amplitude correlated with escape velocity during exposure to the rat, but not the toy rat, as shown below (Figure R10). However, this result was not replicated in the fear retrieval assay. Taken together, these data show that PMd activity is indeed related to escape vigor.

      Figure R10. Escape speed correlates with PMd-cck photometry amplitude during rat exposure. Bars depict the Spearman r-value of escape speed and PMd-cck photometry df/F (z-scored) amplitude during exposure to rat and toy rat. (n=9 mice) p<0.001.

      5) The changes in prediction error from control to threat contexts in Figs. 4B and 4D are compelling, but the prediction error in the threat context seems high. Can the authors provide a basis for what constitutes a 'good' error score?

      We have now included the chance error, calculated by training and testing the GLM on circularly permuted data across mice and indicated below with a dotted red line in Figure 4 and its supplement. The Methods have also been updated to reflect this new aspect of the analysis. A ‘good’ error would be a value that is significantly lower than the error expected by chance, which is indicated by the red dashed line in Figure R11.

      Fig. R11. (Also Figure 4B, 4D and Figure 4, figure supplement 1) (A) Bars show the mean squared error (MSE) of the GLM-predicted location from the actual location. The MSE is significantly lower for threat than control assays (Wilcoxon signed-rank test; n=9 mice). The dotted red line indicates chance error, calculated by training and testing the GLM on circularly permuted data. Only threat assay error was significantly lower than chance (Wilcoxon signed-rank test; rat p<0.001, fear retrieval p=0.003). (B) Bars depict the MSE of the GLM-predicted velocity away from (left) and towards (right) the threat. The GLM more accurately decodes threat than control velocities for samples in which the mice move away from the threat (top). Only threat assay error was significantly lower than chance (Wilcoxon signed-rank test; rat p=0.004, fear retrieval p=0.012). (C) Bars depict the mean squared error of the GLM-predicted speed. The GLM more accurately decodes threat than control speeds. Only threat assay error was significantly lower than chance (rat p<0.020, fear retrieval p=0.040). (Wilcoxon signed-rank test; n=9 mice) p<0.01.
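
      The chance-level comparison described in this response can be illustrated with a minimal sketch. Circularly shifting the neural signal relative to behavior preserves each signal's autocorrelation while breaking their alignment, and refitting the decoder on shifted data gives a null error. Here an ordinary least-squares line stands in for the GLM, and the function names, the half/half train-test split, the number of shuffles, and the synthetic data are all assumptions made for illustration rather than the authors' actual pipeline.

```python
import math
import random


def circular_shift(values, offset):
    """Rotate a list to the right by offset samples, wrapping around."""
    offset %= len(values)
    return values[-offset:] + values[:-offset] if offset else list(values)


def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def held_out_mse(x, y):
    """Fit on the first half of the session, report MSE on the second half."""
    half = len(x) // 2
    slope, intercept = fit_line(x[:half], y[:half])
    errors = [(slope * a + intercept - b) ** 2 for a, b in zip(x[half:], y[half:])]
    return sum(errors) / len(errors)


def chance_mse(x, y, n_shuffles=200, min_shift=10, seed=0):
    """Mean held-out MSE after circularly shifting x relative to y."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_shuffles):
        offset = rng.randrange(min_shift, len(x) - min_shift)
        errors.append(held_out_mse(circular_shift(x, offset), y))
    return sum(errors) / len(errors)


# Synthetic example: "behavior" is a slow signal, "activity" tracks it with noise.
_rng = random.Random(1)
behavior = [math.sin(t / 7.0) + 0.5 * math.sin(t / 3.1) for t in range(400)]
activity = [2.0 * b + _rng.gauss(0.0, 0.1) for b in behavior]
real_error = held_out_mse(activity, behavior)
null_error = chance_mse(activity, behavior)
```

      In this framing, a decoding error counts as informative only when it falls clearly below the null error obtained from the shifted data.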

      6) Off-target effects are a potential concern at the dose of CNO used (5 mg/kg). For example, the increased approach speed with CNO in the YFP control group (Fig. 5D) may be a result of the high CNO dose. How was the dose of CNO selected?

      This dose was selected based on our experience using it to study PMd-cck cells in our prior Neuron paper. It is also a common dose: several recent neuroscience papers published in this journal, eLife, use this exact dose of CNO (Chen et al., 2016; Halbout et al., 2019; Ito et al., 2020; Kwak and Jung, 2019; Li et al., 2020; Mukherjee et al., 2021; O’Hare et al., 2017; Patel et al., 2019).

      Although in this particular case approach velocity trended higher after CNO treatment, this is not a consistent result. We ran another cohort of control mice (n=9 saline, 9 CNO 5 mg/kg) and show that no such trend in approach velocity to the shock grid was observed during fear retrieval (Figure R12).

      Fig. R12. CNO has no effect on approach velocity in a separate control cohort. The experimental protocol was performed as described in Figure 1B for a control cohort. For this group, CNO injection had no significant effect on approach speed (Wilcoxon signed-rank test, n=9).

      7) Given the visible trends in the data, the number of animals used in Fig. 6B is insufficient to make conclusions about the behavioral effect of optogenetic excitation of PMd CCK neurons. Either more animals should be added, or the analysis should be limited to the Fos staining.

      At the Reviewer's request, we have increased the number of animals in this analysis and found the results unchanged. Figure 6B has been replaced in the main manuscript (same as Figure R13 below). The addition of these new animals also erased the previous non-significant trends seen with fewer animals.

      Figure R13. (Also Figure 6B) Delivery of blue light increases speed in PMd-cck ChR2 mice, but not stretch-attend postures or freeze bouts. (PMd-cck YFP n=6, PMd-cck ChR2 n=8; Wilcoxon rank-sum test).

    1. Author Response:

      Reviewer #1 (Public Review):

      Here the authors use a variety of sophisticated approaches to assess the contribution of synaptic parameters to dendritic integration across neuronal maturation. They provide high-quality data identifying cellular parameters that underlie differences in AMPAR-mediated synaptic currents measured between adolescent and adult cerebellar stellate cells, and conclude that differences are attributed to an increase in the complexity of the dendritic arbor. This conclusion relies primarily on the ability of a previously described model for adult stellate cells to recapitulate the age-dependent changes in EPSCs by a change in dendritic branching with no change in synapse density. These rigorous results have implications for understanding how changing structure during neuronal development affects integration of AMPAR-mediated synaptic responses.

      The data showing that younger SCs have smaller dendritic arbors but similar synapse density is well-documented and provides compelling evidence that these structural changes affect dendritic integration. But the main conclusion also relies on the assumption that the biophysical model built for adult SCs applies to adolescent SCs, and there are additional relevant variables related to synaptic function that have not been fully assessed. Thus, the main conclusions would be strengthened and broadened by additional experimental validation.

      We thank the reviewer for the positive assessment of the quality and importance of our manuscript. Below we address the reviewer’s comments directly but would like to stress that the goal of the manuscript was to understand the cellular mechanisms underlying developmental slowing of mEPSCs in SCs and the consequent implication for developmental changes in dendritic integration, which have rarely been examined to date, and not to establish a detailed biophysical model of cerebellar SCs. The latter would require dual-electrode recordings (one on 0.5 μm dendrites), a detailed description of the expression and dendritic localization of the gap junction protein connexin 36 (as done in Szoboszlay et al., Neuron 2016), and a detailed description of parameter variability across the SC population (e.g. variations in AMPAR content at synapses, Rm, and dendritic morphology). Such experiments are well beyond the scope of the manuscript. Here we use biophysical simulations to support conclusions derived from specific experiments, more as a proof of principle than as a strict quantitative prediction.

      Nevertheless, we would like to clarify our selection of parameters for the biophysical models for immature and adult SCs. We did not simply “assume” that the biophysical models were the same at the two developmental stages. We either used evidence from the literature or our own measured parameters to establish an immature SC model. As compared to adult SCs, we found that immature SCs had 1) an identical membrane time constant, 2) an only slightly larger dendrite diameter, 3) decreased dendritic branching and maximum lengths, 4) a comparable synapse density, and 5) a homogeneous synapse distribution. Taken together, we concluded that increased dendritic branching during SC maturation resulted in a larger fraction of synapses at longer electrotonic distances in adult SCs. These experimental findings were incorporated into two distinct biophysical models representing immature and adult SCs. Evidence from the literature suggests that voltage-gated channel expression is not altered between the two developmental stages studied here. Therefore, like the adult SC model, we considered only the passive membrane properties and the dendritic morphology. The simulation results supported our conclusion that the increased apparent dendritic filtering of mEPSCs resulted from a change in the distribution of synapse distance to the soma rather than cable properties. Some of the measured parameters (e.g., membrane time constant) were not clearly stated in the original manuscript; we have corrected this in the revised manuscript.

      We are not sure what the reviewer meant by suggesting that we did not examine “other relevant variables related to synaptic function.” Later, the reviewer refers to alterations in AMPAR subunit composition or changes in cleft glutamate concentration (low-affinity AMPAR antagonist experiments). We performed experiments to directly examine both possible contributions by comparing qEPSC kinetics and performing low-affinity antagonist experiments, respectively, but we found that neither mechanism could account for the developmental slowing of mEPSCs. We, therefore, did not further explore possible developmental changes in AMPAR subunits. See below for a more specific response and above for newly added text.

      While many exciting questions could be examined in the future, we do not think the present study requires additional experiments. Nevertheless, we recognize that perhaps we can improve the description of the results to justify our conclusions better (see specifics below).

      Reviewer #2 (Public Review):

      This manuscript investigates the cellular mechanisms underlying the maturation of synaptic integration in molecular layer interneurons in the cerebellar cortex. The authors use an impressive combination of techniques to address this question: patch-clamp recordings, 2-photon and electron microscopy, and compartmental modelling. The study builds conceptually and technically on previous work by these authors (Abrahamsson et al. 2012) and extends the principles described in that paper to investigate how developmental changes in dendritic morphology, synapse distribution and strength combine to determine the impact of synaptic inputs at the soma.

      1) Models are constructed to confirm the interpretation of experimental results, mostly repeating the simulations from Abrahamsson et al. (2012) using 3D reconstructed morphologies. The results are as expected from cable theory, given the (passive) model assumptions. While this confirmation is welcome and important, it is disappointing to see the opportunity missed to explore the implications of the experimental findings in greater detail. For instance, with the observed distributions of synapses, are there more segregated subunits available for computation in adult vs immature neurons?

      As described in our response to reviewer 1, this manuscript intends to identify the cellular mechanisms accounting for the developmental slowing of mEPSCs and its implication for dendritic integration. The modeling was designed to support the most plausible explanation that increased branching resulted in more synapses at longer electrotonic distances. This finding is novel and merits more in-depth examination at a computational level in future studies.

      Quantifying dendritic segregation is non-trivial due to dendritic nonlinearities and the difficulties in setting criteria for electrical “isolation” of inputs. However, because the space constant does not change with development, while both dendrite length and branching increase, it is rather logical to conclude qualitatively that the number of computational segments increases with development.

      We have added the following sentence to the Discussion (line 579):

      “Moreover, since the space constant does not change significantly with development and the dendritic tree complexity increases, the number of computational segments is expected to increase with development.”

      How do SCs respond at different developmental stages with in vivo-like patterns of input, rather than isolated activation of synapses? Answering these sorts of questions would provide quantitative support for the conclusion that computational properties evolve with development.

      While this is indeed a vital question, the in vivo patterns of synaptic activity are not known, so it is difficult to devise experiments to arrive at definitive conclusions.

      2) From a technical perspective, the modeling appears to be well-executed, though more methodological detail is required for it to be reproducible. The AMPA receptor model and reversal potential are unspecified, as is the procedure for fitting the kinetics to data.

      We did not use an explicit channel model to generate synaptic conductances. We simply used the default multiexponential function of NEURON (single exponential rise and single exponential decay) and adjusted the parameters tauRise and tauDecay such that simulated EPSCs matched the somatic quantal EPSC amplitude, rise time and τdecay (Figure 4).

      We added the following text to the methods (line 708):

      “The peak and kinetics of the AMPAR-mediated synaptic conductance waveforms (gsyn) were set to simulate qEPSCs that matched the amplitude and kinetics of experimental somatic quantal EPSCs and evoked EPSCs. Immature quantal gsyn had a peak amplitude of 0.00175 μS, a 10-90 % RT of 0.0748 ms and a half-width of 0.36 ms (NEURON synaptic conductance parameters Tau0 = 0.073 ms, Tau1 = 0.26 ms and Gmax = 0.004 μS) while mature quantal gsyn had a peak amplitude of 0.00133 μS, a 10-90 % RT of 0.072 ms and a half-width of 0.341 ms (NEURON synaptic conductance parameters Tau0 = 0.072 ms, Tau1 = 0.24 ms and Gmax = 0.0032 μS). For all simulations, the reversal potential was set to 0 mV and the holding membrane potential was set to –70 mV. The experimental somatic PPR for EPSCs was reproduced with a gsyn2/gsyn1 ratio of 2.25.”
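
      To make the quoted parameters concrete, here is a minimal sketch of a rise/decay conductance waveform. It assumes an unnormalized difference of exponentials, g(t) = Gmax · (exp(−t/Tau1) − exp(−t/Tau0)), with Tau0 as the rise and Tau1 as the decay time constant. That functional form is an assumption of this sketch, not something stated in the response; it is chosen because, with the quoted Tau0, Tau1 and Gmax, it gives peaks close to the quoted peak amplitudes. The function names and the local-current helper are likewise illustrative.

```python
import math


def gsyn_us(t_ms, tau_rise_ms, tau_decay_ms, gmax_us):
    """Conductance (uS) at time t (ms) after synapse onset.

    Assumed form: unnormalized difference of two exponentials.
    """
    if t_ms < 0:
        return 0.0
    return gmax_us * (math.exp(-t_ms / tau_decay_ms) - math.exp(-t_ms / tau_rise_ms))


def peak_time_ms(tau_rise_ms, tau_decay_ms):
    """Analytic time of the conductance peak for the assumed waveform."""
    scale = tau_rise_ms * tau_decay_ms / (tau_decay_ms - tau_rise_ms)
    return scale * math.log(tau_decay_ms / tau_rise_ms)


def peak_conductance_us(tau_rise_ms, tau_decay_ms, gmax_us):
    """Peak conductance (uS) of the assumed waveform."""
    t_peak = peak_time_ms(tau_rise_ms, tau_decay_ms)
    return gsyn_us(t_peak, tau_rise_ms, tau_decay_ms, gmax_us)


def local_current_na(g_us, v_mv, e_rev_mv=0.0):
    """Ohmic synaptic current (nA) at the synapse; uS * mV = nA."""
    return g_us * (v_mv - e_rev_mv)


# Parameters quoted in the Methods text (Tau0 = rise, Tau1 = decay).
immature_peak = peak_conductance_us(0.073, 0.26, 0.004)
mature_peak = peak_conductance_us(0.072, 0.24, 0.0032)

# Local current at the quoted holding potential, ignoring dendritic filtering.
immature_current = local_current_na(immature_peak, -70.0)
```

      The current helper describes only the current at the synapse under perfect voltage control; the somatically recorded EPSC is further shaped by the cable filtering that the simulations were designed to capture.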

      Were simulations performed at resting potential, and if yes, what was the value?

      The membrane potential was set at – 70 mV to match that of experimental recordings and has been updated in the Methods section.

      How was the quality of the morphological reconstructions assessed? Accurate measurement of dendritic diameters is crucial to the simulations in this study, so providing additional morphometrics would be helpful for assessing the results. Will the models and morphologies be deposited in ModelDB or similar?

      For the two reconstructions imported into NEURON for simulations, we manually curated the dendritic diameters to verify that the estimated diameter matched that of the fluorescence image using NeuronStudio, which uses a robust subpixel estimation algorithm (Rayburst diameter, Rodriguez et al. 2008). The reconstructions include all variations in diameter throughout the dendritic tree (see, as an example, the result of the reconstruction on the image below for the immature SC presented in Figure 2D). The mean diameter across the entire dendritic tree of the reconstructed immature and adult SC was 0.42 and 0.36 μm, respectively, similar to the ratio of measured diameters estimated using confocal microscopy.

      We have updated the methods section to include how reconstructions were curated and analyzed (line 693).

      “An immature (P16) and adult SC (P42) were patch loaded with 30 μM Alexa 594 in the pipette and imaged using 2PLSM. Both cells were reconstructed in 3D using NeuronStudio in a semiautomatic mode which uses a robust subpixel estimation algorithm (calculation of Rayburst diameter (Rodriguez et al., 2008)). We manually curated the diameters to verify that it matched the fluorescence image to faithfully account for all variations in diameter throughout the dendritic tree. The measured diameter across the entire dendritic tree of the reconstructed immature and adult SCs was 0.42 and 0.36 μm, respectively. The 16% smaller diameter in adult was similar to the 13% obtained from confocal image analysis from many SCs (see Figure 2B).”

      We agree with the reviewer that accurate measurements of dendritic diameters are crucial for the simulations. We did not rely solely on the reconstructed SCs; we also performed high-resolution confocal microscopy analysis of 16 different dye-filled SCs. We examined differences in the FWHM of intensity line profiles drawn perpendicular to the dendrite between immature and adult SCs. The FWHM is a good approximation of dendritic diameter, and the analysis was performed as previously done for adult SCs (Abrahamsson et al., 2012) to allow direct assessment of possible developmental differences. We confirmed that 98% of the estimated diameters are larger than the imaging resolution (0.27 μm). We observed only a small developmental difference in the mean FWHM (0.41 vs. 0.47 μm, 13% reduction) using this approach. Because the dendritic filtering is similar for diameters ranging from 0.3 to 0.6 μm (Figure 4G and 4H, Abrahamsson et al. 2012), we concluded that developmental changes in dendritic diameter cannot account for developmental differences in mEPSC time course.

      We added the following text to the methods (line 777):

      “The imaging resolution within the molecular layer was estimated from the width of intensity line profiles of SC axons. The FWHM was 0.30 +/- 0.01 μm (n = 57 measurements over 16 axons), with a mean of 0.27 +/- 0.01 μm (n = 16) when taking into account the thinnest section for each axon. Only 2% of all dendritic measurements are less than 270 nm, suggesting that the dendritic diameter estimation is hardly affected by the resolution of our microscope.”

      Regarding additional morphometrics:

      1) We added two panels (H and I) to Figure 6 showing the number of primary dendrites and branch points for immature and adult SCs using the same estimation criteria as Myoga et al.

      We have updated the Results section (line 389). “Thus, the larger number of puncta located further from the soma in adult SCs is not due to increased puncta density with distance, but to larger dendritic lengths (Figure 6E and 6F) and many more distal dendritic branches (Figure 6G, Sholl analysis) due to a larger number of branch points (Figure 6H), but not a larger number of primary dendrites (Figure 6I). The similarity between the shapes of synapse (Figure 6B) and dendritic segment (Figure 6C) distributions was captured by a similarity in their skewness (0.38 vs. 0.32 for both distributions in immature and -0.10 and -0.08 for adult distributions). These data demonstrate that increased dendritic complexity during SC maturation is responsible for a prominent shift toward distal synapses in adult SCs.”

      2) As suggested by the reviewer, we estimated the dendritic width as a function of branch order and observed a small reduction in segment diameter with distance from the soma, with a tendency toward smaller diameters in more distal segments. This reduction does not significantly alter the dendritic filtering (0.35 to 0.6 μm).

      3) We also show the variability in dendritic diameter within single SCs and between different SCs, which can be very large. These results have been added to Figure 2B. See also point one below in response to “comment to authors.”

      We will upload the two SC reconstructions to ModelDB.

      3) The Discussion should justify the assumption of AMPA-only synapses in the model (by citing available experimental data) as well as the limitations of this assumption in the case of different spatiotemporal patterns of parallel fiber activation.

      NMDARs are extrasynaptic in immature and adult SCs. Therefore, they do not contribute to postsynaptic strength in response to low-frequency synaptic activation, and we do not consider their contribution to synaptic integration in this study. Please see also our detailed response to reviewer’s point 4. We have updated the Results accordingly.

      4) What is the likely influence of gap junction coupling between SCs on the results presented here, and on synaptic integration in SCs more generally - and how does it change during development? This should also be discussed.

      Please see a detailed response to Editor’s point 2. In brief, all recordings were performed without perturbing gap junction coupling between cells, which has been shown to affect axial resistance and membrane capacitance in other cell types (Szoboszlay et al., 2016). While our simulations do not explicitly include gap junctions, their effect on passive membrane properties is implicitly included because we matched the simulated membrane time constant to experimental values. Moreover, gap junctions are more prominent in cerebellar basket cells than in SCs, both in P18 to P21 animals (Rieubland 2015) and in adult mice (Hoehne et al., 2020). Ultimately, the impact of gap junctions also depends on their distance from the activated synapses (Szoboszlay et al., 2016). Unfortunately, the distribution of gap junctions in SCs and their conductance are not known at this time. We, therefore, did not explicitly consider gap junctions in this study.

      Nevertheless, we have added a section in the Discussion (line 552):

      “We cannot rule out that developmental changes in gap junction expression could contribute to the maturation of SC dendritic integration, since they are thought to contribute to the axial resistivity and capacitance of neurons (Szoboszlay et al., 2016). All the recordings were made with gap junctions intact, including for membrane time constant measurements. However, their expression in SCs is likely to be lower than their basket cell counterparts (Hoehne et al., 2020; Rieubland et al., 2014).”

      5) All experiments and all simulations in the manuscript were done in voltage clamp (the Methods section should give further details, including the series resistance). What is the significance of the key results of the manuscript on synapse distribution and branching pattern of postsynaptic dendrites in immature and adult SCs for the typical mode of synaptic integration in vivo, i.e. in current clamp? What is their significance for neuronal output, considering that SCs are spontaneously active?

      It should be noted that not all simulations were done in voltage clamp (see Figure 8).

      Nevertheless, we have given additional details about the following experimental and simulation parameters:

      1) Description of the whole-cell voltage-clamp procedure.

      2) Series resistance values of experiments and used for simulations.

      Initial simulations with the idealized SC model were performed with an Rs of 20 MOhm. In the reconstructed model, Rs was set at 16 MOhm to match more precisely the experimental values obtained for the mEPSC experiments. We verified that there was no statistical difference in Rs between immature and adult recordings.

      Reviewer #3 (Public Review):

      1) Although the authors were thorough in their efforts to find the mechanism underlying the differences in the young and adult SC synaptic event time course, the authors should consider the possibility of inherently different glutamate receptors, either by alterations in the subunit composition or by an additional modulatory subunit. The literature actually suggests that this might be the case, as several publications described altered AMPA receptor properties (not just density) during development in stellate cells (Bureau, Mulle 2004; Sun, Liu 2007; Liu, Cull-Candy 2002). The authors need to address these possibilities, as modulatory subunits are known to alter receptor kinetics and conductance as well.

      Properties of synaptic AMPAR in SCs are known to change during development and in an activity-dependent manner. EPSCs in immature SC have been shown to be mediated by calcium permeable AMPARs, predominantly containing GluR3 subunits that are associated with TARP γ2 and γ7 (Soto et al. 2007; Bats et al., 2012). During development GluR2 subunits are inserted to the synaptic AMPAR in an activity-dependent manner (Liu et al, 2000), affecting the receptors’ calcium permeability (Liu et al., 2002). However, those developmental changes do not appear to affect EPSC kinetics (Liu et al., 2002) and have very little impact on AMPAR conductance (Soto et al., 2007). When we compare qEPSC kinetics for somatic synapses between immature and adult SC, we did not observe changes in EPSC decay. In the light of this observation and also consistent with the studies cited above, we concluded that differences in AMPAR composition could not contribute to kinetics differences observed in the developmental changes in mEPSC properties.

      We have modified the manuscript to make this point clearer (see section starting line 332):

      “This reduction in synaptic conductance could be due to a reduction in the number of synaptic AMPARs activated and/or a developmental change in AMPAR subunits. SC synaptic AMPARs are composed of GluA2 and GluA3 subunits associated with TARP γ2 and γ7 (Bats et al., 2012; Liu and Cull-Candy, 2000; Soto et al., 2007; Yamazaki et al., 2015). During development, GluR2 subunits are inserted to the synaptic AMPAR in an activity-dependent manner (Liu and Cull-Candy, 2002), affecting receptors calcium permeability (Liu and Cull-Candy, 2000). However, those developmental changes have little impact on AMPAR conductance (Soto et al., 2007), nor do they appear to affect EPSC kinetics (Liu and Cull-Candy, 2002); the latter is consistent with our findings. Therefore the developmental reduction in postsynaptic strength most likely results from fewer AMPARs activated by the release of glutamate from the fusion of a single vesicle.”

      The authors correctly identify the relationship between local dendritic resistance and the reduction of driving force, but they assume the same relationship for young SCs as well in their model. This assumption is not supported by recordings, and there are several publications on the disparity of input impedance between young and adult cells (Schmidt-Hieber, Bischofberger 2007).

      The input resistance of the dendrite will indeed determine local depolarization and loss of driving force. However, its impact on dendritic integration depends on its precise value, and perhaps the reviewer thought we “assumed” the input resistance to be the same in immature and adult SCs. This was not the case, and we have since clarified this in the manuscript. We performed three important measurements that support a loss of driving force in immature SCs (for reference, the input resistance of an infinite cable is given by Rn = sqrt(Rm·Ri/2) / (2π·r^(3/2)), where Rm is the specific membrane resistance, Ri the intracellular resistivity and r the dendrite radius):

      1) The input resistance decreases with increasing dendritic diameter (it scales as r^(-3/2)), and we measured the diameter to be only slightly larger in immature SCs (0.47 versus 0.41 μm). This result is described in Figure 2.

      2) We measured the membrane time constant, which provides an estimate of the total membrane resistance multiplied by the total capacitance. The values at the two ages were similar, suggesting a slightly larger membrane resistance that compensates for the smaller total membrane capacitance of the immature SCs. This was explicitly accounted for when performing the simulations using reconstructed immature and adult SCs (Figures 2, 7 and 8) by adjusting the specific membrane resistance until the simulated membrane time constant matched experimental values. These values were not clearly mentioned and are now included on line 233 in the Results and line 704 in the Methods.

      3) We directly examined paired-pulse facilitation of synapses onto immature SC dendrites versus that for somatic synapses. We previously showed in adult SCs that sublinear summation of synaptic responses, due to loss of synaptic current driving force (Tran-Van-Minh et al. 2016), manifests in decreased facilitation for dendritic synapses (Abrahamsson et al. 2012). Figure 8A shows that dendritic facilitation was indeed less than that observed in the soma. We have now modified Figure 8 to include the results of the simulations showing that the biophysical model could reproduce this difference in short-term plasticity (Figure 8B).

      Together, we believe these measurements support the presence of similar sublinear summation mechanisms in immature SCs.
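
      As a rough illustration of point 1, the infinite-cable expression quoted above can be evaluated for the two measured diameters. This is only a sketch: the values of Rm and Ri below are assumed placeholders (they are not taken from the manuscript), and they cancel in the ratio, so only the diameter dependence is informative.

```python
import math


def infinite_cable_input_resistance(rm_ohm_cm2, ri_ohm_cm, radius_cm):
    """Input resistance (ohm) of an infinite cylindrical cable.

    Implements Rn = sqrt(Rm * Ri / 2) / (2 * pi * r**1.5).
    """
    return math.sqrt(rm_ohm_cm2 * ri_ohm_cm / 2.0) / (2.0 * math.pi * radius_cm ** 1.5)


# Assumed, illustrative passive parameters (not values from the manuscript).
RM = 20_000.0  # specific membrane resistance, ohm * cm^2
RI = 150.0     # intracellular resistivity, ohm * cm

UM_TO_CM = 1e-4
radius_immature = 0.47 / 2 * UM_TO_CM  # measured diameter 0.47 um
radius_adult = 0.41 / 2 * UM_TO_CM     # measured diameter 0.41 um

rn_immature = infinite_cable_input_resistance(RM, RI, radius_immature)
rn_adult = infinite_cable_input_resistance(RM, RI, radius_adult)

# With identical Rm and Ri, the ratio depends only on the diameters.
ratio = rn_immature / rn_adult
print(f"Rn(immature) / Rn(adult) = {ratio:.2f}")
```

      Under these assumptions the ratio is about 0.81, that is, the slightly thicker immature dendrite would have an input resistance only modestly lower than the adult one if Rm and Ri were equal. A larger membrane resistance in immature SCs (point 2) would shift the ratio back towards 1.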

      2) The authors use extracellular stimulation of parallel fibers. The authors note that due to the orientation of the PF, and the slicing angle, they can restrict the spatial extent of the stimuli. However, this method does not guarantee that the stimulated fibers will all connect to the same dendritic branch. Whether two stimulated synapses connect to the same dendrite or not can heavily influence summation. This is an especially great concern for these cells, as the Sholl analysis showed that young and adult SCs have different numbers of distal dendrites. Therefore, if the stimulated axons connect to several different neighboring dendrites instead of the one or two in the case of young SCs, then the model calculations and the conclusions about the summation rules may be erroneous.

      We selected isolated dendrites and delivered voltage stimuli using small diameter glass electrodes (~ 1 μm) 10 - 15 V above threshold to stimulate single dendrites. This procedure excites GC axons in brain slices made from adult mice within less than 10 μm from the tip (Figure 2C, Tran-Van-Minh et al. 2016). It produces large dendritic depolarizations that are sufficient to decrease synaptic current driving force (Figure 1, Tran-Van-Minh et al. 2016). When we reproduced the conductance ratio using uncaging of single dendrites, we observed paired-pulse facilitation in the dendrites – suggesting that electrical stimulation activated synapses on common dendritic branches, or at least within close electrotonic distance to cause large dendritic depolarizations (Figure 7, Abrahamsson et al. 2012). Finally, we expect that the decreased branching in immature SCs further ensures that a majority of recorded synapses are contacting a common dendritic segment. We cannot rule out that occasionally some synaptic responses recorded at the soma are from synapses on different dendritic branches, but we do not see how this would alter our results and change our principal conclusions, particularly since this possible error only affects the interpretation of how many synapses are activated in paired-pulse experiments. The majority of the conclusions arise from the stimulation of single vesicle release events, and given the strikingly perpendicular orientation of GC axons, a 10 μm error in synapse location along a dendrite when we stimulated in the outer third would not alter our interpretations of the data.

    1. Author Response:

      Reviewer #1:

      This manuscript presents some new fossil remains from the lower back of one of the specimens of Australopithecus sediba, the Malapa Hominin 2 (MH2). The authors identified portions of four lumbar vertebrae (L1-L4), which refit with some previously found vertebrae. Together, these produce a nearly complete lower back of a female individual, which is an invaluable finding for understanding the functional morphology and evolution of purported adaptations to bipedalism in fossil hominins. They find that MH2 had both a lumbar lordosis and an increase in inter-articular facet width from the upper to the lower lumbar column ("pyramidal configuration") similar to those of modern humans. Also, they find that the overall vertebral shape is more similar to that of modern humans than to that of great apes. These fossils allow other researchers to test existing hypotheses about the evolutionary process involved in the acquisition of obligate bipedalism.

      Some of the conclusions of this paper are supported by the data, but other interpretations of the results need to be modified.

      Strengths:

      The paper presents very important fossils and also includes a large comparative sample of lumbar vertebrae from modern humans and great apes. It also includes fossil remains from other species of hominins such as Neandertals, Australopithecus afarensis, and Australopithecus africanus. It further includes analyses from two methodologies: geometric morphometrics, and unidimensional linear and angular variables. This complete approach produces interesting and complementary results that give support to their conclusions.

      Weaknesses:

      The weaknesses of the paper are the lack of hypotheses and of clear objectives for the work. Also, the methods are not explained in detail, which makes the paper hard to follow in some parts and difficult to replicate. The lack of hypotheses makes it difficult to understand the use of some analyses. Finally, the interpretation of some results is not fully justified by the data; the authors need to focus on what all the results indicate, and not only on some of them.

      We appreciate this reviewer’s careful feedback and have taken their advice of adding hypotheses to make the objective of the manuscript clearer (aside from describing new fossil material). We have also justified or modified our interpretations based on comparative data of living species and previously known fossil hominin material. We hope that our revisions address this reviewer’s concerns.

      Reviewer #2:

      Williams et al. present newly discovered lumbar vertebrae of MH2 so that now almost the complete lumbar spine of this important australopithecine specimen is known. This allows better inferences for posture and locomotion in these early hominins than what was previously possible, particularly for lumbar lordosis. This study could, however, benefit from using the correct anatomical and taxonomical terminology (e.g., "costal process" instead of lumbar transverse process, and "australopithecines" instead of australopiths), and a more inclusive consideration of the literature. For example, it might help to discuss that Oreopithecus has been said to show a similar pattern of lumbar vertebrae wedging angles and pyramidal configuration of the articular processes, and that the same inferences for a human-like degree of lordosis have already been made previously based on the pelvic incidence of an alternative reconstruction of MH2. Moreover, lumbar lordosis (particularly the wedging angles) should also be compared to the Homo erectus specimen KNM-WT 15000, and to the 95% range of variation of modern humans rather than to the 95% confidence interval for the mean of modern human lumbar lordosis. Finally, the authors could be more precise in saying to which vertebra of their great ape comparative sample they have compared the middle lumbar vertebra of MH2 and how they justify this.

      We appreciate this reviewer’s detailed and helpful feedback, and we have done our best to address their concerns. We take their point about the Terminologia Anatomica and have replaced “lumbar transverse processes” with “costal processes,” although we use “transverse” in parentheses on its first usage in both the abstract and the main text. This is because the leading textbooks in human osteology and human evolutionary anatomy (White et al. and Aiello & Dean) use “transverse” and not “costal” process, and we have previously published using “lumbar transverse process.” We feel that the term “australopithecine” is both outdated and taxonomically incorrect in usage, but we understand the reviewer’s dislike of the colloquial term “australopith;” therefore, we refer only to “members of the genus Australopithecus,” “early hominins included in this study,” etc. in the revision. Regarding our consideration of the literature, we did not previously cite the Oreopithecus literature because one of us (Russo) previously tested and rejected the hypothesis that Oreopithecus possessed hominin-like lumbar vertebrae (Russo & Shapiro, 2017). We now cite both the original paper and its refutation. We do regret missing the Tardieu et al. (2017) paper on pelvic incidence, which we have now incorporated in our revision, along with some other references not included in the previous version. We have not included KNM-WT 15000 in our wedging angle and multivariate analyses because it is a juvenile individual with incomplete growth and fusion of the vertebral bodies. Our comparative sample used here does not include juveniles. We do include KNM-WT 15000 in our inter-articular facet comparison since the articular processes are complete. For the wedging angle figure, we have now added 95% confidence intervals of the data (2 standard deviations around the mean) for individual human lumbar vertebrae. Finally, we clarify and justify which lumbar vertebra is identified as “middle” in cases of great apes with four lumbar vertebrae (L2).
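
      The distinction the reviewer draws between the 95% range of variation and the 95% confidence interval of the mean can be sketched numerically. The wedging angles below are hypothetical placeholder values chosen only for illustration, not measurements from the comparative sample, and the factor of 2 is the usual approximation to the 95% normal quantile.

```python
import math
import statistics

# Hypothetical wedging angles (degrees) for one lumbar level; illustrative only.
wedging_angles_deg = [2.1, 3.4, 1.2, 4.8, 2.9, 0.5, 3.7, 2.2, 5.1, 1.8, 3.0, 2.6]


def range_of_variation(values, k=2.0):
    """Approximate 95% range of the data: mean +/- k standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return mean - k * sd, mean + k * sd


def confidence_interval_of_mean(values, k=2.0):
    """Approximate 95% interval for the mean: mean +/- k standard errors."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean - k * sem, mean + k * sem


range_lo, range_hi = range_of_variation(wedging_angles_deg)
ci_lo, ci_hi = confidence_interval_of_mean(wedging_angles_deg)

print(f"range of variation: {range_lo:.2f} to {range_hi:.2f} deg")
print(f"CI of the mean:     {ci_lo:.2f} to {ci_hi:.2f} deg")
```

      The range of variation is wider than the interval for the mean by a factor of sqrt(n), so a fossil value can fall outside the confidence interval of the mean while still lying well within the spread of individual modern humans.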

    1. Author Response:

      Reviewer #1:

      In this paper McPherson et al investigated fibrin clot properties and fibrinolysis with recombinant fibrinogen variants lacking parts of the fibrinogen alphaC-region. The aim was to understand the contribution of two subregions, the alphaC-connector and the alphaC-domain, which are known to be involved in the lateral aggregation of fibrin fibers and their cross-linking. The study measured the contribution of the subregions to fibrin fiber growth and mechanical strength, how fibrinolysis proceeds in their absence, their impact on clot retraction, and how the variants affect whole blood clot features.

      Strengths and weaknesses:

      The major strengths of this report lie in the broad range of appropriately selected assays used to characterize the fibrinogen variants described, the clarity of the data presented, and a clear discussion of the mechanistic implications of the findings compared to previous work. We can now understand more clearly how the alphaC-subregions contribute to clot structure, initial fibrin fiber growth, clot strength and stiffness, fibrinolysis, clot contraction linked to erythrocyte retention and platelet binding, and a clinically relevant assessment of the clotting and lysis of whole blood.

      We thank reviewer 2 for the positive, detailed and constructive comments in their public review.

      The methodology does not have weaknesses in the context presented. One issue that arises from such a study, which is centered on the characterization of recombinant molecules assessed in vitro, is to what extent each of the characteristics described would have physiological impact in vivo. The study has the advantage of clearly separating out different roles for the fibrinogen alphaC-region, but the more complex interplay of the variants with a complete vasculature and blood composition, in an organism that produced the variants, would enhance the study claims. This issue is hinted at in the paper, albeit in vitro/ex vivo. The fibrin density and fiber thickness of alpha390 clots were different in a purified setting compared to whole reconstituted blood clots post-thromboelastography made using blood from Fga-/- mice. It therefore seems reasonable that characteristics of the alphaC-region functions described may show more or less importance when assessed in vivo.

      While our study is mainly focussed on in vitro and ex vivo studies of fibrinogen with truncations in the αC-region, and while we have full confidence that similar mechanisms involving the fibrinogen αC-region are at play during haemostasis and thrombosis in whole organisms, we appreciate that future in vivo studies of fibrinogens with similar αC truncations may add further information regarding the physiological impact of this intriguing region of the fibrinogen molecule. However, such in vivo models are likely complicated by the observed reduced expression levels of fibrinogen with αC truncations, with subsequent impact resulting from both altered function and protein levels.

      Reviewer #2:

      McPherson H et al designed two fibrinogen variants with a truncated alfa-C terminal region: the alfa-390 and alfa-220. The alfa-390 lacks the alfa-C domain, while the alfa-220 lacks both the alfa-C domain and alfa-C connector region. Using different types of optical and electron microscopy, they characterized the fibrin structure at different resolutions and the early fibrin oligomer formation of these homozygous fibrinogen variants, and compared them to the WT fibrinogen. The functional implications of removing these protein stretches for fibrin mechanical properties and stability were studied by turbidity, FXIIIa fibrin crosslinking, fibrin microrheology, clot contraction (retraction), and rotational thromboelastometry. The differential roles of these two regions were identified for the first time: the alfa-C connector contributes to longitudinal protofibril/fibre growth and to mechanical and fibrinolytic stability, while the alfa-C domain (variant 390) was implicated in lateral protofibril association, since its removal gave rise to denser fibrin networks with thinner fibres, as already described in the literature. Their finding has clinical implications since it supports the design of antithrombotic drugs that can limit thrombus size or growth. Their conclusions are supported by the results. The confocal microscopy fibrin structure was confirmed by the scanning electron microscopy images, which highlights the importance of coupling the fibrinogen under study to the fluorophore so as not to bias the fibrin structure.

      We thank reviewer 3 for the positive comments and for appreciating the potential clinical implications of our study in their public review.

      In order to study the differential implications of these alfa-C subregions for the susceptibility of fibrinogen/fibrin to plasmin degradation, which would support the thromboelastometric and turbidity results, it would be interesting in the future to follow the plasmin degradation kinetics of these fibrinogen/fibrin variants by SDS/PAGE.

      Our study has clearly shown the importance of the fibrinogen αC-region in the mechanical and proteolytic resistance of the blood clot. These were dramatically affected in the total truncation variant, the fibrin polymer of which showed very poor mechanical properties and was lysed extremely rapidly by the fibrinolytic system. We agree that future studies of relevant fibrin degradation products over time may further underpin these findings and the relevance of this part of the fibrinogen molecule in determining clot stability.

    1. Author Response:

      Reviewer #2:

      This study by Schulz et al fleshes out in fine detail an interesting consequence of inhibitory synaptic plasticity in plastic neuronal networks, showing that its ability to balance precisely only previously experienced stimuli makes this type of plasticity an excellent candidate mechanism to allow novelty detection. Strong transient responses will be evoked only by those stimuli which have not previously activated (and thus trained) the stimulus-specific set of inhibitory synapses. An open question remains with regard to the time scales and speed of learning at these inhibitory sites, something that will be answerable by the experimental audience of this paper (but could be investigated in a bit more detail in the model as well).

      We thank the reviewer for their constructive feedback. We have addressed the comments below, and included new substantial information to our manuscript. This includes a new Supplementary Figure (Figure 4-Figure Supplement 2) in which we analyze how the novelty response and the adapted responses depend on the inhibitory learning rate and a detailed discussion about experimental and theoretical studies related to inhibitory plasticity.

      Reviewer #3:

      This computational paper addresses the mechanisms of sensory adaptation and novelty detection in the auditory cortex. A spiking RNN model of 5000 (4000 excitatory/1000 inhibitory) units is developed and adapted to sequences of inputs (ABC…) followed by a novel stimulus N. As in experiments, the model captures the adaptation to the repeated stimuli, as well as a strong response to a novel stimulus. In contrast to many models of sensory adaptation that rely on short-term synaptic plasticity, here adaptation arises from STDP at the Inh->Ex connection. Specifically, during ABCABC … the inhibitory connection onto active Ex units is enhanced through associative plasticity mechanisms, but not onto the inactive Ex units, thus the adaptation does not apply to the novel stimuli. While the approach seems fairly novel, it is also speculative and seems to run contrary to the existing experimental data.

      We thank the reviewer for their feedback. In our view, through the process of answering the comments below, we have improved our manuscript substantially. To answer the reviewer’s comments, we added new figures to the manuscript (Figure 5) and (Figure 6-Figure Supplement 1 and Figure 1-Figure Supplement 1,5) and added new text in the Discussion (see ‘Timescales of plasticity mechanisms’, and ‘Robustness of the model’). In the following, we give detailed answers to each of the comments.

      1) The reason many models of adaptation focus on short-term synaptic plasticity (STP) as opposed to STDP is that the latter generally is not thought to operate on the adaptation time scale of seconds. Specifically, STDP is generally considered to be a form of long-term associative plasticity, and thus to rely on mechanisms such as the insertion of new receptors, a process that is unlikely to operate on a time scale of a second or so. Adaptation is robustly observed at 400 ms (e.g., Natan et al 2017), a time scale that is generally considered to be incompatible with STDP. E.g. in the D'Amour paper the authors cite, STDP is induced over a 5 minute pairing protocol, and can still increase over the course of 5-10 minutes post pairing (e.g., Fig 1H). I'm not aware of any evidence suggesting that iSTDP could be induced and expressed on the subsecond to a few seconds time scales. So this seems to be a fundamental issue that needs to be addressed or at least discussed.

      Related to the point above the model also contains subtractive normalization implemented with a time step of 20 ms. Again if this normalization is critical to the model this assumption would pose a serious challenge to the model because there is little or no experimental data suggesting that normalization can operate at that time scale.

      In our new Figure 1-Figure Supplement 5 we find that the timescale of the subtractive normalization mechanism does not influence the generation of novelty responses in our model. Adaptation occurs over multiple timescales from hundreds of milliseconds to tens of seconds or even days [Ulanovsky et al., 2004, Lundstrom et al., 2010, Homann et al., 2017, Latimer et al., 2019, Haak et al., 2014, Ramaswami, 2014]. Our work shows that inhibitory plasticity can readily lead to adaptation on multiple timescales without the need for any additional assumptions. Although during the induction of inhibitory STDP it takes several minutes to several tens of minutes to achieve a stable baseline of inhibitory synaptic strength [D’amour and Froemke, 2015, Field et al., 2020], inhibitory postsynaptic currents increase significantly immediately after the induction of plasticity (see e.g. [D’amour and Froemke, 2015, Field et al., 2020]). Therefore, inhibitory synaptic strength seems to already change during the plasticity induction protocol. Hence, we propose that inhibitory STDP is a suitable, though clearly not the only, candidate to explain the generation of novelty responses and adaptive phenomena occurring over multiple timescales.

      Justification:

      The points raised by the reviewer are extremely valuable in critically evaluating the model. We first address the issue of the timescale of normalization.

      The generation of adapted and novelty responses in our model does not rely on the fast subtractive normalization mechanism, since the normalization only affects the excitatory and not the inhibitory weights; and it is the change in inhibitory synaptic weights through iSTDP that is the key mechanism to explain adapted and novelty responses. In Figure 1-Figure Supplement 5 we now show that even for a normalization time step of 50 seconds, adaptation to repeated stimuli and a novelty response occur. Therefore, we can conclude that the fast normalization mechanism is not a necessary ingredient in our model. We added a discussion at line 542 and in the Methods at line 749. Even if no normalization is applied throughout the entire stimulation paradigm, our findings do not change.
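
      For readers unfamiliar with the term, subtractive normalization can be sketched generically as follows. This is a minimal illustration under our own assumptions (the function name, the target sum and the lower bound are hypothetical) and is not the code used for the simulations; it only shows that the same amount is subtracted from every incoming excitatory weight so that their sum returns to a target, and that inhibitory weights are not touched.

```python
def subtractive_normalization(weights, target_sum, w_min=0.0):
    """Shift all incoming weights by the same amount so that their sum
    returns to target_sum (generic sketch, hypothetical parameters).

    Clipping at w_min can leave the sum slightly above the target.
    """
    correction = (sum(weights) - target_sum) / len(weights)
    return [max(w - correction, w_min) for w in weights]


# Incoming excitatory weights onto one neuron after some potentiation.
excitatory_weights = [1.0, 2.0, 3.0]
normalized = subtractive_normalization(excitatory_weights, target_sum=3.0)
print(normalized)
```

      In a simulation, such a step would be called at a fixed interval (20 ms in the original setting, up to 50 seconds in Figure 1-Figure Supplement 5); the interval only sets how often the excitatory weights are rescaled.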

      We note that while normalization is common practice in circuit models, there is a discrepancy between the fast timescales of normalization mechanisms used in computational models to stabilize network dynamics and the much slower timescales measured experimentally ([Fox and Stryker, 2017, Turrigiano, 2017, Keck et al., 2017], among others). This discrepancy has been termed the ‘temporal paradox’ [Zenke and Gerstner, 2017, Zenke et al., 2017].

      Many computational models which implement normalization mechanisms justify them by the experimentally observed synaptic scaling despite the discrepancy between timescales (see e.g. Figure 1 in [Zenke et al., 2017]), which we now acknowledge at line 543. In related work, we propose a different biologically plausible candidate for fast homeostatic stabilization – heterosynaptic plasticity – which operates on similar timescales as homosynaptic plasticity mechanisms [Field et al., 2020]. Incorporating this mechanism in addition to, or instead of, the fast normalization in recurrent networks is very interesting but beyond the scope of this work.

      Next, we address the issue of the timescale of iSTDP. We studied long-term iSTDP as a candidate mechanism for the generation of adapted and novel responses for multiple reasons: (1) to explain adaptation over multiple timescales which range from hundreds of milliseconds to tens of seconds [Ulanovsky et al., 2004, Lundstrom et al., 2010, Homann et al., 2017, Latimer et al., 2019], and even multiple days in the case of habituation [Haak et al., 2014, Ramaswami, 2014]. Rather than including STP mechanisms that operate over all those different timescales, we demonstrate that iSTDP is a straightforward mechanism to bridge different timescales without the need for multiple mechanisms or fine-tuning of parameters (Figure 4). (2) We were inspired by the growing experimental literature suggesting an important role of inhibition and inhibitory plasticity in adaptive phenomena (see Discussion subsection “Inhibitory plasticity as an adaptive mechanism”). (3) In computational models, iSTDP is usually studied in the context of balancing excitation [Sprekeler, 2017]. In our study, we present functional consequences of inhibitory plasticity.

      Although inhibitory plasticity is indeed induced over several minutes in pairing experiments [Field et al., 2020, D’amour and Froemke, 2015], inhibitory postsynaptic currents are already increased directly after plasticity induction – though it takes several additional minutes to reach a stable new baseline (e.g. Figure 2A,B in [Field et al., 2020]). For example, the mean increase of inhibitory synaptic strength right after plasticity induction in [Field et al., 2020] is approximately 30-50%, while a new stable baseline at about 80-100% increase is reached after approximately 20 minutes (Figure 2A in [Field et al., 2020]; similar results in Figure 1H,I in [D’amour and Froemke, 2015]). This suggests that significant changes of inhibitory synaptic strength in iSTDP experiments already occur while the plasticity induction protocol is still ongoing. How fast these plasticity mechanisms act in an in vivo setting during stimulation with naturalistic stimuli is to our knowledge not known. In general, the question of the true timescale of iSTDP is still an open problem [Sprekeler, 2017].

      We also point out that the inhibitory synaptic weight changes induced via iSTDP are rather small in our model, i.e. Figure 4C shows that the mean inhibitory synaptic weights onto the adapted excitatory population increase by approximately 15-20%. Therefore, we propose that relatively small inhibitory weight changes are sufficient for the occurrence of a novelty response and these weight changes might already be happening during the pairing protocol in experiments, as we argue above.

      Although we agree with the reviewer that short-term plasticity mechanisms are an important aspect to understand adaptation phenomena (especially on short timescales), we would not a priori exclude iSTDP only based on the argument of timescales. To get a full understanding of adaptive phenomena on all timescales, more detailed experimental and theoretical studies are needed to investigate the role of short versus long-term plasticity of excitatory and inhibitory synapses and how these mechanisms interact in a recurrent circuit.

      Modifications:

      In the new Figure 1-Figure Supplement 5 we study the effect of the timescale of subtractive normalization on the occurrence of a novelty response. In Figure 4-Figure Supplement 2 we quantify the response amplitude and the decay time constant of the novelty response as a function of the learning rate of inhibitory plasticity (see also our response to comment 1 of reviewer 2). Indeed, we find that fast inhibitory plasticity is needed to detect a novelty response. We discuss the mismatch between timescales of homeostatic plasticity in theoretical and experimental studies in lines 543 and 739. Additionally, we added new text in the Discussion subsections ‘Timescales of plasticity mechanism’ (line 509) and ‘Robustness of the model’ (line 533) where we discuss the timescale of inhibitory plasticity and subtractive normalization.

      A further issue relates to the temporal structure of adaptation. The authors show that adaptation is independent of the sequence of the stimuli (ABCABC vs BACCBA) (it would be best to refer to this as sequential structure not temporal structure, which would often include the duration of stimuli and interval between stimuli). It is well established that the longer the interstimulus interval the less the adaptation. The model may or may not capture this effect depending on the assumptions regarding the spontaneous activity during the ISI as a result of the non-associative (pre-only) iLTD. However, given that STDP generally grows after induction it seems like the model is not likely to capture the standard observation that adaptation should be less if the stimuli are presented with an ISI of 800 ms versus 400 ms. In figure 5, for example, what happens if stimuli are presented for 20 seconds consecutively versus for 10 seconds, then a silence of 2 seconds before another 10 second stimulation? No mention of the time course of spontaneous recovery from adaptation is made.

      Recovery from adaptation depends on the background activity level in the network during the inter-stimulus interval (Figure 5 and Figure 6-Figure Supplement 1). Specifically, low background activity between stimulus presentations slows the recovery from adaptation (Figure 6-Figure Supplement 1). However, increasing background activity between stimulus presentations can capture the decreased adaptation as the inter-stimulus interval increases (we show this in our new Figure 5).

      Justification:

      We agree with the reviewer that the term ‘temporal structure’ is misleading, and therefore exchanged it with the term ‘sequence structure’ in our manuscript (see e.g. line 228).

      As the reviewer aptly predicted, the recovery of the response from adaptation indeed depends on the level of background activity between two stimulus presentations. In our model, the direction of inhibitory weight change (iLTD or iLTP) depends on the firing rate of the postsynaptic excitatory cells (see [Vogels et al., 2011]). Postsynaptic firing rates above a ‘target firing rate’ will on average lead to iLTP, while postsynaptic firing rates below the target firing rate will lead to iLTD. In turn, the average magnitude of inhibitory weight change depends on the firing rate of the presynaptic inhibitory neurons (see [Vogels et al., 2011]). Therefore, if the background activity between two stimulus presentations in our model is very low, recovery from adaptation only happens on a very slow timescale. To show this, we performed simulations similar to Figure 6 where a stimulus (A) was presented again after a pause of either 9 seconds (Figure 6-Figure Supplement 1A) or 225 seconds (Figure 6-Figure Supplement 1B). Whereas the response to the stimulus was still adapted after 9 seconds, it had fully recovered after more than 200 seconds. As expected, the stimulus-specific inhibitory weights decreased very slowly after stimulus presentation (Figure 6-Figure Supplement 1A, B; bottom). This slow decrease of inhibitory weights follows from the fact that the network is silent if no stimulus is being presented. We now discuss this result in line 357.
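
      The rate dependence of the inhibitory plasticity rule can be summarised in a simplified rate-based form, following the schematic expression Δw = η (pre × post − ρ0 × pre) given in [Vogels et al., 2011]. The sketch below is only an illustration of that average behaviour: the function name and all numerical values are hypothetical, and the spiking implementation in the model is not reproduced here.

```python
def mean_inhibitory_weight_drift(pre_rate_hz, post_rate_hz, target_rate_hz,
                                 learning_rate=1e-3):
    """Average drift of an inhibitory weight in a rate-based approximation.

    Sign: set by the postsynaptic rate relative to the target rate
          (above target -> potentiation, below target -> depression).
    Magnitude: scales with the presynaptic inhibitory rate.
    """
    return learning_rate * pre_rate_hz * (post_rate_hz - target_rate_hz)


TARGET_HZ = 3.0  # hypothetical target firing rate

# During stimulation: the excitatory cell fires above target, so inhibition grows.
drift_stimulated = mean_inhibitory_weight_drift(20.0, 10.0, TARGET_HZ)

# Quiet network between stimuli: very low presynaptic rate, so slow depression.
drift_quiet = mean_inhibitory_weight_drift(0.1, 0.5, TARGET_HZ)

# More active background: faster depression, hence faster recovery.
drift_background = mean_inhibitory_weight_drift(5.0, 0.5, TARGET_HZ)

print(drift_stimulated, drift_quiet, drift_background)
```

      In this approximation the depression between stimulus presentations is proportional to the presynaptic inhibitory rate, which is why a silent network recovers from adaptation only slowly whereas an active one recovers faster.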

      However, if the background activity in the inter-stimulus interval is higher (either because of a higher background firing rate or because of evoked activity from other sources, for example other stimuli), the adapted stimulus can recover faster. To address how such elevated background activity can affect adaptation to a specific stimulus, we performed additional simulations (Figure 5), in which we used the experimental paradigm from Figure 1A. Similar to Figure 2C, we changed the number of stimuli in the sequence, which leads to different inter-repetition intervals (the interval until the same stimulus is presented again) of a repeated sequence stimulus. For example, if two repeated stimuli (A, B) are presented, the inter-repetition interval for each stimulus is 300 ms because each stimulus is presented for 300 ms. If four repeated stimuli are presented (A, B, C, D), the inter-repetition interval for each stimulus is 900 ms. Importantly, this means that in the time between the presentation of the same stimulus, the network is not silent (as in Figure 6-Figure Supplement 1), but active because other stimuli in the sequence are presented. We defined the adaptation level as the difference between the onset population rate, measured at the onset of the stimulation, and the baseline rate, measured shortly before the presentation of a novel stimulus. We found that an increase in the inter-repetition interval reduced the adaptation level of the excitatory population (Figure 5A, D) due to a decrease of inhibitory synaptic strength onto stimulus-specific assemblies (Figure 5B, E). Therefore, we conclude that our model can capture the reduced adaptation for longer inter-repetition intervals when background activity in the inter-repetition interval is elevated, in this case because of the presentation of other stimuli.
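
      The two quantities used in this paragraph can be written out explicitly. The sketch below only restates the definitions given above; the function names are ours, and the firing rates in the usage example are hypothetical placeholders rather than simulation results.

```python
def inter_repetition_interval_ms(n_stimuli, stimulus_duration_ms=300.0):
    """Time from the end of one presentation of a stimulus to the start of
    its next presentation, when n_stimuli are shown back to back."""
    return (n_stimuli - 1) * stimulus_duration_ms


def adaptation_level(onset_rate_hz, baseline_rate_hz):
    """Difference between the population rate at stimulation onset and the
    baseline rate measured shortly before the novel stimulus."""
    return onset_rate_hz - baseline_rate_hz


for n in (2, 3, 4):
    print(n, "stimuli ->", inter_repetition_interval_ms(n), "ms")

# Hypothetical rates, for illustration only.
print(adaptation_level(onset_rate_hz=12.0, baseline_rate_hz=5.0))
```

      With 300 ms presentations this gives 300 ms for two stimuli and 900 ms for four, matching the examples in the text.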

      Modifications:

      We replaced the term ‘temporal structure’ with the term ‘sequence structure’ in our manuscript (Results section “Stimulus periodicity in the sequence is not required for the generation of a novelty response”, line 211). We also included a new main figure to demonstrate the effect of varying the inter-repetition interval in the presence of evoked network activity from other stimuli (Figure 5), discussed in the new section “The adapted response depends on the interval between stimulus presentations” on line 306. Furthermore, we added Figure 6-Figure Supplement 1 to the manuscript and we discuss our findings in line 357 and line 469.

      3) It also does not seem like the model will capture recently reported effects such as the observation that optogenetic inactivation of inhibitory neurons during pulse n can actually increase adaptation to tone n+1 (Seay et al, 2020), indeed I believe the current model would make the opposite prediction

      Our model cannot capture the n+1 experiment in [Seay et al., 2020] because inactivation of inhibition will always increase the response to stimulus n (as in our disinhibitory experiment in Figure 7), hence decreasing adaptation.

      Justification:

      [Seay et al., 2020] measured several different types of short-term plasticity in the auditory cortex at synapses involving two different types of inhibitory interneurons: strong feedforward short-term depression onto PV interneurons and from PV to pyramidal neurons, as well as strong feedforward short-term facilitation onto SST interneurons and very weak short-term facilitation from SST to pyramidal neurons. We believe that including different interneuron types, as well as short-term dynamics of the respective synapses, might be necessary to explain this phenomenon. Indeed, using the experimentally observed short-term plasticity in a model, [Seay et al., 2020] showed that inactivation of PV interneurons can decrease the response, hence increase adaptation, to the next (n+1) tone. Since our model includes a single type of inhibitory interneuron and implements long-term inhibitory plasticity rather than short-term plasticity, we are not surprised that our model cannot capture the increased adaptation in the very specific n+1 experiment. As we acknowledge in the manuscript, our proposed mechanism is not the only candidate to explain the generation of novelty responses and adaptive phenomena in the brain, and likely interacts with other types of plasticity and cell type dynamics.

      Modifications:

      We agree with the reviewer that the findings of [Seay et al., 2020] need to be discussed in context of our manuscript (see also our response to comment 9 of reviewer 1). Therefore, we added the reference and discuss it at line 499.

      4) In its current state I don't think (I may be mistaken) the model accounts for a related and very general property of auditory cortex: lateral inhibition (e.g., Brosch and Schreiner, 1997; Phillips, Schreiner, Hasenstaub, 2017).

      The term ‘lateral inhibition’ seems to have a somewhat different meaning in visual versus auditory cortex. In the visual cortex it is often considered as a spatial form of inhibition, while in the auditory cortex it also includes a temporal aspect. Since our model was mostly inspired by data in the visual cortex [Homann et al., 2017], we will answer the question in the context of the visual cortex, but also speculate about the auditory cortex.

      Justification:

      In the visual cortex, lateral inhibition is often defined in a ‘spatial’ manner whereby activated pyramidal neurons reduce the activity of their neighbors. Specifically, SOM-mediated spatial lateral inhibition contributes to surround suppression in visual cortex [Adesnik et al., 2012]. Our model already implements a form of spatial lateral inhibition. Based on experimental data (see e.g. [Harris and Mrsic-Flogel, 2013] for a review), we modeled inhibitory neurons as more broadly tuned than excitatory neurons, such that a single inhibitory neuron is more likely to be driven by multiple external stimuli (probability 15%) than a single excitatory neuron (probability 5%). During stimulus presentation, as inhibitory plasticity adjusts the strength of inhibitory-to-excitatory synapses, an inhibitory neuron with a given stimulus selectivity will likely strengthen synapses to multiple excitatory neurons selective to different stimuli, hence implementing spatial lateral inhibition.

      In the auditory cortex, lateral inhibition is often referred to as ‘forward suppression’ (or ‘forward masking’) [Brosch and Schreiner, 1997, Phillips et al., 2017]. Here, a preceding ‘masker stimulus’ influences the response to a probe stimulus. This influence depends on several factors, including the time difference between the masker and the probe stimulus, as well as the frequency of the pure tone masker stimulus [Brosch and Schreiner, 1997]. The time differences measured in these experiments are usually too short to be captured by the inhibitory plasticity mechanism proposed in our model. As in our answer to comment 3, we suspect that capturing forward suppression requires short-term plasticity (see e.g. [Phillips et al., 2017]).

      Modifications:

      We have now included additional text in the Methods to address the issue of spatial lateral inhibition (see line 796), and we now mention the phenomenon of forward masking in line 504.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript the authors have addressed the structural reasons for the ability of fatty acyl-AMP ligases (FAAL) to exclude condensation of activated fatty acids with coenzyme-A and facilitate the reaction with other 4'-phosphopantetheine linked acceptors. This issue is of significant interest with regard to understanding how certain fatty acids are channeled to specific metabolic fates. The structural issue is the apparent discrimination of the CoA moiety (adenosine 3',5'-bisphosphate) versus a holo-ACP tethered to the 4'-phosphopantetheine head group. The authors identified a number of probable issues between FAAL and FACL enzymes.

      1. The authors have shown that many of the FAAL enzymes lack the positively charged residues that have been shown previously to function in recognition of the CoA moiety (Figure 1a).

      Thank you for highlighting one of the strong reasons, “lack of positive selection”, as to why FAALs do not bind CoA.

      1. They have highlighted a number of residues within the putative binding site for the 4'-pantetheine moieties in the FAAL enzymes that likely preclude the binding of this portion of the substrate. They have subsequently mutated these residues in FAAL enzymes from three different organisms and have shown in certain instances that the mutated enzymes are now able to functionally activate CoA (Figure 2c). The authors, however, should have attempted to explain why some of the FAALs behaved differently than others. For example, why does the F284A/M233A mutant of MsFAAL32 function so differently than the corresponding mutant of RsFAAL? Also, the residue numbering is confusing to this reviewer. Thus, in Figures 1b and 1c the specific methionine and phenylalanine residues that are highlighted are labeled as M231 and F279, respectively. Yet in Figure 2 the methionine mutated is listed as M227 and the phenylalanine is listed as F275. Why is there a residue difference of 4?

      We thank the reviewer for raising this interesting point about the differential activity. Variable residues lining the CoA-binding site can influence the reaction in unpredictable ways, which possibly explains the observed differential activity. A detailed analysis has been presented as an answer to the essential review question 1 and a short description is added at page-12.

      We thank the reviewer for pointing out the issue with erroneous residue numbering of EcFAAL. The difference in residues is a typographical error and has been corrected in the figure.

      1. The authors have provided further support for the inability of the apparent canonical site in the FAAL enzymes to be functional by mutating the residues within the active sites of certain FACL enzymes to the bulkier ones found in the FAAL enzymes. Many of the constructs resulted in the loss of function and their ability to activate CoA. However, the loss of function was not uniform across the three FACL enzymes chosen, and the authors have done an insufficient job of explaining the differences. For example, the A276F/A232M mutant of AfFACL is devoid of CoA activity but the corresponding mutants of MtFACL13 and EcFACL are fully functional.

      As previously explained in the answer to question-2, multiple highly divergent residues lining the CoA-binding site in FACLs possibly explains the observed differential activity. A detailed analysis as an answer to the essential review question 2 has been presented and a short description is added at page-12.

      1. The authors have identified a putative alternative binding site for the 4'-pantetheine moiety using various computational searches that can apparently distinguish between CoA and holo-ACP (Figure 3). Mutation of residues within this newly identified pocket (Figure 4c) significantly diminishes the condensation with the ACP from E. coli.

      We thank you for highlighting the identification of the alternative site and its validation through structure-guided mutagenesis and biochemistry.

      Reviewer #2 (Public Review):

      This manuscript succeeds in experimentally establishing the rationale for an acyl-carrier protein (ACP) substrate specificity of the fatty acyl-AMP ligases. While these enzymes structurally resemble the CoA-dependent fatty acyl CoA-ligases (FACLs), the authors demonstrate that the FAALs use a novel binding site to accommodate the ACP substrate. The biochemical studies are solid and clear cut but the evolutionary analysis could be bolstered with additional bioinformatics analysis. With said analysis, this manuscript would contribute significantly to our current knowledge of distinct classes of enzymes that divert fatty acids to virulent lipids in mycobacteria.

      We appreciate the reviewer’s suggestion to include genomic neighborhoods of PKS/NRPS showing the presence of FAALs to strengthen the evolutionary analysis. A detailed sequence analysis has been performed and presented as an answer to essential review question 3. The tabulation of the analysis is now presented as Supplementary table-III.

      Reviewer #3 (Public Review):

      This study attempted to understand how an important family of enzymes (FAALs) involved in fatty acid activation and transfer can recognise one form of a substrate in preference to another. The authors were trying to identify why the 4'-PP-SH arm of CoASH is not recognised by certain enzymes while the same arm attached to a small acyl carrier protein (ACP) is preferred. The authors used detailed analysis of crystal structures in the PDB of many examples of ligand-bound ANLs, sequence analysis and molecular modelling to direct site-directed mutagenesis of the FAALs and reveal important elements in the enzyme involved in substrate recognition and discrimination. This is a major strength of the paper (e.g. SFig. 7). There is also a nice evolution discussion about the origin of the ANL family.

      We thank the reviewer for highlighting the important mechanistic aspects that enable CoA discrimination in FAALs and their evolutionary conservation, which led to the proposition of parallel evolution of FAALs and FACLs.

      Once they have identified potential residues involved in CoASH or ACP-SH (holo) binding they make a number of mutants of various enzymes. These included EcFAAL, MsFAAL32, RsFAAL, MxFAAL, AfFACL etc. They appear to use a radioactive substrate assay (using hot FA) and measure incorporation with various gels which are scanned and counted. This is where I began to get lost and to me it is a major weakness. They compare WT with mutants including site-directed mutants (made deletions of deltaFS1 etc.). I found Fig. 2 confusing. The gels are also confusing.

      We appreciate the reviewer’s suggestion to improve the description of how the experiment was performed along with the readouts in the form of gels or TLCs. The experimental section has been elaborated, detailing the different steps of the biochemical assays, and is provided as an answer to essential review question 6. We have modified the legends of figure-2 and supplementary figure-6 and labelled the figures with relevant information to make things clearer. We hope that these changes address the concerns of the reviewer.

      There appears to be a problem with the apo- to holo-ACP conversion - why didn't Bs Sfp work to 100%? There is one mass spec analysis - why wasn't more used?

      We agree with the reviewer’s comment that the apo- to holo- conversion was not as complete as expected. Initially, we assumed complete conversion to the holo-ACP form by BsSfp and used it for biochemical analysis using traditional CS-Urea PAGE; however, it failed to show any differential migration. We then modified the experimental conditions to improve the holo-conversion, including cloning BsSfp afresh, and further checked its efficiency using Coomassie-stained CS-Urea-PAGE. Only after we used MALDI-TOF did we realize that the only ACP that could be detected had ~50-60% conversion. As the remaining ACPs failed to fly and be detected, we have not presented the analysis for them. Despite extensive efforts, we could not see any improvement in the conversion process. So, the reduced conversion efficiency appears to be a protein-specific phenomenon and needs further investigation.

      The reaction is not clearly drawn to begin with - I work in this area. Be clear in the steps. Draw out the FA + MgATP to give acyl-adenylate + PPi. Then, add CoASH or ACP-SH and show formation of acyl-CoA + AMP or acyl-ACP + AMP.

      We thank the reviewer for the suggestion and have now included the reaction schematic as supplementary figure-1b.

      There are many other assays to measure activity e.g. PPi coupled or measure AMP formation to back up data from the radioactivity.

      We thank the reviewer for the suggestion, and we take this opportunity to explain our choice of methodology for assessing the biochemical reactions. The coupled assays typically measure the release of PPi or AMP, which in turn relies on other enzyme systems, and they are excellent means to assess the first step of the reaction directly. However, using them to measure the second step of the reaction is an indirect approach. In particular, where we need to quantify the efficiency of second-step mutants that are not affected in the first step, this cannot be ascertained from how much PPi or AMP is formed. We agree that mass spectrometry is another approach for these systems, but frequent access on a continuous basis was not feasible. Therefore, we chose an approach that allows us to see the products, acyl-AMP and acyl-CoA/acyl-ACP, directly via a TLC or Urea-PAGE. The potential hazard of this approach is, of course, the use of radioactivity.

      Figure 4 confuses me a lot. As does Supp Fig 6.

      The legends of figure-4 and Supplementary figure-6 are modified, and the figures appropriately labelled to include more information. We hope that these changes should address the concerns of the reviewer.

      It appears the authors are looking for a yes/no answer - active with CoASH or ACP-SH. A Table would help to summarise.

      We thank the reviewer for the suggestion and have now tabulated the results of gain of function and loss of function in FAALs and FACLs, respectively, as supplementary table-V.

    1. Author Response:

      Reviewer #1:

      Weaknesses:

      Although the BOLD data is highly spatially specific, there is just one electrophysiological timeseries per subject. This is no doubt a by-product of the extensive noise cancellation that is necessary to record within the scanner. The caveat therefore is that the covarying BOLD and electrophysiological changes may derive from different regions.

      We recognize this is a limitation which is also not easily solved by approaches for source analysis, given the nature of the data (only 64 channels) and the usually larger imprecisions related to EEG source reconstruction. We circumvented this by choosing a task that is known from previous studies in MEG to induce changes in multiple frequency bands originating from regions in the early visual cortex (Hoogenboom et al., 2006; Hoogenboom et al., 2010; Koch et al., 2009; Muthukumaraswamy and Singh, 2013). Furthermore, the EEG responses are highly similar to invasive recordings in animals from visual regions in the context of tasks investigating selective attention (Fries et al., 2008). We mention this limitation now in the introduction (lines 102-111).

      The analysis methods are slightly non-standard, perhaps for good reason. The main thing that stands out is the use of correlation coefficients, rather than regression coefficients, at the first level of analysis. This could potentially conflate changes in signal with changes in noise or unexplained variance.

      We chose correlation here, since in our opinion this leads to a more interpretable measure of linear association than a regression slope. A regression slope-based analysis will yield different outcomes for the regression of y on x than for x on y, doubling the number of analyses needed. The different results for a regression of y on x and x on y are often interpreted as implying directionality, which is not warranted and not what we would like to imply with our analysis. The asymmetry is caused by the implicit assumption that x does not contain noise in a regression of y on x. This is valid when x represents a paradigm condition vector, but not when it is a data vector. We therefore opted to use the difference in (Fisher-z-transformed) correlation as our estimate for linear association/connectivity between laminar fMRI signals.
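
      The asymmetry referred to above can be made concrete with a small sketch in plain Python. The data values and function names are made up for illustration and are not taken from the study. The correlation is the same in both directions, whereas the two regression slopes differ and their product equals the squared correlation.

```python
import math


def _centered_sums(x, y):
    """Return (sxy, sxx, syy): sums of centered cross-products and squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation; symmetric in x and y."""
    sxy, sxx, syy = _centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Slope of the least-squares regression of y on x; not symmetric."""
    sxy, sxx, _ = _centered_sums(x, y)
    return sxy / sxx


def fisher_z(r):
    """Fisher z-transform of a correlation coefficient."""
    return math.atanh(r)


# Illustrative (made-up) signals from two depth bins across five trials.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_xy, r_yx = pearson_r(x, y), pearson_r(y, x)            # identical
slope_y_on_x, slope_x_on_y = ols_slope(x, y), ols_slope(y, x)  # different

# Condition difference in Fisher-z units, for two hypothetical correlations.
z_difference = fisher_z(0.6) - fisher_z(0.4)
```

      A single symmetric number per pair of signals is what makes the Fisher-z difference convenient as a non-directional coupling estimate.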

      In both a correlation and a regression approach, differences can be attributed to differences in true underlying coupling and to differences in noise. In this respect, correlation-based measures of coupling in fMRI are no different from coupling measures such as coherence and phase-locking factor in electrophysiology. Coherence can be regarded as the frequency domain version of (squared) correlation. The fact that our measure might indeed be related to differences in noise would therefore not be resolved by opting for a regression-based approach, and is not different from often used measures of coupling in electrophysiology.

      Reviewer #2 (Public Review):

      Introduction. The introduction provides an overoptimistic view on the current possibilities with respect to the investigation of layer-specific activation or connectivity in the living human brain. Cortical layers cannot yet be segmented, the fMRI measures only provide an indirect signal that is heavily influenced by partial voluming between cortical depths, and EEG and MEG approaches often only measure two compartments due to low spatial resolution. The introduction, however, gives the impression that layer-specific neuronal connectivity can precisely be measured in the living human brain, which is not the case. The authors should take considerably more care with respect to how they introduce the methodology, with clear references to the limitations. Also, statements such as "laminar fMRI allows us to study connectivity.." should be removed. In the same vein, I would suggest replacing laminar fMRI and laminar connectivity with cortical depth-dependent fMRI and connectivity to account for the above-mentioned aspects.

      In laminar fMRI research it is commonly accepted that what we measure are not true layers, but depth dependent fMRI between the boundaries of white/gray matter and gray matter/CSF. For the general audience we will make this distinction clearer and discuss the limitations of the technique (lines 74-80).

      Concept. Whereas the authors provide a model in the introduction that specifies how different frequency bands could relate to cortical depth-dependent connectivity, they do not develop a working hypothesis based on their experimental design. One conceptual step is therefore missing in the introduction, which has to combine present knowledge on the relationship between different frequency bands and present knowledge on how attention influences frequency-specific activation in the visual system to then make statements about which analyses can be performed to test which aspect of the model.

      The primary focus of our study was to investigate how oscillations across several frequency bands in the EEG relate to laminar specific activity. Recent publications on laminar fMRI have demonstrated the possibility of performing laminar level fMRI connectivity analyses, which led us to revisit our previously recorded data in order to explore whether not only laminar specific BOLD amplitude but also laminar fMRI connectivity relates to frequency specific EEG power. Since laminar fMRI, and especially connectivity derived from those measures, is very novel, we started this analysis without a preconceived model or notion of what this relation would be. The results from this project should therefore be interpreted as an exploration of how these laminar fMRI derived connectivity measures relate to neural oscillations rather than directly addressing a specific cognitive process like selective attention or prediction, and/or a model of how neural oscillations play a role in these processes. Our experimental paradigm was also not designed to address such processes. We chose a paradigm that is known from previous studies using MEG and EEG to induce changes in multiple frequency bands in the early visual cortex (Hoogenboom et al., 2006; Hoogenboom et al., 2010; Koch et al., 2009; Muthukumaraswamy and Singh, 2013). Furthermore, the EEG responses are highly similar to invasive recordings in animals from visual regions in the context of tasks investigating selective attention (Fries et al., 2008). The crude attention/task modulation added to the paradigm (attention On versus Off) was in the first place introduced to induce meaningful variation over subjects in a task effect across the frequency bands modulated by visual stimulation. It was not intended to investigate specific individual processes such as prediction, attention or arousal. The observed effects can therefore also not be ascribed to such specific processes, since they are co-modulated by the task. We now make this point explicitly in the introduction.

      • Concept & Methods: With respect to both the concept and the analyses, what is missing is taking into consideration the brain areas that were investigated. Whereas in the abstract the authors only mention "within brain region connectivity" and "between brain region connectivity", also in the Methods section there is no clear relation to the anatomical areas that were investigated, being V1, V2 and V3. The authors rather classify the areas as "high level" and "low level", where V2 is sometimes classified as high-level and sometimes as low-level. The data are therefore not investigated with reference to the anatomy of the visual system. In my view, it would be beneficial if all analyses could be performed specifically with respect to V1-V2 connectivity and V2-V3 connectivity as well as V1-V3 connectivity so that the specific anatomical interrelations are taken into account. Also, the authors should develop a conceptual framework of how layer-specific attention-driven connectivity changes should influence the visual cortex, and why.

      In the results for between region connectivity we averaged over several connection pairs (V1- V2,V1-V3,V2-V3) and for within region connectivity across regions (V1-V1,V2-V2,V3-V3) before effects in connectivity were correlated with EEG power. There are several reasons why we opted for this approach: First, we wanted to maximally increase the statistical power to observe patterns of association between laminar connectivity and EEG power. Since the analyses as carried out here have not previously been performed, we had no estimate of effect size. Secondly, by averaging over region combinations we drastically limit the multiple comparisons problem, since the number of comparisons scales with the square of the number of regions connectivity is computed between. Third, by averaging over regions, we target more general effects of connectivity between and within regions that are more likely to correspond to patterns observed within other contexts and other modalities. The effect for individual region combinations would likely be more variable.

      For completeness, in the first submission we did include the results for every single region combination in the supplementary material (see Supporting Figures S2-S5). We have now included in the main document the results for region combinations V1-V2, V1-V3 and V2-V3 for between region connectivity, and V1-V1, V2-V2 and V3-V3 for within region connectivity, presented alongside the results for the grand average.

      The results for the individual region-pairs suggest that inter- and intra-region connectivity are generally consistent with the average over individual region combinations, but also have unique features.

      Similarities include: A strong negative correlation between beta power and deep-to-deep layer coupling was observed for average inter-regional connectivity. In line with this, for all three individual region pairs (V1-V2, V1-V3, V2-V3) a negative correlation is observed for deep-to-deep layer coupling. Similar patterns can be observed from alpha and beta for intra-regional connectivity (averaged over all regions) and connectivity within V1, V2 and V3 in isolation.

      Individual features include: The relation between beta and inter-regional coupling shows variation over the individual region-pairs. In particular for V2-V3 connectivity, but also for V1-V2, the relation seems to differ from the pattern observed on average. For V2-V3, deep layer V3 seems to be coupled to both deep and superficial layers in V2, a pattern that might reflect anatomical feedback projections that go from deep layer V3 to both deep and superficial layers in V2. The stronger correlation between deep V1 and more middle deep V2 is however harder to directly place, since direct anatomical connections are largely absent here. It might therefore reflect an indirect effect.

      Despite some degree of individual variation we think the overall picture is largely consistent. The strongest features present in the averaged results can clearly be observed in each of the individual region-combinations as opposed to the latter being a collection of vastly different random patterns that happen to add up to the average result (see for example the intraregional alpha results).

      With respect to our classification of regions into higher- and lower-level cortical regions, we based this on standard anatomical hierarchies such as that of Felleman and Van Essen (Felleman and Van Essen, 1991). Here, V1, V2 and V3 are ordered from lower to higher in the visual cortical hierarchy.

      Methods. Given the missing conceptual overview of how attention-induced changes in EEG frequency bands should influence laminar connectivity in the visual system, the methods also lack a clear analysis strategy. The authors computed one correlation between power level of different frequency bands and connectivity between different brain areas without providing an explanation of which question this analysis addresses. The offered results therefore seem random to me, without a clear relationship to an investigated hypothesis.

      The primary focus of our study was to investigate how oscillations across several frequency bands in the EEG relate to laminar specific activity. Recent publications on laminar fMRI have demonstrated the possibility of performing laminar level fMRI connectivity analyses (Sharoh et al., 2019; Huber et al., 2017; Huber et al., 2020), which led us to revisit our previously recorded data in order to explore whether not only laminar specific BOLD amplitude but also laminar fMRI connectivity relates to frequency specific EEG power. Since laminar fMRI and especially connectivity measures derived from it are very novel, we started this analysis without a preconceived model or notion of what this relation would be. The results from this project should therefore be interpreted as an exploration of how these laminar fMRI derived connectivity measures relate to neural oscillations rather than directly addressing a specific cognitive process like selective attention or prediction, and/or a model of how neural oscillations play a role in these processes. Our experimental paradigm was also not designed to address such processes and test hypotheses derived from these. The primary focus of the work presented here is to provide a first insight into how neural oscillations measured by electrophysiological measures relate to cortical depth resolved fMRI coupling, which is usually correlation based. We believe these results will be relevant for research focused on how neural oscillations relate to inter- and intra-regional interactions (e.g. Bastos et al., 2012; Fries, 2015), since depth resolved fMRI allows us to study laminar interactions within and between brain regions non-invasively in humans. For this it is important to know if and how neural oscillations relate to laminar fMRI based connectivity measures, into which our research here provides a first insight. It also provides insight into which neural processes underlie observed changes in laminar fMRI based coupling, and is therefore relevant for research using such methods in general.

      Methods. The authors mention that they only analyzed the strongest two connecting vertices within a layer, which was done to improve SNR. In my view, for a connectivity analysis, this is not valid, as it can bias the effect towards superficial connectivity where the SNR and thus correlation is always higher.

      We did not analyze vertex pairs within a layer. We computed vertex pairs that connect the boundary between gray matter and CSF with the boundary of gray matter and white matter based on a high resolution anatomical MRI scan. Between these vertices we sampled 21 points of functional fMRI data using nearest neighbour interpolation. Since not all parts of V1, V2 and V3 will be involved in the task, we selected the most activated vertex pairs for further analysis. This serves as a localizer to select the parts within a region where task related activation is observed. For the main analysis the top 10% activated vertex pairs were chosen based on data collapsed across all depths and all attention conditions. This selection is therefore independent of depth, task condition, and the relation with any EEG feature. For this procedure we actually excluded the top five depth bins to avoid being too biased to superficial depths since it is known that signal to noise is substantially better near the surface of the cortex in part due to larger pial veins. To investigate whether the observed results are not due to this arbitrary threshold of 10%, we repeated the analyses for top 5% and 25% activated vertex pairs, the results of which are included in the supplementary information.
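
      The selection step described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for the study: the function name is hypothetical, the input is assumed to be already averaged across attention conditions, and depth index 0 is assumed to be the most superficial bin.

```python
def select_top_vertex_pairs(activation, fraction=0.10, n_superficial_excluded=5):
    """Pick the most activated vertex pairs, ignoring superficial depth bins.

    activation: list with one entry per vertex pair; each entry is a list of
        activation values per depth bin (index 0 = most superficial, assumed),
        already collapsed across task conditions.
    Returns the sorted indices of the selected vertex pairs.
    """
    scores = []
    for index, profile in enumerate(activation):
        kept = profile[n_superficial_excluded:]      # drop superficial bins
        scores.append((sum(kept) / len(kept), index))  # collapse across depth
    n_keep = max(1, round(fraction * len(activation)))
    scores.sort(reverse=True)
    return sorted(index for _, index in scores[:n_keep])


# Toy data: 20 vertex pairs, 21 depth bins. Vertex pair 0 has a strong
# superficial response only; deeper activation grows with the index.
toy_activation = [
    [100.0 if i == 0 else 0.0] * 5 + [float(i)] * 16
    for i in range(20)
]
selected = select_top_vertex_pairs(toy_activation)  # top 10% of 20 pairs
```

      In the toy data, vertex pair 0 is not selected despite its large superficial response, because the excluded bins do not contribute to the ranking.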

      Methods. The authors report 21 correlations in cortical depth, where their resolution allows to only sample perhaps 2-3 data points. The correlation analyses are therefore oversampled, which influences the statistical results. I would suggest to first run a component analysis across cortical depth, and to then correlate independent components to one another to investigate independent data points.

      The correlations are not oversampled, since the correlations used for the connectivity analyses are computed over trials, and not over space. These analyses are not influenced by the number of laminar BOLD data points we sample. Furthermore, spatial supersampling is a very common practice in fMRI research. For instance, the default in SPM is to upsample a 3 mm isotropic standard voxel size (very common for initial acquisition) to a 2 mm isotropic voxel size. In laminar fMRI, laminar signals are often upsampled by several factors above the original resolution. This is for a number of reasons, well outlined on the laminar fMRI community website, a resource maintained by L. Huber in collaboration with many layer fMRI labs (see: https://layerfmri.com/2019/02/22/how-many-layers-should-i-reconstruct/), and ~20 layers is thought to be optimal.
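
      To make the distinction between sampling over trials and sampling over space concrete, here is a small sketch under stated assumptions. The data layout (a list of trials, each holding one value per depth bin) and the function names are hypothetical, and plain Pearson correlation stands in for whatever association measure the study used.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def laminar_connectivity(region_a, region_b):
    """Depth-by-depth connectivity matrix.

    region_a, region_b: lists of trials; each trial is a list of depth values.
    Every entry of the result is a correlation computed over trials, so the
    sample size of each coefficient is the number of trials, whatever the
    number of depth bins.
    """
    depths_a = list(zip(*region_a))  # one tuple of trial values per depth
    depths_b = list(zip(*region_b))
    return [[pearson(da, db) for db in depths_b] for da in depths_a]


# Synthetic data: 40 trials, 21 depth bins per region.
rng = random.Random(0)
n_trials, n_depths = 40, 21
region_a = [[rng.gauss(0, 1) for _ in range(n_depths)] for _ in range(n_trials)]
region_b = [[rng.gauss(0, 1) for _ in range(n_depths)] for _ in range(n_trials)]
connectivity = laminar_connectivity(region_a, region_b)
```

      Sampling more depth bins enlarges the matrix (more comparisons, which the cluster-based correction has to handle) but does not add samples to any single correlation coefficient.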

      For our statistical test we explicitly chose a non-parametric cluster-based technique to correct for multiple comparisons that takes dependencies across space into account. Laminar fMRI data are not well suited to decomposition into components using techniques like PCA and ICA, since they violate assumptions of orthogonality/independence of the underlying responses in both the spatial and the temporal dimensions. To illustrate: in a recent laminar connectivity methods review, a hierarchical, iterative ICA approach resulted in data being split up into columnar maps rather than laminar ones (Huber et al., 2020).

      Methods. The authors refer to their previously published paper with respect to the methods, and do not give any specifications on the image sequence, image resolution, and image processing in this paper. In my view, all basic methodological steps that are critical to understand the paper should be described here.

      We are willing to include all relevant parts of the methodology described in our previous paper. This would involve copying large parts of the methods section, and might have to be coordinated with the publisher of the previous publication for copyright reasons. We would be pleased if the editor could advise us on this issue.

      Results. The figure captions are too short and do not explain the presented data in an appropriate way. In Figure 1, details on the calculated contrasts, number of participants investigated, sampling and analyses methods should be given that allows interpreting the data. Also, it would be beneficial to explain the attention paradigm in a bit more detail in the figure caption so that panel A can be interpreted. In Figure 3, more details should be given on what data are shown, particularly for panel C where the only information given is "attention effect on laminar connectivity" with no further axes labels.

      We extended the figure captions in the revised article.

      Results. I do not fully understand the results as shown in Figure 3. As those form the major part of the manuscript, this needs revision. As said before, I think that the figure and results section would benefit from region-specific data analyses and presentation, but also clear axes labels are needed to allow interpretation of the data. Also, when I interpret the data correctly, correlations are done for altogether 21 different cortical depth, which would not be valid because of artificially inflating the number of correlations, as pointed out above.

      We have extended our analyses and now split the original Figure 3 into the current Figures 3 and 4, where we separately depict the results for intra- and inter-regional connectivity. For both intra- and inter-regional connectivity we now also show the region-specific results that underlie these results. We updated the figures and captions to make clearer what is depicted. We addressed the point raised about the 21 data points above. It is not relevant for the analysis presented here.

      Reviewer #3 (Public Review):

      However, a weakness of the technique as currently presented is that patterns of connectivity are only related to oscillations across subjects. It would be more powerful to examine whether the current network state (estimated by trial-by-trial power estimates) relates to laminar connectivity within subject. This would indeed speak to the nature of neuronal communication, which takes place on a moment-to-moment time scale, and which is not reflected in the current analysis. This may explain why laminar patterns of fMRI connectivity were not found to correlate with gamma-band oscillatory activity. In addition, the negative effects of attention on fMRI connectivity itself are somewhat puzzling. This may be related to the limitations of the task design, which does not perfectly separate attention vs. arousal/expectation, as the authors readily discuss.

      The reviewer suggests that a relationship between fMRI connectivity and EEG power within subjects over trials would be more indicative of a direct link between connectivity and neural communication. We agree that establishing such a relationship would further strengthen the link between neural oscillations and laminar connectivity. This would not be trivial, however, since connectivity in (laminar) fMRI is typically expressed as a measure of linear association (e.g. correlation or regression slope) over trials or time. Even at conventional spatial resolution, single trial/time point estimates of the network state are rarely used. These single data-point measures usually indicate to what extent a single data point contributes to the measure over all data points. We did not opt for such an analysis, since such analyses are uncommon in normal (e.g. resting state) fMRI studies and would introduce more complexity to a study that already includes considerable novel analytic approaches. Furthermore, research relating fMRI activation and connectivity across subjects with other variables (e.g. clinical test scores, DTI measures, personality traits) is a well-established procedure. Here we followed this more common approach.

    1. Author Response:

      Reviewer #3:

      Weaknesses:

      Previously it was suggested that mitochondrial biogenesis was increased with increased levels of GJA1-20k. Is this a difference in the cellular model (HEK) and do the changes in cell culture accurately recapitulate the changes seen in animals?

      The Reviewer is correct that GJA1-20k did not alter the mitochondrial biogenesis in HEK293 cells (Figure 1–figure supplement 2) whereas AAV9-transduced adult cardiomyocytes showed increased mitochondrial DNA copy number (Figure 1–figure supplement 2C), consistent with our previous study (Basheer et al., JCI insight, 2018). We expect that increased mitochondrial biogenesis is a function of chronic GJA1-20k overexpression in vivo, and thus a separate phenomenon from the acute mitochondrial fission which occurs within one minute of GJA1-20k accumulation around a mitochondrion (Figure 4). The HEK cell line, in which overexpressed GJA1-20k is present for a much shorter time, does not induce mitochondrial biogenesis (Figure 1–figure supplement 2), and thus is an excellent cellular model in which we can study GJA1-20k induced fission.

      The revised manuscript has been modified to include the above new data (Figure 1–figure supplement 2) and discussion:

      —Results section (lines 121 – 129): Previously we reported that GJA1-20k is involved in mitochondrial biogenesis (Basheer, Fu et al. 2018). Consistent with our previous study, AAV9-transduced adult cardiomyocytes showed increased mitochondrial DNA copy number and GJA1-20k deficient mice (Gja1M213L/M213L) had decreased copy number. However, exogenous GJA1-20k did not alter the mitochondrial biogenesis in HEK293 cells. Nor did exogenous GJA1-20k affect membrane potential or baseline ATP production (Figure 1–figure supplement 2A–C). In addition to mitochondrial DNA copy number, neither biogenesis nor mitophagy protein markers were altered in either GJA1-20k transfected HEK293 cells or Gja1M213L/M213L mouse hearts (Figure 1–figure supplement 2D – G).

      —Discussion section (lines 289 – 292): Yet the presence of GJA1-20k, while inducing mitochondrial fission and smaller mitochondria (Figure 1, 3 and 4), does not either reduce MFN1 or MFN2, activate DRP1, change membrane potential, ATP production, mitochondrial biogenesis, or mitophagy (Figure 2; Figure 1 – figure supplement 2).

      Mdivi-1 is not a selective Drp1 inhibitor. It is a Complex I inhibitor, leading to unintended changes in mitochondrial dynamics in response to ETC stress. Rather than Mdivi-1, a dominant negative Drp1 mutant K38A could be overexpressed to see whether this prevents GJA1-20k-mediated fission. If it still goes through, then I agree that Drp1 is not involved at all.

      We appreciate Reviewer #3’s thoughtful suggestion and, in this revised manuscript, we studied mitochondrial morphology in the presence of K38A. As seen in Figure 2C and D of the revised manuscript, K38A elongated mitochondria, as expected from inhibited Drp1 mediated fission. However, despite Drp1 inhibition by K38A, in the presence of GJA1-20k, mitochondria remain small, further supporting that GJA1-20k-mediated fission is DRP1-independent.

      —Results section (lines 140 – 150): To further investigate whether GJA1-20k induced reduction in mitochondrial size is dependent on DRP1, we analyzed mitochondrial morphology after inhibiting DRP1 by performing siRNA-mediated DRP1 knock-down (Figure 2—figure supplement 1A–C) or transfecting DRP1 dominant negative mutant (K38A), all with or without GJA1-20k transfection. With either method of DRP1 inhibition, the average area of individual mitochondria increased, consistent with inhibiting canonical fission (Figure 2C, D). In addition, K38A has more pronounced DRP1 inhibition which resulted in greater mitochondrial enlargement than siDRP1 (Figure 2C, D; Figure 2—figure supplement 1F). However, GJA1-20k acts epistatically to DRP1 loss or interference and prevents DRP1-mediated mitochondrial enlargement (Figure 2C–F; Figure 2—figure supplement 1B, C), indicating GJA1-20k can act at or downstream of DRP1.

      For the kinetics studies (see Fig 4), I think it is important to measure the timing of the actin recruitment and eventual fission when Drp1 is knocked down and/or when a DN mutant (K38A) is involved. Again, I do not trust the chemical inhibitor (Mdivi-1) data since this does not inhibit Drp1 activity.

      We would like to thank Reviewer #3 for suggesting we use an additional method of inhibiting Drp1. We analyzed real time actin dynamics under direct DRP1 knock-down. As seen with Mdivi-1 treatment, GJA1-20k accumulated and then actin assembled around mitochondria and induced fission under DRP1 knockdown (Figure 4 and Video 1 of revised manuscript). The kinetic parameters of fission were also similar between Drp1 knockdown and Mdivi-1 treatment. The original Figure 4 and Videos 1 and 2 have been moved to Figure 4–figure supplement 1 and Videos 2 and 3, respectively, in order to accommodate the new Drp1 knockdown data (Figure 4 and Video 1).

      The revised manuscript has been modified to include the above new data (Figure 4; Video 1):

      —Results section (lines 198 – 219): Simultaneous use of fluorescently labelled actin, GJA1-20k, and mitochondria in live cells permits real time imaging of mitochondrial fission events at actin assembly sites. As seen in Video 1 and Figure 4B, GJA1-20k recruits actin to mitochondria, which results in fission. In Video 1, the actin network can be seen to develop around mitochondria and, coinciding with GJA1-20k intensity, forms an increasingly tight band across a mitochondrion which, within one minute, results in mitochondrial fission. The imaging in the bottom row of Figure 4B and in the right column of Video 1 was obtained by multiplying GJA1-20k signal with actin signal, highlighting the locations at which GJA1-20k and actin are coincident. The respective line-scan profiles in Figure 4C indicate that mitochondrial fission occurs at points where the product of GJA1-20k and actin is the highest. Following accumulation of GJA1-20k and actin (red lines) at these points, a drop in mitochondrial signal (blue lines) is apparent when fission occurs. Fission (low point of blue lines) occurs approximately 45 seconds after co-accumulation of GJA1-20k and actin (high point of red lines, Figure 4C). Time to fission was computed from the time of peak GJA1-20k and actin intensity product, to the time of mitochondrial signal being reduced to background (Figure 4D–F). Statistically, this time to fission occurred at a median of 45 seconds, with a standard deviation of 11 seconds (Figure 4G). Note that the real time imaging shown in Video 1 and Figure 4 was performed under siDRP1. Therefore, the mitochondrial fission induced by cooperation between GJA1-20k and actin can be independent of canonical DRP1-mediated fission.
      To rule out inadvertent bias by siRNA, we used pharmacologic Mdivi-1 to inhibit DRP1 and, similar to the use of DRP1 siRNA, actin formed around mitochondria at GJA1-20k sites (Figure 4—figure supplement 1A–D) and fission occurred within a similar timescale (Video 2 and 3; Figure 4—figure supplement 1E–H).
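
      The time-to-fission measure quoted above (from the peak of the GJA1-20k × actin intensity product to the point where the mitochondrial signal reaches background) can be sketched as follows. This is an illustration only, not the analysis code of the study: the function name, the way the background level is supplied, and the synthetic line-scan values are assumptions.

```python
def time_to_fission(times, gja1, actin, mito, background):
    """Time from peak GJA1-20k x actin product to loss of mitochondrial signal.

    times, gja1, actin, mito: equally long sequences sampled at the fission
    site, one value per frame. background: intensity at or below which the
    mitochondrial signal counts as gone (hypothetical threshold).
    Returns None if the signal never reaches background after the peak.
    """
    # Frame-by-frame product highlights where both signals coincide.
    product = [g * a for g, a in zip(gja1, actin)]
    peak_idx = max(range(len(product)), key=product.__getitem__)

    # First frame at or after the peak where mitochondrial signal is at background.
    for i in range(peak_idx, len(mito)):
        if mito[i] <= background:
            return times[i] - times[peak_idx]
    return None


# Synthetic trace: one frame every 5 s, product peaks at t = 20 s,
# mitochondrial signal falls to background at t = 65 s.
times = list(range(0, 100, 5))
gja1 = [1.0] * 20
actin = [1.0] * 20
gja1[4], actin[4] = 3.0, 2.0
mito = [10.0] * 13 + [1.0] * 7
delay = time_to_fission(times, gja1, actin, mito, background=1.0)  # 45 for this trace
```

      Applying such a function to each fission event and taking the median of the resulting delays would give a summary of the kind reported in the text.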

      The assessment of the impact of ischemic stress with the heterozygous animal (M213L/WT) is hard to interpret. How reduced is the expression of GJA1-20k in these animals and how is mitochondrial function impacted based on Seahorse analysis? The mitochondrial morphology is not altered in these animals, so would mitochondrial function be largely unchanged as well? It is not clear how much GJA1-20k is needed to observe changes in mitochondrial shape and function, and comparisons with the homozygous mutant (M213L/M213L) are not the same, making it difficult to resolve the interpretation of these data.

      We appreciate Reviewer #3’s thoughtful and valuable comments. We previously reported that the heterozygous mutant (M213L/WT) expresses approximately half of GJA1-20k compared to WT (Figure 1 in Xiao and Shimura et al., J Clin Invest, 2020). Unfortunately, homozygous mutants die before adulthood, preventing effective comparison of GJA1-20k content on mitochondrial function in adult cardiomyocytes. To compare the impact of the amount of endogenous GJA1-20k on mitochondrial function, we added seahorse data from heterozygous neonatal CMs (Figure 5 C, D) and compared these data to seahorse data from neonatal cardiomyocytes from both wildtype and homozygous mutants. Even though there was no significant difference in mitochondrial size between WT and M213L/WT (Figure 5I, J; Figure 5–figure supplement 1A, B) under basal conditions, the seahorse OCR levels from M213L/WT myocytes is in between that of WT and homozygous (M213L/M213L) (Figure 5 C, D; Figure 5–figure supplement 1C) cardiomyocytes. Since GJA1-20k is a stress responsive peptide which increases under ischemic stress, in the present manuscript, we should like to emphasize that even a partial (50%) decrease in GJA1-20k expression induces mitochondrial fragility to oxidative stress. As shown in new Figure 5 I – L of the revised manuscript, the heterozygous mutant (M213L/WT) has more elongated mitochondria and a high distribution of damaged mitochondria post-I/R compared to WT, consistent with TTC staining, even with no change in mitochondrial size under basal conditions.

      The revised manuscript has been modified to include the above new data (Figure 5; Figure 5–figure supplement 1) and discussion:

      —Results section (lines 227 – 233) Similarly, maximal respiration is increased in neonatal CMs derived from GJA1-20k deficient Gja1M213L/M213L mice and maximal respiration for heterozygous Gja1M213L/WT mice is between that of WT and Gja1M213L/M213L (Figure 5C, D; Figure 5—figure supplement 1A, B). In addition, observing other OCR parameters, we found a decrease in ATP-linked respiration and reserve capacity in Gja1M213L/WT cardiomyocytes, and an increase in proton leak and non-mitochondrial respiration in Gja1M213L/M213L suggesting that there can be compensatory long-term effects of the Gja1 mutation (Figure 5—figure supplement 1C).

      —Results section (lines 241 – 250) However, remarkably, reduced GJA1-20k expression results in an almost complete cardiac infarction after I/R injury (Figure 5E, F). Moreover, ROS production after I/R injury was increased in Gja1M213L/WT mice compared to WT post-I/R (Figure 5G, H). There was no significant difference in mitochondria size at the basal condition between WT and Gja1M213L/WT mice adult CMs as with neonatal CMs (Figure 5I, J), whereas the mitochondria size was significantly increased after I/R injury and the heterozygous Gja1M213L/WT mice had larger mitochondria compared to WT mice post-I/R (Figure 5I, J). Interestingly, the area of mitochondrial matrix was also increased, suggesting loss of cristae in Gja1M213L/WT mice heart (Figure 5K, L). These data indicate that even partial deletion of GJA1-20k results in a profoundly impaired response to ischemic stress.

      —Discussion section (lines 350 – 357) Because GJA1-20k-induced fission is associated with less ROS production with oxidative stress (Figure 5 – figure supplement 1D, E), the endogenous generation of GJA1-20k and subsequent decreased ROS production could explain a major benefit of pre-conditioning. Of note, genetic GJA1-20k reduction increases infarct size and ROS production post-I/R injury (Figure 5E–H). In addition, the population of damaged mitochondria is significantly increased in heterozygous Gja1M213L/WT mouse heart post-I/R (Figure 5I–L). Therefore, GJA1-20k induced decreases in ROS production could limit the amount of I/R injury induced by myocardial infarction.

      It is still unclear to me how GJA1-20k is affecting mitochondrial size and function. Based on previous papers, this peptide localizes to the surface of mitochondria, but it is not clear how, or whether, it directly facilitates actin recruitment. The interplay with the endoplasmic reticulum (ER), which can nucleate actin at sites of mitochondrial fission, was not examined. If actin is driving membrane remodeling, is it mediated by ER crossover at these sites?

      We appreciate Reviewer #3’s thoughtful comment and suggestion. Our unpublished data indicate that GJA1-20k has an actin-binding domain, suggesting direct binding and actin dynamics regulation. As shown in Figure 3 in the present study, GJA1-20k recruits actin around mitochondria membrane and their interaction resulted in fission. In addition, as the Reviewer suggested, our preliminary data showed significant increase in ER network in GJA1-20k-transfected cells (Figure below). Therefore, there is the possibility that ER is also involved in GJA1-20k mediated mitochondrial fission, while further research will be required to reveal the detailed mechanisms. In the present manuscript, we would like to focus on the finding that actin is necessary for GJA1-20k-mediated mitochondrial fission but not DRP1.

      ER network association with mitochondria is increased in GJA1-20k-transfected cells. Left: Representative fixed cell images of HEK293 cells with GFP-tagged GST or GJA1-20k. ER and mitochondria were labeled by Protein disulfide-isomerase (PDI) and Tom20, respectively. Right: The quantification of Pearson’s correlation between PDI and mitochondria. The graph is expressed as mean ± SD. p values were determined by two-tailed Mann-Whitney U-test. **p < 0.001.

      We have updated the Discussion section to point to this excellent consideration in the future.

      —Discussion section (lines 299 – 302) In addition to actin, the endoplasmic reticulum (ER) membrane can be involved in mitochondrial scission (Friedman, Lackner et al. 2011, Tandler, Hoppel et al. 2018). Future studies should consider whether GJA1-20k induced actin cytoskeleton arrangements involve the ER membrane as well.

    1. Author Response:

      Reviewer #1:

      Systematic reviews and meta-analyses are essential tools for synthesizing empirical evidence to advance our knowledge in life science. In this rigorously conducted meta-analytic study, the authors analyzed the data from 119 experiments from 110 published articles (92 on brain functional experiments from 87 articles and 27 on brain structural experiments from 23 papers) and investigated the functional and structural abnormalities associated with developmental dyslexia across languages. Convergent and divergent functional and structural changes as well as language-universal and language-specific brain alterations related to dyslexia are found. In general, the study has generated important results and the findings are interesting.

      I have the following comment:

      Dyslexia in alphabetic languages is generally related to phonological deficits, so there are many neuroimaging experiments using phonology-based tasks. In Chinese, the core deficits of dyslexia are unknown, and neuroimaging tasks devised in the literature are more diverse. Although the authors have done well in Table 1 to specify the experimental tasks in various languages, the meta-analysis did not take into account the task types. I believe that in this article, it is not necessary to conduct task-type based meta-analyses, but one sentence or two in the Discussion section to mention this possibility and the limitation is necessary.

      We actually conducted a confirmation analysis in which we matched task in the two language groups. The results are consistent with the original findings. Please see pages 15-17, 36-37.

      Reviewer #2:

      In the present study, Xiaohui Yan and colleagues attempt to summarize the existing evidence on neurofunctional and neuroanatomical impairments in dyslexia (aka specific reading disorder) in different languages in a meta-analytic manner. The research questions the authors asked are essential but remain largely open. The meta-analysis is a powerful approach to address these problems, and the findings are appealing. Both universal and language-specific neural manifestations in dyslexia are revealed. With this knowledge, the researchers can design experiments to reveal/examine more specific hypotheses, while the educators can refine diagnostic methods and intervention programs. This study has several strengths, including (1) the research questions are explicitly and precisely declared at the beginning; (2) an advanced meta-analytic method (AES-SDM) was used; (3) a series of complementary analyses are done, including a confirmation study with well-matched English and Chinese studies was conducted; (4) a comprehensive discussion is given. At the same time, as too many questions are asked, the analyses/results/interpretations became quite complicated and sometimes hard to follow. In addition, several factors need to be further taken into consideration in data analysis and result explanation. Finally, it should be noted that given meta-analysis is a way to summarize the previous findings, it is necessary to conduct further studies based on it to directly examine the hypotheses. All in all, the main claims are supported by the data, while additional analyses would provide further support.

      I have three main concerns:

      1) The imbalanced numbers of studies in alphabetic and morpho-syllabic languages may bias the results of primary meta-analysis pooling all studies together. Specifically, since there are many more alphabetic studies in this field, the results will mainly reflect patterns in these languages. This can be seen, e.g., when comparing figures S1-S2 with figures S3-S6. In this case, the result cannot answer whether there are the same functional and structural impairments in dyslexia in alphabetic and morpho-syllabic languages (i.e., the "both" question). The same issue exists for the multi-modal analysis across languages.

      We agree with the reviewer. We deleted the overall analysis and only keep analysis for each language group, as well as the comparisons and conjunctions between them, so that the results can show language differences as well as common findings in both groups.

      2) Age range difference may significantly influence the results of the primary meta-analysis across languages. It is shown in p. 15, lines 297-298, "the mean age was 16.55 years for controls and 16.23 years for participants with DD." However, such adolescent age ranges are a result of pooling studies in children and adults together. Given that (1) it has been revealed in previous literatures that participant's age modulates the neural manifestation in dyslexia, (2) fewer adolescent studies exist in alphabetic languages, and (3) research on morpho-syllabic languages is almost with children, the findings of the primary analysis might be influenced by age-related effect.

      We conducted a confirmation analysis with only alphabetic studies on children and we replicated previous findings of language differences between alphabetic and morpho-syllabic languages (page 15, 36). It suggests the language differences we found cannot be due to unmatched age ranges in the two language groups.

      3) Some statements are out of the scope of this study and can be misleading. For instance, in the abstract (p.2, lines 17-19), it says, "…it is still not totally understood where and why the structural and functional abnormalities are consistent/inconsistent across languages." However, while the "why" question is an important one, unfortunately, the current meta-analysis does not answer it.

      We deleted “why” in the abstract.

    1. Author Response:

      Reviewer #1:

      This manuscript describes extensive phenotypic analyses of Flo11, a multifunctional adhesin in S. cerevisiae. Flo11 mediates a variety of adhesin-related activities including flocculation and adhesion to surfaces, and these activities can lead to biofilm and pellicle (floating biofilm) formation, as well as ability to mediate agar invasion. Flo11 activities are activated after shear force, which facilitates surface amyloid formation. Flo11 is highly diverse, with different yeast strains expressing alleles with sequence and repeat number differences. These differences contribute to differences in adhesion behaviors. The alleles have conserved regions including a secretion signal, a globular N-terminal region, and a C-terminal GPI anchor that mediates cell surface attachment. In between these regions are ~1000-1500 amino acids with variable sequence and repeat numbers and a variable number of potential amyloid-core sequences. The manuscript includes extensive sequence analysis of alleles from 4 strains, and analysis of the consequences of expression of different regions of the protein from various alleles. This major study includes detailed AFM analyses showing that a Flo11 allele from a wine-fermenting strain expression leads to presence of cell surface patches of Flo11 adhesin 10-100 nm diameter 10-20 nm elevation from the cell surface. SiN AFM probing determines both adhesive strength and stiffness of the adhesion patches. The regions containing the amyloid-core sequences are essential for most of the Flo11 activities, because the activities are inhibited by deletion of amyloid-forming core regions or by treatment of intact cells with a general amyloid inhibitor or a sequence-specific anti-amyloid peptide.

      We thank the reviewer for these encouraging and stimulating comments.

      The work is extensive and well-documented. Qualitative data on the different constructs shows how activity correlates to specific sequences in the protein. However, the analysis is compromised by the lack of data on the levels of surface expression of the different alleles and constructs. mRNA levels are reported, but for fungal adhesins these values often do not correlate with surface expression levels. The problem is especially confounding for the C-terminal deletion construct, because the deleted region includes the GPI addition signal: its deletion leads to secretion of free adhesin into the medium and decreased surface anchorage (Douglas et al. 1996 Eukaryot. Cell 6:2214-2221).

      This critical comment, also raised by the Editor, has been answered above. The main finding is that a Flo11 variant lacking the C-terminus still shows localization at the cell periphery, as witnessed by immunofluorescence, suggesting that this variant can be retained at the cell wall by other structural components. This result appears to contradict the data from Douglas et al. (EC, 2007), except that in that study (as written in lines 485-490 of the revised version) FLO11 was expressed on a 2μ plasmid under the PGK1 promoter, leading to very high expression of Flo11p, which may exceed the capacity of the cell wall to retain it.

      The quantitative biophysical conclusions would be strengthened if the data from all tested cells were aggregated and presented, not just the data from the individual cell shown in each figure.

      We have already answered this question above, as it was one of the issues raised by the Editor.

      If these critiques are addressed, the manuscript would greatly benefit from a summary figure and paragraph describing the activities attributed to each region of the sequence in L69 and other alleles. It would also benefit from reference to recent work demonstrating that cell-cell adhesion can be mediated by formation of amyloid-like bonds between cells.

      This is indeed a good suggestion but the paper already contains 9 figures and 2 tables and we would like to extend this comparative analysis with all the FLO-encoded flocculins in S. cerevisiae, which will be the purpose of an ongoing short review on the subject to be submitted soon.

      Reviewer #2:

      The authors argue that the Flo1 protein of yeast can mediate cell-cell aggregation through amyloid formation. They identified a wine-making yeast strain that is particularly strong in this phenotype and this correlates with the presence of aggregation prone repeat regions in the protein. Their frequency seems to be important for the effect.

      The paper makes an extensive case, although I have reservations on experimental specifics. They show AFM imaging of the cell surface (nanodomain formation) and extensive genome editing to alter the suspected regions, which largely (but not perfectly) produces effects in line with the hypothesis. Also, they use an inhibitory peptide (not so well validated) and a widely used amyloid dye to disturb this amyloid formation. Taken together these data point toward a role for amyloid formation, similar to what was previously shown by Lipke for Candida. However, I remain with some concerns regarding data, controls, interpretation that preclude firmly backing the story.

      We thank this reviewer for his/her critical comments. We have done our best to answer your relevant questions, comments and remarks.

      Major Concerns:

      There is no direct evidence of amyloid formation in situ.

      The purpose of the paper is not to show amyloid fiber formation by Flo11p, as a previous paper already showed that Flo11p forms amyloid fibers in vitro. Moreover, synthetic peptides with sequences of high β-aggregation potential taken from these adhesins formed amyloids in vitro as well (see Ramsook et al., Eukaryotic Cell, 9: 393-404, 2010, which we referenced in our paper).

      The hydrophobic cluster analysis in figure 1 is not very clear - it is almost impossible to make anything out in those graphs

      The HCA (hydrophobic cluster analysis) representation as shown in this figure has been used previously by others (see Chan et al., mSphere, e00128, 2016). This representation has the advantage of allowing non-experts in the field to readily visualize what is common and what differs in terms of sequence structure between the different Flo proteins. As you requested, we removed this figure and only kept the data on TRs in Supplementary File 1.

      The term beta-aggregation potential is used throughout and not defined. I presume this refers to the TANGO score? You should call it TANGO score then. Where was the arbitrary cutoff of 70 for TANGO aggregation propensity determined? In itself it is only a prediction, not a demonstration of amyloid potential.

      This is indeed a valuable suggestion, although the term β-aggregation is currently used in many other publications on this subject. TANGO is a statistical algorithm designed to predict sequences with β-aggregation potential in proteins. It does not predict whether or not these sequences will lead to amyloid formation, although the correlation between β-aggregation sequences and amyloid formation has been discussed in (Fernandez-Escamilla AM, Rousseau F, Schymkowitz J, Serrano L. Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins. Nat Biotechnol. 2004 Oct;22(10):1302-6. doi: 10.1038/nbt1012. Epub 2004 Sep 12. PMID: 15361882.).

      The cutoff for β-aggregation propensity was set arbitrarily at 30% (see line 634 in the M&M section and in Table 1 and Supplementary File 2a) to discriminate which sequences in the protein have a high propensity to β-aggregate. As shown in Table 1 and Supplementary File 2a, the Flo11 protein exhibits several β-aggregation-prone sequences that are well above this cutoff.

      To do that you need to make the peptide and show it makes amyloid (and that this can be suppressed by thioS and your breaker peptide, see below)

      These experiments were carried out in a previous paper (see Ramsook et al., Eukaryotic Cell, 9: 393-404, 2010).

      line 124: how many cells were analysed, is this pattern present in the population? analysing a single cell would seem insufficient to me - this comment applies to several analyses below.

      As indicated in M&M, all AFM experiments were done on 3 biological replicates, and for each replicate around 6 to 12 cells were analyzed. Data reported in the figure are from a representative cell. However, as stated above and in accordance with the Editor's request, we now report the aggregated data from 24 cells, which show the bimodal behavior of the adhesion forces and stiffness, as box plots (see Figure 1G, H).

      line 124: "this apparent roughness can be account for by proteins" - there is no data to support this, right? could be anything at this stage, it is just a patch. In my mind you would need AFM-IR to ensure that you are actually observing a protein structure. This would immediately show you if it's in the amyloid state as well.

      At this stage, we do not know what these aggregates could be and therefore we modified the text as follows (lines 115-118 in the revised version): The high-resolution height image (Fig. 1B) revealed a multitude of small aggregates on the cell surface with an average diameter of 100 nm and a height above the cell surface in the range of 15-20 nm (Fig. 1C).

      line 134: possibly due to my ignorance: how do you conclude from those forces that one is the hydrophobic interaction and the other is protein unfolding? comes as a complete deus ex machina to me.

      The difference between hydrophobic interactions and protein unfolding is obtained from the shape of the retraction force-distance curves, as shown in Fig. 1E. A hydrophobic interaction is a physicochemical interaction due to the surface tension exerted on the surface of materials. It is particularly related to the organization of the solvent (usually water) at the interface between the solid and the liquid. These forces are exerted at the immediate surface of the material and are characterized by a null distance between the AFM tip and the surface at the moment of rupture.

      In contrast, protein unfolding, as its name indicates, consists in unfolding a complex organic molecule (a protein) and is characterized by a distance between the AFM tip and the surface at the moment of rupture equal to the length of the unfolded protein. If the protein unfolds in several steps, which is often the case for molecules with complex folding, this results in sawtooth force-curve profiles characteristic of the unfolding of proteins containing repeated domains.

      line 151 (also mentioned later): the 'anti-amyloid peptide' is taken with a lot of faith: how do we know it works by disrupting amyloid? At least show it works on the peptide level (Tht curves) (again the AFM-IR measurement would solve this in one single effort). This experiment at least needs a control to show the cell is otherwise intact and other structures are still present. Designing potent amyloid disrupting peptide is not trivial - see various papers on the topic by David Eisenberg. I find it hard to believe these subtle mutations achieve all that they are claimed to achieve.

      The use of mutated peptides of the amyloid core sequences or of amyloid perturbants is well established for investigating the amyloid-like nature of nanodomain formation or of cell-cell aggregation. It may therefore be a question of semantics, because in this case we are working with a protein that forms cellular aggregates due to the presence of amyloid-core sequences. When these proteins unfold, they expose these sequences, which were previously buried inside the compact protein, leading to interactions between molecules of the same type; this creates clusters of thousands of molecules that can eventually organize into nanodomains. Such interactions can be inhibited by a mutated peptide of the amyloid core sequence or by anti-amyloid dyes. This differs from proteins whose amyloid formation is triggered by intermolecular associations leading to a refolded protein that often loses its function, as in protein misfolding diseases, and from amyloid prions, to which yeast flocculins and adhesins do not belong.

      line 157: Maybe I am confused, but why would thioflavin S destabilise amyloid? it is widely used as an amyloid-specific dye, and I would expect its binding energy to stabilise the amyloid state. Later, around line 310, the compound is called a drug, but really, it is just a rotor dye with an affinity for amyloid. Do other amyloid dyes show similar effect (oligothiophenes, congo red, curcumin)?

      It is indeed known that amyloid-dependent aggregation of yeast cells can be monitored by increased thioflavin T fluorescence as a result of its binding to β-amyloid structures. However, other amyloid-binding dyes such as Congo red and, in particular, thioflavin S have potent anti-aggregation effects (see for instance Ramsook et al., Eukaryotic Cell, 9: 393-404, 2010). Given that these dyes bind to amyloids, they are employed to evaluate whether the cell-cell aggregation mediated by Flo11 (and likewise by Als1 and Als5 in C. albicans) is due to the presence of amyloid-forming sequences in these proteins.

      line 250: are we really comparing single cell instances of each construct?

      Suggestions:

      If the hypothesis of the authors is true, it should be possible to replace the amyloid domains with synthetic ones (STVIIE eg, Serrano and co-workers), or from another protein. Once you have sufficient of these, you should see the cell-interactions etc.

      Also, introducing structure breaking residues in the repeats, like proline should stop the effect and would give strong support for the amyloid nature of the interactions over general hydrophobic patches.

      Each of these strains expresses only one specific variant validated by genome sequencing. Our AFM analysis showed that nanodomains are no longer observed when Flo11p lacks RR2, which contains the additional amyloid sequence motifs. Thus, we can conclude that amyloid-forming sequences are indispensable for their formation. The question of disrupting the cell-cell interactions or preventing the formation of the nanodomains is addressed earlier in the paper (see lines 152-158 in the revised version).

      Reviewer #3:

      The authors observed nanodomains in FLO11 expressed S. cerevisiae strains when imaging the cell wall of the cell with a bare AFM tip. The adhesion to the tip as well as the stiffness of these domains were characterized using AFM. However, there was no direct proof/confirmation that these nanodomains were actually composed of Flo11 proteins.

      We disagree with this statement because we clearly show that (i) the formation of nanodomains in response to AFM tip does not occur anymore in a mutant defective in FLO11, and (ii) the expression of the FLO11 gene from L69 strain in the lab strain BY leads to the formation of nanodomains at the cell surface of this strain, which is not seen upon overexpression of its endogenous FLO11. Thus, these data taken together demonstrate that a) Flo11 is responsible for nanodomains formation and b) this formation requires a peculiar Flo11 protein. We show actually that this peculiarity lies in the necessity to harbor sufficient amyloid-forming motifs in the protein sequence.

      Previous research showed that Flo11p trans interacts and this trans interaction between cells could be (hypothesis) preceded by a cis interaction between the Flo proteins via beta-aggregation-prone amyloid-forming sequences. It seems that the obtained results could explain this trans-interaction (i.e. the "nanodomains"). However, in this model the main interaction is based on the Flo11 A-domain interaction that is responsible for the trans-interaction. This could not be confirmed in the present work where the cell-cell interaction results were only based on (qualitative) microscopic observation of cell aggregates. Additionally, it has been previously shown that homotypic Flo11 A-domain interaction is pH dependent since only at low pH these domains hydrophobically interact. It seems that the current experiments were performed at a too high pH.

      The suggestion that the Flo11-dependent trans-interaction that characterizes cell-cell interaction may be preceded by a cis-interaction or homophilic interaction (i.e., an interaction between Flo11p molecules leading to clusters) is still hypothetical. However, recent data from Lipke's group (see Ho et al., mBio. 2019 Oct 8;10(5):e01766-19; Dehullu et al., Nano Lett., 19, 3846, 2019) showed that amyloid-forming sequences are implicated in these two types of interaction. Thus, while cis-interaction of Flo11p definitely requires amyloid motifs, our data are in line with the finding that amyloid-forming sequences contribute to trans-interaction, although they are not essential to it. Altogether, we revised part of the Results section (see lines 312-315 in the revised version) as well as the Discussion, considering both the model and our data (see lines 484 to 514).

      Finally, we have carried out the cell-cell adhesion experiments shown in Figure 6 at pH 5 (as for AFM experiments). This is now clearly stated in Mat & Meth.

    1. Author Response:

      Reviewer #2:

      The SNX-BAR family of sorting nexin proteins is involved in the formation of tubular carriers at endosomes. The best characterized yeast sorting nexins form part of the retromer complex, which binds sorting signals on cargo proteins to direct their recycling. There is some debate as to the role of sorting nexins in mediating cargo recognition vs tubule formation, and it is unclear which (if any) other members of the sorting nexin family bind directly to cargo.

      In this manuscript, the authors investigate the function of the yeast sorting nexin Mvp1. This protein was previously proposed to cooperate with retromer in the formation of recycling tubules, and to recruit the dynamin-like protein Vps1 to promote their scission (Chi et al, JCB 2014). Here, Suzuki et al find that Mvp1 has a cargo-sorting role that is distinct from that of other sorting nexins. They show that Mvp1 (but not retromer) is required for the correct localization of the membrane protein Vps55, and identify a cytosolically-exposed sequence in Vps55 required for its sorting. Using structurally-guided mutagenesis, they find that dimerization and membrane binding is important for Mvp1 function. They use live cell imaging to show that Vps55 is largely sorted into different tubules compared to the retromer cargo protein Vps10, and use fractionation of vesicle fusion-deficient cells to show these cargo are present in different vesicle populations, suggesting that Mvp1 and retromer form different classes of retrograde carriers. By surveying the trafficking of other membrane proteins, they show that in some cases Mvp1 acts redundantly with two other sorting nexin complexes (Snx4 and/or retromer) to recycle cargo at endosomes. Moreover, they find that loss of all three sorting nexin complexes perturbs endosome function, lipid asymmetry, and the endosomal recruitment of the scission factor Vps1. Although Mvp1 was previously implicated in Vps1 recruitment (Chi et al, 2014), Suzuki et al use a GTPase-defective form of Vps1 to provide the first evidence that Mvp1 physically interacts with Vps1 in vivo and in vitro. Taken together, these data suggest that Mvp1, retromer and Snx4 recognize distinct sets of cargo proteins and mediate independent recycling pathways at endosomes, and imply that each sorting nexin recruits Vps1 to complete tubule scission.

      Overall, this manuscript presents a large number of experiments that are technically well executed and makes several novel observations. It should be noted that many experiments largely repeat previous work: this was not always clearly indicated in the manuscript. For the most novel observations, some weaknesses were noted. A key novel finding was that Mvp1 binds to and sorts the cargo protein Vps55 via recognition of a cytosolic motif. The supporting data do not provide the typical burden of proof for such experiments, because: (1) the identified sequence was shown to be necessary but not sufficient, thus the mutation could indirectly affect binding at another site, and (2) Mvp1 failed to coIP with the Vps55 mutant from cell lysates, but this could be an indirect effect of Vps55 missorting to the vacuole while Mvp1 remains at the endosome, and does not prove that Mvp1 binds directly to Vps55 via this motif.

      Thank you for pointing this out. As mentioned above, to address your point, we examined the Mvp1-Vps55 interaction in cells lacking Vam3, which is required for endosome fusion with the vacuole. In this mutant, both WT and recycling mutants localize at the endosome (Fig. Rev. 1C). We confirmed that mutations in the recycling sequence altered the Mvp1-Vps55 interaction even in vam3Δ cells (Figure 3-figure supplement 1C was added to the revised manuscript). To address whether the recycling signal is sufficient for Mvp1-mediated recycling, we tried to generate several chimera constructs, but we did not obtain a construct that was recycled in an Mvp1-dependent manner. Hence, we were not able to address this point.

      A second key finding is that Mvp1 and retromer form distinct classes of tubular carriers at endosomes. While the manuscript does provide data to support this conclusion, I was disappointed that there was no discussion of the work of Chi et al, who showed through careful quantitative analysis that Mvp1 and retromer frequently label the same population of tubules.

      Thank you for pointing this out. In the revised manuscript, we have also discussed the differences with Chi et al. in the text (Page 13, line 408).

      Moreover, the authors claim that mvp1 mutants secrete little CPY, yet the literature indicates these mutants secrete ~65% of newly synthesized CPY (Ekena and Stevens, MCB 1995), suggesting a functional link between Mvp1 and Vps10 recycling. In fact, vps55 mutants themselves have a significant CPY missorting defect (~50% secreted) suggesting that some mvp1 phenotypes could be a secondary consequence of Vps55 mislocalization.

      Thank you for pointing this out. We examined the CPY sorting in the recycling signal mutants. Strikingly, CPY was partially missorted to the extracellular space in vps55Y61A/T63A/F66A/M67A mutants (Fig. Rev. 6). Since Vps10 recycling was not altered in mvp1Δ cells (Figure 5A), we believe that the mislocalization of Vps55 causes the CPY sorting defect in mvp1Δ cells.

      It was not mentioned that Vps55 interacts with the transmembrane protein Vps68: these proteins are interdependent for their stability and loss of Vps68 slows traffic out of the endosome (Schluter et al MBOC 2008). This provides a simple explanation for the observed ubiquitination and degradation of overexpressed Vps55, which presumably saturates available Vps68.

      As suggested by the reviewer, we have revised the manuscript (Page 5, line 158). Also, as mentioned above, we observed that Vps55 missorting was suppressed by overexpression of Vps68 (Figure 3-supplement 1E was added to the revised manuscript), suggesting that Vps68 was saturated in this condition.

      Other experiments in this manuscript were not completely novel, including: the demonstration that Mvp1 tubules bud from endosomes and that Mvp1 is important for Vps1 recruitment to endosomes (Chi et al, JCB 2014); that Vps1 GTPase mutants accumulate Mvp1 at endosomes (Ekena and Stevens, MCB 1995); that Mvp1 plays a role in Vps55 localization (Bean et al, Traffic 2017); and that GFP-SNX8 is present on endosomal tubules when expressed in mammalian cells (van Weering et al, Traffic 2012). While in most cases the experiments presented in this manuscript build on and extend previous work, I would like to see the earlier work fully acknowledged, and any discrepancies appropriately discussed. The fact that many of the experiments presented in this manuscript are not entirely novel detracts from the overall impact of the work. Despite this, key original findings presented in this paper - including the discovery that Mvp1 is required for sorting specific cargo and binds directly to the dynamin-like protein Vps1 - will be of broad interest to the trafficking field.

      Thank you for pointing this out. In the revised manuscript, we have carefully revised the manuscript (Page 5, line 133; Page 8, line 236; Page 13, line 414; Page 12, line 377).

    1. Author Response:

      Reviewer #2 (Public Review):

      1) The authors describe their algorithm as a tool that (i) was validated "across heterogeneous populations around the world"; (ii) has an "accuracy matching or exceeding human accuracy"; (iii) "is easy to use". I take issue with these three statements. First, the authors did not test the performance of their algorithm in clinical populations with sleep disorders, despite the fact that individuals with sleep disorders represent (logically) the vast majority of sleep recordings. Crucially, such a comparison was made in the best (to my knowledge) published automated sleep staging algorithm (Stephansen et al. Nature Communications 2018, doi: 10.1038/s41467-018-07229-3). The omission of this work is very surprising. Quantifying the impact of sleep disorders on a sleep scoring algorithm is critical for its deployment in sleep clinics.

      We apologize for not being clear in describing our training and testing datasets. Both the training set and testing set 1 included a significant number of individuals with sleep disorders: about 30% of the individuals had moderate to severe sleep apnea (AHI >= 15). The validation dataset (DOD, or testing set 2) also includes 55 nights from individuals with obstructive sleep apnea (average AHI = 18.5 ± 16.2). Furthermore, both the training set and testing set 1 included individuals with a medical diagnosis of insomnia, depression, diabetes and hypertension.

      The health status and demographics data of the training and testing sets have now been clarified throughout the manuscript to avoid any such confusion:

      1) Methods: We have added an extensive description of each dataset in the training and testing sets, including data on health and sleep disorders.

      2) Results: We have added a new table to report and compare demographics/health data of the training and testing set, as suggested in a later comment by the reviewer.

      3) Results: Performance results of the testing set 2 are now reported separately for healthy individuals and individuals with sleep disorders.

      Second, the authors wrote that their algorithm is "matching or exceeding" human accuracy but seem to present uncorrected one-to-one comparisons to support their claim. The fact that an algorithm is better than some humans does not mean it exceeds human performance.

      Thanks for noting that. We have now removed all instances of “exceeding human accuracy”.

      Third, although I agree that the tool seems easy to use even for individuals with limited programming skills, it still requires some. I don't think someone who is used to software with graphical interfaces and who has never used (or heard of!) python would describe the tool as easy to use. This poses an important implementation challenge.

      2) An important limitation of this algorithm is that it captures only one part of the visual examination of sleep data. Indeed, especially in clinical settings, the data is not only examined to establish the hypnogram but also to identify markers of common sleep disorders (e.g. sleep apnea, leg movements, etc). Although this algorithm could significantly speed up sleep scoring, it does not allow the detection of these other important markers. Currently, and in link with the previous comment, the algorithm could not replace the visual inspection of the data for clinical diagnoses.

      We have now revised the manuscript to discuss this limitation in the “Limitations and future directions” subsection of the new Discussion:

      “The algorithm is not currently able to identify markers of common sleep disorders (such as sleep apnea, leg movements) and as such may not be suited for clinical purposes. It should be noted however that our software does include several other functions to quantify phasic events during sleep (slow-waves, spindles, REMs, artefacts) as well as sleep fragmentation of the hypnogram. Rather than replacing the crucial expertise of clinicians, YASA may thus provide a helpful starting point to accelerate clinical scoring of polysomnography recordings. Furthermore, future developments of the algorithm should prioritize automated scoring of clinical disorders, in particular apnea-hypopnea events. On the latter, YASA could implement some of the algorithms that have been developed over the last few years to detect apnea-hypopnea events from the ECG or respiratory channels (e.g. Varon et al. 2015; Koley and Dey 2013).”

      3) The data were curated with some recordings or portions of recordings being excluded (see p. 7). While I understand that this curation is important for the training set, I think it should not be applied to the test set. Indeed, it goes contrary to the logic of automating sleep staging. For example, cutting the beginning and end of the recording according to sleep start and end (p. 7) supposes that the start and end of sleep are already known (i.e. it has already been scored).

      This truncation step has now been removed from the pipeline and all the results have been updated accordingly. In addition, we have also removed all other exclusion criteria (e.g. PSG data quality, recording duration, etc) to improve the generalization power of the algorithm, thanks to the suggestions of the reviewer.

      4) Two types of EEG derivations were used (C4-M1 or C4-Fpz). Was the performance impacted by this variable? Is it fair to assume that the choice of features (spectral features or summary statistics of time series data) could explain the absence of differences but that introducing new features (i.e. phase-sensitive features) could increase the influence of the choice of the derivation?

      Thanks for raising this. First, our choice of the EEG reference was determined by the datasets: the CFS, CCSHS, MrOS, CHAT and HomePAP datasets were all referenced to Fpz, while the MESA, SHHS and DOD datasets were referenced to the contralateral mastoid. The montage of each dataset has now been added to the Methods section.

      Second, as rightly pointed out by the reviewer, the features implemented in the algorithm were chosen to be robust to various recording montages. This is now explicitly discussed in the “Features extraction” subsection of the Methods:

      “The features included in the current algorithm were chosen to be robust to different recording montages. As such, we did not include features that are dependent on the phase of the signal, and/or that require specific events detection (e.g. slow-waves, rapid eye movements). However, the time-domain features are dependent upon the amplitude of the signal, and the algorithm may fail if the input data is not expressed in standard units (uV) or has been z-scored prior to applying the automatic sleep staging.”

      5) Given that markers of sleep stages are very different in EOG, EMG and EEG time series, could the authors explain the logic behind applying the same pre-processing and extracting the same features on these three very different types of data? Could this explain why the majority of the features in the top-20 features were EEG features?

      We now provide a more detailed explanation on the inclusion of EOG and EMG features in the “Features extraction” subsection of the Methods:

      “These features were selected based on prior work in features-based classification algorithms for automatic sleep staging (Krakovská and Mezeiová 2011; Lajnef et al. 2015; Sun et al. 2017). For example, it was previously reported that the permutation entropy of the EOG/EMG as well as the EEG spectral powers in the traditional frequency bands are the most important features for accurate sleep staging (Lajnef et al. 2015), thus warranting their inclusion in the current algorithm. Several other features are derived from the authors’ previous works with entropy/fractal dimension metrics (https://github.com/raphaelvallat/antropy).”

      Furthermore, we have added a “Limitations and future directions” section in the Discussion in which we propose future improvements of the algorithm. One of these potential improvements is the development of EOG and EMG features that would provide a higher discrimination of the sleep stages:

      “This suggests that one way to improve performance on this population could be the inclusion of more EEG channels and/or bilateral EOGs. For instance, using the negative product of bilateral EOGs may increase sensitivity to rapid eye movements in REM sleep or slow eye movements in N1 sleep (Stephansen et al. 2018; Agarwal et al. 2005). Interestingly, the Perslev 2021 algorithm does not use an EMG channel, which is consistent with our observation of a negligible benefit on accuracy when adding EMG to the model. This may also indicate that while the current set of features implemented in the algorithm performs well for EEG and EOG channels, it does not fully capture the meaningful dynamic information nested within muscle activity during sleep.”
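      The negative product of bilateral EOGs mentioned above can be illustrated with a minimal sketch. It assumes two equally sampled EOG channels on which conjugate eye movements appear with opposite polarity, so that their negated product is positive during such movements. The function name `negative_eog_product` and the toy values are hypothetical and are not part of the software discussed in this response.

```python
import numpy as np


def negative_eog_product(loc, roc):
    """Return -(LOC * ROC), sample by sample.

    Hypothetical helper for illustration only. Opposite-polarity deflections
    on the two channels yield positive values; same-polarity activity
    (e.g. shared artefacts) yields negative values.
    """
    loc = np.asarray(loc, dtype=float)
    roc = np.asarray(roc, dtype=float)
    if loc.shape != roc.shape:
        raise ValueError("Both EOG channels must have the same shape")
    return -(loc * roc)


# Toy data (arbitrary units): the first two samples have opposite polarity,
# the last sample has the same polarity on both channels.
loc = np.array([1.0, -2.0, 0.5])
roc = np.array([-1.0, 2.0, 0.5])
neg_prod = negative_eog_product(loc, roc)
```

      In such a sketch, the usual summary features (e.g. standard deviation or spectral power per 30-sec epoch) could then be computed on `neg_prod` instead of on a single EOG channel.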

      6) Sleep scoring guidelines incorporate not only what can be observed on a given epoch of data but also what is observed in the previous epoch(s). For example, an epoch can be scored as N2 even if there is no marker of N2 but there was (i) a marker of N2 in a previous epoch, (ii) no reason to change the score since. To reproduce this, the authors employed a symmetrical smoothing approach (a combination of a triangular-weighted rolling average and asymmetrical rolling average). Why did the authors choose to incorporate data from following epochs, which is not implemented in established guidelines? How was the duration of the smoothing window chosen? Indeed, 5 minutes appears rather long and could explain the poor performance of the algorithm for fast changing portions of the data (i.e. N1 or transitions). Importantly, these transitions can be very relevant in clinical settings and to establish a diagnosis.

      This is a great question. We have addressed this in the revised manuscript.

      Temporal smoothing

      We have also conducted a new analysis of the influence of the temporal smoothing on the performance. The results are described in Supplementary File 3a. Briefly, using a cross-validation approach, we have tested a total of 49 combinations of time lengths for the past and centered smoothing windows. Results demonstrated that the best performance is obtained when using a 2 min past rolling average in combination with a 7.5 min centered, triangular-weighted rolling average. Removing the centered rolling average resulted in poorer performance, suggesting that there is an added benefit of incorporating data from both before and after the current epoch. Removing both the past and centered rolling averages resulted in the worst performance (3.6% decrease in F1-macro). Therefore, the new version of the manuscript and algorithm now uses a 2 min past and a 7.5 min centered rolling average. All the results in the manuscript have been updated accordingly. We have now edited the “Smoothing and normalization” subsection of the Methods section as follows:

      “In particular, the features were first duplicated and then smoothed using two different rolling windows: 1) a 7.5 minutes centered, and triangular-weighted rolling average (i.e. 15 epochs centered around the current epoch with the following weights: [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1., 0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125]), and 2) a rolling average of the last 2 minutes prior to the current epoch. The optimal time length of these two rolling windows was found using a parameter search with cross-validation (Supplementary File 3a). [...] The final model includes the 30-sec based features in original units (no smoothing or scaling), as well as the smoothed and normalized version of these raw features.”
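      The two rolling windows described in this passage can be sketched as follows. This is an illustrative sketch, not the published implementation: the function names are hypothetical, the handling of the first and last epochs (renormalizing over the available epochs) is an assumption, and the 2-minute window is assumed to span 4 epochs of 30 sec ending at the current epoch, which the text does not specify. Only the 15 triangular weights are taken from the passage above.

```python
import numpy as np

# Triangular weights quoted in the Methods excerpt (15 epochs x 30 sec = 7.5 min).
TRIANG_WEIGHTS = np.array([0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0,
                           0.875, 0.75, 0.625, 0.5, 0.375, 0.25, 0.125])


def centered_triangular_average(x, weights=TRIANG_WEIGHTS):
    """Weighted average centered on each epoch.

    Assumption: near the edges, only the epochs that exist are used and the
    weights are renormalized accordingly.
    """
    x = np.asarray(x, dtype=float)
    half = len(weights) // 2
    out = np.empty_like(x)
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        w = weights[lo - i + half:hi - i + half]
        out[i] = np.sum(w * x[lo:hi]) / np.sum(w)
    return out


def trailing_average(x, n_epochs=4):
    """Unweighted average over the last n_epochs epochs.

    Assumption: the window ends at (and includes) the current epoch, and
    shorter windows are used at the start of the recording.
    """
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = x[max(0, i - n_epochs + 1):i + 1].mean()
    return out


# Toy feature: a single "impulse" at epoch 10 of a 20-epoch recording.
feature = np.zeros(20)
feature[10] = 1.0
smoothed_centered = centered_triangular_average(feature)
smoothed_past = trailing_average(feature)
```

      The centered window spreads the impulse over both earlier and later epochs, whereas the trailing window only affects the epochs at and after the impulse, which is the distinction raised in the reviewer's question.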

      Reviewer #3 (Public Review):

      This study presents a new sleep scoring tool that is based on a classification algorithm using machine-learning approaches in which a set of features is extracted from the EEG signal. The algorithm was trained and validated on a very large number of nocturnal sleep datasets including participants with various ethnicities, age and health status. Results show that the algorithm offers a high level of sensitivity, specificity and accuracy matching or sometimes even exceeding that of typical interscorer agreement. The conclusions are supported by the data. Importantly, a measure of the algorithm's confidence is provided for each scored epoch in order to guide users during their review of the output. The software is described as easy to use, computationally low-demanding, open source and free. This paper addresses an important need for the field of sleep research. There is indeed a lack of accurate, flexible and open source sleep scoring tools. I would like to commend the authors for their efforts in providing such a tool for the community and for their adherence to the open science framework as the data and codes related to the current manuscript are made available. I predict that this automated tool will be of use for a large number of researchers in the field. However, there are plenty of automated sleep scoring tools already available in the field (most of them are not open source and rather expensive, as noted by the authors). The current work does not provide a clear view on whether the new algorithm presented in this research performs better than algorithms already available in the field. No formal comparisons between algorithms is provided and the matter is not discussed in the paper.

      Thanks so much for pointing this out. We have now added this relevant reference throughout the manuscript. To build on the reviewer’s point, the current algorithm and Stephansen’s algorithm did not use the same public data. The Stephansen 2018 algorithm was trained and validated on “10 different cohorts recorded at 12 sleep centers across 3 continents: SSC, WSC, IS-RC, JCTS, KHC1, AHC, IHC, DHC, FHC and CNC”, none of which are included in the training/testing sets of the current algorithm. Nevertheless, we certainly agree that the manuscript will benefit from a more extensive comparison against existing tools. To this end, we have made several major modifications to the manuscript. First, we have added a dedicated paragraph in the Introduction to review existing sleep staging algorithms:

      “Advances in machine-learning have led efforts to classify sleep with automated systems. Indeed, recent years have seen the emergence of several automatic sleep staging algorithms. While an exhaustive review of the existing sleep staging algorithms is out of the scope of this article, we review below — in chronological order — some of the most significant algorithms of the last five years. For a more in-depth review, we refer the reader to Fiorillo et al. 2019. The Sun et al. 2017 algorithm was trained on 2,000 PSG recordings from a single sleep clinic. The overall Cohen's kappa on the testing set was 0.68 (n=1,000 PSG nights). The “Z3Score” algorithm (Patanaik et al. 2018) was trained and evaluated on ~1,700 PSG recordings from four datasets, with an overall accuracy ranging from 89.8% in healthy adults/adolescents to 72.1% in patients with Parkinson’s disease. The freely available “Stanford-stage” algorithm (Stephansen et al. 2018) was trained and evaluated on 10 clinical cohorts (~3,000 recordings). The overall accuracy was 87% against the consensus scoring of several human experts in an independent testing set. The “SeqSleepNet” algorithm (Phan et al. 2019) was trained and tested using a 20-fold cross-validation on 200 nights (overall accuracy = 87.1%). Finally, the recent U-Sleep algorithm (Perslev et al. 2021) was trained and evaluated on PSG recordings from 15,660 participants of 16 clinical studies. While the overall accuracy was not reported, the mean F1-score against the consensus scoring of five human experts was 0.79 for healthy adults and 0.76 for patients with sleep apnea.”

      Second, and importantly, we now perform an in-depth comparison of YASA’s performance against the Stephansen 2018 algorithm and the Perslev 2021 algorithm using the same data for all three algorithms. Specifically, we have applied the three algorithms to each night of the Dreem Open Datasets (DOD) and compared their performance in dedicated tables in the Results section (Table 2 and Table 3). This procedure is fully described in a new “Comparison against existing algorithms” subsection of the Methods. None of these algorithms included nights from the DOD in their training set, thus ensuring a fair comparison of the three algorithms. Related to point 4 of the Essential Revisions, the performance of the three algorithms is reported separately for healthy individuals (DOD-Healthy, n=25) and patients with sleep apnea (DOD-Obstructive, n=50). To facilitate future validation of our algorithm, we also provide the predicted hypnograms of each night in Supplementary File 1 (healthy) and Supplementary File 2 (patients).

      Overall, the comparison results show that YASA’s accuracy is not significantly different from the Stephansen 2018 algorithm for both healthy adults and patients with obstructive sleep apnea. The accuracy of the Perslev 2021 algorithm is not significantly different from YASA in healthy adults, but is higher in patients with sleep apnea. However, it should be noted that while the YASA algorithm only uses one central EEG, one EOG and one EMG, the Perslev 2021 algorithm uses all available EEGs as well as two EOGs. This suggests that adding more EEG channels and/or using the two EOGs may improve the performance of YASA in patients with sleep apnea. An important counterpoint, however, is that YASA requires far fewer channels to achieve very similar levels of accuracy. This reduces computational and processing demands, improves the speed of analysis (i.e. a few seconds per recording versus ~10 min for the Stephansen 2018 algorithm), and makes the algorithm applicable to more recordings, since many may not have sufficient EEG channels. All these points are now discussed in detail in the new “Limitations and future directions” subsection of the Discussion (see point 3 of the Essential Revisions).
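
      As a minimal sketch of the kind of per-night agreement metrics discussed above (overall accuracy and Cohen's kappa against a consensus hypnogram), the following may help readers who want to reproduce such a comparison on their own predictions. The stage labels and the two short example hypnograms are invented for illustration and are not taken from the DOD; the function names are hypothetical and are not part of YASA.

```python
from collections import Counter


def accuracy(reference, predicted):
    """Fraction of epochs where the predicted stage equals the reference stage."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("hypnograms must be non-empty and of equal length")
    matches = sum(r == p for r, p in zip(reference, predicted))
    return matches / len(reference)


def cohen_kappa(reference, predicted):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(reference)
    p_observed = accuracy(reference, predicted)
    ref_counts = Counter(reference)
    pred_counts = Counter(predicted)
    # Chance agreement from the marginal stage frequencies of both scorings.
    p_expected = sum(ref_counts[s] * pred_counts[s] for s in ref_counts) / n**2
    if p_expected == 1.0:
        return 1.0  # both scorings contain one identical stage only
    return (p_observed - p_expected) / (1.0 - p_expected)


# Invented 8-epoch example (assumption, not real data).
consensus = ["W", "N1", "N2", "N2", "N3", "N3", "R", "R"]
algorithm = ["W", "N2", "N2", "N2", "N3", "N3", "R", "R"]

print("accuracy:", accuracy(consensus, algorithm))
print("kappa:", round(cohen_kappa(consensus, algorithm), 3))
```

      Computing these metrics separately for each night, and then comparing the per-night values between algorithms, mirrors the structure of the comparison described in the response; the statistical test used in the manuscript is not specified in this passage and is therefore left out of the sketch.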

      There are some overstatements in the manuscript. For example, the algorithm was trained and validated on nocturnal sleep data. Sleep characteristics (eg duration and distribution of sleep stages etc.) are different, for example, during diurnal sleep (nap) and the algorithm might not perform as well on nap data. As such, the tool might not be as "universal" as stated in the title. Additionally, as human scores are used as the ground-truth for the validation step, it might be misleading to state that "this tool offers high sleep-staging accuracy matching or exceeding human accuracy". The algorithm exceeded the accuracy of some human scorers and matched the scores of the best scorer.

      We have now removed the word “universal” from the title and replaced “exceeded human accuracy” with “matched human accuracy”. Furthermore, we have now added the fact that the algorithm was trained and validated only on nocturnal data in the Limitations section of the discussion, and as such, noted that there is the possibility that the algorithm may not perform at the same accuracy levels for daytime nap data.

      No reflection on further improvement is offered in the paper. The algorithm performs worse on N1 stage, older individuals and patients presenting sleep disorders (sleep fragmentation) and it is unclear how this could be improved in future research. In the same vein, the current work does not present performance accuracy separately for healthy individuals and patients when it is expected that accuracy would be poorer in the patient group.

      The revised manuscript now includes a dedicated section in the Discussion to propose ideas for improvements.

      First, we have now added a “Limitations and Future Directions” subsection in the Discussion to present ideas for improving the algorithm, with a particular focus on fragmented nights and/or nights from patients with sleep disorders:

      “Despite its numerous advantages, there are limitations to the algorithm that must be considered. These are discussed below, together with ideas for future improvements of the algorithm. First, while the accuracy of YASA against consensus scoring was not significantly different from the Stephansen 2018 and Perslev 2021 algorithms on healthy adults, it was significantly lower than the latter algorithm on patients with obstructive sleep apnea. The Perslev 2021 algorithm used all available EEGs and two (bilateral) EOGs, whereas YASA’s scoring was based on one central EEG, one EOG and one EMG. This suggests that one way to improve performance in this population could be the inclusion of more EEG channels and/or bilateral EOGs. For instance, using the negative product of bilateral EOGs may increase sensitivity to rapid eye movements in REM sleep or slow eye movements in N1 sleep (Stephansen et al. 2018; Agarwal et al. 2005). Interestingly, the Perslev 2021 algorithm does not use an EMG channel, which is consistent with our observation of a negligible benefit on accuracy when adding EMG to the model. This may also indicate that while the current set of features implemented in the algorithm performs well for EEG and EOG channels, it does not fully capture the meaningful dynamic information nested within muscle activity during sleep.”

      Second, we have now conducted a random forest analysis to identify the main contributors to accuracy variability. The analysis is described in detail in the “Moderator Analyses” subsection of the Results as well as in Supplementary File 3b. The revision now states:

      “To better understand how these moderators influence variability in accuracy, we quantified the relative contribution of the moderators using a random forest analysis. Specifically, we included all aforementioned demographic variables in the model, together with medical diagnosis of depression, diabetes, hypertension and insomnia, and features extracted from the ground-truth sleep scoring such as the percentage of each sleep stage, the duration of the recording and the percentage of stage transitions in the hypnograms. The outcome variable of the model was the accuracy score of YASA against ground-truth sleep staging, calculated separately for each night. All the nights in testing set 1 were included, leading to a sample size of 585 unique nights. Results are presented in Supplementary File 3b. The percentage of N1 sleep and the percentage of stage transitions, both markers of sleep fragmentation, were the two top predictors of accuracy, accounting for 40% of the total relative importance. By contrast, the combined contribution of age, sex, race and medical diagnosis of insomnia, hypertension, diabetes and depression accounted for roughly 10% of the total importance.”
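
      The two fragmentation markers named in the quoted passage (percentage of each sleep stage and percentage of stage transitions) can be derived from a hypnogram along the following lines. This is only a sketch: the exact definition used in the manuscript is not given in this passage, so the choice of denominator for the transition percentage (number of consecutive epoch pairs) is an assumption, and the function names and example hypnogram are hypothetical.

```python
from collections import Counter


def stage_percentages(hypnogram):
    """Percentage of epochs spent in each stage present in the hypnogram."""
    if not hypnogram:
        raise ValueError("hypnogram must not be empty")
    counts = Counter(hypnogram)
    return {stage: 100 * count / len(hypnogram) for stage, count in counts.items()}


def transition_percentage(hypnogram):
    """Percentage of consecutive epoch pairs in which the stage changes.

    Assumption: the denominator is the number of consecutive pairs (n - 1).
    """
    if len(hypnogram) < 2:
        return 0.0
    changes = sum(a != b for a, b in zip(hypnogram, hypnogram[1:]))
    return 100 * changes / (len(hypnogram) - 1)


# Invented 10-epoch example (assumption, not real data).
night = ["W", "W", "N1", "N2", "N2", "N1", "N2", "N3", "N3", "R"]

print("N1 %:", stage_percentages(night)["N1"])
print("transition %:", round(transition_percentage(night), 1))
```

      Features of this kind, computed once per night, would then serve as predictors alongside the demographic and diagnostic variables in a model whose outcome is the per-night accuracy.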

      In addition, the performance of the algorithm in the DOD testing dataset is now reported separately for healthy individuals and patients with sleep disorders.

      As requested by the reviewer, we now analyze and report the performance of YASA on the DOD testing set separately for healthy individuals (DOD-healthy) and patients with obstructive sleep apnea (DOD-Obstructive), which can be found in section “Testing set 2”.

      There is a series of methodological choices that are not justified. For example, nights were cropped to 15 minutes before and after sleep to remove irrelevant extra periods of wakefulness or artefacts on both ends of the recording. This represents an issue for the computation of important sleep measures such as sleep efficiency and latency, as the onset/offset of sleep might be missed. It is also unclear how the features were selected, and a description of said features is currently missing. The custom sleep stage weights procedure is unclear. The length of the time window for the smoothing procedure seems arbitrary. Last, it is currently unclear when / how the EEG and EMG data were analyzed.

      As recommended by the reviewers, the 15-min truncation step has now been removed from the pipeline. Furthermore, the Methods section has been improved to provide more details on the features. Finally, the best class-weights and smoothing windows are now found using a cross-validation analysis on the training set. For more details, we refer the reviewer to the “Justification for some methodological choices” section below.

    1. Author Response:

      Reviewer #2 (Public Review):

      The study appears robust and comprehensive, and relevant quality checks for systematic review have been applied. The results are valuable and contribute to the scientific knowledge in this field.

      Interesting findings include:

      -Adult patients with severe disease had on average a somewhat higher upper respiratory tract viral load at 1 day from symptom onset than patients with non-severe disease. After this stratification for severity, respiratory viral loads did not differ significantly for age and sex. Rates of viral clearing were similar. Children and adults with non-severe disease had similar upper respiratory tract viral loads and viral clearance rates.

      -High and persistent lower respiratory tract shedding of SARS-CoV-2 was associated with severe but not non-severe illness. The difference in lower respiratory viral load for severe and non-severe cases was more pronounced than for upper respiratory tract viral loads. In contrast to the upper respiratory tract, viral clearance from the lower respiratory tract was more rapid in non-severe than in severe cases. Again, age and sex did not differ significantly after stratification for severity.

      -The authors then aimed to assess whether the observed difference in shedding in the first days after start of symptoms could be used to predict which people would develop more severe COVID-19. Typically, deterioration into severe disease only happens around 10 days from symptom onset. The authors conclude that upper respiratory tract viral shedding is so heterogeneous that its predictive capacity of disease severity is inaccurate. In contrast, lower respiratory tract shedding does have a predictive accuracy of up to 81% for disease severity.

      Potential impact: Lower respiratory tract viral load could thus potentially be used as an early warning for developing severe COVID-19. However, lower respiratory tract samples are not routinely taken; the standard nasopharyngeal swab is an upper respiratory sample. Some discussion on the practical applicability of this suggestion could enhance the paper's impact.

      We have included additional discussion on the applicability of this:

      “Thus, LRT shedding may predict COVID-19 severity, serving as a prognostic factor. As emerging evidence suggests that timing influences the efficacy of anti-SARS-CoV-2 therapies (O'Brien et al., 2021; D. M. Weinreich et al., 2021), early clinical decision making is crucial. A prognostic indicator guides early risk stratification, identifying high-risk individuals before they deteriorate into severe COVID-19. This facilitates the early administration of the efficacious therapies to these patients and may reduce the incidence of severe and fatal COVID-19 (O'Brien et al., 2021; D. M. Weinreich et al., 2021; David M. Weinreich et al., 2021). Additional studies should further explore the prognostic utility of LRT shedding in clinical settings, including towards improving COVID-19 outcomes.

      LRT shedding can be assessed noninvasively. This study predominantly analyzed expectorated sputum, which can be obtained from a deep cough, as the LRT specimen. Since SARS-CoV-2 detection occurs more frequently in expectorated sputum than in URT specimens, including nasopharyngeal swabs (Fajnzylber et al., 2020; Wang et al., 2020; Wolfel et al., 2020), SARS-CoV-2 quantitation from sputum may more accurately diagnose COVID-19 while simultaneously predicting severity. Noninvasively induced sputum presents a potential alternative for patients without sputum production (Lai et al., 2020), although it was not assessed in this study and its prognostic utility remains to be evaluated. Furthermore, our data suggest that sex and age may not significantly influence prognostic thresholds but that the time course of disease may. Prognostication should account for the dynamics of shedding, and both the rVL and DFSO of a sputum specimen should be considered.”

      (page 12-13, line 327-355).

    1. Author Response:

      Reviewer #2 (Public Review):

      The authors used a nice combination of biochemical methodologies, including NMR and mass spectrometry, to work out the substrate specificity of this activity; they concluded that it mainly transfers alpha1-2Fuc to Galbeta1-3GlcNAc but not Galbeta1-4GlcNAc disaccharides. Interestingly, Galbeta1-3GlcNAc residues are highly abundant on surface glycoconjugates expressed by the procyclic (midgut) stage and also as terminal sugars on N-glycans from flagellar pocket glycoproteins expressed in the blood stages, but their mitochondrial presence has never been found in any organism. It remains to be determined what the natural substrate of this enzyme is and whether fucosylation indeed occurs in the mitochondrion of kinetoplastid organisms.

      I only have few comments and no further experiments are requested.

      1. Did the authors try to determine the exact location of TbFUT1 within the parasite mitochondrion, for instance using TEM immunogold? This could help understanding access to GDP-Fuc (although I believe the enzyme is facing the cytosol) and also the possible location of the natural acceptor molecule(s).

      No, we did not. Cryo-immunoEM performed by Guo et al. suggests that LmjFUT1 localises to the mitochondrial lumen. The assumption would be that TbFUT1 also localises to this mitochondrial compartment, but this indeed needs to be experimentally confirmed. A localisation in the mitochondrial lumen or the inter-membrane space would not suggest easy access to cytosolic GDP-Fuc pools and would instead imply the need for transporters, since T. brucei sugar nucleotide biosynthetic enzymes are predominantly localised in the glycosomes. Consequently, these sugar precursors would need to be transported out of these organelles to the cytosol and then into the ER, Golgi and, as we suggest here, mitochondrion.

      2. In relation to the previous point, given the challenges in trying to localise fucosylated glycans using fucose-specific lectins, I wonder if there is a precedent for detecting terminal beta-Gal residues on the trypanosome mitochondrion using lectins.

      Thank you for the suggestion. This is definitely an experiment worth trying in the search for TbFUT1 endogenous substrates. We concentrated our efforts on the fucose-specific lectins, although unsuccessfully, and did not look at staining of either wild type or TbFUT1-deficient cells with beta-Gal specific lectins. Previously published IFA with the beta-Gal specific lectin RCA120 labelled the flagellar pocket of bloodstream form trypanosomes, where the N-glycans carrying poly-LacNAc repeats are abundant. However, no permeabilisation step was performed and so any organelle labelling would have been missed (Atrih et al., 2005, JBC, 280:865-871)

      3. I found it interesting that N-terminal tagging of TbFUT1 sends the protein to the Golgi apparatus. This seems like a great coincidence for a protein that normally would be predicted to be Golgi-resident, so I wonder if there is any identifiable Golgi targeting sequence within TbFUT1. Also, was there any attempt to localise the protein after deletion of the mitochondrial signal (no tagging)?

      We agree with the reviewer that the localization of the N-terminally tagged TbFUT1 to the Golgi is an odd coincidence. It is worth noting that L. major FUT1 tagged at the N-terminus localizes to the cytosol as described in Guo et al., 2021. The same is observed for a tagged LmjFUT1 lacking the N-terminal mitochondrial targeting sequence. Furthermore, data from the TrypTag project suggest that N-terminal tagging of TbFUT1 in PCF results in cytosolic and nucleoplasm localisation (tryptag.org, Dean et al., 2017, Trends Parasitol, 33:80-82). As briefly mentioned in the results section, the same algorithms that suggested untagged TbFUT1 was likely to localize to the mitochondrion confirm this prediction for the C-terminally MYC3-tagged protein, but not for the N-terminal HA3-tagged versions. However, the predictions for either HA3-TbFUT1 or HA3-TbFUT1-MYC3 do not support the observed Golgi localisation and are in better agreement with the cytosolic staining observed for the comparable LmjFUT1 construct or PCF expression. In the case of mammalian and yeast Golgi-resident glycosyltransferases (GTs), retention of these type II membrane proteins in the Golgi seems to be dependent on a combination of features in the cytosolic tail, TM and stem domains; however, no specific sequence or motif has been identified so far (Tu & Banfield 2010, Cell Mol Life Sci, 67:29-41). There is no predicted signal peptide for TbFUT1 and the prediction of a type II topology is weak (see Reviewer 1, point 3), and thus it does not fit with what is known about retention in the Golgi.

      We did not try to localize TbFUT1 lacking the putative mitochondrial targeting sequence, but the experiment is very much worth performing to better understand this fucosyltransferase's behaviour (see also Reviewer 3, point 1).

    1. Author Response:

      Reviewer #1 (Public Review):

      The manuscript provides very high quality single-cell physiology combined with population physiology to reveal distinct roles for two anatomically different LN populations in the cockroach antennal lobe. The conclusion that non-spiking LNs with graded responses show glomerular-restricted responses to odorants and spiking LNs show similar responses across glomeruli is generally supported with strong and clean data, although the possibility of selective interglomerular inhibition has not been ruled out. On balance, the single-cell biophysics and physiology provide foundational information useful for a well-grounded mechanistic understanding of how information is processed in insect antennal lobes, and how each LN class contributes to odor perception and behavior.

      Thank you for this positive feedback.

      Reviewer #2 (Public Review):

      The manuscript "Task-specific roles of local interneurons for inter- and intraglomerular signaling in the insect antennal lobe" evaluates the spatial distribution of calcium signals evoked by odors in two major classes of olfactory local neurons (LNs) in the cockroach P. americana, which are defined by their physiological and morphological properties. Spiking type I LNs have a patchy innervation pattern of a subset of glomeruli, whereas non-spiking type II LNs innervate almost all glomeruli (Type II). The authors' overall conclusion is that odors evoke calcium signals globally and relatively uniformly across glomeruli in type I spiking LNs, and LN neurites in each glomerulus are broadly tuned to odor. In contrast, the authors conclude that they observe odor-specific patterns of calcium signals in type II nonspiking LNs, and LN neurites in different glomeruli display distinct local odor tuning. Blockade of action potentials in type I LNs eliminates global calcium signaling and decorrelates glomerular tuning curves, converting their response profile to be more similar to that of type II LNs. From these conclusions, the authors infer a primary role of type I LNs in interglomerular signaling and type II LNs in intraglomerular signaling.

      The question investigated by this study - to understand the computational significance of different types of LNs in olfactory circuits - is an important and significant problem. The design of the study is straightforward, but methodological and conceptual gaps raise some concerns about the authors' interpretation of their results. These can be broadly grouped into three main areas.

      1) The comparison of the spatial (glomerular) pattern of odor-evoked calcium signals in type I versus type II LNs may not necessarily be a true apples-to-apples comparison. Odor-evoked calcium signals are an order of magnitude larger in type I versus type II cells, which will lead to a higher apparent correlation in type I cells. In type IIb cells, and type I cells with sodium channel blockade, odor-evoked calcium signals are much smaller, and the method of quantification of odor tuning (normalized area under the curve) is noisy. Compare, for instance, ROI 4 & 15 (Figure 4) or ROI 16 & 23 (Figure 5) which are pairs of ROIs that their quantification concludes have dramatically different odor tuning, but which visual inspection shows to be less convincing. The fact that glomerular tuning looks more correlated in type IIa cells, which have larger, more reliable responses compared to type IIb cells, also supports this concern.

      We agree with the reviewer that "the comparison of the spatial (glomerular) pattern of odor-evoked calcium signals is not necessarily a true apples-to-apples comparison". Type I and type II LNs are different neuron types. Given their different physiology and morphology, this is not even close to a "true apples-to-apples comparison" - and a key point of the manuscript is to show just that.

      As we have emphasized in response to Essential Revision 1, the differences in Ca2+ signals are not an experimental shortcoming but a physiologically relevant finding per se. These data, especially when combined with the electrophysiological data, contribute to a better understanding of these neurons’ physiological and computational properties.

      It is physiologically determined that the Ca2+ signals during odorant stimulation in the type II LNs are smaller than in type I LNs. And yes, the signals are small because small postsynaptic Ca2+ currents predominantly cause the signals. Regardless of the imaging method, this naturally reduces the signal-to-noise ratio, making it more challenging to detect signals. To address this issue, we used a well-defined and reproducible method for analyzing these signals. In this context, we do not agree with the very general criticism of the method. The reviewer questions whether the signals are odorant-induced or just noise (see also minor point 12). If we had recorded only noise, we would expect all tuning curves (for each odorant and glomerulus) to be the same. In this context, we disagree with the reviewer's statement that the tuning curves do not represent the Ca2+ signals in Figure 4 (ROI 4 and 15) and Figure 5 (ROI 16 and 23). This debate reflects precisely the kind of 'visual inspection bias' that our clearly defined analysis aims to avoid. On close inspection, the differences in Ca2+ signals can indeed be seen. Figure II (of this letter) shows the signals from the glomeruli in question at higher magnification. The sections of the recordings that were used for the tuning curves are marked in red.

      Figure II: Ca2+ signals of selected glomeruli that were questioned by the reviewer.

      2) An additional methodological issue that compounds the first concern is that calcium signals are imaged with wide-field imaging, and signals from each ROI likely reflect out of plane signals. Out of plane artifacts will be larger for larger calcium signals, which may also make it impossible to resolve any glomerular-specific signals in the type I LNs.

      Thank you for allowing us to clarify this point. The reviewer comment implies that the different amplitudes of the Ca2+ signals indicate some technical-methodological deficiency (poorly chosen odor concentration). But in fact, this is a key finding of this study that is physiologically relevant and crucial for understanding the function of the neurons studied. These very differences in the Ca2+ signals are evidence of the different roles these neurons play in AL. The different signal amplitudes directly show the distinct physiology and Ca2+ sources that dominate the Ca2+ signals in type I and type II LNs. Accordingly, it is impractical to equalize the magnitude of Ca2+ signals under physiological conditions by adjusting the concentration of odor stimuli.

      In the following, we address these issues in more detail: 1) Imaging Method 2) Odorant stimulation 3) Cell type-specific Ca2+ signals

      1) Imaging Method:

      Of course, we agree with the reviewer's comment that out-of-focus and out-of-glomerulus fluorescence can potentially affect measurements, especially in widefield optical imaging in thick tissue. This issue was carefully addressed in initial experiments. In type I LNs, which innervate a subset of glomeruli, we detected fluorescence signals that matched the spike pattern of the electrophysiological recordings 1:1 only in the innervated glomeruli. In the non-innervated ROIs (glomeruli), we detected no or comparatively very little fluorescence, even in glomeruli directly adjacent to innervated glomeruli.

      To illustrate this, Figure I (of this response letter) shows measurements from an AL in which a uniglomerular projection neuron was investigated in a set of experiments that were not directly related to the current study. In this experiment, a train of action potentials was induced by a depolarizing current. The traces show the action potential-induced fluorescence signals from the innervated glomerulus (glomerulus #1) and the directly adjacent glomeruli.

      These results do not entirely exclude that the large Ca2+ signals from the innervated LN glomeruli may include out-of-focus and out-of-glomerulus fluorescence, but they do show that the bulk of the signal is generated from the recorded neuron in the respective glomeruli.

      Figure I: Simultaneous electrophysiological and optophysiological recordings of a uniglomerular projection neuron using the ratiometric Ca2+ indicator fura-2. The projection neuron has its arborization in glomerulus 1. The train of action potentials was induced with a depolarizing current pulse (grey bar).

      2) Odorant Stimulation: It is important to note that the odorant concentration cannot be varied freely. For these experiments, the odorant concentrations have to be within a 'physiologically meaningful' range, which means: On the one hand, they have to be high enough to induce a clear response in the projection neurons (the antennal lobe output). On the other hand, however, the concentration was not allowed to be so high that the ORNs were stimulated nonspecifically. These criteria were met with the used concentrations since they induced clear and odorant-specific activity in projection neurons.

      3) Cell type-specific Ca2+ signals:

      The differences in Ca2+ signals are described and discussed in some detail throughout the text (e.g., page 6, lines 119-136; page 9, lines 193-198; page 10-11, lines 226-235; page 14-15, line 309-333). Briefly: In spiking type I LNs, the observed large Ca2+ signals are mediated mainly by voltage-dependent Ca2+ channels activated by the strong depolarization of the Na+-driven action potential. These large Ca2+ signals mask smaller signals that originate, for example, from excitatory synaptic input (i.e., evoked by ligand-activated Ca2+ conductances). Preventing the firing of action potentials can unmask the ligand-activated signals, as shown in Figure 4 (see also minor comments 8. and 10.). In nonspiking type II LNs, the action potential-generated Ca2+ signals are absent; accordingly, the Ca2+ signals are much smaller. In our model, the comparatively small Ca2+ signals in type II LNs are mediated mainly by (synaptic) ligand-gated Ca2+ conductances, possibly with contributions from voltage-gated Ca2+ channels activated by the comparatively small depolarization (compared with type I LNs).

      Accordingly, our main conclusion, that spiking LNs play a primary role in interglomerular signaling, while nonspiking LNs play an essential role in intraglomerular signaling, can be DIRECTLY inferred from the differences in odorant-induced Ca2+ signals alone.

      a) Type I LN: The large, simultaneous, and uniform Ca2+ signals in the innervated glomeruli of an individual type I LN clearly show that they are triggered in each glomerulus by the propagated action potentials, which conclusively shows lateral interglomerular signal propagation.

      b) Type II LNs: In the type II LNs, we observed relatively small Ca2+ signals in single glomeruli or a small fraction of glomeruli of a given neuron. Importantly, the time course and amplitude of the Ca2+ signals varied between different glomeruli and different odors. Considering that type II LNs can, in principle, generate large voltage-activated Ca2+ currents (larger than type I LNs; page 4, lines 82-86, Husch et al. 2009a,b; Fusca and Kloppenburg 2021), these data suggest that in type II LNs electrical or Ca2+ signals spread only within the same glomerulus, and laterally only to glomeruli that are electrotonically close to the odorant-stimulated glomerulus.

      Taken together, this means that our conclusions regarding inter- and intraglomerular signaling can be derived from the simultaneously recorded amplitudes and the dynamics of the membrane potential and Ca2+ signals alone. This also means that although the correlation analyses support this conclusion nicely, the actual conclusion does not ultimately depend on the correlation analysis. We tried to express this with the wording, “Quantitatively, this is reflected in the glomerulus-specific odorant responses and the diverse correlation coefficients across…” (page 10, lines 216-217) and “ …This is also reflected in the highly correlated tuning curves in type I LNs and low correlations between tuning curves in type II LNs” (page 13, lines 293-295).

      3) Apart from the above methodological concerns, the authors' interpretation of these data as supporting inter- versus intra-glomerular signaling are not well supported. The odors used in the study are general odors that presumably excite feedforward input to many glomeruli. Since the glomerular source of excitation is not determined, it's not possible to assign the signals in type II LNs as arising locally - selective interglomerular signal propagation is entirely possible. Likewise, the study design does not allow the authors to rule out the possibility that significant intraglomerular inhibition may be mediated by type I LNs.

      The reviewer addresses an important point. However, from the comment, we get the impression that he/she has not taken into account the entire data set and the DISCUSSION. In fact, this topic has already been discussed in some detail in the original version (page 12, lines 268-271; page 15-16; lines 358-374). This section even has a respective heading: "Inter- and intraglomerular signaling via nonspiking type II LNs" (page 15, line 338). We apologize if our explanations regarding this point were unclear, but we also feel that the reviewer is arguing against statements that we did not make in this way.

      a) In 11 out of 18 type II LNs we found 'relatively uncorrelated' (r=0.43±0.16, N=11) glomerular tuning curves. These experiments argue strongly for a 'local excitation' with restricted signal propagation and do not provide support for interglomerular signal propagation. Thus, these results support our interpretation of intraglomerular signaling in this set of neurons.

      b) In 7 out of 18 experiments, we observed 'higher correlated' glomerular tuning curves (r=0.78±0.07, N=7). We agree with the reviewer that this could be caused by various mechanisms, including simultaneous input to several glomeruli or by interglomerular signaling. Both possibilities were mentioned and discussed in the original version of the manuscript (page 12, lines 268-271; page 15-16; lines 358-374). In the Discussion, we considered the latter possibility in particular (but not exclusively) for the type IIa1 neurons that generate spikelets. Their comparatively stronger active membrane properties may be particularly suitable for selective signal transduction between glomeruli.

      c) We have not ruled out that local signaling exists in type I LNs, in addition to interglomerular signaling. The highly localized Ca2+ signals in type I LNs, which we observed when Na+-driven action potential generation was prevented, may support this interpretation. However, we would like to reiterate that the simultaneous electrophysiological and optophysiological recordings, which show highly correlated glomerular Ca2+ dynamics that match 1:1 with the simultaneously recorded action potential pattern, clearly suggest interglomerular signaling. We also want to emphasize that this interpretation is in agreement with previous models derived from electrophysiological studies (Assisi et al., 2011; Fujiwara et al., 2014; Hong and Wilson, 2015; Nagel and Wilson, 2016; Olsen and Wilson, 2008; Sachse and Galizia, 2002; Wilson, 2013).

      In light of the reviewer's comment(s), we have modified the text to clarify these points (page 14, lines 317-319).
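
      To make the comparison of 'relatively uncorrelated' and 'higher correlated' glomerular tuning curves concrete, the following minimal sketch computes the mean pairwise Pearson correlation between the tuning curves of several glomeruli of one neuron. It is an illustration only and not the analysis code used in the study: the function names, the number of glomeruli, and all response values are hypothetical, and we assume that a tuning curve is a list of response amplitudes, one per odor (9 odors here).

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(tuning_curves):
    """Average Pearson r over all pairs of glomerular tuning curves."""
    pairs = list(combinations(tuning_curves, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical responses of 3 glomeruli to 9 odors (arbitrary units).
base = [0.9, 0.1, 0.5, 0.7, 0.2, 0.8, 0.3, 0.6, 0.4]

# Same odor tuning in every glomerulus, differing only in gain and offset.
uniform_neuron = [base, [2.0 * v for v in base], [0.5 * v + 0.1 for v in base]]

# A different odor tuning in each glomerulus.
local_neuron = [
    [0.9, 0.1, 0.5, 0.7, 0.2, 0.8, 0.3, 0.6, 0.4],
    [0.2, 0.8, 0.4, 0.1, 0.9, 0.3, 0.7, 0.5, 0.6],
    [0.5, 0.4, 0.9, 0.2, 0.6, 0.1, 0.8, 0.3, 0.7],
]

r_uniform = mean_pairwise_correlation(uniform_neuron)
r_local = mean_pairwise_correlation(local_neuron)
print(f"uniform tuning: r = {r_uniform:.2f}")
print(f"glomerulus-specific tuning: r = {r_local:.2f}")
```

      In this toy example, tuning curves that differ only in gain and offset yield a correlation close to 1, whereas glomerulus-specific tuning yields a lower value. As noted above, a high correlation alone cannot distinguish simultaneous input to several glomeruli from interglomerular signal propagation.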

      Reviewer #3 (Public Review):

      To elucidate the role of the two types of LNs, the authors combined whole-cell patch clamp recordings with calcium imaging via single cell dye injection. This method makes it possible to monitor calcium dynamics of the different axons and branches of single LNs in identified glomeruli of the antennal lobe, while the membrane potential can be recorded at the same time. The authors recorded in total from 23 spiking (type I LN) and 18 non-spiking (type II LN) neurons to a set of 9 odors and analyzed the firing pattern as well as calcium signals during odor stimulation for individual glomeruli. The recordings reveal on one side that odor-evoked calcium responses of type I LNs are odor-specific, but homogeneous across glomeruli and therefore highly correlated regarding the tuning curves. In contrast, odor-evoked responses of type II LNs show less correlated tuning patterns and rather specific odor-evoked calcium signals for each glomerulus. Moreover, the authors demonstrate that both LN types exhibit distinct glomerular branching patterns, with type I innervating many, but not all glomeruli, while type II LNs branch in all glomeruli.

      From these results and further experiments using pharmacological manipulation, the authors conclude that type I LNs rather play a role regarding interglomerular inhibition in form of lateral inhibition between different glomeruli, while type II LNs are involved in intraglomerular signaling by developing microcircuits in individual glomeruli.

      In my opinion the methodological approach is quite challenging and all subsequent analyses have been carried out thoroughly. The obtained data are highly relevant, but provide rather an indirect proof regarding the distinct roles of the two LN types investigated. Nevertheless, the conclusions are convincing and the study generally represents a valuable and important contribution to our understanding of the neuronal mechanisms underlying odor processing in the insect antennal lobe. I think the authors should emphasize their take-home messages and resulting conclusions even stronger. They do a good job in explaining their results in their discussion, but need to improve and highlight the outcome and meaning of their individual experiments in their results section.

      Thank you for this positive feedback.

      References:

      Assisi, C., Stopfer, M., Bazhenov, M., 2011. Using the structure of inhibitory networks to unravel mechanisms of spatiotemporal patterning. Neuron 69, 373–386. https://doi.org/10.1016/j.neuron.2010.12.019

      Das, S., Trona, F., Khallaf, M.A., Schuh, E., Knaden, M., Hansson, B.S., Sachse, S., 2017. Electrical synapses mediate synergism between pheromone and food odors in Drosophila melanogaster. Proc Natl Acad Sci U S A 114, E9962–E9971. https://doi.org/10.1073/pnas.1712706114

      Fujiwara, T., Kazawa, T., Haupt, S.S., Kanzaki, R., 2014. Postsynaptic odorant concentration dependent inhibition controls temporal properties of spike responses of projection neurons in the moth antennal lobe. PLOS ONE 9, e89132. https://doi.org/10.1371/journal.pone.0089132

      Fusca, D., Husch, A., Baumann, A., Kloppenburg, P., 2013. Choline acetyltransferase-like immunoreactivity in a physiologically distinct subtype of olfactory nonspiking local interneurons in the cockroach (Periplaneta americana). J Comp Neurol 521, 3556–3569. https://doi.org/10.1002/cne.23371

      Fuscà, D., Kloppenburg, P., 2021. Odor processing in the cockroach antennal lobe-the network components. Cell Tissue Res.

      Hong, E.J., Wilson, R.I., 2015. Simultaneous encoding of odors by channels with diverse sensitivity to inhibition. Neuron 85, 573–589. https://doi.org/10.1016/j.neuron.2014.12.040

      Husch, A., Paehler, M., Fusca, D., Paeger, L., Kloppenburg, P., 2009a. Calcium current diversity in physiologically different local interneuron types of the antennal lobe. J Neurosci 29, 716–726. https://doi.org/10.1523/JNEUROSCI.3677-08.2009

      Husch, A., Paehler, M., Fusca, D., Paeger, L., Kloppenburg, P., 2009b. Distinct electrophysiological properties in subtypes of nonspiking olfactory local interneurons correlate with their cell type-specific Ca2+ current profiles. J Neurophysiol 102, 2834–2845. https://doi.org/10.1152/jn.00627.2009

      Nagel, K.I., Wilson, R.I., 2016. Mechanisms Underlying Population Response Dynamics in Inhibitory Interneurons of the Drosophila Antennal Lobe. J Neurosci 36, 4325–4338. https://doi.org/10.1523/JNEUROSCI.3887-15.2016

      Neupert, S., Fusca, D., Kloppenburg, P., Predel, R., 2018. Analysis of single neurons by perforated patch clamp recordings and MALDI-TOF mass spectrometry. ACS Chem Neurosci 9, 2089–2096.

      Olsen, S.R., Bhandawat, V., Wilson, R.I., 2007. Excitatory interactions between olfactory processing channels in the Drosophila antennal lobe. Neuron 54, 89–103. https://doi.org/10.1016/j.neuron.2007.03.010

      Olsen, S.R., Wilson, R.I., 2008. Lateral presynaptic inhibition mediates gain control in an olfactory circuit. Nature 452, 956–960. https://doi.org/10.1038/nature06864

      Sachse, S., Galizia, C., 2002. Role of inhibition for temporal and spatial odor representation in olfactory output neurons: a calcium imaging study. J Neurophysiol 87, 1106–1117.

      Shang, Y., Claridge-Chang, A., Sjulson, L., Pypaert, M., Miesenbock, G., 2007. Excitatory Local Circuits and Their Implications for Olfactory Processing in the Fly Antennal Lobe. Cell 128, 601–612.

      Wilson, R.I., 2013. Early olfactory processing in Drosophila: mechanisms and principles. Annu Rev Neurosci 36, 217–241. https://doi.org/10.1146/annurev-neuro-062111-150533

      Yaksi, E., Wilson, R.I., 2010. Electrical coupling between olfactory glomeruli. Neuron 67, 1034–1047. https://doi.org/10.1016/j.neuron.2010.08.041

    1. Author Response:

      Reviewer #1:

      This paper sought to dissect the relative impact of history, selection, and chance, on the evolution of antibiotic resistance in the clinically relevant species Acinetobacter baumannii. The authors conducted adaptive evolutions of A. baumannii isolates that had been previously adapted to diverse environments, thus establishing distinct histories. The authors show that the impact of history becomes increasingly diminished as selection strength increases, and several specific observations were made about resistance to beta lactams and their collateral effects of ciprofloxacin resistance. Overall the question being asked is important and the observations made are quite interesting. However, the analysis lacks sufficient depth to draw specific conclusions, and many confounding effects (such as the lack of propagation in a drug-free environment) are not taken into account.

      Thanks for the comments. We have included the assumptions and limitations of our study, toned down some conclusions, and clarified that we propagated a control in a drug-free environment, which was not clear in the previous version of the manuscript.

      Minor comments:

      The authors seem to cite themselves an inappropriate amount of times for key findings, and many highly established evolutionary studies on this very topic were not included. For example in line 79 - mutation rate is a well documented parameter that has been estimated long before their work in 2019. Likewise, there have been a large number of studies that leverage population data that were not included.

      Thanks for the comment. We have carefully reviewed the citations and added or removed references to better reflect the state of the art in the field. To clarify, the citation in line 79 is fully justified. We agree that mutation rate is a well-documented parameter; in the cited paper, we used previous literature to analyze the probability that each base was mutated in an 80-generation experimental evolution propagating Acinetobacter baumannii with population sizes greater than 1 × 10^7, which is exactly the experimental setup of the current manuscript. Nevertheless, we agree with the reviewer, and we have added more citations and reduced the number of self-citations.

      Reviewer #2:

      The experimental design in this manuscript is exquisite. It is simple in rationale yet also very clever, and the work is performed to an excellent standard. The authors clearly address the extent to which history, chance and selection lead to the evolution of AMR, and it is all the stronger that this is done in a real MDR clinical pathogen (A. baumannii) rather than lab E. coli.

      The work shows that history can influence AMR evolution, but that clearly natural selection is a dominant driver. This provides clear unambiguous data on the importance of antibiotic exposure on the evolution of AMR and will interest evolutionary biologists, microbiologists and clinicians.

      We are proud of this summary, and we would like to acknowledge Travisano, Lenski and coworkers for the elegant, simple and clever experimental design described in 1995, which was foundational for our study 25 years later.

      Reviewer #3:

      The manuscript by Santos-Lopez and colleagues investigates the roles of history, chance, and selection on the evolution of antibiotic resistance in the pathogen A. baumannii. In previous work, they showed that the genotypic and phenotypic evolution of (fluoroquinolone) resistance differed between well-mixed and spatially extended (biofilm) environments; this work uses laboratory evolution experiments to investigate further evolution in response to new (beta lactam) drugs. Their experimental design is based on a simple but elegant assay for distinguishing the impact of previous adaptation ("history"), random deviations across replicate populations ("chance"), and selective pressure from the newly applied drug ("selection"). They found that while prior history of selection (including prior growth environment) often impacts evolution of resistance to a new drug, increasing concentrations of that drug generally reduced historical contingencies; that is, the prior selecting conditions became less influential on the new adaptation trajectories (quantified by MIC-based direct and collateral resistances). They also performed extensive population sequencing of the evolved populations and similarly quantified the effects of history, chance, and selection using aggregate measures of genome similarity based on Manhattan distance metrics. Notably, they found that strains originally selected in structured environments exhibited genetic reversion and a corresponding loss of resistance to the initial drug.

      Overall, this study addresses an interesting and important problem. It is well designed, with careful attention to both the phenotypic and genotypic analysis of evolved strains, and the results contribute new insight into the trade-offs associated with antibiotic resistance in an ESKAPE pathogen. I enjoyed reading this work. My comments below are suggestions to improve the paper and can be addressed by additional clarification and/or discussion of the limitations of the approach.

      Thanks for this comment, which, in our opinion, is a perfect summary of the two manuscripts that compose this research.

      Minor:

      • Define / cite ESKAPE pathogens for readers not familiar

      We have included the definition (Lines 104-106).

      • Why choose the Manhattan metric? It is not unreasonable, but I am wondering 1) if there is a deeper theoretical justification and 2) whether other metrics could be expected to give similar qualitative results.

      In previous experiments (Turner et al. 2018) we have used Bray-Curtis similarity as a metric for genetic difference between populations. However, Bray-Curtis and other related metrics calculate an average similarity across the genes with mutations. For assessing the roles of history, chance and adaptation, we needed an additive metric where the difference between populations strictly increases as more mutations occur. Of the commonly used distance metrics, Manhattan distance was a logical choice over Euclidean distance because each mutation independently adds to the genetic distance between populations. Hamming distance would consider only the presence or absence of mutations, not their frequency.
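
      The reasoning above can be illustrated with a minimal sketch. It is not the analysis code used in the study: the gene names, mutation frequencies, and function names are hypothetical, and we assume that each population is represented as a mapping from mutated gene to mutation frequency (genes without mutations are absent and treated as frequency 0).

```python
from math import sqrt


def _genes(p, q):
    """All genes mutated in either population."""
    return set(p) | set(q)


def manhattan(p, q):
    """Sum of absolute frequency differences; each mutation adds independently."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in _genes(p, q))


def euclidean(p, q):
    """Square root of summed squared differences; contributions are not additive."""
    return sqrt(sum((p.get(g, 0.0) - q.get(g, 0.0)) ** 2 for g in _genes(p, q)))


def hamming(p, q):
    """Number of genes mutated in one population but not the other (frequency ignored)."""
    return sum((p.get(g, 0.0) > 0) != (q.get(g, 0.0) > 0) for g in _genes(p, q))


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity: Manhattan distance normalized by total abundance."""
    total = sum(p.get(g, 0.0) + q.get(g, 0.0) for g in _genes(p, q))
    return manhattan(p, q) / total if total else 0.0


# Hypothetical populations (gene -> mutation frequency).
ancestor = {}
one_mutation = {"gene_1": 1.0}
two_mutations = {"gene_1": 1.0, "gene_2": 0.5}
low_frequency = {"gene_1": 0.1}

for name, pop in [("one mutation", one_mutation), ("two mutations", two_mutations)]:
    print(
        name,
        "| Manhattan:", manhattan(ancestor, pop),
        "| Euclidean:", round(euclidean(ancestor, pop), 3),
        "| Bray-Curtis:", bray_curtis(ancestor, pop),
    )

# Same gene mutated at different frequencies in the two populations.
print("Hamming:", hamming(one_mutation, low_frequency))
print("Manhattan:", round(manhattan(one_mutation, low_frequency), 3))
```

      In this toy example, acquiring a second mutation at frequency 0.5 increases the Manhattan distance from the ancestor by exactly 0.5, whereas the Euclidean distance increases by less and the normalized Bray-Curtis dissimilarity does not change. Hamming distance treats a mutation at frequency 0.1 and one at frequency 1.0 as identical.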

    1. Author Response:

      Reviewer #1 (Public Review):

      The paper is a tour-de-force across multiple techniques and model systems from classical forward screening in C. elegans over ChIP to targeted CRISPR mutagenesis. The data is of a very high quality and supports most of the authors' claims strongly and convincingly. Finally, the manuscript is well written and, in spite of complex experiments and genetics, interesting and easy to comprehend.

      • CAMTA, as the name CaM-binding transcription activator implies, have been studied previously and across many different organisms including plants, mice and humans. It was thus presumed and in part shown that CAMTAs regulate transcription depending on CaM levels.
      • The authors confirm that the gene cmd-1 (encoding CaM) is directly regulated by Camt-1 by using a combination of cell-specific RNAseq and ChIP. This allows them to identify three binding sites upstream of the cmd-1 gene that bind to Camt-1.
      • Moreover, the authors show that overexpression of CaM in the nervous system fully rescues the observed behavioral phenotypes.
      • Importantly, the authors make another discovery. They show that CaM can directly repress its own transcription by binding to specific residues of Camt-1. Thereby, the authors argue, Camt-1 is used to precisely and bidirectionally regulate CaM levels dependent on the cell, animal's state etc.

      The reported data are interesting and, in particular, the aspect that CAMTAs likely act as activators AND repressors is a novel aspect previously not appreciated. In spite of all these strengths, a potential weakness is that it remains open whether this mechanism is primarily a house-keeping mechanism or is indeed, as the authors speculate, regulated by internal and external factors that might, through CAMTA, make cells more or less responsive to Ca2+-CaM signaling.

      We are grateful for and encouraged by our reviewer’s comments. We think that our discovery that CAMTAs regulate CaM expression is thought provoking.

      Reviewer #2 (Public Review):

      Vuong-Brender, Flynn, and de Bono report a detailed analysis of the function of a highly conserved calcium-calmodulin-dependent transcriptional regulator in the function of the C. elegans sensory nervous system. The C. elegans homolog of this factor - CAMT-1 - emerged from a genetic screen for mutants defective in a sensory-driven aggregation behavior. The authors find that multiple chemosensory modalities are disrupted by loss of CAMT-1, and this factor has distributed functions in the nervous system, including in interneurons that receive inputs from sensory neurons. A major finding of this study is that many of the effects of CAMT-1 mutation can be linked to a critical role for CAMT-1 in regulating expression of calmodulin itself. This finding is supported by multiple lines of experimentation, including a demonstration that the effects of losing CAMT-1 can be compensated by restoring expression of calmodulin. The authors further show that what is true for CAMT-1 and calmodulin in C. elegans also applies to Drosophila, indicating that CAMT-1 is a regulator of calmodulin expression whose function has been conserved throughout evolution. This manuscript has many strengths. Key hypotheses are tested using quantitative and technically independent experimental methods. The case that CAMT-1 is a regulator of calmodulin expression is built carefully and, for the most part, the logic of the argument is made clearly and supported by compelling data. Another strength of the manuscript is its candid exposition of data that do not fit neatly into the most simple and accessible model. It is refreshing to see authors who freely admit that they haven't neatly wrapped up every question in a field. The loose ends in this study do not impact the authors' main conclusions. 
      However, some observations seem to consume more bandwidth than warranted, and the authors should consider reorganizing the manuscript so that the loose ends do not distract from the main thread of the narrative. The paper does have a few minor weaknesses that could be addressed. These are listed below.

      We thank our reviewer for their thoughtful review.

      Specific comments:

      1. The initial description of the isolation of camt-1 mutants seemed a bit disorganized. A description of the gene and gene product preceded descriptions of the mutants. Also, some mutants were mentioned in the text but not presented in the corresponding figure. The authors should consider minor changes to better communicate how the mutations were cloned.

      We have sought to do this.

      1. In Fig. 2 npr-1 baselines vary a great deal between panels A, B, and C. It is not clear why npr-1 behavior is this variable, and the authors do not mention this obvious feature of their data. Data presented in Fig. 2 indicate that heat-shock-induced expression of camt-1 restores a defect in basal locomotion, but it is unclear whether it restores O2-sensitivity - the effect of oxygen on speed of transgenics seems the same +/- heatshock (compare black traces in panels 2B and 2C).

      We understand the concern of the reviewer. Since the design of these experiments was different from the rest (with only one shift in O2 concentration), we repeated them with 3 O2 changes, bringing them in line with the rest of the manuscript. The results are presented in the new Figure 2. We observed a more consistent baseline speed between different conditions; however, some differences still exist (for example between panels 2A and 2B). One explanation is that for heatshock experiments we keep npr-1 animals at a lower temperature (20 °C, panels 2B and 2C) to minimize basal activity of the heatshock promoter, whereas in the rescue experiment in Figure 2A, and in the rest of the manuscript, animals were kept at 22 °C. Figure 2B-C of our original submission used worms raised at 15 °C for the heatshock experiment, which may explain the greater discrepancy in npr-1 speed values. Heatshock also slightly modifies the response of the npr-1 control animals to O2.

      Regarding whether heat-shock-induced expression of camt-1 restores O2 responses, we found that the npr-1; camt-1; dbExhsp-16p::camt-1 heat-shocked strains aggregated much more than npr-1; camt-1 heat-shocked animals. However, the rescue is not complete. Thus expressing camt-1 using heatshock-induced expression restores some O2 sensitivity which correlates well with the partial rescue of the baseline in Figure 2C. We have noted this in the results.

      1. Unlike other datasets, the responses of wild-type AFDs to CO2 do not look particularly convincing (panel 3C). There is clearly an effect of camt-1 mutation on AFD calcium, but the AFD responses seem qualitatively different from the responses of BAGs to CO2 or URXs to O2. The authors might consider moving these data to a supplementary figure and tempering their description of wild-type AFDs as CO2-sensors.

      The data on AFD has been moved to Figure 3 – figure supplement 1. We should add that we agree that in the absence of an identified CO2 sensor expressed in AFD, we cannot be sure that AFD neurons are primary CO2 sensors. Although the AFD CO2-evoked responses are retained in mutants defective in synaptic transmission, they may very well still be indirectly evoked by other neurons.

      1. The authors candidly present data that do not conform to a simple model for how camt-1 affects behavior. Loss of camt-1 increases calcium in sensory neurons that activate the speed-controlling interneuron RMG. However, RMG calcium is reduced in camt-1 mutants. This inversion in the effect of camt-1 mutation might be caused by a homeostatic mechanism, as the authors propose. It might be possible to test this hypothesis by testing whether reducing excitatory input into RMGs elevates resting calcium in camt-1 mutants, for example via mutations that affect sensory transduction.

      In the interest of simplifying the manuscript, and given other comments, we have now removed the RMG Ca2+ imaging data. However, this is an interesting way of testing what is going on.

      1. In Fig. 4H RMG data are presented as fractional ratio change - all other imaging data are presented as absolute ratios of YFP and CFP fluorescence. It is not clear why these data are treated differently. It is also not clear that these data are consistent with data shown in Fig. 3F. Which dataset represents the effect of camt-1 mutation on RMG calcium? More measurements might be warranted.

      As highlighted above, we have removed the RMG imaging data from the paper.

      1. Nice experiments show that regulation of calmodulin in Drosophila requires a CAMT-1 homolog. The bar graphs showing unity for values normalized to themselves are a bit odd - perhaps there's a more compact way to plot these data.

      We have sought to address this question in two ways. First, we have further buttressed our results by performing in situ immunofluorescence staining of dissected fly retinas with a calmodulin antibody. We see a significant decrease in calmodulin expression in fly CAMTA mutants compared to controls.

      Prompted by this comment, we also realized that we had omitted from the figure legend an explanation of how we normalized the data for the qPCR graphs. This was done using rRNA as a control. The Yamamoto lab had previously used the same control to normalize CAMTA expression in wild type and mutant flies. We have added a note saying this.

      1. ChIPseq analysis of CAMT-1 is also quite nice. Is there a sequence motif for CAMT-1 binding that emerges from this study? If so, how does this motif compare to motifs from studies of CAMT-1 homologs in other species?

      We used the MEME algorithm (part of a suite of motif-based sequence analysis tools; https://meme-suite.org/) to seek enriched sequence motifs in our ChIPSeq data. This identified a series of enriched motifs, although none coincided with the peaks at the cmd-1 promoter. However, we did observe sequences resembling the mouse CAMTA1 binding site at the centre of each of the three CAMT-1 binding peaks upstream of cmd-1. We now say this in the discussion.

      1. Figure 7 shows that CMD-1 inhibits cmd-1 expression via interaction with CAMT-1. These data are interesting, but it is not clear how this effect can be related to prior data showing that forced expression of CMD-1 can compensate for loss of CAMT-1. The authors' behavioral and physiological studies suggest that in vivo CAMT-1 promotes CMD-1 expression. In Figure 7, they suggest that CAMT-1 inhibits expression of CMD-1, but there is no clear link to behavior or physiology for this repressor-function of CAMT-1. The manuscript might be clearer without these data, and the absence of these data would not affect the overall impact of the study.

      We agree that the feedback control of cmd-1 gene expression by CMD-1 interacting with CAMT-1 is a part of the story that has not been fully developed. Given the feedback from our reviewers and Editors to give these findings less prominence, but not remove them entirely, we moved the data into supplementary information. We have also altered the main text and the legend of Figure 7 to explicitly say that further experiments are needed to establish if this feedback is relevant under physiological conditions.

      Reviewer #3 (Public Review):

      Vuong-Brender et al. present a thorough study investigating how CaM-binding transcription activators (CAMTAs) in C. elegans and Drosophila are required for numerous behaviors and proper neuronal function. The study is strong in how it uses a variety of approaches to study a major underlying mechanism for CAMTA. First, they use reporters, mutant analysis, and heat-shock rescue to show how camt-1 is expressed widely in neurons and functions in adults in several behaviors. They used transcriptional profiling to show that camt-1 is required to upregulate CaM in subsets of neurons in the worm. They next use ChIP-seq to zero in on where worm CAMT-1 binds regulatory regions upstream of the CaM gene cmd-1 to promote its expression. They find that overexpression of CaM compensates for behavioral and neuronal response deficits in camt-1 mutants. Lastly, they propose that when CaM is highly expressed, it may downregulate its own expression by binding CAMT-1.

      We thank our reviewer for their critique of our work.

      1. Overall, I feel that the study is excellent and most conclusions are justified by evidence. However, I do not think the title is supported by the data. It currently is listed as: CAMTA TUNES NEURAL EXCITABILITY AND BEHAVIOR BY MODULATING CALMODULIN EXPRESSION. The authors show evidence that camt-1 is required for the normal function of neurons and behavior by promoting expression of CaM. Their only evidence that camt-1 downregulates CaM is a more artificial situation where CaM is overexpressed. I don't think they provide any evidence that camt-1 is used to "tune" behavior or neuron activity up and down in a wild-type strain. Tuning implies that the molecule modulates a physiological system bidirectionally in a natural situation. I suggest using a more accurate title that better fits the experimental evidence.

      We have changed the title to ‘Neuronal Calmodulin levels are controlled by CAMTA transcription factors’. We hope this more neutral title is appropriate to describe our findings.

      1. They show ample evidence that camt-1 appears to promote the expression of cmd-1 in most cases. This includes showing that overexpression of cmd-1 suppresses the behavioral and imaging phenotypes of camt-1. But they didn't perform the more straightforward epistasis test with the camt-1;cmd-1 double mutant in worm or fly, presumably because there is no viable loss-of-function allele in the coding area of the cmd-1 gene. It would help the readers understand why this simpler experiment was not performed if they explain this in the paper. A good place would be near line 220, where they generate hypomorphic promoter alleles using CRISPR. If they have tried to make their own loss-of-function alleles by mutating the coding area of cmd-1, but it resulted in presumed lethality, this might be mentioned here too.

      This is a good point, and one that we had overlooked. cmd-1 loss of function mutations do indeed confer lethality. We have added a sentence to say:

      ‘Straightforward comparison of camt-1 and cmd-1 loss of function phenotypes was not possible, since disrupting cmd-1 confers lethality (7, 8).’

      1. I am most worried about the potential caveats with the calcium imaging experiments. As the authors note, it is challenging to infer absolute levels of calcium using the ratiometric sensor cameleon across different individuals and genotypes. However, the authors do not note that the YFP/CFP FRET signal from cameleon might be perturbed because it uses calmodulin to bind calcium. At the end of their study (line 244), they provide evidence that calmodulin may bind to CAMT-1 to suppress its own expression when calmodulin is highly expressed. This is worrisome because cameleon is probably expressed highly in some or most of these strains. The authors may want to re-examine neuronal activity for a subset of experiments with a method that is independent of a calmodulin-based sensor (if possible).

      We agree that this is a potential concern. As suggested by our referee, we therefore repeated some of our Ca2+ imaging experiments using a genetically-encoded Ca2+ indicator that does not contain CaM. We opted to use TN-XL, an indicator that uses troponin C as the Ca2+ binding moiety, and which has previously been used successfully in C. elegans. We imaged CO2-evoked Ca2+ responses in BAG sensory neurons, in wild type and in camt-1 mutant animals. The data obtained using TN-XL recapitulated what we observed using YC3.60 (BAG).

      1. The title of "Fig 3 - Figure supplement 1" is confusing because it suggests that they measured the levels of YC2.60 cameleon, when in fact they measured a separate GFP reporter, albeit using the same promoter. So they could clarify the figure title.

      The reviewer is right – our heading was confusing. We have changed it, and now say: ‘Expression from the gcy-37 promoter is reduced when CAMT-1 is overexpressed.’

      1. D. Bazopoulou, A. R. Chaudhury, A. Pantazis, N. Chronis, An automated compound screening for anti-aging effects on the function of C. elegans sensory neurons. Sci Rep 7, 9403 (2017).
      2. M. S. Choi et al., Isolation of a calmodulin-binding transcription factor from rice (Oryza sativa L.). J Biol Chem 280, 40820-40831 (2005).
      3. J. Han et al., The fly CAMTA transcription factor potentiates deactivation of rhodopsin, a G protein-coupled light receptor. Cell 127, 847-858 (2006).
      4. N. Bouche, A. Scharlat, W. Snedden, D. Bouchez, H. Fromm, A novel family of calmodulin-binding transcription activators in multicellular organisms. J Biol Chem 277, 21851-21861 (2002).
      5. T. Yang, B. W. Poovaiah, A calmodulin-binding/CGCG box DNA-binding protein family involved in multiple signaling pathways in plants. J Biol Chem 277, 45049-45058 (2002).
      6. E. Kodama-Namba et al., Cross-modulation of homeostatic responses to temperature, oxygen and carbon dioxide in C. elegans. PLoS Genet 9, e1004011 (2013).
      7. V. Au et al., CRISPR/Cas9 Methodology for the Generation of Knockout Deletions in Caenorhabditis elegans. G3 (Bethesda) 9, 135-144 (2019).
      8. A. Karabinos et al., Functional analysis of the single calmodulin gene in the nematode Caenorhabditis elegans by RNA interference and 4-D microscopy. Eur J Cell Biol 82, 557-563 (2003).
    1. Author Response:

      Evaluation Summary:

      This manuscript is of primary interest to readers in the field of infectious diseases, especially those involved in COVID-19 research. The identification of immunological signatures caused by SARS-CoV-2 in HIV-infected individuals is important not only to better predict disease outcomes but also to predict vaccine efficacy and to potentially identify sources of viral variants. Here, the authors leverage a combination of clinical parameters, limited virologic information and extensive flow cytometry data to reach descriptive conclusions.

      We have extensively reworked the paper.

      Reviewer #1 (Public Review):

      The methods appear sound. The introduction of vaccines for COVID-19 and the emergence of variants in South Africa and how they may impact PLWH is well discussed making the findings presented a good reference backdrop for future assessment. Good literature review is also presented. Specific suggestions for improving the manuscript have been identified and conveyed to the authors.

      We thank the Reviewer for the support.

      Reviewer #2 (Public Review):

      Karim, Gazy, Cele, Zungu, Krause et al. described the impact of HIV status on the immune cell dynamics in response to SARS-CoV-2 infection. To do so, during the peak of the KwaZulu-Natal pandemic, in July 2020, they enrolled a robust observational longitudinal cohort of 124 participants, all positive for SARS-CoV-2. Of the participants, a group of 55 people (44%) were HIV-infected individuals. No difference in COVID-19 high-risk comorbidities or clinical manifestations was observed in people living with HIV (PLWH) versus HIV-uninfected individuals, with the exception of joint ache, which was more common in HIV-uninfected individuals. In this study, the authors leverage and combine extensive clinical information, virologic data and immune cell quantification by flow cytometry to show changes in T cells, such as post-SARS-CoV-2 infection expansion of CD8 T cells and reduced expression of CXCR3 on T cells at specific post-SARS-CoV-2 infection time points. The authors also conclude that HIV status attenuates the expansion of antibody-secreting cells. The correlative analyses in this study show that low CXCR3 expression on CD8 and CD4 T cells correlates with COVID-19 disease severity, especially in PLWH. The authors did not observe differences in the SARS-CoV-2 shedding time frame between the two groups, ruling out a role for HIV serostatus in the emergence of SARS-CoV-2 variants. However, the authors clarify that their PLWH group consisted mostly of ART-suppressed participants whose CD4 counts were reasonably high. The study presents the following strengths and limitations.

      We thank the Reviewer for the comments. The cohort now includes participants with low CD4.

      Strengths:

      A. A robust longitudinal observational cohort of 124 study participants, 55 of whom were people living with HIV. This cohort was enrolled in KwaZulu-Natal, South Africa during the peak of the pandemic. The participants were followed for up to 5 follow-up visits and around 50% of the participants have completed the study.

      We thank the Reviewer for the support. The cohort has now been expanded to 236 participants.

      B. A broad characterization of blood circulating cell subsets by flow cytometry able to identify and characterize T cells, B cells and innate cells.

      We thank the Reviewer for the support.

      Weaknesses:

      The study design does not include

      A. a robust group of HIV-infected individuals with low CD4 counts, as also stated by the authors

      This has changed in the resubmission because we included participants from the second, beta variant dominated infection wave. For this infection wave we obtained what we think is an important result, presented in a new Figure 2:

      This figure shows that in infection wave 2 (beta variant), CD4 counts for PLWH dropped to below the CD4=200 level, yet recovered after SARS-CoV-2 clearance. Therefore, the participants we added had low CD4 counts, but this was SARS-CoV-2 dependent.

      B. a group of HIV-uninfected individuals and PLWH with severe COVID-19. As stated in the manuscript the majority of our participants did not progress beyond outcome 4 of the WHO ordinal scale. This is also reflected in the age average of the participants. Limiting the number of participants characterized by severe COVID-19 limits the study to an observational correlative study

      Death has now been added to Table 1 under the “Disease severity” subheading. The number of participants who have died, at 13, is relatively small. We did not limit the study to non-critical cases. Our main measure of severity is supplemental oxygen.

      This is stated in the Results, lines 106-108:

      “Our cohort design did not specifically enroll critical SARS-CoV-2 cases. The requirement for supplemental oxygen, as opposed to death, was therefore our primary measure for disease severity.”

      This is justified in the Discussion, lines 219-225:

      “Our cohort may not be a typical 'hospitalized cohort' as the majority of participants did not require supplemental oxygen. We therefore cannot discern effects of HIV on critical SARS-CoV-2 cases since these numbers are too small in the cohort. However, focusing on lower disease severity enabled us to capture a broader range of outcomes which predominantly ranged from asymptomatic to supplemental oxygen, the latter being our main measure of more severe disease. Understanding this part of the disease spectrum is likely important, since it may indicate underlying changes in the immune response which could potentially affect long-term quality of life and response to vaccines.”

      C. a control group enrolled at the same time of the study of HIV-uninfected and infected individuals.

      This was not possible given constraints imposed on bringing non-SARS-CoV-2 infected participants into a hospital during a pandemic for research purposes. However, given that the study was longitudinal, we did track participants after convalescence. This gave us an approximation of participant baseline in the absence of SARS-CoV-2, for the same participants. Results are presented in Figure 2 above.

      D. results that elucidate the mechanisms and functions of immune cell subsets in the context of COVID-19.

      We do not have functional assays.

      Reviewer #3 (Public Review):

      Karim et al have assembled a large cohort of PLWH with acute COVID-19 and well-matched controls. The main finding is that, despite similar clinical and viral (e.g., shedding) outcomes, the immune response to COVID-19 in PLWH differs from the immune response to COVID-19 in HIV uninfected individuals. More specifically, they find that viral loads are comparable between the groups at the time of diagnosis, and that the time to viral clearance (by PCR) is also similar between the two groups. They find that PLWH have higher proportions and also higher absolute number of CD8 cells in the 2-3 weeks after initial infection.

      The authors do a wonderful job of clinically characterizing the research participants. I was most impressed by the attention to detail with respect to timing of viral diagnosis as it related to symptom onset and specimen collection. I was also impressed by the number of longitudinal samples included in this study.

      We thank the Reviewer for the support.

    1. Author Response:

      Evaluation Summary:

      These are pressing times for nature, which stands alone against the impact of multiple (human-based) ecological stressors. Wildlife trade is one of these stressors. And, although it is an acute one, it is the most easily solvable global ecological problem. The authors dramatically increase our understanding of legal and illegal trade of amphibians, and offer a wider methodology (however, and importantly, not necessarily a more complex one) to gain a deeper understanding of the causes and consequences of the amphibian trade. The work will inspire conservation biologists to take similar approaches to learn about the trade of other taxa.

      We thank the readers for their opinion and completely agree. We hope that our work facilitates better analysis and the development of better strategies to prevent unsustainable trade.

      Reviewer #1 (Public Review):

      By linking several databases, the authors tried to measure the impact of trade on all amphibian species. The origins of the species traded, the volumes and the purpose of trade have been assessed, while noting that several loopholes stand in the way of robust overall assessments, e.g., dynamics in trade and taxonomy. However, while indicating that the gaps and shortcomings stem from the way the currently available databases have been set up, the authors did an enormous job in applying extensive digital methods to achieve the best possible reflection of the current amphibian trade. Through the use of several software programmes to measure and analyse current databases, figures could also be built to visualize, e.g., trends. This is one of the major strengths of this paper. With the various specifically named methodological queries, as well as the use of categorized keywords, which were essential for the inclusion and linking of the various databases, unambiguous results could be generated in the best possible sense, on the basis of which correct and convincing recommendations were made. To make it explicitly clear again, due to the lack of data on the global anthropogenic use of amphibian species, this seemingly complex methodological approach was necessary to shed light on the dark. Conversely, the effort would have been many times smaller had there been more comprehensive and informative databases that illuminate, transparently and with up-to-date information, the population status of species, their threats and the impact of trade. This work is essential for understanding how far the various interest groups are lagging behind in communicating responsibly and transparently about the use of resources, in this case the most threatened group of vertebrates, amphibians; it is difficult to comprehend, as the reader learns here, how easy it is to trade species that have already been classified as threatened.

      Thank you for this. We wholeheartedly agree and greatly appreciate the reviewer's comments.

      If I would have to mention weaknesses of this paper there are none that I would address explicitly. Apart from the comprehensive Suppl. Mat., I would just not overload the actual manuscript with figures and make sure that these are self-explanatory. As already mentioned, the methodological part in particular is very extensive and complex, but this is essential for this type of study.

      Thank you, we have looked again at figure legends to ensure they are self-explanatory, and easy to follow.

      Reviewer #2 (Public Review):

      While it is widely assumed that the trade in wildlife is well documented and data are thorough and widely available, this is not the case. The authors scour online sources (databases, websites, marketplaces), in multiple languages, to assess the true extent of wildlife trade relative to the values reported, and find large discrepancies. Wildlife trade has both direct and indirect effects on wild populations of amphibian species, and therefore having more accurate values is essential for measuring potential effects. They call for change in how data are collected and reported so that those data can properly influence policy and conservation measures.

      Thank you, we are glad this is clear and that the need for such an approach is apparent.

    1. Author Response:

      Reviewer #1:

      The authors demonstrate deficits in perceptual tests related to fine-time perception in non-speech and speech sounds in a group of patients with stroke aphasia compared to a control group without a lesion. A subgroup of patients with deficits in spectrotemporal processing at a fine timescale have lesions mapped to the posterior STS, MTG and adjacent white matter. The area associated with deficits in spectrotemporal analysis with a fine timescale is then used as a seed for probabilistic fibre tractography based on diffusion MR. These results show connectivity of the functionally defined seed region with a number of areas including the cerebellum.

      The work is carefully done and I think interesting in demonstrating the cerebellar connections of the functionally defined region associated with deficits in fine temporal analysis that might be a basis for event representation at this temporal level.

      We appreciate the referee's evaluation and constructive feedback.

      Reviewer #2:

      Based on consideration of supportive evidence in the literature, the authors propose that a cerebellar-temporal lobe functional network plays a key role in auditory temporal processing. The precise parsing of temporal information is critical to understanding dynamic auditory processing and thus is an interesting area of study. Better understanding of how the cerebellum and temporal lobe may interact to achieve such parsing of the dynamic signal in a generative/predictive internal model is of clear interest to a broad readership. This idea is put to the test by first having individuals with lesions in the posterior portion of left temporal lobe perform speech perception and timing tasks and comparing performance with 12 healthy controls to establish the role of this region in tasks reliant on intact fine temporal processing. Typically, a lesion model will be helpful when a dissociation between structure and function can be demonstrated, and preferably this would be a double dissociation. Here, while lesions to auditory regions of the left temporal lobe are associated with impoverished performance on speech and temporal order tasks relative to a healthy control group, performance on comparably difficult auditory tasks that do not require good temporal discrimination is not tested to determine if there is such a dissociation. Given the extensive discussion of hypothesized different time sensitivities of right and left auditory cortices in the Introduction, patients with right homologous lesions might also have served as an interesting control and could have supported a double dissociation. 

      In a second step to their study, a seed region was generated based on comparison of the lesion loci for the half of the patients who performed most poorly on the behavioral tasks to the other half, and this was used to explore anatomical tract connectivity of the seed region to the rest of the brain in the neuroimaging data from the healthy controls, with a focus on connections with the cerebellum. This approach to establishing that "temporo-cerebellar connectivity underlies timing constraints in audition" is unfortunately just not that convincing. The data are interesting, but taken alone they simply do not support such a conclusion. In the data, there is no clear functional link established or even hinted at between the temporal lobe and the cerebellum.

      We appreciate the referee's evaluation and constructive feedback. We address the raised concerns point by point below. We appreciate the concerns regarding our methodological choice and our interpretation of a functional link between the temporal lobe and the cerebellum. It certainly is more reasonable to derive a functional interpretation based on disconnection measured directly in patients’ DTI. However, if unavailable, indirect measures of disconnection can also be used to establish a functional link between a lesioned region and the networks associated with it. The rationale behind this is that it reflects an indirect estimation of the effect of a lesion on structural brain networks. To make this approach clearer, we have revised the manuscript accordingly. See revised manuscript pages 6 and 12:

      [...] Assessing connectivity in healthy participants based on lesion information is a relatively new method that measures structural disconnection in networks associated with given anatomical regions (Foulon et al., 2018). This allows for the indirect estimation of the lesion effect on structural brain networks. In this regard, it was shown that behavioral deficits can be explained similarly by local brain damage and indirectly measured disconnection (Salvalaggio et al., 2020). [...]

      [...] We next used the respective areas as seed regions for probabilistic fiber tractography in a healthy age-matched sample to visualize the underlying common connectivity pattern (see Methods). Thus, we indirectly explored the association between posterior superior temporal disconnection and processing of sound at short timescales. [...]

      We also changed the abstract and conclusion accordingly. See pages 2 and 15 of the revised manuscript.

      [...] Here we tested whether temporo-cerebellar disconnection is associated with the processing of sound at short timescales. [...]

      [...] The evidence we describe (i) shows that lesion-related deficits in spectrotemporal analysis occur in posterior temporal regions connected to the cerebellum [...].

      Reviewer #3:

      Stockert et al. investigate the cortico-cerebellar network underpinning rapid temporal auditory analysis. This study uses a well-defined group of stroke participants with mostly circumscribed lesions to the left posterior superior temporal lobe to motivate probabilistic tractography from cortical regions associated with verbal and non-verbal rapid auditory temporal analysis. Lesion-symptom mapping identifies a specific region of the posterior superior temporal sulcus and underlying white matter as statistically associated with impairment in rapid auditory temporal analysis. Tractography results demonstrate that these regions have high structural connectivity to wider regions of the left hemisphere cortical language network and ipsilateral and contralateral connectivity to postero-lateral cerebellum and dentate nucleus. It is interpreted that this cortico-cerebellar network is crucial to developing representations of fine auditory temporal structure.

      The conclusions of the paper are an interpretation which is based on integrating previous neuropsychology with the current tractography results and based on well-defined models in the motor domain. Such conclusions are not unreasonable but there is no direct (associative) evidence linking this network to the cognitive function of interest.

      Strengths:

      The paper integrates neuropsychology and neuroimaging methodologies to build a coherent picture which is more than the sum of its parts. The stroke group has well-defined and selected lesions which enable testing of the hypotheses put forward by the authors. The behavioural measures are sensitive and suitable to identify impairments in the behaviours of interest. There has been a detailed analysis of the behavioural speech perception data in the stroke group which largely, although perhaps not entirely, conforms to the asymmetric temporal sampling hypothesis. The lesion-symptom mapping approach is suitable for the nature of the population (small group with similar lesion distributions) and has allowed neuropsychologically guided tractography in the neurotypical population. This has clearly illustrated the complexity of the structural connectivity of the posterior superior temporal sulcus and underlying regions.

      Weaknesses:

      The selective nature of the stroke population - relatively small, chronic lesions - has resulted in only mild impairments for a small number of participants (6/12 participants). At the group level there is no difference between the stroke and neurotypical population on speech perception measures - group statistics do not reach one-tailed significance. This reduces the certainty with which the regions identified are associated with the behaviour of interest. However, the results do conform to previous neuropsychology and lesion studies and it is likely that this lack of effect is due to low statistical power.

      Please refer to our response to the next point.

      All the stroke participants have a similar lesion distribution, and this makes lesion-symptom mapping challenging. For example, lesion data do not give an indication of the functional integrity of perilesional regions which can be reduced, even at the chronic stage, therefore the superior temporal sulcus may not be functioning effectively, even in the proportion of the group without lesions to this area. Lesion symptom mapping is more robust with a wider distribution of lesions and the inclusion of participants with lesions remote from the area of interest. Having said that, the behavioural measures appear sensitive enough to identify mild impairments and the authors, for good reason, wished to reduce the extension of lesion into primary auditory regions. As above, given the limited sample and homogeneous lesion, the lesion symptom mapping approach is reasonable.

      We agree that the small number of patients is a possible limitation to the study and add this point to the limitations section. See revised manuscript page 21.

      [...] First, the study population is relatively small and lesion symptom mapping is typically applied to larger populations with wider lesion distribution. Although careful selection of circumscribed lesions has the advantage of highlighting behavioral differences without confounding other deficits (e.g., primary auditory processing), it is possible that additional regions are involved in processing of sound at short timescales. However, tractography based on healthy participants makes it possible to indirectly obtain information (i.e., structural disconnection) about brain regions contributing to the investigated function. In addition, it is likely that the small number of patients might hamper the ability to detect statistically significant differences between the behavior of controls and patients. Nevertheless, we are confident that the current results align with the fact that the posterior superior temporal cortex contributes to the processing of sound at short timescales, as indicated by previous neuropsychological evidence and lesion studies (Boemio et al., 2005; Chedru, Bastard, and Efron, 1978; Efron, 1963; Robson, Grube, Lambon Ralph, Griffiths, & Sage, 2013; Swisher & Hirsh, 1972). Further studies should however test larger populations to replicate and extend this finding. [...]

      The authors suggest that the behavioural results conform to the asymmetric temporal sampling hypothesis in that only place of articulation discrimination impairments in the stroke group can be (just about) detected, whereas there were no significant stroke-neurotypical differences in other phonetic contrasts. It is not clear that the VOT differences associated with plosive voicing changes and the cues associated with place changes happen over fundamentally different time-scales and, therefore, it is important to further justify the interpretation of the data. In the future it will be helpful to have this level of analysis applied to individuals with lesions to the wider speech perception network to draw conclusions about the specificity of the impairment to these regions - for example, impairments in phoneme discrimination have been associated with frontal lobe lesions.

      It appears that voicing contrasts in which shorter and longer voice onset times result in the perception of a voiced or voiceless plosive (for example [t] and [d]) are encoded in both the temporal envelope and fine structure (Rosen 1992) of the speech signal that occur in time windows of 20-500 ms and <2 ms, respectively. In words, an additional cue is the closure time, which can be further used to discriminate between voiced and voiceless plosives. However, place of articulation contrasts are exclusively encoded in the temporal fine structure (i.e., very quick transitions of the frequency spectrum, formant transitions). Even though shorter timescale information plays a role for all contrasts, somewhat redundant encoding is present for voicing contrasts. Ultimately, place of articulation contrasts seem to be the most difficult to discriminate. In Figure 2D it is apparent that despite highest error rates for the place of articulation contrasts, several patients also showed impaired discrimination for voicing contrasts when compared to healthy controls. We do agree with the referee that it would be interesting to also extend this level of analysis to individuals with lesions in the wider speech perception network in future work.

      The tractography results reveal a complex pattern of structural connectivity, including other regions associated with speech perception. The authors have a theoretical motivation to focus on the importance of the temporo-cerebellar pathway but there is no correlation evidence to link auditory temporal analysis to the integrity of this pathway in the neurotypical population. The non-verbal measures appear to be sufficiently sensitive for this type of analysis. This lack of association with behaviour makes it hard to draw conclusions about the functional role of this network.

      We appreciate the referee’s concerns about our interpretation of the functional link between the temporal lobe and the cerebellum regarding auditory temporal analysis. It certainly is more reasonable to derive a functional interpretation based on disconnection measured directly in patients’ DTI. However, if unavailable, indirect measures of disconnection can also be used to establish a functional link between a lesioned region and the networks associated with it. The rationale behind this is that it reflects an indirect estimation of the effect of a lesion on structural brain networks. To make this approach clearer, we have modified the manuscript accordingly. See revised manuscript pages 6 and 12:

      [...] Assessing connectivity in healthy participants based on lesion information is a relatively new method that measures structural disconnection in networks associated with given anatomical regions (Foulon et al., 2018). This allows for the indirect estimation of the effect of a lesion on structural brain networks. In this regard, it has been shown that behavioral deficits are explained to a similar extent by both the local damage and indirectly measured disconnection (Salvalaggio et al., 2020). [...]

      [...] We next used the respective areas as seed regions for probabilistic fiber tractography in a healthy age-matched sample to visualize the underlying common connectivity pattern (see Methods). Thus, we indirectly explored the association between posterior superior temporal disconnection and processing of sound at short timescales. [...]

      We also changed the abstract and conclusion accordingly. See revised manuscript pages 2 and 15.

      [...] Here we tested whether temporo-cerebellar disconnection is associated with the processing of sound at short timescales. [...]

      [...] The evidence we describe (i) shows that lesion-related deficits in spectrotemporal analysis occur in posterior temporal regions connected to the cerebellum [...].

    1. Author Response:

      Reviewer #1 (Public Review):

      The authors interrogated an underexplored feature of CRISPR arrays to enhance multiplexed genome engineering with the CRISPR nuclease Cas12a. Multiplexing represents one of the many desirable features of CRISPR technologies, and use of highly compact CRISPR arrays from CRISPR-Cas systems allows targeting of many sites at one time. Recent work has shown though that the composition of the array can have a major impact on the performance of individual guide RNAs encoded within the array, providing ample opportunities for further improvements. In this manuscript, the authors found that the region within the repeat lost through processing, what they term the separator, can have a major impact on targeting performance. The effect was specifically tied to upstream guide sequences with high GC content. Introducing synthetic separator sequences shorter than their natural counterparts but exhibiting similarly low GC content boosted targeted activation of a reporter in human cells. Applying one synthetic separator to a seven-guide array targeting chromosomal genes led to consistent though more modest targeted activation. These findings introduce a distinct design consideration for CRISPR arrays that can further enhance the efficacy of multiplexed applications. The findings also suggest a selective pressure potentially influencing the repeat sequence in natural CRISPR arrays.

      Strengths:

      The portion of the repeat discarded through processing normally has been either included or discarded when generating a CRISPR-Cas12a array. The authors clearly show that something in between, namely using a short version with a similarly low GC content, can enhance targeting over the truncated version. A coinciding surprising result was that the natural separator completely eliminated any measurable activation, necessitating the synthetic separator.

      The manuscript provides a clear progression from identifying a feature of the upstream sequences impacting targeting to gaining insights from natural CRISPR-Cas12a systems to applying the insights to enhance array performance.

      With further support, the use of synthetic separators could be widely adopted across the many applications of CRISPR-Cas12a arrays.

      Weaknesses:

      The terminology used to describe the different parts of the CRISPR array could better align with those in the CRISPR biology field. For one, crRNAs (abbreviated from CRISPR RNAs) should reflect the final processed form of the guide RNA, whereas guide RNAs (gRNAs) captures both pre-processed and post-processed forms. Also, "spacers" should reflect the natural spacers acquired by the CRISPR-Cas system, whereas "guides" better capture the final sequence in the gRNA used for DNA target recognition.

      We thank the reviewer for this correction. We have now changed most uses of “crRNA” to “gRNA”. We decided to retain the use of the word “spacer” for the target recognition portion of the gRNA rather than changing it to “guide” as the reviewer suggests, because we think there is a risk that the reader would confuse “guide” with the non-synonymous “guide-RNA”. We have added a remark explaining our use of “spacer” (“A gRNA consists of a repeat region, which is often identical for all gRNAs in the array, and a spacer (here used synonymously with “guide region”)”).

      A running argument of the work is that the separator specifically evolved to buffer adjacent crRNAs. However, this argument overlooks two key aspects of natural CRISPR arrays. First, the spacer (~30 nts) is normally much longer than the guide used in this work (20 nts), already providing the buffer described by the authors. This spacer also undergoes trimming to form the mature crRNA.

      If we understand this comment correctly, the argument is that, in contrast to a ~20-nt spacer, a 30-nt spacer would provide a buffer between adjacent guides even if a separator is not present. However, even a 30-nt spacer may have high GC content and form secondary structures that would interfere with processing of the subsequent gRNA. Our hypothesis is that the separator is AT-rich and so insulates gRNAs from one another regardless of the length or GC composition of spacers. Please let us know if we have misunderstood this comment.
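
      As a purely illustrative sketch (not part of the authors' analysis), the GC content referred to here is simply the fraction of G and C bases in a sequence. The function name and the high-GC spacer below are hypothetical; only the AAAT separator sequence is taken from the response.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases in a nucleotide sequence (DNA or RNA)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    # Count bases that are G or C and divide by the total length.
    return sum(base in "GC" for base in seq) / len(seq)


# AT-rich synthetic separator mentioned in the response.
syn_separator = "AAAT"
# Hypothetical 20-nt high-GC spacer, made up for illustration only.
example_spacer = "GCGCGCGCGCGCGCGCGCAT"

print(gc_content(syn_separator))
print(gc_content(example_spacer))
```

      Under this definition, an AT-rich separator scores low regardless of the GC content of the neighbouring spacer, which is the property the hypothesis relies on.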

      Second, the repeat length is normally fixed as a consequence of the mechanisms of spacer acquisition. At most, the beginning of each repeat sequence may have evolved to reduce folding interactions without changing the repeat length, although some of these repeats are predicted to fold into small hairpins.

      We agree with this comment. Indeed, we propose that the separator, which is part of the repeat sequence, has evolved to reduce folding interactions. We now clarify this at the end of the Results section: “Taken together, the results from our study suggest that the CRISPR-separator has evolved as an integral part of the repeat region that likely insulates gRNAs from the disrupting effects of varying GC content in upstream spacers.”

      Prior literature has highlighted the importance of a folded hairpin with an upstream pseudoknot within the repeat (Yamano Cell 2016), where disrupting this structure compromises DNA targeting by Cas12a (Liao Nat Commun 2019, Creutzburg NAR 2020). This structure is likely central to the authors' findings and needs to be incorporated into the analyses.

      We thank the reviewer for this important insight. We have now performed experiments exploring the involvement of the pseudoknot in the disruptive effects of high-GC spacers.

      First, we used our 2-gRNA CRISPR array design (Fig. 1D) where the second gRNA targets the GFP promoter and the first gRNA contains a non-targeting dummy spacer. We generated several versions of this array where we iteratively introduced targeted point mutations in the dummy spacer to form either a hairpin restricted to the dummy spacer, or a hairpin that would compete with the pseudoknot in the GFP-gRNA’s repeat region (new Fig. S3). We found that both of these modifications significantly reduced performance of the GFP-targeting gRNA. These results suggest that interfering with the pseudoknot indeed disrupts gRNA performance, but that hairpins that presumably do not interfere directly with the pseudoknot are also detrimental, perhaps by sterically hindering Cas12a from accessing its cleavage site. Interestingly, the AAAT synSeparator largely rescued performance of the worst-performing of these constructs. These results are displayed in the new Fig. S3 and discussed in the related part of the Results section.

      Second, we have now performed a computational analysis using RNAfold where we correlated the performance of all dummy spacers with their predicted secondary structure (Fig. 1M). The correlation between predicted RNA structure and array performance was higher when the structural prediction included both the dummy spacer and the entire GFP-targeting gRNA (R² = 0.57) than when it included only the dummy spacer (R² = 0.27; new figure panel S1C). This higher correlation suggests that secondary structures that involve the GFP-targeting gRNA play a more important role in our experiment than secondary structures that only involve the dummy spacer. These results are described in the Results section and in the Fig. 1 legend.
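
      As a rough illustration of the comparison described above, the sketch below computes R² as the squared Pearson correlation between a predicted structural score and a measured performance value. This is an assumption about how such a value could be obtained, not a description of the authors' analysis. The function name and the numbers in the usage lines are hypothetical and are not data from the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either input is constant")
    return (sxy * sxy) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical values, invented for illustration only.
    predicted_score = [1, 2, 3, 4]
    performance = [1, 3, 2, 4]
    print(r_squared(predicted_score, performance))
```

      Running the same function once on scores predicted from the dummy spacer alone and once on scores predicted from the dummy spacer plus the downstream gRNA would give the two values being compared.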

      Third, we now also performed secondary structure analysis (RNAfold) of two of our worst-performing dummy spacers (50% and 70% GC), which indicated that these spacers are likely to form secondary structures that involve both the repeat and spacer of the downstream GFP-targeting gRNA (Fig. 3G-H). Interestingly, this analysis suggested that the AAAT synSeparator improves performance of these spacers by loosening up these secondary structures or creating an unstructured bulge at the Cas12a cleavage site. These results are presented in Fig. 3G-H and the accompanying portion of the Results section.

      To conclude, our analyses suggest that the secondary structure in the spacer and its interference with the pseudoknot in the repeat hairpin play a role in gRNA performance, wherein the inclusion of the AAAT synSeparator can partly rescue the performance, likely by restoring the Cas12a accessibility to the gRNA cleavage site.

      Many claims could better reflect the cited literature. For instance, Creutzburg et al. showed that adding secondary structures to the guide to promote folding of the repeat hairpin enhanced rather than interfered with targeting.

      We thank the reviewer for this comment. Creutzburg et al. report the interesting finding that a carefully designed 3’ extension of the spacer can counteract secondary structures that disrupt the repeat. In this way, the extension rescues disruptive secondary structures that involve the repeat and any upstream sequence. Relevant to this finding, it is conceivable that the synSeparator (AAAT) exerts its beneficial effect at the 3’ end of the GFP spacer by folding back onto the GFP spacer and in this way blocking secondary structures caused by a GC-rich dummy spacer located upstream of the GFP gRNA, according to the mechanism reported by Creutzburg et al. However, we used structural prediction of the GFP-targeting gRNA with and without the AAAT synSeparator and did not find evidence that the AAAT extension would cause this spacer to fold back onto itself (data not shown). Moreover, our experimental data (Fig. 3E) demonstrate that the synSeparator exerts its main beneficial effect when located upstream of the GFP-targeting gRNA, which would not be the case if the main mechanism was the one demonstrated by Creutzburg et al. We already had a paragraph discussing the Creutzburg paper in the Discussion, but we have now added a sentence specifying the mechanism that Creutzburg et al. demonstrated: “RNA secondary structure prediction (RNAfold) did not indicate that the GFP-targeting spacer would fold back on itself when an AAAT extension is added to the 3’ end, which would have been the case for the mechanism demonstrated by Creutzburg et al. (data not shown).”

      Liu et al. NAR 2019 further showed that the pre-processed repeat actually enhanced rather than reduced performance compared to the processed repeat.

      The experiment referenced by the reviewer (Fig. 2 in Liu et al., Nucleic Acids Research, 2019) in fact nicely supports our findings. In Liu et al., the pre-processed repeat only shows improved performance if it is located upstream of the targeting gRNA, and the gRNA is not followed by an additional pre-processed repeat (DRf-crRNA in their Fig. 2B & C). In this situation, the pre-processed repeat (containing the natural separator) may serve to enhance gRNA processing, as would be expected based on our results. At the same time, the absence of a full-length repeat downstream of the gRNA means that after gRNA processing, there will not remain any piece of RNA attached to the 3’ end of the spacer, which might disrupt gRNA performance. In contrast, when Liu et al. added an additional pre-processed repeat downstream of their gRNA (DRf-crRNA-DRf in the same panel), this construct performed the worst of all tested variants. This is consistent with our conclusion that the full-length separator reduces performance of gRNAs if it remains attached to the 3’ end of spacers. We have added a paragraph in the Discussion about this (Line 376).

      Finally, the complete loss of targeting with the unprocessed repeat appears to represent an extreme example given multiple studies that showed effective targeting with this repeat (e.g. Liu NAR 2019, Zetsche Nat Biotechnol 2016).

      We acknowledge that our CRISPR array containing the full, natural separator (Fig. 3B) appears to be completely non-functional in contrast to the studies mentioned by the reviewer. We think this difference may have a few possible explanations. First, this array is in fact not entirely non-functional. Re-running the same experiment with a stronger dCas12a-activator (dCas12a-VPR, full length VPR, also used in Fig. 5) shows some modest GFP activation even with the full separator (1.4% vs 20.8% GFP+ cells; see the Appendix Figure 1). But for consistency, we have used the same, slightly less effective, dCas12a-activator (dCas12a-miniVPR) for all GFP-targeting experiments. Second, both the Liu et al. and Zetsche et al. studies used CRISPR editing rather than CRISPRa. We speculate that this might explain their relatively high indel frequency: Only a single cleavage event needs to take place for an indel to occur, whereas gene activation presumably requires the dCas12a-activator to be present on the promoter for extended periods of time. Thus, any inefficiency in DNA binding caused by the separator remaining attached to the spacer might disfavor CRISPRa activity more than CRISPR-editing activity. We have added these considerations to the Discussion and referenced the suggested papers (Line 376).

      Appendix Figure 1: Percentage of GFP+ cells without or with a full-length separator using dCas12a-VPR (full length) gene activation.

      Relating to the above point, the vast majority of the results relied on a single guide sequence targeting GFP. While the seven-guide CRISPR array did involve other sequences, only the same GFP targeting guide yielded strong gene activation. Therefore, the generalizability of the conclusions remains unclear.

      We have now performed several experiments that address the generalizability of our conclusions:

      First, we now include data demonstrating that the beneficial effect of adding a synSeparator is not limited to the AAAT sequence derived from the Lachnospiraceae bacterium separator. We now include three other 4-nt, AT-rich synSeparators derived from Acidaminococcus s. (TTTT), Moraxella b. (TTTA) and Prevotella d. (ATTT) (Fig. 3I). All these synSeparators rescued the poor GFP activation caused by an upstream spacer with high GC content, though not equally effectively. The quantitative difference between the synSeparators could either be due to the intrinsic “insulation capacity” of these sequences, or the way they interact with the Lb-Cas12a protein, or to sequence-specific interactions with this particular CRISPR array. We discuss these possibilities in the Discussion (Line 437).
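
      To make the "AT-rich" property of these separators concrete, the following minimal sketch computes GC content as the fraction of G and C bases. The four synSeparator sequences are those named above. The 20-nt example spacer is hypothetical, invented only to show a high-GC case, and the function and variable names are illustrative assumptions.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq) / len(seq)


# The four 4-nt synSeparators named in the response, keyed by source organism.
SYN_SEPARATORS = {
    "Lachnospiraceae bacterium": "AAAT",
    "Acidaminococcus s.": "TTTT",
    "Moraxella b.": "TTTA",
    "Prevotella d.": "ATTT",
}

# Hypothetical 20-nt spacer, invented for illustration only.
EXAMPLE_SPACER = "GCGCGCGCGCGCGCATATAT"

if __name__ == "__main__":
    for organism, separator in SYN_SEPARATORS.items():
        print(organism, separator, gc_content(separator))
    print("example spacer", gc_content(EXAMPLE_SPACER))
```

      All four separators contain no G or C bases, whereas a spacer like the invented example would fall in the high-GC range that the response describes as disruptive.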

      Second, we now include data demonstrating that nuclease-deactivated, enhanced Cas12a from Acidaminococcus species (denAsCas12a; Kleinstiver et al., 2019) is also sensitive to the effects of high-GC spacers (Fig. 3J). This poor performance was largely rescued by including a TTTT synSeparator derived from the natural AsCas12a separator.

      Furthermore, we have now included a paragraph in the Discussion where we speculate on why the effect of adding the synSeparator was more modest for the endogenous genes than for GFP: 1) Our GFP-expressing cell line has multiple GFP insertions in its genome, and each copy has seven protospacers in its promoter. This may amplify the effect of the synSeparator. 2) The gRNAs used for endogenous activation were taken from the literature or had been pre-tested by us. These guides had thus already proven to be successful and might not be particularly disruptive (e.g., they were not selected by us for having high GC content). Therefore, researchers might experience the greatest benefit from the synSeparator with newly designed spacers that have not already proven to be effective even without the synSeparator.

      Reviewer #3 (Public Review):

      Magnusson et al. do an excellent job of defining how the repeated separator sequence of wild-type Cas12a CRISPR arrays impacts the relative efficacy of downstream crRNAs in engineered delivery systems. High GC content, particularly near the 3' end of the separator sequence, appears to be critically important for the processing of a downstream crRNA. The authors demonstrated that naturally occurring separators from 3 Cas12a species also display reduced GC content. The authors use this important new information to construct a small synthetic separator DNA sequence which can enhance CRISPR/Cas12a-based gene regulation in human cells. The manuscript will be a great resource for the synthetic biology field as it shows an optimization to a tool that will enable improved multi-gene transcriptional regulation.

      Strengths:

      • The authors do an excellent job in citing appropriate references to support the rationale behind their hypotheses.
      • The experiments and results support the authors' conclusions (e.g., showing the relationship between secondary structure and GC content in the spacers).
      • The controls used for the experiments were appropriate (e.g., using full-length natural separator vs single G or 1 to 4 A/T nucleotides as synthetic separators).
      • The manuscript does a great job assessing several reasons why the synthetic separator might work in the discussion section, cites the relevant literature on what has been done, and restates the results to argue in favor of or against these reasons.
      • This paper will be very useful for research groups in the genome editing and synthetic biology fields. The data presented (especially the data concerning the activation of several genes) can be used as a comparison point for other labs comparing different CRISPR-based transcriptional regulators and the spacers used for targeting.
      • This paper also provides optimization to a tool that will be useful for regulating several endogenous genes at once in human cells thus helping researchers studying pathways or other functional relationships between several genes.

      Opportunities for Improvement:

      • The authors have performed all the experiments using LbCas12a as a model and have conclusively proven that the synSeparator enhances the performance of Cas12a-based gene activation. Will this phenomenon be the same for other Cas12a proteins (such as AsCas12a)? The authors should perform some experiments to test the universality of the concept. Ideally, this would be done in HEK293T cells and one other human cell type.

      We thank the reviewer for these suggestions. We have now addressed the generalizability of our findings with several new experiments. First, we now include data demonstrating that nuclease-deactivated, enhanced Cas12a from Acidaminococcus species (denAsCas12a; Kleinstiver et al., 2019) is also sensitive to the effects of high-GC spacers (Fig. 3J). This poor performance was largely rescued by including a TTTT synSeparator derived from the natural AsCas12a separator.

      Second, we now include data demonstrating that the beneficial effect of adding a synSeparator is not limited to the AAAT sequence derived from the Lachnospiraceae b. separator. We now include three other 4-nt, AT-rich synSeparators derived from Acidaminococcus s. (TTTT), Moraxella b. (TTTA) and Prevotella d. (ATTT) (Fig. 3I). All these synSeparators rescued the poor GFP activation caused by an upstream spacer with high GC content, though not equally effectively. The quantitative difference between the synSeparators could either be due to the intrinsic “insulation capacity” of these sequences, or the way they interact with the Lb-Cas12a protein, or to sequence-specific interactions with this particular CRISPR array. We discuss these possibilities in the Discussion.

      Third, as described above, we have now performed an in vitro Cas12a cleavage assay and present the data in a new figure (Fig. 4). We found that a CRISPR array containing a 70%-GC dummy spacer was processed less efficiently than an array containing a 30%-GC spacer, but that addition of a synSeparator could to a large extent rescue this processing defect (Fig. 4E). The fact that this result was observed even in a cell-free in vitro setting demonstrates that it is a general feature of Cas12a CRISPR arrays that is likely to work the same way in many cell types rather than being specific to HEK293T cells.

      Fourth, we attempted to investigate the effect of the synSeparator in different cell types. However, either due to poor transfection efficiency or poor expression of the Cas12a activator construct, CRISPRa activity was consistently poor in these cell types, both with and without the synSeparator (e.g., we did not visually observe fluorescence from the mCherry gene fused to the dCas12a activator, which we always see in HEK293T cells). Because of the low general efficiency of CRISPRa, it was not possible to evaluate the performance of the synSeparator. Many cell types are difficult to transfect and dCas12a-VPR-mCherry is a big construct (>6 kb). To our knowledge, there have not been many reports using dCas12a-VPR in cell types other than HEK293T. While we think that it will be important to optimize CRISPRa in many cell types (e.g., by optimizing transfection conditions, Cas12a variants, promoters, expression vectors, etc.), the focus of our study has been to show the separator’s mechanism and general function; we believe that optimizing general CRISPRa for different cell types is beyond the scope of this paper. We acknowledge that this is a limitation of our study and we have added a paragraph about this in the Discussion (line 355). We nevertheless hypothesize that the negative influence of high-GC spacers and the insulating effect of synSeparators are generalizable across cell types. That is because we could observe improved array processing with the synSeparator even in the cell-free context of an in vitro expression system, as described above (Fig. 4). This suggests that the sensitivity to spacer GC content is determined only by the interaction between Cas12a and the array, rather than being dependent on a particular cellular context.

    1. Author Response:

      Reviewer #2 (Public Review):

      Here the authors explore the role of PKA signaling downstream of the Moody GPCR in the BBB. The discovery that PKA is involved is interesting but not entirely surprising, as it functions downstream of many GPCRs to execute function (the really interesting question is how the same signal, changes in cAMP, causes PKA to do different things). The authors make the claim of a monotonic relationship between septate junctions (SJs) and cell-cell contact zones. I do not think they have measured the necessary parameters in a way that allows them to claim a "monotonic relationship between PKA activity, membrane overlap and the amount of SJ components in the area of cell contact." There is a correlation, but that is probably overstating it. There is an interesting analysis of several markers. These cells are very small and it is not clear what the cytoskeletal markers really tell us. The markers change, no doubt, and do so in a way that correlates with the proposed Moody/PKA antagonistic relationship. The markers do change at the edges of cells and in regions of overlap, but wouldn't that be expected based on the changes in morphology? Again, the claim for "monotonic" changes is probably overstating the relationship.

      We appreciate the suggestion and have changed the word “monotonic” to “major”. Our results suggest that the primary role of Moody/PKA in this process is to regulate the membrane contact area between neighboring cells. This is consistent with the results of a temporal analysis of epithelium formation and SJ insertion in late embryos of WT and Moody pathway mutants, which shows that membrane contact precedes and is necessary for the appearance of SJs (Schwabe et al., 2017). Our ssTEM-based 3D reconstruction shows that the total area covered by SJs and the length of individual contiguous SJ segments are independent parameters. The latter appears to be critical for the paracellular seal, consistent with the idea that Moody plays a role in the formation of continuous SJ strands.

      Doesn't the fact that the total SJ area covered remains at 30% whether there is more or less overlap also argue against this (i.e. 30% of more overlap is not the same as 30% of less overlap…so more or fewer SJs are being made)?

      Our ssTEM analysis of the larval SPG epithelium clarifies the relationship between the inter-cell membrane overlap and SJ organization and function at the ultrastructural level. This analysis indicates that the percentage of septate junction areas remains constant (at about 30%) across different PKA activity levels. This proportionality suggests a mechanism that couples cell contact with SJ formation. The finding that the surface area occupied by SJs did not significantly change, irrespective of the absolute area of cell contact, suggests an intrinsic, possibly steric limitation in how much junction can be fitted into a given cell contact space.
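
      The arithmetic behind this exchange can be sketched as follows: if SJs occupy a constant fraction (about 30%) of the contact area, then the absolute SJ area scales in proportion to the overlap, which is the reviewer's point that more or fewer SJs are made. The function name and the contact areas below are hypothetical values in arbitrary units, chosen only to illustrate the proportionality.

```python
def sj_area(contact_area: float, sj_fraction: float = 0.30) -> float:
    """Absolute SJ area when SJs occupy a fixed fraction of the contact area."""
    if contact_area < 0:
        raise ValueError("contact_area must be non-negative")
    if not 0.0 <= sj_fraction <= 1.0:
        raise ValueError("sj_fraction must lie between 0 and 1")
    return contact_area * sj_fraction


if __name__ == "__main__":
    # Hypothetical contact areas (arbitrary units), for illustration only.
    for contact in (10.0, 20.0):
        print(contact, sj_area(contact))
```

      Under this assumption, doubling the contact area doubles the absolute SJ area while the covered fraction stays the same.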

      The study would need to be strengthened by more rigorous quantification. There is no quantification in figure 3. This is a primary point in the manuscript-that cytoskeletal markers change (in a claimed "monotonic" way) in subperineurial glia when PKA is altered.

      We agree, and as suggested by the reviewer we have now added quantifications for all cytoskeletal markers used in Figure 3 (including GFPactin, TauGFP, EB1GFP, NodGFP, and two endosome markers, Rab4RFP and Rab11GFP). The additional data are now presented in the right column of Figure 3. Technical details are provided in the Methods section.

      There is also no quantification or statistics in Figure 5, which is among the most interesting observations.

      The complementary localization of Moody and the PKA catalytic (activated) subunit is very nice. It shows a very interesting cellular polarity. However, it is unclear whether this is altered in Moody mutants (the authors only did knockdown) and whether catalytic (activated) PKA now goes everywhere.

      In response to the reviewer’s suggestion, we further examined the subcellular distribution of Moody and Pka-C1 under gain- and loss-of-function conditions of Moody/PKA signaling in SPG and revised Fig. 5. We show lateral views of the CNS/hemolymph border for each condition, and line scans of fluorescence intensities for each channel along the apical-basal axis. Our results clearly demonstrate that the polarization of Moody and PKA depends on the activity of the Moody/PKA signaling pathway. In WT (Fig. 5A), Moody localized to the apical side and Pka-C1 was enriched at the basal side of SPG. Under loss of Moody signaling (moody>MoodyRNAi) (B), PKA spread throughout the cell and lost its basal localization. Moody lost its apical localization in response to reduced (C) or increased PKA activity (E). Under GPCR gain-of-function conditions (D), Pka-C1 was basally polarized, while Moody lost its asymmetric localization in SPG.

      Throughout, the authors use some very nice genetic studies, using loss-of-function, gain-of-function, and enhancer/suppressor approaches, and their findings are consistent with polarized localization of Moody being important.

      Reviewer #3 (Public Review):

      Proper sealing of the blood brain barrier (BBB) is essential for viability in many animals, including humans and Drosophila. In Li et al., the authors used Drosophila as a simple genetic system to define the signaling pathways that control BBB formation and maintenance. In Drosophila, the BBB is composed of a thin epithelial sheath of subperineurial glia (SPG) that are connected by septate junctions. Previously, the authors found that the G protein-coupled receptor Moody is essential for BBB formation during embryogenesis, but the downstream signaling pathways that facilitate septate junction assembly were not known. Here, they performed a series of genetic screens and epistasis experiments to uncover that Moody and PKA antagonistic signaling drives BBB assembly and expansion throughout organismal development. In the present study, they show that loss of PKA signaling components results in a leaky BBB both during development and during adulthood. They further show that these functions of PKA are dependent on downstream suppression of Rho and changes in cytoskeletal dynamics. Interestingly, overexpression of PKA also causes BBB permeability, indicating that PKA signaling levels must be tightly regulated for BBB integrity. The authors then use serial section TEM to visualize the intact SPG sheath for the first time at ultrastructural resolution, and show that overexpression of PKA results in an enlarged yet patchy septate junction, accounting for the leakiness. In sum, the authors show that the combined signaling of Moody (apically located) and PKA (basally located) shapes the cytoskeleton to drive efficient assembly and maintenance of septate junctions, and thus, the BBB.

      The conclusions of this paper are mostly well supported by the data, but the study would be improved by some expanded analyses and descriptions of statistical assessment.

      We performed additional analysis and added more statistical data throughout the entire manuscript. All changes have been marked in the article.

    1. Author Response:

      Reviewer #1 (Public Review):

      In this manuscript, Levi-Ferber et al use C elegans to study how germline cells maintain pluripotency and avoid GED (germline ectopic differentiation) before fertilization. The authors previously showed that activation of the ER stress sensor Ire1 (but not its major downstream target Xbp1) enhances GED, and here they explore the mechanism of this effect.

      The authors convincingly – and surprisingly – show that the Ire1-mediated GED increase results not from Ire1 activity in the germline but in the nervous system, specifically in certain sensory neurons. Worms lacking a specific neuropeptide (FLP-6) or a particular neuron that produces this peptide (ASE) also displayed increased GED. Although FLP-6 deficiency did not induce ER stress, ER stress did lead to a reduction of FLP-6 transcript (and protein) levels in an Ire1-dependent manner, suggesting this RNA is a target of Regulated Ire1-dependent decay (RIDD). The authors then go on to map out the signaling cascade that begins with FLP-6 reduction in ASE by Ire1 and is transmitted to the gonad via an ASE-AIY-HSN circuit, including serotonin produced by HSN.

      This paper is quite interesting and for the most part the data are very convincing and support the model. The demonstration that Ire1 and the ER stress response have non-cell autonomous effects is of particular interest, and is very well supported here. The description of this circuit linking particular neurons and signaling molecules to gonad pluripotency is also very strong.

      A weakness of the paper is the link between RIDD of FLP6 and the disruption of this circuit. The data presented do clearly support the model. However, additional information would strengthen this considerably. The authors show that FLP6 mRNA levels are reduced in Ire1+ but not Ire-/- animals subjected to ER stress. They also show that GED results from the nuclease activity of Ire1 in the ASE; and that loss of FLP6 can also induce a similar effect. However, they do not show as clearly that Ire1's effects on GED are mediated primarily through FLP6.

      We significantly strengthened the link between RIDD of FLP-6 and the disruption of this circuit, as detailed above in the response to the essential revisions. We have also added experiments to show that ire-1's effects on GED are mediated primarily through FLP-6. This has been achieved by generating CRISPR-designed worms harboring silent mutations in the flp-6 gene that disrupt the sequence and structure of the predicted cleavage site while preserving the CDS. We find that this mutation stabilizes the flp-6 transcript under ER stress conditions and protects from ER stress-induced GED in otherwise WT animals.

      Reviewer #2 (Public Review):

      Levi-Ferber and colleagues showed in their previous paper that ER stress regulates germline transdifferentiation in a way that is IRE-1 dependent, but XBP-1 independent. An open question at that time was how IRE-1 activation could mediate this signaling. The authors present several experiments in this manuscript that support the idea that neuronal Ire-1 can cell non-autonomously control germline differentiation through regulation of the neuropeptide FLP-6. Mechanistically, the authors characterize that FLP-6 is a target of IRE-1 RIDD activity. This is the first demonstration of RIDD in C. elegans, an important finding given that no RIDD targets have yet been identified in this organism. Using a wide range of mutants, the authors were also able to identify a neuronal circuit that can control the germline ectopic differentiation (GED) phenotype, involving the sensory neuron ASE, the interneuron AIY, and the motor neuron HSN. The data presented in the manuscript are sound, the mapping of a pivotal three-neuron circuit is impressive, and the findings are likely to be of high interest to a broad readership. However, some more evidence is required to support some of the conclusions made, in particular the characterization of flp-6 as a substrate for RIDD.

      We significantly strengthened the link between RIDD of FLP6 and the disruption of this circuit.

      Reviewer #3 (Public Review):

      In a previous study, the authors had shown that germline tumors that accumulate in the C. elegans gonad because of the lack of the RNA-binding translational repressor GLD-1 have an increased propensity to differentiate and express somatic proteins in response to ER stress induced by tunicamycin or the absence of the TRK kinase protein tfg-1 (a process the authors call GED). Using this as a model, here, the authors investigate the mechanisms by which the abnormal nuclei accumulate in the tumorous gonad of glp-1 animals by manipulating genes in the soma and germline.

      The key message of this paper is, then, the identification of neurons and neuromodulators that suppress or enhance this accumulation of abnormal germline cells in the glp-1 germline. While the results of this analysis could potentially provide an interesting advance, the validity of many of the conclusions is difficult to evaluate because of limitations posed by the experimental methods and ambiguity in defining the GED.

      Weaknesses:

      A key issue is the identity of the abnormal germline cells that accumulate in glp-1 gonads. Modulation of the neuronal circuits examined (FLP-6, serotonin, cholinergic) change the germline, alter ovulation rates, modulate somatic gonad contraction rates etc. in wild-type animals. The effects of these circuits on a glp-1 germline are not known, but some of the same effects are likely to continue even if germ cells turned tumorous. Therefore, how neurons and neuromodulators alter the accumulation of abnormal cells in the gonad may or may not be surprising or novel, based on what is actually happening to these cells (the phenotype scored as GED). However, this is unclear as all the abnormal effects on the germline are assessed using DAPI at some steady state. Therefore, GED (ectopic differentiation) needs to be better demonstrated separate from the simple accumulation of abnormal nuclei, which could happen for a number of different reasons.

      Because depletion of gld-1 prevents the transition of the mitotic germline into oocytes, this creates a more simplified gonad for analysis, devoid of oocytes and embryos, in which all the cells within the gonad should be mitotic germline. Thus, the relative homogeneity imposed in the gld-1 gonads simplifies the analysis of the nature of the aberrant nuclei and rules out the possibility that these are endomitotic oocytes or their derivatives. Nevertheless, to further demonstrate that the disruption of the proposed neuronal circuit results in the accumulation of ectopically differentiated cells in the gonad, we directly assessed expression of somatic markers within the gonad, under similar conditions. These include neuronal over-expression of ire-1 as well as mutations that disrupt the ASE-AIY-HSN neuronal circuit by impairing the relevant neurons or by interfering/repairing their ability to communicate via specific neurotransmitters and neuropeptides (Fig 1D, Fig 2F, Fig 5D, Fig 6A).

      Strengths:

      One strength of this paper is the identification of the neuropeptide FLP-6 as a suppressor of GED and a possible RIDD target. However, there is insufficient analysis conducted to fully support this claim.

      We now provide more in vivo and in vitro evidence demonstrating that flp-6 is a target of RIDD. In vitro: We confirmed that the in vitro cleavage assay results in cleavage products of the expected sizes, and that mutation of the predicted cleavage site prevents degradation of the flp-6 RNA. In vivo: We now show that flp-6 RNA levels are reduced under different ER stress conditions in an ire-1-dependent, xbp-1-independent manner. We show that over-expression of IRE-1 in ASE is sufficient to reduce flp-6 transcript levels. We also show that, in CRISPR-engineered worms, silent mutations in the flp-6 gene that disrupt the predicted cleavage site while preserving the CDS stabilize the transcript under ER stress conditions and protect the animals from ER stress-induced germline differentiation.

    1. Author Response:

      Reviewer #1 (Public Review):

      In their paper, Spurlock and colleagues look at the role of mitochondrial fusion caused by Drp1 repression in driving the stem/progenitor-like state of skin stem cells. Prior work hinted at the possibility that mitochondrial fission/fusion activity is important in supporting neoplastic transformation, but it was unclear exactly what this role was. Here, the authors use an assay for neoplastic transformation induced by carcinogen treatment to demonstrate that diminution in mitochondrial fission activity (from increased phosphorylated Drp1 pools) can prime a stem/progenitor-like state in carcinogen-treated cells, leading to accelerated neoplastic transformation. Using genetic strategies and single cell RNAseq they additionally show that only partial repression of Drp1 is necessary for establishing the stem/progenitor-like state for driving neoplastic transformation, with too much or too little Drp1 repression having no effect. The data are therefore relevant for understanding the conditions for driving neoplastic transformation. Overall the results support the conclusions drawn by the authors and the work helps to clarify the mitochondria's role in neoplastic transformation. The paper is currently difficult and in places confusing to read.

      We thank the Reviewer for finding value in our manuscript and for providing constructive comments to improve its quality. We have provided clarifications to all the comments and provided new experimental data to address concerns. We have also substantially improved the readability of our manuscript. The new experiments and revisions have strengthened the main conclusion of the manuscript, namely that fine-tuning of Drp1 repression facilitates neoplastic transformation by enriching a stem cell state.

      Reviewer #2 (Public Review):

      The authors used a carcinogen to increase proliferation of the keratinocyte cell line HaCaT and to increase the capacity to form xenograft tumors in mice. They found that the levels of certain mitochondrial fission and fusion proteins (Drp1, Mfn1 and Opa1) were increased in the derived cell lines, but Fis1 levels were decreased in the most tumorigenic derivative, as was the phosphorylation of Drp1 at position 616. Through single cell expression analysis, the authors show that transformed cells have retained a subpopulation of slowly dividing cells with high expression of stem cell markers and reduced levels and phosphorylation of Drp1. This state could be mimicked by reducing Drp1 expression with shRNA. Cells with moderately reduced levels of Drp1 appeared to be more susceptible to enhanced proliferation caused by treatment with a carcinogen. The authors conclude that a moderate reduction in Drp1 levels causes an increase in proliferation and tumorigenesis of keratinocytes upon treatment with a carcinogen.

      The main strength of this paper is the use of single-cell analysis to identify a subpopulation of cells with increased stem cell gene expression and reduced levels of Drp1 and of Drp1 phosphorylation.

      A causal relation between tumorigenicity and Drp1 levels was tested by reducing levels of Drp1 with shRNA, but unfortunately, the data are very limited. The key contention that partial reduction in Drp1 levels increases proliferation is only supported by a single point and it contradicts results from other labs where it was shown that Drp1 phosphorylation and fission are increased with transformation.

      We thank the reviewer for this comment. Now, we provide 2 more lines of evidence in support of the main conclusion that a slow cycling ‘stem/progenitor-like [CyclinEhi-Sox2hi-Krt15hi] state’ is sustained by a fine-tuned “goldilocks” level of Drp1 activity that maintains small networks of fused mitochondria. The stem/progenitor state driven neoplastic transformation is supported by fine-tuned Drp1 repression maintained by reduced Drp1 protein levels, while the neoplastic stem/progenitor state is supported by fine-tuning Drp1 by reducing its S-616 phosphorylation that modulates mitochondrial potential. In the light of the new data, we have modified the title to: Fine-tuned repression of Drp1 driven mitochondrial fission primes a ‘stem/progenitor-like state’ to support neoplastic transformation. These new results are as follows:

      1. Now we show that lowering the knockdown efficacy for both the Drp1 shRNAs reduces abundance of cells with >80 Fusion1 metric (Figure 4-figure supplement 1C and its legend, Lines: 440-445), increases abundance of self-renewing cells and accelerates neoplastic transformation in Parental cells (Figure 4F, G and their legends, Lines: 402-409). Plotting the accelerated transformation efficacy with Drp1 protein levels remaining after knockdown predicts ~50% repression of Drp1 protein levels may maximally accelerate transformation within the experimental range (Figure 4-figure supplement 1A and its legend, Lines: 415-419) (such remnant Drp1 levels may remain overestimated due to the reduction of the Actin control with Drp1 knockdown, Figure 4A,E).

      2. Now we provide multiple analyses of the impact of overexpression of Drp1-wild type and the phospho-deficient Drp1-S616A mutant (Lines: 329-356). Our data suggest that the elevated Sox2hi/Krt15hi sub-population is maintained by reducing Drp1-S616 phosphorylation of the elevated Drp1 protein levels in the TF-1 population, but not in the Parental (Figure 3I). We also confirmed that Drp1-WT overexpression increases the [Fission] metric and reduces [Fusion] metrics, while the Drp1-S616A mutant remains attenuated in this ability in the Parental cells (Figure 3-figure supplement 1C). But paradoxically, in the TF-1 population, Drp1-WT overexpression enhances [Fusion] metrics, which is not observed with the Drp1-S616A mutant, while not impacting the [Fission] metric (Figure 3-figure supplement 1C). We discuss this paradox in the light of the report that overexpression of certain Drp1 activators maintains mitochondrial fusion by unnaturally sequestering Drp1. Nonetheless, these data are consistent with our findings across various cell populations that moderate attenuation of the [Fusion] metric happens with fine-tuned repression of Drp1, which supports an enhanced Sox2-hi/Krt15-hi subpopulation (Figure 5F). The impact of Drp1-WT and the Drp1-S616A mutant on TMRE also remains consistent (see Response in point 3), while their impact on Cyclin E remains to be explored further (Figure 3-figure supplement 1D).

      It is unclear what mechanisms connect the proposed window of Drp1 activity to tumorigenesis. In previous studies the effects of different levels of fission and fusion proteins on metabolism and tumorigenesis were analyzed in detail, showing effects on metabolism that could lead to increased tumorigenesis. That is not done here and so one is left guessing as to what functions are affected by the proposed window of Drp1 expression and how that might affect tumorigenesis.

      We thank the Reviewer for highlighting the strength of the manuscript and for providing critical and constructive comments to improve its quality. We have provided clarifications to all the comments and provided new experimental data to address concerns.

      Reviewer #3 (Public Review):

      Spurlock et al. investigated how differential repression of Drp1, a master regulator of mitochondrial fission, affects neoplastic transformation of keratinocytes as well as key aspects of gene regulation and mitochondrial network dynamics. They find that "weak" repression of Drp1 in keratinocytes results in a gene expression profile reminiscent of a stem/progenitor like state, which is especially primed for neoplastic transformation. On the other hand, they show that "strong" repression of Drp1 has a very different effect and results in cells with hyperfused mitochondrial networks and less propensity towards transformation. They find that "weak" repression of Drp1 leads not to hyperfused networks but rather to small networks of fused mitochondria. These results are especially surprising as, according to the authors' analysis, there is less than 20% difference in the level of knockdown efficiency under the "weak" vs "strong" shRNA conditions. But the key findings in the weak vs strong knockdown conditions seem to be well supported by RNASeq analysis, mitochondrial network analysis, and immunofluorescence data (although quantification of specific data would likely strengthen their arguments).

      The authors relate these findings to those where they use differing levels of TCDD (1 nM vs 10 nM) to transform HaCaT cells. While it is clear from the data that TF-1 has a different effect from TF-10 on gene expression, cell proliferation, and certain measures of stem/progenitor cell characteristics, the key findings concerning Drp1 levels that would directly relate TF-1/TF-10 to Drp1-shRNA weak/strong are not as well supported. In particular, the immunoblots of pDrp1 and Drp1 levels as well as the mitochondrial network analysis do not necessarily support the hypothesis that the differing characteristics of TF-1 vs parental or TF-10 result from Drp1/mitochondrial changes and not simply from cell cycle or other effects of TCDD levels. Nevertheless, both sets of data are interesting and compelling and present a more nuanced view of how differing levels of transformation agents or shRNA-mediated depletion can have considerably different effects even within the same cell type. These data may also help to clarify differences seen in past studies between distinct cell types when Drp1 levels are manipulated but this remains to be tested and clarified.

      We thank the Reviewer for highlighting the strength of the manuscript and for providing constructive comments to improve its quality. We have provided clarifications to all the comments and provided new experimental data to address concerns.

      The individual conclusions of this paper are generally well supported by the data, but some aspects of data analysis need to be clarified and/or quantified.

      1) To better support the main link between the two sets of data, the levels of Drp1 (protein and activity) in TF-1 vs TF-10 conditions must be clarified and quantified (immunoblot analysis and/or in the immunofluorescence). Since the overall levels of Drp1 actually increase in both TF-1 and TF-10 compared to Parental but the authors suggest that pDrp1 decreases in TF-1, this must be quantified. Furthermore, the authors note that Drp1 is phosphorylated in a cell cycle dependent manner and go on to show significant differences in cell cycle dynamics between Parental, TF-1 and TF-10, and so the difference in pDrp1 levels could simply be a result of the cell cycle differences. While this would not change the conclusions about how differing levels of TCDD impact gene expression, transformation efficiency, and stem/progenitor cell like characteristics, it would call into question how related the effects from direct repression of Drp1 levels through shRNA are to the TCDD effects seen.

      We thank the reviewer for this comment. We have now quantitated all the previous and newly added blots (Figure 1D, 3A,D, 4A,E, Figure 1-figure supplements 1B, C). Key findings from immunoblots are consistent with data from single cell RNA-seq, immunofluorescence and microscopy analyses, as clarified in the manuscript (mentioned in relevant Result sections in the manuscript).

      Stemness is largely determined by cell cycle modulation, and Cyclin E and other cyclins have been shown to sustain stemness. Given that Drp1 knockdown modulates Cyclin E1 (Figure 4B,C) and other S phase genes (Figure 5C) (as expected from our previous work and that of others), we conclude that fine-tuned Drp1 repression modulates the cell cycle towards enrichment of the stem cell state and facilitates neoplastic transformation. Our gene expression data are consistent with the consensus that Drp1-S616 phosphorylation is driven by the cell cycle in a CyclinB-CDK1 dependent manner. These data together support the working model that Drp1 is modulated by certain cell cycle regulators so that it can impact other cell cycle regulators, like Cyclin E, to enrich a stem cell state supporting neoplastic transformation (Lines: 360-366, 530-534).

      2) There does not seem to be a big difference between the mitochondrial networks of TF-1 and parental line except possibly the spread of the Fusion5 metric. Is this statistically significant? Are any of the other measures of the mitochondrial network found to be different in Drp1-kd (W) similarly changed in TF-1? This could strengthen the connection between these data.

      We thank the reviewer for this comment. In the revised manuscript, we have provided a more thorough analysis, supported by statistical tests, of the Parental, TF-1 and TF-10 cells (Figure 1E and its legend, Lines: 134-165). Bivariate analysis of the [Fission] and [Fusion5] metrics demonstrates that the TF-1 population, with minimum Drp1 activity, exhibits maximal enrichment of a cellular sub-population with defined mitochondrial [Fission] and [Fusion5] (Figure 1E). This same sub-population is also enriched in the Parental population with weak Drp1 repression, while more complete Drp1 repression expectedly increases the cell population with maximum mitochondrial fusion (hyperfused mitochondria). Therefore, our findings are consistent with our conclusion that TF-1 and Drp1-kd (W) share a unique profile of mitochondrial morphology, as well as gene expression (Figure 5F and its legend).

    1. Author Response:

      Reviewer #1:

      General overview and merit of academic rigor:

      Xu et al. put forth an innovative experimental pipeline to examine the connections of the raphe nuclei. This manuscript details elegantly designed viral tract-tracing methods coupled with fMOST intact imaging and sophisticated analyses. All figures are of good quality. The studies presented in the current manuscript will be a valuable contribution to the field, therefore an enthusiastic recommendation for publication is endorsed presently. However, there is a cluster of revisions and clarifications warranted before publication.

      Major concerns:

      1. The manuscript's English needs to be proofread extensively for readability and clarity.

      We invited two native English-speaking experts to proofread and revise the whole manuscript.

      1. The term MR (median raphe) is used in the atlas of Paxinos and Franklin. But, the entire study follows the Allen Reference Atlas nomenclature, in which the same raphe nucleus is called the "Superior center nucleus" (CS). To keep consistency, I suggest using "CS" instead of "MR". Alternatively, in the Introduction, please make a clear statement that the MR is equivalent to CS in the Allen Reference Atlas.

      As suggested, we added the statement that MR is equivalent to CS in the Allen CCFv3 in Lines 15-18.

      “The dorsal raphe nucleus (DR) and median raphe nucleus (MR, equivalent to the superior central nucleus raphe in the Allen Mouse Brain Common Coordinate Framework version 3 (Allen CCFv3)) belong to the rostral group of the raphe nuclei and contain most of brain’s serotonergic neurons (Wang et al., 2020; Watson, et al., 2012).”

      1. In the Introduction, the rationale behind the decision to selectively study the DR and MR is unclear (why are other raphe nuclei not included?).

      We have revised the Introduction and explained why we selectively studied the DR and MR in Lines 15-25.

      “The dorsal raphe nucleus (DR) and median raphe nucleus (MR, equivalent to the superior central nucleus raphe in the Allen Mouse Brain Common Coordinate Framework version 3 (Allen CCFv3)) belong to the rostral group of the raphe nuclei and contain most of brain’s serotonergic neurons (Wang et al., 2020; Watson, et al., 2012). The DR and MR are involved in a multitude of functions (Domonkos et al.,2016; Huang et al., 2019; Szőnyi et al., 2019); moreover, they have different, and even antagonistic roles in the regulation of specific functions, including emotional behavior, social behavior, and aggression (Balázsfi et al., 2018; Ohmura et al., 2020; Teissier et al., 2015). The diverse regulatory processes are related to the connectivity of heterogeneous raphe groups (Muzerelle et al., 2016; Nectow et al., 2017; Schneeberger et al., 2019). Deciphering precise input and output organization of different neuron types in the DR and MR is fundamental for understanding their specific functions.”

      1. In the Results, I did not find any figure panel or images to show the anatomical location of the MR. Figure 1 shows only one injection site in DR. It is necessary to also show at least one representative injection site in the MR.

      As suggested, we added more information on the injection sites in Figure 1—figure supplement 2 and Figure 4—figure supplement 1.

      Figure 1—figure supplement 2. Validation of the labeling of whole-brain inputs. (A) Representative coronal section of the injection site showing the starter cells (cyan). The image is from a representative sample that labels the inputs to MR Gad2+ neurons. Scale bar, 1 mm. (B) Enlarged view of dotted box area in (A). Scale bar, 100 μm. (C) The number and on-target rate of labeled starter neurons, and the ratio of input neurons to starter cells. The data are from validation samples that label the inputs to MR Gad2+ neurons. Data are shown as mean ± s.e.m., n = 3. (D) Comparison of inputs to MR Gad2+ neurons.

      Figure 4—figure supplement 1. Validation of the injection sites of whole-brain outputs. (A) Representative coronal section of the injection site of a representative sample that labels the outputs of MR Vglut2+ neurons. The dataset has been registered to the Allen CCFv3. White dotted lines, MR in the Allen CCFv3; Yellow lines, segmented injection site. (B) Representative coronal sections of the injection site of the representative sample in (A). (C) Proportion of signal of the injection site in the DR/MR. Data are shown as mean ± s.e.m., n = 4 per group. Scale bars, A, 1 mm; B, 500 μm.

      1. This study is designed to map the input/output of two major populations of neurons (Glu+ and GABAergic) in the DR and MR using two cre-driver lines (Vglut2-cre and Gad2-cre). Please clarify how these two cre lines were characterized and whether those cre expressions are consistent with endogenous gene expressions. What are their distribution patterns in the DR and MR? Are they intermingled or relatively segregated? How are their distributions in comparison with that of serotonergic neurons?

      The Vglut2-Cre and Gad2-Cre mice were purchased from Jackson Laboratory and genotyped according to the instructions. To verify the expression characteristics and distribution patterns of Vglut2+ and Gad2+ neurons, we crossed each Cre driver line with a reporter line (Figure 1—figure supplement 1). In the DR, Vglut2+ neurons were mostly found in the rostral part of the DR, while Gad2+ neurons were widely distributed and densely assembled in the lateral DR. In the MR, Vglut2+ neurons were mainly found in the caudal part of the MR, and the Vglut2+ neurons in the rostral part of the MR were mainly distributed laterally; moreover, Gad2+ neurons were distributed throughout the MR. There are obvious differences between the overall distribution patterns of Vglut2+ and Gad2+ neurons in the same raphe nucleus. Compared with the distribution of serotonergic neurons (http://connectivity.brain-map.org/transgenic/experiment/100140881), Vglut2+ neurons seem to be relatively segregated from them, while Gad2+ neurons are intermingled with them.

      As Gad2-Cre generally labels all mature GABAergic neurons, while Vglut2-Cre only labels a population of glutamatergic neurons, and there are also numerous Vglut3+ neurons in the DR and MR, we decided to perform experiments to characterize the specificity of the labeled Vglut2+ starter cells. We performed in situ hybridization to characterize the specificity of labeled starter cells in the Vglut2-Cre mice and found that they were Vglut2 positive, with a few simultaneously being Vglut3 positive (Figure 1B,C; Figure 1—figure supplement 3), which was confirmed by immunohistochemical staining (Figure 1—figure supplement 4).

      Figure 1—figure supplement 1. Distribution and total number of Vglut2+ and Gad2+ neurons in the DR and MR. (A) Representative coronal sections of maximum intensity projection showing the distribution of Vglut2+ and Gad2+ neurons in the DR. The projections are 200 μm thick. Scale bar, 200 μm. The total number of Vglut2+ and Gad2+ neurons in the DR are presented as mean ± s.e.m., n = 2. (B) Representative coronal sections of maximum intensity projection showing the distribution of Vglut2+ and Gad2+ neurons in the MR. The projections are 200 μm thick. Scale bar, 200 μm. The total number of Vglut2+ and Gad2+ neurons in the MR are presented as mean ± s.e.m., n = 2. (C) Density plot of specific neuron types in the DR and MR along the anterior-posterior axis. Bin width, 100 μm. The shaded area indicates s.e.m., n=2.

      Figure 1—figure supplement 3. Characterization of the specificity of starter cells using in situ hybridization. (A) In situ hybridization at the MR in Vglut2-Cre mouse. (B) Enlarged view of the box area in (A). White arrows, starter cells. Scale bar, A, 200 μm, B, 20 μm.

      Figure 1—figure supplement 4. Validation of the specificity of starter cells using immunohistochemical staining. (A) Immunohistochemical staining against Vglut3 at the DR in Vglut2-Cre mouse. White arrows, starter cells. Red arrows, starter cells that are Vglut3 positive. (B) Immunohistochemical staining against Vglut3 at the MR in Vglut2-Cre mouse. White arrows, starter neurons. Red arrows, starter cells that are Vglut3 positive. (C) Control experiment, immunohistochemical staining against Vglut3 at the SSp in Vglut2-Cre mouse. Scale bar, 50 μm.

      1. Overall, the Discussion is not well organized. I suggest starting with a clear statement about the novel discoveries of this study in comparison with existing literature, and then comparing the overall input/output patterns of Glu+ and GABAergic populations in the DR and MR. The current discussion focuses on a few major targets (i.e., CEA, LH), but misses the big picture. Additionally, it is necessary and important to carefully compare their connectivity patterns with that of serotonergic neurons in these two raphe nuclei.

      As suggested, we have reorganized the Discussion. First, we compared the results with existing literature and pointed out the similarities and differences of connectivity patterns compared with that of serotonergic neurons. Then, we compared the overall input/output patterns of glutamatergic and GABAergic neurons in the DR and MR and discussed their implications for behavioral functions. Finally, we discussed the potential caveats in our viral tracing techniques and data analysis.

      Minor concerns:

      1. The Impact statement reads, "We reconstructed the input-output circuits of glutamatergic and GABAergic neurons in the dorsal raphe nucleus and median raphe nucleus and proposed a more refined model of the habenula-raphe circuit." When a comparison like this is put down, a specific reference to what your method is more refined than is required. This is well explained in lines 242 and 243, "Based on the conventional model of the habenula-raphe circuit (Hikosaka, 2010; Hu et al., 2020), we proposed a more refined model of the habenula-raphe circuit (Figure 5C)." Make a similar claim earlier in the Impact statement.

      We have revised the impact statement as follows:

      “Whole-brain quantitative input-output circuits of glutamatergic and GABAergic neurons in the mouse dorsal and median raphe nuclei were mapped using viral tracing and high-resolution optical imaging.”

      1. For Figure 2A, it would be easier on the reader if inputs for each region (DR and MR) and each plane of the section were placed on the same image akin to the inputs presented on coronal maps in Figures 2B and 5A and the inputs/outputs for each region (MR and DR) in the sagittal summary diagrams in Figure 7.

      For Figures 2A and 2B, we wanted to present the whole-brain inputs from different perspectives. For Figure 2A, as there were tens of thousands of input neurons and the input patterns were similar, placing the inputs on the same image would mix the colors and make them difficult to distinguish. Thus, we presented them separately in sagittal and horizontal views in Figure 2A. Further, we presented the inputs together on coronal maps in Figure 2B.

      1. It is unclear what the nonsignificant grey open circles represent in Figures 3A-D; 4D and E.

      In Figures 3A-D, the circles represent the proportion of input neurons in each brain region. If there is a significant difference between the inputs from one brain region to the two neuron groups, the circle is red and solid, and the name of the brain region is presented nearby. If there is no significant difference between the inputs from that region to the two neuron groups, the circle is gray and hollow. To highlight the brain areas with significant differences, the names of regions without significant differences are not presented. Moreover, we provided the source data in Supplementary File 2. Figures 4D and E follow the same convention as Figures 3A-D.

      1. In Figure 4A, the imaging portion would be clearer if it read "optical sectioning."

      As suggested, we revised the imaging portion to make it clearer.

      1. In Figure 7A and B, the position of ACA on the flatmap looks odd to me (it is a little bit too caudal).

      As suggested, we revised the position of ACA on the flatmap.

      Reviewer #2:

      This work from Xu et al. "Whole-brain connectivity atlas of glutamatergic and GABAergic neurons in mouse dorsal and median raphe nucleus" provided a comprehensive brain-wide analysis for input and output patterns to/from specific DR/MR neuronal populations in adult mouse brain. With exceptional strength in experimental approaches for high quality whole brain imaging that this group is famous for, their data and analysis are thorough and convincing for the general conclusion of the manuscript for describing both convergent and divergent patterns of DR/MR connectivity. While the current study is based on structural but not functional correlation analysis, the results are validated with prior knowledge of the field. It will provide a more complete picture to facilitate future investigation of DR/MR connectivity and physiological functions.

      The work would provide significant and useful knowledge for the field, while also promoting the generation and application of advanced brain-wide profiling resources to advance broad neuroscience research topics. However, there are still a few technical and analytical concerns that need to be addressed or discussed to refine the conclusions.

      Major concerns:

      1. For targeted injection-based analysis, it is critical to carefully analyze and discuss on-target vs off-target rates of labeled cells in DR/MR to validate the datasets. Whole mount data would best fit for such accurate analysis not possible before.

      As for the inputs, samples from the same batch of virus tracing experiments were treated as validation datasets to analyze on-target rates of labeled starter cells. For samples that label inputs to MR Gad2+ neurons, the on-target rate of labeled starter cells is 66.40 ± 2.78% (Figure 1—figure supplement 2C). We also counted the input neurons and calculated the ratio of input neurons to starter cells (Figure 1—figure supplement 2C). The input patterns of the validation datasets are consistent with those of the experimental datasets (Figure 1—figure supplement 2D). As for the outputs, we manually segmented the injection region and calculated the proportion of signal of the injection region in the DR/MR (Figure 4—figure supplement 1).
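
      For readers who want to reproduce this kind of summary from their own cell counts, the arithmetic behind an on-target rate, an input-to-starter ratio, and a "mean ± s.e.m." value can be sketched as follows. This is only an illustrative sketch: the function names and the per-sample counts are hypothetical placeholders, not values or code from the study, and the s.e.m. is assumed to be the sample standard deviation divided by the square root of n.

```python
import math
import statistics


def on_target_rate(on_target_starters, total_starters):
    """Percentage of labeled starter cells located inside the target nucleus."""
    return 100.0 * on_target_starters / total_starters


def input_to_starter_ratio(input_neurons, total_starters):
    """Number of labeled input neurons per labeled starter cell."""
    return input_neurons / total_starters


def mean_sem(values):
    """Return (mean, s.e.m.), with s.e.m. = sample SD / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to estimate the s.e.m.")
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


# Hypothetical counts for three validation samples (placeholders only).
samples = [
    {"starters": 500, "on_target": 330, "inputs": 20000},
    {"starters": 400, "on_target": 280, "inputs": 18000},
    {"starters": 450, "on_target": 270, "inputs": 21000},
]

rates = [on_target_rate(s["on_target"], s["starters"]) for s in samples]
ratios = [input_to_starter_ratio(s["inputs"], s["starters"]) for s in samples]

rate_mean, rate_sem = mean_sem(rates)
ratio_mean, ratio_sem = mean_sem(ratios)
print(f"on-target rate: {rate_mean:.2f} ± {rate_sem:.2f} %")
print(f"inputs per starter cell: {ratio_mean:.2f} ± {ratio_sem:.2f}")
```

      With only a handful of samples per group, the s.e.m. is itself a rough estimate, so reporting n alongside it (as the figure legends do) matters for interpretation.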

      1. It is also important to know what percentage of the cells get labeled over individual samples, and how many samples and in total what coverage/saturation over the entire anatomical structure has been achieved to justify a complete/comprehensive analysis.

      We counted the Vglut2+ and Gad2+ neurons in the DR and MR in crossed mice (Vglut2-Cre: LSL-H2B-GFP mice and Gad2-Cre: LSL-H2B-GFP mice; Figure 1—figure supplement 1A,B). As for the inputs to MR Gad2+ neurons, the labeling rate is 11.60±1.28 % (Figure 1—figure supplement 2C). As for the outputs, we counted the labeled Vglut2+ and Gad2+ neurons in the DR and MR and calculated the percentage (DR Vglut2+: 18.38±8.33%, DR Gad2+: 10.66±2.65%, MR Vglut2+: 43.67±8.25%, MR Gad2+: 11.10±2.09%) (Supplementary File 3). The data were replicated in 4 samples, which was comparable to previous studies of input and output circuits (Ährlund-Richter et al. 2019. Nature Neuroscience, 22: 657–668; Do et al. 2016. eLife, 5: e13214; Gehrlach et al. 2020. eLife, 9: e55585.).

      1. Further on last point, the labeling rates need to be small enough to warrant a more meaningful analysis in Figure 6. From another aspect, is there any anatomical correlation of the target sites in DR/MR for the distinct input/output clusters? This can probably be best addressed with single neuron resolution analysis that this group is good at. For the current study it is a vital part to include this detailed information for better resource to the field (e.g. to guide or map to future spatial transcriptomic analysis to study molecular-cellular correlations).

      Following on from the previous question, the labeling rate is low, which helps ensure that the analysis is meaningful. The analysis in Figure 6 implied that the glutamatergic and GABAergic neurons in the DR/MR might receive inputs from and project to various unions of brain regions. The brain regions in one cluster might be connected with the same subsets of specific neuron types. Brain regions that are negatively correlated might be connected with distinct subsets of specific neuron types (Weissbourd et al. 2014. Neuron 83: 645–662). As for the inputs to DR Vglut2+ neurons, Vglut2+ neurons receiving inputs from the SNc might be the same as those receiving inputs from the VTA and SNr, but distinct from those receiving inputs from the BST (Figure 6A). These implications are worth illustrating through complete single neuron reconstruction. However, single neuron reconstruction requires substantial time, which is beyond the scope of this work but is part of our future plans. Our datasets have been registered to the Allen CCFv3, which allows them to be directly integrated with other resources that use the same coordinate system. Spatial transcriptomics is a current research hotspot, but its spatial localization cannot yet reach the single-neuron level, and it is difficult to integrate with morphology. We are engaged in this research, but have not yet made significant progress.

      Reviewer #3:

      Xu et al utilize retrograde and anterograde viral tracing in Cre-transgenic mouse lines to map the inputs and outputs of glutamatergic and GABAergic neuronal populations in the dorsal (DR) and median raphe (MR) nucleus. The experiments generate a large anatomical dataset which the authors analyse with correlation analysis, revealing subtle differences in connectivity patterns between the targeted cell types and nuclei. The study furthermore focuses on the lateral habenula (LH) to raphe nucleus circuit, identifying large amounts of inputs from the LH to both glutamatergic and GABAergic DR and MR populations, but scarce projections from these cells back to the LH, with some cell-type specific differences. In particular, MR glutamatergic neurons send the strongest projections to LH among the targeted populations, supporting previous studies which identified this pathway as playing a role in aversive behaviors.

      Overall, this study nicely complements previously published work on whole-brain connectivity of the DR and MR which have chiefly focused on the main neuromodulatory neurons found in these nuclei, ie. serotonin and dopamine neurons. Some of the experiments in the study are not completely novel, such as input tracing to GAD2-expressing neurons in DR (Weissbourd et al, 2014). However, comprehensive side-by-side comparison analysis between glutamatergic and GABAergic connectivity of both DR and MR nuclei has not been performed before, and will provide a welcome resource to circuit neuroscientist looking to elucidate functional circuits of the raphe nuclei. A further strength of the study is the high-resolution 3D imaging, revealing three distinct projection pathways from MR glutamatergic neurons to LH.

      Two main concerns regarding the study are:

      1) The authors do not sufficiently justify the use of Vglut2 as a marker for glutamatergic neurons in DR and MR. The majority of previous studies, especially of the DR, use another glutamatergic marker which is more specifically expressed in the raphe nuclei, namely Vglut3. Vglut3 is much more anatomically restricted to the DR and MR (but has also been shown to partially overlap with serotonergic expression). In contrast, Vglut2 is very broadly expressed throughout the brain and in regions adjacent to DR and MR. For this reason, and from the data in the main manuscript as well as raw microscopy images provided in the accompanying website, it is unclear how specific the starter neuron targeting really is. The authors should show more detailed starter neuron analysis for both the broadly expressed Vglut2 and Gad2 in the DR and MR, showing the histology of the helper virus BFP and RV-ΔG-EnvA-GFP, their anatomical locations, and some quantification of proportion of starter cells within DR/MR (Fig 1B-C shows it only for Vglut2, but in insufficient detail). Furthermore, a rationale for using Vglut2 instead of Vglut3 would be appreciated, especially given that the vast majority of functional studies of the DR have used Vglut3.

      The authors also miss the chance to characterize the topography of Vglut2 and Gad2 starter cell expression within the DR and MR and emphasize the interesting differences between these two populations, which may be relevant to the differences in input and output connectivity.

      We added more information on the starter cells in Figure 1—figure supplement 2. We also performed in situ hybridization and immunohistochemical staining to characterize the specificity of the labeled Vglut2+ starter cells. The labeled starter cells were Vglut2 positive, while a fraction of them were also Vglut3 positive (Figure 1B, C; Figure 1—figure supplement 3,4).

      Glutamatergic neurons in the DR and MR mainly comprise Vglut2+ neurons and Vglut3+ neurons, but numerous Vglut3+ neurons are also serotonergic (Huang et al. 2019. eLife, 8: e46464; Pinto et al. 2019. Nature Communications 10, 4633–4633; Sos et al. 2017. Brain Structure and Function, 222: 287–299). The anatomical connections of serotonergic neurons in the DR and MR have been well studied (Pollak Dorocic et al. Neuron. 83: 663–678; Ren et al. 2019. eLife 8: e49424; Weissbourd et al. Neuron. 83: 645–662). DR and MR Vglut2+ neurons are relatively independent of Vglut3+ neurons. They have been shown to regulate many functions, such as emotional behaviors (Szőnyi et al. 2019. Science 366: eaay8746), but their whole-brain connectivity remains incompletely characterized. Thus, we chose to study the inputs and outputs of Vglut2+ neurons.

      There are also differences in the distribution of Vglut2+ and Gad2+ neurons in the DR/MR (Figure 1—figure supplement 1). These differences might be relevant to the differences in input and output connectivity and merit investigation in our future studies.

      2) The quantification throughout the manuscript refers to the relative proportion of inputs or outputs for each cell population and nucleus. The manuscript would be strengthened by also including total cell counts for starter cells in each group, as well as total numbers of input neurons. For example, is the Vglut2 population in DR much larger than the Gad2 population, and do DR Vglut2 neurons receive more inputs in total than DR Gad2 neurons? Including raw numbers would provide concrete information to contextualize connectivity patterns between cell types and nuclei for the readers.

      We added the number of input neurons in Supplementary File 2. As we discussed in lines 413-415, the monosynaptic rabies tracing technique might label only a portion of inputs, and the labeling could be biased toward specific neuron types and affected by many factors. Further, the ratio of the number of input neurons to starter cells varies over a wide range (Callaway and Luo. 2015. The Journal of Neuroscience, 35: 8979–8985). Thus, a larger population of a specific neuron type does not necessarily indicate that it receives more inputs.

    1. Author Response:

      Reviewer #2:

      To address the roles of NmMetQ protein, the authors used multiple biochemical and biophysical techniques to characterize the structure and function of NmMetQ without and with its cognate ABC transporter NmMetNI. However, considering the similar substrate binding protein EcMetQ from E. coli has been experimentally verified to be a lipoprotein, the major conclusion of this manuscript is not particularly novel. Besides, the authors should address some points to further strengthen their conclusion.

      Major points:

      1) The LC-MS results suggest that the recombinantly expressed and purified lipo-NmMetQ protein has lipid modifications, mainly deduced from the molecular masses. Did the authors perform other experiments to further support the presence of lipid modifications?

      In addition to using LC-MS to demonstrate that recombinantly expressed full-length NmMetQ leads to the production of lipid-modified NmMetQ, we changed the cysteine residue at position 20 to alanine. If NmMetQ was not a lipoprotein, we would expect the mass change to reflect the mass difference between cysteine and alanine. However, if full-length NmMetQ was a lipoprotein, this amino acid change would prevent lipid modification and lead to the accumulation of pre-protein NmMetQ. LC-MS analysis of both the full length and C20A NmMetQ proteins support our assertion that recombinant expression of full-length NmMetQ leads to the production of lipidated NmMetQ. Furthermore, the size exclusion chromatography trace illustrated in Figure 1B reveals that only a very small amount of unacylated wild-type protein is present (the small bump near 100 mL), indicating that the extent of lipidation of the wild-type protein is nearly complete.

      2) I noticed that the NmMetQC20A protein was also purified with DDM detergent, could the mutant protein be purified without detergent? And could the WT protein be purified without detergent? This experiment could be an additional evidence to support the absence of lipid modifications on the mutant protein and presence of lipid modifications on the WT protein.

      To maintain consistency between experiments, all proteins were purified in the presence of detergent. No attempts were made to purify NmMetQC20A or full-length NmMetQ in the absence of detergent. We speculate that maximal extraction of lipo-NmMetQ and pre-protein NmMetQ would be difficult in the absence of detergent, since the lipid moiety and the N-terminal signal sequence of these proteins are believed to be associated with membranes. However, secreted NmMetQ should readily purify in the absence of detergent, as we observed in our previous study with an NmMetQ construct in which the signal peptide was truncated (see reference below).

      Nguyen PT, Lai JY, Kaiser JT, Rees DC. Structures of the Neisseria meningitides methionine‐binding protein MetQ in substrate‐free form and bound to l‐ and d‐methionine isomers. Protein Science. 2019 Oct;28(10):1750-7.

      3) It is very interesting to see that the lipid moiety of lipo-NmMetQ is required for maximal NmMetNI stimulation, especially compared to the secreted NmMetQ. This result suggests that the lipid moiety could participate in the NmMetNI stimulation directly. But the lipid moiety could not be resolved in the lipo-NmMetQ:NmMetNI complex structure, probably due to the limited resolution at 6.4 Å. This point is quite novel, unfortunately, this manuscript provided little insight on this.

      We agree that this is an important point; as noted above in our response to Essential Revision #1, it was not possible to increase the resolution of the lipo-NmMetQ:NmMetNI structure; in view of the observations by Liu et al also noted there, it is possible that the lipid is poorly ordered and not visible even in higher resolution structures.

      4) The inward-facing NmMetNI structure was resolved in the presence of lipo-NmMetQ and AMPPNP, but only the apo NmMetNI structure was captured. This is unexpected, and the authors should comment on this.

      Thank you for the suggestion. We have expanded our Results section to comment on why we believe no complex formation was detected.

      5) The bioinformatic prediction of the distribution of lipid-modified MetQ proteins in different classes of Proteobacteria is very weak. As the authors mentioned in the Discussion part, future efforts should be made to experimentally determine which SBPs have lipid modifications. I would suggest that the authors should move the Fig. 5 to supplementary information and move the corresponding text to Discussion.

      We thank the reviewer for this suggestion. The observations from the bioinformatic analysis demonstrating that many Proteobacterial families can possess SBP lipoproteins, however, are important to establish the general relevance of our biochemical and structural findings. We therefore respectfully request that this data remain in the results section.

      Minor points:

      1) The authors should provide the details for cryo-EM sample preparations in the Methods and Materials, such as how the protein complex was assembled, the protein complex concentrations, AMP-PNP/ATP concentrations, methionine concentrations, et al.

      Thank you for noting this - we have expanded our sample preparation section to include the information requested.

      2) Please state the ATP concentration used in ATPase experiments in Methods and Materials.

      Thank you for noting this - we have added the information requested to the Methods and Materials.

    1. Author Response:

      Reviewer #2:

      Wodeyar and colleagues describe a new method for phase estimation and compare their method to a range of previously published approaches. Using a state space model, they separately model the signal and noise, and demonstrate accurate phase tracking for broadband signals, and in the presence of multiple rhythms and phase-resets.

      The major strength of the manuscript is the ability to track broadband signals without the need to use bandpass filters and to better distinguish between multiple rhythms, even those which are quite close in frequency. The methods and results segments are very well written and describe the approach in great detail. The manuscript also allows the reader to compare multiple methods, commonly used in the field. Processing rhythms without the need for a threshold based method is an added contribution of the method.

      The main weaknesses of the manuscript are (1) not being able to compensate for non stationary rhythms (2) and in-vivo phase estimation accuracy. For real-time closed loop phase-locked stimulation, stimulation itself has been shown to speed-up / slow down target rhythms depending on the stimulation angle, and also different rhythms have been shown to drift over time, therefore compensating for non stationary centre frequencies could be critical for such applications. Based on previously published phase-locked stimulation papers, an average 60 degree phase estimation accuracy (in vivo) may not be sufficient to determine effective stimulation parameters.

      While the paper makes a great contribution to phase estimation by removing the dependency on filters, whether or not this would actually improve applications (with respect to already trialled approaches) remains unclear.

      In the revised manuscript, we now discuss these two important points raised by the Reviewer, and how the method developed here can help address these points:

      (1) If we have understood the Reviewer correctly, the concern is that the underlying statistics/distribution of the rhythmic parameters vary over time, e.g., the central frequency and/or bandwidth of the rhythm may vary over time. The SSPE method, by virtue of the model structure (which permits stochastic frequency modulation) and the Kalman Filter (correcting the instantaneous frequency to better fit the observation) can track small changes in the center frequency. However, to the best of our knowledge, no current method for real-time phase estimation performs well with non-stationary rhythms. A potential future approach would incorporate an extended Kalman Filter that adjusts parameters of the model while filtering the state.

      We have updated the Discussion to include this important point as follows:

      Discussion, Limitations and Future Directions: “... Rhythms, observed over time, may be better represented by models with changing parameters. Indeed, non-stationarity is an important issue to consider when tracking brain rhythms. The SSPE method is robust (by virtue of the model structure and application of the Kalman Filter) to small changes in the central frequency or the bandwidth of a rhythm (e.g., Figure 2). However, non-stationary rhythms require new algorithms to be developed (such as the PLSO method in Song et al., 2020). An extension of the SSPE method that could potentially track changing central frequencies is to apply an extended Kalman Filter (Schiff, 2012) that simultaneously estimates the frequencies of interest while filtering the state. To address this, the SSPE method could also be extended to implement a switching model that utilizes multiple sets of parameters and switches between parameter sets as necessary, or by refitting the SSPE model as time evolves. ...”
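
      To make the idea of state-space phase tracking concrete, the following is a minimal sketch, not the SSPE implementation itself. It assumes a single rhythm modeled as a damped, noise-driven rotation of a two-dimensional state, with the signal observed as the first state component plus white noise. The function names (`rotation`, `kalman_phase`) and all parameter values (`damping`, `state_var`, `obs_var`, the 6 Hz test rhythm) are illustrative assumptions; in practice the parameters would be fitted to data rather than fixed by hand.

```python
import numpy as np

def rotation(omega):
    """2x2 rotation matrix that advances an oscillator by omega radians."""
    c, s = np.cos(omega), np.sin(omega)
    return np.array([[c, -s], [s, c]])

def kalman_phase(y, freq_hz, fs, damping=0.99, state_var=0.001, obs_var=0.09):
    """Causal phase estimate (radians) of one rhythm in a noisy signal y.

    Assumed model (illustrative, fixed parameters):
      state:       x_t = damping * R(omega) x_{t-1} + w_t,  w_t ~ N(0, state_var * I)
      observation: y_t = x_t[0] + v_t,                      v_t ~ N(0, obs_var)
    """
    F = damping * rotation(2 * np.pi * freq_hz / fs)
    Q = state_var * np.eye(2)
    H = np.array([[1.0, 0.0]])
    x = np.zeros(2)          # state mean
    P = np.eye(2)            # state covariance
    phase = np.empty(len(y))
    for t, obs in enumerate(y):
        # Predict one step ahead.
        x = F @ x
        P = F @ P @ F.T + Q
        # Correct the prediction with the new observation.
        S = (H @ P @ H.T)[0, 0] + obs_var      # innovation variance
        K = (P @ H.T)[:, 0] / S                # Kalman gain
        x = x + K * (obs - x[0])
        P = P - np.outer(K, H @ P)
        # Phase is the angle of the state vector.
        phase[t] = np.arctan2(x[1], x[0])
    return phase

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, f = 200.0, 6.0
    t = np.arange(0, 5, 1 / fs)
    true_phase = 2 * np.pi * f * t
    y = np.cos(true_phase) + 0.3 * rng.standard_normal(t.size)
    est = kalman_phase(y, f, fs)
    err = np.abs(np.angle(np.exp(1j * (est - true_phase))))
    print("mean absolute phase error (deg):", np.rad2deg(err[200:].mean()))
```

      Because the center frequency is fixed in this sketch, it can only follow small frequency deviations through the state noise term, which is the limitation that the extended Kalman Filter or switching-model extensions described above would aim to relax.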

      (2) To address the issue of in-vivo phase estimation accuracy, we first note that, in the original manuscript, we estimated the error in phase using all in-vivo data, which effectively serves as an upper bound on the possible error when estimating phase in real time. In the revised manuscript, we now consider an additional constraint of thresholding based on the credible interval width. Doing so, we find that the accuracy of the SSPE method dramatically improves. We include this new analysis in the Results:

      Results, Confidence in the Phase Estimate: “... at other times - when the theta rhythm is less obvious - the credible intervals expand. We can restrict error in the phase estimate by examining only samples with small CI width. When thresholding at 10 degrees of credible interval width, we find that the error decreases from (mean) 46.9 (s.e. 0.89) degrees to 26.91 (s.e. 1.07) degrees, while still retaining 27 percent of the data. This is within the current state-of-the-art in phase estimation with minimum of 20 to 40 degrees error as a function of SNR (when applying a sufficiently high amplitude threshold, Zrenner et al., 2020).”

      Additionally, for the second in-vivo analysis of the EEG mu rhythm we have added a similar analysis:

      Results, Example in-vivo Application: human EEG: “... When a strong mu rhythm emerges, tight credible intervals surround the mean phase estimate tracking the rhythm of interest; as the mu rhythm wanes, the credible intervals expand. When we restrict analysis to samples with credible interval width less than 25 degrees, the error drops from 40.5 (s.e. 1.1) to 11.2 (s.e. 0.46) degrees while still retaining 16.4 percent of the data. As with the LFP, we are able to assess certainty in our phase estimates using the credible intervals.”
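
      The thresholding step described in the two passages above can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the inputs are assumed to be per-sample phase estimates, reference phases and credible interval widths in degrees, and the toy numbers are made up for demonstration and are not taken from the LFP or EEG data.

```python
import numpy as np

def circular_abs_error_deg(est_deg, ref_deg):
    """Absolute phase error in degrees, wrapped to the range [0, 180]."""
    diff = np.deg2rad(np.asarray(est_deg, dtype=float) - np.asarray(ref_deg, dtype=float))
    return np.abs(np.rad2deg(np.angle(np.exp(1j * diff))))

def error_below_ci_threshold(est_deg, ref_deg, ci_width_deg, max_width_deg):
    """Mean circular error over samples whose CI width is below the threshold.

    Returns (mean error in degrees, fraction of samples retained).
    """
    err = circular_abs_error_deg(est_deg, ref_deg)
    keep = np.asarray(ci_width_deg, dtype=float) < max_width_deg
    if not keep.any():
        return float("nan"), 0.0
    return float(err[keep].mean()), float(keep.mean())

if __name__ == "__main__":
    # Made-up toy values: four samples, two of which have narrow credible intervals.
    est = [10, 350, 100, 200]
    ref = [0, 10, 40, 190]
    ci_width = [5, 8, 60, 30]
    mean_err, retained = error_below_ci_threshold(est, ref, ci_width, max_width_deg=10)
    # Mean error is about 15 degrees over the retained half of the samples.
    print(mean_err, retained)
```

      The wrap-around in `circular_abs_error_deg` matters because an estimate of 350 degrees against a reference of 10 degrees is a 20 degree error, not a 340 degree one.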

    1. Author Response:

      Reviewer #1:

      The manuscript by Takahashi et al describes the interaction between MLL fusion proteins with HBO1 and its role in leukemogenesis. Myeloid progenitor transformation assays using various MLL fusion proteins reveal that MLL fusion proteins require the TRX2 domain of MLL for effective leukemic transformation. IP-MS identifies HBO1 as a bona fide binding partner of the MLL TRX2 domain. ChIP-seq experiments show genome-wide colocalization of HBO1 complex with MLL-ENL and the WT MLL in MLL-fusion leukemia cells and MLL WT cells, respectively. ChIP-qPCR in MLL-deficient cells suggests that recruitment of HBO1 to MLL target genes (such as MYC and CDKN2C) depends on MLL. Truncation analysis of the ELL part of the MLL-ELL fusion reveals that MLL-ELL transformation activity requires OHD domain-mediated recruitment of AF4 and EAF1. Furthermore, co-IP and ChIP experiments with various fragments show that AF4 and EAF1 form two distinct SL1/MED26-containing complexes and likely the AEP/SL1/MED26 complex is competent for transactivation. Series of transformation assays suggest that MLL-ELL transforms hematopoietic progenitors via association with AEP, but not other ELL-associated proteins. Finally, the authors also show that NUP98-HBO1 fusion transforms myeloid progenitors through interaction with MLL. Overall, this is a quite comprehensive study demonstrating that various MLL fusions and NUP98 fusions transform hematopoietic progenitors via HBO1-MLL interaction, which suggests that targeting their interaction might be a new therapeutic approach.

      We appreciate the comments and inputs from the reviewers.

      Reviewer #2:

      In this manuscript, the authors identified an interesting interaction of MLL (a methyltransferase) with an HBO1-JADE complex (an acetyltransferase) and investigated the functional impact in leukemogenesis by fusion proteins containing MLL or HBO1. The data is clear and the connection between MLL and HBO1 is unexpected. The manuscript is also well organized and relatively easy to follow.

      Comments:

      1) The functional relevance of the interaction between MLL and HBO1 is still correlative. It would be important to know whether there are any results directly about the impact of the loss of the HBO1 complex on the function of MLL.

      We performed a sgRNA-dropout assay which showed that HBO1 is critically required for the survival of leukemia cell lines, as depicted in Figure 2F and Figure 2-figure supplement 3.

      2) It is important to show the source and specificity of the antibodies that were used for ChIP of the HBO1 complex.

      The details of the antibodies are provided in Key Resource Table.

      3) It might be interesting to check whether other JADE proteins and also BRD1 (another partner of HBO1) are involved.

      We agree that it would be very interesting to examine the involvement of other JADE/BRPF family proteins in the future because they share the ING4/5 subunits and BRD1 plays an important role in hematopoiesis (1). This can be addressed in future studies.

      4) The acronym TRX2 may be confusing as some might think that it is thioredoxin.

      As advised, we have changed this to THD2 (TRX homology domain 2).

      Reviewer #3:

      This paper starts with a series of bone marrow transformation assays comparing MLL fusions and domain-deletion mutants thereof to define the minimal elements for robust leukemic transformation and surveying growth and attendant common fusion targets HoxA9, Meis1 in colony replanting assays. Here they discover that a region of the MLL-N portion just upstream of the well-studied CXXC domain, termed in their previous work the "TRX2 domain" is important for the transformation capacity for several different MLL-fusions (and more minimal chimeras of key modules). A small region of the MLL-N protein encompassing the TRX2 domain and the CXXC module are subjected to complex purification, it is clear from comparison to number of controls that the TRX2 domain is an important mediator of association, perhaps indirect, with the HBO complex. Drop out experiments confirm that HBO1 knockout is lethal to MLL-rearranged leukemia, nicely confirming recent work (Ay et al., MacPherson et al.).

      ChIP-seq experiments in an ALL with MLL-ENL fusion, and then more extensively in a kidney cancer cell line indicate overlap with some of the HBO complex subunits and MLL, however this does not establish recruitment at these sites. ChIP-qPCR at a few MLL-fusion target genes with MLL depletion supports the recruitment hypothesis somewhat although mixed and modest effect sizes indicate that alternate pathways for HBO1 recruitment are involved, and could also be explained as reduced deposition of marks known to recruit HBO1, rather than direct recruitment. Sadly, the real potential strength of this work goes unrealized, as the recruitment of HBO1 mechanism remains tantalizingly out of reach. More experiments in this space could conclusively establish the molecular mechanism of a seemingly biomedically important recruitment paradigm, and thereby have much more impact.

      As the reviewer pointed out, MLL is not the only element that recruits the HBO1 complex to the target chromatin. MLL is known to deposit H3K4me marks, and the HBO1 complex is known to recognize these marks via ING4/5 subunits. We performed a ChIP-qPCR analysis of H3K4me3 in MLL-knockout cells. At the MYC promoter, the H3K4me3 marks were substantially decreased (Figure 3F). Moreover, recruitment of HBO1 was not recovered by transient expression of an MLL mutant containing THD2, indicating that the presence of H3K4me3 marks is a prerequisite for HBO1 recruitment. In accordance with this, ING5-histone interaction is required for the stable association of MLL with the HBO1 complex (Figure 8A-C). Thus, a more appropriate molecular mechanism would be the cooperative recruitment of the HBO1 complex by ING4/5-mediated chromatin association and MLL-mediated association. Because of the multiple contacts involved in this molecular network, it is not easy to pinpoint the direct contacts as desired, but our biochemical analyses indicate that PHF16 and ING4/5 offer relatively strong binding surfaces (Figure 8A-C). The ING domain of ING5 is the most likely direct binding surface identified thus far.

      At this point the paper shifts to a seemingly distinct line of inquiry, which is not closely related to the HBO1-TRX2 story to the first three figures. The new direction examines the ELL fusion partner in some detail using similar fusion protein chimeras, but a portion of Figure 4, is largely confirmatory of previously established findings about the critical regions of ELL for transformation and its AF4/EAF1 partners, adding only that portions of the MLL fusion protein are dispensable, provided that they are replaced with the PWWP of LEDGF. It is a little bit of a Frankenstein's monster experiment, and does not add much new to the field. Further experiments define potentially two distinct complexes that have already been characterized being recruited by ELL, although there is overlap here again with their previous studies, and the results are a little hard to interpret.

      A portion of Figure 4 was confirmatory to previous results. We have moved this to figure supplements in the revised manuscript (Figure 4-figure supplement 1B,C). The main topic of this paper is the role of the HBO1 complex in MLL-mediated transactivation pathways. The structure/function analysis of MLL fusion proteins demonstrated that MLL-ELL is highly dependent on the HBO1-mediated function in leukemic transformation (Figures 1 and 2). Hence, it was important to clarify the mechanism of gene activation by MLL-ELL in this study to understand why HBO1 association is required for MLL-ELL-mediated transformation. Because MLL-ELL associates with AEP similarly to major MLL fusions such as MLL-AF4 and MLL-ENL, it was speculated that MLL-ELL also activates its target genes via AEP. However, ELL associates with EAF family proteins and MLL-EAF also has transforming ability (3). Thus, EAF1-mediated functions could be more important for MLL-ELL-mediated transformation rather than AEP-mediated functions. To clarify the mechanism of MLL-ELL-mediated transformation, we generated a point mutant that selectively impaired ELL-EAF interaction and demonstrated that EAF1-association is dispensable for MLL-ELL-mediated transformation (Figure 6), thereby indicating that MLL-ELL transforms via AEP-mediated functions, which demands HBO1-mediated functions. We also showed that the presence of THD2 enhances ELL-AEP association to further suggest that one of the roles of the HBO1 complex is to enhance the association of ELL with AEP (Figure 6E). These findings are not reinterpretations of our prior results and are relevant to the main topic of this paper. We believe this part adds new information to the field, and therefore we have included it in the revised manuscript.

      The authors create structure-guided separation of function mutants in the ELL domain that binds both AEP and SL1, permitting them to specifically disrupt EAF1 interactions but not AF4. Further experiments solidify this interpretation, and find that this mutant shows no deficits in hematopoietic progenitor transformation or primary leukemia lethality, although there appears to be some effect upon reimplantation.

      The last figure in the paper tackles the seemingly unrelated Nup98-HBO1 fusion, a rare patient mutation-they demonstrate a requirement for MLL for viability of hematopoietic progenitors transformed by this fusion, connecting back to the TRX2 interaction, and show that menin inhibitors slow growth.

      Strengths:

      The identification of the TRX2 region of the MLL-N protein as the major point of contact (perhaps not direct), to the HBO1 complex adds mechanistic depth to the really important recent discovery (confirmed in this work) the MLL-fusion leukemias rely on HBO1 function. This lab has published a number of technically similar types of papers defining minimal regions of MLL and distinct interacting partners by chimeric fusions, with bone marrow transformation assays, mouse model engrafting studies, IP's, ChIP etc. In my view they are very much under cited, likely because they are similarly so challenging to read.

      Thank you for your pointed feedback. We will try our best to make the necessary improvements so that our papers are widely read and cited.

      The mixture of Co-IP biochemistry, bone marrow transformation assays, and ChIP, to define interactions, minimal requirements for transformation, and their chromatin consequences for a host of different MLL-fusions and HBO1-fusions has the potential to define the key interfaces underlying recruitment.

      Weaknesses:

      The mechanistic inquiry stops short of really defining the critical MLL-HBO1 complex interface. Defining the point of contact on the HBO1 side (even which subunit) and determining whether it is direct, or bridged by some, as yet unidentified factor, as well as conclusively demonstrating that this is the mechanism of HBO1 recruitment remain the major shortcomings.

      To address this criticism, we further investigated the mechanism of complex formation by MLL and the HBO1 complex. As we demonstrated in Figure 8A-C, the association appears to be mediated by multiple contacts mainly through PHF16 and ING4/5. Because this association needs an intact PHD finger of ING5, it likely depends on the context in which ING4/5 is bound to histone H3K4me2/3. The ING domain of ING5 was also required for the association, indicating that this portion may contain a point of direct contact. We speculate that HBO1 recruitment is mediated primarily by ING4/5-H3K4me3 interaction and MLL reinforces its chromatin association.

      And the follow-on figures apart from the last one, appear disconnected from this portion of the story and distract from it.

      We depicted a revised model incorporating the above-mentioned aspect in Figure 8D of the revised manuscript.

      The complex nomenclature and density/organization/logic of the presentation of experiments makes this paper difficult to read. Absence of sufficient grounding in the broader literature much beyond their own lab's work further compounds the problem.

      We changed some of the nomenclature and density/organization/logic of the presentation of the experiments to improve the readability.

      There is a lot of overlap, particularly in parts of figure 1 and figure 4 with previously published results. So perhaps re-organizing the display of data, and the organization of presentation, putting confirmatory work in the supplementary figures, would improve accessibility.

      We moved some portions of Figure 4 to the figure supplements. The data for MLL-AF10 and MLL-ENL were retained in Figure 1 as important references.

    1. Author Response:

      Reviewer #3:

      Weaknesses:

      • The in vivo suppression phenotype is relatively minor and suggests that other factors or pathways play a significant role in vivo. For example, the Diffley lab has previously shown that deletion of EXO1 almost completely suppresses the drug sensitivity of rad53∆ cells. This suggests that the Rad53-Mrc1 axis of regulation described here remains insufficient to prevent the accumulation of Exo1-sensitive lesions or conditions.

      We agree that the MRC18D rescue of the rad53Δ sensitivity to HU and MMS is small. We note, however, that it is reproducible and suggest that it may be only one part of Rad53’s complex role in protecting replication forks during the replication checkpoint. Because Mcm10 cannot be phosphorylated in MRC18D rad53Δ cells, replication speed is likely to be faster than in RAD53 wild-type cells, where both Mrc1 and Mcm10 would be phosphorylated. Thus MRC18D by itself may not provide sufficient fork slowing to protect forks. Though outside the scope of this work, it would be interesting to identify an Mcm10 mutant to combine with the Mrc1 mutant and see if this further rescues rad53Δ sensitivity. We have added this point to the discussion on page 14. We also note that the ability of EXO1 deletion to suppress drug sensitivity has turned out to be more complex than originally believed because of suppressors which accumulate rapidly in rad53 mutants (Gómez-González et al, Genetics, 2019), hence the importance of tackling this issue with biochemistry.

      • Despite the development of a quantitative method for DNA unwinding and the high quality of the data, there is no quantitative analysis of the data by statistical method. At least there needs to be clear evidence of reproducibility.

      We have now added the details of how we quantify the unwound DNA in the helicase assay and how we extract the unwinding rate to the Experimental Procedures “CMG helicase assay” section on page 17. We have also added the 95% confidence interval for the unwinding rate in the “Mrc1 regulation of replication rate” section of the results on page 8.

    1. Author Response:

      Reviewer #1:

      Heinze and colleagues present InsectBrainDatabase (IBdb), a resource that collects, displays and shares mostly neuroanatomical data from several insect species. Users are able to search and visualise neuronal morphologies in several ways, brain regions and species' information. On the whole, the site is very well built, with a clear intent on providing a good user experience, both for experienced insect researchers and naïve users. The authors have designed the site in a way that it could be used as a general data hub, pre-publication, with a clear versioning system, and this will be appealing for some researchers. The manuscript describes the site accurately, focusing on how a user would interact with it.

      Although the authors intend for IBdb to become a major resource for their community, it is not made clear how they will increase data deposition beyond simplifying this process. With a distributed curation approach, and the expectation that researchers will be submitting their own data, it is also not clear how they will ensure consistency, completeness and accuracy of data curation, an essential aspect to broaden the usage of their platform, guarantee transparency to users, while, at the same time, enriching the amount of searchable data.

      There are three major incentives for data deposition that go beyond providing a simple and intuitive deposition procedure. First, our database serves not only as an interactive data repository, but as a data management tool at the same time. The possibility to securely store, manage and visualize unpublished data in an online platform is, to our knowledge, unique and, once routinely used, will make data deposition a natural part of the general workflow in a research laboratory. With increasingly mandatory data management plans for researchers, we provide an intuitive, integrated solution for data management and deposition for all researchers in the field of insect neuroscience, a scope that to date is not met by any other repository. Second, the fully automatic access granted by our API for both upload and download of data enables users to exploit the database structure to easily integrate their own data (as well as accessible data of other researchers) into automatic data analysis workflows. This automatic interface also enables third party applications to access deposited data and to reuse it for meta-analysis, computational modeling, etc. Thus, depositing data increases the visibility of the data owner via an increased chance of reuse. Finally, the visualization tools available for data (especially anatomical data) on the IBdb go beyond what is easily achievable by single users and common free software. Freely combining one's own (published and unpublished) neuronal morphologies with data from other researchers is currently time consuming and difficult at best. The tools we provide significantly aid state-of-the-art data visualization without requiring any previous training.

      We have emphasized these aspects more in the abstract as well as the main paper.

      1) The authors use the term 'cell type' often in the manuscript, and a few times in the user guide. The concept of 'cell type' is an essential one for neuroscience researchers. However, I was unable to find any reference to it on the site, particularly in the information pages for neurons. This led to ambiguity about what data was displayed, what constitutes a cell type and how I would search for it. For example, a pair of neurons that look of the same type are am-AMMC-1 (NIN-0000159) and am-AMMC-1' (NIN-0000171), although they have distinct, yet related, full names. A similar case for am-AMMC-2' (NIN-0000172) and am-AMMC-2 (NIN-0000160) although it looks like this could be in fact the same cell - the neuron image caption of the former notes that it is a mirror image of the actual cell though this information is not given elsewhere. Lastly, the authors mention that they do not require neuron morphologies to be registered to a particular template. This point only makes it more essential for neuron types to be transparently labelled.

      The reviewer addresses a very important point and we have made several additions to the manuscript and the database to address it. We have added a cell type definition in the paper and at several locations on the database site, as well as in the user guide (“All neurons of one brain hemisphere that exhibit identical projection patterns are defined as belonging to the same cell type. In many cases a cell type will consist of one individual neuron, while in other cases, many identical neurons will comprise a cell type. In those cases, the first level of similarity beyond the individual neuron will be defined as the cell type.”). We have added a short section to the results, addressing this issue. We also added a new supplemental figure illustrating our cell type definitions with examples.

      Regarding the specific examples mentioned by the reviewer: The mirrored bee neurons were a leftover from early stages of the database before consistent standards were in place. These datasets have been archived (see novel function below), so that they are no longer searchable, but remain accessible for anyone with the exact handle.

      With respect to cell types, we essentially pursue a connectivity based cell type definition in the long run, i.e. neurons that have the same up- and downstream connections belong to the same cell type (as in Hulse et al., 2020). Yet, while connectivity data will increasingly become available, it does not exist for most species. Therefore, a first approximation is an identity defined by neural projection fields. As two neurons with identical projections can still have different connections, these projection based definitions will likely have to be subdivided into several cell types once connectomics data becomes available. We have additionally added a discussion point to the database forum to enable a public discussion about these definitions, their limits and practicalities.

      Several issues were considered in the process:

      First, neuron types on the right and left hemisphere. While we assume that the insect brain is largely symmetric with respect to the midline, asymmetries are present and one cannot simply assume that all neurons from one side of the brain are completely identical to the other side. To not obstruct discovering potential systematic differences between hemispheres (and for practical data handling issues), we define analogous neuron types from the right and left hemisphere as separate cell types. This is also in line with the fact that they originate from different neuroblasts, hence providing a larger degree of compatibility with lineage based cell type definitions.

      Second, modular brain regions. In regions like the central complex and the optic lobe, highly similar neurons occur in isomorphic sets of repeating individuals. In the case of the central complex columnar cells, we classify those as different cell types (according to column identity), as it has become clear recently that the CX columns are not identical copies of each other, and they additionally derive from different neuroblasts. For the thousands of repeating modules in the optic lobe, this approach is not practical, as it would require that each columnar neuron (e.g. of the medulla) has a firmly assigned ommatidium. Additionally, while not all optic lobe cartridges can be expected to be identical, there will be a limited number of types (e.g. corresponding to ommatidial types). If those can be reliably identified, cell types can be assigned based on this information. Practically, the columnar processing systems in the optic lobe will be collapsed to a single column in the database, or to as many required to capture the ommatidial diversity in a species.

      Third, definitions of cell types across species: We define cell types only within single species. This is because a neuron could have the same function, but different developmental origin, the same origin but different connections, etc. While neurons with identical projection patterns will emerge when comparing across species, the thereby suggested homology will remain hypothetical unless developmental studies are conducted on these cells. This in fact highlights a major value of the insect brain database, i.e. the possibility to identify shared features of neural circuits across species and the development of novel hypotheses based on this information.

      Highly organized neuropils: In neuropils like the mushroom body, antennal lobe, or the central complex, cell types can be embedded in a hierarchy, including supertypes, neuron families, and cell classes. As for species phylogeny, the same terms are likely not always representing the same level in the hierarchy (i.e. a supertype in one brain region might correspond to a class in another), but such terms nevertheless provide helpful guidance for classifying neurons in complex and organized brain regions. For example, in the CX, there are many types of columnar neurons, each defined by their specific projection patterns (e.g. all individuals of the PFNv-R3 type). Yet they can be grouped into several supertypes, defined by the projection patterns without referring to specific columns (e.g. PFNv), into neuron families, defined by the overall projection patterns (e.g. PFN), and into a neuron class (columnar CX neuron; as opposed to tangential neurons, or pontine neurons). We have added the possibility to add these higher level classifiers as tags to cell types, allowing more complex search queries and easier handling of increasing numbers of database entries.
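      The hierarchy of higher level classifiers described above can be pictured as tags attached to each cell type. The following is only a minimal sketch under our own assumptions: the class name, field names, tag keys and the second and third entries are hypothetical illustrations and do not reflect the actual IBdb data model or API. Only the PFNv-R3 / PFNv / PFN / columnar CX neuron example is taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class CellType:
    """Hypothetical record for one cell type (not the real IBdb schema)."""
    name: str
    hemisphere: str
    # Optional higher level classifiers, e.g. supertype, family, class.
    tags: dict = field(default_factory=dict)


def find_by_tag(cell_types, level, value):
    """Return all cell types whose tag at the given level equals value."""
    return [ct for ct in cell_types if ct.tags.get(level) == value]


cell_types = [
    # Example from the text: type -> supertype -> family -> class.
    CellType("PFNv-R3", "right",
             {"supertype": "PFNv", "family": "PFN",
              "class": "columnar CX neuron"}),
    # Hypothetical entries added only to make the search non-trivial.
    CellType("PFNv-R4", "right",
             {"supertype": "PFNv", "family": "PFN",
              "class": "columnar CX neuron"}),
    CellType("example tangential type", "right",
             {"class": "tangential CX neuron"}),
]

pfn_family = find_by_tag(cell_types, "family", "PFN")
columnar = find_by_tag(cell_types, "class", "columnar CX neuron")
tangential = find_by_tag(cell_types, "class", "tangential CX neuron")
```

      Storing the classifiers as free tags rather than as fixed levels matches the point made above that the same term need not sit at the same level of the hierarchy in every brain region.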

      We hope that these additions clarify what we mean by the term cell type.

      2) I found the curation of data was often incomplete or inconsistent. This might be a consequence of the distributed strategy for curation together with the significant direct input from users. A few illustrative examples: a. Completeness: the curation of neuroanatomical information for neurons is often missing, although there is enough information to do this for 'soma location', 'fiber bundle', 'morphology description'.

      We intentionally kept the mandatory requirements for neuron deposition low, to not increase the burden for data deposition. How rich (i.e., how complete) a data set is depends on the user who deposits it. Our philosophy is to provide a framework for data deposition rather than mandating content. Nevertheless, we realize that rudimentary datasets are of limited use to the community and undermine the credibility of the database. We previously only required arborization regions, an image, and the schematic neuron path to be specified before approval (to make the data findable). We have added soma location and morphology description to the mandatory fields to be filled in before approval, so that each entry has to include a text that describes the uniqueness of the cell type. Additionally, and more importantly, we now automatically block approval requests if mandatory fields are not populated/valid. We have made substantial effort to update existing datasets to a comparable standard, and will continue to do so until this standard has been reached for each entry in the database.
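      To illustrate what blocking an approval request on unpopulated mandatory fields could look like, here is a minimal sketch. The five fields are the ones named above; the key names, the dictionary representation and the function names are hypothetical and are not taken from the IBdb code base.

```python
# Mandatory fields named in the response; the key names are hypothetical.
MANDATORY_FIELDS = (
    "arborization_regions",
    "image",
    "schematic_path",
    "soma_location",
    "morphology_description",
)


def missing_fields(entry):
    """List mandatory fields that are absent or empty in a neuron entry."""
    return [name for name in MANDATORY_FIELDS if not entry.get(name)]


def can_request_approval(entry):
    """An approval request is allowed only if nothing mandatory is missing."""
    return not missing_fields(entry)


complete_entry = {
    "arborization_regions": ["example region"],
    "image": "neuron.png",
    "schematic_path": "path.svg",
    "soma_location": "example location",
    "morphology_description": "Example text describing the cell type.",
}

# Same entry with an empty description and no soma location.
incomplete_entry = dict(complete_entry, morphology_description="")
del incomplete_entry["soma_location"]
```

      In this sketch an empty string or empty list counts as not populated, so a placeholder-free entry is required before `can_request_approval` returns `True`.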

      b. Consistency: (1) I found 3 different forms to designate AMMC, used in 'keyword'. Furthermore, I found functional terms used for 'keyword' ('mechanosensory'), not captured by a 'Modality' descriptor. (2) There is significant inconsistency also in the species's pages. Some have descriptions taken from the web, others cite academic literature while others are missing this section entirely. Often, no linked publications are available.

      We thank the reviewers for highlighting these inconsistencies. We have now merged similar keywords into single, consistent keywords. We additionally added a new interface to encourage consistent keyword use moving forward. We agree that functional terms must indeed be reflected in function entries. To avoid these issues in the future, we have developed a checklist for curators and data contributors (found under the help menu on the site) that should be used to ensure a correct submission and approval process. We have explicitly included the point referring to function entries in both the curator and contributor checklists, which need to be followed when submitting or approving a dataset. This issue should thus no longer emerge for future entries and was fixed for existing entries. The issue of misspelled keywords will never be fully avoidable (although it will occur much more rarely with the new interface), but the database administrators are now aware of the issue and will regularly review and, if needed, clean up the list of used keywords.

      Regarding the species descriptions; we have left this to the species owner and believe that the mentioned variety is not necessarily bad. As the datasets can evolve, revised texts can be added by the species owner. In some cases, original placeholder text has been carried over from early stages of development. These texts have now been replaced by up to date information. We have additionally ensured that publication lists are available on all species pages.

      As a note, many of the mentioned inconsistencies had been carried over from very early stages of the database, before the currently used curation and approval methods were developed. Some of these issues had actually been the incentive to develop these standards. We have eliminated inconsistencies from old datasets as much as possible by both updating and archiving.

      In general, we would like to point out that inconsistencies between neurons of different species often reflect different traditions and conventions between research groups working on those species. We see it as one of the advantages of the multi-species approach of the IBdb that these differences/inconsistencies become obvious. While resolving these issues will often take time and effort (and will involve researchers from all groups involved), the main function of the IBdb in this context is not to impose a strict set of rules, but to provide the framework in which inconsistencies can be exposed and resolved.

      3) The authors present the platform as a data hub for not only neuroanatomical but also functional data. There is of course potential, but currently there is very little functional data on the site. Thus I find the authors' claim on the abstract "by intimately linking data on structure and function" unproven at this point.

      Indeed that is true. To address this issue and to validate that our database can indeed handle functional data beyond a few examples, we have added more data. We have started to move substantial amounts of available data from the locust, Megalopta, the Monarch butterfly, and the remaining species the co-authors are curating, into the database. While this is straightforward for new data, adding functional data from several years back to complement morphologies from publications covering the last two decades, will require some time. This is because raw data has to be located, verified, and uploaded, which becomes increasingly difficult the older the data are (both because of data formats and storage choices). Intriguingly, this highlights a main advantage of using the IBdb as a data management tool. When using the database for managing ongoing research, functional data can be uploaded alongside morphological data, before it is made public, thus keeping data permanently findable and reusable. Generating this possibility was one of the main incentives for creating the database (rather than generating large amounts of content), but we agree with the reviewers that we need to show more proof of concept data that this possibility is indeed functioning satisfactorily. We hope that this was achieved by our additions of both experiment entries (now over 100) and function entries for numerous neuron datasets.

      Reviewer #2:

      Heinze et al. created a public online database for morphology and function of neurons in many insect species (IBdb). This database is a platform for shared and searchable data repository. It can visualize the morphology of multiple identified neurons in the context of neuropils with options to control various visualization features (i.e. color, transparency, etc.). The uniqueness of IBdb is to index brains and neuron data of many insect species, allowing comparative approaches. The structure and functions of the IBdb are uncomplicated and intuitive.

      All these features were described clearly, and the paper can also serve as an instruction of the database. My comments and concerns are therefore mainly about usability and future development.

      1) This comparative database does not include some species, most critically Drosophila melanogaster. This exclusion is a drawback, as searching homologous neurons of the Drosophila neurons in other insects, or vice versa, would be inspiring and promote further comparative approach. As the same neuropil nomenclature was used in the largest and probably most elaborate database with similar functions for the Drosophila brain, VirtualFlyBrain, and IBdb, it would be helpful to implement cross-species neuron search based on arbor areas (as mentioned in Line 508).

      We view the current state of the database as a starting point. The lead author’s group alone has 14 more species located in the private section of the database and other groups are similarly preparing species for publication in the near future (see new supplemental figure for total number of current species). Importantly, the manuscript is intended not primarily to describe the content of the database, but its outline and functional principles, with data integrated into the database to illustrate its usefulness.

      The omission of Drosophila was decided after consulting with the Virtual Fly Brain (VFB) platform hosts, to avoid duplicating their efforts and, in turn, potentially inspiring VFB to duplicate the content of our database. However, we completely agree that cross links between the two databases are essential. Thus, we have now integrated Drosophila by cross linking our database to VFB as suggested by both reviewers. With the support and expertise of VFB staff we have mapped all brain regions in the IBdb to their corresponding neuropils in VFB, so that search results in the IBdb (graphical search) have a direct corresponding search result in VFB. While searching in the IBdb, an API mediated query is sent to VFB and results are displayed in a newly introduced panel in the bottom half of the screen as a list of single neuron entries. Each list contains information obtained from VFB and is linked to the corresponding entry at VFB. This feature is available only for the graphic brain region search, but includes complex multi-neuropil searches as well. We have added the description of this function into the section of the paper that describes the link between VFB and IBdb.

      2) More comprehensive 'preset' depository of published data would make this database more attractive, as users naturally tend to first go to the largest and most comprehensive one. VFB also made a big success in this respect by actively indexing massive data taken in different labs.

      We agree and have added more content from our own research. We are currently making an effort to scan the literature and contact authors of older papers to encourage the deposition of their data (similar to what NeuroMorpho.org hosts are doing). This naturally takes some time, but we have now mentioned this effort in the manuscript. To facilitate this process and the visibility of neurons in the database that belong to specific publications, we have added a new list item (publication-based datasets for neurons and experiments), for which each entry contains a list of all items that can be linked to a specific publication (automatically generated), thus increasing visibility for the authors of these publications.

      Additionally, to attract large datasets, we have introduced a new type of experiment (interactive experiment) that is aimed at depositing connectomics data from EM studies and similar large datasets that are based on skeleton data combined with neuropil surfaces. The first two datasets already contain more than 1500 neurons (including partial neurons) from the bumblebee central complex. Complete cells from these datasets will be included as cell types in the main part of the database, once a bridging registration is in place.

      3) There is a concern on sustainability, as administration/management (e.g. species, curation, approval) continuously need expertise. It would be powerful to come up with a mechanism to encourage participation of more active users.

      Encourage participation: We are planning to actively advertise the database via presentations at (virtual and in person) conferences at the Arthropod Neuroscience Network and the International Society for Neuroethology, to encourage more users to contribute and identify users interested in active curation. As an incentive for curators, curation will increase visibility of the curator (as each curator is credited on each dataset they approved), which should be particularly attractive for early to mid-stage career researchers. To increase this visibility, we have created an explicit page that highlights the individual curators, their affiliation and their responsibilities. This feature will be expanded to full curator profiles in the near future.

      We see three points related to sustainability that are relevant for the IBdb long term perspective:

      Technical maintenance: This issue is essential to ensure persistence of deposited data. As discussed in the manuscript, we have planned the technical maintenance and the associated costs for the next 10 years and are confident that this plan is sustainable and suited to manage the site, keep existing data accessible and keep the code up to date regarding changing web standards and data formats. In practical terms, this is facilitated by the fact that the lead author, as well as several co-authors, are using the site as primary data management and deposition tool for all ongoing and future research activities. The mandatory nature of data management and public deposition and the lack of suitable alternatives will ensure that third party funding dedicated to this purpose will continuously be available for technical maintenance, server fees etc.

      Oversight to ensure quality of content: This issue requires continuous expertise, relevant for ensuring that new datasets meet all required standards and that old datasets are kept up to date, if new information becomes available. We believe that, at the level of cell types and experiments, the strongest incentive for keeping data up to date lies with the data owners, who also possess the highest expertise for these data. Overall, updating data is optional, and even without updates, any data deposited will adhere to the standards initially applied upon approval. The species curators, in coordination with the scientific administrator, have an oversight function to reinforce these standards (see point 1 above). As curators actively perform research on the species they curate, there is substantial self-interest to maintain high standards to facilitate their research. On the level of species, the scientific administrator has the main responsibility. Given the limited number of species that will be included (realistically not more than several hundred for the near future), the associated workload is limited and manageable by a single person. Additionally, we are actively pursuing dedicated funding for one to two full time assistant positions for database curation. These positions will be devoted to performing routine maintenance, assisting new and existing users with data upload, identifying and resolving issues with existing datasets by regularly checking all entries, as well as actively attracting new users by identifying research papers with data appropriate for deposition in the IBdb. These assistants will also have the role to identify issues that regularly occur on the user end, with the aim of developing solutions to help streamlining and improving the site.

      Finally, continued relevance of the IBdb: The incentives for users to deposit data are given by the visualization possibilities and exposure of the data. Additionally, the possibility to programmatically access one's own and other researchers’ data via the API allows this data to be integrated into automatic data analysis workflows, enables effective data management, and increases the chance of data reuse by third party applications (e.g. computational modelling apps). These provide a strong, bottom up mechanism to ensure that data volume will increase over time. With sufficient data deposited, new users will more easily be attracted, continuously increasing the relevance of the database. Users without associations with the authors have already begun to deposit data on the site, so we believe that the critical mass of data has been reached to keep the site relevant and in use long term. We will facilitate this development by actively approaching authors of publications that produce data suited for deposition in the IBdb. To illustrate the momentum of the database usage and the steady increase in registered users, we have added a supplemental figure displaying the change of usage over time, providing a basis for extrapolation.

      4) This database doesn't seem to require registration of neurons to a standard brain of the species (Line 502). It is unclear how one can make visualization as in Fig. 4 without registration. It would be helpful to detail what one can/cannot do depending on the data type.

      We have now emphasized in the results section (data contribution) that neurons that are not registered into a common standard cannot be displayed in a shared 3D view. While we encourage registration for more advanced visualizations, we have deliberately not made registration a requirement in order to allow easier deposition of older data and data from species without an existing reference atlas. This limit is now also highlighted when uploading a neuron reconstruction that is not registered. This information was also highlighted in the user guide.

    1. Author Response:

      Reviewer #2:

      Some aspects of the analysis and interpretation of the fluorescence result require clarification:

      1) The propagating patterns in the kymographs of mCFP and mCit (Fig 5BCFG) are puzzling. The authors contributed it to inherent locomotion artifacts, noise and internal sarcomere rearrangement during motion. While some of these may be true, could these be image processing artifacts? The authors stated in the method section that the fluorescence intensity at a particular body segment is obtained by drawing a perpendicular line to the midline. The pixels that it intersects will provide the fluorescence intensity. This approach does not seem to account for the fluorophore density change due to tissue compaction and expansion, resulting in overcounting intensity in the inner circle and undercounting at the outer circle - similar to the observed intensity patterns Fig 5BCFG. Perhaps it would have been helpful if the intensities were normalized by the arclength at the different radii from the center of the curvature.

      We agree that the pure fluorescent intensity changes could in part be due to fluorophores being compacted because tissues are being compacted, as in Fig 5BCFG. However, ratiometric imaging (Fig 5DH) takes this into account since the compacting happens to both the mCFP and the mCitrine signals. This is the main reason that ratiometric imaging is more advantageous. Fig 5DH do show differences between the two strains. At the bottom of page 11 and beginning of page 12, we discuss the ratiometric nature of FRET measurement, and in the second paragraph on page 12, we specifically discuss the differences in the dynamics of the FRET-contraction-relaxation cycle between the strains. We believe both of these points address the concern on optical/image processing artifacts.

      We also note that normalization of either the mCitrine or the mCFP signal is not possible because both of these signals theoretically could change, both due to FRET and due to motion artifacts. In other words, we do not expect either signal to remain constant while the worm is moving around, so it is not possible to normalize.
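      The ratiometric argument can be written out in a few lines. This is a sketch under a simplifying assumption of ours, namely that tissue compaction or expansion acts on both channels as a common multiplicative density factor $\rho(s,t)$ at body position $s$ and time $t$. The symbols and the use of a plain channel ratio are illustrative; the exact FRET index used in the manuscript is not restated here.

```latex
% Observed intensities: true per-fluorophore signal times a shared density factor
I^{\mathrm{obs}}_{\mathrm{CFP}}(s,t) = \rho(s,t)\, I_{\mathrm{CFP}}(s,t), \qquad
I^{\mathrm{obs}}_{\mathrm{Cit}}(s,t) = \rho(s,t)\, I_{\mathrm{Cit}}(s,t)

% The shared factor cancels in the ratio
R(s,t) = \frac{I^{\mathrm{obs}}_{\mathrm{Cit}}(s,t)}{I^{\mathrm{obs}}_{\mathrm{CFP}}(s,t)}
       = \frac{I_{\mathrm{Cit}}(s,t)}{I_{\mathrm{CFP}}(s,t)}
```

      Under this assumption, either single-channel kymograph carries the factor $\rho(s,t)$ and so can show propagating patterns, whereas any index that depends on the two channels only through their ratio does not. Artifacts that affect the two channels differently are not removed by this cancellation, which is why the comparison with the control strain remains necessary.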

      2) As related to the previous comment, but more generally, image analysis is a critical and sensitive step towards the interpretation of the fluorescence results. The authors would need to elaborate if and how errors in the image processing might contribute to the emergence of correlation between FRET and curvature. For instance, the CFP and mCit expression levels vary significantly along the body of the worm (Fig 5) and should be time-invariant. If an error in image processing picks up nearby spatial variations as the worm moves, the detected fluorescence will become time-variant and correlate with the worm's motion. It is unclear whether this could happen with the current algorithm. This is a crucial assessment, as it is needed to ensure the observed small FRET changes (+/- 0.015) are due to molecular stretching and not artifacts of image processing.

      We agree that the expression level of the fluorophore proteins is not uniform along the body positions and also varies from animal to animal, but we would like to clarify that the FRET measurement is self-referenced. More importantly, we are not simply looking at the FRET signal strength, but at the strength of the correlation between FRET and curvature in each position locally, which is not intensity-dependent. Furthermore, in the analyses we have performed, we essentially have sampled tens of thousands of points, so errors from image processing would be averaged out. One potential issue the reviewer alluded to is motion artifact; this is very important, as we have shown in the control strain. Indeed there is motion-induced artifact (e.g. light scattered from the groove in the agar the animal is making). This is the reason we do not just take correlations at face value; both experimental and control groups show correlation, but our data show that the experimental group shows stronger correlation. Our conclusion is thus based on comparisons with controls and takes into account potential sources of error.
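      The statement that a local correlation is not intensity-dependent can be illustrated with a small sketch. It assumes a Pearson correlation computed separately at each body position, which may differ from the measure used in the manuscript, and it uses synthetic time series with made-up parameters rather than any real data. Function and variable names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def local_correlations(fret, curvature):
    """Correlate FRET with curvature separately at each body position.

    fret[p] and curvature[p] are the time series at body position p.
    """
    return [pearson(f, c) for f, c in zip(fret, curvature)]


# Synthetic example: three body positions, 200 frames each.
n_frames = 200
curvature = [[math.sin(2 * math.pi * t / 50 + p) for t in range(n_frames)]
             for p in range(3)]
# Baseline + small curvature-linked modulation + an unrelated slow term.
fret = [[0.5 + 0.015 * c + 0.005 * math.cos(0.37 * t)
         for t, c in enumerate(series)]
        for series in curvature]

# Position-dependent scale factors standing in for non-uniform signal strength.
scale = [1.0, 2.5, 0.4]
fret_scaled = [[k * v for v in series] for k, series in zip(scale, fret)]

r = local_correlations(fret, curvature)
r_scaled = local_correlations(fret_scaled, curvature)
```

      Because the Pearson coefficient is unchanged by a positive rescaling or an offset of either series, `r` and `r_scaled` agree at every position. This invariance does not protect against artifacts that themselves vary with posture, which is the role of the control strain comparison described above.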

      3) The shape and meaning of FRET change in the contraction-relaxation cycles (Fig 7) would require further interpretation. The data shows that the extrema and phase of the FRET signal correlate to curvature, and thereby, sarcomere stretching. It is unclear whether it is valid to assume the stretching or relaxing of sarcomeres apply tension directly over each twitchin. Is the binding-unbinding transition of NL to TwcK two-state? If so, would this lead to two-state behaviour in the observed FRET?

      The force-induced unfolding of TwcK at the molecular level has been studied using AFM (Greene et al, 2008, Biophys J, 95:1360-1370) as well as computationally, applying steered molecular dynamics simulations (von Castelmur et al, 2012, PNAS, 109: 13608-13). Neither AFM data nor simulations revealed defined unfolding features that could be attributed to the NL sequence under those experimental conditions. Thus, we must apply the simplified assumption that the unfolding/refolding of the NL is a one-step process, leading to two states: (1) folded NL-Kin assembly and (2) stretched NL plus kinase. It is very possible that mechanical intermediates of unfolding exist, but these remain unknown and undetected to this date. Equally, it seems rather plausible that unfolding occurs somewhat asynchronously in the sarcomere, where individual molecules might be at different stages of unfolding at the molecular level. In this regard, it is important to notice that, contrary to single molecule methodologies, the FRET signal in this study is an “average” value over a huge number of individual molecules. Thus, the highly averaged nature of the signal does not permit revealing or interpreting fine detail in the folding/unfolding phases. In summary, the "two states" approximation seems to suffice and to be consistent with the fundamental sine wave character of the change in FRET signal.

      4) The reason behind the small observed FRET change (+/-0.015) requires further clarification. Is it because (1) all FRET sensors changed slightly, or (2) a small fraction of FRET sensors changed from high to low FRET?

      We thank the referee for calling our attention to the need to clarify this point for the reader. The FRET method applied in this study yields an “average” signal, to which a large number of individual molecules contribute. Because of its “average” nature, the signal cannot be attributed to individual changes in individual molecules. Therefore, the two scenarios described by the reviewer are not resolvable in this case. In other words, asynchronous unfolding cannot be studied in this way. To clarify this aspect of the work to the reader, we have added a statement in the Discussion section.
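
      Why the two scenarios cannot be separated from an averaged signal can be shown with a toy calculation. All numbers below are hypothetical (the high and low FRET efficiencies and the population size are invented for illustration); only the change of 0.015 is taken from the reviewer's comment. The sketch builds one population in which every sensor shifts slightly and another in which a small fraction switches completely, then compares the ensemble means.

```python
def mean_fret(efficiencies):
    """Ensemble-averaged FRET efficiency of a population of sensors."""
    return sum(efficiencies) / len(efficiencies)


N_SENSORS = 10_000   # hypothetical number of molecules contributing to one pixel
E_HIGH = 0.50        # hypothetical FRET efficiency of the folded state
E_LOW = 0.20         # hypothetical FRET efficiency of the stretched state
DELTA = 0.015        # size of the observed change in the averaged signal

baseline = [E_HIGH] * N_SENSORS

# Scenario 1: every sensor changes slightly.
scenario_all = [E_HIGH - DELTA] * N_SENSORS

# Scenario 2: a small fraction of sensors switches fully from high to low FRET.
n_switched = round(N_SENSORS * DELTA / (E_HIGH - E_LOW))
scenario_fraction = [E_LOW] * n_switched + [E_HIGH] * (N_SENSORS - n_switched)

print("baseline mean:          ", mean_fret(baseline))
print("all sensors shift:      ", mean_fret(scenario_all))
print("fraction fully switches:", mean_fret(scenario_fraction))
```

      Both scenarios produce the same averaged value, so a measurement that only reports the mean cannot distinguish them; doing so would need information about the distribution across molecules.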

      5) The manuscript provides strong evidence of FRET correlating with curvature during the muscle contraction cycle. However, the causality is less clear. It is unclear whether the contraction force causes the FRET change, or whether curvature without any active contraction can cause a FRET change. For instance, it is unclear whether, if the worm were dead or myosin activity inhibited, bending the worm would cause a FRET change.

      This is a very interesting question from the reviewer that is well worthy of future investigation. As mentioned in the response to reviewer 1’s comment, we have tried “bending” worms and “freezing” worms in different postures, but those experiments did not yield detectable or interpretable FRET changes. However, our experiments were neither intended nor technically designed to establish the mechanistic source of the molecular conformational change in this kinase. The work was primarily directed at revealing whether the hypothesized molecular changes occur in working muscle in vivo and, therefore, at testing the physiological validity of the partial-unfolding mechanism of kinase regulation. We agree that clarifying the link between such changes and the active/passive mechanics of the sarcomere is a highly desirable future pursuit.

    1. Author Response:

      Reviewer #1:

      1) The user manual and tutorial are well documented, although the actual code could do with more explicit documentation and comments throughout. The overall organisation of the code is also a bit messy.

      We have now implemented an ongoing, automated code review via Codacy (https://app.codacy.com/gh/caseypaquola/BigBrainWarp/dashboard). The grade is published as a badge on GitHub. We improved the quality of the code to an A grade by increasing comments and fixing code style issues. Additionally, we standardised the nomenclature throughout the toolbox to improve consistency across scripts and we restructured the bigbrainwarp function.

      2) My understanding is that this toolbox can take maps from BigBrain to MRI space and vice versa, but the maps that go in the direction BigBrain->MRI seem to be confined to those provided in the toolbox (essentially the density profiles). What if someone wants to do some different analysis on the BigBrain data (e.g. looking at cellular morphology) and wants that mapped onto MRI spaces? Does this tool allow for analyses that involve the raw BigBrain data? If so, then at what resolution and with what scripts? I think this tool will have much more impact if that was possible. Currently, it looks as though the 3 tutorial examples are basically the only thing that can be done (although I may be lacking imagination here).

      The bigbrainwarp function allows input of raw BigBrain data in volume and surface forms. For volumetric inputs, the image must be aligned to the full BigBrain or BigBrainSym volume, but the function is agnostic to the input voxel resolution. We have also added an option for the user to specify the output voxel resolution. For example,

      bigbrainwarp --in_space bigbrain --in_vol cellular_morphology_in_bigbrain.nii \
        --interp linear --out_space icbm --out_res 0.5 \
        --desc cellular_morphology --wd working_directory

      where “cellular_morphology_in_bigbrain.nii” was generated from a BigBrain volume (see Table 2 below for all parameters). The BigBrain volume may be one of the 100-1000 µm resolution images provided on the ftp or a resampled version of these images, as long as the full field of view is maintained. For surface-based inputs, the data must contain a value for each vertex of the BigBrain/BigBrainSym mesh. We have clarified these points in the Methods, illustrated the potential transformations in an extended Figure 3 and highlighted the distinctiveness of the tutorial transformations in the Results.
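
      For users who script their analyses, the same call can be assembled programmatically. The following is a minimal sketch, not part of the toolbox: the helper name build_bigbrainwarp_command is hypothetical, it uses only the flags that appear in the example command above, and it only builds and prints the command without running it.

```python
import shlex


def build_bigbrainwarp_command(in_vol, out_space, out_res, desc, wd,
                               in_space="bigbrain", interp="linear"):
    """Assemble the argument list for a volumetric bigbrainwarp call.

    Hypothetical helper: it mirrors the flags of the example command
    and does not execute anything.
    """
    return [
        "bigbrainwarp",
        "--in_space", in_space,
        "--in_vol", str(in_vol),
        "--interp", interp,
        "--out_space", out_space,
        "--out_res", str(out_res),
        "--desc", desc,
        "--wd", str(wd),
    ]


command = build_bigbrainwarp_command(
    in_vol="cellular_morphology_in_bigbrain.nii",
    out_space="icbm",
    out_res=0.5,
    desc="cellular_morphology",
    wd="working_directory",
)

# Print a shell-ready string; pass `command` to subprocess.run to execute it
# on a system where the toolbox is installed.
print(shlex.join(command))
```

      Keeping the arguments in a list avoids shell-quoting problems when file names contain spaces.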

      3) An obvious caveat to BigBrain is that it is a single brain and we know there are sometimes substantial individual variations in e.g. areal definition. This is only slightly touched upon in the discussion. Might be worth commenting on this more. As I see it, there are multiple considerations. For example (i) Surface-to-Surface registration in the presence of morphological idiosyncrasies: what parts of the brain can we "trust" and what parts are uncertain? (ii) MRI parcellations mapped onto BigBrain will vary in how accurately they may reflect the BigBrain areal boundaries: if histo boundaries do not correspond with MRI-derived ones, is that because BigBrain is slightly different or is it a genuine divergence between modalities? Of course addressing these questions is out of scope of this manuscript, but some discussion could be useful; I also think this toolbox may be useful for addressing these very concerns!

      We agree that these are important questions and hope that BigBrainWarp will propel further research. Here, we consider these questions from two perspectives: the accuracy of the transformations and the potential influence of individual variation. For the former, we conducted a quantitative analysis of the accuracy of transformations used in BigBrainWarp (new Figure 2). We provide a function (evaluate_warps.sh) for BigBrainWarp users to assess the accuracy of novel deformation fields and encourage detailed inspection of accuracy estimates and deformation effects for region-of-interest studies. For the latter, we expanded our Discussion of previous research on inter-individual variability and comment on the potential implications of unquantified inter-individual variability for the interpretation of BigBrain-MRI comparisons.

      Methods (P.7-8):

      “A prior study (Xiao et al., 2019) was able to further improve the accuracy of the transformation for subcortical structures and the hippocampus using a two-stage multi-contrast registration. The first stage involved nonlinear registration of BigBrainSym to a PD25 T1-T2 fusion atlas (Xiao et al., 2017, 2015), using manual segmentations of the basal ganglia, red nucleus, thalamus, amygdala, and hippocampus as additional shape priors. Notably, the PD25 T1-T2 fusion contrast is more similar to the BigBrainSym intensity contrast than a T1-weighted image. The second stage involved nonlinear registration of PD25 to ICBM2009sym and ICBM2009asym using multiple contrasts. The deformation fields were made available on Open Science Framework (https://osf.io/xkqb3/). The accuracy of the transformations was evaluated relative to overlap of region labels and alignment of anatomical fiducials (Lau et al., 2019). The two-stage procedure resulted in 0.86-0.97 Dice coefficients for region labels, improving upon direct overlap of BigBrainSym with ICBM2009sym (0.55-0.91 Dice) (Figure 2Aii, 2Aiv top). Transformed anatomical fiducials exhibited 1.77±1.25mm errors, on par with direct overlap of BigBrainSym with ICBM2009sym (1.83±1.47mm) (Figure 2Aiii, 2Aiv below). The maximum misregistration distance (BigBrainSym=6.36mm, Xiao=5.29mm) provides an approximation of the degree of uncertainty in the transformation. In line with this work, BigBrainWarp enables evaluation of novel deformation fields using anatomical fiducials and region labels (evaluate_warps.sh). The script accepts a nonlinear transformation file for registration of BigBrainSym to ICBM2009sym, or vice versa, and returns the Jacobian map, Dice coefficients for labelled regions and landmark misregistration distances for the anatomical fiducials.

      The unique morphology of BigBrain also presents challenges for surface-based transformations. Idiosyncratic gyrification of certain regions of BigBrain, especially the anterior cingulate, causes misregistration (Lewis et al., 2020). Additionally, the areal midline representation of BigBrain, following inflation to a sphere, is disproportionately smaller than standard surface templates, which is related to differences in surface area, in hemisphere separation methods, and in tessellation methods. To overcome these issues, ongoing work (Lewis et al., 2020) combines a specialised BigBrain surface mesh with multimodal surface matching [MSM; (Robinson et al., 2018, 2014)] to co-register BigBrain to standard surface templates. In the first step, the BigBrain surface meshes were re-tessellated as unstructured meshes with variable vertex density (Möbius and Kobbelt, 2010) to be more compatible with FreeSurfer generated meshes. Then, coarse-to-fine MSM registration was applied in three stages. An affine rotation was applied to the BigBrain sphere, with an additional “nudge” based on an anterior cingulate landmark. Next, nonlinear/discrete alignment was performed using sulcal depth maps (emphasising global scale, Figure 2Biii), followed by nonlinear/discrete alignment using curvature maps (emphasising finer detail, Figure 2Biii). The higher-order MSM procedure that was implemented for BigBrain maximises concordance of these features while minimising surface deformations in a physically plausible manner, accounting for size and shape distortions (Figure 2Bi) (Knutsen et al., 2010; Robinson et al., 2018). This modified MSMsulc+curv pipeline improves the accuracy of transformed cortical maps (4.38±3.25mm), compared to a standard MSMsulc approach (8.02±7.53mm) (Figure 2Bii-iii) (Lewis et al., 2020).”
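
      The two accuracy measures reported above, the Dice coefficient for region labels and the misregistration distance for anatomical fiducials, can be sketched in a few lines. This is an illustration of the standard definitions and not the implementation used in evaluate_warps.sh; the function names, the toy cube-shaped "regions" and the landmark coordinates are all hypothetical.

```python
import math


def dice_coefficient(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for masks given as sets of voxel coordinates."""
    if not mask_a and not mask_b:
        raise ValueError("Dice is undefined when both masks are empty")
    return 2 * len(mask_a & mask_b) / (len(mask_a) + len(mask_b))


def landmark_errors(transformed, reference):
    """Euclidean distance (same units as the coordinates) for each landmark pair."""
    if len(transformed) != len(reference):
        raise ValueError("landmark lists must have the same length")
    return [math.dist(p, q) for p, q in zip(transformed, reference)]


# Toy region label: a 10x10x10 cube, and the same cube shifted by one voxel in x.
region_template = {(x, y, z) for x in range(10) for y in range(10) for z in range(10)}
region_warped = {(x + 1, y, z) for (x, y, z) in region_template}
dice = dice_coefficient(region_template, region_warped)

# Toy fiducials in millimetres (made-up coordinates).
fiducials_template = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
fiducials_warped = [(3.0, 4.0, 0.0), (10.0, 0.0, 1.0)]
errors = landmark_errors(fiducials_warped, fiducials_template)

print("Dice:", dice)
print("mean error (mm):", sum(errors) / len(errors))
print("max error (mm):", max(errors))
```

      Reporting the maximum alongside the mean, as the text does, gives a sense of the worst-case uncertainty rather than only the typical error.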

      Figure 2: Evaluating BigBrain-MRI transformations. A) Volume-based transformations. i. Jacobian determinant of deformation field shown with a sagittal slice and stratified by lobe. Subcortical+ includes the shape priors (as described in Methods) and the + connotes hippocampus, which is allocortical. Lobe labels were defined based on assignment of CerebrA atlas labels (Manera et al., 2020) to each lobe. ii. Sagittal slices illustrate the overlap of native ICBM2009b and transformed subcortical+ labels. iii. Superior view of anatomical fiducials (Lau et al., 2019). iv. Violin plots show the Dice coefficient of regional overlap (ii) and landmark misregistration (iii) for the BigBrainSym and Xiao et al. approaches. Higher Dice coefficients show improved registration of subcortical+ regions with Xiao et al., while distributions of landmark misregistration indicate similar performance for alignment of anatomical fiducials. B) Surface-based transformations. i. Inflated BigBrain surface projections and ridgeplots illustrate regional variation in the distortions of the mesh invoked by the modified MSMsulc+curv pipeline. ii. Eighteen anatomical landmarks shown on the inflated BigBrain surface (above) and inflated fsaverage (below). BigBrain landmarks were transformed to fsaverage using the modified MSMsulc+curv pipeline. Accuracy of the transformation was calculated on fsaverage as the geodesic distance between landmarks transformed from BigBrain and the native fsaverage landmarks. iii. Sulcal depth and curvature maps are shown on inflated BigBrain surface. Violin plots show the improved accuracy of the transformation using the modified MSMsulc+curv pipeline, compared to a standard MSMsulc approach.

      Discussion (P.18):

      “Cortical folding is variably associated with cytoarchitecture, however. The correspondence of morphology with cytoarchitectonic boundaries is stronger in primary sensory than association cortex (Fischl et al., 2008; Rajkowska and Goldman-Rakic, 1995a, 1995b). Incorporating more anatomical information in the alignment algorithm, such as intracortical myelin or connectivity, may benefit registration, as has been shown in neuroimaging (Orasanu et al., 2016; Robinson et al., 2018; Tardif et al., 2015). Overall, evaluating the accuracy of volume- and surface-based transformations is important for selecting the optimal procedure given a specific research question and to gauge the degree of uncertainty in a registration.”

      Discussion (P.19):

      “Despite all its promises, the singular nature of BigBrain currently prohibits replication and does not capture important inter-individual variation. While large-scale cytoarchitectural patterns are conserved across individuals, the position of areal boundaries relative to sulci varies, especially in association cortex (Amunts et al., 2020; Fischl et al., 2008; Zilles and Amunts, 2013). This can affect interpretation of BigBrain-MRI comparisons. For instance, in tutorial 3, low predictive accuracy of functional communities by cytoarchitecture may be attributable to the subject-specific topographies, which are well established in functional imaging (Benkarim et al., 2020; Braga and Buckner, 2017; Gordon et al., 2017; Kong et al., 2019). Future studies should consider the influence of inter-subject variability in concert with the precision of transformations, as these two elements of uncertainty can impact our interpretations, especially at higher granularity.”

      Reviewer #2:

      This is a nice paper presenting a review of recent developments and research resulting from BigBrain and a tutorial guiding use of the BigBrainWarp toolbox. This toolbox supports registration to, and from, standard MRI volumetric and surface templates, together with mapping derived features between spaces. Examples include projecting histological gradients estimated from BigBrain onto fsaverage (and the ICBM2009 atlas) and projecting Yeo functional parcels onto the BigBrain atlas.

      The key strength of this paper is that it supports and expands on a comprehensive tutorial and docker support available from the website. The tutorials there go into even more detail (with accompanying bash scripts) of how to run the full pipelines detailed in the paper. The docker makes the tool very easy to install but I was also able to install from source. The tutorials are diverse examples of broad possible applications; as such the combined resource has the potential to be highly impactful.

      The minor weaknesses of the paper relate to its clarity and depth. Firstly, I found the motivations of the paper initially unclear from the abstract. I would recommend much more clearly stating that this is a review paper of recent research developments resulting from the BigBrain atlas, and a tutorial to accompany the bash scripts which apply the warps between spaces. The registration methodology is explained elsewhere.

      In the revised Abstract (P.1), we emphasise that the manuscript involves a review of recent literature, the introduction of BigBrainWarp, and easy-to-follow tutorials to demonstrate its utility.

      “Neuroimaging stands to benefit from emerging ultrahigh-resolution 3D histological atlases of the human brain, the first of which is “BigBrain”. Here, we review recent methodological advances for the integration of BigBrain with multi-modal neuroimaging and introduce a toolbox, “BigBrainWarp”, that combines these developments. The aim of BigBrainWarp is to simplify workflows and support the adoption of best practices. This is accomplished with a simple wrapper function that allows users to easily map data between BigBrain and standard MRI spaces. The function automatically pulls specialised transformation procedures, based on ongoing research from a wide collaborative network of researchers. Additionally, the toolbox improves accessibility of histological information through dissemination of ready-to-use cytoarchitectural features. Finally, we demonstrate the utility of BigBrainWarp with three tutorials and discuss the potential of the toolbox to support multi-scale investigations of brain organisation.”

      I also found parts of the paper difficult to follow - as a methodologist without comprehensive neuroanatomical terminology, I would recommend the review of past work to be written in a more 'lay' way. In many cases, the figure captions also seemed insufficient at first. For example it was not immediately obvious to me what is meant by 'mesiotemporal confluence' and Fig 1G is not referenced specifically in the text. In Fig 3C it is not immediately clear from the text of the caption that the cortical image is representing the correlation from the plots - specifically since functional connectivity is itself estimated through correlation.

      In the updated manuscript, we have tried to remove neuroanatomical jargon and clearly define uncommon terms at the first instance in text. For example,

      “Evidence has been provided that cortical organisation goes beyond a segregation into areas. For example, large-scale gradients that span areas and cytoarchitectonic heterogeneity within a cortical area have been reported (Amunts and Zilles, 2015; Goulas et al., 2018; Wang, 2020). Such progress became feasible through integration of classical techniques with computational methods, supporting more observer-independent evaluation of architectonic principles (Amunts et al., 2020; Paquola et al., 2019; Schiffer et al., 2020; Spitzer et al., 2018). This paves the way for novel investigations of the cellular landscape of the brain.”

      “Using the proximal-distal axis of the hippocampus, we were able to bridge the isocortical and hippocampal surface models recapitulating the smooth confluence of cortical types in the mesiotemporal lobe, i.e. the mesiotemporal confluence (Figure 1G).”

      “Here, we illustrate how we can track resting-state functional connectivity changes along the latero-medial axis of the mesiotemporal lobe, from parahippocampal isocortex towards hippocampal allocortex, hereafter referred to as the iso-to-allocortical axis.”

      Additionally, we have expanded the captions for clarity. For example, Figure 3:

      “C) Intrinsic functional connectivity was calculated between each voxel of the iso-to-allocortical axis and 1000 isocortical parcels. For each parcel, we calculated the product-moment correlation (r) of rsFC strength with iso-to-allocortical axis position. Thus, positive values (red) indicate that rsFC of that isocortical parcel with the mesiotemporal lobe increases along the iso-to-allocortical axis, whereas negative values (blue) indicate a decrease in rsFC along the iso-to-allocortical axis.”
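
      The quantity mapped in this panel, one correlation coefficient per parcel, can be sketched as follows. The snippet is only an illustration of the computation described in the caption: the axis positions, the three parcel names and their rsFC values are invented, and the real analysis uses many voxels along the axis and 1000 parcels.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical positions of five voxels along the axis (0 = isocortex, 1 = allocortex).
axis_position = [0.0, 0.25, 0.5, 0.75, 1.0]

# Hypothetical rsFC strength between each of those voxels and three isocortical parcels.
rsfc_by_parcel = {
    "parcel_A": [0.10, 0.15, 0.22, 0.30, 0.41],  # rsFC rises towards allocortex
    "parcel_B": [0.35, 0.30, 0.21, 0.12, 0.05],  # rsFC falls towards allocortex
    "parcel_C": [0.20, 0.25, 0.18, 0.24, 0.21],  # no clear trend
}

# One r value per parcel; this is the number that would be coloured on the cortical map.
parcel_r = {name: pearson_r(axis_position, rsfc)
            for name, rsfc in rsfc_by_parcel.items()}

for name, r in parcel_r.items():
    print(f"{name}: r = {r:+.2f}")
```

      A parcel like parcel_A would appear red and parcel_B blue, which makes explicit that the map shows a correlation of connectivity with position rather than connectivity itself.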

      My minor concern is over the lack of details in relation to the registration pipelines. I understand these are either covered in previous papers or are probably destined for bespoke publications (in the case of the surface registration approach) but these details are important for readers to understand the constraints and limitations of the software. At this time, the details for the surface registration only relate to an OHBM poster and not a publication, which I was unable to find online until I went through the tutorial on the BigBrain website. In general I think a paper should have enough information on key techniques to stand alone without having to reference other publications, so, in my opinion, a high level review of these pipelines should be added here.

      There aren't enough details on the registration. For the surface, what features were used to drive alignment, how was it parameterised (in particular the regularisation - strain, pairwise or areal), how was it pre-processed prior to running MSM - all these details seem to be in the excellent poster. I appreciate that work deserves a stand-alone publication but some details are required here for users to understand the challenges, constraints and limitations of the alignment. Similar high-level details should be given for the registration work.

      We expanded descriptions of the registration strategies behind BigBrainWarp, especially so for the surface-based registration. Additionally, we created a new Figure to illustrate how the accuracy of the transformations may be evaluated.

      Methods (P.7-8):

      “For the initial BigBrain release (Amunts et al., 2013), full BigBrain volumes were resampled to ICBM2009sym (a symmetric MNI152 template) and MNI-ADNI (an older adult T1-weighted template) (Fonov et al., 2011). Registration of BigBrain to ICBM2009sym, known as BigBrainSym, involved a linear then a nonlinear transformation (available on ftp://bigbrain.loris.ca/BigBrainRelease.2015/). The nonlinear transformation was defined by a symmetric diffeomorphic optimiser [SyN algorithm, (Avants et al., 2008)] that maximised the cross-correlation of the BigBrain volume with inverted intensities and a population-averaged T1-weighted map in ICBM2009sym space. The Jacobian determinant of the deformation field illustrates the degree and direction of distortions on the BigBrain volume (Figure 2Ai top).

      A prior study (Xiao et al., 2019) was able to further improve the accuracy of the transformation for subcortical structures and the hippocampus using a two-stage multi-contrast registration. The first stage involved nonlinear registration of BigBrainSym to a PD25 T1-T2 fusion atlas (Xiao et al., 2017, 2015), using manual segmentations of the basal ganglia, red nucleus, thalamus, amygdala, and hippocampus as additional shape priors. Notably, the PD25 T1-T2 fusion contrast is more similar to the BigBrainSym intensity contrast than a T1-weighted image. The second stage involved nonlinear registration of PD25 to ICBM2009sym and ICBM2009asym using multiple contrasts. The deformation fields were made available on Open Science Framework (https://osf.io/xkqb3/). The accuracy of the transformations was evaluated relative to overlap of region labels and alignment of anatomical fiducials (Lau et al., 2019). The two-stage procedure resulted in 0.86-0.97 Dice coefficients for region labels, improving upon direct overlap of BigBrainSym with ICBM2009sym (0.55-0.91 Dice) (Figure 2Aii, 2Aiv top). Transformed anatomical fiducials exhibited 1.77±1.25mm errors, on par with direct overlap of BigBrainSym with ICBM2009sym (1.83±1.47mm) (Figure 2Aiii, 2Aiv below). The maximum misregistration distance (BigBrainSym=6.36mm, Xiao=5.29mm) provides an approximation of the degree of uncertainty in the transformation. In line with this work, BigBrainWarp enables evaluation of novel deformation fields using anatomical fiducials and region labels (evaluate_warps.sh). The script accepts a nonlinear transformation file for registration of BigBrainSym to ICBM2009sym, or vice versa, and returns the Jacobian map, Dice coefficients for labelled regions and landmark misregistration distances for the anatomical fiducials.

      The unique morphology of BigBrain also presents challenges for surface-based transformations. Idiosyncratic gyrification of certain regions of BigBrain, especially the anterior cingulate, causes misregistration (Lewis et al., 2020). Additionally, the areal midline representation of BigBrain, following inflation to a sphere, is disproportionately smaller than standard surface templates, which is related to differences in surface area, in hemisphere separation methods, and in tessellation methods. To overcome these issues, ongoing work (Lewis et al., 2020) combines a specialised BigBrain surface mesh with multimodal surface matching [MSM; (Robinson et al., 2018, 2014)] to co-register BigBrain to standard surface templates. In the first step, the BigBrain surface meshes were re-tessellated as unstructured meshes with variable vertex density (Möbius and Kobbelt, 2010) to be more compatible with FreeSurfer generated meshes. Then, coarse-to-fine MSM registration was applied in three stages. An affine rotation was applied to the BigBrain sphere, with an additional “nudge” based on an anterior cingulate landmark. Next, nonlinear/discrete alignment was performed using sulcal depth maps (emphasising global scale, Figure 2Biii), followed by nonlinear/discrete alignment using curvature maps (emphasising finer detail, Figure 2Biii). The higher-order MSM procedure that was implemented for BigBrain maximises concordance of these features while minimising surface deformations in a physically plausible manner, accounting for size and shape distortions (Figure 2Bi) (Knutsen et al., 2010; Robinson et al., 2018). This modified MSMsulc+curv pipeline improves the accuracy of transformed cortical maps (4.38±3.25mm), compared to a standard MSMsulc approach (8.02±7.53mm) (Figure 2Bii-iii) (Lewis et al., 2020).”
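
      The Jacobian determinant mentioned in the Methods text above summarises local volume change under a deformation: values above 1 indicate local expansion and values below 1 indicate compression (relative to whichever space the mapping is defined from). The sketch below estimates it numerically for a made-up analytic warp; it is not how the toolbox or the registration software computes the Jacobian map, and toy_warp and its coefficients are purely illustrative.

```python
def jacobian_determinant(warp, point, step=1e-4):
    """Central-difference estimate of det(d warp / d x) at a 3D point."""
    columns = []
    for axis in range(3):
        forward = list(point)
        backward = list(point)
        forward[axis] += step
        backward[axis] -= step
        f_plus = warp(*forward)
        f_minus = warp(*backward)
        # Partial derivatives of all three outputs with respect to this axis.
        columns.append([(p - m) / (2 * step) for p, m in zip(f_plus, f_minus)])
    # The determinant of a matrix equals that of its transpose, so the
    # derivative vectors can be used as rows here.
    (a, b, c), (d, e, f), (g, h, i) = columns
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def toy_warp(x, y, z):
    """Hypothetical mapping: 10% stretch in x, 10% shrink in y, shear of z along x."""
    return (1.1 * x, 0.9 * y, z + 0.05 * x)


det = jacobian_determinant(toy_warp, (10.0, 20.0, 30.0))
print(f"Jacobian determinant: {det:.4f}")  # slightly below 1: net local compression
```

      For this toy warp the stretch and shrink nearly cancel, so the determinant is close to but below 1; a real deformation field would be evaluated voxel by voxel to produce a map such as the one in Figure 2Ai.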

      (SEE FIGURE 2 in Response to Reviewer #1)

      I would also recommend more guidance in terms of limitations relating to inter-subject variation. My interpretation of the results of tutorial 3 is that topographic variation of the cortex could easily be driving the greater variation of the frontal parietal networks. Either that, or the Yeo parcel has insufficient granularity; however, in that case any attempt to go to finer MRI-driven parcellations - for example to the HCP parcellation - would create its own problems due to subject-specific variability.

      We agree that inter-individual variation may contribute to the low predictive accuracy of functional communities by cytoarchitecture. We expanded upon this possibility in the revised Discussion (P. 19) and recommend that future studies examine the uncertainty of subject-specific topographies in concert with uncertainties of transformations.

      “These features depict the vast cytoarchitectural heterogeneity of the cortex and enable evaluation of homogeneity within imaging-based parcellations, for example macroscale functional communities (Yeo et al., 2011). The present analysis showed limited predictability of functional communities by cytoarchitectural profiles, even when accounting for uncertainty at the boundaries (Gordon et al., 2016). [...] Despite all its promises, the singular nature of BigBrain currently prohibits replication and does not capture important inter-individual variation. While large-scale cytoarchitectural patterns are conserved across individuals, the position of boundaries relative to sulci varies, especially in association cortex (Amunts et al., 2020; Fischl et al., 2008; Zilles and Amunts, 2013). This can affect interpretation of BigBrain-MRI comparisons. For instance, in tutorial 3, low predictive accuracy of functional communities by cytoarchitecture may be attributable to the subject-specific topographies, which are well established in functional imaging (Benkarim et al., 2020; Braga and Buckner, 2017; Gordon et al., 2017; Kong et al., 2019). Future studies should consider the influence of inter-subject variability in concert with the precision of transformations, as these two elements of uncertainty can impact our interpretations, especially at higher granularity.”

      Reviewer #3:

      The authors make a point for the importance of considering high-resolution, cell-scale, histological knowledge for the analysis and interpretation of low-resolution MRI data. The manuscript describes the aims and relevance of the BigBrain project. The BigBrain is the whole brain of a single individual, sliced at 20 µm and scanned at 1 µm resolution. Over the last years, sustained work by the BigBrain team has led to the creation of a precise cell-scale, 3D reconstruction of this brain, together with manual and automatic segmentations of different structures. The manuscript introduces a new tool - BigBrainWarp - which consolidates several of the tools used to analyse BigBrain into a single, easy-to-use and well-documented tool. This tool should make it easy for any researcher to use the wealth of information available in the BigBrain for the annotation of their own neuroimaging data. The authors provide three examples of utilisation of BigBrainWarp, and show the way in which this can provide additional insight for analysing and understanding neuroimaging data. The BigBrainWarp tool should have an important impact for neuroimaging research, helping bridge the multi-scale resolution gap, and providing a way for neuroimaging researchers to include cell-scale phenomena in their study of brain data. All data and code are available open source, open access.

      Main concern:

      One of the longstanding debates in the neuroimaging community concerns the relationship between brain geometry (in particular gyro/sulcal anatomy) and the cytoarchitectonic, connective and functional organisation of the brain. There are various examples of correspondence, but also many analyses showing its absence, particularly in associative cortex (for example, Fischl et al (2008) by some of the co-authors of the present manuscript). The manuscript emphasises the accuracy of their transformations to the different atlas spaces, which may give some readers a false impression. True: towards the end of the manuscript the authors briefly indicate the difficulty of having a single brain as source of histological data. I think, however, that the manuscript would benefit from making this point more clearly, providing the future users of BigBrainWarp with some conceptual elements and references that may help them properly appraise their results. In particular, it would be helpful to briefly describe which aspects of brain organisation were used to drive the deformation to the different templates, if they were only based on external anatomy, or if they took into account some other aspects such as myelination, thickness, …

      We agree with the Reviewer that the accuracy of the transformation and the potential influence of inter-individual variability should be carefully considered in BigBrain-MRI studies. To highlight these issues in the updated manuscript, we first conducted a quantitative analysis of the accuracy of transformations used in BigBrainWarp (new Figure 2). We provide a function (evaluate_warps.sh) for users to assess the accuracy of novel deformation fields and encourage detailed inspection of accuracy estimates and deformation effects for region-of-interest studies. Second, we expanded our discussion of previous research on inter-individual variability and comment on the potential implications of unquantified inter-individual variability for the interpretation of BigBrain-MRI comparisons.

      Methods (P.7-8):

      “A prior study (Xiao et al., 2019) was able to further improve the accuracy of the transformation for subcortical structures and the hippocampus using a two-stage multi-contrast registration. The first stage involved nonlinear registration of BigBrainSym to a PD25 T1-T2 fusion atlas (Xiao et al., 2017, 2015), using manual segmentations of the basal ganglia, red nucleus, thalamus, amygdala, and hippocampus as additional shape priors. Notably, the PD25 T1-T2 fusion contrast is more similar to the BigBrainSym intensity contrast than a T1-weighted image. The second stage involved nonlinear registration of PD25 to ICBM2009sym and ICBM2009asym using multiple contrasts. The deformation fields were made available on Open Science Framework (https://osf.io/xkqb3/). The accuracy of the transformations was evaluated relative to overlap of region labels and alignment of anatomical fiducials (Lau et al., 2019). The two-stage procedure resulted in 0.86-0.97 Dice coefficients for region labels, improving upon direct overlap of BigBrainSym with ICBM2009sym (0.55-0.91 Dice) (Figure 2Aii, 2Aiv top). Transformed anatomical fiducials exhibited 1.77±1.25mm errors, on par with direct overlap of BigBrainSym with ICBM2009sym (1.83±1.47mm) (Figure 2Aiii, 2Aiv below). The maximum misregistration distance (BigBrainSym=6.36mm, Xiao=5.29mm) provides an approximation of the degree of uncertainty in the transformation. In line with this work, BigBrainWarp enables evaluation of novel deformation fields using anatomical fiducials and region labels (evaluate_warps.sh). The script accepts a nonlinear transformation file for registration of BigBrainSym to ICBM2009sym, or vice versa, and returns the Jacobian map, Dice coefficients for labelled regions and landmark misregistration distances for the anatomical fiducials.

      The unique morphology of BigBrain also presents challenges for surface-based transformations. Idiosyncratic gyrification of certain regions of BigBrain, especially the anterior cingulate, causes misregistration (Lewis et al., 2020). Additionally, the areal midline representation of BigBrain, following inflation to a sphere, is disproportionately smaller than standard surface templates, which is related to differences in surface area, in hemisphere separation methods, and in tessellation methods. To overcome these issues, ongoing work (Lewis et al., 2020) combines a specialised BigBrain surface mesh with multimodal surface matching [MSM; (Robinson et al., 2018, 2014)] to co-register BigBrain to standard surface templates. In the first step, the BigBrain surface meshes were re-tessellated as unstructured meshes with variable vertex density (Möbius and Kobbelt, 2010) to be more compatible with FreeSurfer-generated meshes. Then, coarse-to-fine MSM registration was applied in three stages. First, an affine rotation was applied to the BigBrain sphere, with an additional “nudge” based on an anterior cingulate landmark. Next, nonlinear/discrete alignment was performed using sulcal depth maps (emphasising global scale, Figure 2Biii), followed by nonlinear/discrete alignment using curvature maps (emphasising finer detail, Figure 2Biii). The higher-order MSM procedure that was implemented for BigBrain maximises concordance of these features while minimising surface deformations in a physically plausible manner, accounting for size and shape distortions (Figure 2Bi) (Knutsen et al., 2010; Robinson et al., 2018). This modified MSMsulc+curv pipeline improves the accuracy of transformed cortical maps (4.38±3.25mm), compared to a standard MSMsulc approach (8.02±7.53mm) (Figure 2Bii-iii) (Lewis et al., 2020).”

      (SEE Figure 2 in response to previous reviewers)
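The two accuracy metrics reported in the Methods excerpt above (Dice overlap of region labels and misregistration distance of anatomical fiducials) can be illustrated with a minimal sketch. It assumes that region labels are available as sets of voxel indices and that fiducials are paired 3D coordinates in mm. The function names and toy data are hypothetical and are not taken from evaluate_warps.sh.

```python
import math
from statistics import mean, stdev


def dice_coefficient(voxels_a, voxels_b):
    """Dice overlap of two regions given as collections of voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty regions match
    return 2 * len(a & b) / (len(a) + len(b))


def landmark_errors(transformed, reference):
    """Euclidean distance (mm) between paired fiducial coordinates."""
    return [math.dist(p, q) for p, q in zip(transformed, reference)]


# Toy data (hypothetical): one region label and two fiducials
label_warped = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
label_template = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 1, 1)}
dice = dice_coefficient(label_warped, label_template)

fid_warped = [(10.0, 0.0, 0.0), (0.0, 3.0, 4.0)]
fid_template = [(10.0, 0.0, 2.0), (0.0, 0.0, 0.0)]
errors = landmark_errors(fid_warped, fid_template)

# Mean, standard deviation and maximum, mirroring how the errors are reported
summary = {"mean": mean(errors), "sd": stdev(errors), "max": max(errors)}
print(dice, summary)
```

In this form, the maximum error plays the role of the "maximum misregistration distance" quoted above as an approximation of the uncertainty in a transformation.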

      Discussion (P.18, 19):

      “Cortical folding is variably associated with cytoarchitecture, however. The correspondence of morphology with cytoarchitectonic boundaries is stronger in primary sensory than association cortex (Fischl et al., 2008; Rajkowska and Goldman-Rakic, 1995a, 1995b). Incorporating more anatomical information in the alignment algorithm, such as intracortical myelin or connectivity, may benefit registration, as has been shown in neuroimaging (Orasanu et al., 2016; Robinson et al., 2018; Tardif et al., 2015). Overall, evaluating the accuracy of volume- and surface-based transformations is important for selecting the optimal procedure given a specific research question and to gauge the degree of uncertainty in a registration.”

      “Despite all its promises, the singular nature of BigBrain currently prohibits replication and does not capture important inter-individual variation. While large-scale cytoarchitectural patterns are conserved across individuals, the position of boundaries relative to sulci varies, especially in association cortex (Amunts et al., 2020; Fischl et al., 2008; Zilles and Amunts, 2013). This can have implications for the interpretation of BigBrain-MRI comparisons. For instance, in tutorial 3, low predictive accuracy of functional communities by cytoarchitecture may be attributable to the subject-specific topographies, which are well established in functional imaging (Benkarim et al., 2020; Braga and Buckner, 2017; Gordon et al., 2017; Kong et al., 2019). Future studies should consider the influence of inter-subject variability in concert with the precision of transformations, as these two elements of uncertainty can impact our interpretations, especially at higher granularity.”

      Minor:

      1) In the abstract and later in p9 the authors talk about "state-of-the-art" non-linear deformation matrices. This may be confusing for some readers. To me, in brain imaging a matrix is most often a 4x4 affine matrix describing a linear transformation. However, the authors seem to be describing a more complex, non-linear deformation field. Whereas building a deformation matrix (4x4 affine) is not a big challenge, I agree that more sophisticated tools should provide more sophisticated deformation fields. The authors may consider using "deformation field" instead of "deformation matrix", but I leave that to their judgment.

      As suggested, we changed the text to “deformation field” where relevant.

      2) In the results section, p11, the authors highlight the challenge of segmenting thalamic nuclei or different hippocampal regions, and suggest that this should be simplified by the use of the histological BigBrain data. However, the atlases currently provided in the OSF project do not include these more refined parcellations: there's one single "Thalamus" label, and one single "Hippocampus" label (not really single: left and right). This could be explicitly stated to prevent readers from having too high expectations (although I am certain that those finer parcellations should come in the very near future).

      We updated the text to reflect the current state of such parcellations. While subthalamic nuclei are not yet segmented (to our knowledge), one of the present authors has segmented hippocampal subfields (https://osf.io/bqus3/) and we highlight this in the Results (P.11-12):

      “Despite MRI acquisitions at high and ultra-high fields reaching submillimeter resolutions with ongoing technical advances, certain brain structures and subregions remain difficult to identify (Kulaga-Yoskovitz et al., 2015; Wisse et al., 2017; Yushkevich et al., 2015). For example, there are challenges in reliably defining the subthalamic nucleus (not yet released for BigBrain) or hippocampal Cornu Ammonis subfields [manual segmentation available on BigBrain, https://osf.io/bqus3/, (DeKraker et al., 2019)]. BigBrain-defined labels can be transformed to a standard imaging space for further investigation. Thus, this approach can support exploration of the functional architecture of histologically-defined regions of interest.”

    1. Author Response:

      Reviewer #1 (Public Review):

      Overall, the authors have done a nice job covering the relevant literature, presenting a story out of complicated data, and performing many thoughtful analyses.

      However, I believe the paper requires quite major revisions.

      We thank the reviewer for their encouraging assessment of our manuscript. We are grateful for their valuable and especially detailed feedback that helped us to substantially improve our manuscript.

      Major issues:

      I do not believe the current results present a clear, comprehensible story about sleep and motor memory consolidation. As presented, sleep predicts an increase in the subsequent learning curve, but there is a negative relationship between learning curve and task proficiency change (which is, as far as I can tell, similar to "memory retention"). This makes it seem as if sleep predicts more forgetting on initial trials within the subsequent block (or worse memory retention) - is this true? Regardless of whether it is statistically true, there appears another story in these data that is being sacrificed to fit a story about sleep. To my eye, the results may first and foremost tell a circadian (rather than sleep) story. Examining the data in Figure 2A and 2B, it appears that every AM learning period has a higher learning curve (slope) than every PM period. While this could, of course, be due to having just slept, the main story gleaned from such a result is not a sleep effect on retention, which has been the emphasis on motor memory consolidation research in the last couple of decades, but on new learning. The fact that this effect appears present in the first session (juggling blocks 1-3 in adolescents and blocks 1-5 in adults) makes this seem the more likely story here, since it has less to do with "preparing one to re-learn" and more to do with just learning and when that learning is optimal. But even if it does not reach statistical significance in the first session alone, it remains a concern and, in my opinion, should be considered a focus in the manuscript unless the authors can devise a reason to definitively rule it out.

      Here is how I recommend the authors proceed on this point: include all sessions from all subjects into a mixed effect model, predicting the slope of the learning curve with time of day and age group as fixed effects and subjects as random effects:

      learning curve slope ~ AM/PM [AM (0) or PM (1)] + age [adolescent (0) or adult (1)] + (1|subject)

      …or something similar with other regressors of interest. If this is significant for AM/PM status, they should re-try the analysis using only the first session. If this is significant, then a sleep-centric story cannot be defended here at all, in my opinion. If it is not (which could simply result from low power, but the authors could decide this), the authors should decide if they think they can rule out circadian effects and proceed accordingly. I should note that, while to many, a sleep story would be more interesting or compelling, that is not my opinion, and I would not solely opt to reject this paper if it centered a time-of-day story instead.
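Written out, the suggested analysis corresponds to a random-intercept model. The symbols below are notation introduced here for illustration only and do not appear in the manuscript: $i$ indexes subjects, $j$ indexes sessions, $\mathrm{PM}_{ij}$ is 1 for an evening session and 0 for a morning session, and $\mathrm{Adult}_i$ is 1 for adults and 0 for adolescents.

```latex
\begin{equation}
  \mathrm{slope}_{ij} = \beta_0 + \beta_1\,\mathrm{PM}_{ij} + \beta_2\,\mathrm{Adult}_{i} + u_i + \varepsilon_{ij},
  \qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
\end{equation}
```

Under this parameterisation, $\beta_1$ captures the time-of-day effect on the learning curve slope and $u_i$ absorbs stable between-subject differences. Restricting the data to the first session leaves one observation per subject, so the random intercept can no longer be estimated and the model reduces to an ordinary regression on the two fixed effects.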

      The authors need to work out precisely what is happening in the behavior here, and let the physiology follow that story. They should allow themselves to consider very major revisions (and drop the physiology) if that is most consistent with the data. As presented, I am very unclear of what to take away from the study.

      We thank the reviewer for the opportunity to further elaborate on our behavioral results. We agree that the interpretation of the behavior in the complex gross-motor task is not straightforward, which might be partly due to less controllability compared to, for example, finger-tapping tasks. The reviewer is correct that, initially, sleep seems to predict more forgetting on initial trials within the subsequent block, given the dip in task proficiency and a resulting increase in steepness of the learning curve after the sleep retention interval. Notably, this dip in performance after sleep has also been reported for finger-tapping tasks (cf. Eichenlaub et al, 2020). The performance dip is also present in the wake-first group (Figure 2) after the first interval. This observation suggests that picking up the task again after a period of time comes at a cost. Interestingly, this performance dip is no longer present after the second retention interval, indicating that the better the task proficiency, the easier it is to pick up juggling again. In other words, juggling has been better consolidated after additional training. Critically, our results show that participants with higher SO-spindle coupling strength have a smaller dip in performance after the retention interval, thus indicating a learning advantage.

      Figure 2

      (A) Number of successful three-ball cascades (mean ± standard error of the mean [SEM]) of adolescents (circles) for the sleep-first (blue) and wake-first group (green) per juggling block. Grand average learning curves (black lines) as computed in (C) are superimposed. Dashed lines indicate the timing of the respective retention intervals that separate the three performance tests. Note that adolescents improve their juggling performance across the blocks. (B) Same conventions as in (A) but for adults (diamonds). Similar to adolescents, adults improve their juggling performance across the blocks regardless of group.

      We discuss the sleep effect on juggling in the discussion section (page 22 – 23, lines 502 – 514):

      "How relevant is sleep for real-life gross-motor memory consolidation? We found that sleep impacts the learning curve but did not affect task proficiency in comparison to a wake retention interval (Figure 2DE). Two accounts might explain the absence of a sleep effect on task proficiency. (1) Sleep rather stabilizes than improves gross-motor memory, which is in line with previous gross-motor adaption studies (Bothe et al, 2019; Bothe et al, 2020). (2) Pre-sleep performance is critical for sleep to improve motor skills (Wilhelm et al, 2012). Participants commonly reach asymptotic pre-sleep performance levels in finger tapping tasks, which is most frequently used to probe sleep effects on motor memory. Here we found that using a complex juggling task, participants do not reach asymptotic ceiling performance levels in such a short time. Indeed, the learning progression for the sleep-first and wake-first groups followed a similar trend (Figure 2AB), suggesting that more training and not in particular sleep drove performance gains."

      If indeed the authors keep the sleep aspect of this story, here are some comments regarding the physiology. The authors present several nice analyses in Figure 3. However, given the lack of behavioral difference between adolescents and adults (Fig 2D), they combine the groups when investigating behavior-physiology relationships. In some ways, then, Figure 3 has extraneous details to the point of motor learning and retention, and I believe the paper would benefit from more focus. If the authors keep their sleep story, I believe Figure 3 and 4 should be combined and some current figure panels in Figure 3 should be removed or moved to the supplementary information.

      We thank the reviewers for their suggestion and we agree that the figures of our manuscript would benefit from more focus. Therefore, we combined Figure 3 and 4 from the original manuscript into a revised Figure 3 in the updated version of the manuscript. In more detail, subpanels that explain our methodological approach can now be found in Figure 3 – figure supplement 1, while the updated Figure 3 now focuses on developmental changes in oscillatory dynamics and SO-spindle coupling strength as well as their relationship to gross-motor learning.

      Updated Figure 3:

      (A) Left: topographical distribution of the 1/f corrected SO and spindle amplitude as extracted from the oscillatory residual (Figure 3 – figure supplement 1A, right). Note that adolescents and adults both display the expected topographical distribution of more pronounced frontal SO and centro-parietal spindles. Right: single subject data of the oscillatory residual for all subjects with sleep data color coded by age (darker colors indicate older subjects). SO and spindle frequency ranges are indicated by the dashed boxes. Importantly, subjects displayed high inter-individual variability in the sleep spindle range and a gradual spindle frequency increase by age that is critically underestimated by the group average of the oscillatory residuals (Figure 3 – figure supplement 1A, right). (B) Spindle peak locked epoch (NREM3, co-occurrence corrected) grand averages (mean ± SEM) for adolescents (red) and adults (black). Inset depicts the corresponding SO-filtered (2 Hz lowpass) signal. Grey-shaded areas indicate significant clusters. Note, we found no difference in amplitude after normalization. Significant differences are due to more precise SO-spindle coupling in adults. (C) Top: comparison of SO-spindle coupling strength between adolescents and adults. Adults displayed more precise coupling than adolescents in a centro-parietal cluster. T-scores are transformed to z-scores. Asterisks denote cluster-corrected two-sided p < 0.05. Bottom: Exemplary depiction of coupling strength (mean ± SEM) for adolescents (red) and adults (black) with single subject data points. Exemplary single electrode data (bottom) is shown for C4 instead of Cz to visualize the difference. (D) Cluster-corrected correlations between individual coupling strength and overnight task proficiency change (post – pre retention) for adolescents (red, circle) and adults (black, diamond) of the sleep-first group (left, data at C4). Asterisks indicate cluster-corrected two-sided p < 0.05. 
Grey-shaded area indicates 95% confidence intervals of the trend line. Participants with a more precise SO-spindle coordination show improved task proficiency after sleep. Note that the change in task proficiency was inversely related to the change in learning curve (cf. Figure 2D), indicating that a stronger improvement in task proficiency related to a flattening of the learning curve. Further note that the significant cluster formed over electrodes close to motor areas. (E) Cluster-corrected correlations between individual coupling strength and overnight learning curve change. Same conventions as in (D). Participants with more precise SO-spindle coupling over C4 showed attenuated learning curves after sleep.

      and

      Figure 3 - figure supplement 1

      (A) Left: Z-normalized EEG power spectra (mean ± SEM) for adolescents (red) and adults (black) during NREM sleep in semi-log space. Data is displayed for the representative electrode Cz unless specified otherwise. Note the overall power difference between adolescents and adults due to a broadband shift on the y-axis. Straight black line denotes cluster-corrected significant differences. Middle: 1/f fractal component that underlies the broadband shift. Right: Oscillatory residual after subtracting the fractal component (A, middle) from the power spectrum (A, left). Both groups show clearly delineated peaks in the SO (< 2 Hz) and spindle range (11 – 16 Hz), establishing the presence of the cardinal sleep oscillations in the signal. (B) Top: Spindle frequency peak development based on the oscillatory residuals. Spindle frequency is faster at all but occipital electrodes in adults than in adolescents. T-scores are transformed to z-scores. Asterisks denote cluster-corrected two-sided p < 0.05. Bottom: Exemplary depiction of the spindle frequency (mean ± SEM) for adolescents (red) and adults (black) with single subject data points at Cz. (C) SO-spindle co-occurrence rate (mean ± SEM) for adolescents (red) and adults (black) during NREM2 and NREM3 sleep. Event co-occurrence is higher in NREM3 (F(1, 51) = 1209.09, p < 0.001, partial eta² = 0.96) as well as in adults (F(1, 51) = 11.35, p = 0.001, partial eta² = 0.18). (D) Histogram of co-occurring SO-spindle events in NREM2 (blue) and NREM3 (purple) collapsed across all subjects and electrodes. Note the low co-occurring event count in NREM2 sleep. (E) Single subject (top) and group averages (bottom, mean ± SEM) for adolescents (red) and adults (black) of individually detected sleep spindles in NREM3, corrected for SO co-occurrence. Spindles were detected based on the information of the oscillatory residual.
Note the underlying SO-component (grey) in the spindle detection for single subject data and group averages indicating a spindle amplitude modulation depending on SO-phase. (F) Grand average time frequency plots (-2 to -1.5s baseline-corrected) of SO-trough-locked segments (corrected for spindle co-occurrence) in NREM3 for adolescents (left) and adults (right). Schematic SO is plotted superimposed in grey. Note the alternating power pattern in the spindle frequency range, showing that SO-phase modulates spindle activity in both age groups.

      Why did the authors use Spearman rather than Pearson correlations in Figure 4? Was it to reduce the influence of the outlier subject? They should minimally clarify and justify this, since it is less conventional in this line of research. And it would be useful to know if the relationship is significant with Pearson correlations when robust regression is applied. I see the authors are using MATLAB, and the robustfit function (https://www.mathworks.com/help/stats/robustfit.html) is a simple way to address this issue.

      We thank the reviewers for their suggestion. We agree that, when inspecting the scatter plots, it appears that the correlations could be severely influenced by two outliers in the adult group. Because this is an important matter, we recalculated all previously reported correlations without the two outliers (Figure R4, left column) and followed the reviewer’s suggestion to also compute robust regression (Figure R4, right column) and found no substantial deviation from our original results.

      In more detail, an increase in task proficiency resulted in a flattening of the learning curve when removing outliers (Figure R4A, rhos = -0.70, p < 0.001) and when applying robust regression analysis (Figure R4B, b = -0.30, t(67) = -10.89, rho = -0.80, p < 0.001). Likewise, higher coupling strength still predicted better task proficiency (mean rho = 0.35, p = 0.029, cluster-corrected) and flatter learning curves after sleep (rho = -0.44, p = 0.047, cluster-corrected) when removing the outliers (Figure R4CE) and when calculating robust regression (Figure R4DF, task proficiency: b = 82.32, t(40) = 3.12, rho = 0.45, p = 0.003; learning curve: b = -26.84, t(40) = -2.96, rho = -0.43, p = 0.005). Furthermore, we calculated Spearman rank correlations and cluster-corrected Spearman rank correlations in our original manuscript to mitigate the impact of outliers, even though Pearson correlations are more widely used in the field. Therefore, we still report Spearman rank correlations for single electrodes instead of robust correlations, as this is more consistent with the cluster-correlation analyses.
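To illustrate why robust regression limits the influence of single outlying subjects, the sketch below fits a line by iteratively reweighted least squares with Tukey bisquare weights. MATLAB's robustfit uses bisquare weighting with a tuning constant of 4.685 by default, but this sketch is a simplified stand-in: it omits the leverage correction that robustfit applies, and the function names and toy data are hypothetical.

```python
from statistics import median


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx  # needs at least two distinct x values with non-zero weight
    return my - slope * mx, slope


def robust_line_fit(x, y, tune=4.685, n_iter=50, tol=1e-10):
    """IRLS with bisquare weights (simplified: no leverage correction)."""
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(n_iter):
        resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        # Robust residual scale from the median absolute deviation (MAD)
        med = median(resid)
        scale = median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0:
            break  # at least half of the residuals are identical
        u = [r / (tune * scale) for r in resid]
        # Bisquare weights: large residuals receive a weight of zero
        w = [(1 - ui ** 2) ** 2 if abs(ui) < 1 else 0.0 for ui in u]
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        converged = (abs(new_slope - slope) < tol
                     and abs(new_intercept - intercept) < tol)
        intercept, slope = new_intercept, new_slope
        if converged:
            break
    return intercept, slope


# Toy data (hypothetical): y is roughly 2x + 1 with one gross outlier
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.15, -0.05, 0.2, -0.1, 0.05, -0.15, 0.1, 0.0]
y = [2 * xi + 1 + ni for xi, ni in zip(x, noise)]
y[-1] = 100.0  # the clean value would have been 19

_, ols_slope = weighted_line_fit(x, y, [1.0] * len(x))
_, robust_slope = robust_line_fit(x, y)
print(ols_slope, robust_slope)
```

In this toy example the ordinary least-squares slope is pulled far above 2 by the single outlier, whereas the robust slope stays close to 2 because the outlier ends up with zero weight.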

      We now use robust trend lines instead of linear trend lines in our scatter plots. Further, we added the correlations without outliers (Figure R4ACE) to the supplements as Figure 2 – figure supplement 1D and Figure 3 – figure supplement 2FG. These additional analyses are now reported in the results section of the revised manuscript (page 9, lines 186 – 191):

      "[…] we confirmed a strong negative correlation between the change (post retention values – pre retention values) in task proficiency and the change in learning curve after the retention interval (Figure 2F; rhos = -0.71, p < 0.001), which also remained strong after outlier removal (Figure 2 – figure supplement 1D). This result indicates that participants who consolidate their juggling performance after a retention interval show slower gains in performance."

      And (page 16, lines 343 – 346):

      "[…] Furthermore, our results remained consistent when including coupled spindle events in NREM2 (Figure 3 – figure supplement 2E) and after outlier removal (Figure 3 – figure supplement 2FG)."

      Furthermore, we now state in the Methods section that we specifically utilized Spearman rank correlations to mitigate the impact of outliers in our analyses (page 35, lines 808 – 813):

      "For correlational analyses we utilized spearman rank correlations (rhos; Figure 2F & Figure 3DE) to mitigate the impact of possible outliers as well as cluster-corrected spearman rank correlations by transforming the correlation coefficients to t-values (p < 0.05) and clustering in the space domain (Figure 3DE). Linear trend lines were calculated using robust regression."

      Figure R4

      (A) Spearman rank correlation between task proficiency change and learning curve change collapsed across adolescents (red dots) and adults (black diamonds) after removing two outlier subjects in the adult age group. Grey-shaded area indicates 95% confidence intervals of the robust trend line. (B) Robust regression of task proficiency change and learning curve change of the original sample. (C) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) after outlier removal (left, Spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. (D) Robust regression of coupling strength at C4 and task proficiency of the original sample. (E) Same conventions as in (C) but for overnight learning curve change. (F) Same conventions as in (D) but for overnight learning curve change.
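The rank-based correlations used throughout this response, and the conversion of a correlation coefficient to a t-value that precedes the cluster correction, can be sketched as follows. This is a minimal illustration in plain Python. The t-transform shown is the standard one with n - 2 degrees of freedom, which is assumed here because the excerpt does not spell out the formula. All names and toy data are hypothetical, and the spatial clustering step is not shown.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rho_to_t(rho, n):
    """t-value with n - 2 degrees of freedom (requires |rho| < 1)."""
    return rho * math.sqrt((n - 2) / (1 - rho ** 2))


# Toy data (hypothetical): the last subject is an extreme but rank-consistent outlier
coupling = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
change = [2.0, 1.0, 4.0, 3.0, 6.0, 500.0]

rho = spearman(coupling, change)
t_value = rho_to_t(rho, len(coupling))
print(rho, t_value)
```

Because only the ordering of the values enters the calculation, replacing 500 with 7 in the toy data would leave rho unchanged, which is the property that motivates rank correlations when outliers are a concern.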

      Additionally, with only a single night of recording data, it is impossible to disentangle possible trait-based sleep characteristics (e.g., Subject 1 has high SO-spindle coupling in general and retains motor memories well, but these are independent of each other) from a specific, state-based account (e.g., Subject 1's high SO-spindle coupling on night 1 specifically led to their improved retention or change in learning, etc., and this is unrelated to their general SO-spindle coupling or motor performance abilities). Clearly, many studies face this limitation, but this should be acknowledged.

      We thank the reviewers for their important remark. We agree that it is impossible to make a sound statement about whether our reported correlations represent trait- or state-based aspects of the sleep and learning relationship with the data that we have reported in the manuscript. However, while we are lacking a proper baseline condition without any task engagement, we still recorded polysomnography for all subjects during an adaptation night. Given the expected pronounced differences in sleep architecture between the adaptation nights and learning nights (see Table R3 for an overview collapsed across both age groups), we initially refrained from entering data from the adaptation nights into our original analyses, but we now fully report the data below. Note that the differences are driven by the adaptation night, where subjects first have to adjust to sleeping with attached EEG electrodes in a sleep laboratory.

      Table R3. Sleep architecture (mean ± standard deviation) for the adaptation and learning night collapsed across both age groups. Nights were compared using paired t-tests.

      To further clarify whether subjects with high coupling strength have a motor learning advantage (i.e. trait-effect) or whether a learning-induced enhancement of coupling strength is indicative of improved overnight memory change (i.e. state-effect), we ran additional analyses using the data from the adaptation night. Note that the coupling strength metric was not impacted by differences in event number and our correlations with behavior were not influenced by sleep architecture (please refer to our answer to issue #7 for the results). Therefore, we considered it appropriate to also utilize data from the adaptation night.

      First, we correlated SO-spindle coupling strength obtained from the adaptation night with the coupling strength in the learning night. We found that overall, coupling strength is highly correlated between the two measurements (mean rho across all channels = 0.55, Figure R5A), supporting the notion that coupling strength remains rather stable within the individual (i.e. trait), similar to what has been reported about the stable nature of sleep spindles as a “neural finger-print” (De Gennaro & Ferrara, 2003; De Gennaro et al, 2005; Purcell et al, 2017).

      To investigate a possible state-effect for coupling strength and motor learning, we calculated the difference in coupling strength between the two nights (learning night – adaptation night) and correlated these values with the overnight change in task proficiency and learning curve. We identified no significant correlations with a learning-induced coupling strength change, neither for task proficiency nor for learning curve change (Figure R5B). Note that there was a positive correlation of coupling strength change with overnight task proficiency change at Cz (Figure R5B, left); however, it did not survive cluster-corrected correlational analysis (rhos = 0.34, p = 0.15). Combined, these results favor the conclusion that our correlations between coupling strength and learning reflect a trait-like rather than a state-like relationship. This is in line with the interpretation of our previous studies that SO-spindle coupling strength reflects the efficiency and integrity of the neuronal pathway between neocortex and hippocampus that is paramount for memory networks and the information transfer during sleep (Hahn et al, 2020; Helfrich et al, 2019; Helfrich et al, 2018; Winer et al, 2019). For a comprehensive review please see Helfrich et al (2021), which argued that SO-spindle coupling predicts the integrity of memory pathways and therefore correlates with various metrics of behavioral performance or structural integrity.

      Figure R5

      (A) Topographical plot of Spearman rank correlations of coupling strength in the adaptation night and learning night across all subjects. Overall coupling strength was highly correlated between the two measurements. (B) Cluster-corrected correlation between learning-induced coupling strength changes (learning night – adaptation night) and overnight change in task proficiency (left) as well as learning curve (right). We found no significant clusters, although correlations showed similar trends as our original analyses, with more learning-induced changes in coupling strength resulting in better overnight task proficiency and flattened learning curves.

      We have now added the additional state-trait analyses (Figure R5) to the updated manuscript as Figure 3 – figure supplement 2HI and report them in the results section (page 17, lines 361 – 375):

      "Finally, we investigated whether subjects with high coupling strength have a gross-motor learning advantage (i.e. trait-effect) or a learning induced enhancement of coupling strength is indicative for improved overnight memory change (i.e. state-effect). First, we correlated SO-spindle coupling strength obtained from the adaptation night with the coupling strength in the learning night. We found that overall, coupling strength is highly correlated between the two measurements (mean rho across all channels = 0.55, Figure 3 – figure supplement 2H), supporting the notion that coupling strength remains rather stable within the individual (i.e. trait). Second, we calculated the difference in coupling strength between the learning night and the adaptation night to investigate a possible state-effect. We found no significant cluster-corrected correlations between coupling strength change and task proficiency- as well as learning curve change (Figure 3 – figure supplement 2I).

      Collectively, these results indicate the regionally specific SO-spindle coupling over central EEG sensors encompassing sensorimotor areas precisely indexes learning of a challenging motor task."

      We further refer to these new results in the discussion section (page 23, lines 521 – 528):

      "Moreover, we found that SO-spindle coupling strength remains remarkably stable between two nights, which also explains why a learning-induced change in coupling strength did not relate to behavior (Figure 3 – figure supplement 2I). Thus, our results primarily suggest that strength of SO-spindle coupling correlates with the ability to learn (trait), but does not solely convey the recently learned information. This set of findings is in line with recent ideas that strong coupling indexes individuals with highly efficient subcortical-cortical network communication (Helfrich et al, 2021)."

      Additionally, we now provide descriptive data of the adaptation and learning night (Table R3) in the Supplementary file – table 1 and explicitly mention the adaptation night in the results section, which was previously only mentioned in the Methods section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)."

      Reviewer #2 (Public Review):

      In this study Hahn and colleagues investigate the role of slow oscillation (SO)-spindle coupling for motor memory consolidation and the impact of brain maturation on these interactions. The authors employed a real-life gross-motor task, where adolescents and adults learned to juggle. They demonstrate that during post-learning sleep SOs and spindles are more strongly coupled in adults as compared to adolescents. The authors further show that the strength of SO-spindle coupling correlates with overnight changes in the learning curve and task proficiency, indicating a role of SO-spindle coupling in motor memory consolidation.

      Overall, the topic and the results of the present study are interesting and timely. The authors employed state-of-the-art analyses, carefully taking the general variability of oscillatory features into account. It also has to be acknowledged that the authors moved away from using rather artificial lab-tasks to study the consolidation of motor memories (as it is standard in the field), adding ecological validity to their findings. However, some features of their analyses need further clarification.

      We thank the reviewer for their positive assessment of our manuscript. Incorporating the encouraging and helpful feedback, we believe that we substantially improved the clarity and robustness of our analyses.

      1) Supporting and extending previous work of the authors (Hahn et al, 2020), SO-spindle coupling over centro-parietal areas was stronger in adults as compared to adolescents. Despite these differences in the EEG results the authors collapsed the data of adults and adolescents for their correlational analyses (Fig. 4a and 4b). Why would the authors think that this procedure is viable (also given the fact that different EEG systems were used to record the data)?

      We thank the reviewers for the opportunity to clarify why we think it is viable to collapse the data of adolescents and adults for our correlational analyses. In the following we split our answers based on the two points raised by the reviewers: (1) electrophysiological differences (i.e. coupling strength) between the groups and (2) potential signal differences due to different EEG systems.

      1. Electrophysiological differences

      First, upon inspecting the original Figure 4, it is apparent that the coupling strength of the combined sample does not form isolated clusters for each age group. In other words, while adult coupling strength is on the higher and adolescent coupling on the lower end due to the developmental increase in coupling strength we reported in the original Figure 3F, both samples overlap, forming a linear trend. Second, when running the correlational analyses between coupling strength and task proficiency as well as learning curve separately for each age group, we found that they follow the same direction (Figure R3). Adolescents with higher coupling strength show better task proficiency (Figure R3A, rhos = 0.66, p = 0.005). This effect was also present when using robust regression (b = 109.97, t(15) = 3.13, rho = 0.63, p = 0.007). Like adolescents, adults with higher coupling strength at C4 displayed better task proficiency after sleep (Figure R3B, rhos = 0.39, p = 0.053). This relationship was stronger when using robust regression (b = 151.36, t(23) = 3.17, rho = 0.56, p = 0.004). For learning curves, we found the expected negative correlation at C4 for adolescents (Figure R3C, rhos = -0.57, p = 0.020) and adults (Figure R3D, rhos = -0.44, p = 0.031). Results were comparable when using robust regression (adolescents: b = -59.58, t(15) = -2.94, rho = -0.60, p = 0.010; adults: b = -21.99, t(23) = -1.71, rho = -0.37, p = 0.101).

      Taken together, these results demonstrate that adolescents and adults show the same effects, in the same direction and at the same electrode, making it highly unlikely that our results arose by chance or that our initial correlation analyses are driven by one group alone.

      Additionally, we already controlled for age in our original analyses using partial correlations (also refer to our answer to issue #6). Hence, these analyses provide further support that it is viable to collapse the analyses across both age groups even though they differ in coupling strength.

      2. Different EEG-systems

        The reviewers also raise the question whether our analyses might be impacted by the different EEG systems we used to record our data. This is an important concern, especially when considering that cross-frequency coupling analyses can be severely confounded by differences in signal properties (Aru et al, 2015). In our sample, the strongest impact factor on signal properties is most likely age, given the broadband power differences in the power spectrum we found between the groups (original Figure 3A). Importantly, we also found a similar systematic power difference in our longitudinal study using the same ambulatory EEG system for both data recordings (Hahn et al, 2020). This is in line with numerous other studies demonstrating age related EEG power changes in broadband- as well as SO and sleep spindle frequency ranges (Campbell & Feinberg, 2016; Feinberg & Campbell, 2013; Helfrich et al, 2018; Kurth et al, 2010; Muehlroth et al, 2019; Muehlroth & Werkle-Bergner, 2020; Purcell et al, 2017). Therefore, we already had to take differences in signal property into account for our cross-frequency analyses, regardless of whether the underlying cause is an age difference or different signal-to-noise ratios of different EEG systems.

      To mitigate confounds in the signal, we used a data-driven and individualized approach detecting SO and sleep spindle events based on individualized frequency bands and a 75-percentile amplitude criterion relative to the underlying signal. Additionally we z-normalized all spindle events prior to the cross-frequency coupling analyses (Figure R3E). We found no amplitude differences around the spindle peak (point of SO-phase readout) between adolescents that were recorded with an ambulatory amplifier system (alphatrace) and adults that were recorded with a stationary amplifier system (neuroscan) using cluster-based random permutation testing. This was also the case for the SO-filtered (< 2 Hz) signal (Figure R3E, inset). Critically, the significant differences in amplitude from -1.4 to -0.8 s (p = 0.023, d = -0.73) and 0.4 to 1.5 s (p < 0.001, d = 1.1) are not caused by age related differences in power or different EEG-systems but instead by the increased coupling strength (i.e. higher coupling precision of spindles to SOs) in adults giving rise to a more pronounced SO-wave shape when averaging across spindle peak locked epochs.

      Consequently, our analysis pipeline already controlled for possible differences in signal property introduced through different amplifier systems. Nonetheless, we also wanted to directly compare the signal-to-noise ratio of the ambulatory and stationary amplifier systems. However, we only obtained data from both amplifier systems in the adult sleep first group, because we recorded EEG during the juggling learning phase with the ambulatory system in addition to the PSG with the stationary system. First, we computed the power spectra in the 1 to 49 Hz frequency range during the juggling learning phase (ambulatory) and during quiet wakefulness (stationary) for every subject in the adult sleep first group in 10-second segments. Next, we computed the signal-to-noise ratio (mean/standard deviation) of the power spectra per frequency across all segments. We only found a small negative cluster from 21.9 to 22.5 Hz (p = 0.042, d = 0.53; Figure R3F), which did not pertain to our frequency bands of interest. Critically, the signal-to-noise ratio of both amplifiers converged in the upper frequency bands approaching the noise floor, strongly supporting the notion that both systems in fact provided highly comparable estimates.
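
      The signal-to-noise computation described above (mean divided by standard deviation of the power spectra, per frequency, across 10-second segments) can be illustrated with a minimal sketch. This is not the pipeline used for Figure R3F: the function names, the Hann window, the non-overlapping segmentation and the plain FFT-based spectrum are all assumptions made for illustration.

```python
import numpy as np


def segment_power_spectra(signal, sfreq, seg_seconds=10.0, fmin=1.0, fmax=49.0):
    """Cut a 1-D signal into non-overlapping segments and return one power
    spectrum per segment, restricted to the fmin..fmax range (hypothetical helper)."""
    signal = np.asarray(signal, dtype=float)
    seg_len = int(round(seg_seconds * sfreq))
    n_segments = signal.size // seg_len
    segments = signal[: n_segments * seg_len].reshape(n_segments, seg_len)
    window = np.hanning(seg_len)  # assumed taper; the source does not name one
    spectra = np.abs(np.fft.rfft(segments * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / sfreq)
    keep = (freqs >= fmin) & (freqs <= fmax)
    return freqs[keep], spectra[:, keep]


def snr_per_frequency(spectra):
    """Signal-to-noise ratio per frequency: mean / standard deviation across segments."""
    return spectra.mean(axis=0) / spectra.std(axis=0, ddof=1)
```

      With two minutes of data sampled at 250 Hz, `segment_power_spectra` yields 12 segments, and `snr_per_frequency` returns one value per retained frequency bin; the per-frequency curves of two amplifiers could then be compared with a cluster-based test as in the text.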

      In conclusion, both age groups display highly similar effects and direction when correlating coupling strength with behavior. Further, after individualization and normalization of the analytical signal, we found no differences in signal properties that would confound the cross-frequency analysis. Lastly, we did not find systematic differences in signal-to-noise ratio between the different EEG-systems. Thus, we believe it is justified to collapse the data across all participants for the correlational analyses, as it combines both the developmental aspect of enhanced coupling precision from adolescence to adulthood and the behavioral relevance for motor learning, which we deem a critical research advance from our previous study.

      Figure R3

      (A) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) for adolescents of the sleep-first group (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. Grey-shaded area indicates 95% confidence intervals of the robust trend line. Participants with a more precise SO-spindle coordination show improved task proficiency after sleep. (B) Cluster-corrected correlation of coupling strength and overnight task proficiency change for adults. Same conventions as in (A). Similar trend of higher coupling strength predicting better task proficiency after sleep. (C) Cluster-corrected correlation of coupling strength and overnight learning curve change for adolescents. Same conventions as in (A). Higher coupling strength related to a flatter learning curve after sleep. (D) Cluster-corrected correlation of coupling strength and overnight learning curve change for adults. Same conventions as in (A). Higher coupling strength related to a flatter learning curve after sleep. (E) Spindle peak locked epoch (NREM3, co-occurrence corrected) grand averages (mean ± SEM) for adolescents (red) and adults (black). Inset depicts the corresponding SO-filtered (2 Hz lowpass) signal. Black lines indicate significant clusters. Note, we found no difference in amplitude after normalization. Significant differences are due to more precise SO-spindle coupling in adults. Spindle frequency is blurred due to individualized spindle detection. (F) Signal-to-noise ratio for the stationary EEG amplifier (green) during quiet wakefulness and for the ambulatory EEG amplifier (purple) during juggling training. Grey shaded area denotes cluster-corrected p < 0.05. Note that signal-to-noise ratio converges in the higher frequency ranges.

      We have now added Figure R3E as Figure 3B to the revised version of the manuscript to demonstrate that there were no systematic differences between the two age groups in the analytical signal due to the expected age related power differences or EEG-systems. Specifically, we now state in the results section (page 13 – 14, lines 282 – 294):

      "We assessed the cross frequency coupling based on z-normalized spindle epochs (Figure 3B) to alleviate potential power differences due to age (Figure 3 – figure supplement 1A) or different EEG-amplifier systems that could potentially confound our analyses (Aru et al, 2015). Importantly, we found no amplitude differences around the spindle peak (point of SO-phase readout) between adolescents and adults using cluster-based random permutation testing (Figure 3B), indicating an unbiased analytical signal. This was also the case for the SO-filtered (< 2 Hz) signal (Figure 3B, inset). Critically, the significant differences in amplitude from -1.4 to -0.8 s (p = 0.023, d = -0.73) and 0.4 to 1.5 s (p < 0.001, d = 1.1) are not caused by age related differences in power or different EEG-systems but instead by the increased coupling strength (i.e. higher coupling precision of spindles to SOs) in adults giving rise to a more pronounced SO-wave shape when averaging across spindle peak locked epochs."

      Further, we added the correlational analyses that we computed separately for the age groups (Figure R3A-D) to the revised manuscript (Figure 3 – figure supplement 2CD) as they further substantiate our claims about the relationship between SO-spindle coupling and gross-motor learning.

      We now refer to these analyses in the results section (page 16, lines 338 – 343):

      "Critically, when computing the correlational analyses separately for adolescents and adults, we identified highly similar effects at electrode C4 for task proficiency (Figure 3 – figure supplement 2C) and learning curve (Figure 3 – figure supplement 2D) in each group. These complementary results demonstrate that coupling strength predicts gross-motor learning dynamics in both, adolescents as well as adults, and further show that this effect is not solely driven by one group."

      2) The authors might want to explicitly show that the reported correlations (with regards to both learning curve and task proficiency change) are not driven by any outliers.

      We thank the reviewers for their suggestion. We agree that, when inspecting the scatter plots, it looks as though the correlations could be severely influenced by two outliers in the adult group. Because this is an important matter, we recalculated all previously reported correlations without the two outliers (Figure R4, left column) and followed the reviewers' suggestion to also compute robust regression (Figure R4, right column), and found no substantial deviation from our original results.

      In more detail, increase in task proficiency resulted in flattening of the learning curve when removing outliers (Figure R4A, rhos = -0.70, p < 0.001) and when applying robust regression analysis (Figure R4B, b = -0.30, t(67) = -10.89, rho = -0.80, p < 0.001). Likewise, higher coupling strength still predicted better task proficiency (mean rho = 0.35, p = 0.029, cluster-corrected) and flatter learning curves after sleep (rho = -0.44, p = 0.047, cluster-corrected) when removing the outliers (Figure R4CE) and when calculating robust regression (Figure R4DF, task proficiency: b = 82.32, t(40) = 3.12, rho = 0.45, p = 0.003; learning curve: b = -26.84, t(40) = -2.96, rho = -0.43, p = 0.005). Furthermore, we calculated spearman rank correlations and cluster-corrected spearman rank correlations in our original manuscript, to mitigate the impact of outliers, even though Pearson correlations are more widely used in the field. Therefore, we still report spearman rank correlations for single electrodes instead of robust correlations as it is more consistent with the cluster-correlation analyses.

      We now use robust trend lines instead of linear trend lines in our scatter plots. Further, we added the correlations without outliers (Figure R4ACE) to the supplements as Figure 2 – figure supplement 1D and Figure 3 – figure supplement 2 FG. These additional analyses are now reported in the results section of the revised manuscript (page 9, lines 186 – 191):

      "[…] we confirmed a strong negative correlation between the change (post retention values – pre retention values) in task proficiency and the change in learning curve after the retention interval (Figure 2F; rhos = -0.71, p < 0.001), which also remained strong after outlier removal (Figure 2 – figure supplement 1D). This result indicates that participants who consolidate their juggling performance after a retention interval show slower gains in performance."

      And (page 16, lines 343 – 346):

      "[…] Furthermore, our results remained consistent when including coupled spindle events in NREM2 (Figure 3 – figure supplement 2E) and after outlier removal (Figure 3 – figure supplement 2FG)."

      Furthermore, we now state that we specifically utilized spearman rank correlations to mitigate the impact of outliers in our analyses in the method section (page 35, lines 808 – 813):

      "For correlational analyses we utilized spearman rank correlations (rhos; Figure 2F & Figure 3DE) to mitigate the impact of possible outliers as well as cluster-corrected spearman rank correlations by transforming the correlation coefficients to t-values (p < 0.05) and clustering in the space domain (Figure 3DE). Linear trend lines were calculated using robust regression."

      Figure R4:

      (A) Spearman rank correlation between task proficiency change and learning curve change collapsed across adolescents (red dot) and adults (black diamonds) after removing two outlier subjects in the adult age group. Grey-shaded area indicates 95% confidence intervals of the robust trend line. (B) Robust regression of task proficiency change and learning curve change of the original sample. (C) Cluster-corrected correlations (right) between individual coupling strength and overnight task proficiency change (post – pre retention) after outlier removal (left, spearman correlation at C4, uncorrected). Asterisks indicate cluster-corrected two-sided p < 0.05. (D) Robust regression of coupling strength at C4 and task proficiency of the original sample. (E) Same conventions as in (C) but for overnight learning curve change. (F) Same conventions as in (D) but for overnight learning curve change.

      3) The sleep data of all participants (thus from both sleep first and wake first) were used to determine the features of SO-spindle coupling in adolescents and adults. Were there any differences between groups (sleep first vs. wake first)? This might be interesting in general but especially because only data of the sleep first group entered the subsequent correlational analyses.

      We thank the reviewers for their remark. We agree that adding additional information about possible differences between the sleep first and wake first groups would allow for a more comprehensive assessment of the reported data. We did not explain our reasoning to include only the sleep first groups for the correlation analyses clearly enough in the original manuscript. Unfortunately, we can only report data for the adolescents in our sample, because we did not record polysomnography (PSG) for the adult wake first group. This is also one of the two reasons why we focused on the sleep first groups for our correlational analyses.

      Adolescents in the sleep first group did not differ from adolescents in the wake first group in terms of sleep architecture (except REM (%), which did not correlate with behavior [task proficiency: rho = -0.17, p = 0.28; learning curve: rho = -0.02, p = 0.90]) as well as SO and sleep spindle event descriptive measures (see Table R2). Importantly, we found no differences in coupling strength between the two groups (Figure R2A).

      Table R2. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents in the sleep first and wake first group (mean ± standard deviation). Independent t-tests were used for comparisons.

      The second reason why we focused our analyses on sleep first was that adolescents in the wake first group had higher task proficiency after the sleep retention interval than the sleep first group (Figure R2B; t(23) = -2.24, p = 0.034). This difference in performance is directly explained by the additional juggling test that the wake first group performed at the time point of their learning night, which should be considered as additional training. Therefore, we excluded the wake first group from our correlational analyses because the sleep first and wake first groups are not comparable in terms of juggling training during the night when we assessed SO-spindle coupling strength.

      Figure R2

      (A) Comparison of SO-spindle coupling strength in the adolescent sleep first (blue) and wake first (green) group using cluster-based random permutation testing (Monte-Carlo method, cluster alpha 0.05, max size criterion, 1000 iterations, critical alpha level 0.05, two-sided). Left: exemplary depiction of coupling strength at electrode C4 (mean ± SEM). Right: z-transformed t-values plotted for all electrodes obtained from the cluster test. No significant clusters emerged. (B) Comparison of task proficiency between sleep first and wake first group after the sleep retention interval (mean ± SEM). Adolescents in the wake first group had higher task proficiency given the additional juggling performance test, which also reflects additional training.

      These additional analyses (Figure R2) and the summary statistics of sleep architecture and SO/spindle event descriptives of adolescents in the sleep first and wake first group (Table R2), are now reported in the revised version of the manuscript as Figure 3 – figure supplement 2AB and Supplementary file – table 7. We now explicitly explain our rationale of why we only considered participants in the sleep first group for our correlational analyses in the results section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)"

      And (page 15, lines 311 – 320):

      "[…] Furthermore, given that we only recorded polysomnography for the adults in the sleep first group and that adolescents in the wake first group showed enhanced task proficiency at the time point of the sleep retention interval due to additional training (Figure 3 – figure supplement 2A), we only considered adolescents and adults of the sleep-first group to ensure a similar level of juggling experience (for summary statistics of sleep architecture and SO and spindle events of subjects that entered the correlational analyses see Supplementary file – table 6). Notably, we found no differences in electrophysiological parameters (i.e. coupling strength, event detection) between the adolescents of the wake first and sleep first group (Figure 3 – figure supplement 2B & Supplementary file – table 7)."

      4) To allow a more comprehensive assessment of the underlying data information with regards to general sleep descriptives (minutes, per cent of time spent in different sleep stages, overall sleep time etc.) as well as related to SOs, spindles and coupled events (e.g. number, density etc.) would be needed.

      We agree with the reviewers that additional information about sleep architecture and SO as well as sleep spindle characteristics are needed for a more comprehensive assessment of our data. We now added summary tables for sleep architecture and SO/spindle event descriptive measures for the whole sample (Table R4) and for the sleep first groups that we used for our correlational analyses (Table R5) to the supplementary material in the updated manuscript. It is important to note, that due to the longer sleep opportunity of adolescents that we provided to accommodate the overall higher sleep need in younger participants, adolescents and adults differed in most general sleep architecture markers and SO as well as sleep spindle descriptive measures. In addition, changes in sleep architecture are prominent during the maturational phase from adolescence to adulthood, which might introduce additional variance between the two age groups.

      Table R4. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents and adults across the whole sample (mean ± standard deviation) in the learning night. Independent t-tests were used for comparisons.

      Table R5. Summary of sleep architecture and SO/spindle event descriptive measures (at electrode C4) of adolescents and adults in the sleep first group (mean ± standard deviation) in the learning night. Independent t-tests were used for comparisons.

      In order to ensure that our correlational analyses are not driven by these systematic differences between the two age groups, we used cluster-corrected partial correlations to control for sleep architecture markers (Figure R7) and SO/spindle descriptive measurements (Figure R8A). Critically, none of these possible confounders changed the pattern of our initial correlational analyses of coupling strength and task proficiency/learning curve. Additionally, we also controlled for differences in spindle event number by using a bootstrapped resampling approach. We randomly drew 200 spindle events in 100 iterations and subsequently recalculated the coupling strength for each subject. We found that resampled values and our original observation of coupling strength are almost perfectly correlated, indicating that differences in event number are unlikely to have an impact on coupling strength as long as there are at least 200 events (Figure R8B). Combined these analyses demonstrate that our correlations between coupling strength and behavior are not influenced by the reported differences in sleep architecture and SO/spindle descriptive measures.
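
      The resampling control described above (drawing 200 spindle events in 100 iterations and recomputing coupling strength) can be sketched as follows. This is an illustration, not the analysis code: it assumes that coupling strength is quantified as the mean resultant vector length of the SO phases at the spindle peaks, and that events are drawn with replacement; neither detail is stated in this response. The function names are hypothetical.

```python
import cmath
import random


def coupling_strength(phases):
    """Mean resultant vector length of phase angles in radians
    (0 = phases spread uniformly, 1 = all phases identical). Assumed metric."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


def resampled_coupling_strength(phases, n_events=200, n_iterations=100, seed=0):
    """Average coupling strength over repeated draws of a fixed number of events,
    so that subjects with different event counts become comparable."""
    phases = list(phases)
    if len(phases) < n_events:
        raise ValueError("fewer spindle events than requested per draw")
    rng = random.Random(seed)
    # rng.choices samples with replacement (an assumption, see lead-in)
    draws = [coupling_strength(rng.choices(phases, k=n_events))
             for _ in range(n_iterations)]
    return sum(draws) / n_iterations
```

      Correlating the resampled value with the coupling strength computed from all events, across subjects, then gives the kind of check shown in Figure R8B.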

      Figure R7

      Summary of cluster-corrected partial correlations of coupling strength with task proficiency (left) and learning curve (right) controlling for possible confounding factors. Asterisks indicate location of the detected cluster. The pattern of initial results remained highly stable.

      Figure R8

      (A) Summary of cluster-corrected partial correlations of coupling strength with task proficiency (left) and learning curve (right) controlling for SO/spindle descriptive measures at critical electrode C4. Asterisks indicate location of the detected cluster. The pattern of initial results remained highly stable. (B) Spearman correlation between resampled coupling strength (N = 200, 100 iterations) and original observation of coupling strength for adolescents (red circles) and adults (black diamonds), indicating that coupling strength is not influenced by spindle event number if at least 200 events are present. Grey-shaded area indicates 95% confidence intervals of the robust trend line.

      We now provide general sleep descriptives (Table R4 & R5) in the revised version of the manuscript as Supplementary file – table 2 & table 6. These data are referred to in the results section (page 6, lines 101 – 105):

      "Polysomnography (PSG) was recorded during an adaptation night and during the respective sleep retention interval (i.e. learning night) except for the adult wake-first group (for sleep architecture descriptive parameters of the adaptation night and learning night as well as for adolescents and adults see Supplementary file – table 1 & 2)."

      And (page 15, lines 311 – 318):

      "Furthermore, given that we only recorded polysomnography for the adults in the sleep first group and that adolescents in the wake first group showed enhanced task proficiency at the time point of the sleep retention interval due to additional training (Figure 3 – figure supplement 2A), we only considered adolescents and adults of the sleep-first group to ensure a similar level of juggling experience (for summary statistics of sleep architecture and SO and spindle events of subjects that entered the correlational analyses see Supplementary file – table 6)."

      The additional control analyses (Figure R7 & R8) are also now added to the revised manuscript as Figure 3 – figure supplement 3 & 4 in the results section (page 16, lines 356 – 360):

      "For a summary of the reported cluster-corrected partial correlations as well as analyses controlling for differences in sleep architecture see Figure 3 – figure supplement 3. Further, we also confirmed that our correlations are not influenced by individual differences in SO and spindle event parameters (Figure 3 – figure supplement 4)."

      5) The authors used partial correlations to rule out that age drove the relationship between coupling strength, learning curve and task proficiency. It seems like this analysis was done specifically for electrode C4, after having already established that coupling strength at electrode C4 correlates in general with changes in the learning curve and task proficiency. I think the claim that results were not driven by age as confounding factor would be stronger if the authors used a cluster-corrected partial correlation in the first place (just as in the main analysis).

      The reviewers are correct that initially we only conducted the partial correlation for electrode C4. Following the reviewers' suggestion, we now additionally computed cluster-corrected partial correlations similar to our main analysis. As in our original analyses, we found a significant positive central cluster (Figure R6A, mean rho = 0.40, p = 0.017) showing that higher coupling strength related to better task proficiency after sleep, and a negative cluster-corrected correlation at C4 showing that higher coupling strength was related to flatter learning curves after sleep (Figure R6B, rho = -0.47, p = 0.049), also when controlling for age.
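
      For readers unfamiliar with partial rank correlations, the single-electrode building block can be sketched as below. This is a generic textbook formulation, not the code behind Figure R6: it rank-transforms all three variables and applies the first-order partial correlation formula, and it omits the cluster correction across electrodes. All function names are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_spearman(x, y, covariate):
    """Spearman correlation of x and y after removing the rank-linear
    contribution of one covariate (e.g. age)."""
    rx, ry, rz = average_ranks(x), average_ranks(y), average_ranks(covariate)
    r_xy, r_xz, r_yz = pearson(rx, ry), pearson(rx, rz), pearson(ry, rz)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
```

      Here `x` would be coupling strength at one electrode, `y` the behavioral change score and `covariate` age; repeating this per electrode yields the correlation map that enters the cluster test.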

      Figure R6

      (A) Cluster-corrected partial correlation of individual coupling strength in the learning night and overnight change in task proficiency (post – pre retention) collapsed across adolescents and adults, controlling for age. Asterisks indicate cluster-corrected two-sided p < 0.05. A similar significant cluster to the original analysis (Figure 4A) emerged comprising electrodes Cz and C4. (B) Same conventions as in A. Like in the original analysis (Figure 4B) a negative correlation between coupling strength at C4 and learning curve change survived cluster-corrected partial correlations when controlling for age.

      We now always report cluster-corrected partial correlations when controlling for possible confounding variables in the updated version of the manuscript (also see answer to issue #7). A summary of all computed partial correlations including Figure R6 can now be found as Figure 3 – figure supplement 3 & 4 in the revised manuscript.

      Specifically we now state in the results section (page 16 – 17, lines 347 – 360):

      "To rule out age as a confounding factor that could drive the relationship between coupling strength, learning curve and task proficiency in the mixed sample, we used cluster-corrected partial correlations to confirm their independence of age differences (task proficiency: mean rho = 0.40, p = 0.017; learning curve: rhos = -0.47, p = 0.049). Additionally, given that we found that juggling performance could underlie a circadian modulation we controlled for individual differences in alertness between subjects due to having just slept. We partialed out the mean PVT reaction time before the juggling performance test after sleep from the original analyses and found that our results remained stable (task proficiency: mean rho = 0.37, p = 0.025; learning curve: rhos = -0.49, p = 0.040). For a summary of the reported cluster-corrected partial correlations as well as analyses controlling for differences in sleep architecture see Figure 3 – figure supplement 3. Further, we also confirmed that our correlations are not influenced by individual differences in SO and spindle event parameters (Figure 3 – figure supplement 4)."

      And in the methods section (page 35, lines 813 – 814):

      "To control for possible confounding factors we computed cluster-corrected partial rank correlations (Figure 3 – figure supplement 3 and 4)."

      References

      Aru, J., Aru, J., Priesemann, V., Wibral, M., Lana, L., Pipa, G., Singer, W. & Vicente, R. (2015) Untangling cross-frequency coupling in neuroscience. Curr Opin Neurobiol, 31, 51-61.

      Bothe, K., Hirschauer, F., Wiesinger, H. P., Edfelder, J., Gruber, G., Birklbauer, J. & Hoedlmoser, K. (2019) The impact of sleep on complex gross-motor adaptation in adolescents. Journal of Sleep Research, 28(4).

      Bothe, K., Hirschauer, F., Wiesinger, H. P., Edfelder, J. M., Gruber, G., Hoedlmoser, K. & Birklbauer, J. (2020) Gross motor adaptation benefits from sleep after training. J Sleep Res, 29(5), e12961.

      Campbell, I. G. & Feinberg, I. (2016) Maturational Patterns of Sigma Frequency Power Across Childhood and Adolescence: A Longitudinal Study. Sleep, 39(1), 193-201.

      Dayan, E. & Cohen, L. G. (2011) Neuroplasticity subserving motor skill learning. Neuron, 72(3), 443-54.

      De Gennaro, L. & Ferrara, M. (2003) Sleep spindles: an overview. Sleep Med Rev, 7(5), 423-40.

      De Gennaro, L., Ferrara, M., Vecchio, F., Curcio, G. & Bertini, M. (2005) An electroencephalographic fingerprint of human sleep. Neuroimage, 26(1), 114-22.

      Dinges, D. F., Pack, F., Williams, K., Gillen, K. A., Powell, J. W., Ott, G. E., Aptowicz, C. & Pack, A. I. (1997) Cumulative sleepiness, mood disturbance, and psychomotor vigilance performance decrements during a week of sleep restricted to 4-5 hours per night. Sleep, 20(4), 267-77.

      Dinges, D. F. & Powell, J. W. (1985) Microcomputer Analyses of Performance on a Portable, Simple Visual Rt Task during Sustained Operations. Behavior Research Methods Instruments & Computers, 17(6), 652-655.

      Eichenlaub, J. B., Biswal, S., Peled, N., Rivilis, N., Golby, A. J., Lee, J. W., Westover, M. B., Halgren, E. & Cash, S. S. (2020) Reactivation of Motor-Related Gamma Activity in Human NREM Sleep. Front Neurosci, 14, 449.

      Feinberg, I. & Campbell, I. G. (2013) Longitudinal sleep EEG trajectories indicate complex patterns of adolescent brain maturation. American Journal of Physiology - Regulatory, Integrative and Comparative Physiology, 304(4), R296-303.

      Hahn, M., Heib, D., Schabus, M., Hoedlmoser, K. & Helfrich, R. F. (2020) Slow oscillation-spindle coupling predicts enhanced memory formation from childhood to adolescence. Elife, 9.

      Helfrich, R. F., Lendner, J. D. & Knight, R. T. (2021) Aperiodic sleep networks promote memory consolidation. Trends Cogn Sci.

      Helfrich, R. F., Lendner, J. D., Mander, B. A., Guillen, H., Paff, M., Mnatsakanyan, L., Vadera, S., Walker, M. P., Lin, J. J. & Knight, R. T. (2019) Bidirectional prefrontal-hippocampal dynamics organize information transfer during sleep in humans. Nature Communications, 10(1), 3572.

      Helfrich, R. F., Mander, B. A., Jagust, W. J., Knight, R. T. & Walker, M. P. (2018) Old Brains Come Uncoupled in Sleep: Slow Wave-Spindle Synchrony, Brain Atrophy, and Forgetting. Neuron, 97(1), 221-230 e4.

      Killgore, W. D. (2010) Effects of sleep deprivation on cognition. Prog Brain Res, 185, 105-29.

      Kurth, S., Jenni, O. G., Riedner, B. A., Tononi, G., Carskadon, M. A. & Huber, R. (2010) Characteristics of sleep slow waves in children and adolescents. Sleep, 33(4), 475-80.

      Maris, E. & Oostenveld, R. (2007) Nonparametric statistical testing of EEG- and MEG-data. J Neurosci Methods, 164(1), 177-90.

      Muehlroth, B. E., Sander, M. C., Fandakova, Y., Grandy, T. H., Rasch, B., Shing, Y. L. & Werkle-Bergner, M. (2019) Precise Slow Oscillation-Spindle Coupling Promotes Memory Consolidation in Younger and Older Adults. Sci Rep, 9(1), 1940.

      Muehlroth, B. E. & Werkle-Bergner, M. (2020) Understanding the interplay of sleep and aging: Methodological challenges. Psychophysiology, 57(3), e13523.

      Niethard, N., Ngo, H. V. V., Ehrlich, I. & Born, J. (2018) Cortical circuit activity underlying sleep slow oscillations and spindles. Proceedings of the National Academy of Sciences of the United States of America, 115(39), E9220-E9229.

      Purcell, S. M., Manoach, D. S., Demanuele, C., Cade, B. E., Mariani, S., Cox, R., Panagiotaropoulou, G., Saxena, R., Pan, J. Q., Smoller, J. W., Redline, S. & Stickgold, R. (2017) Characterizing sleep spindles in 11,630 individuals from the National Sleep Research Resource. Nature Communications, 8, 15930.

      Van Dongen, H. P., Maislin, G., Mullington, J. M. & Dinges, D. F. (2003) The cumulative cost of additional wakefulness: dose-response effects on neurobehavioral functions and sleep physiology from chronic sleep restriction and total sleep deprivation. Sleep, 26(2), 117-26.

      Wilhelm, I., Metzkow-Meszaros, M., Knapp, S. & Born, J. (2012) Sleep-dependent consolidation of procedural motor memories in children and adults: the pre-sleep level of performance matters. Developmental Science, 15(4), 506-15.

      Winer, J. R., Mander, B. A., Helfrich, R. F., Maass, A., Harrison, T. M., Baker, S. L., Knight, R. T., Jagust, W. J. & Walker, M. P. (2019) Sleep as a potential biomarker of tau and beta-amyloid burden in the human brain. J Neurosci.

    1. Author Response:

      Evaluation Summary:

      This study reports that monocular inactivation of the fellow (good) eye with tetrodotoxin supports long-lasting recovery from the effects of monocular deprivation, as measured by visual evoked potentials in primary visual cortex. This work should be of interest to neuroscientists studying plasticity and clinicians treating amblyopia. The results are compelling, although the advance compared to previous work is incremental.

      We thank the reviewers for their positive assessment of the data and the constructive comments. We believe that the current study substantially advances knowledge over the current state of the art. Decades of previous experience in cats, monkeys and humans led to the conclusion that monocular occlusion therapy is no longer effective in producing a lasting improvement after the critical period, which we now also confirm in mice. The key finding of our study is that this conclusion does not apply if the fellow eye is temporarily inactivated, underscoring the critical difference between degrading image formation in one eye (by, e.g., patching) and temporarily silencing all retinal ganglion cell activity. Although our previous research showed that binocular inactivation can also promote recovery, the finding that this recovery is enabled by inactivating only one eye halves concern over ocular health following treatment and forces a revision in how earlier observations of the effects of enucleation of the fellow eye are interpreted. In addition and of particular significance, the effects of unilateral retinal inactivation promoting stable recovery from deprivation amblyopia were observed in both cats and mice, suggesting evolutionary conservation of a core mechanism for recovery of cortical function.

      Reviewer #1:

      In this manuscript, Fong et al. showed that temporary inactivation of the fellow (good) eye through injection of tetrodotoxin (TTX) led to long-lasting recovery from amblyopia beyond the critical period, using both mice and cats as model systems. In contrast, reverse occlusion had only short-term effects. This work builds on previous work from the authors, showing that TTX injection did not have any obvious effects on neuronal health (DiCostanzo et al., 2020), and that reverse occlusion in cats induced anatomical recovery from amblyopia (Duffy et al., 2018). In summary, this work is clearly written, with a strong and simple message, and has potentially important clinical implications for the treatment of amblyopia. It could be significantly strengthened by some probing into potential mechanisms, especially whether mechanisms previously shown to be important for critical period plasticity are activated following temporary inactivation of the fellow eye. These may give insight into potential treatment strategies.

      We share the reviewer's interest in mechanisms underlying recovery and look forward to investigating them in future studies. As it stands now, however, our work immediately suggests a strategy that could be reduced to practice, even before the mechanism is pinpointed.

      Reviewer #2:

      This manuscript by Fong and colleagues explores plasticity of circuits in the visual cortex of normally reared and amblyopic mice and cats. Previous work from this group had reported the exciting finding that transient inactivation of the non-deprived eye by intraocular injection of TTX can trigger substantial recovery of acuity in the deprived eye. Here the authors perform electrophysiological experiments to reveal that:

      1. Temporary inactivation of one eye in normally reared mice increases the visually evoked potential amplitude of the non-inactivated eye for more than 7 days
      2. Recovery of amblyopia from long term monocular deprivation is possible even in the adult
      3. Recovery is detectable even after 1 day of transient inactivation
      4. Recovery persists for a longer period of time than the recovery traditionally observed with reverse occlusion
      5. Inactivation has similar effects in mice and cats

      The experiments described in this manuscript are generally carefully performed and the results are very clear. The information gained from this study is important in advancing our understanding of adult plasticity and the potential to reverse amblyopia. This reviewer has a few questions/comments about the interpretation of the data that should be addressed to improve the impact and clarity of this study.

      We appreciate the careful review of our manuscript. We hope we have addressed the reviewer’s questions and comments in our response to the editor’s summary above.

    1. Author Response:

      Reviewer #1 Public Review:

      Nakayama and colleagues report a unique screening concept that exploits mechanisms conserved between zebrafish gastrulation and cancer metastasis to identify potential anti-metastatic drugs. They screen 1280 FDA-approved drugs using gastrulation as a marker, and identify Pizotifen as an epiboly-interrupting drug. They then find that pharmacologic and genetic inhibition of HTR2C, a target of Pizotifen, suppresses metastatic progression in zebrafish and mouse models through inhibition of epithelial to mesenchymal transition (EMT) via Wnt signaling.

      Their work is of interest and has the potential to appeal to a broad audience. However, additional experiments are needed to further substantiate their concept that human cancer metastasis mimics/recapitulates zebrafish gastrulation in terms of conserved mechanisms, as well as to confirm the validity of their screening method with regard to the effects of global toxicity.

      Major concerns:

      The first major concern I have is the appropriateness of treating gastrulation as a parameter/index of cancer metastasis. While they cherry-picked some genes that are known to be involved in both gastrulation and cancer metastasis, a broader analysis is probably necessary to conclude so. For example, the authors could analyze comprehensive RNA-seq data sets to see if the pathways/networks are similar between gastrulation (zebrafish embryo development data set) and cancer metastasis (benign/primary tumors vs metastatic tumors in TCGA).

      The conservation between embryonic EMT and tumor metastasis EMT has long been well recognized. We now cite some of the published references (Nieto et al., 2016; Thiery et al., 2009; Yang and Weinberg, 2008). In Table 1, we compiled 50 genes based on published literature to provide further strong evidence to support this conservation. Knockdown of these genes in Xenopus or zebrafish induced gastrulation defects; conversely, overexpression of these genes conferred metastatic potential on cancer cells and knockdown of these genes suppressed metastasis. Although this point is not really an objective of this study, we believe that the evidence for the conservation is sufficiently convincing to provide the basis for our study. A further RNA-seq comparison of zebrafish embryonic EMT and human tumor metastasis is beyond the scope of the current study. Generally, the transcriptomic data for zebrafish embryo development at the epiboly/gastrulation stage are based on whole embryos, which include all other activities and are not specific to EMT; thus, they may not be a proper comparison with tumor metastasis data when searching for more evidence.

      The second concern is about Pizotifen's effects on cancer metastasis. Since Pizotifen suppresses gastrulation, it might have some harmful effect on the organogenesis/development of the day 2 embryos that they used in the zebrafish transplantation model. If so, cancer metastasis could be suppressed indirectly. The authors could examine whether Pizotifen has side effects on day 2 embryos. Judging from the images in Fig. 2D, the drug also appears to suppress cell viability in vivo, and it would be good if this possibility was excluded.

      We did not observe any abnormality in the development of Tg (kdrl;GFP) and WT zebrafish at day 2 when these fish were treated with 5 µM Pizotifen. However, treatment with more than 20 µM Pizotifen affected approximately 10% of these fish. The affected fish showed a shorter tail than vehicle-treated zebrafish. In xenograft experiments, zebrafish embryos at day 2 were treated with 5 µM Pizotifen. This concentration of Pizotifen did not affect the development of day 2 embryos.

      Furthermore, we demonstrated by two different experimental methods that Pizotifen did not affect primary tumor growth in a mouse model of metastasis using 4T1 cells. First, tumor measurement revealed that the sizes of the primary tumors in Pizotifen-treated mice were equal to those in the vehicle-treated mice at the time of resection on day 10 post inoculation. Second, IF-staining showed that the percentage of Ki67-positive cells in the resected primary tumors of Pizotifen-treated mice was the same as that in vehicle-treated mice (Figure 3A and B). Therefore, we conclude that Pizotifen suppresses metastasis without affecting cell viability in vivo.

      Finally, the mechanistic parts would need more confirmation and rescue experiments. Transplanted cells can be sorted after the treatment and the expression changes of EMT markers can be examined to see if the phenomenon happens in vivo as well. All main results can be rescued to see if the effect of Pizotifen against EMT happens through HTR2C-Wnt axis.

      Figure 5C showed that 4T1 primary tumors from Pizotifen-treated mice have elevated E-cadherin expression compared with tumors from vehicle-treated mice. Furthermore, Figure 5C also demonstrated that β-catenin accumulated in the nucleus, and phospho-GSKβ and Zeb1 expression were decreased in 4T1 primary tumors from Pizotifen-treated mice compared with vehicle-treated mice. Loss of E-cadherin plays an essential role in promoting EMT-mediated metastasis since loss of E-cadherin itself is enough to promote metastasis. In contrast, overexpression of the mesenchymal markers vimentin and N-cadherin is not sufficient to induce metastasis. Based on our data from Figure 5C and the accumulated evidence, we conclude that Pizotifen restored epithelial properties to metastatic cells through a decrease of transcriptional activity of β-catenin in vivo.

    1. Author Response:

      Reviewer #3 (Public Review):

      This paper addresses important questions about how two chaperones, DNAJA2 and DNAJB1, interfere with tau aggregation. The description of their interaction mechanisms and their interaction regions is both novel and interesting to the field of chaperone and tau aggregation. The use of NMR and kinetic analysis is compelling to obtain useful information. This information will be useful to understand the importance and the mode of action of the chaperones in a biological context.

      In general, no attention is paid to how aggregation is triggered. There is no mention of heparin in the results and the quantity of heparin used to trigger aggregation is not written in the method. This is a crucial aspect that is relevant to what aggregation pathway is modeled in vitro, in particular in the light of recent results showing that heparin-induced fibrils are different from brain-extracted fibrils (Fichou et al. Chem Comm 2018, Zhang et al., elife 2019).

      My major concern in this paper is about the assumption that tau aggregation-prone conformers were generated. The conjecture that an excess of heparin increases the population of aggregation-prone conformers is not justified. On the contrary, excess of heparin was shown to form off-pathway oligomers (thus not aggregation-prone) (Ramachandra & Udgaonkar JBC 2011) that do not exhibit exposed PHF6(*) (Fichou et al, Frontiers in neurosciences 2019) expected to be the signature of aggregation-prone conformers (Eschmann et al. Scientific reports 2017; Chen et al. Nat. Comm 2019). If more aggregation-prone conformations were present, they would aggregate more easily, and not be stable as happens when heparin concentration is increased. Thus, I don't believe that the interpretation that the chaperones interfere with aggregation-prone conformers is justified by the data on the heparin-tau complex. In general, the heparin-tau complex should not be used as a proxy for aggregation-prone conformations in the different result sections and in the discussion.

      We thank the reviewer for their valuable comments.

      Following the reviewer’s suggestion, we have added a description to the method section indicating that the aggregation was triggered by the addition of heparin (at 4:1 tau:heparin ratio for the unseeded experiments) or with 1% tau seeds (for the seeded experiments).

      In addition we fully agree that the heparin generated tau fibers are different from brain-extracted fibrils and therefore tried to avoid making any conclusions regarding the mode of chaperone association with the mature fibrils themselves.

      We do, however, stand behind our statement that excess heparin increases the population of the aggregation-prone extended tau conformers. The seeding-competent monomers (Ms) reported by Mirbaha et al. eLife 2018 and Chen et al. Nat. Comm 2019 were generated by the incubation of tau with equimolar concentrations of heparin, similar to that used in our manuscript to generate the aggregation-prone species. Furthermore, Mirbaha et al. eLife 2018 and Chen et al. Nat. Comm 2019 found the tau species to be similarly expanded both when purified from AD patients and when generated in the presence of equimolar heparin concentration.

      However, in light of the reviewer’s comments and in order to confirm that in our NMR experiments the heparin-bound tau monomer indeed displays an expanded conformation, we recorded RDC measurements on our samples. Tau alone, in the absence of heparin, showed large RDC values (10-20 Hz) in the PHF6 regions. The increased RDCs in these regions, which were visible using two alignment media, indicate the presence of compaction in these regions, in agreement with previous reports. Addition of heparin, however, drastically reduced the value of these RDCs, indicating that the compaction is no longer present in heparin-bound tau species. These experiments are added as Figure 4 - figure supplement 3 to the revised manuscript.

      Lastly, while we fully agree with the statement that excess of heparin prevents tau aggregation, this does not necessarily imply that these tau species are not found in the expanded conformation. The non-monotonic change of the aggregation propensity of tau4R as a function of heparin is suggestive of a concentration-enhancement mechanism. At sub-stoichiometric concentration of heparin, multiple tau molecules can interact with a single heparin molecule (at least 2:1 tau:heparin complexes can form), thus increasing the local concentration of tau4R and therefore the likelihood of nucleus formation. An excess of heparin will naturally shift the stoichiometry of the complexes towards 1:1 tau:heparin assemblies. Under these conditions, the effective concentration of tau is rather low (essentially the bulk concentration), i.e., the bimolecular rate of two tau molecules forming a nucleus, and thus the rate of aggregation, will be drastically reduced.
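
      The concentration-enhancement argument can be illustrated with a deliberately simple toy model that is not part of the manuscript. It assumes that each heparin chain offers exactly two tau-binding sites, that binding is tight, and that bound tau is spread evenly over all sites; the function name and the concentrations are hypothetical. Under these assumptions the number of heparin chains carrying two tau molecules, taken here as a stand-in for nucleation-competent complexes, first rises and then falls as heparin is added.

```python
def doubly_occupied_heparin(tau_total, heparin_total):
    """Expected concentration of heparin chains carrying two tau molecules.

    Toy assumptions: two binding sites per heparin chain, tight binding,
    and bound tau distributed evenly (independently) over all sites.
    """
    if heparin_total <= 0:
        return 0.0
    # Probability that any given site is occupied, capped at saturation.
    site_occupancy = min(1.0, tau_total / (2.0 * heparin_total))
    # Both sites of one chain must be occupied at the same time.
    return heparin_total * site_occupancy ** 2


tau = 4.0  # arbitrary concentration units
for heparin in (1.0, 2.0, 4.0, 8.0):
    print(heparin, doubly_occupied_heparin(tau, heparin))
```

      In this sketch the maximum sits at a 2:1 tau:heparin ratio, and excess heparin dilutes tau onto singly occupied chains, which mirrors the qualitative shape of the argument without making any claim about the real binding constants.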

      Furthermore, our NMR experiments do not only show an increase in the accessibility of the PHF6 motifs in the presence of heparin, but this change in the accessibility also correlates with the competence of DNAJB1 to bind tau. Importantly, since DNAJB1 does not bind to heparin alone, it is therefore most plausible to assume that a more expanded conformation of tau is formed in complex with heparin, and that this comprises a substantial contribution to the aggregation of tau.

      The data on P301L/S mutant is more convincing. However, the deduction that DNAJB1 binds P301L/S better because of the exposure of PHF6 is plausible but purely hypothetical (and it should be described as is). I'm in particular concerned with the facts that (i) the PHF6-exposed conformers are likely to represent a small population in the P301L/S mutants and (ii) PHF6* is not exposed in heparin-tau complexes (Fichou et al, Frontiers in neurosciences 2019) and yet they bind DNAJB1.

      We thank the reviewer for this very important comment. In order to provide experimental proof for the increased expansion of the P301L/S mutants we performed RDC measurements, and compared the RDC values to those of wild-type tau monomer (which showed compaction in the PHF6 region). The measured RDCs showed a clear decrease in the PHF6 RDC values of the mutant tau compared to the monomer, despite the two samples being aligned to the same degree. We further recorded the RDC measurements in two separate alignment media (Pf1 bacteriophage or polyethylene glycol/hexanol), and the same ~15% decrease was observed for both, demonstrating that the mutants are slightly more expanded than the wt protein.

      We indeed find that the expanded, PHF6-exposed conformers represent only a small population (~15%) of the overall conformational ensemble of the P301S/L mutants. This small percentage of exposure is, however, entirely consistent with P301S/L tau mutant only interacting weakly with DNAJB1 chaperone, compared to binding of heparin-bound tau species (which present a fully expanded conformational ensemble).

      The new RDC experiments, as well as a better explanation of the differential binding of DNAJB1 to tau mutants and heparin-bound species, are added to the revised version of the manuscript.

    1. Author Response:

      Reviewer #2 (Public Review):

      This work uses a high-throughput continuous culture system with simplified soil microbial communities to investigate how diversity-disturbance relationships (DDRs) change with different disturbance "intensities" (here, defined as mortality rate or dilution rate in a continuous system) and "frequencies" (here, defined as the number of dilution events that occur per day to achieve the desired mortality rate). Understanding the mechanisms that support different DDRs is an ongoing and urgent need in ecology and ecosystem sciences because of the pressing need to predict and manage systems given climate and land-use disturbances.

      A major strength of the work is a blending of modeling and empirical approaches. It includes an ambitiously-designed study that uses a controlled, high-throughput microbial community experimental system to observe disturbance outcomes and uses those observations to build their proposed quantitative framework. The figures are informative and framework is explained clearly. The authors propose and name a new mechanism, "niche-flip" that describes resource competition at varying disturbance "intensities" - this is an interesting proposal and I suggest that it is explored more fully as a potential mechanism (see weaknesses).

      Weaknesses of the work are the use of definitions that are generally inconsistent with the disturbance ecology literature, and the inability to separate the disturbance event characteristic of "intensity" from the biological outcome of mortality. The authors conclude that DDRs are contextual, which is supported by their modeling and data, but I suggest that they consider that diversity as an outcome in itself may not be the most informative metric of what mechanism(s) drive context-specific outcomes. The authors have a lot of compositional data that could also be examined to understand whether their "niche-flip" mechanism is supported.

      This work is likely to advance our understanding of the myriad of outcomes of DDR and what potential mechanisms may support those DDR in natural ecosystems.

      Thank you for your kind words and careful review of our manuscript. We are pleased you appreciate both the experiments and the modeling work, and that you are intrigued by the findings and the niche flip mechanism.

      Major comments:

      Comment 1. Ecological definitions and interdependence of disturbance outcomes/attributes

      The authors define disturbance "intensity" as the average mortality rate but claim that this is a disturbance characteristic. However, mortality rate is not a characteristic of a disturbance event, but rather an effect/outcome of a disturbance on the biological community. The key distinction is that disturbance characteristics (also called traits or aspects) are defined relative to the environment, while disturbance outcomes (also called effects, impacts, or responses) are defined relative to the biology of interest, in this case a microbial community. So, a change in the diversity of the community, as a result of a disturbance, is a biological outcome of the disturbance. An average mortality rate, what the authors call "intensity" (L40), would be such an outcome.

      Thank you for this excellent point. We have revised the introduction to make this distinction, reproduced here for convenience:

      "Accordingly, there have been many efforts aimed at understanding the role of environmental disturbances, which are perturbations to the state of an environment. These disturbances are of ecological interest for the impact they have on a community, for example, by bringing about mortality of organisms and a reduction of biomass of a community."

      The authors' definition of "intensity" is not in agreement with the disturbance ecology literature, including the references cited in this current work. For example, in reference #18 (Miller et al. 2011 PNAS) disturbance aspects include intensity, timing, duration, extent, and interval. Specifically, Miller et al. 2011 defined intensity as the magnitude of the disturbance (e.g., a flood's maximum stage). Notably, Miller's definition of intensity is more aligned with the authors' definition of "fluctuation," which the authors define as the "magnitude of deviations from the average". In the current work, the disturbance "event" cannot be separated from the biological outcome because of the nature of the continuous culture system. The system is not being disturbed with, for example, a change in pH or salinity or another environmental variable that results in microbial mortality, but rather through the loss of viable members from the community via control of the flow-through. So, the mortality is both the precisely controlled disturbance "event" and "outcome" in the continuous culture.

      To summarize, the premise of the article is confusing, because one of the two disturbance "characteristics" considered is, rather, a disturbance outcome. This may seem like mincing words and to each paper its own definitions, but because this work seeks to reconcile DDRs as reported across many studies, and because many of the previous ecology studies that have investigated or reported DDRs are not using analogous terms, the work could further confusion rather than serve as a reconciliation. When different definitions are applied that mix disturbance aspects with biological outcomes of disturbance, readers will have to work hard to understand this work in context with the existing literature. I suggest revising the introductory section to be consistent in terminology with the ecology literature and to be framed in terms of not only disturbance characteristics, but also outcomes. I also suggest adding discussion of how an inability to distinguish disturbance event from outcome may influence interpretation of this work and its broader application. I suggest adding clarification/discussion of "how intensity and fluctuations interact" (e.g. L200): as the authors define intensity and fluctuation of the disturbance event, intensity is not independent of the biological disturbance outcome of mortality in the given model system. So, how the two "disturbance components interact" cannot be examined independently from the biological outcome (mortality, resulting diversity).

      These are also critical points. First, we will address the choice of terminology (re: Miller et al) and, second, the equivalence between disturbance and outcome in continuous culture.

      We agree that careful use of terminology is important for understanding our work in context of the literature. Accordingly, we have replaced our characteristics “intensity” and “fluctuation” with “mean intensity” and “frequency” throughout the paper. We have also added more examples through the results section to indicate how mean intensity, frequency, and maximum dilution rates (during disturbance events) are related.

      "To determine whether the effects of disturbance on diversity are truly fluctuation-dependent15, a disturbance should ideally be decomposed into distinct components of mean intensity (e.g. time-averaged disturbance magnitude) and frequency (e.g. temporal profile of fluctuations)."

      The direct connection between disturbance and mortality in a continuous culture system under dilution disturbances is a critical aspect of our experimental design, because we wanted to compare disturbance outcomes that varied in temporal features (in Miller et al terms, intensity/magnitude vs frequency/timing) while holding mortality equal. In continuous culture this may be achieved by controlling dilution rate and frequency, but you are correct that other classes of natural disturbances such as pH or salinity changes may have different effects on community members. As a first step towards investigating these effects, we had included analyses with non-equal mortality rates (Appendix figure 4). We have now edited the introduction and discussion to emphasize that the equivalence between disturbance event and disturbance outcome is a feature specific to continuous culture.

      Introduction

      "Dilution is perhaps the most common choice for a laboratory disturbance, as it causes species-independent mortality and replenishes the system with fresh nutrients, reminiscent of flow in soil, aquatic, or gut microbiomes. Unlike disturbances with indirect biological impacts (such as pH, temperature, or osmolarity disturbances), there is a direct link between the dilution disturbance event (removal of culture volume) and the biological outcome (mortality of community members)."

      Discussion

      "We also note however, that these types of disturbances do not share the direct link between environmental change and biological outcome that is characteristic of dilution disturbance, so the impact may be less clear."

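      The decomposition of a dilution disturbance into mean intensity and frequency discussed in this response can be sketched numerically. The snippet below is an illustration under stated assumptions, not the authors' protocol: it assumes that regimes are matched by equal daily survival of a non-growing population, and the function names and the example rate are hypothetical.

```python
import math

HOURS_PER_DAY = 24.0


def per_event_fraction(mean_rate_per_hour, events_per_day):
    """Culture fraction removed at each event for a given mean dilution rate.

    Assumption: regimes are matched so that daily survival equals that of
    continuous dilution, exp(-mean_rate_per_hour * 24).
    """
    daily_exponent = mean_rate_per_hour * HOURS_PER_DAY
    return 1.0 - math.exp(-daily_exponent / events_per_day)


def daily_survival(fraction_per_event, events_per_day):
    """Fraction of a non-growing population left after one day of events."""
    return (1.0 - fraction_per_event) ** events_per_day


mean_rate = 0.1  # hypothetical mean dilution rate, per hour
for events in (1, 4, 16):
    fraction = per_event_fraction(mean_rate, events)
    print(events, round(fraction, 3), round(daily_survival(fraction, events), 4))
```

      Lower frequencies require a larger removal per event to reach the same mean intensity, which is why the maximum dilution during an event and the frequency cannot be varied independently once the mean is fixed.
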
      Comment 2: Compositional evidence for the proposed "niche flip" mechanism and suggestion for deeper consideration of population-level response to disturbance outcomes that collectively contribute to emergent diversity values.

      Regarding the "niche flip" - it is unclear whether there is compositional evidence for any swap in niche preference/space among particular community members. Figure S8 may offer evidence, but I could not deduce it from the busy bar charts. Could population/ASV level analysis be conducted on each member to assess their dynamics and ask whether the dynamics support the proposed niche-flip as a DDR mechanism?

      This is a very interesting suggestion. As suggested, we could extract the relative preferences of different ASVs from composition data to test a prediction about changes in the composition resulting from niche flip. To make such a prediction, we’d need the Monod growth parameters of the species on relevant resources. We began collecting this data (see Figure 3 – figure supplement 4) but found it challenging to measure these parameters on defined media sources. Furthermore, since we elected to run our main experiments in a complex medium that could potentially support diverse communities (as opposed to minimal media, which produce simple communities; see Goldford et al Science 2018), we cannot link Monod growth parameters in this medium to particular resources. Subsequent experiments with defined species with measured Monod parameters in defined media would enable us to make and test predictions. These are sizeable experiments that we do not believe are in the scope of the present work. Without a testable prediction, we do not believe species or ASV level analysis to be particularly informative on its own.

      Related, there seems to be possible evidence of a "fluctuation" rate threshold, after which there is a major compositional shift in the microbial community. Consider Figure 3: At all "intensities", there is a shift in microbial community composition between "fluctuation" rates of 4/day and 16/day (3d, Fig S8). This threshold/shift is not similarly apparent in the Shannon diversity in Fig 3f. This could be an example in which diversity as a metric in itself is not as informative/useful an outcome for disturbance responses, as identical Shannon diversity values can result from different community compositions that are themselves the outcomes of different mechanisms. I see from the PCoAs (Fig S9) that the authors were exploring potential compositional clustering by day, frequency, and dilution - the most "obvious" clustering to the eye is indeed by "frequency" and between 4/day and 16/day (red/blue separation along both axes), which also supports a potential threshold/shift. Generally, it would have been good to report statistical tests (e.g., PERMANOVA or equivalent) for these PCoA categories (where it makes sense, nested and term interactions as well) - is there statistical support for a compositional threshold shift between 4/day and 16/day?
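
      The two points in this comment can be illustrated with a small sketch: samples with identical Shannon diversity can differ completely in composition, and a PERMANOVA on pairwise distances can still separate them. Everything here is an assumption made for illustration: the counts are invented, Bray-Curtis is chosen as the dissimilarity, and the permutation test is a plain one-factor implementation, not the authors' analysis script.

```python
import numpy as np

def shannon(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero abundance."""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return np.abs(a - b).sum() / (a + b).sum()

def pseudo_f(dist, labels):
    """One-factor PERMANOVA pseudo-F from a square distance matrix and group labels."""
    n = len(labels)
    groups = np.unique(labels)
    sq = dist ** 2
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n
    ss_within = 0.0
    for g in groups:
        idx = np.flatnonzero(labels == g)
        sub = sq[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(dist, labels, n_perm=999, seed=0):
    """Permutation p-value: share of label shuffles with pseudo-F >= observed."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(dist, labels)
    exceed = sum(pseudo_f(dist, rng.permutation(labels)) >= f_obs
                 for _ in range(n_perm))
    return f_obs, (exceed + 1) / (n_perm + 1)

# Invented counts for four taxa; each "high" sample is a "low" sample reversed,
# so the two groups share Shannon values but not dominant taxa.
low = np.array([[70, 20, 10, 0], [68, 22, 10, 0], [72, 18, 10, 0],
                [66, 24, 10, 0], [74, 16, 10, 0]], dtype=float)
high = low[:, ::-1]
samples = np.vstack([low, high])
labels = np.array(["low"] * 5 + ["high"] * 5)

h_low = [shannon(row) for row in low]
h_high = [shannon(row) for row in high]

n = len(samples)
dist = np.array([[bray_curtis(samples[i], samples[j]) for j in range(n)]
                 for i in range(n)])
f_stat, p_value = permanova(dist, labels)
```

      In this toy case `h_low` and `h_high` match sample by sample, while the pseudo-F is large and the permutation p-value is small, which is the pattern the comment describes: a compositional shift that a diversity index alone would not reveal.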

      Thank you for these suggestions. Indeed, by eye and by the PCoA plots, there seems to be a significant difference in composition that separates the low-frequency (1/day & 4/day) from the high-frequency (16/day & Constant) conditions. We calculated pairwise distances between Day 6 samples grouped by A) dilution frequency, B) mean dilution rate, or C) combinations of dilution rate and frequency. Using these distances to perform PERMANOVA tests, we find significant differences between cultures with different frequencies, but not for cultures with different dilution rates. For combinations, we found several pairs with differences that were significant only before correction for false-discovery rate. Distances between low-frequency (1/day & 4/day) conditions are much smaller than between low-frequency and high-frequency groups, or between the high-frequency groups. We have now included this as Figure 3 – figure supplement 9 and have summarized the results in the main text, reproduced below for convenience:

      "PERMANOVA statistical analysis of endpoint compositions confirmed that dilution frequency (but not mean dilution rate) had a significant effect on composition (Figure 3 – figure supplement 9). Despite separation between conditions in PCoA of endpoint compositions (Figure 3 – figure supplement 9), PERMANOVA analysis of dilution rate and frequency combinations did not yield significant values after correcting for false discovery rate."

      Reviewer #3 (Public Review):

      This manuscript focuses on the relationship between diversity and disturbance. The authors study this relationship in experimental microbial communities. These communities are subject to different levels of disturbance, which is identified as the dilution rate. The authors find a non-monotonic relationship between diversity and dilution rate. In the presence of temporal fluctuations, the non-monotonic relationship becomes less evident, disappearing for strong enough fluctuations. The experimental findings are well explained by a consumer-resource model with Monod response.

      The results of the paper are a very interesting combination of experimental and theoretical work. The manuscript is well written and easy to follow.

      Experiments. The data support the main result of the paper. The U-shaped disturbance-diversity relationship (DDR) is robust (e.g., independent of the measure of diversity). The experimental setup is innovative.

      Theory. A main strength of the manuscript is the clarity with which the model reproduces the experimental data. It is also interesting that alternative models (Lotka-Volterra and consumer-resource with linear response) do not reproduce the data, therefore indicating the relevance of the data themselves. The main weakness of the paper is that, in the end, the mechanism behind the non-monotonicity of the DDR is not completely clear. The authors discuss how it emerges with two species and two resources in the presence of a trade-off between maximal growth rate and resource-limited growth rate: at low dilution rate, the species with high maximal growth rate wins, while at high dilution rate the one with resource-limited growth rate dominates. This mechanism is clear with two species (in which diversity can transition between 2 and 1). It is unclear what happens for more species and resources. In particular, the role of the tradeoff --- which is central in the pairwise competition case --- is unclear: the U-shaped relationship is observed also in the absence of the tradeoff for multispecies communities.

      Thank you for your enthusiasm about our work and your careful review of our manuscript. We are pleased you appreciate the concordance between experiment and model in our study.

    1. Author Response:

      Reviewer #1:

      The paper is very well written and the results are well presented. I only have minor comments.

      The introduction insists on the idea that the ability to regenerate might be ancestral (line 45) but convergent evolution is an extremely common phenomenon. The hypothesis of convergent evolution cannot be excluded here. In any case, whether convergence or ancestrality, one can ask whether the mechanisms underlying organ regeneration are the same in various taxa.

      We thank Reviewer 1 for all the helpful suggestions. We will make it clearer in the revised Introduction that the notion of convergence still cannot be fully excluded.

      Reviewer #2:

      Weaknesses:

      The work presented on Drosophila is intriguing because the adult legs of flies were not thought to be capable of any regeneration. One of the major constraints is that growth in arthropods is limited by the hard exoskeleton (cuticle) surrounding the body. Periodic molting allows these animals to grow in a stepwise fashion (shedding the old cuticle and forming a new one), but adult flies do not molt, so it is unclear how an adult regenerating leg would break that constraint. Abrams et al. report that a small proportion (~1%) of amputated legs regrow part of the limb when the flies are kept on a medium supplemented with leucine, glutamine and human insulin. The number of legs in which this has been observed is small and the extent of regeneration is variable and not well documented in relation to the site of amputation (which is unmarked). A more detailed documentation of the regrowth would be needed to validate the authors' conclusions.

      We thank Reviewer 2 for the more detailed suggestions in the full review. In the present data, our conclusion is enabled by the unambiguous phenotype: there are fully regrown tibias in the treated population, and there are none in the (now) >1000 control flies examined. In the revised manuscript, we will include, as the Reviewer suggested, new extensive documentation to show single-fly tracking.

      The work on mice focuses on the regeneration of digit tips, a relatively well-studied example of limited regeneration in these rodents. Mice are known to be able to regenerate the tips of their digits when these are amputated near the distal end, but cuts proximal to the base of the nail fail to regenerate. The authors focus on regeneration of digits amputated near this boundary. They report that animals whose drinking water is supplemented with leucine, glutamine and sucrose are more likely to regenerate part of their digit tips when amputated at the base of the nail. These data are intriguing, but the number of observations is limited (few digits with patterned regeneration) and variation in the site of amputation does not make it easy to draw firm conclusions on the extent of regeneration compared with controls.

      We perform digit amputation proximal to the established non-regenerating boundary (see red line in Figure 6c-d), and far from the regenerating boundary (see blue line in Figure 6c-d). Moreover, as a control, for every digit amputated, the part removed was fixed, stained, and documented to enable precise definition of the amputation plane. The sample sizes in the study (20 control digits, 48 treated digits) enable statistical power, and are comparable to experiments in adult mouse digit (e.g., n~15 digits PMID: 24209617, n~20 digits PMID: 28975034).

      Overall, the authors propose that similar nutritional interventions have similar effects in 'inducing' regeneration in widely divergent animals, revealing a widespread intrinsic capacity of animals to regenerate. The claim that these treatments 'induce' regeneration seems exaggerated, given that appendage regeneration in Aurelia and in mouse digits can occur to a variable extent in untreated animals. These treatments appear to shift the probability and the extent of regeneration. The data on Drosophila legs are more surprising and deserve further analysis.

      We have been careful in the manuscript to use the phrase “promote regeneration” to describe the findings in Aurelia because of the spontaneous partial regeneration observed in the natural habitat. In the mouse digit, however, the boundaries of regenerating and non-regenerating cuts have been clearly established (by multiple studies, e.g., PMID: 28493324, PMID: 7100922, PMID: 18234177, PMID: 17147657), which enables the question of how to induce regeneration from proximal cuts to be posed and pursued (e.g., PMID: 30723209, PMID: 20110320). We therefore believe that, in this case, our choice of wording is validated by the scientific context and precedents in the mouse digit field.

      The idea that the same nutritional interventions may have similar effects on regeneration in diverse animals is intriguing. A minor caveat: the nutritional interventions tested in each species were not identical; in Aurelia high-nutrient, insulin and leucine treatments were tested separately, in Drosophila leucine and insulin were combined, in the mouse leucine and sucrose were combined. Future work could determine which components in these treatments (nutritional, metabolic or hormonal) are responsible for the observed responses in each species.

      We agree with the Reviewer. In the revised Discussion, we will make clearer the differences in molecule administration across species and note that further studies will determine the specific underlying mechanisms. Despite these differences, we could, excitingly, move across species in a predictive manner.

      Reviewer #3:

      Weaknesses:

      The evolutionary statement underlying the entire study is not fully accurate. While it is true that many animal phyla include species that can regenerate and also that some studies start to identify common molecular components in regeneration, the question of regeneration being ancestral or not is still debated. It is indeed highly tempting to consider regeneration as ancestral but this is not proven yet (see Lai and Aboobaker 2017) and the possibility of convergence has to be considered too. In addition, appendage regeneration versus whole body regeneration versus structure or organ regeneration may not rely on similar mechanisms.

      We thank Reviewer 3 for the helpful feedback in the detailed review. We are making it clearer in the revised Introduction that the notion of convergence still cannot be fully excluded since convergences can be very common.

      The strategy put in place for identifying putative triggers is questionable, and more information is missing about (i) the reasons to select and test such parameters, (ii) what the different drugs are doing in theory (components of signaling pathways targeted, for example), and (iii) how exactly the various tests have been done (frequency of scoring, different concentrations tested, number of individuals per condition, frequency of drug administration, etc.). It appears really surprising that no modulators of signaling pathways (notably the Wnt/β-catenin pathway, known to be involved in many developmental and regeneration contexts, especially in cnidarians) are involved in any way in such a process.

      We will include in the revised manuscript more extensive details about the screen parameters and procedures. Briefly here, similar experimental designs as those described for the main experiments were used. With regard to the Wnt pathway, we tried two inhibitors against GSK3beta, two inhibitors against tankyrase, and a Wnt ligand itself—none so far has shown effects that motivated further follow-up. Nonetheless, we will emphasize more explicitly in the revision that the negative result conclusion can only be specific to the conditions and the modulators used, and that this has by no means ruled out the involvement of developmental pathways in all capacities.

      Concerning the results with cnidarians, two aspects puzzle me: the high variability in the regenerative response between batches and the way the amputations were done in ephyrae. In the context of appendage regeneration, what justifies performing an amputation that removes almost half the body of the ephyra and not just one arm? Basic information about regeneration occurrence in the context of amputation of only one arm is missing.

      We find the variability intriguing too. Some of the variability may arise from technical variations or stochasticity inherent in biological processes. There may also be key physiological parameters yet to be identified; we are continuing screening in the lab to search for factors that may synergize with leucine and insulin, and to quantify differences between regenerating vs non-regenerating individuals. Finally, the variability may support the idea that the regeneration we are seeing has been evolutionarily inactivated, and may therefore have drifted over tens or hundreds of years and consequently does not exhibit the robustness we are used to seeing in wild-type processes.

      With regard to the amputation scheme, we tested various amputation schemes and observed similar results (this will be included in the revised manuscript). The amputation scheme chosen was the fastest to do, which facilitates testing thousands of ephyrae. We will add this rationale in the revised Methods. With the amputation scheme chosen, arm regeneration is the most dramatic outcome. However, the reviewer is correct that the body is also partially regenerated, which suggests to us that L-leucine and sugar/insulin may have a regenerative effect beyond the context of appendages. We include this discussion in the revised manuscript.

      More importantly, the results shown, sometimes in extremely low proportions compared to the controls (i.e., in the case of Drosophila), are not supported by other approaches. It would be really important to have some clues about the molecular mechanisms underlying such a process and its induction.

      We agree with the Reviewer that investigation into the underlying molecular mechanism is the next immediate challenge, and it is underway in the lab. In the scope of the present work, the focus is on seeing if regeneration can be induced at all and the comparative presence of such latent ability. This was itself an enormous effort, but one that lays the groundwork for directing mechanistic investigations and the broader pursuit of promoting regeneration across animals.

    1. Author Response:

      Reviewer #1:

      The paper by Mayer-Blackwell et al. extends the ideas developed in the influential paper by Dash et al. (2017), which defined a similarity matrix for CDR3s, TCRdist, based on a weighted combination of local and global similarity measurements. In this paper they use the metric to develop the idea of a meta-clonotype, a set of similar TCRs which enrich for TCRs directed at the same antigen. They demonstrate that these meta-clonotypes show greater publicity than individual clonotypes, and show evidence of HLA-restriction. The authors speculate that the meta-clonotype may be a useful biomarker. They provide open-access software tools for defining meta-clonotypes in antigen-enriched repertoires.

      The major findings are: (1) Meta-clonotypes are more public than clonotypes, a result which seems not unexpected, given that meta-clonotypes include many different sequences; (2) Meta-clonotypes show evidence of HLA restriction, again predicted given the well-established fact that specific antigens can be recognised by sets of similar TCRs.

      The concept of a meta-clonotype is an interesting one which could have widespread use in analysis of TCR repertoire. However, the impact could be much greater, by sharpening the focus of the paper, and adding detail and clarity to the idea of the clonotype. In particular, while the introduction correctly points out that prediction of SARS-CoV-2 clinical outcome, or better understanding of the role of coronavirus prior exposure in determining outcome, are important unanswered questions, this paper does not address these questions.

      Thank you for your careful review of the manuscript. We have submitted a major revision with greater focus on the definition of a TCR meta-clonotype. We have removed from the introduction much of the background about SARS-CoV-2 and potential implications for the pandemic. In its place we’ve added greater detail about meta-clonotypes, how they can be defined from antigen-enriched TCR data, and how they can be used to analyze bulk TCR sequencing data.

      A substantial portion of the paper is devoted to analysing data obtained using the MIRA assay (Klinger et al., PLoS One 10:e0141561) to define SARS-CoV-2 responses, and it is not always clear whether the objective is to evaluate the accuracy of this data set, or to test the power of the meta-clonotype approach.

      Our objective with the analyses of the IMMUNEcode dataset (Nolan et al. 2020), generated using the MIRA method from Klinger et al. 2015, is to demonstrate that TCR meta-clonotypes can be defined from antigen-enriched TCR data and that they can be used to identify and quantify antigen-specific TCRs in bulk sequenced data. Furthermore, the analysis provides evidence that meta-clonotypes have greater publicity than individual clonotypes, thus increasing sensitivity of detection for antigen-specific TCRs. Using bulk repertoires from COVID-19 patients we then demonstrated that population-level analysis can be made possible using meta-clonotypes and provided supporting evidence that the antigen specificity of the centroid TCR is retained. These analyses and their interpretation are further revised on lines 319-348. We think that in the process of evaluating meta-clonotypes, our analysis also shows that the publicly released data contains valuable information about SARS-CoV-2 TCR specificities; however, we have not systematically attempted to verify the validity of the dataset. In the current revision these objectives are made clear in the revised Introduction section.

      Reviewer #2:

      Summary of main aims: The main aim of this paper is to build a framework for TCR meta-clonotypes for finding similar TCRs across individuals (or different repertoires). The majority of the investigations performed in this work have the objective of showing the data properties of meta-clonotypes as well as the metaclonotypes' usefulness for the analysis of antigen-specific TCR data and disease-labeled immune repertoire data.

      Major strengths: Building meta-clonotypes is a possible path towards a better coverage of immune repertoire biology as well as inter-individual repertoire comparison. TCRdist3 is an efficient method for building meta-clonotypes that enables the study of the specific characteristics of meta-clonotypes. So far, clusters of similar sequences have not been investigated in depth. The author team is making a significant step forward in this direction by characterizing meta-clonotypes in differentially antigen-specific-clone-enriched repertoires and by relating the results to generation probability, HLA, sex and immune status.

      Major weaknesses: Although the authors show a significant amount of data, I am not sure if these data convey sufficient intuition about the characteristics and behavior of meta-clonotypes. The authors seem too focused on relating meta-clonotypes to immune status instead of focusing on the specific biological characteristics of metaclonotypes.

      We agree and have shifted the focus of the manuscript away from the results of the application and towards providing greater detail about the characteristics and behaviors of meta-clonotypes; for example, we’ve removed much of the background about SARS-CoV-2 and the COVID-19 cohorts and we’ve added details about how the meta-clonotype radius can be optimized. We’ve also reframed the data analysis section to emphasize that the results demonstrate how meta-clonotypes carry important antigen-specific signals above and beyond individual clonotypes; this makes the results valuable beyond the application to SARS-CoV-2. For example, while demonstrating HLA restriction occurs in SARS-CoV-2 specific T cell responses in COVID-19 patients is not a surprising finding, it provides evidence that meta-clonotypes enable quantification of an antigen-specific and HLA-restricted T cell response from a bulk single-chain TCR repertoire. We use this example analysis to compare the strength of this signal in individual clonotypes, meta-clonotypes with radius alone and meta-clonotypes with a motif constraint. The revision of lines 98-100 and lines 461-470 provide clarity about this motivation for the analysis and interpretation of the results.

      Furthermore, we have added a section to the Results that demonstrates how meta-clonotypes and tcrdist3 enable analyses that can provide biological insights about the biochemical properties that may confer antigen specificity (lines 383-418, Figure 10). Since meta-clonotypes define groups of sequences, we can use CDR3 logo plots to dissect how positions and amino acid properties in the CDR3 define the group. In the revision we demonstrate a “background-adjusted” logo plot, that is able to emphasize amino acids that define the meta-clonotype, yet are uncommon among TCRs using the same V and J genes. Visualizing the results in this way can generate hypotheses that can be experimentally validated about the amino acids that are essential for antigen recognition.

      Furthermore, the authors fail to convincingly show that the background repertoires chosen for meta clonotypes are robust and to what extent meta-clonotypes are sensitive to changes in the background repertoire.

      We agree that it is important to understand how the creation of meta-clonotypes, and specifically optimization of the radius, depends on the background repertoire. Therefore, we conducted sensitivity analyses varying the size (25K to 1M) and makeup (synthetic OLGA vs. cord blood vs. a blend) of the background (lines 259-294). We also empirically demonstrate the value of over-sampling background TCRs with matching V and J genes. We show that using a background of 200,000 TCRs was sufficient for reducing the bias and variability in selecting a meta-clonotype radius, compared to a reference set of 2 M background TCRs; this is important because while it is tractable to use large backgrounds for a small number of meta-clonotypes, for larger studies or analyses confined to a laptop, the smaller background set is sufficient; we also show that this can largely be attributed to the gain in efficiency that comes with using a background that includes synthetic OLGA TCRs with VJ-gene frequencies that match the TCRs included in the meta-clonotype. However, we note that “Ultimately, the best choice for the background may depend on the question being asked and the data that is available, with factors including donor HLA, age, potential antigen exposures, and other factors that may shape the repertoire.” Our goal with tcrdist3 was to make it easy for the user to customize the background to the scientific question.

      The authors also do not convincingly differentiate themselves from previous approaches that have used network analysis and generation probability in order to find clusters of similar sequences (very much conceptually similar to the approach taken here).

      We agree it’s important to communicate how meta-clonotypes differ from existing TCR analysis approaches. There are several important distinctions with the existing methods that use networks and generation probability, namely TCRNET and ALICE. We have highlighted these differences in the Discussion (lines 483-497), quoting:

      “The meta-clonotype approach also differs from methods, such as TCRNET (Ritvo et al., 2018) and ALICE (Pogorelyy et al., 2019), that seek to identify TCRs sharing antigen-specificity within bulk repertoires. These methods were developed to identify TCR nodes in a network with an enriched number of edges compared to the expected number of edges in a background (TCRNET) or derived from a probability model (ALICE). Similarly, another recent method attempts to find antigen-associated sequences in bulk repertoires using a two-stage agglomerative clustering of a k-mer based representation of CDRs, first within and then across bulk repertoires (Yohannes et al., 2021). Our framework is designed for a different task than these algorithms. Specifically, we sought to construct definitions of TCR groupings among already antigen-associated TCRs, which would have high sensitivity and specificity for finding similar TCRs in bulk repertoires. This is an important distinction because the existing network-enrichment methods would simply find that most or all of the TCR groupings among a set of antigen-associated sequences were statistically enriched compared with their frequency in antigen-naïve repertoires. By contrast, a flexible meta-clonotype radius permits the definition of the largest possible group of antigen-associated TCRs with the constraint that the likelihood of finding a TCR within the radius in an antigen-naïve background is equally low across all meta-clonotypes.”
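
      The radius constraint described in the quoted passage can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the candidate radii, the distances, and the background-rate limit are all hypothetical, and the code does not use or reproduce the tcrdist3 API. For each centroid it picks the largest candidate radius at which the fraction of antigen-naïve background TCRs falling within the radius stays at or below a common limit.

```python
import numpy as np

def largest_radius(dist_to_background, candidate_radii, max_background_rate):
    """Largest candidate radius whose background hit rate is <= the limit.

    Returns None when even the smallest candidate radius exceeds the limit.
    """
    dist_to_background = np.asarray(dist_to_background)
    best = None
    for radius in sorted(candidate_radii):
        # Fraction of background TCRs within this radius of the centroid
        rate = np.mean(dist_to_background <= radius)
        if rate <= max_background_rate:
            best = radius
        else:
            break  # the hit rate cannot decrease as the radius grows
    return best

# Hypothetical distances from two centroids to ten background TCRs each
centroid_a = [50, 60, 40, 36, 80, 90, 70, 100, 55, 65]   # sparse neighbourhood
centroid_b = [10, 20, 14, 30, 60, 70, 80, 90, 100, 110]  # dense neighbourhood

radii = [12, 24, 36, 48]
radius_a = largest_radius(centroid_a, radii, max_background_rate=0.15)
radius_b = largest_radius(centroid_b, radii, max_background_rate=0.15)
```

      With these invented numbers the centroid in the sparse neighbourhood receives a radius of 36 and the one in the dense neighbourhood a radius of 12. The radius flexes per centroid while the background rate is held to the same ceiling.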

      Finally, the authors do not provide detailed descriptions of how the comparison of meta-clonotypes across repertoires is handled as well as potential sequence redundancies across meta-clonotypes (in potentially different individuals). I believe that all of the perceived shortcomings are readily addressable in a revision.

      You are quite right that many meta-clonotypes are overlapping in that a single TCR might conform to more than one meta-clonotype definition. Thus, in the application of meta-clonotypes to the COVID-19 dataset we tested each meta-clonotype individually for an association with the predicted HLA-restricting genotype. Depending on the context, if a summary across meta-clonotypes is required (e.g. finding the overall abundance of conformant TCRs in a repertoire) it may be appropriate to use meta-clonotypes to identify conformant sequences, but then tally them based on actual abundance (i.e. no double counting). In a prediction context, it may be desirable to have overlapping meta-clonotype features, and in fact many machine learning algorithms excel in this regime. With tcrdist3 we have incorporated a “join” functionality that allows for relational database-style joining of meta-clonotypes with a TCR repertoire; this makes it relatively easy to eliminate or keep redundancies, depending on the context. We have added a sentence to the Discussion pointing out that there is overlap among meta-clonotypes that needs to be considered in their application and we provide a link to an example of how to use the join functionality on https://tcrdist3.readthedocs.io/en/latest/join.html#step-by-step-example.
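
      The difference between keeping and eliminating redundancies can be sketched in plain Python. This is not the tcrdist3 join functionality; the clone names, counts, and conformance table are hypothetical and only show why a summary across overlapping meta-clonotypes should tally each clone once.

```python
# Hypothetical bulk repertoire: clone id -> template count
repertoire = {"clone1": 5, "clone2": 1, "clone3": 12, "clone4": 3}

# Hypothetical conformance table: meta-clonotype -> clones within its radius.
# clone2 and clone3 each conform to two meta-clonotypes.
conformant = {
    "M1": {"clone1", "clone2"},
    "M2": {"clone2", "clone3"},
    "M3": {"clone3"},
}

def per_meta_clonotype_counts(repertoire, conformant):
    """Abundance per meta-clonotype; overlapping clones count once per feature."""
    return {m: sum(repertoire[c] for c in clones)
            for m, clones in conformant.items()}

def overall_count(repertoire, conformant):
    """Overall conformant abundance with each clone tallied exactly once."""
    unique_clones = set().union(*conformant.values())
    return sum(repertoire[c] for c in unique_clones)

feature_counts = per_meta_clonotype_counts(repertoire, conformant)
naive_total = sum(feature_counts.values())          # double counts overlaps
deduplicated_total = overall_count(repertoire, conformant)
```

      Here the per-feature counts (6, 13 and 12) are what a prediction model with overlapping features would see, whereas the repertoire-level summary is 18 rather than the naive sum of 31.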

      All in all, this manuscript is an important step towards a better understanding of immune receptor biology. tcrdist3 is an evolution of a previously published method (tcrdist) that is here used to build meta-clonotypes. After reading the paper, it remains slightly unclear (addressable in a revision) as to how useful they are for understanding repertoire biology as well as how to use them in practice in terms of robustness and sensitivity.

      Thank you for your constructive comments, and I hope we’ve addressed these issues around biological interpretability and application in the revision.

      Reviewer #3:

      Mayer-Blackwell et al introduce a new framework for leveraging antigen-annotated T cell receptor (TCR) sequencing data to search for similar TCRs in bulk repertoire data, which potentially recognise the same antigen peptide. They introduce the notion of meta-clonotype, a T cell receptor (TCR) feature consisting of a main TCR sequence ("centroid") and a distance radius around it (+/- a CDR3 motif), with distance measured according to their previously published TCRdist method (Dash et al, 2017). The meta-clonotypes benefit from increased publicity over exact clonotype matching, and enhance the ability to find potentially relevant TCRs in repertoires from unrelated individuals, which are usually highly diverse, predominantly private, and subject to sampling constraints. The idea of meta-clonotypes is very interesting, and will provide a very useful tool in future repertoire analyses. For example, public databases of annotated TCRs (e.g. VDJdb) can be used to derive the set of meta-clonotypes for a variety of antigens, which can in turn be searched for in bulk repertoire data to identify e.g. memory to previous antigen exposure, immune status etc.

      The tool for performing the analysis, tcrdist3, is open-source, well-documented with instructions and examples, and the statistical analysis has been well-thought out. It is also useful to have the comparison to the current alternative of k-mer based TCR distance (i.e. GLIPH2), and the added flexibility for the user to define the precise distance metric to be used in the tcrdist3 tool.

      The authors then apply their method to analyse TCR beta sequences from COVID-19 datasets that have been publicly released by Adaptive Biotechnologies through the immuneRACE project. They use the MIRA set, the peptide-enriched set, to identify the meta-clonotypes, and then search for these in an independent cohort of COVID-19 bulk repertoires from 694 individuals. The authors find that a large proportion of the meta-clonotypes were more abundant in patients expressing the relevant restricting HLA allele, and suggest this could potentially lead to the development of disease biomarkers. The set of SARS-CoV-2 related meta-clonotypes is a useful resource in itself, as researchers generating other COVID-19 TCR datasets will be able to utilize this set of meta-clonotypes to search and potentially stratify patients in their own generated data.

      There are a few areas where further detail / examples would strengthen the paper's claims, in particular in the application of the tcrdist3 method to the COVID-19 data.

      1) Bulk TCR data from repertoires with past antigen exposure are likely to contain varying sizes of clones due to the proliferation of responding T cells and a remaining memory population. Due to the sharp drop in size between a TCR sequencing sample and the entire repertoire, clones above a particular size relative to the sample size are highly unlikely to have been sampled by chance, and identifying significantly/meaningfully expanded clonotypes in a sample is often used to identify a potentially antigen-recognising set of TCRs. The authors demonstrate the detection of meta-clonotypes in the repertoire sets, but it is somewhat unclear how the abundance of a clonotype conforming to a particular meta-clonotype is addressed. For example, there may be rationale for treating the following cases differently: meta-clonotype A is instantiated by (i) a unique clonotype with abundance 1; (ii) a single clonotype with abundance 1000; (iii) 100 different clonotypes (i.e. a "dense neighbourhood" around this meta-clonotype). If used to develop biomarkers, perhaps some degree of granularity in how the frequency/occurrence of meta-clonotypes is calculated would be helpful here.

      Thanks for this helpful suggestion. We agree that the scenarios you outlined above, which differ in the level of clonal breadth (i.e. number of unique clones), may have great immunological relevance. Though we have not specifically assessed the clinical or immunological relevance of clonal breadth vs. clonal frequency, we have noted in the revision (lines 514-519) that there are multiple ways of counting meta-clonotype conformant sequences and multiple ways of aggregating counts across meta-clonotypes, for example without double counting clones that may be conformant to multiple meta-clonotypes. We have also added a documentation page about how to tabulate abundance or breadth of conforming clones: https://tcrdist3.readthedocs.io/en/latest/join.html
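
      To make the distinction between breadth and abundance concrete, here is a minimal sketch of one way such counts could be tabulated for a single repertoire. It is an illustration only, not the tcrdist3 implementation described on the documentation page above: the function name `tabulate` and the input layout (tuples of meta-clonotype ID, clonotype ID, and template count) are assumptions made for this example.

```python
from collections import defaultdict


def tabulate(hits):
    """Summarise meta-clonotype conformant clones in one repertoire.

    `hits` is a hypothetical input: an iterable of
    (meta_clonotype_id, clonotype_id, count) tuples, one per conformant match.
    """
    per_meta = defaultdict(dict)  # meta-clonotype ID -> {clonotype ID: count}
    all_clones = {}               # clonotype ID -> count, each clone stored once
    for meta_id, clone_id, count in hits:
        per_meta[meta_id][clone_id] = count
        all_clones[clone_id] = count

    # Breadth = number of unique conformant clones; abundance = summed counts.
    summary = {
        meta_id: {"breadth": len(clones), "abundance": sum(clones.values())}
        for meta_id, clones in per_meta.items()
    }
    # Aggregate across meta-clonotypes without double counting shared clones.
    total = {"breadth": len(all_clones), "abundance": sum(all_clones.values())}
    return summary, total


# Clone c1 conforms to both A and B; it is counted once in the aggregate.
hits = [("A", "c1", 1000), ("B", "c1", 1000), ("B", "c2", 5)]
summary, total = tabulate(hits)
```

      In this sketch a single clone with abundance 1000 and a neighbourhood of 100 clones with abundance 1 each would give the same order of abundance but very different breadth, which is the granularity the reviewer asks about.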

      2) The authors focus their analysis on detecting meta-clonotypes from MIRA sets with strong evidence of HLA-restriction. They report 59.7% of these meta-clonotypes were more abundant in patients expressing the corresponding HLA allele. This means that over 40% of meta-clonotypes with strong HLA restriction were more abundant in repertoires with other HLA types. This point could be further elucidated by comparing results with the control repertoires from the COVID-19 set, from MIRA sets with low evidence of HLA restriction, or combining the sets of low and high evidence of HLA restriction (i.e. HLA agnostic results).

      We’d like to clarify that the results do not imply that 40% of meta-clonotypes were more abundant in participants lacking the restricting HLA allele; rather, these meta-clonotypes did not have a significant association with presence or absence of the HLA genotype. In the discussion we highlight several possible explanations for this, including that some meta-clonotypes were too rarely detected in the population (lines 470-478). The volcano plots in Figures 6A and 7A show that there are very few, if any, HLA associations of the opposite sign (i.e. meta-clonotypes more abundant in patients without the restricting HLA allele). In fact, at the chosen significance threshold (FDR < 0.01), 0 of 1831 predicted HLA-associated meta-clonotypes were significantly negatively associated with the predicted HLA.
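
      The three-way split described here (significantly positive, significantly negative, or not significant) can be sketched as follows. This is a hedged illustration rather than the analysis code used in the manuscript: it assumes a Benjamini-Hochberg adjustment, which the response does not specify, and the names `benjamini_hochberg` and `classify`, as well as the example effect sizes and p-values, are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def classify(effects, pvalues, alpha=0.01):
    """Label each association by sign, but only if it passes the FDR threshold."""
    labels = []
    for effect, q in zip(effects, benjamini_hochberg(pvalues)):
        if q >= alpha:
            labels.append("not significant")
        elif effect > 0:
            labels.append("positive")
        else:
            labels.append("negative")
    return labels


# Made-up values: a negative estimate that fails the threshold is reported
# as "not significant", not as a negative association.
labels = classify([1.2, -0.3, 0.8, -1.0], [0.001, 0.5, 0.0001, 0.04])
```

      Under these assumptions, a meta-clonotype outside the 59.7% falls into the "not significant" group unless its adjusted p-value is below the threshold and its estimated effect is negative.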

      3) The MIRA55 set is used as an illustrative example throughout the manuscript, which familiarises the reader with this dataset as they are reading the paper. However, the claims made by the paper about MIRA sets / strong HLA evidence MIRA sets could be strengthened by providing an indication of how measured characteristics of the MIRA55 set compare to the other sets being assessed.

      This is a good point, and we have tried to provide as much information as possible about MIRA55 and the other MIRA sets to help establish that MIRA55 is a representative set. Characteristics of the other MIRA sets appear in Supporting Table S6, including:

      • Input number of clonotypes (AA exact)
      • Number of non-redundant, public meta-clonotypes
      • Clonotypes spanned by at least one meta-clonotype
      • Span (% of clonotypes conforming to a meta-clonotype definition)
      • Number of public enhanced sequences that match an identified TCRβ

      Table S6 also reports summary statistics for other meta-clonotype properties:

      • Pgen
      • Radius (TCRdist units)
      • TRBV-CDR length
      • Number of MIRA subjects contributing at least 1 sequence

      Furthermore, Table S7 provides the strength of evidence for HLA-restriction for each meta-clonotype, which is then summarized by MIRA set visually in Figure 4.

      Based on these criteria, we think MIRA55 is a reasonably representative set to focus on.

      4) There is some discussion throughout the manuscript about using the SARS-CoV-2 meta-clonotypes to identify differing clinical outcomes such as disease severity. Perhaps the dataset does not have sufficient power to allow for such sub-analysis, but a method of using meta-clonotypes to differentiate between patients based on the occurrence of meta-clonotypes in their repertoire is not provided (e.g. the number of observed clonotypes, the density distribution around clonotypes, etc.).

      That is true. With this manuscript we have tried to focus on establishing the methodology, evaluating the strength of the antigen-specific signal and demonstrating its potential applications; we have tried to make these goals more explicit throughout. Specifically in the revision we note that: “Much like any biomarker study, to establish a TCR-based predictor of a particular outcome, the features must be measured among a sufficiently large cohort of individuals, with a sufficient mix of outcomes.” At this time the publicly available ImmuneRACE data lack negative controls and sufficient clinical details to allow for building a predictor of SARS-CoV-2 infection or disease severity.