  1. Mar 2021
    1. Reviewer #2 (Public Review):

      Lundberg and colleagues provide a detailed set of data showing the utility of host-associated microbe PCR. By simultaneously amplifying microbial community and host DNA, hamPCR provides an opportunity to measure the microbial load of a sample. I was largely convinced about the robustness of this approach after seeing the many different optimization datasets that were presented in the paper. I also appreciated the various applications of hamPCR that were demonstrated and compared to other standard approaches (CFU counting and shotgun metagenomics, for example). As clearly illustrated in Figure 6f, hamPCR could dramatically improve our understanding of interactions within microbiomes as it helps remove issues of relative abundance data.
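
      To make the contrast between load and relative abundance concrete, here is a minimal sketch, assuming load is summarized as microbial amplicon reads divided by host amplicon reads in the same library. The function names and read counts are hypothetical and are not taken from the manuscript, and the sketch ignores any rescaling needed after the host-to-microbe amplicon ratio is adjusted before sequencing.

```python
def microbial_load(microbe_reads, host_reads):
    """Per-taxon load: microbial reads divided by host reads (assumed definition)."""
    if host_reads <= 0:
        raise ValueError("host_reads must be positive to serve as a reference")
    return {taxon: reads / host_reads for taxon, reads in microbe_reads.items()}


def relative_abundance(microbe_reads):
    """Per-taxon share of the microbial reads only (host reads are ignored)."""
    total = sum(microbe_reads.values())
    if total <= 0:
        raise ValueError("at least one microbial read is required")
    return {taxon: reads / total for taxon, reads in microbe_reads.items()}


# Hypothetical counts: same composition, fourfold difference in microbial reads.
sample_low = {"taxon_A": 100, "taxon_B": 100}
sample_high = {"taxon_A": 400, "taxon_B": 400}
host_reads = 800  # hypothetical host amplicon reads, identical in both samples

# Relative abundance cannot tell the two samples apart; load can.
same_composition = relative_abundance(sample_low) == relative_abundance(sample_high)
load_low = microbial_load(sample_low, host_reads)
load_high = microbial_load(sample_high, host_reads)
```

      In this toy case both samples show a 50:50 composition, while the host-normalized load of each taxon differs fourfold, which is the kind of difference relative abundance data alone would hide.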

      One challenge about the approach presented is that it cannot be quickly adapted to a new system. Unlike most primers for 'standard' microbial amplicon sequencing, considerable time will be required to determine which host gene to target, how to make that host gene size larger than the size of the microbial amplicon, etc. This may limit wide adoption of hamPCR in the field. I do appreciate the authors providing some details in the Supplement on how they developed hamPCR for the several different systems described in this paper. The helpful tips may make it easier for others to develop hamPCR for their own systems.

      An issue that repeatedly came up is that at high and low ends of host:microbe ratios, inaccurate estimates can occur. For example, with high levels of microbial infection, the authors note that hamPCR has reduced accuracy. The authors propose three solutions to this problem (1. altering the host:microbe amplicon ratio, 2. using a host gene with higher copy number, and 3. adjusting the concentrations of host primers), but only present data for #1 and #3. Do they have any data to show that #2 would actually work?

      One instance of potential unreliable load that sticks out in the paper is in Figure 5b. The authors note that this is likely due to unreliable load calculation. Is this just one of 4 replicates? What are other potential reasons this would be an outlier and how can the authors rule this out? Did they repeat the hamPCR for this outlier to confirm the striking difference from the other three samples in the eds1-1 Hpa + Pto sample?

      Could the DNA extraction method used cause biases in hamPCR for/against either the host or the microbiome? If two different labs study the same system (let's say bacterial communities growing on Arabidopsis leaves) but use different DNA extraction approaches, would we expect them to obtain different answers using hamPCR? Did the authors try several different DNA extraction methods to see if this is an issue? Or has another team of researchers considered this and addressed it in a separate paper? I would appreciate seeing either data to address this or a discussion paragraph that reasons through this.

      One emerging theme in microbiome science is to have consistent methodologies that are used across studies/labs to allow direct comparisons of microbiome datasets. Standardization of approaches may make microbiome science more robust in the long term. Given much of the nuance in developing hamPCR for different systems, my impression is that this method is best for comparing samples within a particular host-microbe system and not across systems. For example, it may be challenging to directly compare my bacterial load hamPCR data from Arabidopsis to another lab's if we used different Arabidopsis host genes or if we used different 16S gene regions. Can the authors unpack this a bit in a discussion paragraph? If it is widely adopted, is there a way to standardize hamPCR so that it can be consistently used and compared across datasets? Or should that not be the goal?

      There appears to be considerable non-specific amplification or dimers in the gels presented throughout the manuscript. Could this non-specific amplification vary across host-microbe primer combinations? Would this impact quantification of host and microbial amplicons?

    2. Reviewer #1 (Public Review):

      This work describes a novel approach, host-associated microbe PCR (hamPCR), to both quantify microbial load relative to the host and describe interkingdom microbial community composition with the same amplicon library preparation. The authors used single (low-copy) host genes as PCR targets to set the host reference for microbial amplicons. To handle the problem that, in many cases, host DNA is in excess compared to microbiome DNA, the authors adjusted the host-to-microbe amplicon ratio before sequencing. As a proof of concept, hamPCR was tested with synthetic communities, compared to shotgun metagenomics results, and applied in biological systems involving interkingdom microbial communities (oomycetes and bacteria), diverse hosts, or crop hosts with large genomes. Substantial data from diverse biological systems confirmed that the hamPCR approach is accurate, versatile, easy to set up, and low in cost, that it improves sample capacity, and that it reveals phenomena invisible to regular microbial amplicon sequencing approaches.

      Since amplification of host genes is the key step of the hamPCR approach, the authors might also include more discussion of strategies for selecting single (low-copy) genes for a specific host and for designing primers for those host genes, to support the use of hamPCR in biological systems other than those mentioned in the manuscript.

    3. Evaluation Summary:

      Overall, we agree that this new method is potentially impactful although the full versatility of the approach is currently unclear for several reasons. We appreciated the application of the approach to distinct systems and also the relatively low cost of this technique. The diagrams presented (particularly in Figure 3) nicely convey the steps in the protocol with expected sample outcomes to further facilitate the ability of other researchers to employ hamPCR. Overall, we are very positive about this work, but given that the impact of this paper rests on whether or not the technique is widely adopted, some revisions will lower the barrier to entry for future researchers to adopt this approach.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

    1. This manuscript is in revision at eLife

      The decision letter after peer review, sent to the authors on February 18 2021, follows.

      Summary:

      This paper is of potential importance to neuroscientists who study sensory representations and how they are learnt. It suggests that neural representations underlying human perception can be understood in terms of an optimal compression of the sensory input. While the attempt is indeed interesting, there are several shortcomings that should be addressed before this work could be considered as one that can contribute substantially to the understanding of tactile perception.

      Essential Revisions:

      1) There is a significant gap between the simulated data used here and the empirical data of material perception by touch. The vibratory signals were taken from recordings of surface exploration using a tool tip (Strese et al., 2017) whereas the ratings of the different materials are taken from an experiment in which participants used bare hand touch (Baumgartner et al., 2013). The difference is significant especially when material classification, and not only texture classification, is required. It is not at all clear how vibratory signals could code hardness, warmth, elasticity, friction, 3D, etc. (see Baumgartner et al., 2013). The authors must provide a serious discussion about this gap and convince the reader that their simulations can indeed provide access to the internal representations of natural haptic touch. In the same spirit, they should explain why, and demonstrate that, the pre-processing of the vibrotactile data (cutting & filtering) makes sense for natural haptic touch.

      2) The authors should provide good reasons to convince the readers that the compressed representation they found is indeed a good candidate for the biological representation. First, the nature of the AE algorithm is that it will converge to some representation in the minimal encoder dimension. Why is that a good encoding representation, and why is it a good model for the biological one? Second, the differences between the results obtained with the AE and those obtained with humans (Baumgartner et al., 2013) seem to outnumber the partial similarities found. The authors should list both differences and similarities and discuss, based on these comparisons, the probability that the coding found here is similar to the coding guiding human behavior.

      3) Notion of efficiency and compression. It was not demonstrated that the main result (figure 3A) is due to compression and efficiency of the AE. What will happen if no AE is used and the distance is measured in the raw input domain (e.g. between Fourier coefficients or principal components)? One could expect figure 4 to account for that, and also show that for a very wide AE there is some deterioration of the main result. Otherwise, the main result about correlation to perceptual data cannot be attributed to the compressive property of the AE.

      4) Biological correlates of the latent representation. On the one hand the authors claim that the AE latent representation aims to mimic a latent representation of the haptic space, which they assume to be compressed and efficient. On the other hand, they claim that the AE representation is similar to mechano-sensory representation, which is a first biological representation before any compression can take place (when hand movements are ignored, as done here). This needs to be clarified.

      5) Validity of the latent representation. The reconstruction error of the AE is large and systematic: only ~50% of the variance is explained and its high frequencies are systematically ignored. The resulting latent representation is such that classes are poorly separable (~29%), which seems to be far worse than the human level (around 70% in Baumgartner et al. 2013). It would therefore be interesting to see whether the key result, i.e. the relation between AE latent space and the perceptual distance, remains valid for a more advanced AE.
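
      The control asked for in point 3 can be sketched as follows: skip the AE, measure distances between materials directly in the raw input domain (here, Fourier magnitudes), and correlate them with the perceptual distances. This is only an illustration under stated assumptions: all function names are hypothetical, the naive DFT stands in for whatever spectral features the authors would use, Pearson correlation is an arbitrary choice of statistic, and `perceptual_distances` is assumed to list one value per unordered pair of materials in the same order as `itertools.combinations`.

```python
import cmath
import math
from itertools import combinations


def dft_magnitudes(signal):
    """Magnitudes of the DFT coefficients from DC to Nyquist (naive O(n^2) DFT)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def pairwise_distances(features):
    """Euclidean distance for every unordered pair, in combinations() order."""
    return [math.dist(a, b) for a, b in combinations(features, 2)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def raw_domain_baseline(signals, perceptual_distances):
    """Correlate raw spectral distances (no AE) with perceptual distances."""
    spectra = [dft_magnitudes(s) for s in signals]
    return pearson(pairwise_distances(spectra), perceptual_distances)
```

      If such a baseline correlated with the perceptual data about as well as the AE latent distances do, the main result could not be attributed to compression; the same function could be applied to latent codes from AEs of different widths to look for the expected deterioration.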

    1. Reviewer #2 (Public Review):

      The paper presented by Boroumand et al. aims to delineate the impact of bone marrow resident adipocytes on the phenotype, development, and metabolism of murine monocyte subsets during diet-induced obesity and leanness. The paper provides an interesting analysis of the metabolic state and phenotype of mitochondria in murine monocytes during high-fat diet feeding. Furthermore, it provides some insight on the crosstalk between bone marrow resident adipocytes and different monocytes.

      The paper will help to further delineate the response of monocytes during obesity; however, the impact the paper will have on the field of mononuclear phagocyte biology and our understanding of myelopoiesis during low-grade inflammation is limited.

      Several claims should be more thoroughly addressed, such as the phenotypes of macrophages found within the adipose tissues and a more fine-grained analysis of the mononuclear phagocyte progenitors within the bone marrow. Furthermore, a central claim of the paper is that Ly6clow monocytes convert to Ly6chigh monocytes. If the authors would like to maintain that claim, it needs experiments that support the hypothesis.

    2. Reviewer #1 (Public Review):

      In this study, Boroumand et al investigate abundance and metabolic phenotype of Ly6Chi and Ly6Clo monocytes in the bone marrow (BM) following feeding a HFD for 3, 8 and 18 weeks compared with a control diet. The authors suggest that upon accumulation of white adipocytes in the BM (8 weeks of feeding), monocytes are skewed towards the Ly6Chi subset, which have been shown to give rise to many macrophage subsets in obese tissues. The authors further demonstrate metabolic changes in Ly6Clo monocytes which may contribute towards this phenotype. Finally, through a series of in vitro and ex vivo cultures, the authors suggest that the increase in Ly6Chi monocytes is due to conversion of Ly6Clo monocytes into Ly6Chi monocytes as a result of the increased prevalence of white adipocytes in the bone marrow.

      Overall the findings of this work are interesting to the field and in the future it will be interesting to determine how these changes in the bone marrow relate to the different subsets of recruited macrophages present in obese tissues. For example, whether these monocytes preferentially generate CD9+Trem2+ Lipid associated macrophages recently described in obese adipose tissue (Jaitin et al, Cell, 2019) or if they are equally capable of generating monocyte-derived tissue resident macrophages in obese tissues.

      The main strength of this paper is in the identification of the changes in the monocyte subsets abundance early after feeding a HFD and in uncovering the metabolic changes in and between these two monocyte subsets in obese mice. One concern regarding the data as a whole is that, while the authors have nicely indicated the number of samples/mice in each figure, there is no mention of how many times each experiment was performed. Including this would greatly aid in an understanding of the reproducibility of the results. Additionally, the inclusion of the different gating strategies used particularly for the first figures would be advantageous to fully appreciate the findings being presented. This is particularly relevant for the identification of Ly6Chi and Ly6Clo BM monocytes.

      The conclusions made regarding the role of white adipocytes in skewing the monocyte subsets and particularly regarding the conversion of Ly6Clo monocytes to Ly6Chi are however less convincing. The authors use a culture strategy where they grow BM monocytes in vitro for 5 days. They then culture these 'monocytes' for a further 18 hours with conditioned media from BM adipocytes from control or HFD fed mice. They show that culture with 8 & 18 week conditioned media results in the increased abundance of Ly6Chi monocytes. The authors later claim this is not through proliferation of the existing Ly6Chi monocytes but conversion from Ly6Clo monocytes. However, the alternate explanation could be that there are some progenitors remaining in these cultures that can give rise to Ly6Chi monocytes following exposure to the conditioned media. To further validate these claims, it would be beneficial to sort Ly6Chi monocytes and culture them with the conditioned media to demonstrate the numbers do not increase. Moreover, it is important to demonstrate that there are no progenitors left in these cultures when the conditioned media is added. Indeed, later in the manuscript, when Ly6Clo monocytes are sorted and cultured with media from EWAT or BAT, it would be important to confirm that the sorted cells are a pure population of Ly6Clo monocytes with no contamination from progenitors that are also Ly6Clo that could give rise to Ly6Chi monocytes without going through the Ly6Clo monocyte stage.

      In a similar vein, the authors suggest no conversion of Ly6Chi monocytes to Ly6Clo monocytes, but that Ly6Clo monocytes would convert into Ly6Chi monocytes (fig. 7). As this is a rather controversial claim, additional data in support of this conclusion would be beneficial. For example, after 18 hours of culture it is possible that if the authors are sorting Ly6Chi monocytes on the basis of Ly6Chi expression, that the antibody staining may be maintained for 18 hours. Similarly, after culture, it is possible that the cells are less healthy and hence non-specific binding should also be ruled out. Alternatively, qPCR for gene expression associated with Ly6Chi and Ly6Clo monocytes could be utilised to further substantiate the claims. For example, Spn expression for Ly6Clo monocytes, Ly6c2 expression for Ly6Chi monocytes.

      Overall, this manuscript nicely demonstrates changes in the BM monocyte subsets and their metabolism; however, some additional controls are required to further validate the claim that Ly6Chi monocytes are increased due to Ly6Clo monocyte conversion to Ly6Chi monocytes.

    3. Evaluation Summary:

      Using mice fed with high fat diet (HFD), Boroumand et al. observed a link between bone marrow (BM) adipocyte whitening and the expansion of BM Ly6Chi monocytes and derived cells in the adipose tissue. By adopting an in vitro approach, they also show that BM conditioned medium is able to metabolically rewire Ly6Chi monocytes notably concerning mitochondrial fission/fusion gene expression. They conclude that early changes in the BM adipocytes induced by HFD drive the activation of monocytes and influence the outcome of the disease.

      This study is of interest to those investigating BM adaptations to lipid signals as well as macrophage biologists interested in macrophage recruitment and differentiation in the context of obesity and beyond in inflammation.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors)

    1. Reviewer #4 (Public Review):

      This paper describes the transmission of Trypanosoma brucei by the Tsetse vector. As part of these studies, the authors discovered that (i) a single parasite is sufficient for transmission and (ii) two stages of the Trypanosoma brucei life cycle (slender and stumpy forms) can be efficiently transmitted by the Tsetse vector. This was unexpected (as mentioned in the title) because only stumpy forms were known to be adapted for transmission.

      The life cycles of parasites are text-book knowledge that researchers rely on and rarely question. It's the slide #2 of every talk in parasitology. In the mammalian host, the life cycle of Trypanosoma brucei comprises two stages: the dividing slender forms and the cell-cycle arrested stumpy-forms, which are pre-adapted to survive in the midgut of the next host (Tsetse fly). In this report, Schuster, Subota et al. show that slender forms are sufficient to establish an infection in the Tsetse fly and thus ensure transmission. The claims and conclusions are justified by the data presented.

    2. Reviewer #3 (Public Review):

      In this work, Schuster et al. have explored the requirement of the short stumpy morphological form of the African trypanosome, Trypanosoma brucei, for the completion of the parasite lifecycle. Heretofore, short stumpy form parasites, which have been proposed to be pre-adapted for life in the tsetse fly insect vector, were considered an essential stage in the transitions from mammalian blood forms to insect-infective stages. These parasites do not divide and are generated in a density-dependent manner from the rapidly dividing long slender blood form. The quiescent short stumpy forms have been shown in vitro to undergo differentiation into insect-infective forms in response to a diversity of environmental cues and stress, supporting their position as the lifecycle stage that initiates colonization of the fly midgut.

      The findings presented in this work call into question the longstanding notion that short stumpy parasites play a central role in the lifecycle. Notably, the authors have found that long slender forms are as competent as short stumpy parasites to infect flies. This observation may solve a major conundrum raised when short stumpy forms are considered essential intermediates in disease transmission. That is, how is the parasite successfully transmitted to tsetse flies when the flies only ingest very small bloodmeals from hosts with parasitemia too low to trigger density dependent stumpy form development?

      The authors perform an extensive analysis of parasites isolated from infected flies and compare fly infections established using different numbers of short stumpy and slender parasites. This effort includes dissection of a variety of fly tissues and scoring parasites for expression of key developmental markers. Interestingly, the data indicate that the long slender parasites activate pathways described from short stumpy parasites to complete differentiation; however, unlike the stumpy forms that are arrested in the cell cycle, the parasites continue to proliferate. Overall, the process of differentiation to the insect stage is not identical for the long slender and short stumpy forms, as expression of key markers (PAD1 and EP1) occurs more quickly when short stumpy forms are used in fly infection studies while, unlike the long slender forms, they are delayed in return to the normal cell cycle.

      The conclusions of the paper are supported by the presented data and the discussion further develops the case that long slender forms may be key to parasite transmission to the vector. The work is based on using the standard model African trypanosome subspecies that infects rodents and not a trypanosome species that infects humans. This does not, however, diminish the potential impact of the work, as the rodent parasites are the field standard (and molecular tools have primarily been developed in that background). In addition to finding that long slender forms are competent for lifecycle completion, which could ultimately require amendment of medical school textbook lifecycles, this work also raises important questions about the role of the short stumpy form in parasite biology. The authors speculate the short stumpy forms may serve to control population size in a quorum sensing-dependent-fashion. While this notion conflicts with observations presented from human infections where blood parasite levels are very low, it remains unresolved what cues environments like the skin and other tissues present to the parasite, and how these may influence short stumpy differentiation.

    3. Reviewer #2 (Public Review):

      Differentiation pathways for parasitic organisms are of considerable importance, as they are relevant to understanding transmission, mechanisms of host specificity as well as, in some cases, offering possible routes to control measures. The transition between mammalian host and insect vector for African trypanosomes has been widely addressed due to accessibility and tractability. However, one view has been dominant, despite, as the authors suggest, considerable counter-evidence. The present work posits an alternate pathway, questioning the role of the so-called stumpy stage. This is of considerable importance to the immediate field and possibly more widely.

      The major strengths here are in the use of a good model, and a high number of individual infections. The weaknesses include some assumptions with which I have issue, and given that this work is seeking to overturn a dogma, which also has assumptions, one needs to tread very carefully, to avoid falling into an unscholarly dispute. For me, the major concerns are the assumption that PAD-1 cells are stumpy (almost anything seems to be able to activate PAD-1) and the lack of any quantitative data. This is genuinely difficult, and Matthews also says that PAD-1 does not equal stumpy and that morphology is also important. Further, simple expression of EP procyclin is not sufficient for designation as procyclic, and the salivary gland cells are assumed metacyclic without demonstration of VSG expression, for example. While I accept that these interpretations are reasonable, each is an assumption, and in all three cases this leads me to feel a little underwhelmed. Perhaps most concerning is the lack of statistical calculations as well as of any attempt at further analysis beyond counting. The result is very much phenomenology and lacks any mechanistic insight.

    4. Reviewer #1 (Public Review):

      The data in the paper are mostly convincing, but might be somewhat over-interpreted: statistical analysis of the Tables is required. Yes, long slender bloodstream forms can definitely differentiate to procyclic forms and infect Tsetse. However, they take longer to differentiate than stumpy forms do, and even though morphologically stumpy forms are not an obligatory intermediate, expression of at least one stumpy-form mRNA (and presumably, others in the pathway) is definitely required. This should be stated in the Abstract. The conclusion that there is no cell-cycle arrest at all is not really supported by the data.
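
      As one possible form for the requested statistics, infection counts from a table could be compared with a two-sided Fisher's exact test on a 2x2 table (infected vs. uninfected flies, slender vs. stumpy inoculum). The sketch below is an assumption about what such an analysis might look like, not the authors' method; the function name is hypothetical and the counts in the usage lines are placeholders, not data from the manuscript.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Placeholder counts only: rows are inoculum types, columns infected / uninfected.
p_extreme = fisher_exact_two_sided(3, 0, 0, 3)
p_identical = fisher_exact_two_sided(5, 5, 5, 5)
```

      With many tables in the paper, a correction for multiple comparisons would also need to be considered; that choice is left open here.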

    5. Evaluation Summary:

      This paper describes the transmission of Trypanosoma brucei by the Tsetse vector. As part of these studies, the authors discovered that (i) a single parasite is sufficient for transmission and (ii) two stages of the Trypanosoma brucei life cycle (slender and stumpy forms) can be transmitted by the Tsetse vector - although the stumpy form developed more rapidly into proliferative parasites in the Tsetse midgut. The results are unexpected because it was previously thought that only stumpy forms were important for transmission.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

    1. Joint Public Review:

      This is an elegant study that delves into germline initiation and ovule development at a resolution not previously reported. The topic is of general significance for developmental biologists, and particularly interesting for groups studying the basis for germline development. Using a multitude of assays, starting from 3D segmentation analysis, progressing to modelling, reporter line analysis and mutant characterization, the authors document cellular components of ovule primordium growth and uncover new aspects of spore mother cell (SMC) emergence, in which ovule geometry appears to play a relevant role. The authors concluded that anisotropic growth is one of the important factors driving overall development of ovules, especially in Phase I, and that the L1 dome and the basal domain, but not the SMC and neighboring L2 companion cells, are consecutive sites of cell proliferation, thus contributing to morphological changes of ovules in Phases I and II. In terms of novelty, this work identified growth principles conducive to ovule primordium growth, added a layer of complexity to the nucellar epidermis towards SMC specification, and provided a new concept of SMC development: SMC fate emergence and SMC singleness resolution, where cell geometry plays a very active role.

      The katanin mutant is an interesting choice since it has been reported previously to impact cell growth. As expected, in katanin mutants, the primordium became enlarged in size and was more isotropic (lower height/width ratio) in shape. A reduced anisotropy also induced aberrant enlargement of SMC companion L2 cells in katanin mutant ovules. From PCNA and CYCB1.1 expression patterns, which are S- and M-phase markers, respectively, the authors found that the SMC precursor and its companion cells showed an S-phase pattern at high frequency. Taken together with infrequent divisions, this indicates that the SMC and its neighbors differ from other ovular cells in having a longer S-phase. In addition, SMC singleness was suggested to be determined partly by Katanin-dependent anisotropic conditions.

      The claims made through the work are well documented and supported. In terms of experimental clarity and composition, the authors describe very well how the samples were obtained/how they were named, the statistical analysis appears robust and well described, and several of the markers analyzed provide a comprehensive landscape of what is occurring in the ectopic cells.

    2. Evaluation Summary:

      The authors use imaging analysis of Arabidopsis developing ovule primordia until the onset of meiosis to clarify the importance of organ morphogenesis in cell fate. They first document the growth of ovule cells in three dimensions, then use computational modelling to predict factors underlying ovule growth, shape and spore mother cell (SMC) differentiation. They test this model through analysis of a mutant of Katanin, encoding a microtubule-severing protein. Overall, this work is elegant, adds new information and confirms previous hypotheses for the field. A well appreciated feature of this paper is OvuleViz, an R-based software tool that they developed, which will provide a consistent way for others to analyze mutants with similar phenotypic abnormalities.

      (This preprint has been reviewed by eLife. We include the joint public review from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #3 (Public Review):

      Developing animals must couple information about external and internal conditions with developmental programs to adapt to changing environments. In animals ranging from flies to mammals, growth and developmental progression are controlled by a neuroendocrine system that integrates environmental and developmental cues. In mammals, this system involves the reproductive axis (hypothalamic-pituitary-gonadal axis, HPG). In the fruit fly Drosophila, neurosecretory cells that project onto the ring gland, a composite endocrine organ that houses the corpora cardiaca (CC), the corpus allatum (CA), and the prothoracic gland (PG), serve analogous functions. Characterizing the neurosecretory cells that project to the ring gland and the inputs they receive is therefore key to a deeper understanding of how the neuroendocrine system receives and processes information about external and internal conditions, and in response, adjusts growth and development. Building on the electron-microscopic reconstruction of the Drosophila L1 larval brain, the authors perform a comprehensive analysis of the neurosecretory cells that target the larval ring gland and the neurons that form synaptic contacts with these neurosecretory cells. This work is truly impressive on its own, and more than that it will also be extremely important for the future characterization of inputs received by the neuroendocrine system to modulate its activity, thus coupling development with environmental conditions. The work is well-written, and I have no doubt that it will be of great value to the field.

    2. Reviewer #2 (Public Review):

      Analyzing EM data from the Drosophila larva, Hueckesfeld et al. investigate and describe the synaptic connectivity of sensory neurons and interneurons that provide input into the neuroendocrine system in fly larvae. The output of neuroendocrine neurons projecting to the ring gland is mostly non-synaptic and identified by receptor expression analysis. Using a modelling approach, they provide a more detailed analysis on newly discovered CO2-responsive cells and their downstream network and also other possible processing pathways from sensory to endocrine neurons. To test some of their model predictions, they analyze the response of predicted CO2-downstream neurons to CO2 exposure.

      Strengths of the paper:

      The authors did a great job in visualizing the complex connectivity between sensory inputs, interneurons, and endocrine neurons. Neuroendocrine neuron outputs, which are mostly non-synaptic, have been detected by identification of vesicle release regions. The authors went beyond the analysis of EM data and collected a lot of new data to confirm non-synaptic connectivity between neuroendocrine neurons and their downstream targets by performing antibody stainings and trans-tango experiments. This information will be highly valuable to the field.

      Sensory inputs in the larvae have been attributed according to previous publications, but the authors also describe a new CO2 sensing function of tracheal TD neurons. Description of this new sensory function is also a valuable addition to the Drosophila field.

      The authors used a modelling approach to describe and detect specific processing pathways, for example from a certain sensory modality, or to a specific endocrine neuron. This manuscript underlines that the use of a (simple) computational model framework to understand network motifs within an EM dataset is very powerful. Also, they can confirm that predicted CO2 downstream neurons indeed respond to CO2 in a certain way.

      The authors discuss potential functional implications for faster and slower processing pathways (connections over interneurons or direct). Indeed, there might be situations where the larva needs to respond in flexible ways that are also easily reversible (fast pathways), but there might also be other situations where the larva needs to integrate more sensory evidence and which might induce non-reversible behaviors, such as pupation (slow pathways). I think this discussion suggests an interesting concept of the impact/cost of adaptive behavioral changes and the different timescales at which they can occur.

      Weaknesses of the paper:

      Data wise, this manuscript is a very descriptive study. The authors visualize the complex and diverse possible processing pathways; however, the function of the circuit remains unknown. To really understand the functional properties behind this complex architecture will require studies focused on single sensory modalities, single pathways and/or single peptidergic classes all in the context of a certain behavioral framework.

      The authors try to provide a complete overview of the connectivity within the neuroendocrine system pathways. However, the authors should discuss that the connectivity data from the one EM dataset that they analyzed might vary across individuals and development. The vesicle release sites in particular might be more variable across individual larvae than synaptic connections. Neuropeptide receptor expression might also change over development.

      The authors investigate the TD CO2 sensing pathway in more detail. They show that the sensory neurons and the predicted downstream neurons respond to CO2. This shows that the neural connectivity might serve a functional purpose. There is, however, another type of sensory neuron that responds to CO2 in the larva (Gr21a receptor neurons; Faucher et al., 2006), which is required for an avoidance response to the stimulus. The authors should discuss and maybe analyze the EM data for possible circuit convergence between the two different CO2 sensory input neurons.

      The authors discuss the CO2 response in the context of a stress response. However, the natural environment of larvae, rotten fruit, also emits CO2 as a by-product. Thus, CO2 sensing, converging with information from fructose/glucose sensors, might be used for finding or evaluating food sources.

    3. Reviewer #1 (Public Review):

      The neuroendocrine system of the maggot has been mapped in parts at both the light and electron microscopic levels in earlier studies. In this manuscript, Hückesfeld et al. map the entire endocrine system all the way from its sensory input neurons to the interneurons and secretory neurons and the glands. This is invaluable for many reasons, including because information about external stimuli is likely integrated at the level of interneurons.

      The authors use this connectome to model how and to what extent each sensory modality might influence the different neurosecretory cells. They use the CO2 sensing pathway to functionally validate their model in vivo using CaMPARI. Through this they validate a circuitry where CO2 sensing neurons in the trachea influence 4 types of neurosecretory cells via 4 interneuron pathways. Interestingly, they find that the CO2 sensory information is not necessarily what dominates the sensory input onto some of these neurons.

    4. Evaluation Summary:

      This manuscript will be of broad interest to readers in the field of neuroscience. The authors use a serial section transmission electron microscopy data set to trace out the entire neuroendocrine system of a maggot from its sensory input to neuroendocrine cells. It highlights the complexity of brain circuits, describing how parallel processing systems can lead to a multitude of different input combinations for different neuroendocrine cell types and subcircuits. They provide interpretations about the functionality of one of the described neural circuits. While the analyses are generally rigorous, the functional interpretations need more supporting evidence.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #3 agreed to share their names with the authors.)

  2. Feb 2021
    1. Reviewer #3 (Public Review):

      It is established that Kinase suppressor of Ras 1 (KSR1) contributes to the oncogenic actions of Ras by promoting ERK activation. However, the downstream actions of this pathway are poorly understood. Here Rao et al. demonstrate that this KSR1-dependent pathway increases translation of Epithelial-Stromal Interaction-1 (EPSTI1) mRNA and expression of EPSTI1 protein. This is significant because EPSTI1 drives aspects of EMT, including expression of ZEB1, SLUG, and N-Cadherin. The analysis is thorough and includes both loss-of-function and gain-of-function studies. Overall, the conclusions of this study are convincing and advance our understanding of cancer development.

    2. Reviewer #2 (Public Review):

      KSR1 functions as a critical rheostat to fine-tune MAPK signalling, and identifying modes by which its over-expression promotes tumor progression is clinically important and potentially druggable. Ras is highly mutated in CRC and unfortunately inhibitors of Ras have been challenging to develop. However, small molecules which stabilize an inactive form of KSR are actively being developed in an attempt to repress RAS signaling. Thus, this study, which seeks to identify how KSR1 promotes oncogenic mRNA translation, is potentially highly clinically relevant, as it may identify novel druggable targets.

      In this manuscript the authors performed polysome profiling in colorectal cancer (CRC) cells and proposed that KSR1 and ERK regulate the translation of EPSTI1 mRNA. They go on to characterize the phenotypes associated with knock-down or knock-out of KSR1 in CRC, and show that their defects in invasion, anchorage-independent growth and switch to a less EMT-like phenotype are all EPSTI1-dependent.

      The authors succeeded in providing ample in vitro data that KSR1 and EPSTI1 are potential therapeutic targets in CRC. However, the data demonstrating that KSR1 and ERK regulate EPSTI1 mRNA translation are tenuous. Although the authors state that "EPSTI1 is necessary and sufficient for EMT in CRC cells", the data presented are consistent with a more restrained conclusion of a partial-EMT and not EMT per se. Finally, without an in vivo model it is difficult to glean novel insight into the mechanism by which KSR1 and/or EPSTI1 control the invasive and metastatic behaviour of cells.

    3. Reviewer #1 (Public Review):

      In this manuscript Rao et al. describe an interesting relationship between KSR1 and the translation regulation of EPSTI1 (a regulator of EMT). They identified this relationship by polysome RNAseq of CRC cells in the context of KSR1 knockdown (KD), which they confirm by polysome QPCR. They then go on to show that KSR1 KD and add-back influence EPSTI1 expression at the protein but not mRNA level and impact cell viability, anchorage-independent growth, and possibly cell migration. They focus on the cell migration phenotype and show that it is associated with changes in EMT-related genes including E-cad and N-cad. Interestingly, add-back of EPSTI1 can reverse the phenotype elicited by KSR1 deletion. Overall, this story is interesting and translation regulation by KSR1 has not been described previously. However, Rao et al. do not provide a mechanism for how KSR1 regulates the translation of EPSTI1, and it is unclear whether this occurs through eIF4E, as the authors suggest.

    4. Evaluation Summary:

      This paper demonstrates the involvement of Kinase Suppressor of Ras 1, a protein that acts as a scaffold in the mitogen activated protein kinase (MAPK) signaling cascade, in translational control of epithelial-to-mesenchymal transition. The analysis is thorough and includes both loss-of-function and gain-of-function studies. This study advances our understanding of cancer development.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

    1. Reviewer #3 (Public Review):

      The authors have studied preclinical models of human small cell lung cancer (SCLC) using characterized SCLC cell lines that have been manipulated to conditionally express mutant EGFR (L858R) or KRAS (G12V) alleles, and then assessed their morphology in cell culture, expression of neuroendocrine differentiation markers and transcription factors, and main signaling pathways such as the MAPK pathway. They focus on this because activation of ERK and the MAPK pathway is seen in nearly all non-small cell lung cancers (NSCLCs), including those with EGFR or KRAS mutations, but mutations in these driver oncogenes or an active ERK and MAPK pathway are essentially never found in SCLCs. In addition, chromatin modifications are assessed after manipulations, and functional genomics targeting and pharmacologic inhibition of various components of the MAPK pathway are tested to see their effect on NE expression. Because of the known clinical phenomenon of transformation to SCLC-like tumors by lung adenocarcinomas with EGFR mutations that become resistant to EGFR tyrosine kinase inhibitors, findings from the SCLC studies were applied to try to experimentally generate such LUAD-to-SCLC transformation. Overall, they found that activation of the ERK/MAPK pathway by oncogenic mutations led to loss of NE differentiation and that the "ERK-CBP/p300-ETS axis promotes a lineage shift between neuroendocrine and non-neuroendocrine lung cancer phenotypes". They conclude: "In summary, we provide the first reported biological rationale for why alterations in MAPK pathway are rarely found in SCLC and describe the molecular underpinnings of how the central node in this pathway, ERK2, suppresses the NE differentiation program." The authors' conclusions and claims are justified by the experiments and data they present, and they provide a mechanistic basis for what happens with MAPK/ERK activation in SCLC and for why one does not find MAPK/ERK activation in SCLC, or the presence of related oncogenic driver mutations such as mutant KRAS or EGFR.

    2. Reviewer #2 (Public Review):

      Cell fate transitions (such as adenocarcinoma converting to small cell neuroendocrine fate) are increasingly observed during therapeutic resistance in lung cancer, prostate cancer, and possibly other cancer types. It is important to understand these mechanisms if we ultimately seek to tailor treatment to a patient's disease and/or to control the pathways that lead to treatment resistance. However, the mechanisms that underlie these cell fate changes are not well understood. It has been previously observed (Calbo et al, Cancer Cell, 2011) that activated mutant Kras (commonly associated with adenocarcinoma fate) can promote a non-neuroendocrine fate in SCLC, but the mechanisms are unknown.

      Predominantly using three human small cell lung cancer (SCLC) cell lines, Inoue and colleagues use genetic and pharmacological approaches to focus on potential mechanisms by which Egfr/Kras/Mapk signaling can repress neuroendocrine fate. They make a number of interesting observations that extend our understanding of neuroendocrine cell fate regulation including:

      1) Kras-induced NE suppression appears to depend mostly on ERK2, and not ERK1 or PI3K signaling.

      2) Kras activation induces chromatin changes including increased H3K27Ac in 2/3 cell lines; increased H3K27Ac in response to HDAC inhibition is associated with NE suppression. Pharmacological inhibition of CBP/p300 (a HAT that promotes H3K27Ac) reduces H3K27Ac and restores NE fate. Altogether, these findings are consistent with the notion that SCLC cannot tolerate high levels of H3K27Ac.

      3) Kras induces the MSK/RSK pathway consistently in cell lines but appears to be functionally relevant to NE fate only in H82 cells.

      4) Kras activation induces chromatin occupancy at ERG and ETS family transcription factor (Etv1, 4, 5) binding sites in 2/3 cell lines, and induces ETV4 (2/3 lines) and ETV5 protein levels (3/3 lines). ETV1 and ETV5 overexpression are sufficient to inhibit NE fate markers in a context-dependent manner. Ets family induction appears to occur in a CIC-independent manner.

      In addition, some interesting negative data are presented. For example, SOX9 is induced upon Kras activation in 3/3 cell lines but was not functionally relevant for NE suppression; Notch1, Notch2, and HES1 (known NE fate suppressors) are induced by Kras activation in a cell context-specific manner, but they did not appear functionally relevant to NE suppression based on HES1 knockout and a pharmacological inhibitor of Notch signaling; Rb1 loss was not sufficient to promote NE fate in EGFR/p53 mutant cell lines, despite its known association with adeno-to-SCLC conversion. Overall, the conclusions in the manuscript are well justified. These findings will be of particular interest to those studying cell fate conversions in lung and prostate cancer in the context of EGFR and AR inhibitor resistance, respectively. These observations will be built upon by these fields.

      Weaknesses:

      1) One recurring issue in the manuscript is that the observations are often not consistent across the three cell lines and are context-specific effects, and the potential reasons could be explained better. The cell lines chosen unfortunately (but interestingly) represent some of the major cell states of SCLC. H2107 represents the ASCL1+ NE-high subset of SCLC (and has some MYCL). H82 and H524 represent the C-Myc (MYC)-high subset of SCLC, with H82 having a MYC amplification, and both representing the NEUROD1 subtype (which tends to be associated with more MYC). Assessment of NE score using a common approach in the field (Zhang et al, TLCR) shows that H82 cells are already considerably NE-low, with H524 as NE-intermediate/high, and H2107 as NE-high. So, this may be related to why H82 seemed to be the most permissive cell line to change NE fate in multiple assays.

      In addition, H2107 and H524 appear to have EP300 mutations, which may contribute to their NE-high nature and to the refractory response to A485 treatment based on the authors' model. It is known that MYCL- and MYC-driven cell lines differ in numerous aspects, including transcriptional signatures, super enhancer usage, metabolic regulation, and therapeutic response. This information could be mentioned in the results and discussed when mentioned as a factor near line 540.

      2) Related to Figure 4, the authors show that p300 pharmacological inhibition can restore NE fate in the presence of Kras. Given that drugs can have off-target effects, it would be helpful to know if genetic knockdown/knockout of p300 phenocopies these effects. Given that CREBBP (CBP) or EP300 (p300) mutations are common in SCLC, it is also relevant whether any of these cell lines have CREBBP (CBP) or EP300 (p300) mutations. It appears H2107 and H524 may have EP300 mutations, and it would be good to know whether the authors have tried to restore EP300 function.

    3. Reviewer #1 (Public Review):

      The paper investigates the mechanism of lineage switch in lung cancer. In about 10-15% of lung cancers treated with inhibitors of oncogenic receptors such as EGFR or KRAS, cancer cells emerge over time with newly acquired features of neuroendocrine differentiation. The authors propose that this is a direct result of inhibition of MAPK pathway signaling, so that reduced MAPK activity activates previously silent genes regulating neural crest differentiation. While this theory is of interest, the experiments presented herein are constructed on the opposite sequence, introducing activated MAPK via oncogenic KRAS or EGFR into 3 neuroendocrine cell lines, which results in lower expression of neuronal transcription factors. The authors propose that MAPK-activated ETS family TFs are responsible for the repression of NE lineage.

      Several principal issues presented by the authors raise some concerns:

      1) Despite presenting some evidence to the effect of suppression of NE transcription factors by overactivating MAPK signaling, the conversion of adenocarcinoma to NE (the opposite transition) is not being addressed in the paper. Therefore, it is rather illogical to investigate the process of transition that is not taking place in the real world.

      2) The authors do not consider the possibility of multi-clonality of human cancers and clonal competition as a mechanism leading to acquired resistance and the emergence of NE clones that are not suppressible by the inhibitors of MAPK pathway (e.g. EGFR inhibitors, or KRAS/RAF/MEK inhibitors). Starting the experiments with clonal populations of long-term cultured cell lines may pose an insurmountable obstacle to switching these cells between the epithelial and NE phenotypes, which proved to be frustratingly non-productive in the hands of the authors. Taken out of the context of the tumor microenvironment, these phenotypic transitions may be co-regulated by a combination of cell-intrinsic and extrinsic factors.

      3) Despite zeroing in on ETVs downstream of ERK1/2, the paper does not go as far as showing the direct effect of these TFs as repressors of NE differentiation (ASCL1, BRN2, NEUROD1 etc.).

      4) The line of evidence that Dox-activated MAPK signaling via massive overexpression of KRAS or EGFR induces a dramatic increase in marks of transcriptionally active chromatin (such as H3K27ac and others) is to be expected in this entirely artificial system. Indeed, the addition of doxycycline results in a massive burst of proliferation and overexpression of ETV1 and ETV4, the canonical MAPK targets. Again, this switch appears unrelated to the opposite process of epithelial-to-NE de-differentiation.

    4. Evaluation Summary:

      This manuscript will be of interest to cancer biologists studying cell fate transitions, particularly adenocarcinoma-to-small cell transitions that occur in prostate and lung cancer, which is a timely topic. While there is not a single linear mechanism identified that fully explains Kras-induced neuroendocrine cell fate suppression in all contexts, multiple new findings will likely be built upon by the field. Overall, the data are properly controlled and the key claims are supported.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. All reviewers agreed to share their names with the authors.)

    1. Reviewer #3 (Public Review):

      Advances in understanding the biochemical and cellular mechanism of neuronal damage are investigated here and are to be appreciated. The strength of this work on SARM1 is its success in establishing that a concentration-dependent phase change activates the enzyme to degrade NAD, an essential component of neuronal integrity. Cellular significance is demonstrated in C. elegans neuronal damage triggered by citrate. Weaknesses are that high citrate is required for SARM1 effects but low citrate is used in the C. elegans model without establishing concentration dependence in the C. elegans system. The progression from enzyme activation to neuronal damage in C. elegans is missing the quantitation of NAD change. A strength of the work is that it provides a solid stepping-stone to permit the next steps in cementing the biochemical pathways of initiating cellular damage to neurons.

    2. Reviewer #2 (Public Review):

      The latest manuscript of Loring and coworkers solves a number of important problems of SARM1 structure and function at once, namely why the purified enzyme has little activity, what size is the active multimer, whether it produces cADPR on the way to ADPR, and how this enzyme may overcome autoinhibition by NAD+ in vivo. In work that is technically sound, the authors describe a phase transition that can be induced by macroviscogens and by citrate in which we are able to see cryoEM images of activated multimers and the induction of SARM1 activity in worms by citrate. Working with concentrated enzyme, the authors are further able to characterize SARM1 activity in detail and clearly show which cations are most inhibitory and that ADPR and not cADPR is the primary product of the reaction.

      There is clearly a lot of regulation in the system, with NAD+ inhibiting and NMN activating this enzyme, and with NMNAT, which controls conversion of NMN to NAD+, being localized to the outer Golgi membrane. Golgi and mitochondria are both moved along axons in processes that are totally dependent on cellular energetics. Given the broad contributions that are made by this work, I would not mind if the authors considered whether citrate, either from stressed mitochondria or from inhibition of the cytosolic enzyme ATP-citrate lyase, might be produced at high enough concentration to push SARM1 into the phase transition described herein.

    3. Reviewer #1 (Public Review):

      SARM1 is an enzyme that is present in neurons and degrades NAD+. Previous studies have shown that disrupting SARM1 inhibits axon degeneration and thus it could be a target for treating neurodegenerative diseases. NAD+ is also an important metabolite that is required for many biological pathways. Thus, SARM1 activity must be carefully regulated. Recent studies have provided structural and biochemical insights about how SARM1 activity is auto-inhibited in basal states. The manuscript by Dr. Thompson and coworkers provides a nice new model regarding how SARM1 could be potentially activated. They provide strong in vitro data to support that phase transition, promoted by PEG molecules and citrate, could dramatically increase the activity of SARM1 TIR domain (which is the catalytic domain) in vitro. The authors also showed that in the worm, C. elegans, citrate promotes SARM1 puncta formation and axon degeneration, which is consistent with the in vitro data. They also generated multiple mutants of SARM1 TIR domain and showed that many of the mutants have decreased phase transition and decreased activity in vitro. One mutant, G601P, also showed decreased puncta formation when expressed in HEK 293T cells as SARM1 SAM-TIR domains E462A mutant (a catalytic mutant so that expression will not cause toxicity) fused with GFP.

      The manuscript has many strengths, including the strong and very careful in vitro characterization of the purified SARM1 TIR domain, which provides a lot of useful information regarding the kinetic parameters, substrate specificity, and inhibition profiles. The worm data with citrate are consistent with the in vitro data, which is also a strength.

      The impact of the finding lies in two aspects. First, it provides a new understanding about how SARM1 activity might be regulated in vivo by phase transition. This is especially true given that most studies so far focus on how it is inhibited at basal conditions. It also adds another example to the list of enzymes that are regulated by phase separation. Second, the finding that PEG and citrate strongly activate SARM1 in vitro also provides a much improved assay for the development of small molecule modulators of SARM1 for potential therapeutic applications.

      There are two minor weaknesses associated with the studies of the manuscript. One is that all the in vitro studies used just the TIR domain of SARM1, not the full length SARM1. Another minor weakness is associated with the data in Figure 5. Most of the mutants have dramatically lower catalytic activities (>100-fold), but the precipitate formation is only modestly affected (2-fold). Although this does not affect the overall conclusion of the manuscript, it prevents the mutants from being more useful for mechanistic dissection.

    4. Evaluation Summary:

      This manuscript describes an interesting regulatory mechanism that activates SARM1, an enzyme that degrades NAD+ and promotes axon degeneration. Previous structural and biochemical studies mostly focus on how SARM1 is auto-inhibited at basal conditions, and this manuscript provides evidence that phase transition could promote its activity, thus providing new understanding about its regulatory mechanism. The finding also enables in vitro assays to be carried out more easily and thus could facilitate the development of small molecule modulators of SARM1 for therapeutic purposes.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and reviewer #2 agreed to share their names with the authors.)

    1. Reviewer #3 (Public Review):

      The authors set out to determine the role of interleukin (IL)-33 in the host immune response to the parasite Toxoplasma gondii. They achieve this using a mouse model of infection and a range of genetically modified mice to systematically prove the pathway involved.

      A major strength of the study is the use of strategic immune cell/factor-deficient mice in combination with recombinant proteins in vivo. This may be further strengthened by future studies that test the impact of inhibitory antibodies against the primary factor of interest, IL-33. This would allow for a loss and gain of function approach, supporting the existing in vivo data with recombinant mouse IL-33.

      Overall, the approach taken achieves the goal of the study. The manuscript is well written and systematically addresses the steps in the pathway that are required to mount an effective IL-33-mediated immune response to T. gondii.

      The likely impact of this work is new knowledge of the function of IL-33 in response to infection and the interaction between different components of the immune system to achieve a balanced, context-dependent response. The study does not highlight new methods or technical advances, but does provide important new information on immune responses to infection.

    2. Reviewer #2 (Public Review):

      The authors eloquently showed that IL-33 was produced from stromal cells following Toxoplasma infection and that the absence of IL-33 signaling resulted in increased parasitemia. In agreement with this observation, they found that exogenous IL-33 reduced parasite load and increased the recruitment of inflammatory monocytes that are critical for resistance. The manuscript is well written and the data presented here support the major findings of this work.

    3. Reviewer #1 (Public Review):

      In initial experiments, low levels of IL-33 were detected in Toxoplasma-infected mice. How do these levels compare with normal physiological levels? It would help the reader to understand the relative levels to expect.

      The authors identify that most IL-33 is produced by stromal cells rather than hematopoietic cells. The frequency of tdTomato parasites appears to be much lower than that of IL-33-producing cells. Does the parasite expression reflect 100% of parasites, or is the number of IL-33-producing stromal cells stimulated in the infection much larger than the identifiable parasite number? That is, is the activation of the stromal cells a direct effect of the Toxoplasma infection or does it depend on intermediates to amplify the effect?

      Although the data presented are interesting and the authors identify that both stromal cells and hematopoietic cells contribute to the protective effect of IL-33, it is somewhat unclear which of the hematopoietic cells categorized as 'innate' are really driving the response. Within the hematopoietic compartment, a number of associations are delineated but the causal connections are less clear. The provision of exogenous cytokines indeed has the effect they show in their results, but it remains unclear to this reviewer whether these effects act directly on the hematopoietic cells or on stromal cells, which alone are not sufficient to contain the infection and thus develop a higher pathogen load, confounding their contributions.

      This work would be strengthened significantly by delineating more clearly the contributions of each compartment. Currently, the correlations are modelled on the responses in the omentum and it would be useful to understand if this reflects the broader response.

      This work would benefit from a schematic to indicate how the authors believe the different cells are connected and which are the real drivers/where connections have been demonstrated in driving the immune response.

    4. Evaluation Summary:

      This study sheds new light on the function of an immune system protein termed interleukin (IL)-33 in response to parasite infection. The study provides information on alternative functions of this immune protein and details the path taken to achieve a beneficial immune response. This study is of interest to immunologists who deal with the host response to infection, particularly to parasites. Immunotherapies that enhance or inhibit IL-33 are in development. Understanding the role of this immune factor in a broad range of infections is important when considering future treatments that target this pathway.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #3 agreed to share their name with the authors.)

    1. Reviewer #2 (Public Review):

      Overall this is a solid and technically sound manuscript, and I have only two relatively minor suggestions for improvement:

      1) Tetramer versus dimer

      The particles that were analyzed by cryoEM were composed of four THO-Sub2 protomers, yet the authors argue that a dimer is the functional unit. Why? The tetramer versus dimer organization needs to be better discussed, also in light of the observation that the human complex can also form a tetramer.

      2) Sub2 activation mechanism

      The authors should more carefully discuss how THO 'activates' Sub2 (and how the 'semi-open state' leads to activation) and indicate the RNA binding surface of Sub2 in their model.

    2. Reviewer #1 (Public Review):

      1) I found the initial description of the overall structure confusing. At first the authors say the complex is a tetramer, which is not what was seen by the Conti lab, and then follow that with a confusing discussion leading to the conclusion that the dimer with a rigid subunit and a flexible one is the functional unit. It rather feels like they arrive at this conclusion because that's what Conti's lab saw, rather than for any other reason. Since the human complex is a tetramer, perhaps the tetrameric complex observed here is one possible form and that possibility should be considered more carefully. Please state whether there is any similarity in the arrangements between the human tetramer and the tetramer observed here. I found that Figure 2 supp 1C was not easy to follow. Coloring each of the four protomers differently would make things clearer.

      2) The authors previously determined the structure of the Yra1 C domain bound to Sub2, and several labs have shown that this interaction activates Sub2 ATPase activity. Are the interactions previously observed between Yra1 and Sub2 compatible with this new structure? If so, perhaps the authors could provide a model showing how Yra1 fits into this larger complex. Also, could the Yra1 C domain and Gbp2 bind simultaneously to a single THO-Sub2 protomer, or would one protomer bind Yra1 and perhaps another bind Gbp2? This is worth considering because this would strengthen the concept that TREX acts as a general platform engaging with multiple export factors to drive recruitment of multiple Mex67 molecules and eventual export of the Mex67:mRNP complex. In the human system, the SR proteins and Alyref have an overlapping binding site on Nxf1, suggesting they may not act together to recruit a single Nxf1, but rather they recruit different Nxf1 molecules perhaps to the same mRNP via a single multimeric THO platform.

    3. Evaluation Summary:

      This is an interesting paper describing the structure of the yeast THO:Sub2 complex and how it interacts with the SR-like protein Gbp2. The paper extends what we have learned from two recently published THO:Sub2 complex structures by the Conti and Plaschka groups in two ways. Firstly, it shows how Gbp2 interacts with the THO complex. Secondly, it reveals a substantially different orientation between THO:Sub2 protomers compared with the earlier structure, and so provides more information on the flexibility and range of movements that the two protomers might make with respect to each other. The structural inferences are supported by some biochemical experiments, but mechanistically the work has limitations, similar to other recent cryo-EM structures of this complex. However, this is an important structure of wide interest to people working on gene expression in eukaryotes and it undoubtedly advances the field.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers opted to remain anonymous to the authors.)

    1. Reviewer #3 (Public Review):

      This paper examines the role of neutrophils, inflammatory immune cells, in disease caused by genital herpes virus infection. The experiments describe a role for type I interferon stimulation of neutrophils later in the infection that drives inflammation. Blockade of interferon, and to a lesser degree, IL-18 ameliorated disease. This study should be of interest to immunologists and virologists.

      This study sought to examine the role of neutrophils in pathology during mucosal HSV-2 infection in a mouse model. The data presented in this manuscript suggest that late or sustained IFN-I signals act on neutrophils to drive inflammation and pathology in genital herpes infection. The authors show that while depletion of neutrophils from mice does not impact viral clearance or recruitment of other immune cells to the infected tissue, it did reduce inflammation in the mucosa and genital skin. Single cell sequencing of immune cells from the infected mucosa revealed increased expression of interferon stimulated genes (ISGs) in neutrophils and myeloid cells in HSV-2 infected mice. Treatment with anti-IFNAR antibodies, or the use of neutrophil-specific IFNAR1 conditional knockout mice, decreased disease and IL-18 levels. Blocking IL-18 also reduced disease, although these data show that other signals are likely to also be involved. It is interesting that viral titers and anti-viral immune responses were unaffected by IFNAR or IL-18 blockade when this treatment was started 3-4 days after infection, because data shown here (for IFN-I) and by others in published studies (for IFN-I or IL-18) have shown that loss of IFN-I or IL-18 prior to infection is detrimental.

      These data are interesting and show pathways (namely IFN-I and IL-18) that could be blocked to limit disease. While this suggests that IL-18 blockade might be an effective treatment for genital inflammation caused by HSV-2 infection, the utility of IL-18 blockade is still unclear, because the magnitude of the effect in this mouse model was less than IFNAR blockade. Additionally, further experiments, such as conditional loss of IL-18 in neutrophils, would be required to better define the role and source(s) of IL-18 that drive disease in this model.

    2. Reviewer #2 (Public Review):

      This manuscript will be of interest to a broad audience of immunologists especially those studying host-pathogen interactions, mucosal immunology, innate immunity and interferons. The study reveals a novel role for neutrophils in the regulation of pathological inflammation during viral infection of the genital mucosa. The main conclusions are well supported by a combination of precise technical approaches including neutrophil-specific gene targeting and antibody-mediated inhibition of selected pathways.

      In this study by Lebratti et al., the authors examined the impact of neutrophil depletion on disease progression, inflammation and viral control during a genital infection with HSV-2. They find that removal of neutrophils prior to HSV-2 infection resulted in ameliorated disease as assessed by inflammatory score measurements. Importantly, they show that neutrophil depletion had no significant impact on viral burden, nor did it affect the recruitment of other immune cells, thus suggesting that the observed improvement in inflammation was a direct effect of neutrophils. The role of neutrophils in promoting inflammation appears to be specific to HSV-2, since the authors show that HSV-1 infection resulted in comparable numbers of neutrophils being recruited to the vagina yet HSV-1 infection was less inflammatory. This observation thus suggests that there might be functional differences in neutrophils in the context of HSV-2 versus HSV-1 infection that could underlie the distinct inflammatory outcomes observed in each infection. In order to uncover potential mechanisms by which neutrophils affect inflammation, the authors examined the contributions of classical neutrophil effector functions such as NETosis (by studying neutrophil-specific PAD4 deficient mice), reactive oxygen species (using mice with a global defect in NADH oxidase function) and cytokine/phagocytosis (by studying neutrophil-specific STIM-1/STIM-2 deficient mice). The data shown convincingly ruled out a contribution by the neutrophil factors examined. The authors thus performed an unbiased single cell transcriptomic analysis of vaginal tissue during HSV-1 and HSV-2 infection in search of potentially novel factors that differentially regulate inflammation in these two infections.

      tSNE analysis of the data revealed the presence of three distinct clusters of neutrophils in vaginal tissue in mock-infected mice; the same three clusters remained after HSV-1 infection, but in response to HSV-2 only two of the clusters remained, and these showed a sustained interferon signature primarily driven by type I interferons (IFNs). In order to directly interrogate the impact of type I IFN on the regulation of inflammation, the authors blocked type I IFN signaling (using anti-IFNAR antibodies) at early or late times after infection and showed that late (day 4) IFN signaling was promoting inflammation while early (before infection) IFN was required for antiviral defense, as expected. Importantly, the authors examined the impact of neutrophil-intrinsic IFN signaling on HSV-2 infection using neutrophil-specific IFNAR1 knockout mice (IFNAR1 CKO). The genetic ablation of IFNAR1 on neutrophils resulted in reduced inflammation in response to HSV-2 infection but had no impact on viral titers; these findings are consistent with observations shown for neutrophil-depleted mice. The use of IFNAR1 CKO mice strongly supports the importance of type I IFN signaling on neutrophils as a direct regulator of neutrophil inflammatory activity in this model. Since type I IFNs induce the expression of multiple genes that could affect neutrophils and inflammation in various ways, the authors set out to identify specific downstream effectors responsible for the observed inflammatory phenotype. This search led them to IL-18 as a possible mediator. They showed that IL-18 levels in the vagina during HSV-2 infection were reduced in neutrophil-depleted mice, in mice with "late" IFNAR blockade and in IFNAR1 CKO mice. Furthermore, they showed that antibody-mediated neutralization of IL-18 ameliorated the inflammatory response of HSV-2 infected mice, albeit to a lesser extent than what was seen in IFNAR1 CKO mice.

      Altogether, the study presents intriguing data to support a new role for neutrophils as regulators of inflammation during viral infection via an IFN-IL-18 axis.

      In aggregate, the data shown support the authors' main conclusions, but some of the technical approaches need clarification and in some cases further validation that they are working as intended.

      1) The use of anti-Ly6G antibodies (clone 1A8) to target neutrophil depletion in mice has been shown to be more specific than anti-Gr1 antibodies (which target both monocytes and neutrophils); thus anti-Ly6G antibodies are a good technical choice for the study. Neutrophils are notoriously difficult to deplete efficiently in vivo, due at least in part to their rapid regeneration in the bone marrow. In order to sustain depletion, previous reports indicate the need for daily injection of antibodies. In the current study the authors report the use of only one intra-peritoneal injection (500 mg) of 1A8 antibodies and that this single treatment resulted in diminished neutrophil numbers in the vagina at day 5 after viral infection (Fig 1A). Data shown in figure 2B suggest that there are neutrophils present in the vagina of uninfected mice, that there is a significant increase in their numbers at day 2 and that their numbers remain fairly steady from days 2 to 5 after infection. In order to better understand the impact of antibody-mediated depletion in this model, the authors should have examined the kinetics of depletion from day 0 through 5 in the vaginal tissue after 1A8 injection as compared to the effect of antibodies in the periphery. These additional data sets would allow for a deeper understanding of neutrophil responses in the vagina as compared to what has been published in other models of infection at other mucosal sites.

      2) The authors used antibody-mediated blockade as a means to interrogate the impact of type I IFNs and IL-18 in their model. The kinetics of IFNAR blockade were nicely explained and supported by data shown in supplementary figure 4. IFNAR blockade was done by intra-peritoneal delivery of antibodies at one day before infection or at day 4 after infection. When testing the role of IL-18, the authors delivered the blocking antibody intra-vaginally at 3 days post infection. The authors do not provide a rationale for changing the delivery method and timing of antibody administration to target IL-18 relative to IFNAR signaling. Since the model presented argues for an upstream role for IFNAR as an inducer of IL-18, it is unclear why the time point used to target IL-18 is before the time point used for IFNAR.

      3) An open question that remains is the potential mechanism by which IL-18 is acting as an effector cytokine of epithelial damage. As acknowledged by the authors, the rescue seen in IFNAR1 CKO mice (Fig 5C) is more dramatic than that seen when targeting IL-18 (Fig 6D). It is thus very likely that IFNAR signaling on neutrophils is affecting other pathways. It would have been greatly insightful to perform a single cell RNA seq experiment with IFNAR CKO mice as done for WT mice in Fig 3. Such an analysis might have provided a more thorough understanding of neutrophil-mediated inflammatory pathways that operate outside of classical neutrophil functions.

      4) The inflammatory score scale used is nicely described in the methods and it took into consideration external signs of vaginal inflammation by visual observation. It would have been helpful to mention whether the inflammation scoring was done by individuals blinded to the experimental groups.

      5) The presence of distinct clusters of neutrophils in the scRNA-seq data analysis is a fascinating observation that might suggest more diversity in neutrophils than what is currently appreciated. In this study, the authors do not provide a list of the genes expressed in each cluster within the data shown in the paper. Although the entire data set is deposited and publicly available, having the gene lists within the paper would have been helpful to provide a deeper understanding of the current study.

    3. Reviewer #1 (Public Review):

      Overall this is a well-done study, but some additional controls and experiments are required, as discussed below. The authors have done a considerable amount of work, resulting in quite a lot of negative data, and so should be commended for their persistence in eventually identifying the link between neutrophils and IL-18, through type I IFN signaling.

      Major Comments:

      • A major conclusion of this manuscript is prolonged type I IFN production following vaginal HSV-2 infection, but the data presented herein did not actually demonstrate this. At 2 days post infection, IFN beta was higher (although not significantly) in HSV-2 infection, but much higher in HSV-1 infection compared to uninfected controls. At 5 days post infection the authors show mRNA data, but not protein data. If the authors are relying on prolonged type I IFN production, then they should demonstrate increased IFN beta during HSV-2 infection at multiple days after infection including 5dpi and 7dpi.

      • Does the CNS viral load or kinetics of viral entry into the CNS differ in mice depleted of neutrophils, IFNAR cKO mice, or mice treated with anti-IL-18? Do neutrophils and/or IL-18 participate at all in neuronal protection from infection?

      • In Figure 3 the authors show that neutrophil "infection" clusters 2 and 5 express high levels of ISGs. Only 4 of these ISGs are shown in the accompanying figures. Please list which ISGs were increased in neutrophils after both HSV-2 and HSV-1 infection, perhaps in a table. Were there any ISGs specifically higher after HSV-2 infection alone, any after HSV-1 infection alone?

      • The authors claim that HSV-1 infection recruits non-pathogenic neutrophils compared to the pathogenic neutrophils recruited during HSV-2 infection. Can the authors please discuss if these differences in inflammation or transcriptional differences between the neutrophils in these two different infections could be due to differences in host response to these two viruses rather than differences in inflammation? Please elaborate on why HSV-1 was used as opposed to a less inflammatory strain of HSV-2. Furthermore, does HSV-1 infection induce vaginal IL-18 production in a neutrophil-dependent fashion as well?

    4. Evaluation Summary:

      This manuscript will be of interest to a broad audience of immunologists especially those studying host-pathogen interactions, mucosal immunology, innate immunity and interferons. The study reveals a novel role for neutrophils in the regulation of pathological inflammation during viral infection of the genital mucosa. The main conclusions are well supported by a combination of precise technical approaches including neutrophil-specific gene targeting and antibody-mediated inhibition of selected pathways.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #1 and Reviewer #2 agreed to share their names with the authors.)

    1. Reviewer #2 (Public Review):

      In this manuscript, Galbraith et al add to our understanding of COVID19 pathobiology by undertaking a cross-sectional survey of 73 hospitalized COVID19 patients with non-severe disease. They perform very broad multi-omics analysis, including plasma proteomics, cytokine profiling and mass cytometry. The authors propose that disease course can be classified by the titer of anti-CoV2 antibodies, which in turn is associated with distinct changes in circulating proteins, cytokines and immune subsets. Interesting correlations with complement and coagulation factors are noted. These findings suggest an alternative way to map disease progression in COVID19 and have implications for broader studies of COVID19 pathobiology. In particular, it will be interesting to extend this framework to analyze a broader spectrum of COVID19 patients, particularly those with poor outcome.

    2. Reviewer #1 (Public Review):

      Galbraith et al., using a systems immunology approach, document in a very detailed manner a textbook example of innate and adaptive immune responses over time following an infection. Here, their clinical assessment is linked to SARS-CoV2 infection. While the novelty aspects are not immense, this study is nonetheless well executed, detailed and thorough.

      The authors perform association studies and propose that a simple seroconversion test should be considered in determining the clinical treatment. While some would argue that this is already practiced and perhaps expected, the authors have done an excellent job at detailed immune analyses, which they coupled with statistically sound associations. Thus these findings are important to document, and should be considered as experimental ex vivo evidence of what clinical practice may have implicitly already considered.

    3. Evaluation Summary:

      In this study, the authors use a systems immunology approach to document innate and adaptive immune responses during clinical SARS-CoV-2 infection. The general impact of this work is a better understanding of COVID19 pathobiology and, more specifically, the identification of serum antibodies as a novel classification framework to understand COVID-19 disease course and associated changes.

      (This preprint has been reviewed by eLife. We include the public reviews from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. Reviewer #2 agreed to share their name with the authors.)

    1. Joint Public Review:

      The manuscript by Tachinawa et al. presents a new method (named RhIP), to study incorporation of recombinant epitope-tagged histone dimers into permeabilized cell nuclei. Using RhIP, the authors demonstrate that both H3-H4 and H2A-H2B and their variants are incorporated in this setup. They proceed with investigating context-specific features of these events, providing evidence that ongoing replication and overall chromatin structure may influence histone dimer incorporation in RhIP. This argues for RhIP having the potential to reveal the mechanisms of chromatin assembly and disassembly genome-wide, and determine how cell cycle and chromatin structure influence these dynamics.

      The system is capable of recapitulating major known chromatin assembly pathways and supports existing knowledge of histone dimer dynamics on chromatin. RhIP is also valuable in directly testing histone mutants or variants, as shown by the authors.

      H3.1 incorporation is shown to be exquisitely dependent on replication, demonstrating that replication itself, as well as replication-dependent chromatin assembly, is successfully reconstituted with isolated nuclei, cytosolic extracts and recombinant histones.

      The focus of the study is on the incorporation of H2A variants, in particular H2A.Z. These data support known notions about H2A.Z dynamics in chromatin, showing a preference for transcription start sites and a dependence on the M6 region.

      However, the major limitation of the current manuscript is that it remains unclear what properties are driving the observed RhIP effects. This is not fully elucidated and thus limits the ability of RhIP to enable the discovery of new mechanisms.

      While replication-dependent mechanisms are well captured by RhIP, it is less clear whether transcription and chromatin remodeling are functional in this system, and thus whether transcription-dependent nucleosome exchange processes are faithfully recapitulated. It is important to improve the comparison of RhIP with 'in vivo' (i.e. existing ChIP-seq datasets) localisation and to explicitly develop hypotheses about why in some cases the data match the 'in vivo' situation and in others they do not. It would be helpful to improve the interpretation of the data to include all existing caveats to the assay setup.

    2. Evaluation Summary:

      The method presented in this article is of interest for all fields that interface with chromatin dynamics. It could provide a powerful tool to dissect the mechanisms of chromatin assembly and disassembly genome-wide, and determine how cell cycle and chromatin structure influence these dynamics. However, in its current form, the article falls short of its potential. Further validation of the data and clarification of its implications are requested.

      (This preprint has been reviewed by eLife. We include the public review from the reviewers here; the authors also receive private feedback with suggested changes to the manuscript. The reviewers remained anonymous to the authors.)

    1. Reviewer #3:

      In this paper Werkhoven et al. ask a fundamental question in behavioral neuroscience - what is the structure of co-varying behaviors among individuals within populations. While questions in the context of inter-individual behavioral differences have been studied across organisms, this work represents a highly novel and comprehensive analysis of the behavioral structure of inter-individual variation in the fly, and the underlying biological mechanism that may shape this structure of covariation. In particular, for their experiments they combined a set of behavioral tests (some of which were explored in previous studies) into a 13-day long behavioral paradigm that tested single individuals in a highly controlled and precise way. Through clever analysis the authors showed strong correlations only between a small set of behaviors, indicating that most of the behaviors that they tested do not co-vary and that there are many dimensions of inter-individual variation in the data. They further used perturbations of neuronal circuits and showed that temperature and circuit perturbations can change dependencies among sets of behaviors. In a different set of experiments where they integrated gene-expression data (from the brains of single individuals), they showed that some of the genes are correlated with individual-specific parameters of behaviors. Interestingly, through comparison of inbred and outbred populations, they demonstrated that outbred populations also show relatively low covariance of behaviors across individuals.

      Overall, the data in the paper indicate that surprisingly, even for a 'simple' organism, there are many dimensions of inter-individual variation, e.g. many specific characters that can change among individuals in a non-dependent way. The ability of the authors to measure such dependencies in a highly robust and precise way allowed their investigation of the underlying processes that may generate this variation. The results in this study are highly interesting and novel. They uncover a general picture of the structure of behavioral variation among individuals and open many avenues for further analyses of the underlying neuronal and molecular mechanisms that control variation in sets of behaviors. Furthermore, the methods that were developed in this paper can be of great use to other researchers in the field.

      However, while the key claims of the manuscript are well supported by the data and analysis methods, some aspects of data analysis need to be clarified or extended:

      • It is not clear what the motivation is for using the 'Effective dimensionality spectrum' analysis presented in the paper and how it significantly adds to existing methods of clustering that rely directly on the correlation/distance matrix (some of which were used in this study).

      • While it is clear that the distilled behavioral covariation matrix has many independent dimensions (as the authors indicated, most of the a-priori PCs are not strongly correlated), the number of 'significant' PCs was not calculated directly for the distilled matrix, and t-SNE analysis is presented only for the original covariation matrix (1L).

      • It is possible that some of the behaviors that covary across individuals in the high temporal resolution assay and also tend to be associated over time within an individual, may indicate sequences of behavior on longer time-scales (than the timescales in which parameters are quantified).

      • Further analyses are needed for extending the detection of correlations between variation in gene-expression data and the independent behavioral measures in the covariance matrix.

    2. Reviewer #2:

      In this paper, Werkhoven and colleagues describe a large-scale effort, using Drosophila, to study variation in behavior among individuals with identical genotypes, and raised in very similar environmental conditions. This addresses the important and basic question of how much behavioral variability exists under such conditions, e.g. due to stochastic processes during development. By looking across many different behaviors, the authors are also able to investigate the nature of this variability. The key conclusion of the paper is that this intragenotypic variability is high dimensional, and cannot be explained by a small set of behavioral syndromes. They find that this observation is robust to the method they use to quantify behavior, and also holds to different degrees in data sets acquired from outbred flies, or flies subjected to genetic perturbations of neural activity. Furthermore, they have generated a data set that allows correlation of behavioral biases in individual animals with transcriptomic data. Altogether, this is an impressive study that, beyond its important conclusions, opens up the possibilities for many further explorations in this area, and should be interesting to a broad audience. The experiments are well designed and overall the paper is very nicely written and clear to understand.

    3. Reviewer #1:

      The definition of individuality and its neurogenetic basis is a fundamental problem in ethology and neuroscience. Individuals might fall into discrete groups of personality types; alternatively, individuals might be better described by a broader spectrum of independent traits. An unbiased and quantitative analysis of behavioural traits that make up an individual's personality is a prerequisite of investigating the neuronal and genetic basis of individuality. Given the technical challenges in systematically measuring many behavioural traits across sufficiently large and genetically defined populations and over long time-scales, these questions remain unanswered. This manuscript represents a tour-de-force trying to shed more light in these directions. Werkhoven and colleagues aim at characterizing structure in correlations among a large set of quantitative behavioural measures obtained from the model organism Drosophila melanogaster. The authors performed a large number of high throughput behavioural experiments that cover behavioural paradigms ranging from locomotion to perceptual decision-making. Data were acquired from an inbred, hence isogenic fly line, an outbred line, and various neuronal circuit manipulations. In addition, gene expression data were obtained from individuals. In this way, the authors were able to capture hundreds of behavioural metrics from hundreds of flies, while keeping their individual identities over the course of 13 days. They developed a computational analysis pipeline that quantifies the correlation matrix computed from these metrics. In a 2-step procedure, they condense this matrix into a "distilled" matrix, the entries of which contain all remaining behavioural covariates that were not a priori expected by the authors.

      A central claim in this paper is that any structure in this distilled matrix should reveal the principal axes along which individuality should be described. Based on these measurements and analyses flies could not be categorized into discrete types. Moreover, behavioral covariates appear rather sparse and derive from a high-dimensional behavioral space. This would mean that each individual fly is better described by a large combinatorial set of parameters. The same qualitative finding was made between inbred and outbred flies, leading the authors to a conclusion that larger genetic diversity does not change the principal organization of behaviour. The authors perform a set of neuronal-circuit manipulations and claim in conclusion that specific neuronal activity patterns underlie structure in behavioural correlations. Some correlations between gene expression and behavioral metrics were discovered, for example gene expression of metabolic pathways can predict some variability found in the behaviour of flies. The behavioural pipeline is sophisticated and presents a great leap forward in enabling researchers to capture a large set of behavioural measures from a large fly population, keeping the identity of individuals. The work is also presenting an innovative and interesting analysis pipeline.

      Although we applaud these ambitious experimental paradigms and computational techniques used, we have several major reservations about this work. Reading through the manuscript multiple times, one is left confused whether the major finding is that no structure whatsoever can be found in these data and to what extent the remaining sparse correlations are of biological / ethological relevance. Another major concern arises from the high level of trial-trial variability that is found in the data, which seems to preclude identification of persistent idiosyncrasies in the behavioural traits of individuals and impedes the reproducibility of the data matrices in two repetitions of the main experiment. We feel that most of the authors' conclusions and claims are confounded by these caveats.

      1) Distinguishing persistent idiosyncrasies from trial-to-trial variability and reproducibility of decathlon data

      A major challenge in measuring personality traits or individuality is to distinguish between persistent idiosyncrasies and trial-to-trial variation; the latter could result from inherent stochastic properties of behaviors, environmental or measurement noise. To identify an idiosyncratic behavioral trait in an animal one needs to show that individuals exhibit a distinct distribution in a behavioral metric that cannot be explained by trial-to-trial variability. Such a distinction cannot be made if a behavioral metric is measured just once or during a short period, but requires repeated measures over longer time-scales from a sufficiently large population of animals. Unfortunately, in this study many measures have been taken during just one 1-2 h episode per individual of a decathlon. For other measures that were taken repeatedly (circadian assays, unsupervised video acquisition) no efforts have been undertaken by the authors to make the above distinction. Hence, the authors' conclusion that there are no "types" of flies seems premature. In Figure S1 we are surprised to see how weakly most behavioral measures auto-correlate when recorded on two subsequent days; most auto-correlations further drop to meaningless values when compared over time-periods that correspond to the different epochs of a decathlon. This indicates that trial-to-trial variability dominates the data. In our view it makes little sense to ask whether two behavioral metrics are correlated or not, if their autocorrelations measured over the same time-scale are already extremely low. Moreover, Fig S5B shows that the two decathlons generate largely different data matrices (correlation ~0.25), raising concerns that the results are not reproducible. We wonder whether any structure in behavioral correlations was masked by various sources of noise in this study.

      Related to above, there should be error bars and number of flies for the plots in Fig S1. This figure undermines the starting point of the paper claiming persistent idiosyncratic behaviors.
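      The distinction drawn in point (1) can be illustrated with a repeatability estimate. The following is a minimal sketch on synthetic data, not the authors' analysis: it assumes each individual is measured the same number of times and uses the standard one-way ANOVA estimator of the intraclass correlation. The function name `repeatability`, the population size, and the noise levels are all hypothetical.

```python
import numpy as np


def repeatability(measures: np.ndarray) -> float:
    """One-way ANOVA estimate of repeatability (intraclass correlation).

    measures has shape (n_individuals, n_trials): one row per individual,
    one column per repeated measurement of the same behavioral metric.
    """
    n, k = measures.shape
    grand_mean = measures.mean()
    ind_means = measures.mean(axis=1)
    # Variance between individual means, scaled to the single-trial level
    ms_between = k * np.sum((ind_means - grand_mean) ** 2) / (n - 1)
    # Trial-to-trial variance within individuals
    ms_within = np.sum((measures - ind_means[:, None]) ** 2) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


rng = np.random.default_rng(0)
n_flies, n_trials = 200, 4  # hypothetical numbers, for illustration only
trait = rng.normal(0.0, 1.0, size=(n_flies, 1))  # persistent individual offset

# Same persistent trait, different amounts of trial-to-trial noise
stable = trait + rng.normal(0.0, 0.5, size=(n_flies, n_trials))
noisy = trait + rng.normal(0.0, 3.0, size=(n_flies, n_trials))

r_stable = repeatability(stable)  # expected to be high
r_noisy = repeatability(noisy)    # expected to be close to zero
```

      In this sketch both populations carry the same persistent trait; only the trial-to-trial noise differs. A metric behaving like the `noisy` case has a repeatability near zero, so its correlations with other metrics are necessarily attenuated.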

      2) Given the concerns above, it is not surprising that the outbred fly line delivers another set of covariates which otherwise lack any further structure. If experiments with >100 inbred flies cannot deliver reproducible results, it cannot be expected that a similarly sized population of outbred flies would. Perhaps the needed population size must be orders of magnitude larger in this case.

      3) Figure 3. It is intriguing to observe how the relationship between switchiness and clumpiness is perturbed upon temperature shifts. But it seems rather uncorrelated at the restrictive temperature in the Iso line, with a slightly positive value. However, the switchiness-clumpiness correlation is not reproducible in both perturbation types at permissive temperatures. Note that at both temperatures the Shi and Trp datasets show no or very low correlations: the Trp lines produce correlations from approx. -0.2 (permissive T) to 0.1 (restrictive T); the Shi lines 0 and 0.1, respectively. Fig 3D is very misleading in showing the best fits to the combined datasets. We are not convinced that there is a robust sign-inversion in any of these correlations. The authors' major conclusion that "thermogenetic manipulation and specific neuronal activity patterns underlie the structure of behavioral variation" is not supported by these data. The effect of temperature in the control line, although interesting, is a major caveat for interpreting the results from the Shi and Trp lines.

      4) The authors measure a large set of low- and high-level behavioral metrics, e.g. walking speed and choices in Y-mazes respectively. A fundamental problem is that many of these metrics potentially have common underlying but trivial causes, e.g. covariation between speeds measured in various conditions is expected. Therefore, the authors condense their original correlation matrix (Fig 1E) into a distilled matrix (1G) by making such judgements. In the present form, it is impossible to evaluate how systematic or arbitrary these choices were. In many cases, where the same measure was recorded repeatedly (e.g. circadian bout length) or across different conditions (e.g. mean speed) it is obvious, but for other cases it is not obvious at all for the non-expert: for example, why are circadian-bout-length and LED-Y-maze-choice-number lumped into one block of expected behavioral covariates? The current manuscript lacks detailed explanations of how the authors systematically created the distilled matrix. Can the sparseness of the distilled matrix be a consequence of too generous pre-allocations? See also point (6). The bulk of the analysis in this paper is done on the "distilled matrices" which are produced by removing correlations within previously defined groups of behavioral metrics. This is said to cleanly reveal unexpected correlations, leading to a main result of the paper, the correlations between "Switchiness" and "Clumpiness". However, if the a priori categories were defined differently, then in the extreme case this correlation would have been completely removed. How sensitive is this correlation to the choice of categories, especially given that many of the Switchiness and Clumpiness metrics are from similar assays (Fig. S8)?

      5) For the second pipeline that uses t-SNE and watershed (Fig. 2 and S3C), a previous publication from some of the authors [1] appears to show low repeatability of this analysis. Thus, the repeatability and noise levels of the pipeline must be investigated further. These were 3x 1h recordings per decathlon. Related to comments (1-2), the authors need to show that the differences across flies (Fig 2C,D) are not expected from the level of trial-to-trial variability. Perhaps more data from individual flies need to be recorded?

      6) 1G: To our understanding, within-block entries to the distilled matrix should indicate zero correlations, because these are correlations between PCA-projections. But we see many nonzero entries. Given the information provided in the methods it is unclear why this is the case; this requires further explanation.

      In any case, within-block correlations are expected to be at least very low. Hence, we expect the distilled matrix to be relatively sparse given how it was calculated. Of interest are then the across-block correlations; the authors should make this point clearer to the readers.
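      The expectation stated in point (6) can be checked numerically. The following is a minimal sketch on synthetic data, assuming the projections are computed by a standard PCA on a single block and then correlated across the same individuals; the block size and the variable names are hypothetical and do not come from the manuscript.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical block: 5 correlated metrics measured on 150 individuals
latent = rng.normal(size=(150, 1))
block = latent + 0.5 * rng.normal(size=(150, 5))

# PCA via SVD of the column-centered block
centered = block - block.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt.T  # column j = projection onto principal component j

# Correlations between projections from the same block
corr = np.corrcoef(scores, rowvar=False)
off_diagonal = corr[~np.eye(corr.shape[0], dtype=bool)]
max_abs_off_diagonal = np.abs(off_diagonal).max()
```

      Under these assumptions the off-diagonal entries vanish up to floating-point error, so nonzero within-block entries in the distilled matrix would have to come from a step that departs from this simple scheme.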

      7) Some of the authors' claims are related to the spectral dimensionality reduction technique described in Fig. S9. However, none of the real data shown in the main paper figures look qualitatively similar to the toy data. Indeed, the histograms from the main figures are on a log scale, and are thus not comparable to the toy data results. Although the technique might be well suited for certain classes of data, one interpretation of the main paper figures seems to be that no structure is revealed whatsoever. More work should be done to exclude this as a possible interpretation, at least by generating toy data that look like the real datasets; also with respect to point (6) above.

      8) Throughout the paper, the authors use the term "independence" for orthogonal / uncorrelated datasets. Correlation/uncorrelation - dependence/independence are not interchangeable terms. To my understanding PCA decomposes into independent variables only under certain circumstances (multivariate normal distributed data). Have the authors tested for independence?
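      The difference between uncorrelated and independent variables raised in point (8) can be shown with a standard textbook example. This sketch uses a deterministic construction chosen for illustration and has no connection to the manuscript's data.

```python
import numpy as np

# x is symmetric around zero; y is fully determined by x
x = np.linspace(-1.0, 1.0, 201)
y = x ** 2

# The linear correlation vanishes because positive and negative x cancel
r_xy = np.corrcoef(x, y)[0, 1]

# The dependence shows up once the sign of x is discarded
r_absx_y = np.corrcoef(np.abs(x), y)[0, 1]
```

      Here `r_xy` is zero up to floating-point error although `y` is a function of `x`, whereas `r_absx_y` is strongly positive. Zero correlation therefore does not establish independence unless further assumptions hold, such as joint normality.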

      [1] Todd, J.G., Kain, J.S. and de Bivort, B.L., 2017. Systematic exploration of unsupervised methods for mapping behavior. Physical biology, 14(1), p.015002.

    4. Summary: This manuscript is interesting to circuit-neurobiologists, behavioural biologists and psychologists. The reviewers agree that this manuscript addresses an important unanswered question: what is the covariation-structure in the vast space of behavioural variables that individuals can explore, and what defines their individuality in this space? The reviewers also praise the great efforts made in the experimental approach and analysis methods, which potentially will set new benchmarks in the field. However, the work can be improved by accounting for the trial-to-trial variability in behavioural data and clearly distinguishing it from persistent idiosyncrasies observed in individuals.

    1. Reviewer #4:

      This manuscript by Huss, P., et al. is a major technological step forward for high-throughput phage research and a deep dive into the mutational landscape of a portion of the T7 phage receptor binding protein (RBP). The authors develop a new phage genome engineering method, ORACLE, that can generate a library of any region of the phage genome. They apply ORACLE to do a deep mutational scan of the tip domain of T7 RBP and screen for enrichment in several bacteria. The authors find that different hosts give rise to distinct mutational profiles. Exterior loops involved in specialization towards a host appear to have the highest differential mutational sensitivity. The authors follow up these general scans in the background of phage-resistant hosts. They find mutations that rescue phage infection. To demonstrate the utility of the approach on a clinically relevant task, the authors apply the library to a urinary tract associated clinical isolate and produce a phage with much higher specificity, creating a potentially powerful narrow-scope antibiotic.

      Overall, the ORACLE method will be of tremendous use for the phage field solving a technical challenge associated with phage engineering and will illuminate new aspects of the bacterial host-phage interactions. It was also quite nice to see host-specialization validated and further explored with the screens done in the background of phage resistance mutations. The authors do a tremendous job digging into potential mechanisms when possible by which mutations could be altering fitness. We especially appreciate how well the identity of amino acids tracks host specialization within exterior loops.

      We have no major concerns about the manuscript but have some minor comments to aid interpretation. There are also some minor technical issues. We think this manuscript will be of broad interest, especially for those in the genotype-phenotype, phage biology, and host-pathogen fields.

      Minor comments:

      P5L20: In the introduction to the ORACLE section the authors mention homologous recombination then they mention using 'optimized recombination' that is done with recombinases. This contrast should be mentioned somewhere perhaps to highlight the benefit of having specific recombinases.

      P6L16: Using Cas9 to cut unrecombined variants is clever... Cool! This is a real 21st Century DpnI idea.

      P6L27 The authors state that there is a mild skew towards more abundant members after ORACLE. Why might this be? In iterations more abundant members simply become even more abundant? To be clear this isn't a substantial limitation and it's common to see these sorts of changes during library generation. Just curious. Overall looks like a fantastic method.

      P7L6: Authors mention ORACLE increases the throughput of screens by 3-4 orders of magnitude. How many variants can one screen? Is this screen of a little over 1k variants at about the threshold of the assay?

      P8L7: The authors assign functional scores based on enrichment and normalize to wild type. Is a FN=1 equivalent to wild type?

      P9L5: Awesome!

      P10L7: Authors mention R542 forms a hook with a receptor. There should be a citation here.

      P10L21: For N501, R542, G479, D540 there are wonderful mechanistic explanations. However, for D520 there is not. Any hypothesis for why this is distinct from the others? Are there other residues that behave similarly? I feel it would be really helpful to have a color scale that discriminates between FN 1 (assuming wild type) and enriched/depleted w/in figure 3A.

      P12L4: Authors note residues that are surface exposed yet intolerant to mutations in the previous paragraph. Authors also calculate free energy changes with Rosetta and state that free energy maps pretty well onto tolerance. What is the 93% based on? Perhaps a truth/contingency table would be useful here to discriminate/compare groupings. Which residues are in the other 7%? Can the energy scores help understand the mechanisms behind the mutations better?

      P12L7: Authors state substitutions predicted to be stable and classified intolerant could indicate residues necessary for all hosts. What about those that fall outside of the groupings? Unstable residues can also be necessary.

      P14L22: Authors mention comparing systematic truncations; however, they do not present any figure. This should be in a figure to aid in looking at the data and would surely be helpful to people in the phage field. A figure should be included here especially because this is one of the main discussion topics at the end of the manuscript.

      P16L2: The authors did the selection in the background of a clinically isolated strain and discussed 3 variants that were clonally characterized. Was this library sequenced similarly to before?

      Figures:

      Barplots need significance tests.

      Figure 2C-E ; Fig 3A. All figures are colored white to red. With this color scale it's hard to appreciate which variants are neutral vs those that are enriched. A two or more color scale would be more appropriate. Log-scaling might be wise to get a better sense of the dynamic range that is clearly present in fig2F.

      Fig 4F: Needs a statistical test between bar plots.

      Fig6A-C: These figures have tiny symbols that represent the architecture at an insertion position. It's probably easier to look at if the same annotations from Fig 4B or C for architecture were used.

      Fig6D: needs tests for significance

      Supp fig 4E: This figure is the first evidence that the physicochemical properties of amino acids within surface-exposed loops determine host specificity. This is followed up by Figure 4D and E. I would consider moving this to one of the main figures.

      Supp fig 5: A truth table could be useful here to test for ability to classify based on Rosetta compared to FD. It looks like the tolerant residues have a distinct pattern here.

      Why are these colored white to red?

    2. Reviewer #3:

      Huss et al. describe a phage genome engineering technology that they call ORACLE. This technique uses recombineering of a phage target gene with a variant library to identify both gain and loss of function mutations. The beauty of this method and what makes it superior to other techniques is that it dramatically limits loss of mutants that are less fit during the initial round of library generation. Thus, the pool of variants is vast and is reduced in bias toward more fit species based on the host used for initial library amplification. They use the model coliphage T7 as a proof of principle and show that several previously unidentified residues in the T7 tail fiber play critical roles in both loss and gain of function for phage infectivity and they also identify residues that are major drivers of altered host tropism. Lastly, they apply this library to a pathogenic UTI associated strain of E. coli which is normally resistant to wild type T7 infection and identify tail variants of T7 that can now infect this strain, highlighting the applicability of this method toward the discovery of engineered phages that could be used therapeutically. Altogether this is an important advancement in phage engineering that shows potential promise for future phage therapies.

    3. Reviewer #2:

      The authors report a new approach termed ORACLE to develop locus-specific phage variants, which includes a recombination step, whose efficacy is improved by the overexpression of a dedicated recombinase, followed by an enrichment performed using CRISPR/Cas9. They applied this method to create a mutant library containing 1660 variants of the tip domain of the T7 tail fiber. Performance of each variant was determined by quantifying their abundance before and after selection on three E. coli strains compared to the WT phage. Their findings show that single amino acid changes in the tip of gp17 can have major consequences on phage performance on different hosts. Then they tested whether these variants would be less prone to select for phage resistance using a UTI strain. Finally, they searched for variants that would be more prone to infect one host than another and successfully tested their predictions.

      The ORACLE approach is overall novel and has some advantages over existing methods, mainly for generation of mutation libraries of genes. Authors did a nice (even if very lengthy) job of showing how mutants have consequences to structure and function of the tail fiber gene and how that influences performance on different hosts, including combating host resistance.

      The authors state that ORACLE overcomes three major hurdles that make it better than existing methods, one of which is "generalizability for virtually any phage", while denouncing other systems for being applicable for highly transformable hosts only. This is highly exaggerated since ORACLE requires transformation of two plasmids (helper and donor) including one with tunable gene expression, which is clearly not possible in many bacteria. Furthermore, the enrichment step requires a strain with a functional CRISPR/Cas9 system, which again is not so obvious in the bacterial world.

      The authors disregard bias that can be generated at the "O" step if a variant reproduces better than the wt. They should also mention the bias arising from non-viable or severely infection-hampered variants, which would not pass the accumulation step; this is briefly mentioned later in the manuscript but should be mentioned earlier.

      The weakest paragraph is the one dealing with the UTI strain. I have the feeling that this paragraph could simply be deleted without changing the overall story. Approaching resistance, selection, and evolution would require more experiments than the very simplistic lysis curves. The authors did not even show adequately that cells growing after 5-10 hours are either genotypically or phenotypically resistant cells. A more appropriate qualification would be "insensitive" instead of resistant.

    4. Reviewer #1:

      Huss et al. have developed a novel tool (ORACLE) for generating libraries of phage variants. They go on to apply this tool to study the residues important for T7 host specificity, providing a rich dataset for in-depth functional studies. They validate a subset of hits and use this information to engineer T7 variants that may be able to overcome bacterial resistance against a urinary tract infection associated strain, consistent with their in vitro results. Their approach provides both a valuable new tool and intriguing biological insights prompting future studies.

      Major suggestions for improvement:

      1) The writing could be much more concise.

      2) Claims about generalizability should either be removed or supported by additional data. This study focused on a single phage gene and a single host bacterial species. As such, it is not clear if ORACLE will work well in other contexts.

    1. Reviewer #3:

      In this manuscript, the authors investigated roles of PSD95 in the hippocampus for contextual fear extinction. The authors showed that PSD95 levels in the spine and density of PSD-95-positive spines in the dorsal CA1 (dCA1) are changed following contextual fear conditioning and extinction learning. Interestingly, overexpression of PSD95-S73A mutant or chemogenetic inhibition of dCA1 impairs only the second extinction learning at 24 hrs following the first extinction learning. Importantly, these manipulations also blocked the changes of PSD95-positive spines following the first extinction learning. These observations suggest that phosphorylation of PSD95 at S73 in the dCA1 of hippocampus contributes to contextual fear extinction. This manuscript suggests the importance of PSD95 phosphorylation in the hippocampus in some aspects of mechanisms of contextual fear extinction at the molecular and spine levels. However, the title, abstract and conclusions do not well reflect observations and experimental designs in this manuscript. I have several concerns as follows.

      Major concerns:

      1) The authors used viral overexpression of the PSD-95 S73A mutant, which may function as a dominant negative mutant, rather than a knock-in mutation. Therefore, the function of phosphorylation of PSD-95 at S73 in spine morphology and contextual fear extinction has not yet been investigated well. The experimental design in this manuscript limits the interpretation of the behavioral results. It is better to use a knock-in mutation strategy than overexpression of the mutant. Alternatively, the authors can examine the phosphorylation levels of PSD95 following contextual fear conditioning and extinction learning and/or the function of this mutant at the molecular and cellular levels using biochemistry/molecular biology/cell culture.

      2) Overexpression of S73A or chemogenetic inhibition of CA1 impaired additional extinction learning. These observations are interesting. However, the authors have not well characterized these findings at the behavioral level. In other words, the authors should clarify the effects of these manipulations on contextual fear extinction at the behavioral level. Given the abundant knowledge of fear memory extinction, the behavioral results in this manuscript raise a lot of questions about the impact of those genetic manipulations on "contextual fear extinction". What about effects on extended extinction learning (60 min), additional 30 min extinction learning on the same day after first extinction training, spontaneous recovery, renewal, and reinstatement? Some answers to these questions will help to understand behavioral observations in this study and enable us to identify roles of PSD95 and its phosphorylation in extinction of contextual fear memory. It is also important to examine PSD95-positive spines just after the additional extinction learning to understand behavioral observations.

    2. Reviewer #2:

      Ziółkowska et al. investigate synaptic processes in the dorsal hippocampal CA1 (dCA1) region with the goal of testing the role of postsynaptic density protein 95 (PSD-95) dynamics in contextual fear extinction. They conclude that 1) extinction increases synaptic dCA1 PSD-95 levels and induces remodeling of dendritic spines, 2) extinction-related PSD-95 changes are mediated by phosphorylation of PSD-95 at serine 73, and 3) phosphorylation of PSD-95 at serine 73 as well as dCA1 activity are required to "update a partially extinguished fear memory". The experiments provide new insight and address a timely and important issue. The major strengths of the paper lie in the use of a wide range of complementary technical approaches, and the significance of addressing specific molecular mediators of fear attenuation. However, some of the analysis is based on inadequately justified or inappropriate measures (e.g. that do not directly assay the phenomenon under investigation), and there are concerns about independent effects of viral overexpression in this system as well as the relevance of the behavioral analysis. The conclusions from the paper, if true, would appear to support a very intricate model involving PSD95 phosphorylation and synaptic accumulation after extinction, but because of weaknesses in the underlying evidence, these mechanisms and their relationship to extinction memory were not persuasively demonstrated. Following are some specific concerns:

      1) The mean intensity of PSD95 labeling per spine appears to be affected in some hippocampal layers (Fig. 1), but this might be attributable in some cases to elimination of spines that have relatively lower PSD-95, rather than a change in PSD-95 levels, per se.

      2) The quantification of overexpressed PSD-95 in Fig. 2 leaves unclear what specifically has been measured. The methods suggest that % area is defined as the total area of mCherry labeling divided by the total image area. This is not a direct measure of PSD-95 levels; it could equally reflect morphological or protein localization changes. Furthermore, the localization of overexpressed PSD-95 (Fig. 2) is clearly very different from that of endogenous PSD-95 (Fig. 1) in that it accumulates throughout the dendrites. This makes it unclear what a "puncta" represents, or whether the analysis implies anything about synaptic function.

      3) The authors argue that S73 phosphorylation is required for synapse elimination during extinction, but Fig. S2 (which is not referenced or discussed in the manuscript) and Fig. 3 indicate that the effect of S73A overexpression is to dramatically reduce spine density in both behavioral groups. It is therefore not clear whether the manipulation interacted with extinction to prevent spine removal, or simply occluded such an effect because spine density was already at an artificial floor prior to any behavioral training. Overexpression of the wildtype construct also reduced spine density to a similar degree. Furthermore, the S73A mutant protein dramatically increased PSD area (Fig. 3d), which apparently contradicts the notion that phosphorylation of this site is required for synaptic accumulation, when applying the same logic used elsewhere in the paper. These are serious confounding issues because the central claim of the paper is that S73 phosphorylation mediates PSD95 synaptic accumulation and synaptic strengthening.

      4) The authors suggest that successive days of extinction represent a distinct process called updating of a partly extinguished memory, which they seem to imply has different molecular requirements. There appears to be no basis in the literature for this idea.

      5) The analysis of extinction relies on measurement of within-session decreases in freezing. However, within-session extinction has been shown to be neither sufficient nor essential for between-session extinction. It is not even clear that within-session extinction is really even extinction at all, rather than, for example, habituation. It is essential to examine the retention of decreased freezing across days in order to establish that the formation of long-term memory is involved.

      6) Finally, numerous comparisons are made between animals that received FC, with no further manipulation, and extinguished animals. This design leaves open the possibility that any differences are attributable not to an extinction process but instead to context exposure independent of fear regulation. A behavioral control in which animals receive context exposures, but no shocks, would be very useful.

    3. Reviewer #1:

      Patients with posttraumatic stress disorder show impaired fear extinction that leads to persistent fear memories. The CA1 subregion of the hippocampus has been implicated in the acquisition and extinction of contextual fear memories, and both mechanisms depend on glutamatergic synaptic plasticity in this region. Postsynaptic density protein 95 (PSD-95) is known to regulate structural and functional changes in glutamatergic synapses, but whether PSD-95 participates in the acquisition and extinction of contextual fear memories remains unclear. To address this question, here Ziółkowska and coworkers used nanoscale-resolution analyses of PSD-95 protein in the CA1 combined with genetic and chemogenetic manipulations in mice exposed to a classical Pavlovian contextual fear conditioning paradigm. The study revealed that PSD-95-dependent synaptic plasticity in the dorsal CA1 area is not necessary for fear acquisition or the initial phase of fear extinction, but is critical for updating a partially extinguished fear memory. In addition, phosphorylation of PSD-95 at serine 73 is necessary for contextual fear extinction-induced PSD-95 expression and remodeling of dendritic spines in this region, suggesting a potential mechanism for fear memory persistence.

      This timely study provides important and novel findings with regard to the role of PSD-95 protein in fear extinction formation and helps to advance our understanding of how dendritic changes in the hippocampus regulate fear maintenance. The present findings should be of general interest to the scientific community because extinction-based therapies are the gold-standard treatment for many fear-related disorders. The manuscript is clear, and the experiments were well-designed and executed. While the study is elegant, there are several important points including data interpretation that need to be clarified.

      Major points:

      1) The authors identified changes in PSD-95 expression levels and spine density after both fear acquisition and fear extinction. Similarly, S73-dependent phosphorylation of PSD-95 and changes in spine density were also reported following both phases. How do the authors explain the lack of effects on fear acquisition and extinction after the infusion of S73-deficient PSD-95 expressing virus? Does this suggest that the observed dynamics of PSD-95 are not important for the fear memory expression? The interpretation of these findings should be clarified in the discussion.

      Previous studies have demonstrated a key role of dorsal hippocampus CA1 area on fear retrieval and extinction acquisition using either lesion (e.g., Ji and Maren 2008, PMID: 18391185), or optogenetic tools (e.g., Sakagushi et al, 2015, PMID: 26075894). However, in the present study, chemogenetic inhibition of this same region had no effect on fear retrieval or extinction acquisition (Figures 5 and 6). How do the authors reconcile the lack of effects on fear retrieval and extinction acquisition with the previous literature? Similarly, previous studies on the role of hippocampal PSD-95 protein in extinction memory should be described and the main differences in the experimental design and findings should be discussed (e.g., Nagura et al, 2012, PMID: 23268962; Cai et al, 2018, PMID: 30143658; Li et al, 2017, PMID: 28888982).

      2) The authors have used scanning electron microscopy to analyze the ultrastructure of dendritic spines and determine whether PSD-95 regulates extinction-induced synaptic growth. In addition, the authors complemented these studies by investigating the effect of PSD-95-overexpression and fear extinction training on synaptic transmission in the dorsal CA1 ex vivo. However, it is hard to understand what the observed changes in dendritic spines and amplitude of EPSCs mean if the behavior of the animals was the same. This point should be discussed in the article.

      3) In Figure 5, the authors showed that chemogenetic inactivation of CA1 changed PSD-95 expression in all the 3 subregions of CA1 (stOri, stRad and stLM). However, the extinction training behavior in Figure 1 demonstrated an effect only in 2 subregions (stOri and stLM). The authors should clarify this discrepancy. In addition, in the same series of experiments (Fig. 5Ciii), it is unclear whether the reduction in PSD-95 expression induced by chemogenetic inactivation is sufficient to bring the PSD-95 expression to the same post-conditioning levels.

      4) The authors showed an interesting behavioral effect in the second part of the extinction phase (Figure 6C), similar to the results in Figure 4C. However, to confirm that phosphorylated PSD-95 is crucial for the maintenance of extinction memory, the authors may want to consider a direct comparison between the levels of phosphorylated PSD-95 right after extinction 1 and extinction 2. Differences in the expression would clarify whether the phosphorylated PSD-95 expression is further increased after additional extinction training, which would help to link the effect of chemogenetic inactivation on behavior. At least some discussion is needed for this part.

      5) The authors used immunostaining and confocal tools to analyze 3 domains of dendritic tree of dorsal CA1 area in Thy1-GFP(M) mice (stOri, stRad and stLM) on different fear phases (conditioning and extinction). They found a significant decrease of PSD-95 expression, spine density and spine area in stOri and stRad during conditioning and a rescue of such decrease during extinction. However, the authors’ interpretation is that extinction resulted in an upregulation of PSD-95, which doesn't seem to be the case if you compare the numbers with the naïve group. Please clarify this point.

    4. Summary: This timely study provides important and novel findings with regard to the role of PSD-95 protein in fear extinction formation and helps to advance our understanding of how dendritic changes in the hippocampus regulate fear maintenance. The findings should appeal to those interested in hippocampal function, fear and fear-related conditions, and extinction-based therapies. The major strengths of the paper lie in the use of a wide range of complementary technical approaches, and the significance of addressing specific molecular mediators of fear attenuation. Reasonable alternative explanations were identified for some of the key findings and the conclusions may not perfectly reflect the observations and experimental designs.

      Reviewer #1 opted to reveal their name to the authors in the decision letter after review.

    1. Reviewer #3:

      This paper is primarily about modeling the ERK pathway during the induction of synaptic plasticity. This pathway has been previously modeled, and this is cited in the paper. The main addition here is the effect of SynGap, which is necessary in some forms of LTP. This is a very detailed study, and what it seems to primarily show is that the ERK pathway favors spaced over massed stimulation protocols. However, no conceptually new ideas are presented here. The paper adds to an existing foundation, but fails to make the case that this is a very significant addition. What is the significant consequence at a higher level of these added details?

      The ERK pathway is just one component of a much larger set of pathways that control synaptic plasticity; how much do we learn from studying this pathway in isolation? Also, the paper cites the importance of this pathway to L-LTP; is it the induction phase of L-LTP? It seems so because ppERK decays in less than an hour. How then does this pathway contribute to the maintenance of L-LTP? These processes, such as a possible upregulation of protein synthesis, are not part of this model either.

      This paper studies in detail different pathways that influence ERK activation in synapses. This is a very detailed study, but how many details do we actually know? For a detailed paper though it seems that many of the details are missing. Is there a detailed diagram of reactions, or set of equations for all these reactions? Some coefficients are named in figure 1, and this might be sufficient for a schematic description of the model in the paper, however there must be somewhere a detailed description of all reactions. How many species are there here, how many coefficients? How are coefficient values known? How many coefficients are directly estimated? The paper does carry out an extensive robustness analysis, though it is not well explained.

      What are the major takeaways from this paper, and what experiments could test this model?

      To summarize, the paper is very detailed, carefully constructed and executed, but it fails to convince that the problem it addresses is very significant, and it makes no conceptual breakthroughs.

    2. Reviewer #2:

      Miningou and Blackwell in their manuscript titled "Temporal pattern and synergy influence activity of ERK signaling pathways during L-LTP induction" explored the contributions of upstream pathways to ERK activation during LTP. The authors expanded on their previously published LTP model to assess the influence on ERK activation of each of the upstream pathways originating from cAMP or Ca2+ activated with differing temporal patterns. This manuscript's aim is quite germane, since 1) ERK plays such a central role in learning and memory and its cellular proxy LTP; 2) the Ca2+/cAMP/PKA system is highly complex and nonlinear, with multiple feedback loops. The resulting manuscript has the potential to be impactful. The approach of using a stochastic reaction-diffusion model is state of the art and appropriate for the modeling of these subcellular events in spines. And the modeling insights are very intriguing as the authors predict that ERK activation by cAMP/PKA or Ca2+ pathways differ in their linearity, that these pathways can synergize during LTP, and that this may involve a novel feedforward loop containing synGAP. The authors do a marvelous job placing their findings within the huge body of LTP literature.

      There are, however, a couple of points that I feel should be addressed:

      1) There needs to be additional technical detail on how the original models were expanded. The model presented here was developed by merging the Jȩdrzejewska-Szmek et al., 2017 and Jain and Bhalla, 2014 models. These models were developed based on experimental data and validated with independent experimental datasets in a rigorous manner. It is not clear how combining these two models, and adding further molecules and reactions, has affected the dynamics of ERK activation, or how comparable the results are to the original experimental data used for model development in the previous modeling efforts. It is not clear if the model was reparameterized.

      2) Beyond the ERK activation traces, it would be useful for clarity's sake to also include the simulated traces for the activation of the upstream molecules (PKA, RAS, RAP, etc). Given that additional changes have been made, additional information should be provided to ensure that the contribution of each pathway is accurately represented.

    3. Reviewer #1:

      This study takes on the question of the roles of the many pathways leading to ERK activation in long-term potentiation. This is an advance: few models consider more than a couple of input pathways. The authors consider two aspects: how pathways sum to give strong responses, and distinct temporal pattern selectivity. They show that both summation linearity, and pattern selectivity, are strongly governed by which pathways are engaged in driving the response.

      The model and analysis are potentially interesting, but the paper would be much strengthened if there were more convincing validation of the properties of the model by way of simulations to compare with experiments. Further, the pathways chosen are already one step into the synapse. Thus the actual combination of pathway activations would not be quite as cleanly separated if they were driven by synaptic input.

    1. Summary: This research makes important, incremental contributions to the fundamental understanding of propofol interactions with bacterial voltage-gated sodium channels.

      Public Review:

      The reviewers agree that this research adds to the fundamental understanding of propofol interactions with bacterial voltage-gated sodium channels. Here an objective avenue to binding site mapping is taken involving a photoactivated azide propofol derivative. The strategy identifies two adjacent sites at the intracellular face of Nav channels. These sites are provocative as they settle into a mechanistically rich channel region where the voltage-sensor is coupled to the pore. The manuscript is well written, well referenced, and internally consistent, and the conclusions are aligned with the data presented. The methods are appropriate and the data appear to be of high quality. The findings are quite interesting.

      The primary concern is that these results were deemed to add incrementally to recently published studies (Yang et al., JGP, 2018) which came to similar conclusions, without the support of the photoaffinity ligand results. Additionally, there were questions about whether voltage-gated sodium channels are involved in the anesthetic actions of propofol, technical questions about molecular simulations, and suggestions for control experiments.

  3. Jan 2021
    1. Reviewer #3:

      Behaviours that are instrumental for producing reward can be either goal-directed or, after repeated practice, habitual. Tasks that dissociate these types of learning, notably outcome devaluation, are tricky to implement for studying intravenous drug delivery although there is great interest to understand the role of habits in controlling drug use and addiction and so this paper is important in that regard. This article takes a new approach analyzing response latencies to infer the types of decision-making process that underlies a reward-seeking behaviour. Goal-directed behaviours are argued to involve evaluation of the outcome of responding and/or deliberation between choices both of which should take time, and slow responding relative to an efficient but inflexible habit. So I think this approach is quite interesting. The paper is well written and the predictions are clear.

      My main issue in evaluating the current article is that while different predictions are made about when response latency should be relatively fast or slow, since the article is framed in terms of dissociating goal-directed and habitual processes, I feel there should be some independent evaluation of whether the target behaviour is in fact goal-directed or habitual. The authors rely on the amount of training as extended training has been shown to promote habitual control. However, exactly how much training is needed and how other parameters (type of reward, schedules of reinforcement, choice or single outcome) affect when habitual control may emerge varies widely in the literature and I don't think we can take for granted that after a certain amount of training responding will be habitual without testing that.

      It is also important to consider alternative explanations for differences in response latency. A behaviour that is well-practiced might well be expected to become more efficient and faster. This need not be due to habit formation. The authors acknowledge the possibility that responding could be at floor but don't really discuss it or whether it might apply more to the saccharin response.

    2. Reviewer #2:

      When animals are given a choice between drug and nondrug reinforcers, they will most often choose the nondrug alternative even when presented with highly reinforcing drugs of abuse. This is difficult to reconcile with known behavior in humans and for modeling aspects of addiction that are critical to the disorder, such as choosing to use drugs above all other reinforcers. Recent work by this same group has reported that responding for the nondrug reinforcer is, surprisingly, insensitive to devaluation. This suggests that the choice for the nondrug reinforcer is under habitual, rather than the presumed goal-directed, control and may explain why animals most often choose the nondrug reinforcer over drug reinforcers. Moreover, because there is no devaluation procedure for determining whether drug choice is habitual or goal-directed, it's not known if choice for drug is also habitual or remains goal-directed.

      The manuscript by Vandaele et al., therefore, sought to develop a procedure for determining whether behavior of rats making choices between saccharin and cocaine reinforcers was habitual or goal-directed based on reaction times (RT). Based on previous theories, the authors argue that goal-directed behavior should have slower RTs on choice trials versus sampling trials (e.g., because animals are deliberating between the alternatives) whereas habitual behavior should have similar RTs across both sampling and choice trials. The authors also present a third possibility in which options are evaluated sequentially, rather than simultaneously, resulting in RTs being longer in the sampling versus choice trials. The authors report that rats with minimal training and who are presumed to be goal-directed have slower RTs in choice trials compared to sample trials whereas rats that have had extensive training have similar RTs in the choice and sampling phases. These findings are consistent with their hypotheses. Moreover, they demonstrate that in the small subset of rats that prefer cocaine over saccharin, RTs in the sampling trials are longer than those in the choice trials, suggesting that cocaine preferring rats are not evaluating each of the options. These data are the first to evaluate habitual responding for a drug reinforcer and suggest that comparing latencies across different task phases could be used to measure habitual and goal-directed behaviors.

    3. Reviewer #1:

      Vandaele et al. probe the mechanisms of decision making in rats when making a forced choice between drug and non-drug reward. The authors have led the field in this domain. In this manuscript, a retrospective analysis of choice response times from many rats in their past work is used to tease out potential decision-making mechanisms. We know already from decades of work that choice response times are almost always log-normally distributed (humans, non-human primates, rodents). The question here is whether differences in the mean and dispersion of these distributions can be used to derive insights into the nature of the decision-making mechanism - a deliberative comparison versus a race model - and how this may differ for rats that prefer cocaine over saccharin and how this might be altered by more extended training. These questions are framed in terms of the differences between goal-directed and habitual behavior which, to be frank, I found less compelling (these response time data are of significant interest in their own right). I enjoyed reading this manuscript. It was thoughtful and well presented. I have only two comments.

      First, much, if not all, of the absolute differences between latencies in sample and choice phases appear to be carried by the sample rather than the choice phase. Choice latencies for cocaine preferring rats, saccharin preferring rats, and the indifferent rats are all very similar. In contrast, the sampling latencies for cocaine preferring rats and the indifferent rats are longer. I am not sure why this should be. My reading was that the authors were more concerned with the choice side of the experiment being different, not the sample phase. Is this predicted by the models being tested? I struggled to understand why an SCM-like model would predict the difference being in the sample phase. Either way, the authors could be clearer about where the difference is expected to lie and why the sample phase is so obviously different in some conditions and the choice phase so similar.

      Second, the main and real issue for me is whether the differences between response latencies in the sample versus choice phases plausibly reflect operation of different decision making mechanisms (race model versus deliberative processing) or different operation of the same decision-making mechanism. I don't know the answer, but I could not really derive the answer from the data and modelling provided. The authors frame the differences in response time as being uniquely predicted or explained by different forms of choice. The models that the authors are using are closely linked to, and intellectually derived from, models of human choice reaction time. The most successful of these models are the diffusion model (DDM) (Ratcliff, R., Smith, P.L., Brown, S.D., and McKoon, G. (2016). Diffusion Decision Model: Current Issues and History. Trends in Cognitive Sciences 20, 260-281) and the linear ballistic accumulator (LBA) (Brown, S.D., and Heathcote, A. (2008). The simplest complete model of choice response time: linear ballistic accumulation. Cognitive Psychology 57, 153-178).

      Even though the DDM and LBA adopt different architectures to each other (but the same architectures as those in Supp Fig 1A), they are intended to explain the same data. Of relevance, the same model (a DDM or an LBA) can explain differences in both the response distribution and the mean response time via changes in the starting point of evidence accumulation, rate of evidence accumulation, and/or the boundary or threshold at which evidence is translated into choice behavior. So, for either a difference accumulator model (DDM) or a race model (LBA), the difference between sampling and choice performance could reflect changes in how the model is operating between these two phases, including a change in the starting point of the decision [bias], a change in rate of accumulation [evidence], a change in threshold [caution] or collapsing boundary scenario, rather than reflecting operation of a completely different decision-making mechanism.
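
      To make this point concrete, the following is a minimal sketch of a linear ballistic accumulator simulation. It is not the authors' analysis or a fitted model: the function name `simulate_lba`, the drift rates, thresholds, and all other parameter values are arbitrary assumptions chosen for illustration. It only shows that a single race architecture yields longer mean response times when the threshold (caution) is raised, with no change of mechanism.

```python
import random
import statistics


def simulate_lba(drifts, threshold, start_max=0.5, drift_sd=0.3,
                 t0=0.2, n_trials=5000, seed=0):
    """Simulate response times and choices from a simple LBA race.

    Every parameter value here is a hypothetical illustration,
    not an estimate from any dataset.
    """
    rng = random.Random(seed)
    rts, choices = [], []
    while len(rts) < n_trials:
        finish_times = []
        for mean_rate in drifts:
            start = rng.uniform(0.0, start_max)   # start point [bias]
            rate = rng.gauss(mean_rate, drift_sd)  # accumulation rate [evidence]
            if rate > 0:
                # Linear, noise-free rise from start to threshold [caution].
                finish_times.append((threshold - start) / rate)
            else:
                finish_times.append(float("inf"))  # this accumulator never finishes
        winner = min(range(len(drifts)), key=finish_times.__getitem__)
        if finish_times[winner] == float("inf"):
            continue  # no accumulator had a positive rate; redraw the trial
        rts.append(t0 + finish_times[winner])      # add non-decision time
        choices.append(winner)
    return rts, choices


if __name__ == "__main__":
    for label, b in (("lower threshold", 1.0), ("higher threshold", 1.5)):
        rts, _ = simulate_lba((1.0, 0.6), threshold=b)
        print(label, round(statistics.mean(rts), 3))
```

      In a formal analysis the parameters would be estimated from the observed latencies rather than fixed by hand; the sketch only illustrates why a latency difference between phases does not by itself identify a different decision-making mechanism.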

      In thinking of a way forward I readily concede I could be wrong and the authors may effectively rebut this point. Another option could be to acknowledge this possibility and discuss it. E.g., does it really matter if it is a qualitatively different decision-making process or different operation of the same decision-making mechanism? I don't really think the action-habit distinction lives or dies by reaction/response time data, this distinction is almost certainly far less absolute than often portrayed in the addiction literature, and it is generally intended as an account of what is learned rather than an account of how that learning is translated into behaviour (even if an S-R mechanism provides an account of both). Response time data tell me, at least, something different about how what has been learned is translated into behaviour. The third, marginally more difficult but more interesting option, would be to explore these issues formally and to move beyond simple descriptive or LDA analyses of response time distributions. The LBA has a full analytical solution and there are reasonable approximations for the DDM. Formal modelling of choice response times (e.g., Bayesian parameter estimation for a race model or DDM) could indicate whether a single decision-making mechanism (LBA or DDM or something else) can explain response times under both sample and choice conditions or not. This is a standard approach in cognitive modelling. This would be compelling if it showed the dissociation the authors argue - i.e. one model cannot be fit to both sample and choice datasets for all animals. However, if one model can be fit to both, then formal modelling would show which decision making parameters change between the sample and choice conditions for cocaine v saccharin v individual animals to putatively cause the differences in response times observed. Either way, more formal modelling would provide a platform towards identification of those specific features of the decision-making mechanisms that are being affected.

    4. Summary:

      In this manuscript the authors perform a retrospective analysis in an attempt to delineate the role of goal-directed versus habitual mechanisms underlying choice between drug and non-drug rewards. Specifically, the authors utilized data generated in their laboratory to assess cocaine-versus-saccharin choice following limited and extended training paradigms. A sequential choice model was used to assess the prediction that increased latencies during choice reflect goal-directed control, whereas no change in latencies reflects habitual control. Based on this model, the authors report that rats engage in goal-directed control after limited training, and adopt more habitual responding after extended training. The authors conclude that the sequential choice model is specific to habitual choice.

      While the Reviewers appreciate the approach and conceptual framework described in this manuscript, they are all in agreement that additional data and analyses are needed to better support the claims surrounding goal-directed versus habitual control of reward-seeking behavior. For example, an independent evaluation of whether the target behavior is in fact goal-directed or habitual seems necessary to support such claims. Reviewers’ comments and suggestions for improvement are included below.

      Reviewer #1 and Reviewer #2 opted to reveal their name to the authors in the decision letter after review.

    1. Author Response:

      Reviewer #3:

      However, a lot of the data presented in the manuscript are not novel and were previously published. A recent Molecular Cancer Research paper by Llabata and collaborators published in April 2020 (referred to in the text) has already identified the same MGA interactors by Mass Spectrometry and the same binding sites by ChIP-Seq using human lung adenocarcinoma cell lines. Llabata et al. found that MGA interacts with the non-canonical PCGF6-PRC1 complex (named PRC1.6) that includes L3MBTL2 and that the complex also contains MAX and E2F6 but not MYC. They clearly show that MGA binds to and represses genes that are bound and activated by MYC, convincingly showing that MYC and MGA have opposite functions. This unfortunately tempers the enthusiasm of the reviewer.

      This reviewer states that "... a lot of the data presented in the manuscript are not novel and were previously published". The reviewer goes on to write that the Llabata et al. 2020 paper (referring to doi: 10.1158/1541-7786.MCR-19-0657 [https://mcr.aacrjournals.org/content/18/4/574]) "has already identified the same MGA interactors by Mass Spectrometry and the same binding sites by ChIP-Seq using human lung adenocarcinoma cell lines. Llabata et al. found that MGA interacts with the non-canonical PCGF6-PRC1 complex (named PRC1.6)..."

      We strongly disagree with the reviewer's statements.

      1) A major focus of our paper is that it provides and validates a mouse model in which we delete MGA and demonstrate its tumor suppressive activity. The experiments in Llabata et al., including the biological assays and the ChIP-Seq, were done by overexpressing MGA in cells which already express endogenous MGA. Therefore, all their data monitor the consequences of overexpression of MGA, a situation without clear biological relevance. In the experiments reported in our paper, we delete MGA. Therefore our molecular data refer to a comparison between MGA null and the same cells expressing endogenous MGA. This is important since MGA is a tumor suppressor and its loss of function is what is crucial biologically, as we show here for the first time in our lung adenocarcinoma model. Furthermore, by deleting MGA we were able to show that its loss corresponds to an increase in a core set of target genes previously associated with PRC1.6. We also show that members of this core group are relevant to the proliferation of tumors that lack MGA.

      2) The PRC1.6 complex has been known to be associated with MGA since at least 2012, as indicated in our references cited. Llabata et al. confirmed that result. Our paper reports that PRC1.6 subunits are associated with MGA through the DUF4801 domain of MGA. This is the first identification of the interface between PRC1.6 and MGA. It is important and relevant because multiple frame shift mutants in MGA have the consequence of deleting this region in a wide range of tumor types.

    2. Reviewer #3:

      Mathsyaraja and collaborators analyzed the role of the MAX-Gene associated protein, referred to as MGA, in mouse models and human cell lines and organoids of Non-Small Cell Lung Cancer. MGA is a repressor, a MYC antagonist that opposes its transcriptional activity. It has TBX and bHLH domains. They found that MGA loss by shRNA or CRISPR accelerated tumor development in vivo in the KP mouse models. Using RNA-Seq, the authors showed that MGA loss leads to the de-repression of the atypical/non-canonical PRC1.6 polycomb complex, E2F and MYC targets as well as increased invasion. ChIP-Seq/CUT&RUN as well as proteomics revealed that MGA, E2F6 and L3MBTL2 co-occupy thousands of promoters and that MGA interacts with E2F6 and many core members of PRC1.6. Finally, they mapped the DUF domain as required to bind the PRC1.6 complex and bring it to promoters.

      Overall, the experiments are well executed, the paper clearly written and the conclusions justified by the data.

      The new data in the present report are the in vivo data in the mouse models, the role of MGA in repressing invasion, in increasing IFN signaling and the anti-tumor response, and the identification of the DUF domain required for binding to the PRC1.6 complex.

      However, a lot of the data presented in the manuscript are not novel and were previously published. A recent Molecular Cancer Research paper by Llabata and collaborators published in April 2020 (referred to in the text) has already identified the same MGA interactors by Mass Spectrometry and the same binding sites by ChIP-Seq using human lung adenocarcinoma cell lines. Llabata et al. found that MGA interacts with the non-canonical PCGF6-PRC1 complex (named PRC1.6) that includes L3MBTL2 and that the complex also contains MAX and E2F6 but not MYC. They clearly show that MGA binds to and represses genes that are bound and activated by MYC, convincingly showing that MYC and MGA have opposite functions. This unfortunately tempers the enthusiasm of the reviewer.

    3. Reviewer #2:

      This manuscript by Mathsyaraja et al. studies the oncogenic loss of the Max-gene-associated (MGA) protein due to deletion or mutation in cell-lines, in mice and in human cancers (cell-lines and tumors). The authors knocked out MGA by aerosol-delivered, CRISPR-CAS expressing lentiviruses that simultaneously Cre-activated a Lox-stop Kras oncogene. The loss of MGA accelerated proliferation and oncogenesis, and shortened survival. Oncogenesis was further enhanced by enforced TP53 deletion in these lung tumors. RNA-seq and ChIP-seq of MGA+ or - cell-lines demonstrated the up and downregulation of various gene classes (thousands of genes) according to function and regulation including of PRC1.6 targets, meiosis regulators, TGF-beta signaling pathway components, EMT regulators, anti-tumor immunity, as well as of MYC, E2F, etc. Different cell lines exhibited both overlapping and distinct target sets. MGA knockout cells were more migratory and invasive and displayed actin-protrusions in accord with this behavior. They show that a Domain of Unknown Function in the mid-region of MGA engages PRC1.6 and is required to depress proliferation. The DUF is also required to limit actin-protrusions. Human colon organoids were studied since MGA mutations and deletions are also apparent in colon cancer. Again, shared and distinct targets of MGA action were inferred.

      The authors make a strong case that MGA is an important tumor suppressor that operates through PRC1.6 for some of its actions.

    4. Reviewer #1:

      The authors report the analysis of a Mga deletion and provide convincing evidence that Mga functions as a tumor suppressor during lung carcinogenesis. The data shown are clear, the message is important and the discussion is very careful. There is a certain overlap with a recent study by Llabata et al., but there is sufficient novelty in the current study.

      Comments:

      It seems that the investigation of publicly available datasets is essentially identical to the Schaub et al. analysis and not new data. If the authors want to maintain this, they would need to better explain what is new. One important piece of information that seems to be missing is whether the mutations are homozygous or heterozygous. Data on MGA and MYC protein expression in human tumors would therefore greatly strengthen this part.

      Conceptually, one would like to know whether tumor development in an MGA-deleted situation depends on MYC. One would also like to know whether the polycomb complex that is assembled by MGA is tumor-suppressive. Therefore, the authors should perform a similar analysis as they did for MGA (introduce sgRNAs into the lung models) and score the phenotypes they get. Both experiments could be done in cell lines established from this model and either in vitro (that would allow a mechanistic analysis, e.g. RNA-seq) or upon re-transplantation. This would also prevent simply reporting negative results.

      The interpretation of the Venn diagram and the heatmaps in Figure 5A,B is somewhat uncertain. If one plots these for MYC, occupancy often simply parallels occupancy by RNAPII, so essentially being bound by MYC simply says the promoter is open/active. Is this the case for MGA and its complex partners? Or is there a specificity in binding? The authors should do RNAPII ChIP-Seq in these cells, preferentially +/- MGA, and then show these alongside (and plot a correlation between MYC, RNAPII and MGA occupancy).

      Along these lines, it is hard to understand how one obtains the extreme p-values shown in Figure 5E and 5H, and I would challenge this. If the authors want to maintain this, they should not use ENCODE data, but simply determine what genes are active in the cells (e.g. what promoters are bound by RNAPII) and then use those as the background list and calculate p-values for overlap between MYC, MAX and E2F6.
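
      The dependence of an overlap p-value on the background list can be illustrated with a small sketch. This assumes a one-sided hypergeometric test, which is a common choice for gene-set overlap but is not confirmed as the method used in the manuscript; the function name `overlap_pvalue` and all gene counts below are made up for illustration.

```python
from math import comb


def overlap_pvalue(n_background, n_set_a, n_set_b, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_background: size of the background list (e.g. active promoters)
    n_set_a, n_set_b: sizes of the two bound-gene sets
    n_overlap: number of genes observed in both sets
    """
    total = comb(n_background, n_set_b)
    largest = min(n_set_a, n_set_b)
    tail = sum(
        comb(n_set_a, k) * comb(n_background - n_set_a, n_set_b - k)
        for k in range(n_overlap, largest + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: the same overlap of 60 genes between sets of
    # 300 and 200, judged against a broad and a restricted background.
    print("broad background:", overlap_pvalue(2000, 300, 200, 60))
    print("restricted background:", overlap_pvalue(800, 300, 200, 60))
```

      With these invented numbers the expected overlap is 30 genes against the broad background but 75 against the restricted one, so the same observed overlap looks strongly enriched in the first case and unremarkable in the second.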

      Based on the description, the ChIPSeq analyses are not spike-normalized and I could not find information about the number of repeats. If it is n=1, the authors need to find a way to exclude that the differences are due to experimental variation.

      I think the Llabata reference is missing in the list.

    5. Summary:

      The reviewers agreed that the paper provides strong in vivo data for a tumor-suppressive role for Mga in lung carcinogenesis. The authors convincingly show that MGA is important in oncogenesis. We note here that MGA is highly understudied (~200 publications) in and of itself despite its involvement with the MYC network for oncogenesis (~41,000 publications at the current time). Given a protein of 3000 amino acids, the number of potential protein partners and PTMs that might modify its tumor suppressor functions is staggering. However, the reviewers also noted that a previous paper has addressed the same topic; the novelty of the data presented here needs to be better explained, and additional experiments are needed to strengthen and expand the new aspects.

      Reviewer #1 opted to reveal their name to the authors in the decision letter after review.

    1. Reviewer #3:

      This manuscript presents its two main results in Figure 3:

      In response to a non-hydrolysable glucose analogue, E. coli cells show...

      (1) Increase in fluorescence intensity of motors with labelled stator proteins, (2) Increase in speed of motor rotation and swimming

      Sufficient controls are described to rule out possible indirect explanations of this effect, via buffer refreshment, metabolism of glucose, proton motive force (Fig 3D) and rotation direction (Fig 4F), and by contrast the effect is demonstrated to depend upon the chemotaxis receptor for glucose (Fig 4B) and the phosphotransferase system (Fig 4D), which supports the chemotaxis system. These results are interpreted as evidence for a direct effect of the chemotaxis system upon the number of independent stator units, and thereby upon motor and swimming speeds.

      This is a novel finding, and with better statistics (more repeats of fluorescence experiments) and better presentation of the findings (see below), the paper would be an important contribution to the field of bacterial chemotaxis. However, especially without presenting or postulating a mechanism for the proposed direct effect, the paper might be more suitable for a more specialist journal.

    2. Reviewer #2:

      1) The authors hint towards the involvement of c-di-GMP signaling via the YcgR protein. This hypothesis can be tested by knocking down the ycgR gene and repeating the assay, but this has not been done or reported. Addition of these data to the manuscript would make the paper significantly stronger.

      2) Do other chemoreceptors (Tar, Tsr, Tap) also act in the same way with their respective ligands? It would be useful to know if this effect is specific to Trg or if it is also found in the other chemoreceptors.

      3) In figure 3C, what is the reason that the GFP intensity and the speed do not have the same range? In other words, why is the slope not equal to 1? Since there is 1:1 correspondence between the number of MotB and the number of GFP, shouldn't the slope be 1?

      4) The authors do not cite or discuss the recent literature on load-dependent stator remodeling (e.g. PMIDs: 29183968, 31142644). It would be helpful to have a more in-depth discussion on how the observed stator unit recruitment relates to stator remodeling in response to load.

    3. Reviewer #1:

      Bacterial chemotaxis is a well-studied process at many levels, from the chemical networks that control the rotation of the flagella to the fluid dynamics of the motility itself. In the present paper the authors address the widely held view that ligand sensing is responsible only for changing the rotational bias of the motor driving flagellar motion, and not its speed. Using a well-established method of quantifying motor activity by monitoring the rotation of the cell body when the flagella are stuck to a surface, a fluorescent labelling technique to determine the membrane potential, a mutant with fluorescently labelled stator units, and direct measurements of swimming speed, the authors show that the sensing of a non-metabolizable analogue of glucose leads to a momentary increase in motor speed and stator unit numbers. At the same time, control experiments make it clear that this is purely as a consequence of ligand sensing. This behaviour is indeed contrary to the accepted view, and although the fundamental mechanism is as yet unclear, this is an important result.

      On the whole I am very supportive of this work, which has been done with great care and clear logic. My only suggestion for improvement would be to make quantitative the changes in chemotactic behaviour that would be expected as a consequence of the motor speed changes revealed in this research. That is, can the authors put some numbers into a standard analysis of run-and-tumble dynamics to quantify any improvement in chemotactic efficiency or speed under such changes?
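
      As a rough illustration of the kind of calculation suggested above, the textbook expression for the effective diffusion coefficient of a three-dimensional run-and-tumble swimmer, D = v²τ / (3(1 − α)), can be evaluated before and after a speed increase. This is only a sketch: it assumes exponentially distributed run durations and instantaneous tumbles, and the speed, run time, turn-angle cosine, and 10% speed boost below are illustrative placeholders, not values from the manuscript.

```python
def effective_diffusion(speed_um_s, mean_run_s, mean_cos_turn):
    """Effective diffusion coefficient (um^2/s) of a 3D run-and-tumble swimmer.

    Uses D = v^2 * tau / (3 * (1 - alpha)), where v is the run speed,
    tau the mean run duration, and alpha the mean cosine of the angle
    between successive runs.
    """
    return speed_um_s ** 2 * mean_run_s / (3.0 * (1.0 - mean_cos_turn))


# Illustrative, assumed parameters (NOT taken from the manuscript)
baseline_speed = 20.0   # um/s
mean_run = 1.0          # s
alpha = 0.33            # mean cosine of the turn angle
speed_boost = 1.10      # hypothetical 10% transient speed increase

d_base = effective_diffusion(baseline_speed, mean_run, alpha)
d_boost = effective_diffusion(baseline_speed * speed_boost, mean_run, alpha)

# D scales with the square of the speed, so a 10% boost gives a 21% increase
ratio = d_boost / d_base
print(f"D baseline = {d_base:.1f} um^2/s, D boosted = {d_boost:.1f} um^2/s, ratio = {ratio:.2f}")
```

      Because D depends on the square of the run speed, even a modest transient speed-up changes the spreading rate noticeably; how the chemotactic drift velocity responds depends on the model chosen and is not captured by this sketch.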

    4. Summary: This is an interesting study reporting an increase in the rotation speed of the E. coli flagellar motor upon the sensing of a non-metabolizable glucose analog (2Dg) by the cell. The authors conclude that this increase is due to an increase in the number of torque-generating stator complexes that drive the motor. Knockout of the trg gene abolished this effect, suggesting that sensing of 2Dg by the Trg chemosensor is responsible. Involvement of membrane potential, the PTS pathway, and the chemotaxis response regulator CheY is ruled out. The manuscript is well-written, and the data are convincing. But the mechanism remains unclear.

      Reviewer #3 opted to reveal their name to the authors in the decision letter after review.

    1. Reviewer #3:

      This study by Pipitone et al. combines SBF-SEM microscopy with quantitative proteomics and lipidomics to explore chloroplast differentiation. Authors describe that chloroplast biogenesis occurs in a first phase of structure establishment with thylakoid biogenesis, followed by a second phase of chloroplast division. The images and 3D reconstructions are beautiful, the quantitative data are novel, and their integration offers a new perspective into the seedling de-etiolation process, a model system for physiological and molecular studies. However, in my opinion some aspects need to be better explained and significantly improved.

      • In lines 276-282, the authors write: "After 8h of illumination (T8), we observed decreased abundance of only one protein (the photoreceptor cryptochrome 2, consistent with its photolabile property) and increased levels of only three proteins, which belonged to the chlorophyll a/b binding proteins category involved in photoprotection (AT1G44575 = PsbS; AT4G10340= Lhcb5; AT1G15820= Lhcb6)". This is striking, as many well-studied proteins change in abundance during the first hours of de-etiolation. Indeed, looking into the data set with the quantification data for the ~5,000 proteins, it appears that many proteins do show significant changes between T0 and T8. Examples include PORA and ELIP, whose changes are also reflected in Figure 6A.

      • Related to the above, well-known proteins such as phyA and HY5, which undergo drastic changes in abundance when etiolated seedlings are first exposed to light, do not show changes at T4, T8 and T12 relative to T0 in the proteomics data set. This raises questions about the proteomic approach (sensitivity of the method?) or the experimental setup. Could the authors please comment on this? I feel that validation of the proteomics approach is critical, especially taking into account the central conclusion that "the first 12h of illumination saw very few significant changes in protein abundance".

      • Lines 570-572: A reference is needed. Also, it is mentioned that PSII appears later than PSI, which does not seem to match the observation that PSII proteins appear earlier than PSI, or that the surface area occupied at early time points by PSII is greater than the one occupied by PSI. Please check.

      • Are the calculations of thylakoid surface expansion over time consistent with previously available data obtained using tomography? Please include a comparison.

      • In the introduction, authors could include mention of the massive transcriptional reprogramming that takes place during de-etiolation. In addition, I think that comparison of the proteomics data with the transcriptomic changes during de-etiolation (well described in the literature) would allow further understanding of the distinct phases proposed. For the chloroplast proteins already present in the dark, how does this correlate with expression of the corresponding genes?

    2. Reviewer #2:

      This impressive manuscript describes a comprehensive, multifaceted analysis of the morphological and molecular changes that accompany photosynthetic establishment during seedling de-etiolation. Morphological data, focusing in particular on the photosynthetic thylakoid membranes, are derived using transmission electron microscopy (TEM), serial block face scanning electron microscopy (SBF-SEM), and confocal microscopy, while quantitative molecular data on the abundances of proteins and lipids are derived using mass spectrometry and western blotting. The various data are acquired over a time course between 0 h and 96 h post illumination, and with a high level of temporal resolution. The data allow the authors to develop a mathematical model for the expansion of the surface area of thylakoids (reaching 500-times the surface area of the cotyledon leaf), which matches well with experimental observations from the SBF-SEM analysis for earlier, but not later, stages of de-etiolation. Moreover, the data point to a two-phase organization of the de-etiolation process, with the first phase ("Structure Establishment") characterized by thylakoid assembly and photosynthetic establishment, and the second phase ("Chloroplast Proliferation") characterized by chloroplast division and cell expansion.

      The data are of a high standard, and the depth and breadth of analysis in a single, unified study is unprecedented. While it is arguable that there are few major, completely novel insights reported here (indeed, in the Discussion, the authors very helpfully point out how many of the parameters they have measured are consistent with data reported elsewhere by others), this should not detract from the overall value of the study; a major and unique strength here is that all of the data have been acquired together and so are directly comparable. I have no doubt that this dataset will be extremely interesting to many researchers, and prove to be an invaluable resource for the plant science community. Consequently, I am sure that it will attract many citations.

      I have a few specific comments that I would like the authors to consider carefully, as follows.

      1) Figure 3. The 3D reconstructions are undoubtedly useful for deriving quantitative data, as they enable the derivation of thylakoid surface area data to verify the mathematical model. However, it is very difficult to see anything clearly in the images shown in the Figure. I wonder if the authors can make the images clearer, and then also point to and describe some of the key features. The videos do help a bit, but even these are not that clear.

      2) Page 9, second paragraph. It is here that the "two phases" model is first proposed. I really could not see a clear basis for proposing this model here, using the data that had been presented thus far. As I see it (and based on the way the two phases are described in the Discussion), one can't really propose this model until after the chloroplast number and cell size data have been presented.

      Moreover, the description of the second phase here ("and a second phase...") seems a bit inconsistent with the statement in the paragraph above that thylakoid surface area increases dramatically between T4 and T24, and much less between T24 and T96.

      3) Figure 6, and the related supplementary figure. Loading controls are missing here, and should be added. Also, it is stated that a number of proteins (PsbA, PsbD, PsbO, Lhcb2) are "detectable" at T0 (line 348, page 11). To me, they look UNdetectable.

      4) Dividing chloroplasts. On page 13, lines 412-413, it is stated that the volume of dividing chloroplasts was measured, and we are referred to Figures 8E and 4B in support of this statement. However, it is not explained how this was done. A clearer and more specific explanation is needed. Was it the case that the authors sought out and measured dumbbell-shaped organelles, and quantified those? If so, images are needed to illustrate this point. Also, I don't see anything relevant in Fig. 4B; this callout apparently belongs in the following sentence. The statement that the average size of dividing chloroplasts was higher than that of all chloroplasts (lines 413-414) is not really surprising if the authors were measuring organelles just on the point of becoming two organelles.

      5) Page 13, beginning of modelling section. The motivation for this section needs to be better introduced. When I first read it, I could not understand why the authors wished to again "determine the thylakoid membrane surface area", as this had already been discussed earlier in the manuscript.

      Also related to the modelling: Did the authors take into account the existence of appressed membranes when calculating the surface area exposed to the stroma (lines 431-432)? And, assuming it is clearly established that there is a 1:1 relationship between these proteins and the relevant complexes (lines 441-443), perhaps this should be stated and the relevant literature cited.

    1. Reviewer #3:

      Overall the manuscript is a valuable contribution and represents an important advance using the model that the authors have recently established in Doro et al. 2019.

      I have however a few suggestions for improvement, that I present below.

      Suggestions to strengthen the manuscript:

      1) Fig. 1 diagram is very useful. However, it would be very informative if the diagram could be followed by a representative quantification. For example, when injecting 200 T. carassii, what % of larvae is classified in the two infection categories? Could the authors also further discuss the % of T. low larvae where no parasites were observed during the clinical scoring? Have these larvae (or some of them) cleared the infection completely? Shouldn't they be classified/followed on their own?

      2) Fig. 2: Is the clinical scoring predictive of early death onset (or likelihood of death)? To show this, the authors could, for example, divide the T. car 200 survival curve into 2 separate curves, based on the clinical scoring at day 4-5.

      3) In Fig. 5 and Fig. 6 and related text, the authors describe their results as "macrophage proliferation" and "neutrophil proliferation". I would encourage them to avoid these terms and rephrase these sections. Normally "macrophage proliferation" is used to refer to resident tissue macrophages that occasionally are seen to divide/proliferate. To my knowledge, neutrophil proliferation in a similar manner has not been described. Most likely what the authors describe is myelopoiesis (in agreement, the authors also indicate that EdU staining is most commonly seen in hematopoietic tissues) and the EdU staining in mature macrophages/neutrophils is the result of a (recent) cell division of a hematopoietic progenitor cell. The authors do not have evidence that the terminally-differentiated cells (macrophages and neutrophils) are actually "proliferating". In the absence of more specific mechanistic insight, I would encourage the use of much broader terms, such as "increased production/number of macrophages/neutrophils" rather than "macrophage/neutrophil proliferation", throughout.

      4) The authors observe several very interesting phenotypes that they report in Fig. 7, 8, 9 & 10. The frequency of these phenotypes (association with infection and with each other) however is not quantified and tested statistically. In particular:

      • The authors report that macrophages, but not neutrophils, infiltrate in the cardinal vein, although both cell populations are accumulating on the outer side of the vasculature during infection. Can the authors quantify and test statistically these phenomena, i.e. by counting cells inside the vessel and associated (externally) with the vessel in the PVP, T. car-low and T. car-high groups? Also, do neutrophils ever interact with trypanosomes in other sections of the vasculature, if not in the cardinal vein? Do trypanosomes ever escape from the circulation and interact with neutrophils elsewhere?

      • The authors report that foamy macrophages occur inside the vasculature and are exclusive to high-infected larvae. Can the authors show some quantifications of these associations and perform statistical tests (i.e. count foamy/non-foamy mpeg+ cells inside/outside the vessels in the PVP, T. car-low and T. car-high groups)? Also, macrophages do not phagocytose T. carassii, but foamy macrophages are seen in the context of other (intracellular) Trypanosoma infection. Are macrophages here perhaps scavenging dead Trypanosoma from the circulation, and is this leading to the foamy macrophage phenotype? Trypanosomes are also leading to hemolysis and this could lead to increased phagocytosis of red blood cell debris by macrophages. Could this be linked to the foamy appearance? How specific is BODIPY, to distinguish cholesterol (typical of foamy macrophages), vs lipids derived by phagocytosis of cell debris (i.e. high in membrane phospholipids?)

      • The authors report that foamy macrophages occurring in T. car-infected larvae are characterised by a strong proinflammatory profile and are all il1beta and all tnfa positive. Significant differences are observed in the inflammatory response of macrophages in high- and low-infected individuals and in their susceptibility to infection. Can the authors quantify and test statistically these observations? For example, can the authors show that foamy macrophages are indeed more frequently il1b positive/tnfa positive than neighbouring non-foamy mpeg+ cells?

      • The authors report that a strong inflammatory profile is associated with the occurrence of foamy macrophages. However, it is not clear how widely spread the inflammation is and only images of macrophages and endothelial cells in the cardinal vein are shown. Moreover, only tnfa and il1b are assessed (using transgenic reporters). The authors also mention that they observe a mild inflammatory response in low-infected individuals and that this is strongly associated with control of parasitaemia and survival to the infection. Can they confirm strong vs mild inflammatory profiles and different association with survival in the 2 infection categories and PVP control with a panel of qRT-PCR for several inflammatory markers (i.e. il1beta, tnfa and other relevant cytokines and chemokines)?
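
      To make the statistical testing requested in point 4 concrete, one option for small cell counts is a two-sided Fisher's exact test on a 2x2 table (for example foamy vs non-foamy mpeg+ cells, scored as il1b-positive or il1b-negative). The sketch below implements the test from the hypergeometric distribution using only the standard library. The counts are hypothetical placeholders, not data from the manuscript, and the choice of test is a suggestion rather than the authors' method.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts (NOT from the manuscript):
#                 il1b+   il1b-
# foamy             8       2
# non-foamy         1       5
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(f"two-sided p = {p_value:.4f}")
```

      The same table layout could be reused for the other associations raised above, such as cells inside versus outside the vessel across the PVP, T. car-low and T. car-high groups, with an appropriate correction for multiple comparisons.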

    2. Reviewer #2:

      Using this new Trypanosoma carassii infectious model in larval zebrafish, Jacobs et al. have developed a new clinical scoring system to reliably separate high- and low-infected larvae in order to investigate their individual innate immune responses, with a special emphasis on macrophages and neutrophils.

      In summary, the separation system used in this study allows us i) to identify a strong macrophage and neutrophil proliferation response in high- and low-infected larvae, although it occurs slightly earlier (5 dpi) for macrophages in low-infected larvae, and ii) to observe a differential distribution and morphology of macrophages, associated with the unique presence of more rounded foamy macrophages with a strongly pro-inflammatory profile in the vessels of high-infected zebrafish larvae. Together, this study constitutes the first report of the occurrence of foamy macrophages during an extracellular trypanosome infection.

      Although the paper is well-written and the findings are interesting as they bring new insights into the development of foamy macrophages in response to an extracellular pathogen, i.e. Trypanosoma carassii, using a zebrafish larvae model, I have a few concerns regarding the following:

      • The experimental infectious model in zebrafish: figure 2 summarizes that only 15% of the infected larvae, named low-infected larvae, are able to survive the infection. As an explanation the authors refer to the trypanosusceptible vs. trypanotolerant background of the host observed in non-zebrafish models. However, in this particular setting, all the larvae possess an identical genetic background. Therefore, why would the larvae behave differently in response to the same pathogen? In addition, there are no clear differences in either parasitic load at 2 dpi (figure 3F) or myeloid cell accumulation at 3 dpi (figure 4AB) that could lead to a drastic difference in parasitic load based on mRNA expression at 4 dpi (figure 3F). The authors should discuss this briefly.

      • Figure 4: the representative pictures in Fig. 4B do not seem to clearly match the histograms depicted in Fig. 4C. For example, in Fig. 4B there seems to be a decrease in red fluorescence from 7 dpi to 9 dpi in low-infected larvae, which is not reflected in the histogram. Also, the representative picture of 7 dpi high-infected larvae seems to show at least equal or even more red fluorescence compared to 9 dpi low-infected larvae.

      • Lines 494-496 state "No significant difference was observed between high-and low-infected fish, confirming that macrophages react to the presence and not to the number of trypanosomes.", reflecting that there are no differences in total macrophages or in their proliferation between low- and high-infected zebrafish larvae (Figure 5B&C). It is therefore not sufficiently clear on which basis the authors state a few lines later, as a conclusion, that "Altogether, these data confirm that T. carassii infection triggers macrophage proliferation and that proliferation is higher in low-infected compared to high-infected individuals, possibly due to a higher haematopoietic activity." The authors should revise this conclusion or bring stronger data to reinforce their results. Similar conclusions also need to be adjusted in the discussion section, and new elements are needed to explain the higher number of macrophages observed in figure 4.

    3. Reviewer #1:

      The authors devised clinical criteria for identifying zebrafish larvae with high or low T. carassii infections in order to track them. Using transgenic fish lines marking macrophages and neutrophils, the authors showed that both groups of larvae increase macrophage (and to a lesser extent neutrophil) levels in response to infection. However, the macrophages in high-parasitaemia animals migrated into the capillaries and had elevated levels of inflammatory markers (TNF, IL-1) and lipids, indicative of a foamy phenotype. The authors conclude that a measured inflammatory response allows animals to control the initial infection, while an exaggerated inflammatory response leads to an environment in which the bloodstream trypanosomes can proliferate. The findings support and extend data from murine models of infection by allowing direct visualization of the host immune response.

    4. Summary: This study investigates the role of the innate immune response in controlling bloodstream trypanosome infection in the zebrafish infection model recently developed by the authors. The study found that an innate immune response characterized by a controlled inflammatory response was sufficient to control infection in some individuals, while failure to control infection was associated with a strong inflammatory response characterized by expansion of foamy macrophages. The findings highlight the importance of a balanced immune response in controlling bloodstream trypanosome infections and are likely relevant to mammalian infections.

      Reviewer #1 and Reviewer #2 opted to reveal their name to the authors in the decision letter after review.

    1. Reviewer #2:

      This manuscript, "Lactobacilli in a clade ameliorate age-dependent decline of thermotaxis behavior in Caenorhabditis elegans," is focused on the impact of diet on age-dependent behavioral decline. The authors utilize a thermotaxis screen using different lactic acid bacteria (LAB) and identify strains of LAB with the ability to ameliorate age dependent decline in thermotaxis behavior. The study introduces some interesting results, including the finding that many LAB strains of the same clade can improve thermotaxis in older nematodes, despite disparate results on longevity. However, there were some questions remaining about methodology, and more importantly, there is very little evidence provided on what the molecular mechanism might be behind this phenomenon. Overall, this study contains interesting findings that are not developed thoroughly enough.

      Major Comments/Questions:

      1) How is LAB different from E. coli? Does the metabolic composition of LAB dictate its impact on the thermotaxis behavior of worms? In the manuscript the authors argue that LAB are a "better" food source than E. coli. How does one define better for something as broad as a food source? There is a difference here but it is very unclear what aspects of LAB physiology may play a role.

      2) Does this phenomenon require eating LAB, or just perceiving it? The assays did not test whether perception of LAB diet is sufficient for its effect on thermotaxis, rather whether more time on LAB leads to better thermotaxis.

      3) Showing a potential daf-16 interaction is plausible, given that daf-16 interacts with many key pathways in the worm, but it is unclear whether this interaction is direct or indirect, or whether daf-16 is a major player in this pathway or just necessary for maintenance of health. What sensory pathways are activated when worms are fed on a LAB diet, and how do they ultimately interact with daf-16?

      4) Similarly, the pha-4 and eat-2 data are interesting, but are not developed in any way. This is another avenue that could in principle lead toward a better mechanistic understanding.

    2. Reviewer #1:

      These investigators examine how lactic acid bacteria impact age-related decline in neurological function through the use of temperature-food associative learning or thermotaxis. In particular, they screen a panel of different lactic acid bacteria and identify a particular clade of bacteria, Lactobacilli, that are able to suppress age-dependent decline in thermotaxis in a daf-16 dependent manner. Moreover, they uncouple improvement in neurological function from lifespan determination and locomotion. Overall, this group presents an interesting phenomenon regarding the effects of the lactic acid producing bacteria. However, it is not clear what is happening in the worm to elicit this neurological response and much work remains to determine this mechanism of action.

      While I can appreciate the careful nature of these worm behavioral assays, including a host of different controls, these studies lack cellular and molecular details, which reduces my overall excitement for the story. It is interesting that a clade of lactic acid bacteria (LAB) can improve associative learning in C. elegans. However, I was very underwhelmed when I got to the final figure, which very briefly touched on molecular mechanism (only to give DAF-16 dependence). Since it has previously been shown that daf-16 mutations impact taste avoidance learning (Nagashima et al. PLOS Genetics, 2019), the dependence on DAF-16 and its role in associative learning seemed predictable. For future submissions, this previous study on DAF-16 should be referenced in the manuscript. Moreover, data regarding dietary restriction and the eat-2 mutation appear to be misinterpreted. Thus, more attention and analysis should be dedicated to the effects of dietary restriction on their paradigm. I thought that it was interesting that a clade of LAB consistently reduced expression of the PHA-4 transcription factor and the authors might benefit from expanding upon this observation.

      In addition to molecular characterization, the manuscript provides little explanation at the cellular level. It is unclear which neurons or neuronal circuits are responsible for this phenomenon. Although mentioned in the discussion, this manuscript would benefit from close examination of the thermosensory circuit including the AFD and AIY neurons. How are these lactic acid bacteria ultimately signaling to the neurons? Do the LAB slow the rate of degeneration of either neuron? Is this phenomenon the result of lactic acid production or something else in the bacteria? Would it be possible to supplement lactic acid to worm media and produce the same result?

      This is an interesting phenomenon and requires more in-depth cellular and molecular characterization.

    1. Reviewer #2:

      In this manuscript, Knight et al. examine the genetic diversity in >12,000 publicly available C. difficile genomes in order to characterize genomic evidence of taxonomic incoherence among this genomically diverse pathogen. Their primary analysis employs average nucleotide identity thresholds to identify species boundaries, with secondary analyses examining core genome size changes, gene content, and estimated emergence dates. The authors' main conclusion is that the previously identified C. difficile cryptic clades CI-III are genomically divergent enough from the main clades C1-5 to warrant classification as different genomospecies. This paper is a useful contribution in benchmarking our understanding of the genetic diversity of C. difficile using all currently publicly available genomes, but the results are largely unsurprising given previous phylogenetic analyses involving clades 1-5 and CI-III, and the paper is therefore probably best suited for a specialty journal. Additionally, in some instances, the methods lack details, reducing their interpretability and reproducibility.

      Major Comments:

      1) Some claims are too strong and not supported by the data or literature, including the claim that the rise of community-associated CDI is likely due to the presence of C. difficile in livestock (Lines 53-54: far too little evidence to make such a sweeping claim), the statement of apparent rapid population expansion into clades C1-4 (Lines 278-279: only shown for certain sequence types and greatly impacted by observation bias), and the statement that these findings "impacts the diagnosis of CDI worldwide" (Lines 37-38: too grandiose given limited evidence of the clinical importance of the cryptic clades).

      2) Generally, it is hard to discern which sets of genomes and variants were used for each of the bioinformatic analyses that are described. If there are a limited number of genome sets it might be useful to define them in the results to allow the reader to more easily follow along and understand the scope of different analyses.

      3) The dated phylogenomic analyses methods would benefit from a more thorough assessment of model assumptions along with more description of the sources of bias and uncertainty at play. Specific questions are:

      • Was the temporal signal in the data evaluated?

      • What are the potential impacts of using a single clock model and demographic prior for such a diverse set of taxa?

      • Was the clock rate restricted to the cited 2.5 × 10⁻⁹ to 1.5 × 10⁻⁸ range? What clock prior distribution was applied?

      • Were relaxed clock priors explored?

      • What went into the selection of the demographic model prior in BEAST? Were alternative models evaluated?

      • The significant uncertainty in the divergence estimates should be emphasized/listed as a limitation.
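
      One common way to address the first question in this list is a root-to-tip regression: root-to-tip genetic distances from an undated tree are regressed on sampling dates, and a positive slope with a reasonable fit is taken as evidence of temporal signal. The sketch below shows that regression with the standard library only. The dates and distances are made-up toy numbers, not values from the manuscript, and a fuller assessment would typically also include a date-randomisation test.

```python
def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (slope, intercept, r_squared). The slope is a crude clock-rate
    estimate in substitutions per site per year.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Toy data (NOT from the manuscript): sampling year and root-to-tip
# distance in substitutions per site for four hypothetical isolates
sampling_years = [1990, 2000, 2010, 2020]
root_to_tip = [1.00e-6, 1.15e-6, 1.20e-6, 1.35e-6]

slope, intercept, r_squared = root_to_tip_regression(sampling_years, root_to_tip)
print(f"rate ~ {slope:.2e} subs/site/year, R^2 = {r_squared:.3f}")
```

      A slope near zero or a very low R² would suggest that the sampling window is too short, relative to the divergence involved, for the dated analyses to be well constrained.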

      4) Similarly, the pangenome analyses could be more thoroughly described, and the relevance of the core-genome size changes more robustly explored. Specifically:

      • How did the core genome change when excluding any of C1-5? Were these changes much different than when excluding CI-III?

      • The differences between Roary and Panaroo are notable, and potentially important for the microbial genomics community. More details should be provided on these results and how sensitive they are to the input parameters of the respective programs (e.g. collapsing paralogs in Roary and percent identity for orthologs). In addition, it is important to know if any filtering was done with respect to the quality of assemblies, which could have a significant impact on Roary's behavior.
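
      The leave-one-clade-out comparison asked for in the first bullet can be sketched from a gene presence/absence matrix as follows. This is a minimal illustration that assumes a strict definition of the core genome (present in every genome considered), whereas pangenome tools usually allow a softer threshold. The genome names, clade labels, and gene families are hypothetical.

```python
def core_genes(presence, genomes):
    """Gene families present in every genome listed in `genomes`.

    `presence` maps genome name -> set of gene family identifiers.
    """
    gene_sets = [presence[g] for g in genomes]
    return set.intersection(*gene_sets) if gene_sets else set()


# Hypothetical toy data (NOT from the manuscript)
clade_of = {"g1": "C1", "g2": "C1", "g3": "C2", "g4": "C-I"}
presence = {
    "g1": {"a", "b", "c", "d"},
    "g2": {"a", "b", "c", "d"},
    "g3": {"a", "b", "c", "e"},
    "g4": {"a", "b", "f"},
}

# Core genome across all genomes
core_size_all = len(core_genes(presence, list(presence)))

# Core genome size after leaving out each clade in turn
core_size_without = {}
for clade in sorted(set(clade_of.values())):
    kept = [g for g, c in clade_of.items() if c != clade]
    core_size_without[clade] = len(core_genes(presence, kept))

print("all genomes:", core_size_all)
print("leave-one-clade-out:", core_size_without)
```

      In this toy example only the removal of the divergent clade enlarges the core genome; the reviewer's question is whether the real data show a similarly distinct effect for CI-III compared with any of C1-5.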

    2. Reviewer #1:

      General Assessment:

      The work presented by Knight et al. in "Major genetic discontinuity and novel toxigenic species in Clostridioides difficile taxonomy" is of excellent quality and spans several of the themes of eLife. The manuscript provides a thorough and robust examination of publicly available C. difficile genomes, to deliver a much-needed update of C. difficile phylogeny, in particular the cryptic clades of C. difficile. However, some further clarifications could be included to confirm whether the cryptic clades of C. difficile, and the 26 unclassified STs (which seemingly form 4 distinct clusters), should indeed be assigned to the Clostridioides genus, distinct from both C. mangenotii and C. difficile.

      Specific comments:

      Lines 96-97 and Figure 2: Figure 2 suggests the 26 unclassified STs form at least 4 distinct clusters, yet these STs are classified as outliers. Could you please comment on why these are considered outliers? Or do these STs represent new cryptic clades? C-IV, C-V etc.? And do these unclassified STs also fit into the criteria for the novel independent Clostridioides genomospecies?

      Lines 161-162; Table 1: C. mangenotii is referred to as Clostridioides mangenotii on lines 161-162, but has been listed as Clostridium mangenotii in table 1. Was this intentional? Or should this be Clostridioides mangenotii as C. difficile is also listed as Clostridioides difficile?

      Figure 6: Many of the numbers and symbols on the figure are difficult to see e.g. Figure 6A the values listed above each data point are extremely small. Can these values/symbols be increased?

      Lines 224-225: Given that C. difficile strains lacking tcdA and tcdB can still cause infections, consider rephrasing "indicating their ability to cause CDI".

      Figure 7: As with Figure 6, many of the numbers and symbols on the figure are difficult to see. Can these values/symbols be increased?

      General comments:

      Were the unclassified STs included in the species wide ANI analyses in Figure 3? If similar analyses were performed for these STs and given the clusters that are presented in Figure 2 would this support the idea that they may also fit into the criteria for the novel independent Clostridioides genomospecies?

      Similarly, were these same unclassified STs included in the BactDating and BEAST analyses? Or the pairwise ANI and 16S rRNA value comparisons in Figure 5? Or the pangenome and toxin gene analysis also presented in Figures 6 and 7? And would this add further strength to the idea that these "outliers" could be the first typed representatives of additional genomospecies?

      Lastly, your conclusions are a little too on the fence. You have presented sufficient evidence to suggest that the cryptic clades of C. difficile likely represent novel independent Clostridioides genomospecies, but you dilute the importance of this throughout the discussion and conclusions. Although controversial, the evidence provided gives credence to these claims, and the text should be changed to reflect this.

    3. Summary: We appreciate this study and find that the conclusions that reclassify Clostridioides are largely justified by the data/analysis. The major concern is that the work represents the application of standard approaches to refine species classification, as opposed to either proposing a novel approach to classify species or defining a split that might be more surprising and/or clinically significant (e.g. Kumar et al. Nature Genetics, 2019). Consequently, despite being a useful contribution to the literature, we believe it is more suitable for a specialized audience.

      Reviewer #1 opted to reveal their name to the authors in the decision letter after review.

    1. Reviewer #2:

      Recombinant antibodies are the most common and powerful reagents in life science research to identify and study proteins. Yet, every single antibody should always be validated and carefully tested for its relevant application, to ensure constructive and reproducible scientific endeavor. I was thus extremely pleased to review the manuscript of Terkild Buus et al., as it provides a careful assessment of oligo-conjugated antibody signal in CITE-seq. The authors tested four variables (antibody concentration, staining volume, cell numbers and tissue origin) and clearly showed that antibody titration is a crucial step in optimizing a CITE-seq panel. The authors found that, as a general rule, concentrations in the 0.625 to 2.5 µg/mL range provide the best results, while the concentrations recommended by vendors, in the 5 to 10 µg/mL range, increase background signal.

      In my opinion, the study is well-performed and may serve as a guideline to accurately validate antibodies for CITE-seq; as a consequence, I have only minor comments.

      • As stated by the authors, the starting concentration used for each antibody was based on historical experience and assumptions about the abundance of the epitopes. This approach may not be ideal, and the optimal concentration may have been missed. Do the authors think that a proper titration would be an advantage? Maybe this could be discussed in the text.

      • By testing four variables (see above), the authors showed that they could define the optimal conditions to reduce background signal and increase the sensitivity of antibodies, and thereby improve CITE-seq outcomes. Nevertheless, the authors rely on the assumption that all antibodies used in their panel are specific for their targeted antigens. I am not asking here to test the specificity of every single antibody used in the study, as this would be a colossal amount of work. But I feel that this aspect should be discussed in the manuscript, especially when an "uncommon" antibody is intended to be used in the CITE-seq panel; the specificity of such an antibody should indeed be tested prior to its use.

    2. Reviewer #1:

      In the study by Buus et al., the authors set out to address an important need to understand how oligo-conjugated antibodies should be optimally utilized in droplet-based scRNA-seq studies. These techniques, often referred to as CITE-seq, complement techniques such as flow cytometry and mass cytometry yet also further extend them by the ability to jointly measure intra-cellular RNA-based cell states together with antibody-based measurements. As is the case with flow cytometry, manufacturers provide staining recommendations, yet encourage users to titrate antibodies on their specific samples in order to derive a final staining panel. Based on the ability to stain with hundreds of antibodies jointly, few studies to date have assessed how the antibodies present in these pre-made staining panels respond to a standard titration curve. In order to address this point, this study tests two dilution factors, staining volume, cell count, and tissue of origin to understand the relationships between signal and background for a commercially available antibody panel. They arrive at the general recommendation that these panels could be improved, grouping various antibodies into distinct categories.

      This study is of general interest to the scRNA-seq and CITE-seq communities as it draws attention to this important aspect of CITE-seq panel design. However, it would stand to be substantially improved by not only providing suggestions but also testing at least one, if not more, of their suggestions from Supplementary Table 2, and preferably performing experiments using more technical replicates or biological replicates. As it stands now, the study is largely based on one PBMC and one lung sample, that were stained once with each manipulation as far as can be gathered from the Methods.

      Major comments:

      1) Given the title is improving oligo-conjugated antibody... it would be important to functionally test one of the suggestions. We would suggest a full titration curve of selected antibodies, perhaps one from each of the categories, but if cost is a concern at least two or three antibodies, to identify how titration impacts antibodies, and especially those in categories labeled as in need of improvement. Relatedly, if the idea is that antibodies lacking a cognate receptor (such as gD-TCR) lead to general background spread, does spiking in a cell that is a known positive in increasing ratios remedy this issue by acting as a target for the antibodies? Does adding extra washes help to remedy these issues of background?

      2) Another way of improving these panels is through reducing the costs spent on both staining and, perhaps more importantly, the sequencing-based readouts. Several times in the manuscript (at line 77 or line 277, for example) it is alluded to that the background signal of antibodies can make up a substantial cost of sequencing these libraries. However, no formal data on cost is presented, which would be important to formalize the authors' points. It would be important to provide cost calculations and recommendations on sequencing depth of ADT libraries based on variation of staining concentration. Relatedly, in the methods, the sequencing platform and read depth for ADT libraries were not discussed, nor are the RNA-seq quality control metrics provided other than a mention of ~5,000 reads/cell targeted. This is important to report in all transcriptomic studies, and especially a methods development study.

      3) One of the powerful elements of joint multi-modal profiling, as mentioned in the title, is to be able to measure protein and RNA from a single cell. This study does not formally look at correlation of protein and RNA levels, and whether a decrease in concentration of antibody either improves or diminishes this correlation. This would be important to test within this study to ensure that decreasing antibody levels does not then adversely affect the power of correlating protein with RNA, and whether it may even improve it.

      4) How was the lack of antibody binding determined for Category E? CD56 is frequently detected on NK cells in peripheral blood, CD117 should be detected on mast cells in the lung, and CD127 should be found on T cells, particularly CD8+ T cells. From inspecting Figure 1E, it appears as if all three of these markers are detected on small but consistent cell subsets. As the clusters are only numbered and no supplementary table is provided to help the reader in their interpretation, it is difficult to determine if these represent rare but specific binding, or have not bound with any specificity.

      5) References: At 14 references, the paper overall could benefit from a more comprehensive citation of related literature including flow cytometry and/or CyTOF best practices for antibody staining and dealing with background, and joint RNA and protein measurement from single cells.

    1. Reviewer #3:

      The authors present a simple model that explains important outstanding controversies in the field of long-range gene regulation. These controversies include the fact that insulation boundaries tend to be weak; that acute inactivation of CTCF or cohesin (which leads to inactivation of insulation boundaries) leads to only minimal changes in gene expression; and that in live cells enhancer-promoter contacts appear not to be correlated with transcriptional bursting. The model involves a futile cycle of tag addition and removal from promoters, stimulation of more tag addition when tag is already present, and stimulation of tag addition by contacts with distal enhancers. The authors show that such a model explains all the above controversies, and indicate that the controversies are not inconsistent with mechanisms where long-range gene activation is driven by physical contacts with distal regulatory elements.

      The authors have explained and explored the properties of the model well. I have only minor comments.

      1) An alternative explanation for TAD-specific enhancer action is that an E-P interaction within a TAD (between two convergent CTCF sites), one that is brought about by extruding cohesin, is not equivalent to an interaction that occurs between two loci on either side of a CTCF site and that can be a random collision that is not mediated by extruding cohesin. In other words, two interactions can be of the same frequency but can be of a very different molecular nature. I agree that this model would not explain the results of the experiment where cohesin is acutely removed.

      2) At the beginning of the introduction the authors introduce TADs. I recommend that the authors present this in a more nuanced way: compartment domains also appear as boxes along the diagonal, an issue that has led to some confusion in the chromosome folding field. This reviewer believes TADs are those domains that strictly depend on cohesin-mediated loop extrusion, whereas compartment domains do not. If the authors agree, perhaps they can rewrite this section?

      3) If I understand the model correctly, the nonlinearity arises because of the increased rate of tag addition when tag is already present. The authors then speculate histone modifications can be one such tag. However, there are only so many sites of modification at a promoter. Can the authors analyze how the possible range of tag densities affects performance of the model? Is the range required biologically plausible?

      4) Can the authors do more analysis to explore how rapid changes in gene expression may occur (e.g. upon signaling a gene may go up within minutes)? How much more frequent does the E-P interaction need to be for rapid switch to the active promoter state? Can the authors do an analysis where they change the rates of the futile cycle upon some signal: at what time scale does transcription then change (keeping E-P frequency the same)?

    2. Reviewer #2:

      The main analyses of the study compare previously published experimental observations from Hi-C and ORCA to predictions of the authors' "futile cycle" model. The predictions are derived from simulations and differential equations analysis of the model as a dynamical system. Given its centrality to the manuscript, we recommend describing this overall strategy in more detail in Results. For example, at line 124 (Pg. 4) the authors could talk about how the simulations are done, including where the variability comes from (e.g., random starting conditions vs. probabilistic events vs. different parameters).

      Xiao et al. make several key assumptions to dramatically simplify their model. Namely, it is assumed that promoter modification and transcription are equivalent and that enhancer-promoter contact influences transcription instead of transcription influencing structure. Steady-state equilibrium must also be assumed. It would be helpful if the authors explicitly stated these assumptions and provided references to support their being reasonable.

      It is not totally clear why the authors decide to call their proposed approach the futile cycle model. There are similarities to other well-known models in biochemistry and biophysics that should be noted. It might make sense to simply call this a mechanistic model of cooperative promoter activation. If the authors stick with "futile cycle", the relationship between promoter activation through tags and metabolic signaling should be described in more detail.

      There is also an opportunity to emphasize that the proposed model is not necessarily absolutely correct, but one of many plausible models that can produce a non-linear relationship between genome structure (enhancer-promoter contact) and transcription. Any thoughts on other models that could generate similar dynamics would be a useful discussion point. There are parallels to both sigmoidal dose-response curves, where drug concentration is plotted against response, and transcription factor binding curves, where free ligand concentration is plotted against the fraction bound. We recommend providing background context on these types of models or the Hill equation to illustrate why non-linear behavior is or is not surprising given the proposed model.

      For clarity, it would be helpful to discuss model parameters in greater detail. First, we suggest noting which parameters shift the location of the curve and which increase the steepness of the curve. Second, we recommend including a phase diagram exploring when sigmoidal behavior and any other key model predictions arise across parameter space. In what circumstances does hypersensitivity or time lag emerge? The authors demonstrate that a narrow set of parameters is sufficient to produce a super-linear relationship between enhancer-promoter contact and transcription in Figure 6. One potential dilemma is this model's ability to explain many experimental observations by indicating that minimal changes all occur in the sub-linear regime while observable changes occur in the super-linear regime. Given that one needs specific parameters to replicate an example of the hyper-linear regime (including at least three degrees of stimulation and increasing stimulation of the successive states), it could be valuable to demonstrate how large the plausible parameter space is. Without an exhaustive search across the space of minimal parameters, it is not clear when this property emerges or how common it is within the full parameter space. The authors could vary model parameters and plot a grid visualizing behavior (e.g., steepness of the curve or Hill coefficient).
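
      As an illustration of the kind of parameter grid suggested above, the following minimal sketch computes an effective Hill coefficient from the inputs that give 10% and 90% of the maximal response, n_eff = ln(81) / ln(EC90/EC10). It uses a plain Hill function as a stand-in response curve; this is an assumption for illustration only and is not the authors' futile cycle model. The function names, parameter values, and search bounds are all hypothetical, and the response is assumed to be monotonically increasing and normalized to a maximum of 1.

```python
import math


def hill(x, k, n):
    """Hill function: fraction of maximal response at input x (stand-in curve)."""
    return x**n / (k**n + x**n)


def find_input(response, target, lo=1e-6, hi=1e6, iters=200):
    """Bisect in log space for the input at which `response` reaches `target`.

    Assumes `response` increases monotonically and crosses `target` in [lo, hi].
    """
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if response(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


def effective_hill_coefficient(response):
    """Steepness summary: ln(81) / ln(EC90 / EC10)."""
    ec10 = find_input(response, 0.1)
    ec90 = find_input(response, 0.9)
    return math.log(81) / math.log(ec90 / ec10)


def steepness_grid(k_values, n_values):
    """Effective Hill coefficient for every (k, n) pair in the grid."""
    return {
        (k, n): effective_hill_coefficient(lambda x, k=k, n=n: hill(x, k, n))
        for k in k_values
        for n in n_values
    }


if __name__ == "__main__":
    for (k, n), n_eff in steepness_grid([0.5, 1.0, 2.0], [1, 2, 4]).items():
        print(f"k={k:<4} n={n}  effective Hill coefficient={n_eff:.3f}")
```

      Replacing the stand-in curve with the model's simulated contact-to-transcription relationship would show which parameters shift the curve (here k) and which change its steepness (here n).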

      Images throughout the manuscript are low resolution, making the figures difficult to read. Increase the resolution of figures throughout, especially those containing text (Fig 6A).

    3. Reviewer #1:

      Xiao et al. describe a kinetic model of enhancer-promoter interactions, which the authors use to explain the changes in transcription levels upon disruption of genomic contacts within topologically associated domains (TADs). The model uses the law of mass action to describe activity of promoters and enhancers, which are proposed to be able to accommodate multiple transcription activation tags. The authors use the model to explain the nonlinear relationship between the genomic contact frequencies within TADs and their corresponding transcription rates. They recapitulate the superlinear relationship between the changes in genomic contact probabilities and transcription rates within TADs observed in their recent experiments (Mateo et al, 2019). Inspired by the futile cycle of cell signaling, their model incorporates multiple tagging of promoters allowing for transient amplification of transcription rates.

      Conceptually, this work is interesting and the model suggests possible reconciliation of seemingly contradictory experimental observations reported earlier.

      However, the manuscript in its current form fails to substantiate many of its claims.

      Here are my major concerns:

      1) The presentation of the model is unclear. It is currently presented in the text, lines 110-122, as a purely qualitative description. The authors define only rates in the text; definitions of other model parameters are not present. For example, E and a are not specifically defined in the text or Methods section. Since both terms "enzyme" and "enhancer" are used, and in fact "enzyme tagging" and "enhancer tagging" occur simultaneously in the model, it is not possible to say for sure which one the authors mean at any given point, and thus the Methods section can be interpreted in different ways. Moreover, the cartoon is missing a legend confirming which molecular player is which. The figure caption mentions only that the green triangles are the tags, but no other parts of the cartoon are explained. Taken together, this makes it very difficult to verify the mechanics of the model.

      • The authors should provide a detailed technical description of their model directly in the text, including description of their parameters, list their constitutive equations and identify all parameters in their cartoon Fig. 1C.
      • Axes labels in all figures should be expressed in the parameters/variables of the model (as in Fig. 6C-D) directly connecting to inputs/outputs of the model.

      2) Due to the lack of description, in many sections it is not clear what the specific inputs and outputs of the model are (e.g. Fig. 2).

      3) The Methods section describes the chemical kinetics of the suggested reactions and the insulation score calculations. However, it is not clear how these inform each other: how are contact-frequency maps chosen/computed and cross-referenced with the local E-P kinetics?

      4) In the Methods section, it appears that in lines 577-580 of the model description, the mass is not conserved.

      5) In lines 587-588, the index of k is 2(n+1), which equals 2n+2, but in the next line the following assumption is made: 2n+1 → n+1.

      6) The authors make assumptions that their kinetic considerations hold for n>2. What is the evidence?

      7) The authors observe hysteresis in median transcription rate as a function of enhancer contact frequency. However, the presented violin plots suggest a presence of two states, one with low and one with high transcription rates. In the intermediate regime of enhancer contact frequency, where authors report hysteresis, the violin plots show bimodal distributions suggesting coexistence of these two states. This would suggest that the system exists in and switches between two distinct states with a discontinuous transition, instead of a continuous hysteretic behavior as suggested by the median behavior.

      8) The language of the paper is often not technically precise with qualifiers missing, which could lead to ambiguities and misinterpretations. Here are some examples:

      • p. 1, line 10, "difference in contact across TAD borders is usually less than twofold"
      • p. 1, line 17, "results from recent cohesion disruption"
      • p. 2, line 71, "A simple model of hypersensitivity to changes in contact frequency"

      9) On p. 13, line 483, authors define Ostwald ripening as given by weak multivalent interactions; however, Ostwald ripening is a thermodynamic process. In addition, they propose that liquid condensates become larger due to Ostwald ripening, but there are also other processes that may occur, such as coalescence of condensates, which would also lead to larger condensates.

      10) At the beginning of the Discussion section authors state they will propose future experiments in each section. However, in some of the sections it is not clear what specifically authors are proposing. These suggestions should be made clearer.
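
      The distinction raised in point 7 above, between a median that shifts and a population that redistributes between two states, can be illustrated with a minimal sketch. It draws values from a hypothetical two-state mixture and reports both the median and the fraction of samples in the high state. The state means, spread, threshold, and mixing fractions are invented for illustration and are not taken from the manuscript.

```python
import random
import statistics


def two_state_samples(p_high, n=2001, low=1.0, high=10.0, sd=0.5, seed=0):
    """Draw n values from a mixture of a low and a high state.

    p_high is the probability that a sample comes from the high state.
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    return [rng.gauss(high if rng.random() < p_high else low, sd) for _ in range(n)]


def summarize(samples, threshold=5.0):
    """Return the median and the fraction of samples above the threshold."""
    frac_high = sum(s > threshold for s in samples) / len(samples)
    return statistics.median(samples), frac_high


if __name__ == "__main__":
    for p_high in (0.3, 0.5, 0.7):
        median, frac_high = summarize(two_state_samples(p_high))
        print(f"p_high={p_high:.1f}  median={median:5.2f}  fraction high={frac_high:.2f}")
```

      In this toy setting the median moves from the low state to the high state as the mixing fraction crosses one half, even though both states are populated throughout. Reporting the fraction in each state alongside the median would make such coexistence visible.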

    4. Summary: The work describes a simple theoretical model for enhancer action that explains several major controversies in the field of long-range gene regulation and the role of topologically associating domains and insulating boundaries in modulating enhancer-promoter interactions. Further, the model makes predictions that can be experimentally tested. This is valuable for the field of gene regulation.

      Reviewer #2 and Reviewer #3 opted to reveal their name to the authors in the decision letter after review.

    1. Reviewer #2:

      This manuscript by Diamanti et al. describes their study on how visual neurons responded to identical visual stimuli at two different locations along a virtual linear track. Extending their previous result that spatial location modulates the neuronal activities in the primary visual cortex (V1), they now demonstrate that similar spatial modulation also occurred in the higher visual areas (HVAs), but not so much in a lower visual area, the lateral geniculate nucleus (LGN). In addition, they show that the modulation, measured by a spatial modulation index (SMI), was stronger when animals had more experience in the track and when the animals were actively performing a task rather than passively viewing the same virtual track. The authors have been responsive to comments by previous reviewers at a different journal. Data are appropriately analyzed and clearly presented.

      Since the finding that visual neurons are spatially modulated similarly to hippocampal place cells in spatial navigation tasks (Ji and Wilson, 2007; Haggerty and Ji, 2015; Fiser et al., 2016; Saleem et al., 2018), there has been increasing interest in identifying the source(s) of this modulation. This study adds new evidence to this puzzle, suggesting that it is more likely either generated within the visual cortex or top-down propagated from higher brain areas, rather than bottom-up propagated from the thalamus. This is an important contribution. However, there are concerns, mainly on the data interpretation and the clarification of the main conclusion, as elaborated below.

      1) Because experience and task engagement enhanced spatial modulation, the authors concluded in the abstract that "Active navigation in a familiar environment, therefore, determines spatial modulation...". This conclusion is too strong and not well-supported by the data. First, spatial modulation on Day 1, when the task was novel, was lower than on later days, but it was already much higher than 0 (Fig. 1h). Also the individual neuron data (Fig. 1e) display clear spatial modulation on Day 1. Therefore, "familiar environment" is not a requirement. Second, spatial modulation during passive viewing was much higher than 0 and was correlated with that during active navigation, as shown in Fig. 4e - Fig. 4l. Therefore, "active navigation" is not a requirement either. It is true that both active navigation and familiar environment enhanced spatial modulation. They did not "determine" spatial modulation.

      2) Related to the point above, the presence of spatial modulation in passive viewing reminds us that these cells in the visual system were still mainly driven by visual stimuli. The data in Fig. 4e,f are especially telling: the modulation in V1 was similar and highly correlated between active navigation and running replay. In addition, it is clear from all the raw traces in Fig. 1 and Fig. 2 that these cells did respond to the two segments with identical stimuli reliably with two peaks. The spatial modulation was just a change in one of the peaks. So the nature of the modulation is a "rate remapping" of the expected, classical visual responses. I believe, in order to maintain the big picture of what drives the activities of these neurons, it is beneficial to clarify that the "spatial modulation" is a modulation on top of the expected visual responses. This message is not explicitly conveyed in the current manuscript.

      3) The authors stated that spatial modulation is "largely absent in the main thalamic pathway into V1". This was based on the significantly weaker SMIs in LGN than those in V1 and HVAs. However, it is unclear whether the SMIs in LGN were still significant. The SMI values for both LGN boutons (Line #100) and LGN units (Line #130) might be significantly different from zero. The p-values of the statistical comparison should be given in both cases. Second, Figure 3 - figure supplement 1 b,f show that the SMI values in LGN could be predicted by spatial modulation, but not by visual stimuli alone or behavioral variations, just like those in V1 and HVAs. This seems to me good evidence for the presence of spatial modulation in LGN. Therefore, it is my opinion that the data do not support the complete lack of spatial modulation in LGN, but do clearly demonstrate weaker spatial modulation in LGN than in V1 and HVAs.
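
      One simple way to obtain the requested p-value for a set of SMI values against zero is a sign-flip permutation test, sketched below. This is an illustrative choice rather than the test used in the manuscript, and it assumes that SMI values are symmetric around zero under the null hypothesis (a Wilcoxon signed-rank test is a common alternative). The function name and the example values are hypothetical.

```python
import random


def sign_flip_p_value(values, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test of the mean against zero.

    Under the null hypothesis each value is equally likely to be positive
    or negative, so signs are flipped at random and the absolute mean of
    each permuted set is compared with the observed absolute mean.
    """
    rng = random.Random(seed)
    observed = abs(sum(values) / len(values))
    count = 0
    for _ in range(n_perm):
        flipped = [v if rng.random() < 0.5 else -v for v in values]
        if abs(sum(flipped) / len(flipped)) >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (count + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical SMI values for illustration only; not data from the paper.
    example_smi = [0.05, 0.08, 0.02, 0.10, 0.04, 0.07,
                   0.03, 0.06, 0.09, 0.05, 0.04, 0.08]
    print(f"p = {sign_flip_p_value(example_smi):.4f}")
```

      A small p-value here would indicate that even weak modulation is reliably different from zero, which is a separate question from whether it is weaker than in V1 or HVAs.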

    2. Reviewer #1:

      This paper investigates the modulation of spatial signals in higher order visual areas. A number of the findings are novel and interesting, including that signals in higher visual areas are not more influenced by spatial position than signals in V1, that this modulation is not a general feature of the entire visual circuit (i.e. LGN boutons in L4 of V1, as well as LGN units, show very little spatial modulation), and that spatial modulation decreases when mice are watching a replay of tunnel traversals. Overall, I think this paper provides new insight regarding position coding in visual systems. However, there are some points that should be addressed.

      1) The imaging data is from mice with different genetic backgrounds, as well as a mixture of gcamp6f and 6s. In addition, different reward protocols were used for different mice. Although the authors state in the methods that none of these factors impact their results, it would be good to include some quantifications to this effect (e.g. they could show the distribution of SMI for 6f data vs 6s data). While I don't expect the major observations to change if it turns out that some of these factors have a systematic effect, it could affect portions of the results where the dataset is split up - for example in the comparison between different higher visual areas, and the observation that spatial modulation appears to vary with receptive field location.

      2) The authors state that it is to be expected that LGN neurons respond more strongly in the first half of the corridor due to contrast adaptation mechanisms. However, I did not see any quantification that could support this statement.

      3) When looking at the spatial modulation index, the authors switch between using median (e.g. Fig 1 and 2) and mean (Fig 4), t-test and rank-sum - and sometimes there is missing information regarding which (mean or median) they are reporting. The authors need to include more detail regarding these statistics.

      4) It was not clear to me if the authors are only imaging from layer 2/3 or if they also attempted to image deeper layers.

      5) Throughout the paper, the authors use 'firing rate' to refer to deconvolved calcium signal. Although this is stated in the methods, this wording can be misleading, especially since the paper also contains extracellular recordings of spiking activity.

      6) It was not clear to me how the dotted lines (e.g. Fig 1 b) were calculated.

    3. Summary: This paper investigates the modulation of spatial signals in higher order visual areas in mice navigating virtual reality environments. Previous work demonstrated that the spatial position of an animal modulates neural activity in the primary visual cortex (V1). Here, the authors demonstrate that this spatial modulation, however, is not a general feature of the visual circuit. Similar spatial modulation occurs in higher visual areas but not in lower visual areas, such as the lateral geniculate nucleus. Moreover, this work finds that spatial modulation was stronger when animals had more experience on the track and when the animals were actively performing a task, rather than when the animal was passively viewing the same virtual track. Since the first reports that visual neurons show modulation by spatial position during spatial navigation tasks, similar to that observed in hippocampal place cells, the source of this modulation has been an open question. This work adds new insight regarding this question, suggesting that it is likely either generated within the visual cortex itself or propagated in a top-down manner from higher brain areas, rather than in a bottom-up manner from the thalamus.

    1. Reviewer #3:

      In this interesting paper, the authors compare MEG recordings of svPPA patients and 44 healthy controls during living vs. non-living categorization tasks. Both patients and the control group performed this task with similar accuracy. In addition, svPPA patients showed greater activation over bilateral occipital cortices and superior temporal gyrus, and inconsistent engagement of frontal regions. The authors conclude that patients with svPPA compensate for their semantic deficit by recruiting regions involved in perceptual processing.

      This is a well written study and the results are presented clearly. The findings are novel and interesting.

      1) One question for clarification is whether the recruitment of the occipital areas in semantic PPA is truly "compensatory" - does it indicate a shift of resources due to the anterior temporal atrophy? Is the recruitment of the parieto-occipital regions associated with more accurate performance?

      2) The main results concentrate on the differences between patients and controls in the low gamma range. There are also significant effects in the other frequency bands (e.g., high gamma, beta and alpha). Could the authors discuss the functional significance of these effects?

    2. Reviewer #2:

      Borghesani and colleagues aimed to understand how dysfunction in the ATL alters the dynamic activity during semantic categorization. To achieve this, they contrast MEG responses between patients with svPPA and age-matched healthy controls. Both groups show similar profiles of behavioural performance on the task, and broad similarities in MEG responses. Critically, svPPA patients show enhanced gamma synchronization in the occipital lobe compared to controls, while gamma synchronization was correlated to task RTs.

      In general, I found the manuscript interesting, with the major strength being the application of MEG analyses to a clinical population during a cognitive task. In terms of improvements, I think the results could be more fully characterized, which would allow for more expansive interpretations and inferences.

      Major comments:

      1) As the paper is about 'Neural dynamics', I felt this aspect could be developed, with the timing of the effects characterized further, and considered more in relation to the conclusions. For example, the main finding is the increased occipital gamma response in svPPA compared to controls. Looking at Figure 3, there is a peak in the svPPA group near 200 ms, and very little synchronized activity in the control group. This is interesting as there are many ways we could have seen svPPA > controls, but this suggests that the gamma synchronization response associated with compensation is specific to the svPPA group (and largely absent from controls - also from Supp fig 1), and is distinguished from an initial visual evoked response (peaking ~100 ms). I would recommend discussing and characterizing the dynamics of this effect more, such as what a later occipital effect could tell us about dynamics given ATL dysfunction? Is this increase a result of a lack of top-down effects from ATL? I think these kinds of issues could be explored and discussed more.

      2) The occipital gamma effect appears to be located in the primary visual cortex, which might suggest the effects are not related to higher-level perceptual features (such as has eyes, teeth) as the authors suggest, but rather to low-level visual effects. Do the authors perhaps think the effects could relate to enhanced processing of visual details (as related to the ideas of Hochstein and Ahissar's reverse hierarchy), or to additional visual input following a visual saccade?

      3) The VBM results for the svPPA patients were surprising given that all the atrophy appeared in the left hemisphere. There can be hemispheric differences in svPPA, but is this a true lateral pattern (meaning the right ATL is intact) or a product of VBM being run so that the most atrophied hemisphere is shifted to the left side? If the VBM maps are correct, and the svPPA patients are only showing left hemisphere atrophy, then what does this suggest about the role of the right ATL, and the bilateral nature of the occipital increases in svPPA?

      4) Both svPPA patients and healthy controls achieved around 80% accuracy in the categorization task. This seems surprisingly low given, (1) the task (living vs. nonliving after seeing the image for 2 seconds), (2) that all the images were pretested and had high name agreement, and (3) that items were repeated on average 2.5 times. Is there something that explains this low performance for all individuals?

    3. Reviewer #1:

      This study examines MEG activity in a picture categorization task (decide living or non-living) in a sample of 18 patients with semantic variant PPA, compared to 18 controls. As svPPA is a rare (but scientifically informative) disorder, the sample size is impressive, and given that relatively few MEG studies exist in PPA at all, this is an interesting dataset. The authors show differences in engagement of oscillatory activity, specifically increased low-gamma ERS in occipital cortex and increased beta ERD in the superior temporal gyrus. The authors interpret this as reflecting increased engagement of / reliance on early perceptual mechanisms for completing the task, as opposed to semantic identification of the picture.

      Major concerns:

      1) My biggest methodological issue with this paper relates to a very old debate in neuroimaging that still comes up all the time: the choice of statistical threshold. Using a high threshold prevents false positives, but may also lead to false negatives, and I fear that is the case here, with the high threshold contributing to an unrealistic impression of spatial specificity in MEG. It is obvious from the average responses in both groups that these oscillatory responses are widespread throughout the brain. Indeed, the alpha and beta responses are significant in the majority of cortical voxels. This basic property of the responses should be presented clearly and prominently in the paper - I don't think it's appropriate to put it in supplementary information where only a minority of readers will even see it. The authors then use what I think is an extremely high and conservative statistical threshold to contrast differences between the two groups. P<.005 uncorrected is a highly conservative threshold already, even before cluster-thresholding is added (although with data as smooth as MEG beamforming solutions, cluster-thresholding is unlikely to change anything). Basically, this makes only the strongest part of the activation survive, and it is valid to conclude that a significant group difference exists there (protected from Type 1 error), but this can give a false impression that the difference is specific to that region. I think a more realistic characterization of the results would involve measuring differences in the strength of the responses between groups on a broader level, possibly the sensors or in large ROIs - and not ROIs pre-selected to show a dramatic difference by first searching the whole brain for the most significant effects - that is the classic "double-dipping" fallacy in neuroimaging.

      2) Similarly, the ERD/ERS in each frequency band is treated as a separate entity, ignoring the fact that these bands are arbitrary and frequency is a continuous quantity. This matters because much is made of the fact that PPA participants exhibited greater ERS in the low-gamma range, and that this was correlated with reaction time. Supplementary figure 1 shows that both groups had strong occipital ERS in the high-gamma range, but only PPA showed it in the low-gamma range as well. This suggests that the ERS in the PPA group may simply have been shifted to a lower frequency range. A fuller characterization of these group differences via time-frequency analysis and/or power spectral analysis would help clarify what is going on here.

      3) It is surprising that PPA participants only exhibited increased MEG responses compared to controls - assuming that both gamma ERS and beta ERD can be interpreted as increased neural activation, which is a reasonable assumption based on the literature. No decreases in the PPA group are found, and thus the observed increases can be plausibly attributed to compensatory processes as framed by the authors. However, I am concerned about the role of certain analysis choices in producing this data pattern. In particular, the authors state (line 611): "To remove potential artifacts due to neurodegeneration or eye movement (lacking electrooculograms), we masked statistical maps using patients' ATL atrophy maps (see section MRI protocol and analyses), as well as a ventromedial frontal mask."

      It is not clear whether this masking was done in group space from average atrophy maps, or on an individual level. In either case, I don't think this is well justified. I don't know any physical mechanism by which tissue undergoing neurodegeneration can be said to generate an artifactual signal. Atrophied tissue still contains living neurons with ionic currents; these are real signals not artifacts, and furthermore, atrophy is a continuous process with tissue further from the epicenter also undergoing similar neurodegenerative mechanisms. Atrophied tissue may well generate electromagnetic signals that are different from healthy tissue, and such differences should be included in this paper. I think that there may be regions of hypoactivation as well as hyperactivation in this PPA group. If the hypoactivation localizes to atrophied tissue and the hyperactivation to other regions, that will bolster the case that we are seeing compensatory processes, but it isn't certain with half the story masked. I also don't really see statistical masking of the frontal region as a valid solution to eye movement artifacts. The authors would have to present evidence that the region that they masked corresponds to the region potentially affected by eye movements. However, many studies have found that beamforming already does a pretty good job of removing ocular artifacts from estimated brain signals, except for very close to the eyes.

      4) The correlation with reaction time in the occipital cortex is consistent with the idea that the ERS there may reflect compensatory overreliance on perceptual information, but it isn't conclusive. The authors suggest that PPA patients are able to categorize the stimuli correctly based on visual features, but are unable to name them. What about testing for correlations with the out-of-scanner behavioural measures that established that the patients have a naming deficit? It would strengthen the case if atrophy or hypoactivation (see comment above) correlated with the naming deficit.

    4. Summary: Borghesani and colleagues aimed to understand how dysfunction in the anterior temporal lobe (ATL) alters dynamic activity during semantic categorization. They contrast MEG responses between 18 patients with semantic variant Primary Progressive Aphasia (PPA) and 18 age-matched healthy controls. Both groups show similar profiles of behavioural performance on the task, and broad similarities in MEG responses. Critically, however, svPPA patients show enhanced gamma synchronization in the occipital lobe compared to controls. The authors interpret this as reflecting increased engagement of / reliance on early perceptual mechanisms for completing the task, as opposed to semantic identification of the picture.

      Overall, the reviewers found the manuscript interesting. As svPPA is a rare (but scientifically informative) disorder, the sample size is impressive, and given that relatively few MEG studies exist in PPA at all, this is an interesting dataset. However, the general opinion is that the results could be more fully characterized, which would allow for more expansive interpretations and inferences.

      This manuscript is in revision at eLife.

      Reviewer #2 and Reviewer #3 opted to reveal their name to the authors in the decision letter after review.

    1. Reviewer #3:

      Neuronal ensembles have been shown by this lab and others to constitute one basic functional unit for the representation of information in cortical circuits. It is therefore important to determine how stable these blocks of representation might be. If these ensembles were preserved across time and sensory stimuli, this would indicate a significant degree of structure underlying cortical representations. In a first attempt to address these important issues, this manuscript analyzes the long-term stability of ensembles of coactive neurons in layer 2/3 of mouse visual cortex across several days. Ensembles were recorded during periods of spontaneous activity as well as during visual stimulation (evoked). For this, the authors record spontaneous and evoked activity using two-photon calcium imaging one, ten and 40 days after the first recording session. In order to maximize overlap between successive imaging sessions, the authors record three planes separated by 5 microns almost simultaneously (9 ms interval) using an electrically tunable lens. They show that ensembles extracted during visual stimulation periods are more stable on days 2 and 10 than those computed during spontaneous activity. Stable ensembles display a higher "robustness" (a parameter that quantifies how many times a given ensemble is repeated and how similar these repeats are). Neurons displaying stable membership are more functionally connected than unstable ones. It is concluded that such observed stability of spontaneous and evoked ensembles across weeks could provide a mechanism for memories. Long-term calcium imaging within the same population of neurons is a real challenge that the authors seem to overcome in the study. The conclusions are important; my main concern relates to the number of experiments and analyses supporting these findings, as detailed below.

      Number of experiments and statistics: According to Table 1, two mice with GCaMP6f have been through the complete imaging protocol (days 1, 2, 10 and 43) but none with the 6s, since 3 missed the intermediate measure (day 10) and one the last point (day 40+). Therefore five mice have been recorded over weeks with two different indicators, but only two were sampled on day 10. One mouse was only recorded until day 10. Altogether, this is quite a low sampling, but the experiments are certainly difficult. However, the total number of experiments analyzed is higher, due to the repeat of 3 sessions on the same mouse per day. This certainly contributes to reaching significance. However, the three samples from the same mouse are not independent points. Are the FOVs different for each session in the same mouse? If they are the same, then the statistics should be repeated but treating all experiments from the same mouse as single experiments. I would suggest repeating the analysis but using only one data point per mouse per day. Also, given that two different indicators were used (6s and 6f), one would need to see whether the statistics are the same in the two conditions.
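
      As an illustration of the "one data point per mouse per day" suggestion, the sketch below averages repeated sessions before any statistics are run. The record layout, mouse labels and values are hypothetical and are not taken from the manuscript; averaging is only one possible way to collapse the sessions.

```python
from collections import defaultdict
from statistics import mean

def collapse_sessions(records):
    """Average repeated sessions so each (mouse, day) contributes one value.

    records: iterable of (mouse_id, day, value) tuples, one per imaging session.
    Returns a dict mapping (mouse_id, day) -> mean value across sessions.
    """
    grouped = defaultdict(list)
    for mouse_id, day, value in records:
        grouped[(mouse_id, day)].append(value)
    return {key: mean(values) for key, values in grouped.items()}

# Hypothetical fractions of stable ensembles, three sessions per mouse per day.
sessions = [
    ("mouse_A", 1, 0.6), ("mouse_A", 1, 0.8), ("mouse_A", 1, 0.7),
    ("mouse_B", 1, 0.4), ("mouse_B", 1, 0.5), ("mouse_B", 1, 0.3),
]
per_mouse = collapse_sessions(sessions)
print(per_mouse)  # two independent data points instead of six
```

      With this collapse, the sample size used for testing equals the number of mice rather than the number of sessions.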

      Robustness: the authors compute this metric, as the product of ensemble duration and average of the Jaccard similarity and find that stable ensembles display higher robustness: isn't it expected that robustness is higher in stable ensembles given that stable ensembles should be observed more often?
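
      To make this concern concrete, here is a minimal sketch, assuming robustness is the number of frames in which an ensemble is active multiplied by the mean pairwise Jaccard similarity of those frames. The function names and toy activations are hypothetical, and the manuscript's exact definition may differ.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity between two sets of active neuron indices."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def robustness(activations):
    """Assumed metric: activation count times mean pairwise Jaccard similarity."""
    if len(activations) < 2:
        return 0.0
    sims = [jaccard(a, b) for a, b in combinations(activations, 2)]
    return len(activations) * sum(sims) / len(sims)

# Hypothetical ensembles: one seen often but variably, one seen rarely but identically.
frequent = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}, {1, 2, 3}]
rare = [{1, 2, 3}, {1, 2, 3}]

print(robustness(frequent), robustness(rare))
```

      In this toy case the more frequently observed ensemble scores higher even though its repeats are less similar, which is the circularity the comment points to: how often an ensemble is observed enters the metric directly.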

      Evoked ensembles: It seems to me that evoked ensembles are ensembles extracted during continuous imaging periods that include stimulation. However, one would expect evoked ensembles to be the cells activated time-locked to the visual stimulation. This notion only appears at the end of the paper with "tuned" neurons in Fig. 4. In the discussion, the authors conclude (lines 205-207) that "sensory stimulus reactivate existing ensembles". I do not think this is supported by the analysis performed here. For this, I believe that one would need to compare, within the same mouse, the amount of overlap between spontaneous ensembles and "tuned neurons".

      How representative are the illustrated examples in Figs. 2&3? The authors report that about 20 neurons remain active from day 1 to 46 but their main figures display example rasterplots with more than 60 neurons, which is three times more than the average. Is this example representative? Which indicator was used? Is there a difference in stability between 6f and 6s?

      Rasterplot filtering: The authors chose to restrict their ensemble analysis to frames with "significant coactivation". Why not use a statistical threshold to determine the number of cells above which a coactivation is significant instead of arbitrarily setting this number to three coactive neurons? In cases of high activity this number may be below significance.
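
      One way such a statistical threshold could be derived is sketched below, assuming a surrogate test in which each neuron's binary raster is circularly shifted by a random offset (preserving its firing rate) and the threshold is set just above the (1 - alpha) quantile of the surrogate per-frame coactivity. This is an illustrative procedure with hypothetical names and parameters, not the authors' method.

```python
import random

def coactivity_threshold(raster, n_shuffles=200, alpha=0.05, seed=0):
    """Smallest coactive-cell count exceeding the (1 - alpha) surrogate quantile.

    raster: list of per-neuron lists of 0/1 values, one entry per frame.
    Surrogates are built by circularly shifting each neuron independently.
    """
    rng = random.Random(seed)
    n_frames = len(raster[0])
    null_counts = []
    for _ in range(n_shuffles):
        shifted = []
        for row in raster:
            k = rng.randrange(n_frames)
            shifted.append(row[k:] + row[:k])
        # Number of coactive neurons in each frame of the surrogate raster.
        null_counts.extend(sum(frame) for frame in zip(*shifted))
    null_counts.sort()
    idx = min(len(null_counts) - 1, int((1 - alpha) * len(null_counts)))
    return null_counts[idx] + 1

# Hypothetical extremes: a silent population and a saturated one (5 neurons, 50 frames).
silent = [[0] * 50 for _ in range(5)]
saturated = [[1] * 50 for _ in range(5)]
print(coactivity_threshold(silent), coactivity_threshold(saturated))
```

      In the saturated toy case the threshold exceeds the number of neurons, so no frame can reach significance, whereas a fixed cutoff of three cells would accept every frame.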

      Demixing neuronal identity: The authors assign a neuron to an ensemble if it displays at least one functional connection with another neuron. They use reshuffling to test the significance of functional links, but it still seems that highly active neurons are more likely to display a high functional connectivity degree and therefore to be stable members of a given ensemble under that definition of ensemble membership. What is the justification for defining membership based on pairwise functional connectivity? The finding that core ensemble members display a high functional degree may simply reflect a property of highly active neurons (as previously described by Mizuseki et al. 2013).

      Type of neurons imaged: The authors use Vglut1-Cre mice and are therefore excluding GABAergic cells from their study. This should be clearly mentioned and even discussed.

      Volumetric imaging: I am not sure one can say that "volumetric imaging" was performed here; rather, this is multi-plane imaging.

      Mouse behavior: There is little detail concerning mouse behavior. Are mice allowed to run? What is the correlation between ensemble activation and running?

      Abstract: the authors should say that 46 days is the longest period they have been recording, otherwise it gives the wrong impression that after 46 days ensembles are no longer stable. Also "most visually evoked ensembles" should be replaced by "ensembles observed during periods of visual stimulation" (see above). "In stable ensembles most neurons still belonged to the same ensemble after weeks": how could ensembles be stable otherwise?

      Discussion: I found the discussion quite succinct. It lacks discussion of the circuit mechanisms for assembly stability and plasticity (role of interneurons for example?), the limitations and possible biases in the analysis and the placing of the results in the perspective of other studies analyzing the long-term stability of neuronal dynamics.

    2. Reviewer #2:

      Overall I think the authors collected an interesting dataset. Analyses should be adjusted to include all cells rather than sub-selecting for stability. Additionally, the language needs to be adjusted to better reflect the data. I wish some behavioral data had been included, but if the authors compare their data to publicly available data in V1 for a single recording session during a visually guided task, these concerns could be quelled a bit.

      1) In general the language of this paper and title seem to mismatch the results. The fraction of cells that were 'stable' as the authors say on line 112 was very small, however the authors focus extensively on this small subset for the majority of analyses in the paper. Why ignore the bulk of data (line 119)? What happens if you repeat the same analysis and keep all cells in the dataset? The general language around stability of neural ensembles should be adjusted to better reflect the data (ex: lines 157, 225).

      2) There are claims in this paper about how ensembles 'implement long-term memories' in the introduction and conclusion and yet the authors never link the activity of ensembles to any behavioral or stimulus dependent feature. This language reaches far beyond the evidence provided in this paper. The introduction could provide some better framing for expectations of stability vs. drift in neural activity rather than focus on the link between ensembles and memory given that there isn't much focus on the ensembles' contribution to memory throughout. For example, the last sentence of the paper is not supported by data in the paper. Where is the link between ensembles and memory in the data? What is the evidence that transient ensembles are related to new or degraded memories? This reads as though it was the authors' hypothesis before doing the experiments and was not adjusted in light of the results.

      3) There is no discussion around the alternative to stability of neuronal ensembles. What are the current theories about representational drift? For example, in Line 34 the authors present an expectation for stability without any reasoning for why there need not be stability. This lack of framing makes their job of explaining results in line 217 more difficult. There is a possibility that the most stable cells aren't more important - what is the evidence that they are? Does an ensemble need a core? Would be interesting to include some discussion on the possibility of a drifting readout (Line 223). [https://doi.org/10.1016/j.conb.2019.08.005]

      4) How do activations in V1 in this dataset compare to other data collected from V1 while the animal is performing a task (where, for example, the angle of the gratings is relevant to how the mouse should respond)? I would be interested to know if the authors compared statistics of their ensembles to publicly available data recorded in V1 during a visually guided behavior. Are the ensembles tuned to anything in particular? Could they be related to movement? [http://repository.cshl.edu/id/eprint/38599/]

      5) The authors provide some hypotheses as to why fewer cells are active in the later imaging sessions (dead/dying cells?). This is worrisome with regard to how much it might have affected the imaged area's biology. One alternative hypothesis is that the animal is more familiar with the environment, not running as much, etc. Have the authors collected any behavioral data to compare over time?

      6) How much do the results change when you vary the 50% threshold of preserved neurons within an ensemble (Line 146)? Does it make sense to call an ensemble stable when 50% of the cells change? Especially given that the cells analyzed as contributing to an ensemble are already sub-selected to be within the small population of stable cells (Line 119)?
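
      The requested sensitivity analysis could look like the sketch below, assuming an ensemble's "preserved fraction" is the share of its day-1 members still present in the matched ensemble on a later day. That definition, the function names and the toy ensembles are hypothetical and may not match the manuscript.

```python
def preserved_fraction(day1_members, later_members):
    """Share of day-1 ensemble members still present on the later day."""
    day1 = set(day1_members)
    return len(day1 & set(later_members)) / len(day1) if day1 else 0.0

def stable_fraction_by_threshold(ensemble_pairs, thresholds):
    """Fraction of ensembles called 'stable' at each preserved-fraction threshold."""
    fractions = [preserved_fraction(a, b) for a, b in ensemble_pairs]
    return {t: sum(f >= t for f in fractions) / len(fractions) for t in thresholds}

# Hypothetical matched ensembles (day 1 members, later-day members).
pairs = [
    ({1, 2, 3, 4}, {1, 2, 3, 4}),   # fully preserved
    ({1, 2, 3, 4}, {1, 2, 7, 8}),   # half preserved
    ({1, 2, 3, 4}, {1, 9, 10}),     # a quarter preserved
    ({5, 6}, {5, 6, 7}),            # fully preserved, one neuron gained
]
sweep = stable_fraction_by_threshold(pairs, [0.25, 0.5, 0.75])
print(sweep)
```

      Reporting the whole curve rather than the single 50% point would show how strongly the stability claim depends on the chosen cutoff.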

      7) Cells are referred to as 'stable' when they're active on 3 different sessions that are separated in time. However, the authors find a smaller number of cells are stable over extended time (43-46 days later). If we extrapolate this over more time, would we expect these cells to continue to be stable? Given these concerns, it might make more sense to qualify the language around stability by the timespan over which these cells were studied.

      8) Filtering frames to only coactive neurons for ensemble identification seems strange to me. Authors may be overestimating the extent of coactivation. What happens when you don't do this? How much do the results change when you don't subselect for Jaccard similarity? I would be interested to see how the results vary as you vary this threshold (Line 136).

      9) The term 'evoked activity' is misleading because the authors don't link these activations to the visual stimulus. There's no task, so the mice could be paying little attention to the stimulus. Should we really consider this activity to be visually driven? Could the authors provide any evidence of this?

      10) A method like seqNMF could reveal ensembles that are offset in time. This looser temporal constraint could potentially reveal more structure. This should be run on the entire dataset (without stability sub-selection). I suggest this as a potential alternative or supplement to the method described by the authors. [https://elifesciences.org/articles/38471]

    3. Reviewer #1:

      Perez-Ortega and colleagues performed rigorous experiments to determine if the activity of neurons in the visual cortex is similar across days, in particular comparing spontaneous activity in the absence of visual stimuli across days, which was previously not examined to my knowledge. The paper claims that evoked ensembles are more stable than spontaneous ensembles, but more convincing quantitative analyses are required to support these claims.

      Major Comments:

      1) There is only one mention of prior work with multi-day imaging in the visual cortex (Ranson 2017). Another related study to cite and compare your results to would be Jeon, ..., Kuhlman 2018 (and I think a comment about how similar/different your results are from this study + Ranson would be useful for the reader). I would also recommend mentioning that there are studies that have observed differences in evoked activity across learning in V1 (e.g. Poort, Khan et al 2015; Henschke, Dylda et al 2020). Do you think there was adaptation across days to the stimulus that you repeated?

      2) Some GCaMP6f mice have aberrant cortical activity (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5604087/). In the raw data (Fig 1F) it doesn't look present, but it would be useful to show more time and sort the neurons by their first PC weights perhaps to see the activity structure.

      3) The approach of 3 plane imaging taking the maximum projection seems useful for tracking cells across days. There is a claim that some cells are no longer found / no longer active. Based on Fig 1G it appears there may have been some Z-movement from day 10 to day 46. This Z movement may explain some of the lost active cells. As a sanity check I would recommend plotting the Z-plane on which the cells were maximally active on day 1 vs the Z-plane on which the cells were maximally active on day n.

      4) There is an emphasis on analyzing the data as ensembles but I think this may be missing other slow, gradual changes. The definition of stable is at least 50% of neurons were preserved across days. However, the fitting procedure of finding ensembles may produce different ensembles even if those neurons are still correlated to each other. I would recommend two possible additional analyses: 1) compare the correlation matrices for common neurons across days (unless there are too few neurons for this); 2) look at changes in single neuron statistics across days. For 2) this may include reliability of neural responses to the visual stimuli, the weights of the neuron onto the first principal component of spontaneous activity, or the correlation of a neuron with running speed. I think these results may solidify your ensemble result (evoked-related statistics change less across time).
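
      Suggested analysis 1) can be sketched as follows, assuming activity traces are available per neuron and per day and that similarity between days is summarized as the Pearson correlation between the two sets of pairwise correlation coefficients, restricted to neurons found on both days. The data layout, names and toy traces are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def pairwise_correlations(traces):
    """traces: dict neuron_id -> activity list. Returns {(i, j): r} for i < j."""
    ids = sorted(traces)
    return {(i, j): pearson(traces[i], traces[j]) for i, j in combinations(ids, 2)}

def matrix_similarity(traces_a, traces_b):
    """Correlate the pairwise-correlation structure of two days over common neurons."""
    common = set(traces_a) & set(traces_b)
    corr_a = pairwise_correlations({i: traces_a[i] for i in common})
    corr_b = pairwise_correlations({i: traces_b[i] for i in common})
    pairs = sorted(corr_a)
    return pearson([corr_a[p] for p in pairs], [corr_b[p] for p in pairs])

# Hypothetical traces: neurons 1-3 are tracked on both days, 4 and 9 on one day only.
day_1 = {1: [0, 1, 0, 1, 0, 1], 2: [0, 1, 0, 1, 1, 1], 3: [1, 0, 1, 0, 0, 1], 4: [1, 1, 0, 0, 1, 0]}
day_n = {1: [0, 1, 0, 1, 0, 1], 2: [0, 1, 0, 1, 1, 1], 3: [1, 0, 1, 0, 0, 1], 9: [0, 0, 1, 1, 0, 0]}
print(matrix_similarity(day_1, day_n))
```

      Because this comparison does not depend on the ensemble-fitting step, it would reveal whether correlations persist even when the fitted ensembles differ between days.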

    4. Summary: This work examines whether coincident firing of neurons in the visual cortex is preserved over a long timescale (one month) which is important because it provides insight into the stability and plasticity of neural circuits and visual representations. The authors find that subsets of identified neurons maintain coordinated firing despite some degree of flux in the firing activity across the population.

      All reviewers agreed that the question is important but found the analysis lacked depth and there were some technical issues in the experiments that should be addressed with a fuller discussion and potentially additional analysis to eliminate confounds/artefacts. In general, and in light of earlier work (some of which is not cited) the conclusions need to be more circumspect. Specifically:

      • There were concerns about movement/loss of cells/calcium indicator artefacts over this long imaging period that should be accounted for more rigorously.
      • The analysis applies a somewhat arbitrary criterion for stability (50% of cells remain responsive in an assembly). This threshold should be systematically explored and justified more carefully.
      • The wider literature on this topic should be more thoroughly cited, limitations of the study should be transparently laid out, claims about the overall stability found in this population response and its relevance to memories and behaviour should be moderated in line with the comments below.

      Reviewer #1 and Reviewer #2 opted to reveal their name to the authors in the decision letter after review.

    1. Reviewer #3:

      From the technical perspective this manuscript provides clear results that are consistent with, but do not prove, what this reviewer believes is the main objective of the work; to establish the relevance of the open structure of the eukaryotic cysteine desulfurase complex. This reviewer has no good basis to either accept or reject the open structure as having physiological relevance. This could well be the case but it is not clear from my (limited) knowledge of the published literature that the relevance of the open structure is generally accepted. From this perspective I believe the manuscript is sound from the technical approach and experimental implementation but suffers from a lack of clarity about the case for and against the relevance of the open structure. If this is a point of controversy in the field the topic should be discussed in depth and the position of the authors more clearly articulated.

    2. Reviewer #2:

      In this manuscript, Barondeau and co-workers test a hypothesis for the role of the protein frataxin in iron-sulfur cluster assembly, seeking, inter alia, to explain the observation that mutations in the gene encoding this protein are associated with the incurable neurodegenerative disease, Friedreich's ataxia. Their notion is that, whereas the bacterial versions of the sulfur-providing cysteine desulfurase are stable homodimers - in which the interactions between the monomers help to organize the mobile loop harboring the key cysteine residue that serves as general acid and nucleophile in the C-S-cleavage reaction that mobilizes the sulfur for incorporation into the cluster - the human enzyme (i) has a dimer interface that has been weakened through evolution, (ii) can be monomeric or form non-optimal dimeric forms, and (iii) can be driven to adopt the optimally active dimer form by intervention of accessory proteins (e.g., frataxin). Their approach was to perturb a bacterial (E. coli) cysteine desulfurase (IscS) by structure-guided mutagenesis in an attempt to introduce into it the behavior of the human enzyme, specifically its activation by accessory proteins (here CyaY and FXN). The experiments were successful in this goal. I like this paper and believe that it is interesting and important. I would point out two aspects that perhaps leave room for improvement.

      1) In principle, it would have been a more powerful test of their hypothesis had they been able to perturb the human enzyme to get a constitutively active form, no longer dependent on the binding of the accessory proteins, either instead of, or in addition to, the converse perturbation of the bacterial system. Perhaps this approach was precluded by difficulties associated with the human enzyme?

      2) The second criticism is that the effects on quinonoid form decay and activity are rather modest. However, I believe that important biological effects can arise from even such modest regulation of enzyme activity levels.

    3. Reviewer #1:

      This study presents a detailed and focused study of the structural basis for a regulation strategy used by a human iron-sulfur cluster biosynthesis system, elucidated by artificial installation of new amino acids into a bacterial system that lacks the allosteric elements of the human enzyme. The work includes quaternary structure analysis and activity assays of variant bacterial proteins. It is performed competently and supports the conclusions. But the focus may be too narrow for a general audience. To bring the work over the bar, the authors could test whether installing the bacterial residues into human NFS1 restores activity without frataxin (inactivated in the human genetic disorder Friedreich's ataxia). Furthermore, some elements of the study could be presented more clearly/rigorously to communicate the significance of the work to a general audience. These suggestions are listed below.

      1) It would be useful for an unfamiliar reader to include a diagram of the bacterial and human iron-sulfur cluster biogenesis pathway. It would also be helpful to depict the mechanism of the IscS/NFS1 cysteine desulfurase reaction - essentially a picture to go along with the description of the PLP-dependent transformations described in paragraph 2.

      2) In the first paragraph of the results section - I would be interested to see more details about the selection of the three residues targeted for mutagenesis. For example, did the authors inspect the interfaces of existing crystal structures of these complexes? Did they create sequence alignments for multiple eukaryotic/prokaryotic cysteine desulfurases and select sites conserved in bacterial proteins but not eukaryotic ones? More description of the experimental or bioinformatics basis for selecting these three sites would be important for convincing the reader that the basis for this work is sound.

      3) The structural basis for the dimer interaction and the enhanced activity isn't completely clear - how do the changed interactions enhance the enzyme activity? A good description of the different quaternary forms and why they are more/less active is given on pages 4-5 - but perhaps another link could be made between the exact residues targeted for substitution and the features of the system important for catalysis.

      4) On page 10, the authors describe changes in IscS quaternary structure as a function of concentration. What is the estimated copy number or concentration inside the cell? Which concentration ranges would be most physiologically relevant?

      5) Addition of any helper protein appears to increase the proportion of variant IscS dimer and activity. Is there any reason to believe that this phenomenon is simply a crowding effect? If the same amount of an unrelated protein is added - does the activity/dimer fraction change compared to variant IscS alone?

      6) I found the color scheme in Figure 1 hard to follow - could the authors keep the subunit colors consistent and use text labels directly on the figure panels for the subunits and forms (open, ready, etc.)? I also don't think the "Clash!!" labels are necessary. A more effective approach might be to use zoomed-in insets for each clash.

      7) In Figures 4-6 - could the authors include a more complete description of the error bars? What kind of error is shown? Are the replicates different experiments done on different days? These presentations might also benefit from showing the actual data points on top of the bars/error bars.

    4. Summary: This study provides support for a proposed allosteric regulatory mechanism in a human iron-sulfur cluster biosynthesis protein that is linked to the human genetic disorder, Friedreich's ataxia. In an approach guided by inspection of a structure of the human enzyme, the authors successfully converted a bacterial homolog lacking allosteric regulation into a system that behaves similarly to the human one. The work provides validation of the roles of accessory proteins in activating iron-sulfur cluster biosynthesis machinery. It also could open novel routes for therapeutic intervention in genetic disorders of this process in humans.

      The major concerns about the study center on the significance of the form of the human enzyme structure used as the basis for designing the mutagenesis/activity experiments in the bacterial enzyme. To bolster the underlying framework for the experiment design, the description of the existing human enzyme structures and how exactly they were used to select sites for mutagenesis in the bacterial counterparts should be improved to include more detail and balanced perspective. Experiments are suggested to show that activity enhancement upon addition of accessory proteins is specific to those factors, along with a more comprehensive discussion of the errors and reproducibility in activity measurements. Finally, the significance of the work would be elevated if the authors could use a similar approach to install activating mutations in the human enzyme - particularly if these could overcome the requirement for frataxin.

    1. Reviewer #3:

      This study combines two cutting-edge approaches for the study of polyclonal antibody responses to understand the molecular profiles of antibodies elicited by HIV envelope trimer immunization in a rabbit model. In one arm of the study, the authors performed mutational profiling of serum antibody neutralization escape variants, and in the second arm they used electron microscopy polyclonal epitope mapping (EMPEM) to track antibody binding sites. These authors performed large-scale data collection and present high-quality validation data and explorations of the resulting datasets that compare antibody binding and virus neutralization profiles. These approaches provide a comprehensive window into the molecular specificity and performance of HIV immunization and are expected to inform advanced HIV-1 vaccine designs.

      Summary of any substantive concerns:

      The authors have done a nice job validating the integrity of the NGS data, and the strong data in Figs 4/5/2B show the power of the NGS-based neutralization mapping assays. This adds a solid confirmation of the study findings and demonstrates the quality of the techniques. Overall this is a solid study and the findings are informative. I see just a few methods updates and analyses that would help finalize the presentation of methods and data.

      1) Additional information on the bioinformatic methods for data analysis is needed. How did the authors handle discrepancies in data across replicates or libraries, for example, if a mutation was enriched in one library or replicate but deleted in another? Were there any quality filters or metrics used to estimate true signal vs. noise?

      2) Differential selection statistics are mentioned briefly, along with citations to prior publications. Prior citations are definitely helpful. I think it is still important to state the key steps used in processing NGS data and the statistical techniques and quality metrics that were used. The authors should also state any criteria for acceptance or rejection or binning of individual data points, or acceptance/rejection of datasets or replicates, if quantitative criteria or metrics were used.

      3) Several replicates showed a low percentage infectivity (Fig S1, e.g. animals 5724 and 2124), but the text indicates averages between 0.3% and 2.7% infectivity. Were some groups omitted from analysis, or were all groups included?

      4) How well did the mutational profiles correlate between different libraries or replicates of the same samples?
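
      One way to answer the replicate-agreement question in point 4 is to report a correlation coefficient between per-mutation scores from two replicates. The sketch below is illustrative only: the function name and the input arrays are hypothetical, and it uses a plain Pearson correlation, which may not be the statistic the authors would choose.

```python
import numpy as np

def replicate_correlation(scores_a, scores_b):
    """Pearson correlation between per-mutation scores of two replicates.

    Both inputs are hypothetical 1-D sequences holding one score per
    mutation, listed in the same mutation order.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("replicates must score the same set of mutations")
    # np.corrcoef returns the 2x2 correlation matrix; take the off-diagonal.
    return float(np.corrcoef(a, b)[0, 1])
```

      A value near 1 would indicate that the two replicates rank and scale mutations similarly; a low value would suggest that noise dominates the signal.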

    2. Reviewer #2:

      This manuscript by Dingens et al. develops a novel application of mutational antigenic scanning to identify dominant neutralizing antibody epitopes in polyclonal sera from vaccinated animals, and compares the findings of such techniques with those from cryo-EM based unbiased mapping of binding antibodies and from conventional mutational mapping of neutralizing epitopes. Overall, I find the experiments and analyses to be of high quality, thorough and of sound reasoning, and the manuscript to be well written. I also commend the authors for the development of a facile and easy-to-use interactive viewer for exploring the mutational scanning data. I think the dual approach of mutational scanning and cryo-EM based mapping has the potential to be a powerful approach for dissecting antibody content of polyclonal sera post-vaccination or in infected hosts.

      The only major concern I could identify is the following. One of the main advantages of the mutational scanning approach is that it can identify novel epitopes targeted by antibody responses in a high-throughput manner. It is a little disappointing that this advantage was not leveraged in the current manuscript, perhaps due to the choice of the vaccine (BG505 SOSIP trimers where the epitopes have been thoroughly mapped in the literature) and the selection of vaccinated animals. Looking at Fig. 2, animal 5727 was the only animal whose serum showed some selection signatures outside of the regions considered in depth (at sites 507 and 509) - have the authors analyzed these escape mutations? If not, and only if possible within reasonable workload, I urge the authors to pursue this example or any other example where a potential novel epitope discovery could be possible.

    3. Reviewer #1:

      Dingens et al. report a timely complementary study to map neutralizing and binding responses in polyclonal rabbit sera induced by immunization with the BG505 SOSIP Env trimer. Neutralizing responses are mapped using libraries of replication-competent HIV expressing all mutants of the BG505 Env, an approach developed in the Bloom laboratory. Binding responses were mapped using an EM-based method, EMPEM, developed in the Ward laboratory. The Env mutations that affect neutralization of the autologous BG505 strain in the BG505-SOSIP-immunized animals were largely known from other studies, as were the binding (not necessarily neutralizing) responses - the strength of this study is the combination of the two approaches. It is especially useful that the complex datasets have been deposited on-line where they can be interactively explored, including mapping onto Env trimer and monomer structures. Although results were anticipated, it is very nice to directly compare the neutralization epitopes to the binding epitopes determined by EMPEM. This is a well-written and beautifully illustrated paper.

    1. Reviewer #3:

      The authors probe mechanosensory processing in Hydra by measuring calcium activity in neurons and muscles in response to precise mechanosensory stimulation in whole and resected animals. The authors' claims are well supported by the evidence. The development of a mechanosensory delivery system for Hydra is also a significant methodological advancement. Taken together, the work advances our understanding of the Hydra nervous system and is a needed step towards developing Hydra as a powerful model for systems neuroscience.

      Substantive concerns:

      1) One weakness is that different measures of "mechanosensory response" are used at different places in the manuscript. In some contexts, a response is defined as calcium activity in neurons (Fig 2), and elsewhere as calcium activity in muscles (Fig 3 and 4). And in Fig2 SuppFig2 muscle contractions are also measured using MeKs. The relation between neural activity, muscle activity and body movement is of course of high interest, and the paper explores this. But, if technically possible, it would be helpful to report a single metric of behavior that could be used in all experiments. For example, it might be possible to use video of the animal's pose or body length to measure contractions in all experiments. At a minimum the reasoning behind choice of measurement of response for each experiment could be discussed explicitly.

      2) Related: Without a consistent measure of behavior, it will be important to further clarify figures so that a reader can tell at-a-glance how contraction probability is being measured.

    2. Reviewer #2:

      The Hydra, in the phylum Cnidaria, is a near-microscopic freshwater animal that has recently resurfaced as an attractive model organism in neuroscience due to its optically accessible transparent body, sparsely distributed neural network, and simple behaviors. In this manuscript, Badhiwala and colleagues use calcium imaging of the Hydra neural network, combined with surgical resection and microfluidic pressure stimulation, to identify body regions indispensable for mechanosensory activity. They report that while resection of the aboral region did not abolish the mechanical response, resection of the oral region attenuated this response, while combined resection of oral and aboral regions showed the greatest effect. They also find a correlation between reduced stimulated activity and spontaneous activity, suggesting a common mechanism that gives rise to both activities. While this study takes an innovative approach by using a microfluidic device to mechanically stimulate the Hydra under optical recording, there are a number of conceptual and technical limitations. Perhaps my biggest reservation is that despite real potential, the data are rather low resolution (body transections and bulk calcium responses) and as such the conclusions that can be reasonably drawn do not extend what is known in a significant way.

      Major comments:

      1) The authors have designed a microfluidic device that allows them to simultaneously mechanically stimulate, monitor movement and functionally image a hydra. The highly quantifiable nature of the microfluidic device is a great asset, although this potential is not deeply explored. While I can see how the microfluidic stimulation could offer benefits over fluid jet or blunt probe, more in-depth characterization is needed.

      2) What is the spatial distribution of the pressure pulse stimulus on the Hydra body? How far does the mechanical force spread from the region directly touching the pressure valve?

      3) The use of the microfluidic device was limited. Have the authors attempted to map mechanical sensitivity across the Hydra body by stimulating different sites?

      4) The authors have not attempted to record calcium responses from single neurons, but rather spatially average a population response from a large region of interest. This should be specifically stated in the results section. More importantly, to provide insight into network function much smaller ROIs over multiple sites are needed instead of the bulk activity of the entire peduncle. This seems like a real lost opportunity as the lure of the optically clear and small Hydra is that neural representation and coding can be tracked over large portions of the network at cellular resolution.

      5) It is unclear where the recorded signals are coming from and if movement is creating artifacts. Have authors made any attempts to correct for movement? The supplemental movies show a stationary region of interest and moving animal, in some cases parts of animal moving in and out. Furthermore, is background subtracted and how? There is a large fluorescent signal coming from the entire body/ middle columnar part of the body and spontaneous firing that makes interpretation of the data difficult.

      6) Contraction is a behavioral response of the animal; however, the authors use 'contraction' to describe calcium imaging responses throughout the figures and text. This should be avoided.

      7) I am unsure if the title of the paper is accurate. I do not think this work has demonstrated "multiple nerve rings" are important for coordinating mechanosensory behavior.

      8) Furthermore, the claim that the observed "linear relationship" between the spontaneous contraction probability and resection type is evidence for shared neural pathways is a stretch. These data are fairly coarse resolution and include only 3 animals in each group with highly variable responses (Figure 4C). Additionally, they do not provide evidence to distinguish the motor circuits they hypothesized these neural nets converge upon.

    3. Reviewer #1:

      The manuscript by Badhiwala et al. is an interesting study using the emerging model system Hydra, which has many advantages for studying the entire nervous system of an animal during simple behavior. Some of the foundational neuroscience papers in this field have only come out in the past few years, and new studies such as the one here might have the potential to contribute to an important early literature. Despite clear reasons for enthusiasm, the many shortcomings in this work greatly diminished my enthusiasm and support for this study. Although I appreciate building the microfluidic device with simultaneous pan-neuronal imaging, the nature of the new biological insights provided here seems quite limited and easily predicted based on prior studies in Hydra and other model systems. Moreover, the crude nature of some experiments inhibits my ability to make a fair judgement of potential findings.

      Major concerns:

      1) The pressurized stimulation of the hydra appeared to be specific to the center of the body. The authors don't mention why this region was chosen, which seems critical to this study. Relatedly, why didn't they test multiple areas across the hydra with this system? Might we expect to see different sensorimotor behaviors, and thus different neural outputs?

      2) The authors reference a recent single-cell study characterizing multiple neuronal cell types in Hydra. This work would greatly benefit from using some cell-type resolution studies to determine the functional nature of the neurons being activated as opposed to solely using pan-neuronal GCAMP imaging. If they can put GCAMP in all neurons, why not put it in specific subsets of neurons based on cellular identity? This point becomes more salient because a major take-home from this paper is that the spontaneous behavior and firing patterns are nearly identical to the stimulus-evoked patterns, except for an apparent increase in firing rate. The true nature of the mechanosensory response might be revealed with cell-type-specific experiments.

      3) Although the authors reference whole animal imaging, they focus imaging analysis on peduncle and hypostomal nerve rings, despite the videos showing calcium activity in other areas throughout the body. Moreover, are the authors certain their pan-neuronal genetic strategy equally samples neurons throughout the body? In other words, is the apparent increase in activity in the nerve ring over other areas driven by a technical artifact of these neurons being labeled better?

      4) While I appreciate the resection studies to get at "loss-of-function" experiments, this approach seems rather crude, and potentially confounding to clear interpretation. Exactly which neurons are killed and to what extent, and how many, if any, began to regenerate throughout this process? My alarm here is raised especially in light of the authors' surprising finding that "footless" animals show that the aboral nerve ring is not required for spontaneous or mechanosensory responses. What if residual activity from neurons not ablated is driving this response?

    4. Summary: Specifically, all of the reviewers agreed that the emerging Hydra system holds great promise for neuroscience discoveries. Moreover, some of the findings presented here have the potential to be of use to other scientists who work in this system. However, we felt that the findings here were too preliminary and underdeveloped. In particular reviewers felt that 1) multiple locations across the Hydra's body should be stimulated coupled with mapping the behavioral and neuronal correlates of such stimulation, 2) the pan-neuronal nature of the bulk calcium measurements made it challenging to fully appreciate which neuronal circuits might be driving the sensorimotor responses, 3) uniform proxies for measuring/plotting the behavior would be useful, 4) the ablation studies lacked cellular resolution, similar to the calcium imaging experiments.

    1. Reviewer #3:

      Lee and Daunizeau formulate a model of the effects of mental effort on the precision and mode of value representations during value-based decision-making. The model describes how optimal levels of effort can be determined from initial estimates of precision and relative value difference between competing alternatives, accounting for the subjective cost of incremental effort investment, as well as its impact on precision and value differences. This relatively simple model is impressive in its apparent ability to reproduce qualitative patterns across diverse data including choices, RTs, choice confidence ratings, subjective effort, and choice-induced changes in relative preferences successfully. The model also appears well-motivated, well-reasoned, and well-formulated.

      I have two sets of concerns. My first set relates to model fitting and validation. The model appears to do fairly well in predicting aggregate, group-level data, but does it predict subject-level data? Or does it sometimes make unrealistic predictions when fitting to individual subjects? The Authors should provide evidence of whether it can or cannot describe subject-level choices, confidence ratings, subjective effort, etc.

      Also, I think the Authors should do more to demonstrate that their model is an advance on simpler variants. The closest thing to model comparison is an exercise where the authors show that, relative to when their model is fit to random data, their model explains more variance in dependent variables when fit to real data. This exercise uses a straw man as a baseline because almost any model which systematically relates independent variables to dependent variables would explain more variance when fit to real data than to data for which, by definition, independent and dependent variables do not share variance. It would be more useful to know whether (and if so, how much) their model explains data better than, e.g., a model where effort only affects precision (beta efficacy), or a model in which effort only impacts value mode (gamma efficacy). Since the Authors pit their model against evidence accumulation models, it would be yet more useful to ask whether their model predicts these diverse data better than standard evidence accumulation model variants.
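
      To make the requested comparison against restricted variants concrete, one common (though not the only) approach is to score each fitted model with a complexity-penalized criterion. The sketch below uses the Bayesian information criterion under a Gaussian residual assumption. This is a swapped-in, generic technique chosen for illustration, not the model-evidence measure used in the manuscript, and all function names are hypothetical.

```python
import math
import numpy as np

def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood of residuals.

    The noise variance is set to its maximum-likelihood estimate,
    i.e. the mean squared residual (assumed to be non-zero).
    """
    r = np.asarray(residuals, dtype=float)
    n = r.size
    sigma2 = float(np.mean(r ** 2))
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better trade-off.

    n_params should count every fitted quantity, including the noise variance.
    """
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```

      Under these assumptions, the full model would be preferred over a restricted one only if its BIC is lower, that is, if the gain in fit outweighs the penalty for the extra parameter.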

      My second set of concerns are regarding the assumed effect of mental effort on the mode of subjective values. First, is it reasonable to assume that variance would increase as a linear function of resource allocation? It seems to me that variance might increase initially, but then each increment of resources would add diminishing variance to the mode since, e.g., new mnesic evidence should tend to follow old mnesic evidence. How sensitive are model predictions to this assumption? What about if each increment of resources added to variance in an exponentially decreasing fashion? Also, what about anchoring biases? Because anchoring biases suggest that we estimate things with reference to other value cues, should we always expect that additional resources increase the expected value difference, or might additional effort actually yield smaller value differences over time? If we relax this assumption, how does this impact model predictions?

    2. Reviewer #2:

      The manuscript introduces a computational account of meta-control in value-based decision making. According to this account, meta-control can be described as a cost-benefit analysis that weighs the benefits of allocating mental effort against associated costs. The benefits of mental effort pertain to the integration of value-relevant information to form posterior beliefs about option values. Given a small set of parameters, as well as pre-choice value ratings and pre-choice uncertainty ratings as inputs to the model, it can predict relevant decision variables as outputs, such as choice accuracy, choice confidence, choice induced preference changes, response time and subjective effort ratings. The study fits the model to data from a behavioral experiment involving value-based decisions between food items. The resulting behavioral fits reproduce a number of predictions derived from the model. Finally, the article describes how the model relates to well-established accumulator models, such as the drift diffusion model or the race model.

      Before I get into more detailed comments, I would like to highlight that this work addresses a timely and heavily debated subject, namely the role of cognitive control (or mental effort) in value-based decision making (see Shenhav et al., 2020). While there are plenty of models explaining value-based choice, and there is a growing number of computational accounts concerning effort-allocation, little theoretical work has been done to relate the two literatures (but see Major Comment 1). This work contributes a novel and interesting step in this direction. Moreover, I had the impression that the presented model can account for a broad range of behavioral phenomena and that the authors did a commendable amount of work to validate the model (but see Major Comments 2 and 3). The manuscript is also well written in that it seems accessible to a broad audience, including non-technical readers. However, while I remain curious about what the other reviewers have to say, the manuscript fails to address a few issues that I elaborate on below.

      Major Comments:

      1) Model Comparison(s): While the manuscript compares the presented computational approach to existing accumulator models, it could situate itself better in the existing literature, ideally in the form of formal model comparisons. For instance, as someone less familiar with choice-induced preference changes in value-based decision making, I wonder how the model compares to existing computational work on this matter, e.g. the models described in Izuma & Murayama (2013) or the efficient coding account of Polanía, Woodford, & Ruff (2019). I do understand that the presented model can account for some phenomena that the other models cannot account for, at least without auxiliary assumptions (e.g. subjective effort ratings), but the interested reader might want to know how well the presented model can explain established decision-related variables, such as decision confidence, choice accuracy or choice-induced preference changes compared to existing models, by having them contrasted in a formal manner. Finally, it would seem fair to compare the presented account to emerging, more mechanistically explicit accounts of meta-control in value-based decision making (e.g. Callaway, Rangel & Griffiths, 2020; Jang, Sharma, & Drugowitsch, 2020). As these approaches are still in preprint, it may not be necessary to relate them in a formal model comparison. However, the manuscript might benefit from discussing how these approaches differ from the presented model in the text.

      2) Fitting Procedure: This comment concerns the validation of the described model based on its fits to behavioral data. If I understand correctly, the authors first fit the model to each participant while "[a]ll five MCD dependent variables were [...] fitted concurrently with a single set of subject-specific parameters" and then evaluate whether model fits match the predicted qualitative relationship between experimental variables (e.g. pre-choice value ratings and pre-choice confidence ratings) and dependent variables (e.g. choice accuracy). I'm happy to be convinced otherwise, but it appears that the model's predictions could be tested in a more stringent manner. That is, it doesn't appear compelling to me that the model, once fitted, matches the behavior of participants -- please note that this is not to diminish the value of the results; I still think that these results are valuable to include in the manuscript. Instead, rather than fitting the model to all dependent variables at once, it would be more compelling to fit the model to a subset of established decision-related variables (e.g. accuracy, choice confidence, choice induced preference changes) and then evaluate how the fitted model can predict out-of-sample variables related to effort allocation (e.g. response time and subjective effort ratings). Again, I am happy to be convinced otherwise but the latter would seem like a much more stringent test of the model, and may serve to highlight its value for linking variables related to value-based decision making to variables related to meta-control.

      3) Parameter Recoverability: Given that many of the results rely on model fits to human participants, it would seem appropriate to include an analysis of parameter recoverability. That is, how well can the fitting procedure recover model parameters from data generated by the model? I apologize if I missed this, but the manuscript doesn't appear to report this kind of analysis.
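
      For readers unfamiliar with the procedure, a parameter recovery analysis can be sketched as follows: draw known parameters, simulate data from the model, refit, and correlate recovered with true values. The example below deliberately uses a toy one-parameter linear model as a stand-in; it is not the MCD model, and every name and default value in it is an assumption made for illustration.

```python
import numpy as np

def simulate(slope, noise_sd, x, rng):
    """Generate data from a toy model: y = slope * x + Gaussian noise."""
    return slope * x + rng.normal(0.0, noise_sd, size=x.shape)

def fit_slope(x, y):
    """Least-squares slope estimate for a linear model without intercept."""
    return float(np.dot(x, y) / np.dot(x, x))

def recovery_correlation(n_subjects=50, n_trials=200, noise_sd=1.0, seed=0):
    """Correlation between true and recovered slopes across simulated subjects."""
    rng = np.random.default_rng(seed)
    # Hypothetical "true" parameters, one per simulated subject.
    true_slopes = rng.uniform(0.5, 2.0, size=n_subjects)
    recovered = np.empty(n_subjects)
    for i, slope in enumerate(true_slopes):
        x = rng.normal(0.0, 1.0, size=n_trials)
        y = simulate(slope, noise_sd, x, rng)
        recovered[i] = fit_slope(x, y)
    return float(np.corrcoef(true_slopes, recovered)[0, 1])
```

      In a real analysis the toy simulator and estimator would be replaced by the model under study and its actual fitting procedure, and the recovery correlation would be reported for each parameter separately.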

      References:

      Callaway, F., Rangel, A., & Griffiths, T. L. (2020). Fixation patterns in simple choice are consistent with optimal use of cognitive resources. PsyArXiv: https://doi.org/10.31234/osf.io/57v6k

      Izuma, K., & Murayama, K. (2013). Choice-induced preference change in the free-choice paradigm: a critical methodological review. Frontiers in psychology, 4, 41.

      Jang, A. I., Sharma, R., & Drugowitsch, J. (2020). Optimal policy for attention-modulated decisions explains human fixation behavior. bioRxiv: 2020.08.04.237057.

      Polanía, R., Woodford, M., & Ruff, C. C. (2019). Efficient coding of subjective value. Nature Neuroscience, 22(1), 134-142.

      Shenhav, A., Musslick, S., Botvinick, M. M., & Cohen, J. D. (2020, June 16). Misdirected vigor: Differentiating the control of value from the value of control. PsyArXiv: https://doi.org/10.31234/osf.io/5bhwe

    3. Reviewer #1:

      The authors report a model about the confidence-effort tradeoff; explaining how subjects invest effort depending on how confident they want to be in their decision (and how costly this is). They fit their model to behavioural data and report qualitative similarities between model and data.

      I find this an interesting model, with interesting links between timely topics of interest, such as confidence, effort, and cost optimisation. But I have several requests for clarification.

      Major Comments:

      Line 274: Without loss of generality: what does it mean here? I guess that with a different cost function, not all conclusions remain the same?

      The model assumes that it is "rewarding" to choose the correct (highest-value) option (B = R*P). But is this realistic? If the two options have approx the same value, then R should be small (it doesn't matter which one you choose); if the options have different values, it is important to choose the correct one. Of course, the probability P_c continuously differentiates between the two options, but that is not the same as the reward. Can the predictions generalise toward a more general R that depends on value difference?

      In Figure 2, I guess that the important quantity to decide is a standardised delta-mu (similar to d' in signal detection theory). It might be useful to also plot that (essentially combining the current two plots). Or alternatively, plot P_c(z), which relates more directly to the theory.

      The section Probabilistic model fit is unclear. Are the MCD variables y the 5 variables mentioned above? Do different y's share the same alpha, beta, gamma? Are different transformation parameters a and b fitted for each y? Is estimation done per subject? It is mentioned that VBA is used, but what distribution is approximated exactly using VBA? Is it a mean-field approximation, optimised with gradient descent? Is the goal function a posterior across the 5 parameters? It would also be good then to have an intuition on the estimated model parameters (e.g., their standard error or Bayesian equivalent). Is there an estimate of model fit (in addition to checking qualitative predictions)? Figure S3 is a good start (and I think it is worth putting in the main MS), but it would be nice, for example, to see model comparisons where one or more parameters are restricted.

      Figures 4, 5, and 6 should be better annotated. I have a hard time trying to fill in what is plotted exactly (e.g., scale of the color bar). Why are the data grouped in percentiles? Also in Figure 4 legend, I guess that "beta" is not used as the MCD model parameter? Please avoid overloading definitions.

      Figure 7: It seems that "spreading" of alternatives occurs in the model only for alternatives that are initially close together? Is this consistent with their discussion around equation (14)? (I may be overlooking something; if so, consider making this more explicit.)

      I find it a really interesting feature of the model that it can explain spreading of alternatives from a statistical perspective. So I think it's worth commenting on it in the Discussion. For example, does the current model capture trends in the literature? To what extent is the effect (also in empirical data) dependent on initial value differences?

    4. Summary: This manuscript addresses a timely subject: the role of cognitive control (or mental effort) in value-based decision making. While there are plenty of models explaining value-based choice, and there is a growing number of computational accounts concerning effort-allocation, little theoretical work has been done to relate the two literatures. This manuscript contributes a novel and interesting step in this direction, by introducing a computational account of meta-control in value-based decision making. According to this account, meta-control can be described as a cost-benefit analysis that weighs the benefits of allocating mental effort against associated costs. The benefits of mental effort pertain to the integration of value-relevant information to form posterior beliefs about option values. Given a small set of parameters, as well as pre-choice value ratings and pre-choice uncertainty ratings as inputs to the model, it can predict relevant decision variables as outputs, such as choice accuracy, choice confidence, choice induced preference changes, response time and subjective effort ratings. The study fits the model to data from a behavioral experiment involving value-based decisions between food items. The resulting behavioral fits reproduce a number of predictions derived from the model. Finally, the article describes how the model relates to established accumulator models of decision-making.

      The (relatively simple) model is impressive in its apparent ability to reproduce qualitative patterns across diverse data including choices, RTs, choice confidence ratings, subjective effort, and choice-induced changes in relative preferences successfully. The model also appears well-motivated, well-reasoned, and well-formulated. While all reviewers agreed that the manuscript is of potential interest, they also all felt that a stronger case needs to be made for the explanatory power of the model, and that the model should be embedded more thoroughly in the existing literature on this topic.

    1. Reviewer #2:

      This is a nice study that is clearly written and makes use of several datasets. The authors show that a gene signature associated with increased myelopoiesis in utero is associated with increased risk of pediatric asthma. Furthermore, they show that cord blood serum PGLYRP-1 is associated with reduced risk of pediatric asthma and increased FEV1/FVC. Interestingly, sIL6ra, which is derived from neutrophils but not associated with neutrophil granules, did not show any association with pulmonary outcomes. This suggests that it is the neutrophil granules rather than the neutrophils per se that account for the association. The following should be addressed:

      1) While the manuscript is clearly written, the message regarding PGLYRP-1 is at times confusing. The manuscript is clear that PGLYRP-1 is inversely associated with mid-childhood asthma risk. The discussion, however, refers to animal models where PGLYRP-1 is proinflammatory and is associated with increased airway resistance and allergen sensitization. The apparent disparity should be clarified.

      2) What is the proposed role of neutrophil degranulation in the pathogenesis or long term susceptibility to asthma?

      3) While it was not the focus of the current study and may be beyond the scope of the data, it would be interesting to know if there is any association with the subsequent development of adult asthma.

    2. Reviewer #1:

      This paper attempts to explain perinatal risk factors and the associated risk of developing pediatric asthma in the mid-childhood and early teenage years. The authors found that some maternal characteristics such as atopy, BMI, and race/ethnicity, demographics such as newborn sex, and birth characteristics such as birthweight, gestational age, and mode of delivery were associated with risks of subsequent asthma development in the pediatric population. The paper then goes on to demonstrate the differences in immune response during the different time frames of pregnancy. Throughout the majority of the pregnancy, fetal hematopoiesis generates mostly lymphoid and erythroid lineages. Towards term, the immune cells are predominantly neutrophils and monocytes. Pre-term is characterized primarily by lymphocytes. It was seen during term deliveries that the myeloid response produces several cytokines that shift CD4+ T cells away from the Th2 response. Enhanced production of IFN gamma upon leukocyte stimulation early in life is associated with reduced susceptibility to infections. However, the authors state that these findings do not extend to asthma diagnosis in childhood.

      Major comments:

      I would have liked the paper to readjust the introduction; a lot of emphasis is placed on IFN/infection/asthma, but after this fact, it seems neglected going forward and the paper explores another topic. Instead, the paper's focus was on determining the biological nature, serologically, with a granulocytic luminal marker (PGLYRP-1) and a membrane-bound marker (sIL6Ra) and its association to pediatric asthma.

      The take-home message for the paper - that there appears to be an inverse relationship between serum levels of PGLYRP-1 and overall risk for pediatric asthma - should be explored in relation to whether a therapeutic role for such proteins is possible, since they can accurately predict risk factors for disease and assess pulmonary function. Other proteins, like sIL6Ra, have no association with disease predictability or with pulmonary outcomes. This should be explored/explained in greater detail.

      Minor comments:

      As part of the validation efforts of the study - the rationale for using three different cohorts to assess pediatric asthma risk was not clearly explained.

      One of the main findings of the analysis was the conclusion that patients with higher levels of myeloid cells in their CBMCs are at lower risk of developing pediatric asthma, and vice versa. Furthermore, CBMC neutrophil abundance was negatively associated with the number of risk factors (patients with more risk factors, as mentioned above, were found to have lower levels of neutrophils in their CBMCs and to be more at risk of pediatric asthma). This was further elucidated by measuring CBMC plasma levels of PGLYRP-1 together with levels of mRNA and correlating them with risk of developing pediatric asthma. Increased levels of mRNA for the PGLYRP-1 protein were associated with an increased serum concentration of the protein. However, this was inversely correlated with risk factors. Patients with fewer risk factors for development of pediatric asthma were found to have increased levels of the protein and its mRNA.

      The correlation of PGLYRP-1 (present in neutrophil-specific granules) and sIL6Ra (derived from neutrophils, but not present in granules) with pediatric asthma at mid-childhood and early-teen years was determined. There were two follow-up points at which asthma outcomes and pulmonary function, by way of the FEV1/FVC ratio, were determined. It was found that increased levels of PGLYRP-1 were significantly associated with current asthma at mid-childhood. However, there was no association at the early-teen follow-up.

      In terms of correlations between each protein level and pulmonary function - the sIL6Ra protein was NOT associated with the FEV1/FVC ratio or a bronchodilator response at either age group. However, it was found that increased levels of PGLYRP-1 were associated with an INCREASED FEV1/FVC ratio (not indicative of asthma) and reduced odds of developing pediatric asthma at each age group.

      This analysis makes sense, as increased production of the neutrophil granule protein PGLYRP-1 serves a protective effect against infection, reducing the incidence of disease states. The paper, however, should explore the rationale behind the lack of association for the sIL6Ra protein. Since this protein is NOT associated with neutrophil granules, it can be inferred that it may not have a role in protecting against infection. However, this could have been explored in more detail in the paper.

    3. Summary: This is a nice study that is clearly written and makes use of several datasets. It attempts to explain perinatal risk factors and the associated risk of developing pediatric asthma in the mid-childhood and early teenage years. Maternal and perinatal characteristics associated with the risk of subsequent asthma development included atopy, BMI, race/ethnicity and demographics, birth characteristics, and mode of delivery. The paper then goes on to demonstrate the differences in immune response during the different time frames of pregnancy. Most notably, a gene signature associated with increased myelopoiesis in utero is associated with increased risk of pediatric asthma. Furthermore, they show that cord blood serum PGLYRP-1 is associated with reduced risk of pediatric asthma and increased FEV1/FVC. Interestingly, sIL6Ra, which is derived from neutrophils but not associated with neutrophil granules, did not show any association with pulmonary outcomes. This suggests that it is the neutrophil granules, rather than the neutrophils per se, that underlie the association.

    1. Reviewer #3:

      The results of this study suggest that maternal loss alters the HPA stress axis in wild chimpanzees, but these effects are transient and are not evident later in life.

      Overall the study is the result of much careful fieldwork. The number of cortisol samples is impressive and these are robustly analysed. The conclusions are carefully and thoroughly discussed.

      I have very few comments, in part because I am not a specialist in stress hormones and so cannot fully assess the laboratory analysis or interpretation, but in part because my view is that this is a high-quality thorough study and a well-written manuscript.

      My only major point is that I am aware that measurement of cortisol is difficult in the wild. It is possible to inadvertently measure metabolites other than cortisol, and the most robust way to measure cortisol is using a challenge and subsequent measurements. While I cannot adequately assess this aspect of the manuscript, I think it is important that the other reviewers/editor ensure the hormone measurements are appropriate.

    2. Reviewer #2:

      The paper submitted by Girard-Buttoz and colleagues asks whether and how early maternal loss affects cortisol levels and diurnal slopes among wild chimpanzees at Tai Forest, Côte d'Ivoire. The major claim of the paper is that, like humans, chimpanzees experience altered HPA functioning after maternal loss, including alterations to both diurnal slope and overall cortisol levels. However, their chimpanzee orphans exhibited patterns in diurnal slope that were opposite to their predictions (predicted blunted slopes, observed steeper slopes). The authors should be commended for their efforts in collecting a large number of samples for this analysis. However, I am not convinced that the sample is sufficient for investigating the hypotheses put forth here and, therefore, I am also not convinced that their results are solid. I also have concerns about the theoretical grounding for the paper.

      1) My principal concerns with this paper, as written, revolve around the methods/results. First and foremost, I am not convinced that the authors have a sufficient sample size to evaluate the predictions/hypotheses outlined in the introduction. While 849 urine samples is a large number, and again, their efforts here should be commended, the sample spread is actually quite thin once it is spliced up into appropriate categories, especially considering how many samples were collected per individual year, on average. As the authors indicate throughout and especially when describing their modeling approach, cortisol is inherently a very noisy hormone impacted by myriad factors, including age in at least one other densely-sampled chimpanzee community. I'm also surprised that time of day was modeled quadratically. It is my understanding that humans, other populations of chimpanzees, and other mammals follow a sigmoidal curve, which should be modeled with a third-order term as well. For these reasons, it's difficult to tell whether model 1A is not significant because of insufficient sample or a true lack of predictive power. Additionally, I'm concerned that the paper seems to focus so much on the results from a single model term in a model that did not reach significance.

      2) Despite acknowledging that the "significance of these predictors should be interpreted with caution" because model 1a did not reach significance, the authors make very strong claims about the results in the discussion, and also feature the finding of that model in the title of the paper. That seems problematic to me, especially because the non-significant model results (more intense diurnal slopes among immature orphans) diverge from the expectations set forth by other works in humans and non-humans. The finding that this is to do with higher-than-expected morning cortisol is puzzling given that evening levels are generally considered more responsive or plastic. However, this could also be an artefact of fitting the models without the third-order term for time.

      3) The introduction needs refinement to help clarify and specify the authors' arguments.

      (a) Does the biological embedding model always lead to negative fitness outcomes? Or is it possible that phenotypic adjustments might be adaptive, or even just making the best of a bad job (e.g. earlier death, but not death today)?

      (b) Throughout the introduction it is unclear whether and where the authors refer to the human clinical literature as opposed to animal literature. It is also unclear how human patterns are similar to versus different from those observed in animals. Further, I would recommend that the authors include a deeper review of the animal literature (e.g. early experimental work with macaques, cortisol at other chimpanzee field sites/captivity). It's also unclear whether and where the authors refer more broadly to early life adversity (and what this means for humans vs. animals) versus more specifically to maternal loss. Additionally, there should be further discussion specifically related to early maternal loss (rather than "early life adversity", which can include a lot of different factors), focused on the nutritional and social obstacles associated with early maternal loss, how these relate to HPA functioning, and how these effects are expected to change during development (Plasticity? Flexibility? The role of HPA in responding to changing environmental conditions?). What about the adaptive calibration model, which posits that the HPA can readjust during particular periods of developmental reorganization?

      4) It is difficult to assess the discussion without first dealing with the problems in the introduction/methods. However, despite their claims in the results section, it does not seem that the authors interpreted the results of model 1a with caution.

    3. Reviewer #1:

      A very interesting paper testing the biological embedding model in a wild long-lived mammal using an impressive dataset. However, the results for immature orphans are not entirely straightforward. The effect on the HPA axis is in the opposite direction to humans and there seems to be no significant increase in cortisol compared to non-orphans overall - it depends on time since maternal loss. The paper would be improved by communicating this more clearly and discussing exactly why this pattern may be different to that in humans. Some of the evolutionary ideas discussed in the paper also need to be more clearly conveyed or thought through.

      Substantive concerns:

      1) There are important sections in the introduction (L125-128 particularly) and discussion (L403-409) about the evolution of the HPA response and differences between humans and other mammals that are unclear. Greater detail on the evolutionary logic being used, the precise hypotheses being suggested and references to back the ideas up are required (further details in minor comments).

      2) Table 2/Model 1a doesn't directly test whether orphans have higher cortisol than non-orphans (or no p-value is reported in table 2) and CIs in table 1 suggest that there is not a significant difference. Therefore, categorical statements that orphans have higher cortisol levels don't seem to be entirely justified. However, model 1B demonstrates that cortisol declines with years since maternal loss and figure 3 supports the idea that orphans do have higher cortisol than non-orphans in the first 2 years following maternal loss but that this declines to levels similar to those of non-orphans after 2 years. Could a statistical test be run to back this up? Perhaps instead of using a binary variable for orphan status (yes/no) it could be analysed as categories (orphaned within 2 years, orphaned more than 2 years ago, not orphaned as an immature), which could be used to directly test this and back up statements, e.g. recently orphaned immatures had higher cortisol levels than non-orphans. A broader concern is why likelihood ratio tests have been used to calculate p values (and for only some of the predictors) rather than reporting the output from the models themselves. Could you explain what the benefit of this is over reporting values from the actual models and/or also provide the model outputs?

      3) The effect on cortisol slopes found in this study is in the opposite direction to that in humans. This is discussed in some detail but is lacking clarity in places and I think it would help to make this difference more obvious - it is really a key finding of the paper not a secondary point. The expected pattern is very nicely set out in the introduction so it would be good to format the discussion so there is a paragraph that outlines exactly how the results differ from hypothesized:

      (a) that the effect on cortisol slopes is in the opposite direction

      (b) that only the cortisol levels of recently orphaned immatures are significantly different to non-orphan immatures and then brings in the ideas discussed about why these differences may be present. I think this would really help communicate the findings more clearly, bringing the discussion more in line with what is set out in the introduction.

    4. Summary: This paper tests the biological embedding model by asking whether and how early maternal loss affects cortisol levels and diurnal cortisol slopes among wild chimpanzees at the Tai Forest, Côte d'Ivoire. The results suggest that maternal loss alters the HPA stress axis in wild chimpanzees, but these effects are not visible later in life. Authors suggest that the lack of a later life association between maternal loss and cortisol levels may be due to selective early mortality of individuals with high cortisol levels but did not provide any survival or behavioural data to show that orphans and non-orphans differ in any fitness-related traits other than cortisol. Furthermore, the association between cortisol and the HPA axis is in the opposite direction to that observed in humans and there seems to be no significant increase in cortisol in orphans compared to non-orphans. Overall, the study is the result of extensive fieldwork, the number of samples collected is impressive and the subject is very interesting.

      The analyses will benefit greatly if the authors use effect sizes and confidence intervals for inferences instead of p-values. This may solve the significance threshold issues. Moreover, the reliance on p-values seems to limit the value of the data. For example, the authors suggest that results from model 1 should be treated with caution because the full model is not significantly different from the null model, but by relying on it as the key finding of the study without exploring effect sizes, it does not seem that they did exercise sufficient caution.

    1. Reviewer #2:

      The manuscript addresses an interesting question: whether genetic effects of common variants on educational attainment (EA) differ between individuals with and without psychiatric diagnoses. The dataset they use is ideally suited for such an analysis. The authors find evidence that the influence of common variants on EA is attenuated in individuals with a diagnosis of autism spectrum disorder (ASD) or ADHD.

      My main concern with the paper is the statistical analyses used to support the authors' conclusions. The authors draw conclusions from dividing individuals into subgroups and comparing the R^2 of the EA PGS between those subgroups. This analysis is liable to bias due to range restriction: if the subgroups have been selected based on low/high education, then the R^2 of a predictor will tend to be lower in the subgroups than in the overall sample. Furthermore, the selection into the subgroup (here, diagnosis with ASD or ADHD) is itself related to both education and the EA PGS, which could be contributing to the differences in R^2 the authors observe between subgroups.
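      The range-restriction point can be illustrated with a small simulation. This is only a sketch under stated assumptions: the effect size (`beta = 0.3`), the sample size, and the selection rule (keeping the lowest 30% of the outcome) are hypothetical values chosen for illustration and are not taken from the manuscript or its data.

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def simulate(n=20000, beta=0.3, seed=1):
    """Compare R^2 in the full sample with R^2 in an outcome-selected subgroup."""
    rng = random.Random(seed)
    # Hypothetical standardized polygenic score and outcome (not real data).
    pgs = [rng.gauss(0, 1) for _ in range(n)]
    noise_sd = (1 - beta ** 2) ** 0.5
    ea = [beta * g + rng.gauss(0, noise_sd) for g in pgs]
    # Assumed selection rule: the subgroup is the lowest 30% of the outcome.
    cutoff = sorted(ea)[int(0.3 * n)]
    sub = [(g, y) for g, y in zip(pgs, ea) if y < cutoff]
    r2_full = r_squared(pgs, ea)
    r2_sub = r_squared([g for g, _ in sub], [y for _, y in sub])
    return r2_full, r2_sub


r2_full, r2_sub = simulate()
print(f"R^2 full sample: {r2_full:.3f}; R^2 selected subgroup: {r2_sub:.3f}")
```

      The same underlying effect is used for every simulated individual, so any drop in R^2 within the subgroup comes from selection on the outcome alone.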

      A more powerful and robust analysis would be to fit an interaction model in the full sample. The authors could regress individuals' EA jointly onto their EA PGS, their diagnoses coded as binary variables, and the interactions between the EA PGS and the diagnosis codings. The authors could do this jointly for all diagnoses in the full sample, which would account for comorbidities between psychiatric disorders. If the influence of the EA PGS is truly weaker in ASD and ADHD cases, there should be a negative interaction effect between the EA PGS and ASD and ADHD diagnoses, which can be tested with a simple statistical test for a non-zero interaction effect.
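      As a minimal sketch of what such a test looks like for a single binary diagnosis: in the model EA ~ PGS + diagnosis + PGS x diagnosis, the interaction coefficient equals the PGS slope among cases minus the PGS slope among controls, so it can be computed from two simple regressions. Everything below is an assumption for illustration (simulated data, a hypothetical true interaction of -0.15, diagnosis drawn independently of the PGS, and a z-test using separate residual variances per group); the joint model over all diagnoses proposed above would need a multiple-regression routine instead.

```python
import random


def slope_and_se(xs, ys):
    """Ordinary least-squares slope of ys on xs and its standard error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    se = (rss / (n - 2) / sxx) ** 0.5
    return b, se


def interaction_test(pgs, ea, dx):
    """Interaction estimate (case slope minus control slope) and its z-score."""
    cases = [(g, y) for g, y, d in zip(pgs, ea, dx) if d == 1]
    ctrls = [(g, y) for g, y, d in zip(pgs, ea, dx) if d == 0]
    b1, se1 = slope_and_se([g for g, _ in cases], [y for _, y in cases])
    b0, se0 = slope_and_se([g for g, _ in ctrls], [y for _, y in ctrls])
    diff = b1 - b0
    z = diff / (se1 ** 2 + se0 ** 2) ** 0.5
    return diff, z


# Simulated data with a hypothetical negative PGS x diagnosis interaction.
rng = random.Random(7)
n = 20000
pgs = [rng.gauss(0, 1) for _ in range(n)]
dx = [1 if rng.random() < 0.2 else 0 for _ in range(n)]
ea = [0.3 * g - 0.5 * d - 0.15 * g * d + rng.gauss(0, 1)
      for g, d in zip(pgs, dx)]

diff, z = interaction_test(pgs, ea, dx)
print(f"interaction estimate: {diff:.3f}, z = {z:.2f}")
```

      Unlike a comparison of subgroup R^2 values, this compares slopes directly, which is the quantity the attenuation hypothesis is about.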

      It could also be worth first regressing the EA PGS onto the psychiatric diagnoses, and taking the residuals before assessing whether there are interactions between the EA PGS and ADHD/ASD diagnosis. It is possible that correlation between the EA PGS and ADHD/ASD diagnosis could generate a spurious interaction effect in the above analysis.

      It is interesting that controlling for SES appears to mediate the (potential) interaction between EA PGS and ADHD diagnosis. However, I worry again that this could be a function of SES influencing ADHD diagnosis. SES and its interaction with both EA PGS and ADHD diagnosis could also be included in a full interaction model that could help interpret this finding.

      The authors construct the PGS by using a pruning and thresholding approach. This is known to be suboptimal, which may explain why their R^2 is lower than in other studies. The authors could use LD-pred or other methods that account for linkage disequilibrium and non-infinitesimal genetic architectures. In the EA GWAS from which the score was constructed, the best R^2 was found by applying LD-pred to all variants without p-value thresholding.

      The hypothesis that indirect genetic effects differ between psychiatric cases and controls is interesting. Do the authors have sufficient sibling data within their samples to test this?

      Line 581: Closely related individuals were removed from the analysis. Why? How many were removed? Could inclusion of these help assess the hypothesis about indirect genetic effects and improve power? The authors could use a mixed model regression to control for relatedness without having to throw individuals out of their sample.

      The grammar in the writing of the paper is a little odd at times. Often, definite or indefinite articles are omitted preceding nouns, such as in 'association of EA-PGS' in the abstract, which should be 'association of the EA-PGS'.

      Line 54: 'strongly influences'. I think this is a little overconfident in its assignment of causality to highest level of education; perhaps 'strongly associated' would be better.

      Paragraph 3 of the introduction: the authors should mention population stratification and assortative mating as possible mediators of the association between EA PGS and EA, especially when referencing the drop in association strength in within-family designs.

      I found the decile based analyses a bit pointless. By arbitrarily dividing a continuous outcome into discrete subgroups, the authors are losing power and not gaining much compared to simply performing linear regression, which they already do. I would relegate these to supplementary figures.

      Line 452: I think that the stated equivalence between low EA PGS and learning difficulties goes a bit too far here. I understand the point the authors are trying to make, but I think it should be phrased more carefully.

      The authors used an MAF threshold of 5% for construction of the score. Typically, a threshold of 1% is used for construction of PGS from summary statistics by software such as LD-pred.

      Line 580: the authors state that an EA PGS based on summary statistics from European samples cannot be used to predict EA in non-European samples. This is not true. It is true that the prediction accuracy is attenuated, but it is not zero.

    2. Reviewer #1:

      This is overall a well written and methodologically sound study researching how educational achievement can be predicted using genomic data when the sample is stratified to those without and those with diagnoses of common psychiatric disorders. I think that it is a very important study area, the study is well powered using a fantastic representative sample and offers some insights into aetiology of associations between psychiatric traits and educational achievement.

      I suggest some minor adjustments for the authors to consider, mainly addressing the conclusions and implications of the findings. I also recommend some clarifications in the methods and the results sections; these suggestions might require some very modest additional analyses and rethinking/rewording some of the conclusions.

      • The major issue I have is that you discuss family SES as a purely environmental factor throughout the manuscript. However, we know that this is not the case and that there is substantial heritability for SES. It follows from what SES composite is made out of, in your case parental education and occupation, both of which are highly heritable (as you rightly note in the manuscript yourself). This needs to be addressed and discussed throughout the manuscript.

      • The major conclusion in the manuscript, even if you acknowledge that this is speculation, is that the attenuation of the association between EA-PGS and school grades after correcting for SES can be explained by genetic nurture. I agree, this can be one of the explanations, however, here you also control (partially) for transmitted genes, that is educationally related genetic variants present in both generations (so without genotyped trios here you cannot distinguish between direct and indirect genetic effects). In addition, this attenuation can also be explained by gene and environment correlation (not only passive which is addressed by genetic nurture hypothesis) but also active and evocative rGE. In addition, in your design, you need to consider assortative mating. I suggest directly addressing this in the manuscript.

      • I also think that you should address that you are dealing with diagnosed disorders only. It is a great strength of the paper, and you are using a fantastic resource, but we know that these disorders are quantitative traits and your study does not allow this to be taken into account, so individuals with high ADHD symptoms are possibly included in the control group; similarly, you cannot take into account the symptom severity. In terms of symptom-level data, I see you have referenced the Selzam et al., 2019 paper that, among other things, related EA-PGS to ADHD symptoms and vice versa, and also controlled for SES.

      • In the introduction, you rightly state that individual differences are explained by genetic and environmental factors and the interplay between them, however, I suggest rephrasing it, because "much of the variance can be explained" is incorrect, all of the individual differences can be explained by the combination of these factors.

      • You report low rG between schizophrenia and E1; can you specify how this was calculated?

      • You state that your prediction in the control sample is lower than in other studies and offer a possible explanation in the inclusion or exclusion of 23andMe data in the summary statistics; please note that other studies have not used 23andMe statistics either (for example TEDS publications). You also discuss genetic heterogeneity; I think that the difference can be explained by both genetic and environmental heterogeneity. What is the rG between EA in your sample and the GWAS sample?

      • I think that the conclusion that the impact of low EA-PGS is comparable to the impact of ADHD is too strong, your data does not support this strong conclusion. I suggest rephrasing it, especially as we're not aware of the associated mechanisms. Note that people with ADHD in your sample also have lower EA-PGS compared to control conditions. In addition, symptom severity of ADHD varies greatly.

      • I also do not agree with the statement that having wealthy parents does not boost the performance as much for children with ADHD as compared to children without for the reasons mentioned above.

      • I think that you have fantastic data, and you have data available about how many of your participants have multiple diagnoses. I suggest adding a stratified group with multiple diagnoses to the analyses, that is adding groups with 2, 3 or 4 and more psychiatric diagnoses and checking their polygenic score prediction to EA.

      • I suggest making it clearer what covariates were used in every analysis (you say first that you added psychiatric diagnoses as covariate among the usual covariates, but later only that covariates were included 'as before', I assume you did not include diagnoses in later analyses, but this is not clear). In addition, it is not clear to me why you control for psychiatric diagnoses in the first set of analyses, I would have wanted to see full results without this covariate.

      Overall, this is a beautiful study and it was a pleasure to read/review it.

    3. Summary: This is an interesting study researching how educational achievement (EA) can be predicted using genomic data when the sample is stratified to those without and those with diagnoses of common psychiatric disorders. The study is well powered using an impressive and representative sample and offers insights into the etiology of associations between psychiatric traits and educational achievement. The authors find evidence that the influence of common variants on EA is attenuated in individuals with a diagnosis of autism spectrum disorder (ASD) or ADHD.

  4. Dec 2020
    1. Reviewer #3:

      In this paper the authors have developed a system to simultaneously generate two-, three- and four-photon fluorescence excitation from a single laser line and then proceed to apply this system to a number of turbid biological imaging applications to highlight its capabilities. Using a customised commercial La Vision BioTec Trimscope, they have incorporated a high-powered fiber laser source with an optical parametric amplifier and dispersion compensation to generate either 1330nm or 1650nm laser lines with high peak pulse energies at low pulse repetition rates. They then compare the relative capabilities of each laser line in terms of number of fluorescence emission channels measured (skin tumour xenografts), fluorescence bleaching analysis, functional toxicity thresholds, and fluorescence signal attenuation (excised murine bone).

      Whilst the paper is well written, the concept of utilising high laser peak powers at low repetition rates to generate 3PE and 4PE at excitation wavelengths of 1300nm and ~1650nm is not new and has been presented previously (Cheng et al. 2014), as referenced by the authors. The authors have, however, gone into more detail and presented a number of comparative excitation approaches to compare and contrast low-duty-cycle high-pulse-energy infrared with the more common high-duty-cycle low-pulse-energy near-infrared alternative. Higher-order multiphoton microscopy, when combined with longer-wavelength excitation, allows deeper imaging and more localised fluorescence excitation with reduced phototoxic and photobleaching effects per excitation pulse. One of the major issues associated with generating 4PE is that, since higher pulse energy is required, the repetition rate of the laser source must be reduced further to keep the average laser power low enough to avoid sample heating effects. This in turn leads to much longer acquisitions and is limited by fluorophore saturation, particularly since they are using single-beam excitation.

      Major comments:

      1) It seems as though when you take into consideration duty cycles, fluorescence saturation, water absorption effects and longer acquisition times, which lead to greater phototoxicity, 4-PE at 1700nm excitation is not appropriate for most dynamic biological applications where acquisition speed and/or continued image acquisitions are the key factors. Could the authors comment on this?

      2) How long does it take to acquire a single frame with four-photon excitation at 1700nm? Frame time was not mentioned for any of the data sets, in particular for the acquired 3D data sets. Can the authors ensure that these times are mentioned both in the main text and in the figures containing images?

      3) In line 131 and figure 3d the authors present data showing relative axial resolution measurements. Are the measured features diffraction limited, and how do they know? They are clearly not measuring like-for-like structures (different fluorescent species), so I do not think this can be used as a measure of resolution. Can the authors provide other resolution measurements?

      4) In line 140 - 142 the authors present data showing the advantages of THG at 1650nm over other excitation lines. Aside from the excitation wavelength could this data be explained by the greater absorption and scattering at the emission wavelengths generated at these laser lines?

      5) In figures 3A and 3C the SNR for 1650nm excitation increases whilst for 1300nm and 1180nm excitation it decreases. Is this simply due to more of the excited fluorophore species residing deeper in the tissue?

    2. Reviewer #2:

      Nonlinear microscopy is in the unique position that high-resolution images of cells and other tissue components can be obtained in live tissue. However, scattering and absorption limit the penetration depth. The impact of nonlinear microscopy in biomedicine and biology would be much improved if higher imaging depths could be achieved. Lately a few key studies have appeared achieving this. This manuscript contains a well-motivated extension of this research, in particular on the benefits of high-pulse-energy low-duty-cycle infrared excitation near 1300 and 1700 nm over 2-photon excitation, in heterogeneous and dense tissue. The authors compare three types of excitation, at 1650 and 1300 nm at 1 MHz and at 1100 (or 1270) nm at 80 MHz. They characterize photodamage in the tissue and determine the limits for power densities to stay below that. They study the achieved resolution at high depth for each of the processes and show that a deeper imaging depth is resolved in bone and tumor core with 3P and 4P than with 2P. The article is a very solid and extensive study.

      Though I have no major concerns with the article, I do have some minor points:

      l.57: Are the resolutions reported for 2-, 3-, or 4-photon processes? Do you not expect these to differ for the different processes? l.60: It is not explained that power is increased from X to Y; in addition, the peak power of 87 nJ in l.67 cannot be found in fig. S2.

      L.103: The power is given at the sample surface, after which the readout for cell stress via Ca imaging is done (very elegant). Is the imaging depth of the readout not relevant too, as it is probably the power density at the focus which matters? What imaging depths can be reached with this low power? This comes back later, but would be good to mention here.

      L.110 The phrase 'Furthermore' confuses me. I guess the authors mean to say that with their 2.8-8.7 nJ of power they were well below the 100 mW level? Which is kind of obvious at 1 MHz?
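      The arithmetic behind this remark is that average power is pulse energy multiplied by repetition rate. A short sketch, assuming the 2.8-8.7 nJ values are per-pulse energies delivered at the 1 MHz repetition rate quoted above (the helper name is hypothetical):

```python
def average_power_mw(pulse_energy_nj, rep_rate_mhz):
    """Average power in mW from pulse energy (nJ) and repetition rate (MHz).

    nJ x MHz = 1e-9 J x 1e6 1/s = 1e-3 W, so the product is already in mW.
    """
    return pulse_energy_nj * rep_rate_mhz


low_mw = average_power_mw(2.8, 1.0)
high_mw = average_power_mw(8.7, 1.0)
print(f"average power range at 1 MHz: {low_mw:.1f} to {high_mw:.1f} mW")
```

      Under that assumption, both ends of the range sit more than an order of magnitude below 100 mW.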

      L. 126 Some words are missing, 'but 1180'.

      Why do some signals show a peak in intensity in fig. 3C and G rather than a slope?
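      The arithmetic behind the L.110 point can be sketched as follows. This is a minimal illustration, assuming that average power is simply pulse energy times repetition rate and that the quoted 2.8-8.7 nJ and 1 MHz values apply at the sample surface; the function name is hypothetical and not taken from the manuscript.

```python
def average_power_mw(pulse_energy_nj: float, rep_rate_mhz: float) -> float:
    """Average power in mW from pulse energy (nJ) and repetition rate (MHz).

    nJ * MHz = 1e-9 J * 1e6 1/s = 1e-3 W, so the product is already in mW.
    """
    return pulse_energy_nj * rep_rate_mhz


# Pulse energies quoted in the review, at the 1 MHz repetition rate.
for energy_nj in (2.8, 8.7):
    power_mw = average_power_mw(energy_nj, 1.0)
    print(f"{energy_nj} nJ at 1 MHz -> {power_mw:.1f} mW average power")
```

      Under these assumptions both values come out at a few milliwatts, which is consistent with the remark that staying below 100 mW is expected at 1 MHz.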

    3. Reviewer #1:

      In this manuscript, the authors show they can accomplish imaging in complex specimens using 3- and 4-photon excitation, deeper in the specimen than comparable optics can accomplish with 2-photon excitation laser scanning microscopy. This is a clear advantage for imaging optically hostile specimens such as cultured organoids or spheroids, or in challenging in vivo settings. I am excited about these findings, but I am not at all supportive of the current version of the manuscript being used to present these lovely findings.

      There are two strong reasons for my opinion:

      i. The manuscript presents the findings in a manner that will only be understandable by the readers who are familiar with the topic, and who are likely to already have heard of the capabilities of 3- and 4-photon excitation to image deeper into specimens.

      ii. The results are not presented in a way that the large body of potential readers can understand. They will be unable to grasp the way that the experiments were performed, or understand what the figures are showing, or critically evaluate the results that are presented.

      Thus, there is a disconnect between the quality of the work and the quality of the presentation. There are many areas of quantitative imaging and intravital imaging that are well known to those that know about them (or use them), and that are a complete mystery to the vast majority of those that don't know about the tools or use them. The authors must take this as an opportunity to reach the many workers that could benefit from this powerful approach, rather than writing for the group that already knows (and even uses) the approaches presented.

      1) Provide needed background and present important things first. The authors should give the reader a clear view into the issues in imaging biological tissues with wavelengths longer than those used for confocal laser scanning microscopy (CLSM) and for two-photon laser scanning microscopy (TPLSM). There are several factoids presented, all seemingly true, but not presented in an accessible manner. Rather than starting with a mention of the expected temperature rise due to the dramatically higher absorbance by water of 1300 nm and 1700 nm light, the paper first presents the major absorbance of the light (~2/3 loss) and states that this isn't a problem because there is sufficient laser power. For most readers, the need for a larger laser won't be their first question; instead it will be the viability after/during the imaging session. The expected temperature rise, and an indirect mention of burn marks (!), comes at the end of the section.

      2) Explain and perform cell viability tests. Calcium imaging for assessing tissue viability is not the technique of choice for most readers, and is presented in a way that assumes general knowledge that simply does not exist. Membrane patency assays using membrane-impermeant DNA dyes, or other live-dead assays are far more common, but not presented in this study. I am not insistent that the authors use any particular assay, but I am insistent that the authors present the need for viability assay(s), teach the reader the principles of the assay(s) used, and present the results in an understandable manner.

      3) Present the finding and the figures in an accessible manner. The figures are simply not digestible by the readers who do not perform this sort of work, and the legends do not help sufficiently. For those of us who do perform work of this sort, the figures are not as convincing as they should be, or presented in a way that they can be critically evaluated.

      Consider the legend for Figure 1: "Microscopy with simultaneous 2-, 3- and 4 photon processes excited in fluorescent skin tumor xenografts in vivo. Representative images were selected from median-filtered (1 pixel) z-stacks, which were taken in the center of fluorescent tumors through a dermis imaging window. a) Excitation at 1300nm (OPA) in day-10 tumor at 145 μm imaging depth with a calculated 3.3 nJ pulse energy at the sample surface, 24 μs pixel integration time and 0.36 μm pixel size. For calculation of pulse energy at the sample surface see Figure S3. b) Excitation at 1650 nm (OPA) in day-13 tumor at 30 μm depth with a calculated 6.3 nJ pulse energy at the sample surface, 12 μs pixel integration time and 0.46 μm pixel size. c) Excitation at 1650 nm (OPA) in day-14 tumor at 85 μm depth, with a calculated 5.4 nJ pulse energy at the sample surface, 12 μs pixel integration time and 0.46 μm pixel size. Cell nuclei containing a mixture of mCherry and Hoechst appear as green."

      If I gave any of the figures and legends to the people in my lab, the half that don't do multiphoton imaging (but that have sat through many lab meetings) would just hand them back to me with quizzical expressions on their faces.

      The figures are not as compelling as the results; they defer to the body of the paper to explain what was done or what was shown, and assume that the average reader remembers the differences between OPO and OPA, for example (which they won't). The power plots showing nJ and mW in Figure 3 are inaccessible to most readers, and not well described.

      I should mention that the figures, legends and text are not satisfying for the readers who are familiar with 2-, 3- and 4-photon imaging either. These are fantastic findings, and deserve figures that are as lovely as the results, and are compelling. Some of these issues are due to typos: "Consistently, multiparameter recordings were achieved inside the tumor at 350 μm depth using excitation at 1650 nm and 1300 nm, but 1180 nm (Figure 3b). "

      However, the greater problem is that the text doesn't present the findings in a straightforward, convincing fashion and then interpret them. Instead, the conclusion often leads the evidence: "In line with an improved depth range, the signal-to-noise ratio (SNR) of 3PE TagRFP outperformed the SNR of 2PE TagRFP at depths beyond 150 μm (Figure 3c). Because H2B-eGFP expression in HT1080 tumors was very high, 3PE eGFP emission reached the highest SNR."

      The legend and figure that it describes should be able to stand on their own, and convince a skeptical reader with the help of the text in the body of the manuscript.

      In summary, these are lovely and important results that I am excited about. They are presented in a fashion that will make it difficult for most to appreciate because the body of the paper is not fashioned to teach the reader, and the figures themselves are challenging, and the legends inadequately present what is shown in the figures. Careful expansion and editing should resolve all of these issues and make the manuscript into the presentation these excellent findings deserve.

    4. Summary: Nonlinear microscopy is in the unique position that high-resolution images of cells and other tissue components can be obtained in live tissue. However, scattering and absorption limit the penetration depth. The impact of nonlinear microscopy in biomedicine and biology would be much improved if higher imaging depths can be achieved. In this manuscript, the authors show they can accomplish imaging in complex specimens using 3- and 4-photon excitation, deeper in the specimen than comparable optics can accomplish with 2-photon excitation laser scanning microscopy. Using a customised commercial system, the authors have incorporated a high-powered laser source with an OPA and dispersion compensation to generate either 1330nm or 1650nm laser lines with high peak pulse energies at low pulse repetition rates. They then compare the relative capabilities of each laser line in terms of number of fluorescence emission channels measured (skin tumour xenografts), fluorescence bleaching analysis and functional toxicity thresholds and fluorescence signal attenuation (excised murine bone).

      This is a very interesting study with some potentially important findings from a technical perspective. However, there is a disconnect at present between the quality of the work and the quality of the presentation. There are many areas of quantitative imaging and intravital imaging that are well known to those in the direct field, and that are a complete mystery to the vast majority of those that are not. It would therefore be highly beneficial to restructure the manuscript in such a way that the findings can reach the many researchers that could benefit from this powerful approach rather than the few who already use it.

    1. Reviewer #2:

      The authors quantify virulence factors in Cryptococcus neoformans and C. gattii in a large number of clinical isolates and correlate these virulence factors to survival in a G. mellonella infection model and to the clinical outcome. The authors found a correlation between secreted laccases and disease outcome in patients. In addition, the authors show that a faster melanization rate in C. neoformans correlated with phagocytosis evasion, virulence in the G. mellonella model and worse prognosis in humans.

      The manuscript is well structured with an appropriate abstract summing the main findings, a clear introduction, well described methods section and appropriate number of figures and tables. The results are clearly described.

      1) The authors identify and acknowledge the most important limitation of the study (lines 365-366): the patients were treated with different regimens in distinct health services. This reviewer agrees this is a limitation. However, to get a feeling for the impact of these differences, the authors should indicate how the patients were treated and whether treatment differed between patients who died and those who survived. Without this information clearly presented, I cannot interpret the correlations between virulence factors and outcome found in this study. Perhaps the authors can show how many of the patients included in the phenotype-survival analysis, among those who died and those who survived, were treated according to Brazilian guidelines.

      2) The melanin production evaluation assay is an important tool that the authors use in this study and the measurements from these assays were correlated with G. mellonella and patient survival and thus are essential to the conclusions of the study. The method is well standardized, and the authors show elegantly that the outcomes are highly reproducible. Can the authors describe when melanization occurs: does it occur in mature colonies, and might growth rate itself influence the measurements? Do isolates with a high growth rate/colony maturation have a low T-HMM or high melanization Top? Do the final colonies of different species have a different final cell number after 7 days of incubation, and how does this correlate with melanization? And how does the growth rate/budding rate/colony maturation correlate with G. mellonella survival?

      3) Figures 1-5 give a clear picture of the wide distribution and variation of virulence parameters, e.g. the distribution of melanization kinetics parameters, the distribution of capsule sizes, GXM secretion and LC3 phagocytosis. But what does this distribution mean? It only shows that the isolates are not the same but does not contribute majorly to the final conclusion. Can the authors think of a way to give more meaning to these figures, e.g. by indicating with colors which isolates were retrieved from patients who eventually died and which from patients who survived (although this may be inappropriate as not all clinical information is available)? Figure 6 really gives meaning to the numbers displayed in Figures 1-5. Perhaps move some figures to the supplementary file.

    2. Reviewer #1:

      The manuscript describes the characterization of in total 85 Cryptococcus spp. clinical isolates with regard to virulence phenotypes including a Galleria mellonella infection model for cryptococcosis. The authors determined the melanization kinetics of all strains, measured the whole-cell and extracellular laccase activity, the capsule thickness, and the concentration of the cell wall polysaccharide glucuronoxylomannan. In addition, during macrophage interaction the proportion of Cryptococcus-containing LC3-positive phagosomes for each strain was determined as well as the survival of G. mellonella after infection with selected Cryptococcus strains. Finally, regression analyses were performed to estimate the relationship between the risk of death in cryptococcosis patients and the phenotypes of the isolated Cryptococcus strains. A major finding was that the risk of death in patients with disseminated cryptococcosis increased with the level of extracellular laccase activity and the time for half-maximum melanization in the Cryptococcus isolates. This suggests that the melanization rate, more than the total amount of melanin, impacts the outcome of a Cryptococcus infection.

      General assessment:

      The study is based on carefully performed experiments. However, the scientific significance of this work is moderate. Melanin and the laccases that are involved in its synthesis have been known as virulence factors of Cryptococcus spp. for many years, and similar studies have already been published elsewhere (e.g. Samarasinghe et al. 2018). The major new finding of the presented work is that the speed of melanization, rather than the total amount of melanin, has an impact on the virulence of Cryptococcus spp. The shortcoming of the manuscript is that the authors' hypothesis is mainly based on regression analyses, but the final proof based on a genetically well-defined background is missing. Therefore, the study provides only little new insight into fundamental mechanisms of Cryptococcus virulence but includes associations with patients and therefore might be more suited for a journal specialized in pathogenic fungi.

      The following points should be considered:

      1) The authors show the association between faster Cryptococcus melanization and more effective evasion from host immunity. However, the authors cannot totally exclude other factors that are associated with host evasion. It would be more appropriate either to create a mutant (e.g. overexpression of LAC1) that shows faster melanization in comparison to a wildtype strain, or to perform multilocus sequence typing (including the LAC1 locus) to capture the genetic variation of the clinical isolates and to find some correlations with the speed of melanization. The interesting question is which genetic factors contribute to the difference in the melanization rate.

      2) The authors should critically discuss the suitability of their Galleria mellonella infection model. It is a known fact that temperature has an influence on melanization in Cryptococcus spp. Laccase activity is significantly inhibited at temperatures of 37°C and higher. The Galleria model can only be used at lower temperatures.

    3. Summary: The description of how faster melanization is associated with LC3-mediated phagocytosis evasion, virulence and outcomes in humans is interesting and does provide some new information. In general, the study has been executed well, with clear articulation of the results and accompanying conclusions. However, the work falls short of investigating any substantive mechanistic basis for the observations and how they relate to the broader metabolism in Cryptococcus.

    1. Reviewer #3:

      The manuscript by Shi and colleagues delineates an approach for labeling newly synthesized lipids, thereby providing a method to examine how lipids move throughout the cell. The premise of this technical approach is that fluorescently labeled fatty acids are fed to a cell in the presence of another lipid, which will incorporate the fluorescent acyl tail using the endogenous cellular acyltransferases. Cellular imaging is paired with this approach to show the subcellular accumulation of the lipid. As presented, the data are intriguing, but there are some concerns and questions about the technique that limit the interpretation of the data and could impact the overall utility of this approach. The authors should provide the additional requested data, and resolve the issues raised below to increase confidence that this labeling approach allows for the monitoring of physiologic lipid trafficking pathways.

      Specific concerns and questions are delineated below.

      1) The authors initially exploit the remodeling of PLs as described in Figure 1a. This involves the addition of lyso-PL and NBD-labeled palmitoyl-CoA. The authors imply from their schematic in Fig 1a that they are using lyso-PLs that are being remodeled at the sn1 position by NBD-labeled palmitoyl-CoA. Unless I am missing something, lyso-PA and other related lyso-PLs are generally remodeled at the sn2 position. Additionally, there is specificity for PUFA acylation to the lyso-PL. So I am a bit confused about the enzymes that are working in this system. I tried to determine which lyso-PLs the authors are using, but the methods did not specify whether they are using 1- or 2-lyso PLs. This should be clarified so that we can understand the enzymes the authors think are underlying the labeling reaction. On a minor, but related note, the lyso-PL in Figure 1a is missing an -OH group at the sn1 position.

      2) The authors use a cell system where the cells are starved of lipids and other metabolites for 1 hour and then fed a large bolus of lipids as substrates. It appears that the cells can remodel and label some PLs under these conditions, but it is not clear to me that this represents physiologic labeling that can be used to track the de novo labeling and trafficking into subcellular compartments. Nor can it be used to draw strong conclusions about required trafficking or enzymatic pathways under normal conditions. What happens if labeling occurs in complete media or defined media? This might help to resolve this.

      3) The labeling looks non-uniform in mitochondria as evidenced by the images in figure 2a. Why is the labeling only at the outer edge of the mito in these cells in this figure? What happens if labeling goes longer? Similarly, the authors quantify "30 cell images" or the like in the figures for Pearson correlations. How were the 30 cells selected, and since labeling across the mitochondria is not uniform, how were images selected? A much larger number of images scanned in an unbiased manner would increase confidence.

      4) Likewise, what happens if the labeling is allowed to proceed beyond 15 min? Can the authors provide a 30 min and 1 hr image?

      5) There are a number of conclusions drawn about specific pathways required for the trafficking or accumulation of labeled lipids. I realize that some of these studies are used as a specific proof-of-concept for the approach. However, there are many studies that go beyond proof-of-concept and draw conclusions about biology. Many of the studies are somewhat superficial and the conclusions reached by the authors should be tempered given that they have not deeply investigated this new biology.

    2. Reviewer #2:

      In this study, Zhang et al. use a semi-novel method to track acyltransferase activity using fluorescently labeled palmitic acid (NBD-16:0) to track where specific lipids are incorporated with subcellular specificity. They show that NBD-16:0 can be incorporated into different lipid classes that segregate based on previously known subcellular specificity. While this is an interesting technique, it is difficult to determine how much fidelity this method has in recapitulating biological function without additional experimentation, orthogonal measurements, and a more descriptive methods section.

      Comments:

      1) The authors do not specify which lysophospholipids were used in their study. In the methods section they specify that they came from Avanti, but there are >100 different lPLs in their catalog. Also, the authors give a range of lPL concentrations in the methods, but do not specify which concentration was used for which experiment. Without this information and other unspecified aspects of their studies, interpreting subsequent experiments is difficult.

      2) One potential advantage of this method is that it can track endogenous lipids in live cells; however, the authors show NBD-16:0 being transported to lipid species where palmitate is almost never measured. For example, they use the transport of NBD-16:0 to CL as evidence that this is working. However, natural cardiolipins are almost completely devoid of 16:0. In mammalian cells >80% of the fatty acids in CL are 18:2, with most of the remainder being 18:1 and 18:3. Similarly (assuming you are using sn-1 lPL-16:0), phospholipids with two 16:0 are extremely rare in mammalian cells with the exception of lung surfactant. Further, 16:0 composes <5% of cholesteryl esters in typical cells. The authors should be clearer about how this discrepancy between natural sorting of palmitate and the sorting of NBD-16:0 supports this as an accurate model of acyltransferase activity and intracellular transport.

      3) The authors state that PA is primarily remodeled in the ER and transported to the mitochondria as a precursor to CLs (lines 108-111). This statement needs a source. In most studies I am aware of, the vast majority of both PA and 16:0 are primarily converted to TGs or PC/PE with only a small fraction going towards the CDP-DAG pathway required for CL biosynthesis. Are C2C12 cells unique in this regard? Does lPA stimulation specifically induce CL production? Does any of the NBD get into the TG or phospholipid fractions?

      4) This study would be much stronger if another fluorescently labeled fatty acid was added. A comparison of the sorting of 20:4 and 16:0 would be very informative. This is especially true if the studies were done in the context of a known acyltransferase, for example LPCAT3.

      5) This study would also be strengthened by an orthogonal technique showing similar sorting. For example, separation of the organelles and measurement of labeled fatty acids by MS or nanoSIMS analysis would greatly strengthen these studies.

      6) In Figure 1A, the authors draw a schematic with an sn-2 lyso-PL. Sn-2 lyso-PLs are labile and will acyl migrate to the sn-1 position without careful handling of the PL in a basic solution. The authors make no mention of this type of handling in the methods section. This figure should either be corrected or more details of how they handled their lysophospholipids should be provided.

    3. Reviewer #1:

      My general assessment of this work is that it is full of good ideas and presents a novel and general approach to examine lipid remodeling in cells and perhaps subsequent transport of lipids, mainly to mitochondria, but it lacks the scientific rigor necessary to be fully confident that the data firmly support the authors' claims. Often, insufficient information about the methods is provided and the manuscript is hard to follow critically.

      More specific comments:

      1) I am surprised that acyl-CoAs are transported into cells. I don't know of any precedent for this. Usually fatty acids are imported into cells and then converted to acyl-CoAs as part of the mechanism of import. Could it be that the acyl-CoAs are hydrolysed before uptake only to be reformed inside the cells? I would suggest feeding the NBD-palmitate plus the lysolipids to the cells as a control to see whether this is the case.

      2) In fig 1 as an example they choose a region to blow up. As one can see there is a large variation, even in the blowups of mitochondrial labeling and if one looks at the originals the variation is confirmed. How have they chosen these areas? Furthermore, in figure 1 there is quite a bit of label with MLCL outside of the mitochondria, in particular in regions that they did not choose to blow up. What are these structures? Remodeling of MLCL is thought to take place in mitochondria.

      3) They speak of transport of lipids from ER to mitochondria, but in fact the demonstration of this is very weak from what they show in the time course in supp fig 1. I am also disturbed by the difference in patterns of the NBD-PA patterns in a and b. They should be the same, but there are problems, maybe focus? I would say anyway that there is no clear evidence that the NBD PA first appears in the ER then goes to mitos. It could be synthesized in both compartments from their data.

      4) The product characterization by TLC is insufficient. There are no standards, no characterization. Would they have seen the free NBD-palm by their methods?

      5) When they use mutants and find less "transport" the mitochondrial signal as seen by mitotracker is always more diffuse. This indicates to me that there is another problem.

      6) In fig 3 the fluorescent pictures do not correspond to what is seen in the quantification. There is more yellow in e than in h.

      7) How did they add cholesterol at 50 or 100 micromolar? It is soluble at less than 1 micromolar in aqueous solution. The cholesterol experiments are puzzling. From what we know about StAR protein it recognizes cholesterol not esters. There is no precedent for cholesterol ester transport into mitochondria. Can they rule out that the esters are transported to the surface of the mitochondria and the NBD-Palm cleaved off and transported into the mitochondria?

      8) The MAG and DAG experiments are overinterpreted. It could just be a kinetic problem since the MAG gets converted to DAG before TAG.

      9) They compare to externally added NBD lipids, but we don't know which ones they used. Are they using short-chain NBD phospholipids? I could not find this in their manuscript. If they do not have the same NBD-palm in the sn-2 position then the comparison is meaningless.

      10) The excitation and emission spectra of their probes are sometimes overlapping. How did they deal with this? Are they sure that they are not seeing FRET?

    4. Summary: Zhang et al. describe an interesting method to label newly synthesized lipids with fluorescent fatty acids and track their movement in cells. All reviewers agreed that this could potentially be a useful tool. However, they all raised concerns regarding the rigor of the characterization of this methodology.

    1. Reviewer #3:

      In this manuscript, Icke and colleagues show that the secreted protein CexE/Aap from enterotoxigenic E. coli is acylated at an N-terminal glycine and suggest that acylation is required for secretion via a Type I Aat secretion system to the cell's surface or into the environment. The key finding is the identification of an N-acyltransferase (AatD) encoded near cexE/aap and the demonstration that this enzyme is required for acylation.

      There is a concern about the novelty of the findings. The publication by Belmont-Monroy et al. (PLoS Pathogens, August 2020) cited by the authors is very similar to the current manuscript. That publication demonstrated that N-acylation of Aap (a CexE homolog) occurs at its N-terminal glycine (made available after signal peptide cleavage), that acylation is dependent on the acyltransferase AatD, that acylation is required for Aap secretion, and that N-terminal residues are sufficient for acylation of a heterologous protein (though this was poorly analyzed in that paper). Almost all of those findings are shown in this current manuscript by Icke et al., independently confirming the acylation reaction.

      This Icke et al. study is well done and convincing on the AatD-dependent acylation of CexE/Aap. Overall, the same conclusions are drawn as Belmont-Monroy et al., 2020. The major new advance (not previously described) is the observation that the N-terminal glycine is required for N-acylation by AatD.

      As described in my comments (below), the manuscript could be improved in a few instances by including key controls to support the conclusions. In other instances, broad conclusions are made from narrowly focused data and the text should be revised.

      Major comments:

      1) "To our knowledge this is the first report of enzyme mediated N-palmitoylation in nature". This statement is not correct. The lipoprotein N-acyltransferase Lnt (used as a reference for AatD analysis in this manuscript) performs N-palmitoylation (C16:0) in E. coli and distantly related bacteria such as mycobacteria/corynebacteria. See Jackowski & Rock 1986 (JBC 261, 11328-11333), Hillman et al. 2011 (JBC 286, 27936-27946), Brulle et al. 2013 (BMC Microbiology 13, 223).

      2) The conclusion that "we reveal a new function for acylation - protein secretion" is not fully supported. The authors do not directly show that the secreted CexE is acylated (Fig 2A) or that acylation is required for secretion. The use of 17-ODYA is innovative and could be used to show that CexE secreted into the supernatant is acylated. The CexE N-terminal substitution mutants that are not acylated (Fig 7C) could be used to test if acylation is required for secretion.

      3) If the secreted CexE is acylated, some discussion is needed. How is the acylated form sometimes secreted into the aqueous environment but sometimes embedded in the outer membrane as shown in the model?

      4) Can the authors show/detect CexE acylation in the native system that doesn't rely on overproduction of the CfaD transcription factor? Is the observed acylation physiological or a consequence of strong overexpression?

      5) Claims of novelty in text should be altered following Belmont-Monroy et al., 2020.

    2. Reviewer #2:

      I think this is a superb manuscript - it is written in a clear way, such that the story starts at the historical understanding of lipoprotein trafficking and builds up convincingly using various experimental methods to show that a new class of lipoproteins is trafficked via acylation of glycine, through the Aat secretion system.

      It is highly exciting that a protein that does the acylation AND the secretion from the periplasm to the cell surface has been identified! Next step is to get a structure.

      The data are convincing and the paper is extremely well-written. My comment is that I am not convinced by the argument that CexE would not be recognised by the Lol system. When it is acylated it likely would be, as the hydrophobic pockets of LolA and LolB are fairly indiscriminate (see e.g. the binding of small hydrophobic molecules to these proteins). The authors should comment on this aspect.

      It is intriguing how glycine in particular is recognised for acylation.

      Overall, a great paper - authors should be commended.

    3. Reviewer #1:

      This study from the Henderson laboratory describes the identification of a hybrid secretion system involved in the acylation and trafficking of a conserved class of bacterial lipoproteins. Spurred by the serendipitous observation of posttranslational modification, Icke et al. identify the AatD protein as the factor responsible for CexE acylation. Combining alignment of conserved sequences and structural data, the authors isolate the site of acylation on the CexE polypeptide and identify AatD residues responsible for catalysis. Overall this is a strong manuscript, densely packed with supporting data and extremely well written.

      My only significant concern is the issue of novelty. Although the authors seem to imply they are the first to report this type of system, they cite a 2020 PLOS Pathogens paper by Belmont-Monroy detailing nearly identical results in Enteroaggregative E. coli. Given the significant amount of overlap between these two manuscripts, it would seem prudent for the authors to spend some time in the introduction and discussion highlighting open questions that this paper addresses.

    4. Summary: All three reviewers were enthusiastic about the identification and characterization of a hybrid secretion system involved in lipoprotein acylation and trafficking. We were impressed by the strength and extent of the data and the clever use of genetic, biochemical and bioinformatic approaches. At the same time, there was agreement that the conclusion that acylation is involved in CexE secretion is not fully supported. There was also consensus that the overlap between this study and the 2020 PLOS Pathogens paper from Belmont-Monroy necessitates more direct acknowledgement.

    1. Reviewer #2:

      The manuscript prepared by Kim and colleagues provides a solid attempt at understanding the neural correlates associated with self-reassurance and self-criticism in relation to what they term neural pain. While it is well written and there is a clear story presented here, there appears to be insufficient detail in the introduction and discussion. The methodology appears sound for the most part, but I have some concerns relating to stimuli and gender effects that I believe would make the findings more compelling if addressed.

      Criticisms:

      1) The example items of the neutral statements appear to involve an external agent (i.e., a reference to a friend), while the neural pain is purely about the self. Are there also references to other people in the neural pain condition? If not, how have the authors ensured that the neutral condition is actually neutral? It seems likely that the inclusion of an external agent for many of the neutral statements could pose problems with interpretation, especially when talking about self-criticism and self-reassurance. The presence of an external agent in the neutral statements changes the meaning from a purely self-oriented experience to a shared experience.

      2) I am curious as to why the inverse contrasts (i.e., reassurance - criticism) were not run? Knowing whether there was a unique network associated with self-reassurance would provide a more comprehensive understanding of the authors' findings.

      3) I am wondering why the authors did not accommodate for gender differences in their study? Given recent evidence (See citation) it seems likely that this may play a part in self-compassion. The authors report an almost equal distribution of males and females, so it should be possible. If the authors did explore this and found no difference with gender as a regressor then this should be noted in the manuscript. Mercadillo, R. E., Díaz, J. L., Pasaye, E. H., & Barrios, F. A. (2011). Perception of suffering and compassion experience: brain gender disparities. Brain and cognition, 76(1), 5-14.

      4) It seems as though a whole body of literature is being very lightly touched on here but would benefit from inclusion. I think it would be useful to have some information in the introduction regarding moral emotions (i.e., compassion) and the link with empathy and emotion regulation (see work by Jean Decety). This would also be beneficial for the discussion as the authors are in essence describing empathy.

    2. Reviewer #1:

      This is a potentially interesting analysis, but there is a lack of framing, details, and specificity that dampens my enthusiasm for the work.

      1) As far as I can tell, the authors do not really demonstrate that "markers of negative emotion and pain" can be down-regulated during self-reassurance. They simply show that regions surviving multiple comparisons change depending on condition, but they don't show data supporting their hypothesis. How much do regions activated during criticism actually change during reassurance? What is the time course of these differences?

      2) Behaviorally, the neutral statements from the two "conditions" appeared to have distinct intensity levels. Specifically the "intensity" for neutral trials during criticism blocks appears significantly lower than neutral trials for reassuring blocks. Because of this behavioral effect, within their design it is difficult to identify the cause of the brain changes.

      3) How were subjects trained in self-criticism vs. reassurance? Is there any way to confirm that they were in fact doing the "task"? Further, at what point in the 2-week compassion training paradigm were FMRI data collected?

      4) Figure 2 is quite confusing to me: (1) the authors refer to brain maps as "neural pain"? I would strongly advise against this as it is very reverse-inferency. I would recommend against using this phrase throughout the paper. (2) How would one interpret the phrase "neural pain during self-reassurance"? Is this emotional > neutral during reassurance?

      5) Figure 3 refers to "trial by trial ratings of intensity" but if I am understanding the figure, this is not an accurate description. The authors are reporting the mean across subjects for each condition. It is unclear in fact how much variability there is on a trial-by-trial level within persons for the intensity of each condition. One idea is to use an amplitude modulation analysis to scale FMRI parameter estimates by the intensity rating on a per-trial basis. That would be an interesting analysis, IMO.

      6) It is unclear from this paper what was done previously. It appears that the authors examined physiological data (e.g., HRV) in their previous report but don't talk about other measures that were collected here. It would be useful to know the extent to which they buttress the authors' findings (or if they do not).
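
      The amplitude modulation analysis suggested in comment 5 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the onsets, ratings, and scan count are made-up values, and the function names are hypothetical. Each trial contributes to a main regressor and to a second regressor weighted by its mean-centered intensity rating, so that the second regressor captures only trial-by-trial variation around the condition average.

```python
# Hypothetical example: all onsets, ratings, and timings below are made up.

def build_regressors(onsets, ratings, n_scans):
    """Return (main, modulated) stick regressors, one value per scan.

    main      : 1.0 at each trial onset, 0.0 elsewhere
    modulated : mean-centered rating at each trial onset, 0.0 elsewhere
    """
    mean_rating = sum(ratings) / len(ratings)
    main = [0.0] * n_scans
    modulated = [0.0] * n_scans
    for onset, rating in zip(onsets, ratings):
        main[onset] = 1.0
        modulated[onset] = rating - mean_rating
    return main, modulated


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


onsets = [2, 8, 14, 20, 26]          # trial onsets in scans (assumed)
ratings = [3.0, 7.0, 5.0, 9.0, 6.0]  # per-trial intensity ratings (assumed)
main, modulated = build_regressors(onsets, ratings, n_scans=32)
```

      Mean-centering makes the modulated regressor orthogonal to the main regressor before any further processing. In a real analysis both regressors would additionally be convolved with a haemodynamic response function before model fitting; that step is omitted here.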

    3. Summary: Kim and colleagues present a secondary analysis of an already published imaging dataset in 40 participants going through a two-week compassion training paradigm. They show participants standardized statements that are emotional or neutral and further have participants either engage in "self-criticism" or "self-reassurance" while considering the statements. The authors report on differences in brain regions (what they refer to as "neural pain") depending on criticism or reassurance condition. Concerns with the conceptual framework, approach, and interpretation substantially dampened our enthusiasm.

    1. Reviewer #3:

      Three different anti-asprosin mAbs were produced and tested in different metabolic syndrome animal models. Beneficial effects were noted on body weight, food intake and blood glucose and insulin levels. The effects were modest, but seemed to be relevant to elevated asprosin levels, as the antibody blocked the effects of adenoviral overexpression of the hormone. Some issues require attention:

      1) Additional characterization of the asprosin-neutralizing effect of the antibody is required. It will be helpful to show the endogenous free asprosin levels at different time points after a single or repeated mAb injection. This result is also important to tell whether this mAb will cause other immune responses and side effects that might confound interpretation of the results.

      2) In Figure 3 (a, e, j) and Figure 4 (a, e, i, m), please show body weight to rule out stress or side effects caused by virus injection. For DIO mice, 14 days of IgG injection also caused weight loss; for db/db mice, IgG injection increased body weight. Please discuss.

      3) Although adenovirus and AAV are widely used for in vivo protein overexpression, it is important to show here that endogenous asprosin levels were increased after virus injection and decreased after antibody neutralization.

      4) In Figure 5, more data on liver weight, histology, etc. are required to support the conclusion on liver health. The current data from the three different mouse models are very contradictory; this could be caused by a side effect or off-target effect of this mAb.

      5) In Figure 6, it is important to demonstrate the neutralizing effect of the mAbs.

    2. Reviewer #2:

      Asprosin, as identified by the authors' group, is reported to stimulate glucose release from the liver and also centrally act as an orexigenic hormone. The present study developed monoclonal antibodies for asprosin and demonstrated that antibody-based asprosin depletion lowered food intake, prevented diet-induced body weight gain, and lowered blood glucose levels in mice. Overall the data are supportive of the conclusion; however, several concerns were identified as follows:

      1) One of the central issues is the specificity of the antibody action. The authors should demonstrate if the effect of the asprosin antibodies is blunted in mice that lack either asprosin or its receptor OR4M1.

      2) Previous studies from the authors' group show that asprosin acts on hepatocytes and triggers cAMP signaling. The authors should examine if the neutral antibodies blunt the cAMP signaling in DIO mice.

      3) Similarly, asprosin was shown to stimulate AgRP+ neurons. The authors need to demonstrate the effect of asprosin antibodies on AgRP+ neuronal activity.

      4) A recent paper (von Herrath et al. Cell Metabolism 2019) challenged the authors' observation of the metabolic action of asprosin. The authors claim that this is "due to use of poor quality recombinant asprosin". However, no scientific evidence was presented. This study needs a more rigorous assessment of data reproducibility.

      5) Most of the bodyweight data are presented as "body-weight change". However, the authors should present them as whole-body weight.

      6) Some of the data points and statistical analyses require further clarification, e.g., the lack of SE in Fig.1c and the statistical analysis of Fig.3 and Sup Fig.1.

    3. Reviewer #1:

      The study is interesting and does have potential translational relevance. There are some concerns, however: (1) In Figure 1 the blood glucose drops independent of food intake. Is this all related to decreased hepatic glucose output, or are there any effects on urine? Was urinary glucose measured? Is there increased glycosuria? (2) In previous papers you discuss the increased lean body mass when asprosin is not present. There are no body composition data in this study. Were there any body composition differences with the antibody among the different mouse models (e.g., DIO vs NASH diet)? (3) Were any changes in lean body mass with the antibodies associated with increases in strength? (4) Several mouse ages are discussed in the Methods section: 12 weeks, 16 weeks, 12 weeks of high fat diet or 24 weeks of NASH diet. It is not clear from the description if mice were matched for age. Please clarify. (5) In Figure 5 there are a number of inflammatory markers, which can vary according to the model. Data on anti-inflammatory markers (cortisol, IL-10, etc.) would be helpful to get a better picture of physiologic changes.

    4. Summary: Mishra et al. present data characterizing the effect of asprosin neutralizing antibodies on the parameters of metabolic syndrome (weight, glucose, lipids, etc). This group initially discovered asprosin and characterized it as a hormone that increases blood sugar and stimulates appetite. In their Nature Medicine 2017 article they also present data on a neutralizing antibody. In this follow-up manuscript the group characterizes the impact of neutralizing monoclonal antibodies on metabolic parameters of three mouse models of obesity (DIO, NASH diet and Leptin receptor knockout). The translational focus of the manuscript is the potential use of monoclonal antibodies against asprosin as a treatment of metabolic syndrome.

    1. Reviewer #3:

      This paper shows that during a second-order conditioning (SOC) task, a conditioned outcome is represented in the lateral orbitofrontal cortex (lOFC). The BOLD signal in this region shows increased functional coupling with the amygdala for second-order conditioned stimuli that indirectly predict a negative outcome. The authors suggest these findings reflect a mechanism by which value is conferred to stimuli that were never paired with reinforcement.

      The paper tackles an interesting question concerning the neural mechanisms that support second order conditioning. The task design includes relevant controls and, on the whole, the findings support the claims made by the authors. I have a few questions about interpretation of the data, but my main suggestion would be to revise the framing of the article. There are many previous studies that have investigated the mechanisms that support second order conditioning which are not always given due credit. I believe this paper would benefit from placing the hypotheses and findings more firmly within the context of previous literature.

      Comments:

      1) The authors test the hypothesis that CS2 is directly paired with a neural representation of the US. They state that this hypothesis 'has never been tested to date'. However, a number of studies have shown evidence for and against this hypothesis (for example: Wimmer and Shohamy 2012; Wang et al., 2020; Barron et al., 2020). Can the authors clarify how the hypothesis tested here differs from those investigated previously? In addition, it is not clear to me how the four potential mechanisms they propose are really distinct from each other?

      2) Relatedly, given the authors use an SOC paradigm that differs from sensory preconditioning studies used by many previous authors, does the difference in task paradigm provide new insight? Do the authors expect the neural mechanism to be the same or different between their version of SOC and sensory preconditioning?

      3) Why is the behavioural data in Figure 1F bimodal for CS1 and CS2? i.e. what does choice probability of 0 for CS2+ vs CS2- mean for a given subject?

      4) To test the authors' hypothesis, is it not necessary to assess evidence for US in response to CS2? They instead report reactivation of US in response to CS1, and for the PPI it is not clear to me how the authors distinguish between CS1 and CS2 given the temporal proximity in their presentation (Figure 1D).

      5) For the PPI, is there a main effect of CS- and CS+ versus CSn in lOFC? If not, how does this affect interpretation of the PPI? On a separate note, is the effect reported in Figure 3 really in the hippocampus? Does it survive small volume correction using a hippocampal mask?

      6) The following is stated as a premise: "To form an associative link between CS2 and US, the reinstated US patterns need to be projected from their cortical storage site to regions like amygdala and hippocampus, allowing for convergence of US and CS2 information." This potentially seems fair for the hippocampus, with added reference to relevant literature (e.g. publications from Shohamy and Preston labs), but in my opinion the jury is still out on this one. It is not clear to me why we necessarily expect amygdala here.

      7) There are various strong statements that in my opinion need to be toned down in light of existing literature. For example, the paper claims this study is the first to show evidence for implicit inference. However, as far as I'm aware, Wimmer & Shohamy 2012 also found no evidence for explicit memory of stimulus-stimulus associations with no relationship between measures of explicit memory and decision bias. Similarly, the authors claim this paper is 'the only report so far of behavioral evidence for associative transfer of motivational value during human second-order conditioning', overlooking a large number of other studies that have shown similar behavioural effects.

    2. Reviewer #2:

      The authors investigate the neural correlates of second order conditioning in carefully designed behavioural experiments coupling multivariate fMRI and functional connectivity. They found that the lateral OFC, in connection with the amygdala, plays an important role. I think the paper represents a valuable addition to the human cognitive literature, where second order conditioning is surprisingly under-investigated. I have only a few suggestions to make.

      I encourage the authors to complement the multivariate analyses with a standard univariate analysis. To be clear, I do see the added value of the multivariate approach. However, given the extensive literature on the neural bases of conditioning using univariate analyses and the strong prediction about the directionality of the effects in the OFC (which should positively encode expected values and rewards), I think the paper would definitely benefit from including the univariate results for the main contrasts / variables.

      I am also curious to see the reaction times in the attentional control task analyzed to check if they were affected by the underlying conditioning procedure. Following the Pavlovian-to-Instrumental transfer theory, we should observe that the reaction times are slower for negative (aversive) stimuli and faster for positive (appetitive) stimuli.

    3. Reviewer #1:

      This manuscript by Luettgau et al. describes a study of second-order conditioning in humans. The behavioral task involved visual first- (CS1) and second-order cues (CS2) and gustatory outcomes (US). Behavioral results show that subjects preferred both the CS1+ and CS2+ over the CS1- and CS2-, respectively. MVPA shows that the CS1 evokes US representations in the lateral OFC, and that US representations in the amygdala increase over second-order conditioning. This study addresses an important and novel question. However, I have several major concerns regarding the study design and data analysis:

      1) I do not see how it would be possible to disentangle responses to the CS1 and CS2 in this task. The delay between the CS2 and CS1 is only 500 ms, which is not long enough to disentangle fMRI responses to the two CS.

      2) For the main "reinstatement" analysis, activity was averaged across both CS2 and CS1, and so it is unclear whether reinstatement is driven by the CS1 or CS2. The authors argue that "US reinstatement during SOC could only be faithfully attributed to the respective CS1, but not to CS2, since only CS1 had been directly paired with the US, and CS2 had not previously been experienced." However, this is only strictly true for the very first trial during which the CS2 could have gained full access to the US representation.

      3) In this regard, it is unclear why the authors did not use data from the first-order conditioning phase to test for US reinstatement. Although the 4-second delay between CS1 and US is still quite short, TR-wise MVPA could provide evidence that signals are related to the CS1 and not the US itself.

      4) Relatedly, the authors perform analyses suggesting that, from early to late phases of second-order conditioning, representations of CS2 in the amygdala became more similar to US representations. Although here they attempt to model fMRI responses to the CS1 and CS2 separately, there is no evidence that this was indeed successful. As I see it, the delay between the two CS is just not long enough to dissociate these responses.

      5) Is there evidence for a CS1 evoked reinstatement of the US in the amygdala, and a CS2 evoked reinstatement of the US in the lateral OFC? In theory these signals should exist, but independently testing for activity related to the two CS requires a task design where the two CS are presented in isolation or with long enough delay between them.

    4. Summary: All reviewers agreed that the neural mechanisms by which value is conferred to stimuli that were never directly paired with reinforcement is an important topic. However, individual reviewers raised questions regarding the study design and data analysis. In particular, reviewers agreed it was not clear how you could distinguish BOLD responses to CS1 and CS2 given the temporal proximity of their presentation. They also wondered whether the current results would provide enough advance beyond previous work.

    1. Reviewer #3:

      This manuscript examines data from the Young Adult Human Connectome Project's 900-subject release to compare both structural and functional connections between iso-eccentricity bands in striate cortex and the fronto-parietal, cingulo-opercular, and default mode networks. The authors find that central vision is most strongly connected to the fronto-parietal network, which is associated with attention, while the far periphery is more strongly connected to the default mode network. The questions asked in this manuscript are of considerable interest to the field, and this study has the potential to be impactful. However, substantial work is needed to make the methods and results sufficiently clear and reproducible to the reader.

      Major Comments:

      A major problem throughout this paper is that the authors have not been very careful in documenting their methods, what they are plotting, or how they are supporting their assertions. This is a major shortcoming of the work. I do not believe there is sufficient detail in this paper as is to reproduce the methods, nor was I able to understand what precisely was calculated in the statistical tests reported.

      The amount of work that has been put into this project's quality control (at minimum, visual inspection and filtering of 900 MR images) is very impressive! This information should really be shared with the broader research community in order to make this manuscript more reproducible and in order to ensure that other researchers can simply use and cite the authors' efforts rather than repeating them. This could be as simple as a supplemental table or text-file that includes the subject IDs of those HCP subjects that were included in all analyses.

      It should be crystal-clear from the Methods section whether the manuscript's data were collected or reanalyzed by the authors. My understanding is that all of this manuscript's analyzed data are from the HCP database. However, had I read only the "Data Acquisition" section I would have been left with the strong impression that the authors collected the data themselves using the same kind of scanner and the same analysis pipelines as the HCP. Unless this is the case, the opening sentence of this section should probably be something like "All data were acquired and preprocessed by the Human Connectome Project (Van Essen et al., 2013)" [10.1016/j.neuroimage.2012.02.018]. It may also be wise to reference the HCP in the Acknowledgements section. Further information: https://www.humanconnectome.org/study/hcp-young-adult/document/hcp-citations. This should apply equally to the data and the preprocessing methods-i.e., if the quality control mentioned in the above comment was performed by the HCP and not the authors, that should have been explicit.

      P3, ❡6. This paragraph is critical to the methods but is not at all clear. In particular, the paragraph eventually describes seven eccentricity segments per subject, yet it does not explain what the eccentricity boundaries of these segments are, nor does Figure 2 show these segments. It isn't clear from the manuscript if these are ever used (rather than the 3 central/mid-peripheral/far-peripheral segments) or exclusively used.

      In looking at Figure 4, my first and strongest impression is that the central connectivity is very similar to the far-peripheral connectivity, and the z-score differences are incredibly small. Additionally, the legend does not make the quantities plotted very clear (these are based on the averaged z-scores across subjects?) so I'm left wondering how to assess any sort of significance. I have a similar reaction to Figure 5. More help is needed to understand these results.

      Given that this paper consists of a large analysis of a large existing dataset, it would be especially nice if the authors would make their source code and intermediate analysis files publicly available. Having access to the source code directly is virtually a requirement of making this kind of study reproducible and would mitigate many of my concerns about the ambiguities of the methods.

    2. Reviewer #2:

      In this work, Sims and colleagues use resting-state functional connectivity and diffusion tractography in human connectome project data to examine the connectivity of the central and peripheral aspects of the primary visual cortex. They find that central V1 connects more strongly to regions of the prefrontal cortex interpreted as the Fronto-parietal network than does peripheral V1.

      The idea that central V1 may be directly connected to control-related networks is an interesting one, and has fascinating implications for the study of top-down modulation of visual cortex function. However, I must say I am somewhat skeptical of these findings, for several reasons.

      First, I find the a priori anatomical basis for these proposed connections to be dubious. The authors themselves describe how Markov et al. explicitly conducted tract tracing with central V1 and found connections with posterior frontal and parietal cortex, but nothing with areas classically associated with the fronto-parietal cortex. The authors propose that the inferior fronto-occipital fasciculus may connect V1 with lateral prefrontal regions only in humans. However, they provide no evidence for this suggestion. Indeed, my understanding of the iFOF is that it connects to inferior and lateral occipital cortex (see e.g. figures from the Takemura study cited in this work). Can the authors better support the idea that the iFOF might be the route of connection between V1 and frontal cortex?

      Second, I am concerned that both 1) the Central V1 ROI employed in this work and 2) the inferior frontal cortex region showing strong FC with that Central V1 ROI overlap very closely with regions where we have seen poor BOLD signal in our own fMRI data (I would like to attach a figure if possible).

      We are not confident what the source of the poor signal might be in posterior occipital or inferior frontal cortex; we suspect the presence of large veins (possibly the transverse sinus in V1; see Winawer et al., 2010, Journal of Vision). In any case, the data quality is low enough that we believe our data should not be considered to represent actual neural function in those regions. Can the authors demonstrate convincingly that this is not the case in their HCP data?

      Third, I have an issue with the localization of effects in this paper. The paper describes effects in the fronto-parietal network throughout the manuscript, including the title. How surprising, then, that the strongest effects are not in the FP network at all! Figure 4A makes it very clear that the largest effects are in the IFG, which is outside the green outlines describing the extent of the fronto-parietal network, but inside the Default network. Figure 3A also supports this Default-centric localization, with Central V1 effects in posterior lateral parietal, medial parietal, and superior frontal cortex, all outside FP but inside Default. Since the FC effects are not actually primarily in FP, I see no reason why FP should be used as a mask in Figure 5. Indeed, the authors should show the localization of SC effects throughout the cortex, not just in FP. I also see no reason why these V1-Default connections should be characterized in any way as "attention" or "control".

      Fourth, I feel that these FC and SC differences are wildly over-interpreted. From the scale, the actual strength of FC and SC between central V1 and lateral parietal cortex is extremely weak (around Z(r) = .1 for FC and p-track = .1 for SC). Under no circumstances would I believe that either of those values represents any sort of real connection. Cortical regions with direct structural connections have much stronger FC values, as do regions that influence each other indirectly via multi-step connections. Further, very large portions of the brain probably have both stronger FC and SC to central V1 than these FP regions (the authors show this for FC but exclude this info for SC). Most glaringly, I certainly don't believe there is a "direct structural connection" as is claimed in the discussion--a claim based, strangely, on the spatial correspondence between the structural and functional maps, which really has nothing to do with any evidence for a direct connection.

      Finally, the authors must note that p values may not be used for spatial correlations between brain maps. This is because these maps are always highly autocorrelated, which violates the independence assumption of the correlation procedure.
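
      The autocorrelation point can be illustrated with a small simulation. This is a hedged sketch on synthetic one-dimensional "maps", not an analysis of the HCP data: the map length, smoothing window, and trial count are arbitrary assumptions, and a moving average stands in for the spatial smoothness of real brain maps. Pairs of unrelated random maps are correlated and tested with a standard test that assumes independent points; smoothing the maps makes that test declare "significant" correlations far more often than the nominal 5%.

```python
import math
import random


def smooth(values, window):
    """Moving average that introduces spatial autocorrelation."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def naive_significant(r, n, z_crit=1.96):
    """Two-sided test at alpha = 0.05 via the Fisher z approximation,
    which assumes the n points are independent."""
    return abs(math.atanh(r)) * math.sqrt(n - 3) > z_crit


def false_positive_rate(window, n_points=200, n_trials=300, seed=0):
    """Fraction of unrelated map pairs the naive test calls significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        a = [rng.gauss(0, 1) for _ in range(n_points)]
        b = [rng.gauss(0, 1) for _ in range(n_points)]
        if window > 1:
            a, b = smooth(a, window), smooth(b, window)
        hits += naive_significant(pearson(a, b), n_points)
    return hits / n_trials


fpr_independent = false_positive_rate(window=1)   # no smoothing
fpr_smoothed = false_positive_rate(window=15)     # autocorrelated maps
```

      Because the two maps in every pair are generated independently, every "significant" result is a false positive. Comparing `fpr_independent` with `fpr_smoothed` shows how strongly the nominal p value understates the error rate once neighbouring points are correlated.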

    3. Reviewer #1:

      This manuscript extends prior work by the authors (Griffis et al, 2017), which originally reported eccentricity-dependent differences in resting state connectivity between V1 and regions brain-wide. This study builds on that work by expanding the pool of participants, using the HCP dataset, and by investigating any eccentricity-dependent effects that may emerge with tractography. Interestingly, both measures find that foveal areas in V1 are more strongly connected to frontoparietal networks. The study is interesting, but I have a few remaining points.

      1) While during the resting state scans, there was, in theory, no 'task', participants were asked to maintain fixation on the cross in the center of the screen throughout the scan. I think it would be important for the authors to note that there is a possibility that the resting state correlations observed wherein foveal areas were more correlated with frontoparietal regions (and far periphery with DMN areas) could be due to attention directed towards the fixation cross, and away from the periphery. While I acknowledge the authors have no way to test this with this data set, it is possible that if participants had been asked to covertly attend to a ring in their far periphery the entire time instead, the correlations might have been flipped, with frontoparietal connectivity highest in the periphery towards the attended eccentricity. The authors should either explain why this is not a concern, or acknowledge it in the manuscript.

      2) Related to the last point, what was the size of the screen used during the connectivity data acquisition? I ask because the far eccentricity bands determined using Benson et al's technique are very eccentric. If participants had their eyes open and were fixating, was that eccentricity outside the outer edge of the screen? If so, it would effectively be 'unattended', thereby potentially influencing connectivity results.

      3) Was there any attempt at replicating these results in extrastriate cortex? Are these patterns still there, both in structural and functional connectivity, for V2 or V3?

    4. Summary: The manuscript is a replication of findings from Griffis et al., 2017, and it seeks to validate those findings using a different modality (diffusion-weighted imaging; DWI). While the questions asked in this manuscript are of considerable interest to the field, the findings' focus and implications are relatively narrow. Further, the study does not reveal new conclusions about brain function or organization. The authors should be cautious about interpreting the findings as representing direct structural connections between the occipital and frontal cortex, as the reported structural and functional connectivity values may not be strong enough to support such a strong interpretation. The reviewers also agree that the methods are not presented clearly, in a manner that is straightforward to follow and critique.

    1. Author Response:

      This response corresponds to the essential revisions sent to the authors after review.


      1) Further characterization and clarification are needed regarding the sensor properties. This is crucial for the potential users in the field to judge and use the sensor, and for interpretation of the biology results using the sensor.

      We are grateful to the reviewers and editors for raising such important questions regarding the characterization of sensor properties. The feedback has helped clarify important aspects of the sensor.

      i) Clear statement in prominent places about the improvement of the sensor and new potential for its biologic applications separating from the authors' 2015 paper.

      Previous enzyme-based biosensor designs, including the ChOx biosensor described in our 2015 publication (Santos et al, 2015), were based on the differential coating of electrode sites with matrices containing or lacking ChOx. These modifications render the sites Ch-sensitive or insensitive, respectively. The latter have been termed “sentinel” sites, as they are designed to respond to any perturbation except the analyte of interest (Ch in this case). By subtracting the sentinel signal from that of the Ch-measuring site, this approach has been useful to decrease the contribution of interferent signals, namely those caused by electrochemical oxidation of electroactive compounds or by voltage fluctuations associated with LFP. However, cross-talk caused by H2O2 diffusion from enzyme-coated to sentinel sites poses important constraints on this design. The inter-site spacing required to avoid diffusional cross-talk leads, for example, to uncontrolled differences in the amplitude and phase of LFP across sites, compromising common-mode rejection.
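
      The sentinel subtraction described above can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name `common_mode_reject`, the sampling rate, the 8 Hz artifact, and the step-like Ch response are all hypothetical, and the sketch assumes the artifact reaches both sites with identical amplitude and phase, which is exactly the condition that large inter-site spacing violates.

```python
import math

def common_mode_reject(ch_site, sentinel_site):
    """Subtract the sentinel trace from the Ch-sensitive trace, sample by sample."""
    if len(ch_site) != len(sentinel_site):
        raise ValueError("traces must have equal length")
    return [c - s for c, s in zip(ch_site, sentinel_site)]

# Hypothetical 1 s recording at 1 kHz (assumed values, for illustration only).
fs = 1000
t = [i / fs for i in range(fs)]

# Shared LFP-like artifact seen identically by both sites (assumption).
artifact = [0.5 * math.sin(2 * math.pi * 8 * ti) for ti in t]

# Step-like Ch-related response, present only at the Ch-sensitive site.
ch_signal = [1.0 if ti >= 0.5 else 0.0 for ti in t]

ch_site = [a + c for a, c in zip(artifact, ch_signal)]
sentinel_site = list(artifact)

cleaned = common_mode_reject(ch_site, sentinel_site)
```

      If the artifact differed in amplitude or phase between the two sites, a residual would remain in `cleaned`, which is the limitation of widely spaced sentinel sites noted above.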

      In the current study, we have circumvented diffusional cross-talk-related limitations by implementing a novel sensing approach. Rather than changing the coating composition across recording sites, we have differentially modified their electrocatalytic properties towards H2O2, resulting in Ch-sensitive and pseudo-sentinel sites. As Ch responses depended solely on the intrinsic properties of the metal surface, we could dramatically reduce the size and increase the spatial density of recording sites by using a tetrode configuration. Tetrodes, bundles of four twisted wires glued together, are conventionally used for separating single-neuron action potentials based on the spatial structure of their action potentials across wires. Here, the spatial structure of the electrochemical signal is created by electrochemical modification of the wires. Importantly, this design allows the unbiased measurement of ChOx activity and O2 in the same brain spot by using a tetrode site to directly measure the latter. This has not been possible to achieve with conventional enzyme-based biosensor designs, including our own previous stereotrode design.

      We acknowledge that the improvements of the TACO sensor over our previous stereotrode design, published in 2015 (as well as other conventional enzyme-based biosensors in general), were not clearly emphasized in the manuscript. We added new paragraphs/sentences in the introduction and results of the revised manuscript (page 4 lines 10-16, page 5 lines 6-15 and page 6 line 8) highlighting the main difference between the two sensors and advantages of the new design for the unbiased measurement of the signals derived from ChOx activity (COA) and O2.

      ii) Regarding the choline responses: characterizing the linearity of choline response is important for users to understand the sensor properties.

      Responses to choline were highly linear within the concentration range tested (up to 30 μM). This information was added to Table 1 and mentioned in the text (page 7, line 18) of the revised manuscript.
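
      A linearity check of this kind could be sketched as an ordinary least-squares fit of current against concentration. This is an illustration under stated assumptions, not the authors' calibration procedure: the function name `linear_fit` is hypothetical, and the current values are made-up placeholder data. Only the 0-30 μM range is taken from the response above.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared

# Concentration steps spanning the tested range (up to 30 uM).
conc_uM = [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0]

# Placeholder currents in pA, NOT measured data: an ideal linear response.
current_pA = [2.0 * c + 1.0 for c in conc_uM]

slope, intercept, r_squared = linear_fit(conc_uM, current_pA)
```

      Here the slope plays the role of the sensitivity (current per unit concentration), and an R^2 close to 1 over the tested range is what a claim of linearity would correspond to.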

      Relatedly, a demonstration of how to correct for movement artifacts in freely-moving rodents will be useful for future applications.

      Movement can cause electromagnetic or mechanical perturbations (movement artifacts) that are expected to scale with the impedance of individual recording sites. As the same applies to LFP-related currents, it is not trivial to discriminate between the two confounds. Nevertheless, our common-mode rejection approach, which is optimized by a frequency-domain correction of electrode impedances (please check the Methods section, page 40, for a detailed explanation), is designed to optimally remove both LFP- and movement-related artifacts.

      In our freely-moving recordings we did not have prominent movement-related perturbations, probably due to the proximity of the head-stage to the sensor and the shielding effect of the grounded copper mesh that covers the implant. Nevertheless, candidate events likely caused by movement consisted of current deflections aligned to locomotion bouts, which were completely removed by common-mode rejection. In the revised manuscript we added the average raw traces triggered on locomotion bouts in Figure 2D, highlighting the usefulness of our method to remove putative movement-related artifacts in addition to LFP and other interferents. We have also added a brief mention of this issue on page 10, lines 32-35 and page 11, lines 1-2.

      Further, since the COA signal is confounded by phasic O2 fluctuations, the authentic changes in COA are potentially interfered by O2-evoked enzymatic responses. The interpretation of the signal interference needs to be clearly discussed, including O2-evoked changes, and other related signaling changes, like DA.

      The main focus of our study was to investigate the effect of physiological O2 fluctuations on the ChOx biosensor signal, which is given by the activity of immobilized ChOx, which we abbreviate as COA across the manuscript. In order to address this issue in an unbiased manner, it is essential to remove artifacts that directly generate currents on the electrode surface (please see the response to point 1vi for details). Our TACO sensor was designed to optimize the removal of such confounds, resulting in a clean COA signal. As this signal reflects the activity of the immobilized enzyme, it is sensitive to changes in O2, not only choline. Thus, the COA signal is not confounded, but rather modulated, by changes in O2. Our main finding was that phasic O2 modulation of COA is a major confound of phasic Ch dynamics measurements using ChOx sensors in vivo in the brain. In this sense, the central tenet of the paper is that COA does not reflect authentic choline concentration dynamics, but rather a nonlinear function of Ch and O2 dynamics, with no feasible analytical approach to separate the two.

      We recognize that, in the Methods section, the description of how the COA signal was computed could lead to confusion between authentic COA and authentic Ch measurement. In the revised manuscript we have changed the terms used in the signal cleaning procedure (page 40-41).

      Regarding neurochemical confounds (e.g., ascorbate or dopamine and other monoamines), we acknowledge that the description of multichannel sensor properties in Table 1 could be confusing to readers. The table also did not convey the important information on how sensitive our COA measurement is to these artifacts. In the revised manuscript we have removed the information about selectivity ratios for individual sites. Instead, the table section now called “Analytical properties for COA measurement” was expanded and now shows DA and AA sensitivities and selectivity ratios for the COA signal, computed from the difference between Au/Pt/m-PD and Au/m-PD sites.

      Additionally, we added a column in the color plot in Figure 1E describing the relative responses of the COA measurement to the different factors. This addition highlights the high selectivity of the COA signal for Ch, as compared with individual sites.

      Finally, we have detailed the interpretation of the freely-moving signals triggered on SWRs and locomotion bouts. In the Methods section of the revised manuscript (page 41, lines 4-11), we clarify how the differential signals COAnon-mPD and NCC (neurochemical confounds) presented in Figure 2 (revised version) were computed. In the description of these results, we also explain how the response patterns of raw and cleaned signals can be used to infer the contribution of different sorts of artifacts, including movement- and LFP-related and those caused by neurochemicals (page 10 lines 26-35, page 11 lines 1-5).

      iii) The dimensions of the sensor head need to be specified and spelled out clearly. It seems to be around 50 um, but the text seems to suggest 150 um. The individual sensing elements are 17 um in diameter. If this is true, it is very exciting because it exhibits hemispherical diffusion yielding higher response and enhanced sensitivity. This may improve spatial and temporal resolution if this is indeed a much smaller sensor as a disk-shaped one.

      We thank the reviewers for raising this point. It is an important detail that was not clearly stated in the manuscript. In the Methods section (page 34 of the original manuscript), the description of the insertion of the tetrode inside a silica tube might have been misleading. In fact, the tetrode protrudes 1-2 cm out of the silica tube. This distance ensures that the latter is not in contact with the brain during in vivo recordings. Cutting the twisted end of the tetrode results in four disc-shaped sensing elements with 17 μm diameter. The total diameter of the tetrode is approximately 60 μm. In the revised manuscript we have clarified and emphasized these details in the Methods section (page 36 lines 10, 15-16), in the results (page 6, lines 3-5) and with an additional cartoon in Figure 1A.

      iv) The role of the sentinels with differential plating is very interesting, but the function of the sentinels is not clear (p. 4 "canceling LFP-related currents"). They consume oxygen. Why does this not result in overlap of the diffusion layer for the choline sensor and therefore affect choline response? Please explain why differential electroplating was employed.

      We further clarified the role of the pseudo-sentinel sites on the removal of LFP-related currents and neurochemical artifacts and expanded the reasoning behind this approach. Please check the Introduction of the revised manuscript (page 4 lines 4-18, page 5 lines 6-15).

      When polarized at +0.6 V vs. Ag/AgCl, the pseudo-sentinel channels display a residual activity towards electrochemical oxidation of H2O2. This electrochemical reaction generates O2, but the effect on the local O2 concentration is negligible due to the poor sensitivity and very small electrode surface area (17 μm diameter disc). We measured O2 (head-fixed mice and in vitro) by electrochemical reduction at -0.2 V vs. Ag/AgCl at a pseudo-sentinel site (gold-plated without m-PD). In this case O2 is consumed, but to a very limited extent that does not affect the local O2 level in the sensor. In accordance with the expected lack of effect on O2 levels, we have confirmed that switching the applied potential on a gold-plated site between +0.6 V and -0.2 V vs. Ag/AgCl has no effect on the COA signal. In the revised manuscript we added a supplementary figure (Figure S4) describing this observation. Accordingly, we extended the discussion of this topic in the results section (page 13, lines 17-18).

      v). Please explain how time-dependent behavior of the sensor was measured. This process typically leads to the formation of a film on this electrode surface which can affect sensitivity. According to the authors' 2015 paper, the method for measuring the response time seems rather crude, and may overestimate the response time which is related to the mixing of the solution. This needs to be discussed.

      The sensor response times were estimated from the rise of the current in response to analyte additions in a stirred buffer solution, as described in the Methods section (page 40, lines 9-10 of revised manuscript). In the revised manuscript, we added a sentence to further clarify the use of this setup to estimate response times (page 37, line 29). Indeed, this setup is not the most appropriate to precisely determine response times due to the bias introduced by the analyte mixing time after its addition to the buffer. Our previous study (Santos et al, 2015) suggests however that the biggest contribution to the estimated response time is due to diffusion of Ch in the sensor coating. Besides the fact that we cannot precisely determine response times, it is noteworthy that real response times are faster than the values we report. This further highlights the high temporal resolution of the TACO sensor. We added a paragraph discussing this topic in the revised manuscript (page 7, lines 19-21).
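
      Estimating a response time from the rise of the current after an analyte addition can be sketched as follows. This is a hypothetical illustration, not the authors' procedure: the response does not state which rise-time metric was used, so the 50% criterion (`fraction=0.5`), the function name `rise_time`, and the first-order synthetic trace with an assumed time constant are all assumptions.

```python
import math

def rise_time(t, current, fraction=0.5):
    """Time from the first sample until the current first reaches the given
    fraction of the step from baseline (first sample) to plateau (last sample)."""
    baseline = current[0]
    plateau = current[-1]
    threshold = baseline + fraction * (plateau - baseline)
    for ti, ci in zip(t, current):
        if ci >= threshold:
            return ti - t[0]
    return None  # threshold never reached

# Synthetic first-order rise with an assumed apparent time constant of 1 s.
tau = 1.0
dt = 0.01
t = [i * dt for i in range(1001)]           # 0 to 10 s
current = [1.0 - math.exp(-ti / tau) for ti in t]

t50 = rise_time(t, current, fraction=0.5)   # close to tau * ln(2)
```

      In a stirred-buffer setup the measured trace also includes the analyte mixing time, so a value obtained this way is an upper bound on the true sensor response time, consistent with the point made above.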

      vi). The effect of LFP and other perturbations of sensor responses need to be more clearly explained.

      Two main types of artifacts affect the response of enzyme-based electrochemical biosensors: electromagnetic or electrochemical sources that directly generate currents at the electrode surface and biochemical factors that affect the activity of the immobilized enzyme. The first group can be sub-divided into: a) artifacts that generate faradaic currents, arising from oxidation/reduction of electrochemically active molecules, such as ascorbate or dopamine; b) artifacts that change the charge distributions at the electrode surface, generating capacitive currents, which in the brain are mainly caused by local fluctuations in field potentials (LFP) generated by the transmembrane current sources of the surrounding neural tissue. Effectively, LFP causes potential changes at the electrode surface, whose voltage is clamped by the potentiostat circuit, giving rise to an apparent current, similar to voltage-clamp measurement of the intracellular current. The second group, consisting of biochemical artifacts, comprises mainly the effect of oxygen on enzymatic activity (although other factors such as temperature and pH might have a minor effect, as discussed in the manuscript, page 34, lines 16-20).

      Importantly, the strategies devised to reduce artifacts that directly generate electrochemical currents (chemical surface modifications or common-mode rejection) do not control for factors influencing immobilized ChOx activity.

      Since O2 interference was the main focus of the paper and is thoroughly described throughout the manuscript, in the Introduction of revised manuscript we extended the description of the factors directly generating currents on the electrode surface (page 4, lines 4-18).

      2) Re-organization of the manuscript to improve the readability. This manuscript contains the characterization of the TACO sensor and application of this sensor to monitor real-time behavior in freely moving rodents. The design and characterization of the sensor is intermingled with the application of studying the choline biology with the sensor, making the logic flow hard to follow. The arrangement and presentation of the figures need to be improved so readers can appreciate both characterization and applications aspects and how they are tightly linked. This might also involve properly arranging main figures and associated supplementary figures.

      We believe this suggestion stems from the expectation that we may have conveyed to the readers regarding the possibility of measuring authentic Ch dynamics in behaving animals with our TACO sensor. Indeed, the TACO sensor design makes it ideally suited for the unbiased measurement of brain Ch dynamics based on ChOx, while controlling for O2 changes that might modulate immobilized enzyme activity. However, our data show that phasic ChOx activity (COA) is dominated by O2 fluctuations in the brain of behaving animals. The complexity of the nonlinear interplay between COA and O2, which depends on multiple time-scale concentration dynamics of both enzyme substrates, made it impossible to extract authentic Ch from the in vivo COA signal.

      Following the logic of data presentation in our manuscript, the initial description of TACO sensor design and properties towards COA measurement was followed by its in vivo application in freely-moving and head-fixed rodents, which led to the discovery of the possible O2 confound. This, in turn, prompted the next in vivo experiments with causal manipulations to prove the hypothetical confound effect. Next, in vitro experiments were used for more systematic investigation of the details of the confound and its underlying causes guided by the prior in vivo observations. Finally, we used a detailed mathematical model to quantitatively uncover the mechanism of the oxygen confound of the choline-oxidase-based biosensor.

      We think this logic of exposition is guiding the reader through our thought process and progresses consistently from the development of novel methodology to evaluation and identification of the confound, and then to unraveling the mechanism in vivo, in vitro and in the model. Reversing the order of presentation would break this logic and hurt the presentation of the story.

      We would like to ask the editor for her consent not to follow the suggested major reorganization. Instead, we clarified the internal logic at the end of the introduction section (page 5, lines 16-23), as well as throughout the exposition of the results. Moreover, throughout the revised manuscript we emphasize the focus of our study on phasic COA dynamics instead of putative Ch by replacing terms alluding to the latter with “COA”. Accordingly, we better articulated the motivation for assessing SWR- and locomotion-related signals in freely-moving animals (Figure 2) and the interpretation of these results to avoid a biased expectation of the reader that COA signals provide authentic Ch readout. The revised manuscript now provides an unbiased perspective on motivation and interpretation of the in vivo experiments (page 10 lines 19-22, page 11 lines 5-12). The bias of COA by O2 and the issues associated with derivation of authentic Ch dynamics from our measurements were also further explained in the discussion (page 34, lines 35-37). Along the same lines, we have trimmed Figure 2 in order to keep the focus of the paper on phasic dynamics of the COA signal. Namely, we moved panels B and C describing tonic COA dynamics in the original manuscript to a supplementary figure in the revised version (Figure S3).

    2. Reviewer #3:

      In this manuscript, Santos and Sirota demonstrated that the in vivo fast choline dynamics detected using choline-oxidase based biosensors is strongly correlated with, and likely caused by, phasic oxygen dynamics in vivo. The authors developed a novel tetrode-based amperometric choline oxidase (ChOx) sensor that can simultaneously measure ChOx and O2 levels within the same tetrode, which enabled the authors to observe strong correlations between ChOx and O2 levels in vivo (in behaving rats and mice, and under several distinct behavioral contexts). To dissect the causal relationship and determine the role of phasic O2 transients, the authors further combined in vivo as well as in vitro perturbation experiments to demonstrate that phasic fluctuations in O2 concentration can lead to fluctuations in ChOx measurements. Moreover, mathematical modeling recapitulates the systemic relationship between ChOx and O2, suggesting the source of this coupling stems from non-steady-state enzyme kinetics. Together, these findings challenge the long-held belief that ChOx sensors can measure sub-second temporal dynamics of choline concentrations in vivo, and also call for critical re-evaluation of all oxidase-based biosensors literature to determine the contribution of phasic O2 dynamics in vivo.

      The study provides extensive evidence to support their claim: correlational, causal, analytical and modeling. The authors employed multiple levels of approaches, from the development of novel biosensors that leads to the observed correlation, to careful in vivo and in vitro perturbation experiments to demonstrate causal relationship. The data is carefully analyzed, and elegantly matched with modeling results. The results of this study have broad implications beyond the ChOx literature and in fact challenge the entire literature on oxidase-based biosensors.

    3. Reviewer #2:

      This is an important piece of work addressing in-vivo measurements where two coupled components are to be measured ideally in the same time and space components. Unfortunately, the impact of this work is likely to be minimized due to its poor organization and the attempt to deal with a number of separate but related issues in the same manuscript. Accordingly, it is suggested that this work be divided into two manuscripts to be published together. The focus of the two might be:

      A) A Tetrode-Based Microsensor (TACO) - This work would focus on the criteria for performance that would be expected for a new sensor, presumably with new and unique properties. This work would include differential plating of electrodes (It is unclear whether some or all of this work has been previously reported), dimension considerations, and the simulation of sensor response. (An important consideration but frequently overlooked is that a sensing element with a 17um diameter will exhibit hemispherical diffusion (Eq. 4)). Other issues such as interferences, stability, sensor response time and linearity need to be more fully explained. Presumably such a sensor configuration would be useful in other applications involving oxidase-based sensors.

      B) Effects of Local Field Potential and Oxygen-evoked ChOx transients in the In-Vivo Measurement of Acetylcholine in Freely Moving Rodents (A better title can surely be presented!) - Here the focus should be on the in-vivo measurements including a qualitative explanation of the LFP and O2 response and how the TACO sensor corrects for this. Presumably REM and NREM sleep will be detected by EEG. This is not well explained. Also unclear is how the improved performance of the sensor affects the conclusions of the in-vivo studies.

    4. Reviewer #1:

      Santos and Sirota developed a novel Tetrode-based Amperometric ChOx (TACO) sensor. This multichannel configuration can simultaneously measure the ChOx activity (COA) and O2 in the same brain spot. Using the TACO sensor in freely-moving and head-fixed rodents, they observed COA and O2 dynamics following locomotion in the active state and hippocampal sharp-wave/ripple (SWR) complexes during the quiescent state. It's interesting that the COA signal can be calibrated by subtracting the pseudo-sentinel signal from the Ch-sensing site signal of the TACO sensor. However, the COA signal is confounded by phasic O2 fluctuations, so the authentic changes in COA are subject to interference from O2-evoked enzymatic responses. This question isn't addressed in this paper.

      Major concerns:

      1) The authors found that the COA readout is confounded by phasic O2 fluctuations in in vitro and in vivo experiments. These results cast doubt on the validity of the authentic cholinergic response in freely-moving or head-fixed rodents. These findings seem to generalize to other oxidase-based biosensors, although the authors have some discussion on how to address this question. However, we can't obtain authentic cholinergic dynamics in vivo with the TACO biosensor unless the biosensor's O2 dependence is resolved. So, the authors should try to address this question.

      2) The authors should demonstrate how to correct for movement artifacts in freely-moving rodents.

      3) Concerns about selectivity. Figure 1E shows that the TACO sensor also responds to dopamine and ascorbate. The authors should demonstrate the selectivity of the TACO sensor for different monoamines at different concentrations.

    5. Summary: In the manuscript, the authors developed a novel tetrode-based amperometric choline oxidase (ChOx) sensor that can simultaneously measure ChOx and O2 levels within the same tetrode. This sensor allowed the authors to observe strong correlations between ChOx and O2 levels in vivo in behaving rats and mice and under several distinct behavioral contexts. The authors further combined in vivo as well as in vitro perturbation experiments to demonstrate that phasic fluctuations in O2 concentration can lead to fluctuations in ChOx measurements. These findings also challenge the long-held belief that ChOx sensors can measure sub-second temporal dynamics of choline concentrations in vivo, and also call for critical re-evaluation of all oxidase-based biosensors literature to determine the contribution of phasic O2 dynamics in vivo.

    1. Reviewer #3:

      This work by Katada and colleagues uses M4 and 5B transgenic lines to express ChR2 in starburst amacrine cells (SACs) and retinal ganglion cells (RGCs). It finds that ChR2 activation in SACs improves the ChR2 response in RGCs. Thus, in a gene therapy strategy that expresses optogenetic proteins in RGCs, SACs may be considered as a helpful additional target. The rationale of the manuscript basically regards RGCs as a uniform population and disregards all amacrine cells except SACs. If differences in RGC and amacrine subtypes are taken into consideration, some conclusions of this manuscript should be revised.

      Major comments:

      1) This manuscript makes one assumption: that the RGCs in M4-ChR2 and 5B-ChR2 have comparable ChR2 evoked response if activated alone, thus the difference between their ChR2 responses is entirely attributed to the activation of extra SACs in the M4 line. Yet there is no experimental evidence to support this assumption. Both M4-YC and 5B-YC label ~35% of the RGCs, consisting of multiple subtypes, but the subtype compositions of the two populations are not shown. ChR2 response properties of a neuron may be influenced by its own ion channel composition, which differs between cell types. The authors need to either a) show that the 2 mouse lines label identical subsets of RGCs (unlikely, given FigS6E), or b) compare the M4 line with or without coactivation of SACs to single out the effect of SACs.

      2) The experiment results using rAAV (Fig4) are hard to interpret:

      a) CAG promoter directs expression in most cell types. So other amacrine (Fig4D) and RGC cell types in addition to SACs and M4/5B RGCs are also infected. Comparison between rAAV/M4/5B retinas cannot provide clean insight into the effect of SAC.

      b) The manuscript makes comparisons within the rAAV experiments (Fig4I-K FigS8F-H), trying to link induction efficiency into SACs with visual restoration. However, it is a given that higher infection in RGCs also leads to better visual restoration. So SAC effect cannot be separated from RGCs (Fig4J-K FigS8G-H).

      c) The one exception shown in Fig4I and FigS8F, where SAC infection rate is linked to maintained/peak ratio, while RGC infection is not, has two caveats: First, the authors acknowledge that higher maintained response may not causally link to better restoration (line 235). Second, the same correlational analysis for other AC types is missing.

      d) At this stage, a simpler interpretation of the results is equally plausible: that higher infection in all retinal neurons (regardless of type) is correlated with better restoration.

      3) M4-ChR2 retina has very weak OFF response to regular light stimulus, but 5B has normal ON/OFF ratio. The authors speculate that SACs are responsible for this difference. But one observes that M4 labels mostly OFF RGCs while 5B labels equal numbers of ON and OFF RGCs (S3 and S6E, lamination patterns of M4 and 5B), so there is a simpler explanation: RGCs that express tet-ON ChR2 are no longer very responsive to regular light stimuli. If that is true, and these cells are very unhealthy, then comparison of their ChR2 responses becomes less meaningful. The authors need to address the cell health problem caused by tet-ON ChR2 expression.

      4) Only a few RGC subtypes form synaptic connections with SACs in the rodent retina. Thus, the effect of SACs would be limited. In the case of primate retina, ChAT positive neurons are much fewer, so their effect in ChR2 gene therapy is likely even more limited.

      5) Lines 154-155: an equally likely explanation: M4 contains ON and ON-OFF DSGCs, which are known to be important for OKR, whereas 5B does not. This possibility needs to be considered.

    2. Reviewer #2:

      This paper presents the results of a study of optogenetic visual restoration. ChR2 was targeted either to a subset of ganglion cells (GCs) or to a subset of ganglion cells-not necessarily the same ones-plus starburst amacrine cells (SACs) using an intersectional genetic strategy. Photoreceptors were ablated using MNU in animals expressing ChR2, and then retinal and whole animal responses to visual stimuli were assessed. Interestingly, co-expression of ChR2 in SACs and GCs resulted in different, potentially more "naturalistic" responses than expression in GCs alone. This is an interesting result, but given the number of possible explanations for it, the lack of any rigorous investigation of the underlying mechanism is problematic. Results presented by the authors indicate that ACh release from stimulated SACs acts upon some network(s) containing electrical synapses and presynaptic to the GCs to alter GC responses, but the identities of these network(s) remain unknown. Given that ACh is considered to act in a paracrine manner within the retina, the affected cells could be any number of amacrine or bipolar cells.

      There are a number of lines of investigation that the authors could pursue to identify-or at least, rule out-specific presynaptic networks. While too numerous to discuss individually, potential lines of investigation could differentiate nicotinic from muscarinic effects and effects on inhibitory and excitatory inputs to ganglion cells. As well, it would be important to express ChR2 in SACs alone to see if this drives changes in GC spiking.

      In all, the authors here have the opportunity to examine the effects of paracrine signaling by SACs on inner retinal network excitability and function using a nice model system, and they should take advantage of it.

    3. Reviewer #1:

      The authors use a tetracycline controlled gene expression system to compare the effectiveness of two different promoters to express channelrhodopsin in different populations of retinal neurons with the goal of rescuing visual function in mouse models of photoreceptor degeneration. The expression patterns of two promoters were compared - the first, a muscarinic AChR (referred to as M4 in the manuscript), led to expression in a subset of RGCs and a subset of amacrine cells, while the second, a 5-HT receptor (5B), led to expression in a subset of RGCs only. In the M4 line, the amacrine cells that were labeled were a subset of starburst amacrine cells located in the INL and did not include the SACs displaced in the GCL. Also, it was a subset of the INL-SACs. To assess the impact of these different expression patterns on vision restoration, mice expressing ChR under these two different promoters were treated with MNU to induce PR degeneration. The light responses restored by ChR were assessed with MEA recordings, cortical VEPs, and behavior. The M4 promoter had stronger light evoked responses. The authors used pharmacology to assess how the M4 retinal circuit might explain the enhanced light response.

      There were several fundamental problems with the manuscript that need to be addressed. These problems range from experimental design, interpretation of findings, some mistakes in description of retina circuits. Moreover, there is no context given comparing these results to the multiple other studies on vision restoration impact on visual-guided behaviors. These problems are listed here:

      1) The choice of promoters and expression patterns need to be further explored. The motivation for a particular subtype of mAChRs and 5-HT is not given. Though M4 and 5B drive expression in roughly the same percentage of total RGCs, there is no way to know whether they drive expression in the same subtypes of RGCs. Hence differences in firing patterns are not likely to be fully explained by the fact that the M4 promoter also drives expression in a subset of INL-SACs.

      2) The observation that M4 drives expression in a subset of OFF SACs was quite intriguing. Though there are ways to distinguish ON from OFF SACs, this is the first example of which I am aware that a subset of OFF-SACs is labeled. Does this mean only a subset of OFF-SACs have mAChRs? Or was this reflective of the partial expression induced by Tet? It is worth the authors quantifying the percent of OFF-SACs labeled in the M4 mouse line.

      3) The observation that they are able to rescue the OKR in MNU-treated mice using the M4 promoter is impressive. Again, the authors conclude that this is due to the presence of ChR in INL-SACs, but it could be that they also have ChR expression in the direction-selective ganglion cells themselves. Hence, although the rescue is impressive, it is difficult to interpret. Also, this important behavior is confined to a supplemental figure.

      4) The authors conclude that M4-driven expression of ChR rescues the OKR in MNU-treated mice and not rd-mice because the rd mice have a "thinner INL" and therefore may have a depletion of INL-SACs. This appears to be an easy test for the authors using immunofluorescence.

      5) The authors do some pharmacology to test whether SACs are the basis of the larger sustained response observed in M4 vs 5B. However, the assumptions/interpretations for these experiments are based on some mistakes regarding retinal circuits. SACs release GABA and acetylcholine. However, the pharmacology they do is quite limited. Namely, they use TPMPA, which blocks GABA-C receptors, which are found on a subset of bipolar cell terminals and by no means represent the major source of GABAergic signaling in the retina, which is via GABA-A and GABA-B receptors. Similarly, the authors assess the impact of ACh release by using atropine, which blocks muscarinic receptors but not nicotinic ACh receptors. Finally, the authors use MFA, a blocker of gap junctions, which does have a clear impact on sustained responses. However, SACs are not thought to be gap junction coupled to anything. So, it is more likely that MFA is acting via RGC-RGC gap junction coupling or having an off-target effect. Much more needs to be done to have a complete understanding of the circuits that mediate the ChR-mediated light responses.

      6) 226-227 - what is the conclusion - results suggest not due entirely to gene transfer? This needs further explanation.

      7) The comparison of light-induced responses in MNU-treated vs. non-MNU-treated mice is rather confusing. The authors should consider revising this point.

    1. Author Response:

      We would like to thank the reviewers for taking the time to review our manuscript. The comments below have been thought-provoking and will inspire several new analyses that we hope address concerns. In particular, we will carefully reappraise the framing of the results, shifting away from a false dichotomy of “this is perception” and “this is binding”, and towards more restrained terminology that discusses the shift in balance between perception and binding. Moreover, we will expand our analysis of theta-gamma phase-amplitude coupling beyond the hippocampus and to the whole brain.

      We answered each comment in turn, first by providing a general response to the comment and then by providing an outline of the explicit action we will take to address this issue.

      Reviewer #1:

      This MEG study by Griffiths and colleagues used a sequence learning paradigm which separates information encoding and binding in time to investigate the role of two neural indexes - neocortical alpha/beta desynchronization and hippocampal theta/gamma oscillation - in human episodic memory formation. They employed a linear regression approach to examine the behavioral correlates of the two neural indexes in the two phases, respectively, and demonstrated an interesting dissociation, i.e., decreased alpha/beta power only during the "sequence perception" epoch and increased hippocampal theta/gamma coupling only during the "mnemonic binding" phase. Based on the results, they propose that the two neural mechanisms separately mediate two processes - information representation and mnemonic binding. Overall, this is an interesting study using a state-of-the-art approach to address an important question. Meanwhile, I have several major concerns that need more analysis and clarification.

      Major comments:

      1) The lack of theta-gamma coupling during stimulus encoding period is possibly due to the presentation of figure stimulus, which would elicit strong sensory responses that mask the hippocampus activity. How could the author exclude the possibility? In other words, the dissociated results might derive from different sensory inputs during the two phases.

      Response: The reviewer raises a good point; however, we feel this is already addressed by our use of memory-related contrasts. The masking of an effect that arises due to stimulus presentation would be consistent across all memory conditions, and therefore subtracted out in any contrast between these conditions. The analyses in our original submission use this approach to avoid such a confound. Furthermore, previous studies (e.g. Heusser et al., 2016, Nat. Neuro.) have demonstrated that hippocampal theta-gamma coupling can arise during stimulus presentation, suggesting strong sensory responses do not, generally speaking, mask measures of theta-gamma coupling.

      Action: We will explain the potential concern about masking in the main text, and also explain how we have addressed such a concern with the use of contrasts.
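
      The subtraction argument in the response above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the sensory response adds the same offset to every trial regardless of memory outcome; the trial values, the offset, and the function name `contrast` are all hypothetical and not taken from the manuscript.

```python
import random

def contrast(remembered, forgotten):
    """Mean difference between two conditions (remembered minus forgotten)."""
    return sum(remembered) / len(remembered) - sum(forgotten) / len(forgotten)

rng = random.Random(0)

# Hypothetical per-trial coupling values in the absence of any sensory response.
remembered = [0.30 + rng.gauss(0, 0.01) for _ in range(50)]
forgotten = [0.20 + rng.gauss(0, 0.01) for _ in range(50)]

# Assumption: the stimulus adds an identical offset to every trial,
# whatever the later memory outcome.
sensory_offset = 5.0
remembered_masked = [x + sensory_offset for x in remembered]
forgotten_masked = [x + sensory_offset for x in forgotten]

clean = contrast(remembered, forgotten)
masked = contrast(remembered_masked, forgotten_masked)
print(clean, masked)  # the two contrasts agree up to floating-point error
```

      The cancellation holds only for a condition-independent, additive component. If a sensory response instead lowered the signal-to-noise ratio of the coupling estimate, a contrast would not remove that effect.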

      2) About the hippocampal theta/gamma phase-power coupling analysis. I understand that this hypothesis derives from previous research (e.g., Heusser et al., 2018) as well as the group itself (Griffiths et al., PNAS, 2019). Meanwhile, MEG recording, especially with gradiometers, is known to be relatively insensitive to deep sources. Therefore, the authors should provide more direct evidence to support this approach. For instance, the theta/gamma analysis relies on the presence of theta-band and gamma-band peaks in each subject. Although the authors have provided two representative examples (Figure 3A), it remains unknown how reliably the theta-band and gamma-band peaks are present in individual subjects.

      Action: We will plot the data for all participants to demonstrate the stability of the theta/gamma band peaks.

      Additional response: With regard to the concern that MEG gradiometers are relatively insensitive to deep sources, we feel it is worth noting that a recent review (Ruzich et al., 2019, Human Brain Mapping) identified 29 studies that had reported successful hippocampal measurements when only using gradiometers, suggesting our use of gradiometers is neither unprecedented nor unjustified. Furthermore, in their recommendations for optimising hippocampal recordings with MEG, the old wisdom of using magnetometers rather than gradiometers is conspicuous by its absence, perhaps because while magnetometers have a greater theoretical potential to detect deep signal, they also have greater theoretical potential to pick up noise, so the signal-to-noise ratio (which, arguably, is key here) for deep sources may not differ so much between gradiometers and magnetometers.

      3) Related to the above comment, the theta-gamma coupling is a brain-wide phenomenon including both cortical and subcortical areas and not limited to just hippocampus. Although the authors have performed a control analysis to assess the behavioral correlates of the coupling in other regions, the division of brain region is too coarse and I am not convinced that this is a fair comparison, since they differ from hippocampus at least in terms of area size in the source space. The authors could consider plotting the power-phase coupling distribution in the source space and then assessing their behavioral correlates, rather than just showing results from hippocampus. This result would be important to confirm the uniqueness of the hippocampus in this binding process.

      Response: We concur that the plots currently do not demonstrate the specificity of the hippocampus, and whole brain images would better demonstrate the effect.

      Action: As suggested by the reviewer, we will plot theta-gamma coupling across the brain.

      4) About behavioral correlates. The current behavioral index confounds encoding and binding processes. Is there any way to separate the encoding and binding performance from the overall behavioral measurements? It would be more convincing for me to find that the two neural indexes at the two phases predict the two behavioral indexes, respectively.

      Response: This is a really interesting idea, but one which perhaps requires a different experimental paradigm. For associative memory, we would argue that binding is an essential step for the successful encoding of a memory, so it would quite possibly be impossible to separate the two processes in the paradigm used here. That said, a different paradigm that compared associative memory to, say, item memory, may be able to answer such a question.

      Action: We will discuss this as an avenue of future research within the discussion.

      5) The authors' previous works have elegantly shown the two neural indexes during fMRI and intracranial recording in episodic memory. The current work, although providing an interesting view about their possible dissociated functions, only focuses on the memory formation period (information encoding and binding). Given previous works showing an interesting relationship between encoding and retrieval (Griffiths et al., PNAS, 2019), I would recommend that the authors also analyze the retrieval period and see whether the two indexes show consistent dissociated function as well.

      Response: Yes, we completely agree. We had included this in a previous draft of the manuscript, and found a consistent dissociation here, where alpha/beta power decreases accompanied retrieval (perhaps linked to the representation of retrieved information) and theta-gamma coupling did not (perhaps due to the absence of a need to bind stimuli together in order to complete the retrieval task). We had cut this section to make a more streamlined manuscript, but have no qualms adding this back in.

      Action: We will include the same central analyses, this time conducted at retrieval.

      Reviewer #2:

      In this manuscript, the authors examine the neural correlates of perception and memory in the human brain. One issue that has plagued the field of memory is whether the neural processes that underlie perception can be dissociated from those that underlie memory formation. Here the authors directly test this question by introducing a behavioral paradigm designed to dissociate perception from mnemonic binding. In brief, while recording MEG data, they present subjects with a sequence of visual stimuli. Following the sequence, the subjects are instructed to bind the three stimuli together into a cohesive memory, and then are tested on their memory for which pattern was associated with an object, and which scene. The authors investigate changes in alpha/beta power and theta/gamma phase amplitude coupling during two separate epochs - perceptual processing and mnemonic binding. Overall, this is a well written and clear manuscript, with a clear hypothesis to be tested. Using MEG data enables the authors to draw conclusions about the neurophysiological changes underlying both perception and memory, and establishing this dissociation would be an important contribution to the field. I think the conclusions are justified, but there are several issues that should be addressed to improve the strength and clarity of the work.

      The fundamental premise of the task design is that subjects view a sequence of stimuli, and then separately at a later time actively try to bind those visual stimuli together as a memory. However, it is entirely possible, and even likely, that memories are being formed and even bound together as the subjects are still viewing the sequences of objects. How would the authors account for this possibility? One possible way would be if there were a control task where subjects were just asked to view items and not remember them.

      Response: Indeed, it is impossible to be certain that no binding is occurring during sequence presentation, and the terminology used in the original submission is ill-fitting as a result. However, we would argue that there is a shift in the ratio between perception and binding across the encoding task, with greater perceptual processes arising during the presentation of the sequence relative to the “associate” cue (as this is when the items are presented), and greater associative processes arising during the “associate” cue (as this is when all items are available for binding). To suggest that the two processes can be completely separated would be erroneous, but we feel it is also difficult to argue that there is no shift in balance between the two processes over the course of the encoding task. Importantly, linking a shift in balance between the two processes (binding/perception) with neurophysiological correlates (alpha-beta/theta-gamma) is sufficient for our main conclusion.

      Action: We will carefully rephrase the manuscript in such a way that it no longer implies that there is a perfect separation of perception and binding, but rather a shift in the balance between the two processes.

      Note on a “control” task: In our view, the control task proposed by the reviewer is captured by the “forgotten” condition – participants view the items, but do not subsequently remember them.

      Another possibility would be to examine the trials that the participants failed to remember correctly. Presumably, one would still see the same decreases in alpha power. Yet it seems from the data, and the correlations, that during those trials that were not remembered properly, alpha power changed very little. Of course, it is unclear in these trials if failed memory is due to failed perception, but one concern would be that this would imply that decreases in alpha power are relevant for memory too. It would be helpful to see how changes in alpha power break down as a function of the number of actual items remembered. It would also be helpful to know how strong these correlations actually are.

      Note: We are a little unsure of what the reviewer is suggesting here, as we feel that most of these analyses were included in the main text. The response below recaps the results and how they link to our interpretation of the reviewer’s comment, but if we have misunderstood the point, we would be willing to re-address it in a subsequent revision.

      Response: In the original submission, we had focused solely on the memory-related change in alpha/beta power (that is: the contrast “2 items recalled” > “1 item recalled” > “no items recalled”). Therefore, the inferential statistics allow us to conclude that a relative decrease in alpha/beta power correlates with an increase in the number of items recalled. What the analyses in the original submission do not show is how alpha/beta power changes from baseline (that is, are all items perceived [i.e. as indexed by a power decrease], or just the remembered items?). This is something we’d be happy to address in the revision.

      Action: We will probe the change in alpha/beta power following stimulus presentation, and ask whether alpha/beta power decreases are present for all memory conditions, or only when the items are subsequently remembered.
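
      The memory-related contrast described in the response above (“2 items recalled” > “1 item recalled” > “no items recalled”) can be sketched as a per-participant regression of power on the number of items recalled, followed by a group-level test of the resulting slopes against zero. This is a hedged illustration only: the data values and the helper names `slope` and `one_sample_t` are hypothetical, and the actual model may include further predictors.

```python
import math
import statistics

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def one_sample_t(values):
    """t statistic testing whether the mean of `values` differs from zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))

# Hypothetical data: per participant, the number of items recalled on each
# trial (0, 1 or 2) and the alpha/beta power measured on that trial.
participants = [
    ([0, 1, 2, 0, 1, 2], [1.00, 0.90, 0.80, 1.10, 0.95, 0.70]),
    ([0, 1, 2, 0, 1, 2], [1.20, 1.00, 0.90, 1.10, 1.05, 0.85]),
    ([0, 1, 2, 0, 1, 2], [0.90, 0.85, 0.75, 1.00, 0.80, 0.70]),
]

# One beta coefficient per participant: power change per additional item recalled.
betas = [slope(items, power) for items, power in participants]

# Group-level test: are the slopes reliably different from zero?
t_value = one_sample_t(betas)
print(betas, t_value)
```

      A negative slope here means that power is lower on trials with more items recalled. Such a slope says nothing about whether power drops from baseline in every condition, which is the separate question the action above targets.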

      A related issue is with respect to hippocampal PAC. The authors investigate this during the mnemonic binding period. Yet they also raise the possibility in discussion that this could also be happening during perception, which goes back to the point above. Did they analyze these data during perception, and are there changes with perception that correlate with memory? This would suggest that binding is actually occurring during this sequence of visual stimuli.

      Response: We did indeed analyse the data during perception in the original submission (see lines 127-128; figure 3d) and found no evidence to suggest that memory-related PAC varied during perception. In an additional analysis, we also examined whether PAC varied as the sequence progressed (that is, does PAC change from the first item to the second, and from the second to the third?), but found no evidence to suggest it does. Together, these results would suggest that putative binding mechanisms are not dominating the sequence perception phase of encoding.

      Action: We will supplement the original analyses of PAC during sequence perception (collapsed over the three epochs) with additional analyses investigating PAC fluctuations over the course of the presentation of the sequence.

      The authors perform a whole brain analysis examining the correlation between alpha power and memory to identify cluster-corrected regions of significance. However, the PAC analysis focuses only on the hippocampus, raising the question of whether these results can account for the possible comparisons one could make in the whole brain. They do look at four other brain regions for PAC, which it would be helpful to account for. In addition, are there other measures of mnemonic binding that are significant? For example, theta power, or even gamma power?

      Response: We had focused our PAC analyses on the hippocampus because of our a priori hypotheses but appreciate that only showing data from the hippocampus would obscure the whole picture. Our analyses did not uncover convincing evidence for changes in theta or gamma power, but we will report these in the main text.

      Action: We will present the PAC results across the whole brain. We will add analyses into theta and gamma power.

      The authors note in the discussion that the magnitude of hippocampal gamma synchrony has been shown to be related to the decreases in alpha power. Is this also true in their data?

      Action: We will include an additional analysis probing the correlation between hippocampal theta/gamma activity and neocortical alpha/beta power.

      Reviewer #3:

      The authors report results of an MEG analysis deploying a cognitive paradigm in which participants engage in a source memory task characterized by the appearance of three images in succession and are then tested via a cue (the first of the three images) followed by a choice of responses for a two dimensional pattern and then a choice (out of three images) of a photographic scene.

      The principal finding is that (via MEG sensor level data) there is a widespread 8-15 Hz power decrease that is correlated with the number of recalled items (from 0 to 2) on a given trial. In the hippocampus (via MEG source reconstruction), the magnitude of phase amplitude coupling observed as participants are told to associate the items is correlated with memory performance. The 8-15 Hz power decrease/memory correlation (as estimated by beta coefficients in a model described in Figure 1) is larger (across individuals) during moments when subjects are viewing the stimulus items as opposed to during the "associate" period. The novelty in the result is related to the experimental task that attempts to dissociate memory-related effects related to perception from those related to binding which putatively occurs when subjects are given the "associate" instruction.

      My main conceptual concern is related to the design of the experimental task. I am not sure that the perception/binding framing is appropriate, since there is no reason to think that subjects are not associating/binding items during the periods when the items are being shown on the screen. I suppose this may partly explain the lack of a significant difference in PAC/memory beta coefficients observed in the hippocampus when contrasting these two epochs (Figure 4). But the corollary is that the alpha power-related beta coefficients are observed while binding is likely also occurring within the paradigm (esp since each image is shown for 1.5 seconds it would seem). Is the alpha power effect seen in the hippocampus? The plots in 3a suggest there is an oscillation present in the relevant frequency range, and the time course of alpha power differences seen in Figure 2 suggests that they occur relatively late after onset of the images, which may fit better with some contribution for this pattern to the forming of associations rather than perception.

      Response to comments on task: We agree that the task does not unequivocally separate the two cognitive tasks, and any statement suggesting that it does is erroneous. That said, we would argue that, on a balance of probability, there is likely to be more information processing going on during sequence perception relative to the associate cue. This is because the participant is still being exposed to rich stimuli during sequence presentation, while only being presented with a simple cue during the association phase. Similarly, there is likely to be more binding during the associate cue than during sequence presentation. This is because participants have greater cognitive resources available for binding during the associate cue relative to during sequence perception. Now, neither of these reasons is sufficient to argue that “association” does not occur during sequence perception. However, we feel that these reasons are sufficient to suggest we expect to see a shift in the balance of “association” between the sequence perception and the binding window, where “association” is more easily executed during the binding window. Indeed, we feel it would be difficult to argue that there is no shift in the balance between these processes at any point. Importantly, linking such a shift in balance between the two processes (binding/perception) with neurophysiological correlates (alpha-beta/theta-gamma) is sufficient for our main conclusion. As such, we feel a careful rephrasing can address these concerns, where portions of the text referring to a separation of perception and binding are rephrased as a “shift in the balance in perception and binding”; the latter phrasing allows for the possibility that there is some small mixing of the two tasks.

      Action to comments on task: We will carefully rephrase the manuscript such that the text does not suggest that perception and binding are perfectly separated, but rather that the balance between the two processes shifts during the encoding task.

      Response to comments on hippocampal alpha: We agree that there appears to be an alpha peak in the hippocampus, but as this plot is across all trials, it remains unclear whether this alpha oscillation is linked to memory. This is, of course, something we can investigate in revisions.

      Action: We will investigate whether hippocampal alpha power demonstrates a memory-related effect during perception and/or binding.

      I understand that the paradigm was constructed in an attempt to temporally dissociate memory effects attributable to perception versus those attributable to binding. But given the temporal resolution available using EEG, I would imagine that the authors could differentiate an earlier perception-related effect from a later PAC binding effect in the time series if the associated images were presented in conjunction. Is it correct to frame the alpha results as related to "perception?" The beta coefficients used for analysis reflect a "memory related effect observed when visual stimuli are present on the screen," but not necessarily improved memory predicated on more accurate perception to my interpretation. I would think that a perception/binding distinction requires operationalizing perception as activity that doesn't vary with later associative memory success, and binding as activity that does. The notion of perception used by the authors here seems slightly different. The authors can perhaps comment on this concern.

      Response: This is a very interesting point. A hallmark of visual perception is a reduction in alpha/beta power (e.g. Pfurtscheller et al., 1994, Int. J. Psychophysiology), regardless of whether it is remembered or not. As such, we would expect alpha/beta power to decrease following stimulus onset even if a memory is not formed. This could be directly tested by examining the stimulus-evoked power decrease in all conditions, with the expectation that alpha/beta power drops from baseline in all conditions.

      Action: We will contrast pre-stimulus and post-stimulus power to investigate whether alpha/beta power decreases accompany visual perception regardless of successful memory encoding.

      The authors report PAC results for other regions on page 6, but claiming that PAC is a hippocampal-specific effect would require showing that the PAC-related beta coefficients are significantly greater than the other regions, rather than simply the absence of a significant effect in these regions. The authors should also clarify if they combined locally measured PAC over several ROIs into an average for these other regions? It seems unlikely to detect PAC if a single theta/gamma time series were extracted over such a large area of cortex.

      Response: We agree with the principle that the PAC results should be probed further, though would argue against the use of inter-region contrasts here as they will not provide evidence that PAC is specific to a single region. Take, for example, an effect where there is a significant memory-related increase in PAC in region A, but there is a significantly larger memory-related increase in region B. In a direct contrast, PAC in B will be significantly greater than A, but clearly PAC is not specific to B. Therefore, an inter-region contrast is not a means to irrefutably demonstrate regional specificity. While there has been a call for direct comparisons between experimental contrasts (see Nieuwenhuis et al., 2011), this is specifically for cases where individuals wish to make the claim that “A is significantly greater than B”, which was a claim that we never made here. Rather, we asked whether there is a memory-related difference in PAC within the hippocampus, and then followed this up by confirming that this effect was not a “bleed-in” from PAC in another neighbouring region (i.e. the cortical ROI analyses; where the absence of a significant difference would suggest that memory-related hippocampal PAC is not attributable to memory-related PAC in another region). We will, however, better visualise the PAC results to further rule out the risk of a “bleed-in” effect (see response to Reviewer 1, point 3).

      Action: We will visualise PAC across the cortex.

      Response to ROI-based contrasts: We had originally collapsed PAC measures over the ROI for the sake of simplicity, but the reviewer makes a good point for a more focal analysis.

      Action for ROI-based contrasts: We will run a voxel-wise analysis of PAC to complement the ROI-based approach.

      The interaction effect reported at the end of the results (ANOVA model) is interpreted such that the cortical alpha effect is stronger when the visual items are presented, while the hippocampal PAC effect is stronger when no items appear on the screen, but these recordings are made in different regions (hippocampus versus the entire cortex). If my understanding is correct, a result in line with the model the authors suggest (cortical alpha power decrease/hippocampal PAC) would show a region (hipp v cortex) x task (images on screen vs "associate" command) x metric (PAC vs alpha) interaction. Can the authors clarify if the cortical data entered into this model includes only those regions that showed a significant effect initially, or just all the sensors? The former would seem to introduce bias.

      Response: We had originally collapsed metric and region into a single factor (hippocampal PAC vs. cortical alpha), but the reviewer makes a very good point here: a better way to probe this interaction is via a 3-factor ANOVA (using “region”, “epoch” and “metric”).

      Action: We will revise the ANOVA in such a way that we can probe a three-way interaction (location vs. time vs. measure).
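
      As a rough sketch of what the three-way interaction tests: when all three factors are within-subject and have two levels each, the interaction reduces to one signed contrast per participant, and a one-sample t-test on those contrasts gives a t value whose square equals the F statistic of the three-way interaction. The code below assumes each cell holds a value on a comparable scale (for example a standardised memory-related beta coefficient); the cell layout and function names are hypothetical.

```python
import math
import statistics
from itertools import product

def three_way_contrast(cells):
    """Interaction contrast for one participant in a 2x2x2 within-subject design.

    `cells` maps (region, epoch, metric) index triples, each index 0 or 1,
    to that participant's value in the corresponding cell.
    """
    total = 0.0
    for r, e, m in product((0, 1), repeat=3):
        sign = (-1) ** (r + e + m)  # alternating signs isolate the 3-way term
        total += sign * cells[(r, e, m)]
    return total

def one_sample_t(values):
    """t statistic testing whether the mean of `values` differs from zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))

# Purely additive effects (no interaction): the contrast is zero.
additive = {(r, e, m): 1.0 * r + 2.0 * e + 3.0 * m
            for r, e, m in product((0, 1), repeat=3)}

# An effect present only in one specific cell: the contrast is non-zero.
single_cell = {(r, e, m): float(r * e * m)
               for r, e, m in product((0, 1), repeat=3)}

contrast_additive = three_way_contrast(additive)
contrast_single = three_way_contrast(single_cell)
print(contrast_additive, contrast_single)
```

      With real data, `three_way_contrast` would be applied to each participant and the resulting list passed to `one_sample_t`. Main effects and two-way interactions cancel in this contrast, so only the region x epoch x metric term remains.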

      Similarly, the different visual classes are always presented in the same order, which may give rise to the strong disparity in recall fraction between the pattern and scene images. I understand the linear model incorporates predictor variables for scene/pattern recall, but given that scene recall is driving a significant amount of the overall recall number observed as the main variable of interest, I would wonder if the alpha/beta power effects are related to the relative complexity of the scene images as compared to the patterns. Given the analysis schematic the authors report, I assume the authors have analyzed whether the same effects occur when contrasting scene versus no recollection and pattern vs no recollection. If the same effects are observed regardless of type of image (when compared with no recollection) this may help address this concern.

      Action: We will include supplementary analyses that ask whether alpha/beta power decreases vary as a function of stimulus type.

      Additional note: the scene and pattern stimuli were not always presented in the same order, but rather counterbalanced across blocks to avoid order effects.

      My second conceptual question is related to MEG data. It appears to me that the authors use MEG sensor-level data for the alpha-related effect in the cortex (Figure 2), but MEG beamformer reconstructed data (localized to the hippocampus) for the PAC effect. Is there a reason the authors did not use MEG data localized to specific cortical regions rather than sensor data? This may reflect confusion on my part, but I don't understand why they would use qualitatively different types of data for these two aspects of the analysis that are then combined (in the ANOVA, for example).

      Response to questions on source-reconstructed alpha power: We had not included source-reconstructed analysis of the alpha power effect here because, in an earlier draft, extensive analysis (e.g. the reporting of both sensor-level and source-reconstructed alpha power effects) drew criticism from reviewers for a lack of conciseness. That said, as such analyses have already been conducted, it is relatively easy to add these back in.

      Action: We will include source-reconstructed alpha-band effects.

      The authors should also engage with concerns regarding the validity of localizing MEG signals (especially for an analysis such as PAC) to deep mesial temporal structures such as the hippocampus. I understand that MEG systems with greater than 300 sensors are more reliable for this purpose, but I think a number of readers would still have doubts about MTL localization of signal. Also, my understanding is that such deep source localization requires around 100 trials per class, which I think fits with what the subjects completed, but the authors may include references related to this issue.

      Response: In recent years, there has been a growing list of studies that have reported successful localisation of hippocampal signals using MEG (for review of 37 of these studies, see Ruzich et al., 2019, Human Brain Mapping). Generally speaking, our experimental paradigm and analysis pipeline show large overlap with these previous successes (e.g. use of beamformers, gradiometers, co-registered MRI-to-MEG head position), meaning our results are not completely out of line with what could be expected. Nonetheless, it would be beneficial to explicitly state this in the manuscript.

      Action: We will explicitly address the historic difficulties of localising hippocampal MEG signals, and highlight how our approach fits with a growing consensus on how to successfully localise such signals (e.g. Ruzich et al., 2019, Human Brain Mapping).

      I think the signal processing steps are overall quite reasonable. I would ask the authors to clarify if they limited their analysis of cortical alpha/beta oscillations to those in which a peak exceeded the 1/f background, as they report for the PAC analysis on page 5. Also, it would be helpful to show that the magnitude of the MI values in the hippocampus exceed those observed by chance (using a shuffle procedure) in addition to showing that there is a memory-related association reflected in the beta coefficients.

      Response: We had not limited the analysis to peak alpha/beta oscillations in the original submission, but have no qualms about doing so – indeed, such an analytical approach may better substantiate the claim that we are probing oscillatory activity as opposed to non-oscillatory fluctuations.

      Action: We will restrict alpha/beta power analysis to the peak oscillation. We will add supplementary analysis contrasting measures of hippocampal PAC to a shuffled baseline.
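
      The shuffled-baseline comparison mentioned above could look roughly like the following. This is a minimal sketch, not the authors' pipeline: it assumes the theta phase and gamma amplitude time series have already been extracted, uses a Tort-style modulation index, and builds the null distribution by circularly shifting the amplitude series (one common surrogate choice among several). The function names, bin count, and synthetic signal parameters are all illustrative assumptions.

```python
import numpy as np

def modulation_index(phase, amp, n_bins=18):
    """Tort-style MI: KL divergence of the phase-binned mean amplitude
    from a uniform distribution, normalised by log(n_bins).
    Assumes every phase bin is populated and amplitudes are positive."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    idx = np.clip(np.digitize(phase, edges) - 1, 0, n_bins - 1)
    mean_amp = np.array([amp[idx == k].mean() for k in range(n_bins)])
    p = mean_amp / mean_amp.sum()
    return float(np.sum(p * np.log(p * n_bins)) / np.log(n_bins))

def pac_vs_shuffle(phase, amp, n_perm=200, seed=0):
    """Compare the observed MI with a null built from circular shifts of
    the amplitude series, which keep its autocorrelation but break its
    alignment with the phase series."""
    rng = np.random.default_rng(seed)
    observed = modulation_index(phase, amp)
    n = len(amp)
    # Avoid very small shifts, which would leave the coupling largely intact.
    shifts = rng.integers(n // 10, n - n // 10, size=n_perm)
    null = np.array([modulation_index(phase, np.roll(amp, s)) for s in shifts])
    z = (observed - null.mean()) / null.std()
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, z, p_value

# Synthetic demonstration (hypothetical parameters): a jittered ~6 Hz phase
# series sampled at 500 Hz, with amplitude modulated by that phase.
rng = np.random.default_rng(1)
n = 20000
steps = 2 * np.pi * 6 / 500 + 0.1 * rng.standard_normal(n)
phase = np.angle(np.exp(1j * np.cumsum(steps)))
amp = 1.0 + 0.5 * np.cos(phase) + 0.05 * rng.standard_normal(n)
mi, z, p = pac_vs_shuffle(phase, amp)
```

      Circular shifting only breaks the coupling when the slow rhythm drifts in frequency, as real data do; for a perfectly periodic phase series a different surrogate (for example, shuffling amplitude across trials) would be needed.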

    2. Reviewer #3:

      The authors report results of an MEG analysis deploying a cognitive paradigm in which participants engage in a source memory task characterized by the appearance of three images in succession and are then tested via a cue (the first of the three images) followed by a choice of responses for a two dimensional pattern and then a choice (out of three images) of a photographic scene.

      The principal finding is that (via MEG sensor level data) there is a widespread 8-15 Hz power decrease that is correlated with the number of recalled items (from 0 to 2) on a given trial. In the hippocampus (via MEG source reconstruction), the magnitude of phase amplitude coupling observed as participants are told to associate the items is correlated with memory performance. The 8-15 Hz power decrease/memory correlation (as estimated by beta coefficients in a model described in Figure 1) is larger (across individuals) during moments when subjects are viewing the stimulus items as opposed to during the "associate" period. The novelty in the result is related to the experimental task that attempts to dissociate memory-related effects related to perception from those related to binding which putatively occurs when subjects are given the "associate" instruction.

      My main conceptual concern is related to the design of the experimental task. I am not sure that the perception/binding framing is appropriate, since there is no reason to think that subjects are not associating/binding items during the periods when the items are being shown on the screen. I suppose this may partly explain the lack of a significant difference in PAC/memory beta coefficients observed in the hippocampus when contrasting these two epochs (Figure 4). But the corollary is that the alpha power-related beta coefficients are observed while binding is likely also occurring within the paradigm (esp since each image is shown for 1.5 seconds it would seem). Is the alpha power effect seen in the hippocampus? The plots in 3a suggest there is an oscillation present in the relevant frequency range, and the time course of alpha power differences seen in Figure 2 suggests that they occur relatively late after onset of the images, which may fit better with some contribution for this pattern to the forming of associations rather than perception.

      I understand that the paradigm was constructed in an attempt to temporally dissociate memory effects attributable to perception versus those attributable to binding. But given the temporal resolution available using EEG, I would imagine that the authors could differentiate an earlier perception-related effect from a later PAC binding effect in the time series if the associated images were presented in conjunction. Is it correct to frame the alpha results as related to "perception?" The beta coefficients used for analysis reflect a "memory related effect observed when visual stimuli are present on the screen," but not necessarily improved memory predicated on more accurate perception to my interpretation. I would think that a perception/binding distinction requires operationalizing perception as activity that doesn't vary with later associative memory success, and binding as activity that does. The notion of perception used by the authors here seems slightly different. The authors can perhaps comment on this concern.

      The authors report PAC results for other regions on page 6, but claiming that PAC is a hippocampal-specific effect would require showing that the PAC-related beta coefficients are significantly greater than the other regions, rather than simply the absence of a significant effect in these regions. The authors should also clarify if they combined locally measured PAC over several ROIs into an average for these other regions? It seems unlikely to detect PAC if a single theta/gamma time series were extracted over such a large area of cortex.

      The interaction effect reported at the end of the results (ANOVA model) is interpreted such that the cortical alpha effect is stronger when the visual items are presented, while the hippocampal PAC effect is stronger when no items appear on the screen, but these recordings are made in different regions (hippocampus versus the entire cortex). If my understanding is correct, a result in line with the model the authors suggest (cortical alpha power decrease/hippocampal PAC) would show a region (hipp v cortex) x task (images on screen vs "associate" command) x metric (PAC vs alpha) interaction. Can the authors clarify if the cortical data entered into this model includes only those regions that showed a significant effect initially, or just all the sensors? The former would seem to introduce bias.

      Similarly, the different visual classes are always presented in the same order, which may give rise to the strong disparity in recall fraction between the pattern and scene images. I understand the linear model incorporates predictor variables for scene/pattern recall, but given that scene recall is driving a significant amount of the overall recall number observed as the main variable of interest, I would wonder if the alpha/beta power effects are related to the relative complexity of the scene images as compared to the patterns. Given the analysis schematic the authors report, I assume the authors have analyzed whether the same effects occur when contrasting scene versus no recollection and pattern vs no recollection. If the same effects are observed regardless of type of image (when compared with no recollection) this may help address this concern.

      My second conceptual question is related to MEG data. It appears to me that the authors use MEG sensor-level data for the alpha-related effect in the cortex (Figure 2), but MEG beamformer reconstructed data (localized to the hippocampus) for the PAC effect. Is there a reason the authors did not use MEG data localized to specific cortical regions rather than sensor data? This may reflect confusion on my part, but I don't understand why they would use qualitatively different types of data for these two aspects of the analysis that are then combined (in the ANOVA, for example).

      The authors should also engage with concerns regarding the validity of localizing MEG signals (especially for an analysis such as PAC) to deep mesial temporal structures such as the hippocampus. I understand that MEG systems with greater than 300 sensors are more reliable for this purpose, but I think a number of readers would still have doubts about MTL localization of signal. Also, my understanding is that such deep source localization requires around 100 trials per class, which I think fits with what the subjects completed, but the authors may include references related to this issue.

      I think the signal processing steps are overall quite reasonable. I would ask the authors to clarify if they limited their analysis of cortical alpha/beta oscillations to those in which a peak exceeded the 1/f background, as they report for the PAC analysis on page 5. Also, it would be helpful to show that the magnitude of the MI values in the hippocampus exceed those observed by chance (using a shuffle procedure) in addition to showing that there is a memory-related association reflected in the beta coefficients.

    3. Reviewer #2:

      In this manuscript, the authors examine the neural correlates of perception and memory in the human brain. One issue that has plagued the field of memory is whether the neural processes that underlie perception can be dissociated from those that underlie memory formation. Here the authors directly test this question by introducing a behavioral paradigm designed to dissociate perception from mnemonic binding. In brief, while recording MEG data, they present subjects with a sequence of visual stimuli. Following the sequence, the subjects are instructed to bind the three stimuli together into a cohesive memory, and then are tested on their memory for which pattern was associated with an object, and which scene. The authors investigate changes in alpha/beta power and theta/gamma phase amplitude coupling during two separate epochs - perceptual processing and mnemonic binding. Overall, this is a well written and clear manuscript, with a clear hypothesis to be tested. Using MEG data enables the authors to draw conclusions about the neurophysiological changes underlying both perception and memory, and establishing this dissociation would be an important contribution to the field. I think the conclusions are justified, but there are several issues that should be addressed to improve the strength and clarity of the work.

      The fundamental premise of the task design is that subjects view a sequence of stimuli, and then separately at a later time actively try to bind those visual stimuli together as a memory. However, it is entirely possible, and even likely, that memories are being formed and even bound together as the subjects are still viewing the sequences of objects. How would the authors account for this possibility? One possible way would be if there were a control task where subjects were just asked to view items and not remember them.

      Another possibility would be to examine the trials that the participants failed to remember correctly. Presumably, one would still see the same decreases in alpha power. Yet it seems from the data, and the correlations, that during those trials that were not remembered properly, alpha power changed very little. Of course, it is unclear in these trials if failed memory is due to failed perception, but one concern would be that this would imply that decreases in alpha power are relevant for memory too. It would be helpful to see how changes in alpha power break down as a function of the number of actual items remembered. It would also be helpful to know how strong these correlations actually are.

      A related issue is with respect to hippocampal PAC. The authors investigate this during the mnemonic binding period. Yet they also raise the possibility in discussion that this could also be happening during perception, which goes back to the point above. Did they analyze these data during perception, and are there changes with perception that correlate with memory? This would suggest that binding is actually occurring during this sequence of visual stimuli.

      The authors perform a whole brain analysis examining the correlation between alpha power and memory to identify cluster corrected regions of significance. However, the PAC analysis focuses only on the hippocampus, raising the question of whether these results can account for the possible comparisons one could make in the whole brain. They do look at four other brain regions for PAC, which it would be helpful to account for. In addition, are there other measures of mnemonic binding that are significant? For example, theta power, or even gamma power?

      The authors note in the discussion that the magnitude of hippocampal gamma synchrony has been shown to be related to the decreases in alpha power. Is this also true in their data?

    4. Reviewer #1:

      This MEG study by Griffiths and colleagues used a sequence learning paradigm which separates information encoding and binding in time to investigate the role of two neural indexes - neocortical alpha/beta desynchronization and hippocampal theta/gamma oscillation - in human episodic memory formation. They employed a linear regression approach to examine the behavioral correlates of the two neural indexes in the two phases, respectively, and demonstrated an interesting dissociation, i.e., decreased alpha/beta power only during the "sequence perception" epoch and increased hippocampal theta/gamma coupling only during the "mnemonic binding" phase. Based on the results, they propose that the two neural mechanisms separately mediate two processes - information representation and mnemonic binding. Overall, this is an interesting study using a state-of-the-art approach to address an important question. Meanwhile, I have several major concerns that need more analysis and clarifications.

      Major comments:

      1) The lack of theta-gamma coupling during the stimulus encoding period is possibly due to the presentation of the figure stimuli, which would elicit strong sensory responses that mask the hippocampal activity. How could the authors exclude this possibility? In other words, the dissociated results might derive from different sensory inputs during the two phases.

      2) About the hippocampal theta/gamma phase-power coupling analysis. I understand that this hypothesis derives from previous research (e.g., Heusser et al., 2018) as well as the group itself (Griffiths et al., PNAS, 2019). Meanwhile, MEG recording, especially the gradiometer, is known to be relatively insensitive to deep sources. Therefore, the authors should provide more direct evidence to support this approach. For instance, the theta/gamma analysis relies on the presence of a theta-band and a gamma-band peak in each subject. Although the authors have provided two representative examples (Figure 3A), it remains unknown how reliably the theta-band and gamma-band peaks are present in individual subjects.

      3) Related to the above comment, theta-gamma coupling is a brain-wide phenomenon including both cortical and subcortical areas and is not limited to just the hippocampus. Although the authors have performed a control analysis to assess the behavioral correlates of the coupling in other regions, the division of brain regions is too coarse and I am not convinced that this is a fair comparison, since they differ from the hippocampus at least in terms of area size in the source space. The authors could consider plotting the power-phase coupling distribution in the source space and then assessing their behavioral correlates, rather than just showing results from the hippocampus. This result would be important to confirm the uniqueness of the hippocampus in this binding process.

      4) About behavioral correlates. The current behavioral index confounds encoding and binding processes. Is there any way to separate the encoding and binding performance from the overall behavioral measurements? It would be more convincing for me to find that the two neural indexes at two phases predict the two behavioral indexes, respectively.

      5) The authors' previous works have elegantly shown the two neural indexes during fMRI and intracranial recording in episodic memory. The current work, although providing an interesting view about their possible dissociated functions, only focuses on the memory formation period (information encoding and binding). Given previous works showing an interesting relationship between encoding and retrieval (Griffiths et al., PNAS, 2019), I would recommend that the authors also analyze the retrieval period and see whether the two indexes show consistent dissociated function as well.

    5. Summary: All reviewers agree that the study addressed an important question in episodic memory. Yet, the reviewers are not convinced that the experimental design could truly dissociate the perception and binding processes, an assumption the whole work is based on. Moreover, the PAC analysis in the hippocampus using MEG recordings and its comparison to other brain regions need more analyses and confirmation.

    1. Reviewer #3:

      This manuscript is a detailed analysis of the molecular mechanism for ISW2 recruitment in yeast and delineates not only the binding interface between ISW2 and the transcription factor Ume6, but also finds similar interactions between ISW2 and Swi6. The authors take a systematic and rigorous approach in finding that a 27 amino acid region of Ume6 and the WAC domain of Itc1, accessory subunit in ISW2, are responsible for recruiting ISW2 to Ume6 binding sites. The strength of this paper is that they focus on examining these interactions in vivo and using MNase-seq to show changes in nucleosome positioning upon mutation of Itc1, Ume6 and Swi6. The data is well supported and the conclusions are compelling. In addition, they use the Spytag approach to show these regions alone are capable of recruiting Isw2 to genomic target sites. They also show that amino acids 1-73 of Itc1 alone are sufficient for binding to the correct genomics sites and is compelling evidence of their specificity. The authors, by comparing the sequence composition of the WAC domain in ISW2 orthologs from flies to humans, are able to explain a contradiction that has been in this field for a long time about the apparent different role of yeast ISW2 and its Drosophila homolog ACF/ISWI. The Drosophila ISWI complex appears to have a more global role in chromatin organization; whereas yeast ISW2 is more specialized or targeted. The WAC domain in ISWI is defective for recruitment by such transcription factors like Ume6 and Swi6, unlike that observed for ISW2. The other interesting finding or correlation that is derived from their findings is that the recruitment of ISW2 by Ume6 and Swi6 may not only work to recruit ISW2 but may also regulate ISW2 activity as the same region of Itc1 shown to bind to these transcription factors is also shown to regulate the activating function of the H4 tail on Isw2. The paper is well written, clear and nicely organized. 

      I did have one question for the authors, as it seems that this type of recruitment may not be universal: only a subset of Ume6 sites behaves as expected in their mutational analysis. Do the authors have any idea why that is the case and what makes this subset of sites behave differently?

    2. Reviewer #2:

      Chromatin remodelers use the energy derived from ATP hydrolysis to reposition or evict nucleosomes, thus shaping the chromatin landscape of the cell. In this study, the McKnight lab use creative genetic and genomic approaches to understand how the apparently nonspecific biochemical activity of one such chromatin remodeler, Isw2, is targeted to specific nucleosomes in the budding yeast genome. The use of an isw1/chd1 mutant is a nice approach to remove the effects of spacing factors, and the SpyTag/SpyCatcher approach is a novel idea for artificial recruitment of factors. The bottom line of the study is that small, conserved epitopes in transcription factors act as recruiting elements for Isw2, allowing precise targeting of a nonspecific biochemical activity to specific genomic loci. From a larger perspective, the results lend support to an interacting barrier model of nucleosome positioning, wherein positioning of specific nucleosomes defines the borders of nucleosomal arrays. The data appear to be of high quality and soundly interpreted, and I believe that the results will be of great interest to those interested in chromatin and transcription. There are many questions raised by the results that I believe will drive further investigation into specificity in chromatin remodeling. My one major criticism (not that major in the scheme of things) is that the authors analyze only the interesting subsets of their sites, as detailed below. One example is the analysis of the Isw2/Itc1 co-bound sites to the exclusion of the Isw2-alone sites. I think some exploration of these sites would be warranted, as discussed below.

      1) In Fig. S1C, there is nice correspondence between strong Isw2 K215R binding and Isw2-dependent nucleosome remodeling. However, at PICs where there is no apparent Isw2 remodeling, there does seem to be some Isw2 K215R ChIP-seq signal, albeit at a lower level. Does this potentially represent capture of transient sampling-type interactions, or something else?

      2) In Fig. S1D, Ume6 ChIP (WT and DBD alone) is shown at 202 intergenic Ume6 motifs. It is stated that the rows are linked with Fig. 1B - it would be nice to see the nucleosome data next to the ChIP data in this panel, as it appears that Ume6 is bound at some level to the majority of these 202 sites, while Isw2 seems only to be active at the 58 sites of cluster 1. Germane to this point, I of course understand why the authors focused on the cluster 1 sites, but it would be nice to have some speculation on why Isw2 only seems to function at a fraction of Ume6-bound loci. Also, the lengths of the cluster-denoting bars appear to be off here relative to Fig. 1B.

      3) In Fig. 5C, it appears that only a subset of Isw2 sites are bound by Itc1 as well. Again, as with the selection of the 58 Ume6 sites, I understand why the Isw2/Itc1 co-bound sites are selected for further analysis, but the Isw2 sites without Itc1 could be discussed as well. Are these sites non-functional? How does Itc1 ChIP-seq data compare to the Isw2 remodeling activity shown in Fig. 1A? How does it compare to Ume6 binding? Does it specify the Isw2-remodeled nucleosomes?

      4) Did the authors perform western blots to ensure that their various truncation constructs were stable? This is important for interpretation of the results vs deletions.

      5) To summarize the above points, a major thing missing from the discussion is why only subsets of TF binding sites recruit Isw2. For instance, as mentioned above, 58 Ume6 sites seem to specify Isw2 remodeling - what is special about those sites versus the other ~150 sites that appear to be bound by Ume6? It's mentioned briefly in the discussion that only three Swi6 sites were identified as Isw2-recruiting and that this may be tuned by cellular context, but this is quite vague and superficial. More speculation on what differentiates these sites from the TF-bound but non-Isw2 recruiting sites could be included.

    3. Reviewer #1 (Jerry Workman):

      This is a paradigm-shifting study which demonstrates targeting of the Isw2 complex by a sequence-specific DNA binding protein, Ume6. Previously the Isw2 complex was thought to be a promiscuous nucleosome sliding ATPase that would globally space nucleosomes like Chd1 or Isw1. However, the current study demonstrates that Isw2 primarily targets a single nucleosome adjacent to Ume6 binding sites.

  5. Nov 2020
    1. Reviewer #2:

      In this manuscript, the authors combine genetic/hormonal manipulation of expansin expression, localization studies, and mechanical measurements of root cell walls to study how this family of cell wall-loosening proteins influences root growth and development. This is an exciting topic, since expansins have a long history of in vitro characterization, but their characterization in living plants has lagged behind. The localization patterns of EXPA1, EXPA10, EXPA14, and EXPA15 are depicted using mCherry fusion proteins, and are shown to be distinct from one another. Despite the wide range of interesting approaches described here, I have some important concerns about the work as it stands, in terms of providing new insights into how expansins actually influence root growth.

      Major Comments:

      One major concern is the lack of appropriate controls, statistical appropriateness, and reporting (e.g., defining "n" clearly in all cases) in this work. All comparisons should include wild type and no-treatment controls; for example, in Figure 8, no AFM images are shown for wild type or EXPA1 overexpression cells.

      Figure 1-S1: there is no change in pEXPA1::nls:3xGFP - why is there this discrepancy with the EXPA1 qPCR result? This is not explained.

      Figure 3-S1: The finding of a lack of colocalization between EXPA10 and CFW staining is not convincing, due to a lack of a control showing positive colocalization and a lack of quantification of the degree of colocalization (e.g., Pearson correlation coefficient between red/blue pixels). The authors use these data as a lynchpin for part of their discussion, but this lack of colocalization could simply be an artifact of chromatic aberration, etc.
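
      The quantification requested here could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: it assumes the two channels are already registered 2-D intensity arrays of equal shape, and the function name and optional mask argument are hypothetical. A positive control would be expected to give a coefficient near 1, and spatially unrelated signals a coefficient near 0.

```python
import numpy as np

def pearson_colocalization(ch_a, ch_b, mask=None):
    """Pearson correlation coefficient between pixel intensities of two
    channels, optionally restricted to a boolean region-of-interest mask."""
    a = np.asarray(ch_a, dtype=float).ravel()
    b = np.asarray(ch_b, dtype=float).ravel()
    if mask is not None:
        keep = np.asarray(mask, dtype=bool).ravel()
        a, b = a[keep], b[keep]
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))

# Synthetic controls (illustrative only): a near-identical channel pair as
# a positive control and two independent noise images as a negative control.
rng = np.random.default_rng(0)
red = rng.random((64, 64))
blue_same = red + 0.05 * rng.standard_normal((64, 64))
blue_indep = rng.random((64, 64))
r_pos = pearson_colocalization(red, blue_same)
r_neg = pearson_colocalization(red, blue_indep)
```

      Chromatic aberration would still need to be addressed separately, for example by checking the coefficient on a sample in which both dyes label the same structure.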

      L256: This statement is not supported by the statistical comparisons shown in Figure 5B-C. In Figure 5B, why does the WT show higher MOC with Dex than without? In Figure 5B-C, you do not compare 8-4 + Dex with WT + Dex statistically, which is the salient comparison, and instead compare each genotype with vs. without Dex. In addition, the fact that the pRPS5A>GR>EXPA1:mCherry line does not show a significant difference in BLS signal with Dex addition (Figure 5-S1) argues against a clearly established relationship between expansin expression and BLS signal. The data in Figure 5D-E are more informative, but there is no wild type control for these experiments.

      In Figure 8, the AFM color code scales do not seem to match the graphs, in that the color scales range from 0-2 MPa, whereas the graph Y axes range from 0 to 3 e6 MPa (unless that is supposed to be 0-3 MPa, or 0 to 3 e6 Pa!). No-Dex controls are missing from 8B.

      In the Discussion, the authors use the words "unclear" and "elusive", and "remains to be identified" to sum up their work, and this to me is an indication of the state of this work overall. Although some of the data are intriguing, they are neither conclusive nor explanatory in revealing the mechanisms of expansin-mediated growth control in roots.

      Finally, the manuscript needs to be revised for proper English grammar, syntax, and style.

    2. Reviewer #1:

      Expansins are mysterious cell wall proteins because they lack known hydrolytic activity but are somehow correlated with acid-induced cell wall loosening/extension and cell expansion. Here the authors catalog the tissue expression of several native promoter driven expansin-FP fusions (EXPA1, 10, 14, 15) and find partially overlapping expression patterns and evidence that some expansins are restricted to particular cell wall regions (e.g. tricellular junctions) (Figs 1-4). Using Brillouin light scattering (BLS) microscopy they find that, contrary to several previous reports for EXPA1, EXPA1 overexpression induces tissue stiffening that is relatively independent of extracellular pH (Fig 5, 7). They corroborate these data using AFM of different cell walls in a similar tissue (Fig 8). Thus, EXPA1 overexpression results in shorter roots (Fig 9). While BLS seems like an interesting technique for studying cell walls, essential controls are missing, making it difficult to interpret these results.

      Major Comments:

      1) Expansins have traditionally been identified with promoting cell wall extension by loosening the cell wall under acidic conditions. Recent reports have corroborated this: Ramakrishna et al., 2019 showed decreased lateral root initiation in mutants, implying EXPA1 plays a role in loosening, while Pacifici et al 2018 showed decreased cell elongation in expa1 mutants and increased cell elongation in EXPA overexpression lines, but only when grown on low pH (pH 4) media. All of these results are consistent with EXPAs playing a role in cell wall loosening. By contrast, the authors here find that EXPA1 overexpression causes cell wall stiffening and reduced root growth, and that low pH (pH 4) media decreases this stiffening (Fig 5). Their discussion of these discrepancies is insufficient. For example, how do their levels of EXPA1 overexpression compare to Pacifici et al., 2018? How can they reconcile the results in these previous papers with their study?

      2) Since the authors only really see changes in BLS of their EXPA1 line with over 10,000x overexpression (their inducible EXPA1-mCherry line with "only" >100x expression relative to wild type does not cause significant changes to cell wall "stiffness"), it is unclear how sensitive this technique is to cell wall changes. Controls are required to interpret these BLS experiments. For example, a known mutant or overexpression line with increased cell wall stiffness and another with decreased cell wall stiffness.

      3) It will also be important to document whether the authors can replicate the lack of changes to cell wall stiffness in the expa1 mutant using AFM.

      4) It would be helpful to see a detailed correlation analysis between the new technique (BLS) and an established cell wall analysis technique (AFM) across multiple data points (i.e. positive and negative controls for cell wall stiffness changes).

      5) These AFM values are also presented on a scale that is almost 7x higher than previous data from the authors (e.g. Peaucelle 2014 JoVE). Please discuss.

      6) The authors are comparing BLS data from the inner longitudinal cell wall versus AFM data from the outer longitudinal cell wall, which have very different properties. Please discuss.

      7) EXPA1 gene overexpression is determined 7 days after Dex induction, but BLS experiments are conducted on plants that have been induced for a much shorter time (e.g. 3h). What is the expression of the EXPA1 gene over this timeframe of induction? Ideally, the authors would also use an EXPA1 antibody to monitor protein levels, since this is what is actually relevant.

      8) It is difficult to see from the BLS shift maps provided (e.g. Fig 5A) where in the root the authors are imaging. Given that this is a relatively new technique to the cell wall field, it would be helpful to provide additional images to provide context to readers.

      9) "Data not shown" (e.g. trans-zeatin treatments, line 149; EXPA1 protein levels, line 360) must be included as supplemental figures or the claims removed from the manuscript.

    3. Summary: The reviewers felt that this is important work because in vivo characterization of expansins has lagged far behind their in vitro characterization. However, both reviewers also made important points about additional controls and statistical comparisons that are required to fully interpret and appreciate the results that are presented here. It seems that the role of expansins in the plant cell wall may be complex and nuanced. However, it is clear from the authors' discussion of their results that significant further experimentation is required to bring new insight to the function of expansins in mediating plant root growth.

    1. Reviewer #3:

      In the paper titled "Auditory detection is modulated by theta phase of silent lip movements", the authors investigate visual entrainment to lip movement using behavioral (exp1) and non-invasive physiology (EEG; exp2).

      In the first experiment participants engage in the detection of a brief tone embedded in noise. Critically, the tone appears whilst subjects are viewing a silent movie clip. Tones are critically timed with respect to the phase of the theta rhythm prevalent in the lip action trajectory (and its relation to the original audio track). Each trial includes 0, 1 or 2 tones and subjects provide a speeded response when the tone is detected. Tones are also critically presented either during the first half of the clip or the second half of the clip (or both or neither). This latter timing parameter is designed to probe the possibility of an increasing degree of entrainment to visual lip movement as the clip evolves. In the second experiment the findings demonstrated in exp 1 are met with an analysis of visual entrainment and its impact on auditory sources using EEG and source estimation on data obtained while observers viewed the same silent movie clips passively. The paper is well written, the premise is clear and the findings are interesting and timely. In what follows I outline some questions and concerns that come to mind when assessing the validity of the interpretation of the findings. Those span the experimental and stimulus design as well as the analysis choices made.

      1) The behavioral procedure suggests that the tones were pseudo-randomly positioned w/ respect to the quantified theta phase of the lip movement. It would be interesting to understand whether any care was taken to exhaustively sample the different phases of the theta rhythm of interest in the lip movement. It might be important, therefore, to demonstrate that phases were equivalently sampled by chance in the first and second half trials and over the different clips. An inset in figure 1 would make for a good spot to demonstrate the descriptive statistics of target positioning (as a function of phase).

      2) Second and somewhat related, wouldn't it make more sense to quantify accuracy based on phase bins? This way no division into subpopulations would be required since each individual could be aligned to their best phase. The methods leave it somewhat unclear whether this was a possibility in terms of the stimulus design (i.e., were there enough phases to accomplish this in the stimulus/tone timing; see previous point).

      In addition, the subject's mean phase of the correctly detected targets provides little insight as to the periodic nature of performance. Analyzing whether there is a periodic modulation of the pattern of responses over phase would provide richer, more nuanced evidence for the claims.

      3) It would be important and interesting to learn whether the first and second part of the trial has the same MI profile at theta b/w lip movement and audio track. Currently, the characterization of MI was done on the whole movie clips. This is crucial for the interpretation of both Experiment 1 and Experiment 2.

      4) The distinction b/w the first and second half -- indicating that entrainment takes time to build up is somewhat overstated in the context of this paper seeing that the literature suggests that by 0.5 s entrainment is fully arrived at (among others -- the authors themselves say so in the TINs piece). Other processes such as calibration to a given speaker might take longer, and those might justify (or account for?) the result showing that early vs. late targets differ in the degree to which the phase of the lip action affects performance.

      Important details over the stimuli need to be clarified:

      5) Did every clip introduce a new speaker to the subject? If so, does time on clip also amount to degree of familiarity with the speaker?

      6) Did each clip have the same degree of MI b/w audio and lip movement or were there better (more pronounced) lip clips than others when considering their link to the audio? Would it make sense to add these measures as covariates in the analysis?

      7) Is the same target timing used for the same clip for all subjects? Or are the tones truly randomly placed and matched onto clips such that a given clip could appear w/ tones at different times for different subjects?

      At the risk of somewhat repeating point #2 above -- within the analysis the following should be considered:

      8) The authors establish that in the second half performance there are, in fact, two subpopulations in the sample. Wouldn't this post hoc grouping factor, which isn't obviously motivated, be better described by properly delineating performance as a function of phase? I can readily understand that the authors might not have a clear hypothesis over what might be the better phase for performing on an irrelevant tone probe. Nonetheless, if a periodic process is entraining performance, then once a best phase is identified, adjacent phase bins should demonstrate this circular relationship. This would allow for a direct quantification of ALL data together after aligning performance to the best phase bin, per subject.
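
      A minimal sketch of the alignment analysis suggested in point 8, not the authors' pipeline. It assumes hypothetical per-subject arrays `phases` (theta phase of the lip signal at each target, in radians) and `hits` (whether each target was detected); the function name and the number of bins are illustrative choices.

```python
import numpy as np

def aligned_hit_rate(phases, hits, n_bins=8):
    """Hit rate per phase bin, circularly shifted so the best bin sits at index 0.

    phases: target phases in radians, one per trial (hypothetical input).
    hits:   boolean array, True where the target was detected.
    """
    phases = np.mod(np.asarray(phases, dtype=float), 2 * np.pi)
    hits = np.asarray(hits, dtype=bool)
    # Assign each trial to one of n_bins equal-width phase bins.
    bins = np.minimum((phases / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    rate = np.full(n_bins, np.nan)  # NaN marks bins that contain no trials
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            rate[b] = hits[in_bin].mean()
    best = int(np.nanargmax(rate))  # subject-specific preferred bin
    return np.roll(rate, -best)     # rotate so the preferred bin is first
```

      Averaging the aligned curves over subjects, while leaving the best bin itself out of any statistical test (it is a maximum by construction), would show whether performance falls off smoothly with distance from the preferred phase.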

      Finally, the following points pertain mostly to the contextualization of this work and the discussion:

      9) While the authors discuss at least two mechanisms relating to how entrainment effects grow by the second part of the clip, it would be nice to relate the concrete reading of this effect to cognitive processes that may evolve within these timescales. In other words, learning that tracking takes 0.5 s or learning that visual inputs to frontal cortex take a given time scale to exert impact on auditory sensory regions is another description of the finding. What might these time scales buy me as a speaker and as a listener? What processes might be reflected by arriving at these states of synchrony and top-down control for speech comprehension?

      10) The post hoc description of the subpopulations' preferred phases is interesting and could relate interestingly to the entrainment literature (from Spaak 2014 in vision through Hickok 2015 in audition and others). Might the authors speculate on what part of speech is characterized by one phase vs. another?

      11) Regarding the authors' conjecture in the discussion of this topic, one addition: there are recent papers by Assaneo et al. (Poeppel as PI, Nat Neurosci, 2019) that show bi-modal behavior in a spontaneous synchronization task (motor to auditory), which was found to be related to morphological differences in frontal-to-auditory white matter pathways, functional differences AND better learning in a statistical learning paradigm. How do the two sets of bi-modal populations interact? The authors' discussion of the motor cortex suggests they would.

      Methods section:

      The paper by and large is well written. An exception to this would be the methods section. Currently, the methods do not comply with best practices that would make the work reproducible by others.

    2. Reviewer #2:

      This study performs behavioral assessment of the impact of watching lip movements on tone detection in noise and EEG recordings from passive observers of the same movies. The basic paradigm is that listeners watch a silent movie of lip movements (selected to be at ~theta rate) while listening for tone bursts that occur most commonly twice in a trial (early and late). The key findings are that perceptual sensitivity is higher when tones are in the second half of the trial, when hits align at a particular phase angle of the visual stimuli. Brain signals were also observed to entrain through the course of the trial. The authors conclude that visual modulation of auditory excitability explains these effects.

      The stimulus design is elegant, and the results, if taken at face value, are a nice demonstration that visual stimuli can modulate auditory perception in a temporally specific manner. However, I have concerns with the interpretation of the data while also feeling to some extent that these findings are expected; stimulating AC with a speech envelope modulates speech perception (Wilsch et al., 2018), silent speech modulates human auditory cortex (Calvert 1999) and visual stimuli modulated at theta rates directly entrain auditory cortical phase in animals (Atilgan et al., 2018) as do audiovisual speech stimuli in humans (Zion-Golumbic et al., 2013). This study is a further piece of evidence along these lines, but it's hard to be certain of a causal relationship when the behaviour and neurophysiology are in different listeners. I also have some concerns about the current interpretation, some of which are addressable with additional analysis.

      I'm not convinced that the authors have sufficiently ruled out the possibility that the first tone causes a phase reset in AC that causes detected second tones to be entrained to a particular stimulus phase. In theory this should be easily addressed by looking at the 1 tone trials where the tone is in the second half of the stimulus. These data are in the supplemental material but are not particularly reassuring: while the d' is higher for the second tone, the phase angles are uniformly distributed across participants in comparison to the clustering observed in the 2-tone data. This finding calls into question the causal link between the phase relationship and performance. The authors note that there are relatively few trials (50% of those available in the 2 tone data). The contribution that this plays could be addressed by subsampling half the trials from the 2 tone dataset and re-estimating the phase modulation to estimate whether the single tone condition is any different. Another analysis that could be enlightening/reassuring would be to compute the phase of the hits to tone 2 relative to the onset of tone 1 using the modulation rate of the clip (or 6 Hz, if clips were selected to be that anyway).
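
      The subsampling control proposed above could look roughly like the following sketch. It assumes a hypothetical array of hit phases (radians) from the 2-tone trials of one participant and a target trial count `n_keep` matching the 1-tone condition; the function names, iteration count and seed are illustrative and not taken from the manuscript.

```python
import numpy as np

def vector_length(phases):
    """Mean resultant vector length of phases in radians (near 0 = spread out, 1 = identical)."""
    return float(np.abs(np.mean(np.exp(1j * np.asarray(phases, dtype=float)))))

def subsampled_vector_lengths(phases, n_keep, n_iter=1000, seed=0):
    """Vector lengths from repeated random subsamples of n_keep trials, drawn without replacement."""
    rng = np.random.default_rng(seed)
    phases = np.asarray(phases, dtype=float)
    return np.array([
        vector_length(rng.choice(phases, size=n_keep, replace=False))
        for _ in range(n_iter)
    ])
```

      Comparing the vector length observed in the 1-tone condition with this subsampled distribution would indicate whether the weaker clustering can be explained by trial count alone.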

      I would like to see the distribution of the tones w.r.t. the phase of the lip movement (all tones, not just hits) to be reassured that there is nothing inherent in the movies that causes the phase alignment.
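
      One way to quantify this, sketched under assumptions: a Rayleigh test for non-uniformity applied to the lip-movement phase at every tone onset. The input array is hypothetical, and the p-value uses the closed-form approximation commonly implemented in circular statistics toolboxes (attributed to Zar), not an exact test.

```python
import numpy as np

def rayleigh_test(phases):
    """Rayleigh test for non-uniformity of circular data.

    phases: angles in radians, e.g. lip-signal phase at each tone onset (hypothetical input).
    Returns (z, p): the Rayleigh statistic and an approximate p-value.
    """
    phases = np.asarray(phases, dtype=float)
    n = phases.size
    r = np.abs(np.mean(np.exp(1j * phases)))  # mean resultant length
    big_r = n * r                             # resultant length
    z = big_r**2 / n
    # Closed-form approximation to the p-value; capped at 1 for numerical safety.
    p = np.exp(np.sqrt(1 + 4 * n + 4 * (n**2 - big_r**2)) - (1 + 2 * n))
    return float(z), float(min(p, 1.0))
```

      A small p-value for all tones (hits and misses together) would indicate that the stimuli themselves over-represent some phases, independent of detection performance.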

      The neurophysiology does not demonstrate a significant increase in entrainment from early to late windows, only that there is a different phase angle. Doesn't this also call into question the conclusion that performance is better in the second half due to better entrainment? While the phase in the second half might be 'more efficient', if the entrainment is equivalent, shouldn't there be a behavioural relationship in both cases? This is where performing both behaviour and EEG simultaneously (or at least in the same listeners) may have proved enlightening.

    3. Reviewer #1:

      In this manuscript, the authors report on two separate experiments designed to understand the relationship between lip-movement induced theta phase and auditory processing. In the first experiment, subjects detected tones embedded in noise while viewing silent videos. The results demonstrate that tone detection performance improved when tones are presented later relative to earlier in a trial. It was also demonstrated that correct detection, for tones that occurred later in the trial, was systematically linked with the phase of the theta oscillatory activity conveyed by the lip movements. In the second experiment EEG was recorded while participants viewed the silent videos and performed an emotion judgement task. Theta phase coupling was demonstrated between auditory and visual areas such that oscillations in the visual cortex preceded those in the auditory cortex.

      The authors conclude that these results demonstrate that lip movements directly affect the excitability of the auditory cortex. However, due to the indirect nature of the reported effects, I do not believe this conclusion is justified. I elaborate on this concern below:

      1) In experiment 1, the main finding that performance is better later in the trial could arise from many factors including non-specific attentional effects.

      2) The analysis reported in the bottom of page 5 (comparing vector lengths for hits vs misses) is critical to the argument but the results are inconclusive (significant interaction, but subsequent comparisons not quite significant. Likely because the experiment is underpowered?).

      3) In Experiment 2: the task performed by the listeners might have biased them towards speech imagery leading to the pattern of effects observed. Indeed, the observed involvement of the left hemisphere may be consistent with the involvement of speech imagery. This would render the observed link between visual and auditory cortices as somewhat trivial and not new (such links have been reported in many previous studies as acknowledged by the authors).

      4) Most importantly, the authors do not provide any direct evidence that the auditory effects observed in Experiment 2 are related to those observed in experiment 1.

      Other comments:

      1) For the analyses in Figure 2A, were the number of trials over which the analysis is conducted adjusted for "first tone" vs "second tone"? Since the hit rate is higher for the second tone, there may be a concern that including more trials in the analysis would result in better SNR and hence a more robust effect.

      2) In Experiment 2 the analysis is focused on phase effects. Can you report whether there are any power differences in the delta band in the "early" vs "later" time windows?

      3) Line 176, the authors write "these results established that entrainment of theta lip activity increased in time". It is not clear to me which aspect of the results supports this statement.

      4) Line 405: "any lag between visual and auditory stimuli onsets was later compensated...". I could not find mention of this elsewhere (i.e. how lags were compensated, how large they were). This is critical for interpreting the results and therefore should be described in detail.

      5) Line 430-437 why did you choose to quantify the envelope in this way rather than just taking the wide band envelope?

      6) Figure S3 is important and should be in the main text.

      7) Line 473 "auditory pure tones"

      8) The description in lines 478-481 doesn't make sense. It is unclear how loudness reported in line 480 (91dB SPL; incidentally this is very loud) relates to the later reported value of 72dB SPL.

      9) Line 485 "embedded"

      10) Please clarify whether in your loudness adjustment procedure you were adjusting the loudness of the tone, the noise or the SNR (and thus keeping the overall loudness of the stimulus fixed)

      11) Line 537 "preceding"

    4. Summary: The reviewers agreed that the paradigm proposed in this work is elegant, and the question timely and important. However, as detailed below, they highlighted several concerns about analysis choices and the interpretation of the data. While some of these can be addressed, it was felt that a major drawback of the present manuscript is that the behaviour and EEG are obtained separately and any links are hence only circumstantial.

    1. Reviewer #3:

      The authors showcase results from an experimental pipeline aiming at demonstrating how evolution of in vitro cancer models can be exploited to identify somatic genomic and structural variants associated with the emergence of drug resistance.

      To this aim, the authors selected, in an unbiased manner, 5 widely used chemotherapeutic agents by systematically treating the HAP-1 cell line with 16 different drugs and then choosing those yielding clinically compatible half-maximal effective concentrations. After generating stably resistant clones of the HAP-1 parental cell line across a number of replicates (by culturing in sublethal doses of the selected compounds), the authors whole-exome/whole-genome sequenced their models and compared variants observed in the resistant clones versus those present in the parental line.

      In this way, the authors identified recurrent loss-of-function variants across replicates in the drug-resistant clones for each drug. They were able to reproduce the increase in drug resistance in the parental line by knocking out the genes found altered in the drug-resistant clones, or to reproduce the same finding by pharmacologically inhibiting genes found to host gain-of-function mutations in the resistant clones, thus highlighting a new potential target for combinatorial cancer therapy and chemosensitization.

      Briefly, this is a nice piece of work showing for the first time that it is practically feasible to exploit in vitro evolution paired with whole-genome analysis to identify targets for combinatorial therapy and to elucidate the mechanisms involved in the emergence of drug resistance.

      The experimental pipeline and the follow-up validation experiments are well thought out and designed, and the outcomes convincingly support the authors' final claim. There are no arbitrary or unjustified choices and the showcased platform seems to be robust enough.

      I would like to see the following few points addressed/answered:

      1) The authors focused only on chemotherapeutics while composing their initial search basin. Would it be worthwhile to also consider a few targeted therapies, or is it known that they would exert no effect on HAP-1? This should be briefly mentioned.

      2) The title of the manuscript can be improved: the authors are deconvolving genomic alterations whose acquisition is linked to the development of drug resistance, thus potential chemosensitising targets or targets for combinatorial therapies. This could be better reflected by the title. As it is now it reads like the main aim is to identify 'innate/intrinsic' targets/cancer-dependencies.

      3) Mutagenesis experiments to identify mutations that are linked to the emergence of drug resistance might be mentioned in the introduction, and the following work cited: PMID: 28179366.

      4) When mentioning the 'Genomics of Drug Sensitivity in Cancer' portal (www.cancerrxgene.org) the following two works (describing the online resource) should be cited: PMID: 23180760 and PMID: 27397505

      5) Figure 1 nicely describes the experimental pipeline presented in this manuscript however it should be completed with a final panel or a couple of panels illustrating the genomic comparison between parental and drug resistant clones to identify SNV and CNV associated with drug resistance.

      6) It is not clear what the numbers in the 'funnel' in figure S3A refer to. Resistant clones for an individual tested drug? Individual resistant clones, or overall cases? This should be specified.

      7) As it is presented, Table 1 is not very informative/clear, I would replace it with a barplot.

    2. Reviewer #2:

      In this manuscript, Jado et al. studied the in vitro evolution of the haploid cell line HAP1 in the presence of five common anticancer agents. The authors exposed the cells to the drugs and then performed whole-exome or whole-genome sequencing (WES or WGS) in order to identify point mutations (SNVs) and copy number changes (CNVs) associated with resistance. In multiple cases, the authors confirmed that shRNA-mediated knockdown of a candidate gene (that is, a gene that was recurrently mutated at high allele fractions, or recurrently lost/gained) indeed conferred resistance to the drug.

      Overall, this is an elegant demonstration that in vitro evolution in cancer cell lines can be useful for the study of chemotherapy resistance. Surprisingly, relatively few studies attempted to identify resistance mechanisms to anticancer drugs using spontaneous evolution experiments, despite the prevalence of this approach in the study of antibiotics resistance. While the authors were able to identify and validate a few known resistance mechanisms to very commonly-used drugs, a major limitation of the current study is that it doesn't really shed any new light on chemotherapy resistance mechanisms. While I appreciate the time and effort that were required to perform the drug experiments and sequence the various clones, the follow-up studies are rather superficial and do not really extend our knowledge on any of the proposed mechanisms of drug resistance.

      Specific Comments:

      1) The AF threshold of 0.85 seems pretty arbitrary. Can this threshold be determined empirically based on the sequencing depth and noise of each sample? Mutations with AF>0.85 may still be subclonal, whereas mutations with AF<0.85 may still be of interest.

      2) While the rationale for performing the initial experiments in HAP1 cells is clear, it is unclear why no validation experiments were performed in additional cancer cell lines. It is imperative to perform the knockdown experiments not only in HAP1 cells but in a panel of additional cancer cell lines, in order to examine whether these are general mechanisms of resistance.

      3) Multiple CRISPR-Cas9 studies were performed to identify mechanisms of drug resistance to anticancer drugs. The authors note in the Discussion that these studies "are useful but cannot readily reveal critical gain-of-function, single nucleotide alleles". This makes sense, yet in almost all cases the authors use a simple loss-of-function shRNA assay in order to confirm their initial sequencing results. This means that the added value of the spontaneous evolution approach is rather limited, either because other mechanisms of resistance are much less common or because it is much easier to identify them.

      4) In the gemcitabine resistance experiment, the authors confirmed that RRM1 KD increased the sensitivity of the cells to the drug. A complementary experiment should be to test whether the overexpression of RRM1 would increase the resistance.

      5) In several cases, multiple SNVs or CNVs were identified in the same resistant clone at a clonal (or near-clonal) AF. Other than following up on "immediate suspects", is there a systematic way to tease apart resistance "drivers" from "passengers"? This should be at least discussed.

      6) The manuscript would benefit from language editing, there are quite a few grammatical errors.
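
      For Specific Comment 1 above (the AF threshold), a depth-aware cutoff could be derived along the following lines. This is a sketch under stated assumptions, not the authors' method: reads at a clonal variant in a near-haploid line are modelled as binomial draws, and `error_rate` and `alpha` are hypothetical placeholders that would need to be estimated per sample. Real data are typically noisier than a pure binomial model.

```python
from math import comb

def clonal_af_lower_bound(depth, error_rate=0.01, alpha=0.01):
    """Lowest allele fraction still plausible for a clonal variant at a given read depth.

    Assumes each read independently fails to show the variant with probability
    error_rate (hypothetical value), so the number of non-variant reads M follows
    Binomial(depth, error_rate). Returns (depth - m) / depth for the largest m
    with P(M >= m) >= alpha.
    """
    cdf = 0.0  # running P(M <= m)
    m = 0
    while m < depth:
        cdf += comb(depth, m) * error_rate**m * (1 - error_rate) ** (depth - m)
        if 1.0 - cdf < alpha:  # P(M >= m + 1) < alpha: one more non-variant read is implausible
            break
        m += 1
    return (depth - m) / depth
```

      Under these assumptions the cutoff tightens as depth grows, which is one way a fixed value such as 0.85 could be replaced by a per-site threshold.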

    3. Reviewer #1:

      Major Comments:

      The experimental design is inconsistent in at least three ways:

      1) The genomes of 14 resistant clones were analyzed by whole exome sequencing (WES), whereas the genomes of the remaining 21 clones were analyzed using whole genome sequencing (WGS).

      2) And the sequencing approach even differed among the six lines evolved in three separate drugs: doxorubicin, paclitaxel, and gemcitabine.

      3) We feel the authors did not adequately explain how the different sequencing methodologies could affect their results and the inferences drawn from them. For example, one is likely to miss information with respect to copy number variants by only sequencing exomes. The authors highlight this fact in their discussion, but they do not explain by how much they could be off in their assessment.

      In some cases the same parental clone was used to found replicate lines subjected to the same selective pressure, and in other cases, the same parental clone was used to found replicate lines subjected to different selective pressures.

      Lines were evolved anywhere from seven to thirty weeks, and the length of the evolution experiments does not correlate with the selecting drug (e.g., three replicate lines were evolved in doxorubicin for 9 weeks and three other lines were evolved to this same drug for 12 weeks). Did the authors normalize by generations? Again, the authors do not address this issue in their manuscript.

    4. Summary: The authors examined the genomic basis of resistance evolution in human chronic myelogenous leukemia (CML) near-haploid cell lines to 5 separate chemotherapeutic agents.

      Using either whole genome or whole exome analysis, they found numerous instances of single nucleotide polymorphisms and copy number variants, including amplifications and deletions, among lines. They then used subsequent knockdown or knockout experiments to confirm that these variants, in fact, lead to increased resistance in these lines.

      The work is interesting, timely, and has potential clinical implications. For example, the resistance alleles identified here could be closely examined in future studies in order to develop treatment strategies. However, the experimental design has certain limitations, the advances in understanding chemotherapy resistance mechanisms are currently modest, and the presentation of results can be improved. We feel overall that these could be addressed, but that they will require significant extra experimental work.

    1. Reviewer #3:

      The manuscript by Dr. Vlachos's group demonstrates many important features and mechanisms of RA-induced synaptic plasticity. For example, they demonstrate that RA-induced plasticity happens in human neurons as well as in rodent neurons in vivo; they identify synaptopodin as a critical mediator of RA plasticity; and they show RA effects on the size of the spine head, the synaptopodin cluster and the spine apparatus. Moreover, the effect of RA on in vivo LTP plasticity is very interesting. The data look solid and support the authors' conclusions.

      However, the manuscript could be significantly improved by discussing the results in the context of the literature and by explaining the possible mechanisms behind them.

      1) The RA effect on AMPAR upregulation has been reported not to share the same SNARE mechanisms as electrical LTP (Synt1/7 independent vs dependent). How does RA have the extra effect on the LTP amplitude? Moreover, RA plasticity is recognized as a form of homeostatic synaptic plasticity, i.e., the effect takes hours to develop, as shown by the authors' RA incubation of many hours in their experiment on human neurons. How does this compare with their RA manipulations in the LTP experiment? (Is RA injected shortly before the LTP stimulus? What do the authors think the LTP stimulus does to RA signaling?)

      What about metaplasticity involving RA? Are there any connections to the present study?

      2) The authors conclude that RA has effects on spines with or without a spine apparatus; however, the authors' data suggest that RA plasticity is blocked when the spine apparatus is eliminated (with synaptopodin KO). Moreover, there is significant overlap in spine size between spines with and without a spine apparatus. How do the authors interpret their results here? Is the spine apparatus dynamic? Can it move between spines quickly? Is there any literature on this? The authors need to discuss in more detail, with supporting literature, the possible ways in which the spine apparatus can affect RA function.

      In short, a discussion of the above points will add significance to the study.

    2. Reviewer #2:

      This paper explores the effect of all-trans retinoic acid (atRA) on synaptic plasticity in human and murine brain slices. The paper builds on previous work showing that atRA plays a key role in various forms of homeostatic and Hebbian plasticity, but extends our understanding in two very significant ways. First, the work convincingly shows that atRA enhances synaptic function in human layer 2/3 pyramidal neurons in intact cortical slices, and like previous studies using murine models and human iPSCs, this is critically dependent on new protein synthesis. Second, the studies show that atRA-mediated synaptic plasticity requires synaptopodin, a protein that is specifically localized to the spine apparatus.

      Overall, the studies have been well-executed and the data are both rigorous and convincing. The paper is very clearly written and the findings are significant. This is a very strong body of work that will be of broad interest.

      Comments:

      1) While the authors rightly point out in the introduction that no previous studies have assessed atRA effects in human cortical circuits, the Zhang et al. (2018) paper did elegantly show synaptic plasticity effects in human neurons (derived from iPSCs). This is noted in the discussion, but should also be pointed out in the introduction as it bears directly on the rationale for the studies described in the paper.

      2) Figure 1C illustrates responses of layer 2/3 pyramidal neurons to intracellular current injection. While the passive membrane properties are quantified and similar regardless of atRA exposure, it is not clear if atRA affects intrinsic excitability of these neurons (i.e., the number of spikes elicited by different levels of injected current). These data should be included.

      3) The legend for Figure 1 C-E is too vague and does not describe the specific measures that are shown in the figure.

      4) For the mouse studies shown in Figure 3A and 3B, did wild-type littermates serve as controls (the gold standard)? Data from wild-type neurons is described in the text but it is not clear if these were collected from a different cohort of animals or from the WT littermates of the Synpo-deficient mice. Also, the authors should state whether the deficient allele is null.

      5) The Synpo-deficient mice have basal sEPSC amplitudes that are noticeably larger than WT mice (as reported in the text). Some discussion of this observation is warranted.

      6) The cumulative frequency plots shown throughout the paper show a curious trend where the smallest events appear to be at least 10 pA or larger. This is somewhat atypical, as most studies find a large number of events between 5 and 10 pA (and many lower still). Does this reflect events only larger than 10 pA being included in the analysis? If so, the points to the left of 10 pA should probably be removed from these plots as including them implies that this data range was adequately sampled.

      7) The schematic shown in Fig. 4B refers to early-phase and late-phase LTP, but the recordings appear to be limited to 60 min post-LTP induction (i.e., well before the late phase). These terms should be replaced with the actual times post-LTP induction.

      8) The discussion is quite on point, but is rather brief. The paper would benefit from a more detailed discussion of the link between the spine-apparatus and translation-dependent forms of synaptic plasticity.

    3. Reviewer #1:

      The study by Lenz et al. explores the acute action of retinoic acid (RA) in adult human cortical neurons. The main findings are:

      1) Consistent with previous findings in mouse neurons, the authors reported enhanced excitatory synaptic transmission in RA-treated cortical layer 2/3 neurons.

      2) Also consistent with previous findings, this enhancement is independent of gene transcription, but requires protein synthesis.

      3) RA's effect on EPSC requires expression of an actin-modulating protein called synaptopodin. In synaptopodin-deficient mouse mPFC neurons, RA's effect on EPSC is eliminated. Moreover, in synaptopodin-deficient hippocampal dentate gyrus neurons, enhancement of LTP by RA is also reversed.

      Overall, this study demonstrates RA-induced synaptic plasticity in acute human cortical neurons, thus expanding the previous findings from mouse neurons and immature human neurons induced from iPS cells to adult human cortical neurons.

      Specific Comments:

      1) Figure 3 shows that in synaptopodin deficient mouse neurons, RA no longer increases sEPSC amplitudes. The rescue experiments are very nice. However, in both WT neurons (stated in main text, not in figure) and rescue neurons (Fig. 3B), the baseline sEPSC amplitudes are significantly smaller than those of the KO neurons. Can the authors speculate why deletion of synaptopodin may lead to enhanced basal excitatory synaptic transmission?

      2) The LTP experiments are a bit problematic. First of all, they were done in mouse hippocampal DG neurons, not cortical neurons. The effect of RA may be different in different neuronal types, as has been shown in previous mouse studies. It would be nice to examine whether RA changes basal synaptic transmission in these neurons in acute slices. Without knowing the effect on basal transmission, it is hard to interpret the LTP results. Second, why did WT DG show no LTP? Third, previous work by Arendt et al. (2015) showed that RA enhances hippocampal CA1 neuron basal EPSCs, and occludes further LTP. The observation here in the DG with RA treatment points in the opposite direction. Can the authors offer some explanation (e.g., RA alters the LTP threshold through some kind of priming)? Again, knowing the effect of RA on basal transmission specifically in the DG neurons would be informative toward understanding the effect on LTP.

      3) The pharmacological treatments (ActD, anisomycin etc.) in this study are in general very long (6 hr) compared to conventional methods (less than 2 hr). To control for potential toxicity associated with prolonged treatment, vehicle control should be added in both Fig 5 and Fig 6.

    4. Summary: All three reviewers are highly enthusiastic about the study reporting the acute effects of retinoic acid on excitatory synaptic transmission and its underlying mechanisms. The experiments are well executed and the results convincing. Aside from some minor comments that require minimal additional experiments or further clarification, the reviewers expressed one major concern regarding the dentate gyrus LTP data. Although further experiments are required to clarify the concerns, the reviewers recommended removing the LTP figure from the present study as it is not well connected with the rest of the study.

    1. Reviewer #1:

      The work by Pipitone et al. is a very carefully performed and technically sophisticated elucidation of the establishment of the thylakoid membrane system in Arabidopsis chloroplasts upon first illumination of cotyledons. Its charm is the three-dimensional resolution during a time course that allows one to follow the rapid changes occurring during the short time window in which the greening occurs. In addition, the authors included proteomics and lipidomics approaches complementing the morphological observations with sound molecular data. Altogether, the study provides a very detailed catalogue of the processes that trigger chloroplast biogenesis, which is highly useful for the community as it provides important numbers on size and development.

      Improvements:

      Actually the work has been performed very carefully and there is not much to improve.

      The introduction could contain more references (e.g. lines 77, 83, 90, 93, 98, 131, 132).

      SBF-SEM should be spelled out at first mention (line 146) and maybe a bit more background about the technology would be helpful for the reader to understand it.

      Line 244: The occurrence of starch granules is of course caused by the continuous illumination. It however may also have an impact on the final size of the plastid. It would be interesting to know whether chloroplasts at the end of a night phase are smaller than at the end of a light phase. This is not mandatory for the current manuscript but an interesting question to follow in future and maybe to be discussed.

      Line 251: The surface area.... please define what is meant since membranes have two sides.

      Lines 256-261: There is another study done in cell culture that has a similar design (Dubreuil et al.). Are the two studies compatible with each other in their conclusions and, if not, what are the differences?

      Lines 549-551: This sentence is not perfectly clear to me. Maybe the authors can explain this a bit more in detail using examples.

      Lines 564-573: I think it is worth noting that the interactions between PSII complexes located in neighbouring thylakoid membranes trigger the stacking of the grana. It is therefore tempting to speculate that stroma lamellae are established first and that these membranes are then stacked after PSII complexes are inserted into the membrane because they provide the adhesion points between them.

    2. Summary: All three reviewers as well as myself are impressed by the in depth and multi-method analysis of chloroplast and thylakoid membrane development provided in your study, including time courses of 3D imaging combining TEM, SBF-SEM and confocal microscopy, lipidomics and proteomics. However, some analyses need to be improved and/or better explained.

      • There is a concern about the proteomics analysis, as the low number of proteins changing in abundance upon de-etiolation is unexpected. It is not clear how the samples were harvested. Were they harvested in the light and could that have influenced protein abundance? The harvesting procedure needs to be better explained. Or is the proteomics method not sensitive enough? The proteomics should be validated, for example by Western Blots with well-established marker proteins such as phyA and HY5.

      • Please also add loading controls to Fig 6 and the associated supplemental figure.

      • Please explain better how the volume of dividing chloroplasts was determined.

    1. Reviewer #2:

      The authors show that neonatal LPS (nLPS) treatment is associated with downregulated PFC levels of ATPase phospholipid transporting 8A2 (ATP8A2) that is associated with elevated IFN in serum and PFC and blocked by an IFN blocking antibody. Antibody treatment marginally antagonized effects of nLPS to cause depressive-like behavior, but was ineffective when females alone were examined.

      This paper adds to a long list of publications reporting alterations in a number of diverse signaling molecules after nLPS treatment. Strengths are that it is generally well done, with appropriate attention to experimental design (e.g. litter effects) and statistical treatment. However, while the down regulation of ATP8A2 is indisputable, a major weakness is that there is no functional relationship revealed between this and any subsequent behavioral, anatomical or physiological alterations. While the possible role of IFN in causing the increased depressive-like behavior is of some interest, the data here are not convincing. Furthermore, while other work has reported extensively on sex-specific alterations in behavior after nLPS, the behavioral analysis here (FST, TST) is rather limited.

      1) There is little justification for reverting to the non-alpha corrected LSD test when the Tukey does not show significance.

      2) The extensive literature on the effects of nLPS is only superficially reviewed.

      3) The direct involvement of ATP8A2 in any behavioral or functional outcomes should be tested.

      4) How does IFN cause down regulation of the ATP8A2?

      5) Other behavioral alterations should be tested such as open field that are less stressful than FST or TST.

    2. Reviewer #1:

      This report makes a logical connection between depressive-like behaviors induced in mice following LPS-injection to mimic bacterial infection and the down regulation of phospholipid transporting enzyme, ATP8A2, in the prefrontal cortex. The intermediary is IFN-gamma. The work is quite convincing that LPS down regulates ATP8A2 by upregulating IFN-gamma and that this has some limited effects on behavior. However, the impact of the findings is limited by several factors.

      1) The use of FST and TST as measures of depression is increasingly falling out of favor as there is no face validity to humans. It is understood that these tests have been long in use and were in the past considered the best measures of "depressive-like" behaviors in mice but the field has moved on to much more relevant constructs such as social defeat, anhedonia etc. As it stands the behavioral analysis here is limited and the effects are modest at best.

      2) The use of LPS as a model to induce depression also has limitations. The injection paradigm used is likely to have caused massive inflammation, as evidenced by the increase in cytokines, but what this is modeling is unclear and how the impact would be specific to depression later in life is equally unclear. Indeed, the references the authors cite for the LPS regime they use offer completely different mechanisms and impacts of the inflammation. This is not to say the current findings aren't important, they are, but rather this pathway may be one among many that is invoked following massive inflammation during early development which then has many non-specific effects.

      3) There is no functional connection between down regulation of ATP8A2 developmentally and adult neural activity. Clearly a membrane phospholipid transporting enzyme is important, but exactly how it is important here, meaning what enduring impacts there are on neuronal function, is unknown.

      4) The experiments were designed to test the relationship between IFN-gamma and ATP8A2 but then conclude that the behavioral effects are mediated by this connection. There could be many other effects of IFN-gamma that are not considered here but would be nonetheless blocked by the neutralizing antibody approach used. Thus the main conclusions of the manuscript are not supported in terms of the role of ATP8A2 in LPS-induced depression.

    3. Summary: Both reviewers felt that the work was well done and quite convincing that LPS down regulates ATP8A2 by upregulating IFN-gamma. This is a novel and interesting finding. But both reviewers also agreed that there is insufficient evidence causally connecting the changes in ATP8A2 to behavior, and that the behavioral tests used are not sufficient to draw rigorous conclusions regarding depression-like behavior. Combined, these weaknesses lessen the impact of the findings for the field.

    1. Reviewer #3:

      The authors have conducted a very challenging study. The paper is clearly written and the topic of neural function under anesthesia is interesting. However, a significant limitation is that many of the analyses presented here do not provide clear insights into the processes the authors are studying.

      -A key issue is that the authors aim to predict who is more or less sensitive to general anesthesia. However, each individual subject was given a different target plasma concentration of propofol, based on clinical scoring. So any difference in behavior may reflect different dosing rather than different behavioral sensitivity to a particular drug concentration.

      -The interpretation of increased functional connectivity is challenging in the context of anesthesia, which modulates vessel dilation and systemic physiology. These analyses would benefit from additional information about the fMRI signal characteristics, e.g. amplitude and physiological signals.

      -Fig. 3 is used to portray comparisons of wakefulness vs. sedation, implied in the text, but does not include direct statistical tests of the difference between the two conditions, and contrasting p<0.05 with p>0.05 does not indicate a significant difference. The suggestion of reduced cortical responses to auditory stimuli makes sense given that the participants are sedated, but the analysis does not seem to provide information about which aspect of auditory processing is modulated by sedation.

      -The statements about response time not being mediated by age may reflect an underpowered study, as age is a strong modulator of anesthetic sensitivity and one group has an n=6.

      -While many interesting MRI studies can be done with quite small n, depending on the question being asked (e.g. Midnight Scan Club, high-resolution individual studies), this study aims to conduct structure-based predictions of individual differences in behavior. This type of analysis requires more than the n=6 slow responders used for Fig. 5, as there are many other features that likely vary in a group this small. I appreciate that the authors have conducted a very challenging study, and it is not easy to collect more data, but while many interesting analyses can be done on this type of data, this is not an appropriate sample size for assessing GMV-individual differences associations. Larger sample sizes or within-subjects analyses are needed for robust GMV effects.

      -Cluster correction method in 'Analyses of fMRI data' should be specified (and checked, Eklund et al.). The precise statistical method used to assess FDR corrected activity correlations with individual subject response times is not clear; it seems that the ANOVA resulted in non-significant results that are nevertheless being reported as differences using Hedges d?

      -The presented evidence does not sufficiently support the authors' conclusion that they "provided very strong evidence that individual differences in responsiveness under moderate anaesthesia related to inherent differences in brain function and structure within the executive control network, which can be predicted prior to sedation.". I would commend the authors on their interesting and challenging experiment, and recommend refocusing the analyses.
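The point raised above about Fig. 3, that contrasting p<0.05 with p>0.05 does not establish a difference between conditions, can be illustrated with a minimal sketch. The effect estimates and standard errors below are hypothetical numbers chosen for illustration, not values from the manuscript, and the two estimates are assumed to be independent (a within-subject design would instead call for a paired test on the per-subject differences).

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical effect estimates and standard errors (illustrative only).
awake_est, awake_se = 25.0, 10.0
sedated_est, sedated_se = 10.0, 10.0

# Each condition tested against zero on its own.
p_awake = two_sided_p(awake_est / awake_se)
p_sedated = two_sided_p(sedated_est / sedated_se)

# Direct test of the difference between conditions,
# assuming the two estimates are independent.
diff = awake_est - sedated_est
diff_se = math.sqrt(awake_se ** 2 + sedated_se ** 2)
p_diff = two_sided_p(diff / diff_se)

print(f"awake vs 0:   p = {p_awake:.3f}")
print(f"sedated vs 0: p = {p_sedated:.3f}")
print(f"awake vs sedated: p = {p_diff:.3f}")
```

In this toy case one condition is significant against zero and the other is not, yet the direct comparison between them is far from significant, which is why a direct test of the difference is the informative one.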

    2. Reviewer #2:

      In this study, Deng and colleagues have sought to assess the neural correlates of individual differences in responsiveness variability across wakefulness and moderate levels of propofol-induced anaesthesia. In addition to resting state scanning and an auditory story task, the participants underwent behavioural assessments including memory recall and a target detection task. Furthermore, the auditory story task was independently rated by a separate group for its "suspensefulness". Focusing their analysis on three major large-scale brain networks, the group-level results first indicated significant differences in the between network interactions of the chosen networks across wakefulness and sedation, specifically in the narrative condition. Furthermore, during the same condition, there was reduced cross-subject correlation between wakefulness versus sedation centred mainly on the sensorimotor brain regions. Moreover, based on the responses in the target detection task, the participants were grouped into fast and slow responders which then showed significant differences in gray matter volume as well as connectivity differences in the wakeful auditory story task condition within the executive control network.

      Overall, this is a well-written manuscript. However, my initial enthusiasm about the question of interest was hampered by major theoretical and methodological concerns related to this study. Below I outline these points in the hopes that they improve this study and its outcomes.

      First and foremost, the authors state that their major interest in this study was to assess individual differences in sedation-induced response variability and its potential brain bases. Despite the attractiveness of this topic, which is undoubtedly of interest both to the academic community and the general public, I do not believe that the current study design would allow the authors to answer this question. First of all, although I completely appreciate the difficulty in recruiting participants to take part in such pharmacological studies, I do not think that a group of 17 participants is enough to be able to assess "individual differences". For this to work, there has to be a large enough sample based on adequate power calculations, keeping in mind all the spurious false positive effects that are generated by pharmacological interventions and their downstream effects on connectivity estimates (e.g. motion, global signal etc.). Second, though it is perfectly valid to carry out the initial within-group connectivity and whole-brain activity analyses for the task/rest (which I believe are the only statistically and experimentally sound sections), following these results, the authors mainly carry out multiple exploratory analyses that aim to infer what happened to 3 non-respondent participants (or 6 slow responders). This to me is closer to a case study rather than an experimental study with proper statistics. Overall this fast/slow responder split only comes as an afterthought and does not seem to be the main intention behind the study. This is at odds with the major goal stated in the introduction that the main aim of the authors was to assess inter-individual differences. As such, I do not think that the analyses highlighted by the authors provide enough evidence to support their claims. More detailed points are provided below:

      • The introduction is well-written, citing as much of the relevant literature on this topic as possible. Having said that, I am not really convinced about the justification for selecting the dorsal attention, executive control and default mode networks as the sole focus of the authors' analysis. Although it is true that there is a strong a priori basis that these associative networks play an important role in maintaining consciousness, the references that the authors refer to are equally biased in focusing their analyses on specific higher-order networks, creating a circular argument. In light of evidence highlighting the importance of sensorimotor networks in this context, as well as the balance in their interactions with associative cortices, I would argue that a whole-brain approach would be better suited. Furthermore, as indicated by the whole-brain analysis during the auditory story task, most alterations were centered on the primary somatosensory regions. This is at odds with the justification of the authors on focusing their connectivity analyses solely on associative brain networks.

      • Given the wide age range (and its potential influence on the obtained results), it would be great for the authors to provide the mean and standard deviation of age within groups, and whether the groups were age-matched (though the range seems similar).

      • The authors state that only the reaction time was measured in the auditory target detection task, but later in the results section they mention "omissions". Given that such omissions might be strongly indicative of unresponsiveness/sleep, it is unclear how one can interpret the observed brain-based effects solely from the perspective of reduced information processing (especially when the data was collected under eyes-closed conditions).

      • The authors provide a thorough description of the sedation administration procedures, which is excellent. Nevertheless, I was wondering whether the blood plasma propofol concentrations could be used to explain some of the results in individual differences or even a nuisance regressor to show that the effects were not simply driven by this factor.

      • I failed to find any information in the methods section as to why/how the authors have decided on a mean-split of the participants to fast/slow responders. Given the already small sample size, further reducing degrees of freedom by a split of 11 versus 6 participants makes it very problematic in terms of any statistical tests that can be carried out.

      • Line 441 - Results should not be reported if it did not reach statistical significance.

      • Line 448 - For the two analyses on this page the authors indicate that although in the wakefulness condition there was significant brain activity that correlated with (not "predicted") task stimulus, no significant effects were observed in the sedation condition. This absence of evidence should not then be taken as evidence of absence. In other words, such lack of evidence can be explained by a variety of factors not attributable to the effect of sedation on brain activity (e.g. simply by the fact that the participants were not paying attention to the story or falling asleep).

      • Line 484 - I do not think it is acceptable/justifiable to carry out post-hoc tests, when there was no significant difference in the main ANOVA.

      • Line 503 - I am not really sure about the justification behind the assessment of gray matter volume. Besides the issues related to small sample size, the observed differences in functional connectivity may then simply be due to differences in the quality of the data that can be extracted from the defined ROIs in a subset of participants. Was this analysis corrected for age (as a continuous variable)? In any case, as far as I am aware, there is no simple relationship between gray matter volume and functional connectivity (i.e. greater/smaller gray matter volume does not necessarily mean greater/smaller functional connectivity). Hence, one cannot make the conclusion that: "These results lend support to the functional connectivity results above, and together they strongly suggest that connectivity within the ECN, and especially the frontal aspect of this network, underlies individual differences in behavioural responsiveness under moderate anaesthesia."

      • Line 509 - Again, I am not really sure about the justification behind the analysis carried out here. The authors state that the ROIs that were found in the gray matter volume analysis overlapped with a priori ROIs which they suggest explain differences observed in functional connectivity. They then select a subset of these ROIs and again show that there are differences in connectivity. This seems rather circular.

      • The authors state that "Rather, only the functional connectivity within the ECN during the wakeful narrative condition differentiated the participants' responsiveness level, with significantly stronger ECN connectivity in the fast, relative to slow responders." I apologise if I am missing something, but I do not see any evidence for such a strong claim. All that the authors have found was that there were significant functional connectivity differences in the executive control network in the wakefulness condition between fast and slow responders (which was defined and grouped by the authors themselves), with no significant effect of condition or state. I fail to understand why this one result from a multitude of exploratory analyses that were conducted was picked out as the "main finding" when one cannot make any inferences about its direct relation to sedation.

      • Overall, I would urge the authors to re-think their analysis strategy and the corresponding discussion of their results.

    3. Reviewer #1:

      Deng et al. studied the mechanisms underlying the wide propofol effect-site concentration range associated with loss of responsiveness. Data was acquired from two centers (MRI, Canada; Auditory, Ireland). This is a well conducted study. The results could also explain why older patients (with presumably smaller gray matter volume) are more sensitive to propofol. My major concerns relate to precision in language.

      1) The authors studied mechanisms underlying why patients lose consciousness at a wide range of propofol effect-site concentration. This behavioral phenomenon is known and well described (Iwakiri H, Nishihara N, Nagata O, Matsukawa T, Ozaki M, Sessler DI. Individual effect-site concentrations of propofol are similar at loss of consciousness and at awakening. Anesth Analg. 2005;100:107-10). I would suggest that the authors position their paper as such. They did not study general anesthesia per se, and the allusions to awareness under anesthesia may not be relevant.

      2) Per comment 1 above, please reword the intro and discussion section, i.e., "Anaesthesia has been used for over 150 years to reversibly abolish consciousness in clinical medicine, but its effect can vary substantially between individuals." What type of anesthesia are you referring to? Anesthetic vapors? Please provide a reference for this statement or make it propofol specific. Awareness under general anesthesia is related to numerous factors, many of which are iatrogenic as detailed in the NAP 5 study "The incidence of awareness rose from 1 out of 135,000 general anaesthetics to 1 out of 8,200 general anaesthetics when neuromuscular blockers were used" (https://pubmed.ncbi.nlm.nih.gov/25204697/). Further, it is unclear when dreaming occurs (during induction which is reasonable to expect/during emergence which is also reasonable to expect versus during the anesthesia). My suggestion is to qualify your statements by stating that this should be further studied in the context of possible genetic predisposition to awareness (Increased risk of intraoperative awareness in patients with a history of awareness. Anesthesiology 2013;119:1275-83).

      3) The term "moderate anaesthesia" is confusing to me, and would be to most clinicians. Please cite the description of what comprises moderate anesthesia. My interpretation is that the study was about sedation. Did you mean moderate sedation? (https://www.asahq.org/standards-and-guidelines/continuum-of-depth-of-sedation-definition-of-general-anesthesia-and-levels-of-sedationanalgesia).

      4) "the antagonistic relationship between the DMN and the DAN/ECN #and# was reduced during moderate anaesthesia, with a stronger and significant result in the narrative condition relative to the resting state." Anticorrelation?

      5) The suggestion that fMRI can be used to improve the accuracy of awareness monitoring is, in my opinion, not necessary and a stretch.

    4. Summary: There was general enthusiasm for the topic of study and the general approach of using neuroimaging to study brain function under anesthesia. However, the reviewers also shared a number of significant concerns, particularly regarding whether the data which has been collected is sufficient to answer the core questions being asked (for example, whether the number of participants supports a robust individual differences analysis).

      Jonathan E Peelle (Washington University in St. Louis) served as the Reviewing Editor.

    1. Reviewer #3: (Daniele Marinazzo)

      Dear authors,

      Thanks for the opportunity to read this nice paper. I appreciated the quality of the data analysis, and the quest towards associating electrophysiology and BOLD data through a data-driven transfer function, which can be interpreted as a proxy of the HRF. Also I completely agree with you that we need to move beyond a canonical response.

      There are a few issues I would like to discuss with you. I have done quite some work in this sense. On one hand this is good (and I think it's also the reason why I was invited to review this paper), on the other one there is always the risk that I have shaped my own goggles in these last years, and that I am projecting on your work some doubts and issues that I have with my own. In this case I apologize in advance, and I hope that we can have an enriching conversation.

      Please forgive me if I start by my own work; there is always the danger that reviewers try to make authors write the paper that they would write themselves, I will keep this in mind, but on the other hand I think that the best way to convey my thoughts to you is to let them flow as they come.

      So, here's our toolbox: https://www.nitrc.org/projects/rshrf. The idea behind it is that we can take peaks in the BOLD signal and take them as signatures of a pseudo neural event happening some time before at the neural level. This is in line with this work (which could also be relevant with respect to your power law figures):

      Tagliazucchi E, Balenzuela P, Fraiman D, Chialvo DR. Criticality in large-scale brain FMRI dynamics unveiled by a novel point process analysis. Front Physiol. 2012;3:15. Published 2012 Feb 8. doi:10.3389/fphys.2012.00015 and with the subsequent spatial clustering approach which has been called coactivation patterns (CAP)

      Liu X, Zhang N, Chang C, Duyn JH. Co-activation patterns in resting-state fMRI signals. Neuroimage. 2018;180(Pt B):485-494. doi:10.1016/j.neuroimage.2018.01.041 and innovation CAPs

      Karahanoğlu FI, Caballero-Gaudes C, Lazeyras F, Van de Ville D. Total activation: fMRI deconvolution through spatio-temporal regularization. Neuroimage. 2013;73:121-134. doi:10.1016/j.neuroimage.2013.01.067

      Karahanoğlu FI, Van De Ville D. Transient brain activity disentangles fMRI resting-state dynamics in terms of spatially and temporally overlapping networks. Nat Commun. 2015;6:7751. Published 2015 Jul 16. doi:10.1038/ncomms8751

      Zoller DM, Bolton TAW, Karahanoglu FI, Eliez S, Schaer M, Van De Ville D. Robust Recovery of Temporal Overlap Between Network Activity Using Transient-Informed Spatio-Temporal Regression. IEEE Trans Med Imaging. 2019;38(1):291-302. doi:10.1109/TMI.2018.2863944

      We then fit these peaks with a GLM, with the time lag as a free parameter. We use several families of basis functions. In the original paper (Wu GR, Liao W, Stramaglia S, Ding JR, Chen H, Marinazzo D. A blind deconvolution approach to recover effective connectivity brain networks from resting state fMRI data. Med Image Anal. 2013;17(3):365-374. doi:10.1016/j.media.2013.01.003) we used canonical HRF and FIR (together with the rBETA, which is basically the portion of the BOLD peak exceeding a certain threshold, as in the Tagliazucchi paper above).
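The event-detection step described above (taking supra-threshold peaks in the BOLD signal as signatures of pseudo neural events that happened some time earlier) can be sketched roughly as follows. This is an illustrative sketch only and not the rsHRF toolbox's implementation: the function name, the threshold, and the fixed `lag` are assumptions made here, whereas in the approach described the lag is a free parameter estimated in the GLM fit.

```python
import statistics


def pseudo_event_onsets(bold, threshold=1.0, lag=2):
    """Return candidate pseudo-event onsets (in samples) from a BOLD series.

    The series is z-scored, local maxima above `threshold` are kept,
    and each peak is shifted back by `lag` samples. Both parameters
    are hypothetical defaults chosen for illustration.
    """
    mean = statistics.fmean(bold)
    sd = statistics.pstdev(bold)
    if sd == 0:
        return []  # flat signal: no peaks to detect
    z = [(x - mean) / sd for x in bold]

    onsets = []
    for t in range(1, len(z) - 1):
        is_peak = z[t] > z[t - 1] and z[t] >= z[t + 1]
        if is_peak and z[t] > threshold and t - lag >= 0:
            onsets.append(t - lag)
    return onsets


# Toy signal with two clear peaks at samples 4 and 11.
signal = [0, 0, 0, 1, 5, 1, 0, 0, 0, 0, 1, 6, 1, 0, 0, 0]
onsets = pseudo_event_onsets(signal, threshold=1.0, lag=2)
print(onsets)
```

The onsets returned here would then serve as the event regressors for the GLM, with a set of basis functions standing in for the unknown HRF shape.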

      We then included a mixture of gamma functions together with other families of basis functions in subsequent versions of the toolbox. Then we set up for validation of the approach with electrophysiological signatures, and that's where the doubts and pain kicked in. Some results on simultaneous EEG-fMRI, reported here (Wu G, Marinazzo D. 2015. Retrieving the Hemodynamic Response Function in resting state fMRI: methodology and applications. PeerJ PrePrints 3:e1317v1 https://doi.org/10.7287/peerj.preprints.1317v1 Wu GR, Deshpande G, Laureys S, Marinazzo D. Retrieving the Hemodynamic Response Function in resting state fMRI: Methodology and application. Conf Proc IEEE Eng Med Biol Soc. 2015;2015:6050-6053. doi:10.1109/EMBC.2015.7319771) were encouraging: for example we saw that the positive correlation between envelope of EEG and BOLD in the occipital cortex becomes more positive when we use instead the deconvolved BOLD and the EEG, while the negative correlation in the thalamus becomes more negative.

      Other things present in the PeerJ preprint were encouraging too (and I mention them since I think that they can be relevant to the validation of your approach): namely the retrieval of a simulated ground truth HRF within certain realistic limits of SNR and jitter, the correlation with cerebral blood flow (even though physiological regressors should always be taken into account, see: Wu GR, Marinazzo D. Sensitivity of the resting-state haemodynamic response function estimation to autonomic nervous system fluctuations. Philos Trans A Math Phys Eng Sci. 2016;374(2067):20150190. doi:10.1098/rsta.2015.0190 and this becomes even more relevant when considering aging and clinical datasets), and some similarity across resting state networks.

      So, the question is: can we really trust that peaks in M/EEG reflect the local pseudo-events that would originate the BOLD signal? Reading work by people who had thoroughly investigated this, e.g.

      Logothetis NK, Pauls J, Augath M, Trinath T, Oeltermann A. Neurophysiological investigation of the basis of the fMRI signal. Nature. 2001;412(6843):150-157. doi:10.1038/35084005

      Chen X, Sobczak F, Chen Y, et al. Mapping optogenetically-driven single-vessel fMRI with concurrent neuronal calcium recordings in the rat hippocampus. Nat Commun. 2019;10(1):5239. Published 2019 Nov 20. doi:10.1038/s41467-019-12850-x

      Yu X, He Y, Wang M, et al. Sensory and optogenetically driven single-vessel fMRI. Nat Methods. 2016;13(4):337-340. doi:10.1038/nmeth.3765

      and conversing with them, I got (almost) convinced that it's unlikely that spikes in coarsely recorded or reconstructed M/EEG signal can be one to one mapped to the HRF inducing events that we use in GLM (calcium or even better glutamate signal could be a better choice).

      Now, I like the way you associated HMM states with hemodynamic ones, thus adopting a more systemic/dynamical view, and taking fractional occupancy as a trigger. Do you think that these triggers can be better markers of BOLD-inducing neural events?

      Other issues:

      • What to make of events that are very close, and that would thus violate the assumption of linearity of the GLM?

      • Apart from hemodynamic changes, can aging be associated with different electrophysiological spectral features (both periodic and aperiodic), which in turn could influence the HMM analysis?

      • Detection of brain-behavior relationships with a non-huge dataset can be misleading, see for example this recent study:

      Towards Reproducible Brain-Wide Association Studies Scott Marek, Brenden Tervo-Clemmens, Finnegan J. Calabro, David F. Montez, Benjamin P. Kay, Alexander S. Hatoum, Meghan Rose Donohue, William Foran, Ryland L. Miller, Eric Feczko, Oscar Miranda-Dominguez, Alice M. Graham, Eric A. Earl, Anders J. Perrone, Michaela Cordova, Olivia Doyle, Lucille A. Moore, Greg Conan, Johnny Uriarte, Kathy Snider, Angela Tam, Jianzhong Chen, Dillan J. Newbold, Annie Zheng, Nicole A. Seider, Andrew N. Van, Timothy O. Laumann, Wesley K. Thompson, Deanna J. Greene, Steven E. Petersen, Thomas E. Nichols, B.T. Thomas Yeo, Deanna M. Barch, Hugh Garavan, Beatriz Luna, Damien A. Fair, Nico U.F. Dosenbach bioRxiv 2020.08.21.257758; doi: 10.1101/2020.08.21.257758

      • Why the parcellation in 38 regions? How are the results consistent/robust with finer parcellations?

      • You state that the DMN "is susceptible to aging and neurodegenerative disease". That's certainly probable, the thing is that DMN is possibly sensitive to everything and specific to a very few things.

      • Instead of a point-by-point statistical test, you could use the 3dMVM algorithm in AFNI (your reference 20) to test differences in the shape as a whole.

      • You analyse data from older subjects only. How confident can you be that you are observing effects specific to aging?

      Thanks for listening to this review version of "more of a comment than a question".

    1. Reviewer #3:

      The Aizenman lab has previously demonstrated the utility of Xenopus tectum as a model to examine neuronal, circuit and behavioral manifestations of VPA treatment, a teratogen associated with autism spectrum disorder in humans. In Gore et al., they demonstrate that the deficits induced by VPA treatment, including enhanced spontaneous and evoked neuronal activity, are blocked by pharmacological or morpholino-based inhibition of MMP9. Inhibition of MMP9 also reverses the effects of VPA treatment on seizure susceptibility and the startle habituation response. Over-expression of MMP9 phenocopies the effect of VPA, and inhibition of MMP9 in single tectal neurons blocks the expression of experience-dependent structural plasticity. The results are convincing and add mechanistic insight into circuit and behavioral dysfunction induced by VPA signaling, as well as an expansion of the repertoire of plasticity mediated by MMP9 signaling.

      Minor points:

      -The time course for the introduction of VPA and MMP9 inhibitors should be reiterated in the results section.

      -Fig 1: Please report the number (or %) of tectal neurons in which MMP9 was over-expressed following whole-brain electroporation.

      -Does MMP9 transfection change the E/I ratio, as previously reported for VPA?

      -Does VPA or MMP9 inhibition change the initial large amplitude/short latency evoked response?

      Figure 2: please report statistics for total number of barrages or barrage distribution across experimental groups (latter also for Fig 3).

      Figs 3 and 5: The presentation of the immunoblots should clarify if raw or normalized (to Ponceau Blue) data were quantified.

      Fig 4: Please report a post hoc comparison following the repeated measures ANOVA.

      Fig 5: Total growth and growth rates could also be included in the results section.

      Minor comments:

      -The discussion considers a broad range of potential targets of MMP9, including cell surface receptors, growth factors, adhesive proteins, and extracellular matrix components, but many of these are left out of the abstract and introduction.

      -The statement on page 6 "Increased synaptic transmission observed in MMP9 over-expression tectal neurons is consistent with dysfunctional synaptic pruning" appears at odds with a body of literature in mouse hippocampus, including many papers cited in the discussion, demonstrating the role of MMP9 in spine elongation, synaptic potentiation and synapse maturation.