  1. Oct 2024
    1. Reviewer #3 (Public review):

      The authors convincingly show that SLC35G1 mediates uptake of citrate which is dependent on pH and chloride concentration. Putting their initial findings in a physiological context, they present human tissue expression data of SLC35G1. Their Transwell assay indicates that SLC35G1 is a citrate exporter at the basolateral membrane.

      Weaknesses:

      The manuscript would benefit from the inclusion of the antibody validation results. Related to the localization of SLC35G1, the polyclonal antibody was not validated in the knockdown cells used in the study. This would strengthen the antibody validation, the localization results as well as the transport assay in 2C.

      Also, it is unclear why the Transwell assay was not performed upon knockdown of SLC35G1 to support the conclusions.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The current manuscript provides strong evidence that the molecular function of SLC35G1, an orphan human SLC transporter, is citrate export at the basolateral membrane of intestinal epithelial cells. Multiple lines of evidence, including radioactive transport experiments, immunohistochemical staining, gene expression analysis, and siRNA knockdown are combined to deduce a model of the physiological role of this transporter.

      Strengths:

      The experimental approaches are comprehensive, and together establish a strong model for the role of SLC35G1 in citrate uptake. The observation that chloride inhibits uptake suggests an interesting mechanism that exploits the difference in chloride concentration across the basolateral membrane.

      Weaknesses:

      Some aspects of the results would benefit from a more thorough discussion of the conclusions and/or model.

      For example, the authors find that SLC35G1 prefers the dianionic (singly protonated) form of citrate, and rationalize this finding by comparison with the substrate selectivity of the citrate importer NaDC1. However, this comparison has weaknesses when considering the physiological pH for SLC35G1 and NaDC1. NaDC1 binds citrate at a pH of ~5.4 (the pKa of citrate is 5.4, so there is a lot of dianionic citrate present under physiological circumstances). SLC35G1 binds citrate under pH conditions of ~7.5, where a very small amount of dianionic citrate is present. The data clearly show a pH dependence of transport, and the authors rule out proton coupling, but the discrepancy between the pH dependence and the physiological expectations should be addressed/commented on.

      Thank you for your insightful comment. Citrate exists mostly in its trianionic form under near neutral pH conditions in biological fluids, as you pointed out. Its dianionic form represents only a small portion (about 1/100) of total citrate due to the pKa. However, significant SLC35G1-specific uptake was observed under near neutral pH conditions (Figure 1G). Therefore, although SLC35G1-mediated citrate transport is less efficient under physiologically relevant near neutral pH conditions, it could still play a role particularly in the intestinal absorption process, in which the concentration gradient of dianionic citrate could be maintained by continuous supply by NaDC1-mediated apical uptake.
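
      The "about 1/100" figure quoted above can be checked with the Henderson-Hasselbalch relation. The following is a minimal sketch, not the authors' calculation: it assumes a two-state equilibrium between dianionic and trianionic citrate governed by the single pKa of 5.4 cited in the review, and ignores the more protonated species. The function name is illustrative only.

```python
def dianionic_fraction(ph: float, pka: float = 5.4) -> float:
    """Fraction of citrate present as the dianion (two-state approximation).

    Assumption: only the dianion/trianion equilibrium matters, with the
    pKa of 5.4 quoted in the review; other protonation states are ignored.
    """
    # Henderson-Hasselbalch: [dianion] / [trianion] = 10 ** (pKa - pH)
    ratio = 10.0 ** (pka - ph)
    return ratio / (1.0 + ratio)


for ph in (5.5, 7.4):
    print(f"pH {ph}: dianionic fraction = {dianionic_fraction(ph):.3f}")
```

      Under these assumptions the dianion makes up roughly 1% of citrate at pH 7.4 and a little under half at pH 5.5, which is consistent with the pH dependence discussed in the response.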

      The rationale for the series of compounds tested in Figure 1F, which includes metabolites with carboxylate groups, a selection of drugs including anion channel inhibitors and statins, and bile acids, is not described. Moreover, the lessons drawn from this experiment are vague and should be expanded upon. It is not clear what, if anything, the compounds that reduce citrate uptake have in common.

      Thank you for highlighting the need for clarity regarding the compounds tested in Figure 1F. The tested compounds were TCA cycle intermediates (fumarate, α-ketoglutarate, malate, pyruvate, and succinate) as substrate candidate carboxylates analogous to citrate, diverse anionic compounds (BSP, DIDS, probenecid, pravastatin, and taurocholate) as those that might be substrates or inhibitors, and diverse cationic compounds (cimetidine, quinidine, and verapamil) as those that are least likely to interact with SLC35G1. Among them, certain anionic compounds significantly reduced SLC35G1-specific citrate uptake, suggesting that they may interact with SLC35G1. However, we could not identify any structural features commonly shared by these compounds, except that they have anionic moieties. We acknowledge that it requires further elaboration to clarify such structural features. We have revised the relevant section on p. 3 (line 25 - 32) to include these.

      The transporter is described as a facilitative transporter, but this is not established definitively. For example, another possibility could involve coupling citrate transport to another substrate, possibly even chloride ion.

      Thank you for your insightful comment regarding the nature of SLC35G1's transport mechanism. While we have described SLC35G1 as a facilitative transporter based on our current data, we acknowledge that this has not been definitively proven, as you pointed out, and we cannot exclude the possibility that its sensitivity to extracellular Cl- might imply its operation as a citrate/Cl- exchanger. To examine the possibility, we would need to manipulate the chloride ion gradient across the plasma membrane. Particularly, generating an outward Cl- gradient to see if it could enhance citrate uptake could be a potential strategy. However, current techniques do not allow us to effectively generate the Cl- gradient, thus preventing us from conclusively verifying this possibility. We recognize the importance of further investigating this aspect in future studies. Your suggestion highlights an important area for additional research to fully understand the transport mechanism of SLC35G1. We have additionally commented on this issue on p. 4 (line 1 – 3).

      Reviewer #2 (Public Review):

      Summary:

      The primary goal of this study was to identify the transport pathway that is responsible for the release of dietary citrate from enterocytes into blood across the basolateral membrane.

      Strengths:

      The transport pathway responsible for the entry of dietary citrate into enterocytes was already known, but the transporter responsible for the second step remained unidentified. The studies presented in this manuscript identify SLC35G1 as the most likely transporter that mediates the release of absorbed citrate from intestinal cells into the serosal side. This fills an important gap in our current knowledge of the transcellular absorption of dietary citrate. The exclusive localization of the transporter in the basolateral membrane of human intestinal cells and the human intestinal cell line Caco-2 and the inhibition of the transporter function by chloride support this conclusion.

      Weaknesses:

      (i) The substrate specificity experiments have been done with relatively low concentrations of potential competing substrates, considering the relatively low affinity of the transporter for citrate. Given that NaDC1 brings in not only citrate as a divalent anion but also other divalent anions such as succinate, it is possible that SLC35G1 is responsible for the release of not only citrate but also other dicarboxylates. But the substrate specificity studies show that the dicarboxylates tested did not compete with citrate, meaning that SLC35G1 is selective for the citrate (2-), but this conclusion might be flawed because of the low concentration of the competing substrates used in the experiment.

      Thank you for your valuable comment on our substrate specificity experiments. As you pointed out, we cannot rule out the possibility that dicarboxylates might be recognized by SLC35G1 with low affinity as the tested concentration was relatively low. However, at the concentration of 200 μM, competing substrates with an affinity comparable to that of citrate could inhibit SLC35G1-specific citrate uptake by about 30%. Therefore, it is likely that the compounds that did not exhibit significant effect have no affinity or at least lower affinity than citrate to SLC35G1. Further studies should explore a broader range of concentrations for potential substrates including those with lower affinity. It would help clarify the substrate recognition characteristics of SLC35G1 and if it indeed has a unique preference for citrate over dicarboxylates. We have additionally mentioned that on p. 3, line 32 – 35.
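
      The "about 30%" estimate above follows from a simple competitive-inhibition model. The sketch below is an illustration under stated assumptions, not the authors' analysis: it assumes purely competitive inhibition, a Km of 0.5 mM for citrate (the value cited by Reviewer #2), an inhibitor Ki equal to that Km, and a tracer citrate concentration of 1 μM (the concentration reported for the uptake assay in Author response image 2, assumed here to apply to the inhibition experiment as well). The function name and defaults are hypothetical.

```python
def fractional_inhibition(inhibitor_um: float, ki_um: float,
                          substrate_um: float = 1.0,
                          km_um: float = 500.0) -> float:
    """Fractional reduction in uptake under simple competitive inhibition.

    Derived from v = Vmax*S / (Km*(1 + I/Ki) + S), which gives
    1 - v/v0 = (I/Ki) / (1 + S/Km + I/Ki).
    All concentrations are in micromolar; defaults are assumptions.
    """
    s = substrate_um / km_um   # substrate relative to its Km
    i = inhibitor_um / ki_um   # inhibitor relative to its Ki
    return i / (1.0 + s + i)


# Competitor with the same affinity as citrate (Ki = Km = 500 uM) at 200 uM
same_affinity = fractional_inhibition(200.0, 500.0)
# Hypothetical competitor with tenfold lower affinity (Ki = 5000 uM)
weaker = fractional_inhibition(200.0, 5000.0)
print(f"Ki = 500 uM:  {same_affinity:.1%} inhibition")
print(f"Ki = 5000 uM: {weaker:.1%} inhibition")
```

      With these inputs, a competitor matching citrate's affinity would reduce uptake by a little under 30%, whereas one with tenfold lower affinity would reduce it by only a few percent, which is why low-affinity substrates could go undetected at 200 μM.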

      (ii) The authors have used MDCK cells for assessment of the transcellular transfer of citrate via SLC35G1, but it is not clear whether this cell line expresses NaDC1 in the apical membrane as the enterocytes do. Even though the authors expressed SLC35G1 ectopically in MDCK cells and showed that the transporter localizes to the basolateral membrane, the question as to how citrate actually enters the apical membrane for SLC35G1 in the other membrane to work remains unanswered.

      Thank you for highlighting this important aspect of our study. The mechanism of apical citrate entry in MDCKII cells is unknown, although NaDC1 or a similar transporter may be involved. However, this set of experiments has successfully demonstrated the basolateral localization of SLC35G1 and its operation for citrate efflux. Attempts to clarify the apical entry mechanism may need to be included in future studies for more detailed characterization of the model system using MDCKII cells. This would help in fully understanding the transcellular transport system for citrate. Investigation using Caco-2 cells or MDCKII cells double transfected with NaDC1 and SLC35G1 would also need to be included in future studies to gain more definitive insights into the transcellular transport mechanism for citrate in the intestine, delineating the suggested cooperative role of NaDC1 and SLC35G1. We would be grateful for your understanding of our handling of this issue.

      (iii) There is one other transporter that has already been identified for the efflux of citrate in some cell types in the literature (SLC62A1, PLoS Genetics; 10.1371/journal.pgen.1008884), but no mention of this transporter has been made in the current manuscript.

      Thank you for bringing up the relevance of SLC62A1, which has recently been identified as a citrate efflux transporter in some cell types (PLoS Genet, 16, e1008884, 2020). We have now included comments on this transporter in the Introduction (p. 2).

      Reviewer #3 (Public Review):

      Summary:

      Mimura et al describe the discovery of the orphan transporter SLC35G1 as a citrate transporter in the small intestine. Using a combination of cellular transport assays, they show that SLC35G1 can mediate citrate transport in small intestinal cell lines. Furthermore, they investigate its expression and localization in both human tissue and cell lines. Limited evidence exists to date on both SLC35G1 and citrate uptake in the small intestine, therefore this study is an important contribution to both fields. However, the main claims by the authors are only partially supported by experimental evidence.

      Strengths:

      The authors convincingly show that SLC35G1 mediates uptake of citrate which is dependent on pH and chloride concentration. Putting their initial findings in a physiological context, they present human tissue expression data of SLC35G1. Their Transwell assay indicates that SLC35G1 is a citrate exporter at the basolateral membrane.

      Weaknesses:

      Further confirmation and clarification are required to claim that the SLC indeed exports citrate at the basolateral membrane as concluded by the authors. Most experiments measure citrate uptake, but the authors state that SLC35G1 is an exporter, mostly based on the lack of uptake at physiological conditions faced at the basolateral side. The Transwell assay in Figure 1L is the only evidence that it indeed is an exporter. However, in this experiment, the applied chloride concentration was not according to the proposed model (120 mM at the basolateral side). The Transwell assay, or a similar assay measuring export instead of import, should be carried out in knockdown cells to prove that the export indeed occurs through SLC35G1 and not through an indirect effect. Related to the mentioned chloride sensitivity, it is unclear how the proposed model works if the SLC faces high chloride conditions under physiological conditions though it is inhibited by chloride.

      Thank you for highlighting these important points. We used the Cl--rich medium in transcellular transport studies, as stated in the relevant section in Materials and Methods (p. 6, line 2 – 5). The Cl- concentration (144 mM) was comparable to the physiological concentration in extracellular body fluids. To clarify that experimental condition, we have additionally noted that in the text (p. 4, line 9) and the legends of Figs. 1K and 1L. The results indicate that basolaterally localized SLC35G1 can mediate citrate export effectively under the Cl--rich extracellular condition. The transport mechanism regulated by Cl- is unclear, but it is difficult to further clarify the mechanism at this time. We recognize the importance of further investigating this aspect in future studies, including the possibility that SLC35G1 might be a citrate/Cl- exchanger, as pointed out by Reviewer #1 (3rd comment).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The figures are very tiny and difficult to see. The inset in Figure 1C is much too small to be readable. I suggest enlarging the panels.

      Thank you for your feedback. As advised, we have enlarged the panels to improve visibility.

      Line 74: "certain anionic compounds significantly inhibited SLC35G1-specific citrate uptake, indicating they are also recognized by SLC35G1." This sentence should be reworded since the mechanism is not clear. The word "reduced" would be a better option than "inhibited." Are there other interpretations besides SLC35G1 binding to explain the observations?

      Thank you for your suggestion. We have reworded the sentence to improve clarity (p. 3, line 30). It may be possible to speculate that they interact with SLC35G1, but the mechanisms are not clear yet.

      The manuscript is vague about how the transporter was discovered. If a screen of orphan transporters was performed to identify a citrate transporter, this should be described.

      Thank you for pointing out the need for more details regarding the discovery of the transporter. We have added some detailed description at the beginning of Results and Discussion (p. 3).

      Reviewer #2 (Recommendations For The Authors):

      Recommendations for the authors:

      (1) For transcellular transport of citrate and the role of SLC35G1, it would be better to use Caco-2 cells cultured on Transwells because these cells express NaDC1 in the apical membrane and the authors have shown that SLC35G1 is expressed in the basolateral membrane in this cell line. The mechanism for the entry of citrate into MDCK cells used in the present manuscript is not known. If the authors prefer to use MDCK cells because of their superior use for polarization, they can use a double transfection (NaDC1 and SLC35G1) to differentially express the two transporters in the apical versus basolateral membrane and then use the cells for transcellular transport of citrate.

      Please refer to our reply to your second review comment.

      (2) The substrate specificity experiments should use concentrations higher than 0.2 mM for competing dicarboxylates because the Km for citrate is only 0.5 mM. It is likely that NaDC1 brings in citrate and other dicarboxylates into enterocytes and then SLC35G1 mediates the efflux of these metabolic intermediates into blood.

      Please refer to our reply to your first review comment.

      (3) One major aspect of the transport function of this newly discovered citrate efflux transporter that has not been explored is the role of membrane potential in the transport function. The transporter is not coupled to Na or K or even H; so then the transport of citrate via this transporter must be electrogenic. Of course, this would be perfect for the transporter to function in the efflux of citrate because of the inside-negative membrane potential, but the authors need to show that the transporter is electrogenic. This can be examined through Caco-2 cells and/or MDCK cells expressing SLC35G1 and examining the impact of changes in membrane potential (valinomycin and K) on the transport of citrate.

      Thank you for your suggestion. As shown in Figure 1D, the use of K-gluconate in place of Na-gluconate, which induces plasma membrane depolarization, had no impact on the specific uptake of citrate, suggesting that SLC35G1-mediated citrate transport is independent of membrane potential. We have additionally mentioned this on p. 3 (line 21 – 24).

      (4) The localization studies mention Na/K ATPase component as a basolateral membrane marker, but the text describes it as BCRP. This needs to be corrected.

      Thank you for pointing out the mistake. We have corrected that. The marker was ATP1A1.

      Reviewer #3 (Recommendations For The Authors):

      Major points:

      (1) Most experiments measure citrate uptake, but the authors state that SLC35G1 is an exporter, mostly based on the lack of uptake at physiological conditions faced at the basolateral side. The Transwell assay in Figure 1L is the only evidence that it indeed is an exporter. However, in this experiment, the applied chloride concentration was not according to the proposed model (120 mM at the basolateral side). Why was this chloride concentration not mimicked accordingly in the Transwell assay?

      (2) The Transwell assay, or a similar assay measuring export instead of import, should be carried out in knockdown cells to prove that the export indeed occurs through SLC35G1 and not through an indirect effect.

      (3) Related to the mentioned chloride sensitivity, it is unclear how the proposed model works if the SLC faces high chloride conditions under physiological conditions though it is inhibited by chloride.

      Please refer to our reply to your review comments.

      Related to the localization of SLC35G1:

      (4) The polyclonal antibody against SLC35G1 should be validated to prove the specificity. This should be relatively straightforward given the authors have SLC35G1 knockdown cells.

      Thank you for your suggestion. To validate the specificity of the polyclonal antibody against SLC35G1, we prepared HEK293 cells transiently expressing SLC35G1 and SLC35G1 tagged with a FLAG epitope at the C-terminus (SLC35G1-FLAG). In the immunostained images, whereas only SLC35G1-FLAG was stained with the anti-FLAG antibody, both SLC35G1 and SLC35G1-FLAG were stained with the anti-SLC35G1 antibody, indicating that the anti-SLC35G1 antibody can recognize SLC35G1. In addition, the localization patterns of SLC35G1-FLAG observed with both antibodies were consistent, further indicating that the anti-SLC35G1 antibody can recognize SLC35G1 specifically. Based on all these results, the specificity of the anti-SLC35G1 antibody was validated.

      Author response image 1.

      (5) To strengthen the data on the localization of SLC35G1, the cell lines should be co-stained with a plasma membrane marker as well, not just in tissue with ATP1A1. In polarized cells co-staining with apical and basolateral markers should be applied.

      SLC35G1 was indicated to be localized to the basolateral membrane geometrically in both polarized MDCKII and Caco-2 cells. This finding aligns with its basolateral localization indicated by its colocalization with ATP1A1 in the human small intestinal section. We consider these results sufficient to support the basolateral localization characteristics of SLC35G1.

      General points:

      (6) In the abstract the authors mention that they focus on highly expressed orphan transporters in the small intestine as candidates. However, no other candidates are mentioned or discussed in the study. Consequently, this should be rephrased.

      Thank you for the advice. Also taking into consideration the third recommendation point by Reviewer #1, we have added some detailed description at the beginning of Results and Discussion (p. 3).

      (7) As far as mentioned there is exactly one (other) publication on SLC35G1 (10.1073/pnas.1117231108). The authors should discuss this only publication with functional data on SLC35G1 in more detail. How do the authors integrate their findings with the existing knowledge? For example, why did the authors not investigate the impact of Ca2+ on SLC35G1 transport?

      Thank you for your suggestion. SLC35G1 was indicated to be mainly localized to the endoplasmic reticulum (ER) in the earlier study, in which SLC35G1 was tagged with GFP. A possibility is that SLC35G1 was misdirected to the ER because of the GFP tagging in that study. We have additionally mentioned this possibility in the relevant section (p. 3, line 9 – 11). We have also revised a relevant sentence on p. 3 (line 5).

      With regard to the other point, that GFP-tagged SLC35G1 was indicated to interact with STIM1, we additionally examined the effect of STIM1 on SLC35G1-mediated citrate uptake. As shown in the accompanying figure, coexpression of HA-tagged STIM1 did not affect the elevated citrate uptake induced by FLAG-tagged SLC35G1, indicating that STIM1 has no impact on the citrate transport function of SLC35G1 at the plasma membrane.

      Author response image 2.

      (A) Effect of the coexpression of HA-tagged STIM1 on [14C]citrate (1 μM) uptake by FLAG-tagged SLC35G1 transiently expressed in HEK293 cells. The uptake was evaluated for 10 min at pH 5.5 and 37°C. Data represent the mean ± SD of three biological replicates. Statistical differences were assessed using ANOVA followed by Dunnett’s test. *, p < 0.05 compared with the control (gray bar). (B) Western blot analysis was conducted by probing for the HA and FLAG tags, using the whole-cell lysate samples (10 µg protein aliquots) prepared from cells expressing HA-STIM1 and/or FLAG-SLC35G1. The blots of β-actin are shown for reference.

      (8) Generally, the introduction could provide more background.

      In response to your suggestion and also to the third review comment from Reviewer #2, we have now additionally included comments on SLC62A1, which has recently been reported as a citrate efflux transporter in some cell types, in the Introduction.

      Minor points:

      (9) There is a typo in Figure 1D: manniotol instead of mannitol.

      Thank you for pointing that out. We have corrected the typo in Figure 1D.

      (10) Figure 1J: The resolution is low and the localization to the basolateral membrane is not conclusive based on this image. It seems rather localized at the whole membrane and intracellularly too.

      Thank you for your feedback. We have enhanced the resolution of the image and also enlarged it to improve clarity and make the basolateral membrane localization more discernible.

      (11) Figure 1K: Clarification is needed if the experiment was performed in the Transwell plate. Based on the results from the pH titration experiment, it is expected that there is no uptake at pH 7.4. Therefore, this experiment does not seem to provide additional evidence or support the conclusions drawn related to cellular polarization.

      Please refer to our reply to your review comments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Galanti et al. present an innovative new method to determine the susceptibility of large collections of plant accessions towards infestations by herbivores and pathogens. This work resulted from an unplanned infestation of plants in a greenhouse that was later harvested for sequencing. When these plants were extracted for DNA, associated pest DNA was extracted and sequenced as well. In a standard analysis, all sequencing reads would be mapped to the plant reference genome and unmapped reads, most likely originating from 'exogenous' pest DNA, would be discarded. Here, the authors argue that these unmapped reads contain valuable information and can be used to quantify plant infestation loads.

      For the present manuscript, the authors re-analysed a published dataset of 207 sequenced accessions of Thlaspi arvense. In this data, 0.5% of all reads had been classified as exogenous reads, while 99.5% mapped to the T. arvense reference genome. In a first step, however, the authors repeated read mapping against other reference genomes of potential pest species and found that a substantial fraction of 'ambiguous' reads mapped to at least one such species. Removing these reads improved the results of downstream GWAs, and is in itself an interesting tool that should be adopted more widely.

      The exogenous reads were primarily mapped to the genomes of the aphid Myzus persicae and the powdery mildew Erysiphe cruciferarum, from which the authors concluded that these were the likely pests present in their greenhouse. The authors then used these mapped pest read counts as an approximate measure of infestation load and performed GWA studies to identify plant gene regions across the T. arvense accessions that were associated with higher or lower pest read counts. In principle, this is an exciting approach that extracts useful information from 'junk' reads that are usually discarded. The results seem to support the authors' arguments, with relatively high heritabilities of pest read counts among T. arvense accessions, and GWA peaks close to known defence genes. Nonetheless, I do feel that more validation would be needed to support these conclusions, and given the radical novelty of this approach, additional experiments should be performed.

      A weakness of this study is that no actual aphid or mildew infestations of plants were recorded by the authors. They only mention that they anecdotally observed differences in infestations among accessions. As systematic quantification is no longer possible in retrospect, a smaller experiment could be performed in which a few accessions are infested with different quantities of aphids and/or mildew, followed by sequencing and pest read mapping. Such an approach would have the added benefit of allowing causally linking pest read count and pest load, thereby going beyond correlational associations.

      On a technical note, it seems feasible that mildew-infested leaves would have been selected for extraction, but it is harder to explain how aphid DNA would have been extracted alongside plant DNA. Presumably, all leaves would have been cleaned of live aphids before they were placed in extraction tubes. What then is the origin of aphid DNA in these samples? Are these trace amounts from aphid saliva and faeces/honeydew that were left on the leaves? If this is the case, I would expect there to be substantially more mildew DNA than aphid DNA, yet the absolute read counts for aphids are actually higher. Presumably read counts should only be used as a relative metric within a pest organism, but this unexpected result nonetheless raises questions about what these read counts reflect. Again, having experimental data from different aphid densities would make these results more convincing.

      We agree with the reviewer that additional aphid counts at the time of (or prior to) sequencing would have been ideal, but unfortunately we do not have these data. However, compared to such counts one strength of our sequencing-based approach is that it (presumably) integrates over longer periods than a single observation (e.g. if aphid abundances fluctuated, or winged aphids visited leaves only temporarily), and that it can detect pathogens even when invisible to our eyes, e.g. before a mildew colony becomes visible. Moreover, the key point of our study is that we can detect variation in pest abundance even in the absence of count data, which are really time consuming to collect.

      Conducting a new experiment, with controlled aphid infestations and continuous monitoring of their abundances, to test for correlation between pest abundance and the number of detected reads would require resequencing at least 30-50% of the collection for the results to be reliable. It would be a major experimental study in itself.

      Regarding the origin of aphid reads and the differences in read-counts between e.g. aphids and mildew, we believe this should not be of concern. DNA contamination is very common in all kinds of samples, but these reads are simply discarded in other studies. For example, although we collected and handled samples using gloves, MG-RAST detected human reads (Hominidae, S2 Table), possibly from handling the plants during transplanting or phenotyping 1-2 weeks before sequencing. Therefore, although we did remove aphids from the leaves at collection, aphid saliva or temporary presence on leaves must have been enough to leave detectable DNA traces. Additionally, the fact that the M. persicae load strongly correlates with the Buchnera aphidicola load (R² = 0.86, S6 Table) is reassuring. This obligate aphid symbiont is expected to be found in high amounts when sequencing aphids (see e.g. The International Aphid Genomics Consortium (2010)).

      The higher amount of aphid compared to mildew reads can probably be explained by aphids having expanded more than mildew at the time of plant collection, but most importantly, as already mentioned by the reviewer, the read-counts were meant to compare plant accessions rather than pests to one another. We are interested in relative, not absolute, values. Comparisons between pest species are a challenge because they can be influenced by several factors such as the availability of sequences in the MG-RAST database and the DNA extraction kit used, which is plant-specific and might bias towards certain groups. All these potential biases are not a concern when comparing different plants as they are equally subject to these biases.

      Reviewer #2 (Public Review):

      Summary:

      Galanti et al investigate genetic variation in plant pest resistance using non-target reads from whole-genome sequencing of 207 field lines spontaneously colonized by aphids and mildew. They calculate significant differences in pest DNA load between populations and lines, with heritability and correlation with climate and glucosinolate content. By genome-wide association analyses they identify known defence genes and novel regions potentially associated with pest load variation. Additionally, they suggest that differential methylation at transposons and some genes are involved in responses to pathogen pressure. The authors present in this study the potential of leveraging non-target sequencing reads to estimate plant biotic interactions, in general for GWAS, and provide insights into the defence mechanisms of Thlaspi arvense.

      Strengths:

      The authors ask an interesting and important question. Overall, I found the manuscript very well-written, with a very concrete and clear question, a well-structured experimental design, and clear differences from previous work. Their important results could potentially have implications and utility for many systems in phenotype-genotype prediction. In particular, I think the use of unmapped reads for GWAS is intriguing.

      Thank you for appreciating the originality and potential of our work.

      Weaknesses:

      I found that several of the conclusions are incomplete, not well supported by the data and/or some methods/results require additional details to be able to be judged. I believe these analyses and/or additional clarifications should be considered.

      Thank you very much for the supportive and constructive comments. They helped us to improve the manuscript.

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      The authors address an interesting and significant question, with a well-written manuscript that outlines a clear experimental design and distinguishes itself from previous work. However, some conclusions seem incomplete, lacking sufficient support from the data, or requiring additional methodological details for proper evaluation. Addressing these limitations through additional analyses or clarifications is recommended.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      - So far it is not clear to me how read numbers were normalised and quantified. For instance, Figure 1C only reports raw read numbers. In L149: "Prior to these analyses, to avoid biases caused by different sequencing depths, we corrected the read counts for the total numbers of deduplicated reads in each library and used the residuals as unbiased estimates of aphid, mildew and microbe loads". Was library size considered? Is the load the ratio between exogenous and non-exogenous reads? It is described in L461, but according to this, read counts were normalised and duplicated reads were removed. Now, why were read counts used, as opposed to total coverage or count of bases per base? I cannot follow how variation in sequencing quality was considered. I can imagine that samples with higher sequencing depth will tend to have higher exogenous reads (just higher resolution and power to detect something in a lower proportion).

      Correcting for sequencing depth/library size is indeed very important. As the reviewer noted, we had explained how we did this in the methods section (L464), and we now also point to it in the results (L151):

      “Finally, we log transformed all read counts to approximate normality, and corrected for the total number of deduplicated reads by extracting residuals from the following linear model, log(read_count + 1) ∼ log(deduplicated_reads), which allowed us to quantify non-Thlaspi loads, correcting for the sequencing depth of each sample.”

      We showed the uncorrected read-counts only in Fig 1 to illustrate the orders of magnitude but used the corrected read-counts (also referred to as “loads”) for all subsequent analyses.
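
      The correction described above can be illustrated with a minimal sketch, assuming an ordinary least-squares fit of log(read_count + 1) on log(deduplicated_reads) and natural logarithms. The function name and the example values are hypothetical and are not taken from the manuscript or its code.

```python
import math


def load_residuals(read_counts, dedup_reads):
    """Residuals of log(read_count + 1) ~ log(deduplicated_reads), fitted by least squares."""
    y = [math.log(c + 1) for c in read_counts]
    x = [math.log(d) for d in dedup_reads]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The residual is the part of the log read count not explained by sequencing depth.
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical example: aphid read counts and deduplicated library sizes for five samples.
aphid_reads = [120, 15, 980, 40, 310]
dedup_reads = [2.1e7, 1.8e7, 3.5e7, 2.0e7, 2.9e7]
loads = load_residuals(aphid_reads, dedup_reads)
```

      Positive residuals then indicate samples carrying more pest reads than expected for their sequencing depth, negative residuals fewer.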

      In our view, theoretically, the best metric to correct the number of reads of a specific contaminant organism is the total number of DNA fragments captured. Importantly, this is not well reflected by the total number of raw reads because of PCR and optical duplicates occurring during library prep and sequencing. For this reason we estimated the total number of reads captured by multiplying total raw reads (after trimming) by the deduplication rate obtained from FastQC (methods L409-411). This metric reflects the amount of DNA fragments sampled better than the raw reads. It also better reflects MG-RAST metrics, as this software also deduplicates reads (Author response image 1 below). We also removed duplicates in our strict mappings to the M. persicae and B. aphidicola genomes.

      Coverage is not a good option for correction, because it is defined for a specific reference genome and many of the read-counts output by MG-RAST do not have a corresponding full assembly. Moreover, coverage and base counts are influenced by read size, which depends on library prep and is not included in the read-counts produced by MG-RAST.

      Author response image 1.

      Linear correlations between the number of MG-RAST reads post-QC and either total (left) or deduplicated (right) reads from fastq files of four full samples (not only unmapped reads).

      - The general assumption is that plants with different origins will have genetic variants or epigenetic variations associated with pathogen resistance, which can be tracked in a GWAS. However, plants from different regions will also have all variants associated with their origin (isolation by state as presented in the manuscript). In line 169: "Having established that our method most likely captured variation in plant resistance, we were interested in the ecological drivers of this variation". It is not clear to me how variation in plant resistance is differentiated from geographical variation (population structure). in L203: "We corrected for population structure using an IBS matrix and only tested variants with Minor Allele Frequency (MAF) > 0.04 (see Methods).". However, if resistant variants are correlated with population structure as shown in Table 1, how are they differentiated? In my opinion, the analyses are strongly limited by the correlation between phenotype and population structure.

      The association of any given trait with population structure is surely a very important aspect in GWAS studies and when looking at correlations of traits with environmental variables. If a trait is strongly associated with population structure, then disentangling variants associated with population structure vs. the ones associated with the trait can indeed be challenging, a good example being flowering time in A. thaliana (e.g. Brachi et al. 2013).

      In our case, although the pest and microbiome loads are associated with population structure to some extent, this association is not very strong. This can be observed for example in Fig. 1C, where there is no clear separation of samples from different regions. This means that we can correct for population structure (in both GWAS and correlations with climatic variables) without removing the signals of association. It is possible that other associations were missed if specific variants were indeed strongly associated with structure, but these would be unreliable within our dataset, so it is prudent to exclude them.

      - Similarly, in L212: "we still found significant GWA peaks for Erysiphales but not for other types of exogenous reads (excluding isolated, unreliable variants) (Figure 3A and S3 Figure)." In a GWA analysis, multiple variants will constitute an association peak (as shown for instance in main Figure 3A) only when the peak is accentuated by linkage disequilibrium around the region under selection (or around the variant explaining phenotypic variation in this case). However, in this case, I suspect there is a strong component of population structure (which still needs to be corroborated as suggested in the previous comment). But if variants are filtered by population structure, the only variants considered are those polymorphic within populations. In this case, I do not think clear peaks are expected since most of the signal correlated with population has been removed. Under this scenario, I wonder how informative the analyses are.

      As mentioned above, the traits we analyse (aphid and mildew loads) are only partially associated with population structure. This is evident from Fig. 1C (see answer above) but also from the SNP-based heritability (Table 1, last column) which measures indeed the proportion of variance explained by genetic population structure. Although some variance is explained (i.e. the reviewer is correct that there is some association) there is still plenty of leftover variance to be used for GWAS and correlations with environmental variables. The fact that we still find GWAS peaks confirms this, as otherwise they would be lost by the population structure correction included in our mixed model.

      - How were heritability values calculated? Were related individuals filtered out? I suggest adding more detail in both the inference of heritability and the kinship matrix (IBS matrix). Currently missing in methods (for heritability I only found the mention of an R package in the caption of Table 1).

      We somehow missed this in the methods and thank the reviewer for noticing. We have now added this paragraph to the chapter “Exogenous reads heritability and species identification”:

      “To test for variation between populations we used a general linear model with population as a predictor. To measure SNP-based heritability, i.e. the proportion of variance explained by kinship, we used the marker_h2() function from the R package heritability (Kruijer and Kooke 2019), which uses a genetic distance matrix as predictor to compute REML-estimates of the genetic and residual variance. We used the same IBS matrix as for GWAS and for the correlations with climatic variables.”

      We also added the reference to the R package heritability to the Table 1 caption.
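
      For readers unfamiliar with the identity-by-state (IBS) matrix mentioned above, the following is a minimal sketch of one common way to compute it. It assumes biallelic genotypes coded as 0/1/2 alternate-allele counts with no missing data. The function name and toy genotypes are hypothetical, and this is not the pipeline used in the study.

```python
def ibs_matrix(genotypes):
    """Pairwise IBS similarity for samples coded as 0/1/2 alternate-allele counts."""
    n = len(genotypes)
    m = len(genotypes[0])
    ibs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Each site contributes 2, 1 or 0 alleles shared identical by state.
            shared = sum(2 - abs(a - b) for a, b in zip(genotypes[i], genotypes[j]))
            ibs[i][j] = ibs[j][i] = shared / (2 * m)
    return ibs


# Hypothetical toy data: three samples genotyped at four SNPs.
toy_genotypes = [
    [0, 1, 2, 0],
    [0, 1, 2, 0],
    [2, 1, 0, 2],
]
toy_ibs = ibs_matrix(toy_genotypes)
```

      A matrix of this kind serves as the genetic similarity predictor both when estimating SNP-based heritability and when correcting for population structure in a mixed model.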

      - Figure 2C. in line 188: "Although the baseline levels of benzyl glucosinolates were very low and probably sometimes below the detection level, plant lines where benzyl glucosinolate was detected had significantly lower aphid loads (over 70% less reads) in the glasshouse (Figure 3C)". It is not clear to me how to see these values in Figure 2C. From the boxplot, the difference in aphid loads between detected and not detected benzyl seems significantly lower. From the boxplot distribution is not clear how this difference is statistically significant. It rather seems like a sampling bias (a lot of non-detected vs low detected values). Is the difference still significant when random subsampling of groups is considered?

      Here the “70% less reads” refers to the uncorrected read-counts directly (difference in means between samples where benzyl-GS were detected vs. not). We agree with the reviewer that this is confusing when referred to figure 2C which depicts the corrected M. persicae load (residuals). We therefore removed that information.

      Regarding the significance of the difference, we re-calculated the p value with the Welch's t-test, which accounts for unequal variances, and with a bootstrap t-test. Both tests still found a significant difference. We now report the p value of the Welch’s t-test.
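
      The two tests mentioned above can be sketched as follows. This is a minimal illustration using only the Python standard library, assuming a two-sided test and a bootstrap in which each group is first shifted to the pooled mean so that the null hypothesis holds. The function names and example loads are hypothetical, and the exact bootstrap scheme used in the study may differ.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def bootstrap_t_pvalue(a, b, n_boot=5000, seed=1):
    """Two-sided bootstrap p-value for the Welch t statistic under equal means."""
    rng = random.Random(seed)
    t_obs, _ = welch_t(a, b)
    pooled = mean(a + b)
    # Shift both groups to the pooled mean while keeping each group's own spread.
    a0 = [x - mean(a) + pooled for x in a]
    b0 = [x - mean(b) + pooled for x in b]
    extreme = 0
    for _ in range(n_boot):
        ra = rng.choices(a0, k=len(a0))
        rb = rng.choices(b0, k=len(b0))
        try:
            t_star, _ = welch_t(ra, rb)
        except ZeroDivisionError:
            # Both resamples had zero variance; treat as not extreme.
            continue
        if abs(t_star) >= abs(t_obs):
            extreme += 1
    return (extreme + 1) / (n_boot + 1)


# Hypothetical corrected aphid loads for lines with and without detected benzyl glucosinolate.
detected = [-0.9, -1.2, -0.4, -1.0, -0.7]
not_detected = [0.3, 0.1, 0.8, -0.2, 0.5, 0.4, 0.0]
t_stat, df = welch_t(detected, not_detected)
p_boot = bootstrap_t_pvalue(detected, not_detected)
```

      Welch's statistic does not assume equal variances, which matters when one group is much smaller than the other, as is the case when benzyl glucosinolate is detected in only a few lines.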

      - I think the information regarding the read statistics needs to be improved. At the moment some sections are difficult to follow. I found this information mainly in Supplementary Table 1. I could not follow the difference in the manuscript and supplementary materials between read (read count), fragment, ambiguous fragments, target fragments, etc. I didn't find information regarding mean coverage per sample and relative plant vs parasite coverage. This lack of clarity led me to some confusion. For instance, in L207: "We suspected that this might be because some non-Thlaspi reads were very similar to these highly conserved regions and, by mapping there, generated false variants only in samples containing many non-Thlaspi reads". I find it difficult to follow how non-Thlaspi reads will interfere with genotyping. I think the fact that the large peak is lost after filtering reads is already quite insightful. However, in principle I would expect the relative coverage between non-Thlaspi:Thlaspi reads to be rather low in all cases. I would say below 1%. Thus, genotyping should be relatively accurate for the plant variants for the most part. In particular, considering genotyping was done with GATK, where low-frequency variants (relative coverage) should normally be called reference allele for the most part.

      We agree with the reviewer that some clarification over these points is necessary! We modified Supplementary Table 1 to include coverage information for all samples before and after removal of ambiguous reads and explained thoroughly how each value in the table was obtained. Regarding reads and fragments, we define each fragment as having two reads (R1 and R2). The classification into Target, Ambiguous and Unmapped reads was based on fragments, so we used that term in the table, but referring to reads has the same meaning in this context as for example an unmapped read is a read whose fragment was classified as unmapped.

      We did not include the pest coverage specifically, because this cannot be calculated for any of the read counts obtained with MG-RAST as this tool is mapping to online databases where genome size is not necessarily known. What is more meaningful instead are the read counts, which are in Supplementary tables 2 and 6. Importantly as mentioned in other answers, if different taxa are differently represented in the databases this does not affect the comparison of read counts across different samples, but only the comparison of different taxa which was not used for any further analyses.

      Regarding the ambiguous reads causing unreliable variants, these occur only in very few regions of the Thlaspi genome that are highly conserved in evolution or of very low complexity. In these regions, reads generated from either plant or, for instance, aphid DNA can map, but the ones from aphid might contain variants when mapping to the Thlaspi reference genome (L207 and L300). The reviewer is right that there is only a very small difference in average coverage when removing those ambiguous reads (~1X, S1 Table), but that is not true for those few regions, where coverage changes massively when removing ambiguous reads, as shown on the right-side Y axes of S2 Figure. These unreliable variants are therefore not low-frequency and are not removed by GATK.

      - L215. I am not very convinced by the enrichment analyses, justified with a reference (52). For instance, how many of the predicted peaks are not close to resistance genes? How was the randomisation done? At the moment, the manuscript reads rather anecdotally by describing only those peaks that effectively are "close" to resistance genes. For instance, if random windows (let's say 20kb windows) are sampled along the genome, how often are there resistance genes in those random windows, and how does the random sampling compare with the observed peaks (windows)?

      Enrichment is by definition an increase in the proportion of true positives (observed frequency: proportion of significant SNPs located close to a priori candidate genes) compared to the background frequency (proportion of all SNPs located close to a priori candidate genes). So the background likelihood of SNPs to fall close to a priori candidate genes (i.e. the occurrence of a priori candidate genes in randomly sampled windows, as suggested by the reviewer) is already taken into account as the background frequency. We have now explained more extensively how enrichment is calculated in the relevant methods section (L545-549), but it is an extensively used method, established in a large body of literature, so it can be found in many papers (e.g. Atwell et al. 2010, Brachi et al. 2010, Kawakatsu et al. 2016, Kerdaffrec et al. 2017, Sasaki et al. 2015, 2019, 2022, Galanti et al. 2022, Contreras-Garrido et al. 2024).

      Although we had already calculated an upper bound for the FDR based on the a priori candidates, as in previous literature, we now further calculated the significance of the enrichment for the Bonferroni-corrected -log(p) threshold for Erysiphales. Calculating significance requires adopting a genome rotation scheme that preserves the LD structure of the data, as described in the previously mentioned literature (eg. Kawakatsu et al. 2016, Sasaki et al. 2022). Briefly, we calculated a null distribution of enrichments by randomly rotating the p values and a priori candidate status of the genetic variants within each chromosome, for 10 million permutations. We then assessed significance by comparing the observed enrichment to the null distribution. We found that the enrichment at the Bonferroni corrected -log(p) threshold is indeed significant for Erysiphales (p = 0.016). We added this to the relevant methods section and the code to the github page.
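
      The enrichment and rotation logic described above can be sketched as follows. This is a simplified illustration that assumes enrichment is expressed as the ratio of observed to background frequency, that SNPs are listed in genomic order within each chromosome, and that at least one SNP is flagged as a priori candidate. It rotates only the candidate flags relative to the p values, and all function names and toy values are hypothetical rather than taken from the study's code.

```python
import random


def enrichment(neg_log_p, is_candidate, threshold):
    """Candidate fraction among significant SNPs divided by candidate fraction among all SNPs."""
    significant = [c for p, c in zip(neg_log_p, is_candidate) if p >= threshold]
    if not significant:
        return 0.0
    observed = sum(significant) / len(significant)
    background = sum(is_candidate) / len(is_candidate)
    return observed / background


def rotation_pvalue(chromosomes, threshold, n_perm=1000, seed=1):
    """chromosomes: list of (neg_log_p, is_candidate) list pairs, one pair per chromosome."""
    rng = random.Random(seed)
    all_p = [p for pvals, _ in chromosomes for p in pvals]
    all_c = [c for _, flags in chromosomes for c in flags]
    observed = enrichment(all_p, all_c, threshold)
    hits = 0
    for _ in range(n_perm):
        rotated = []
        for _, flags in chromosomes:
            # Circular shift keeps neighbouring SNPs together, preserving local LD structure.
            k = rng.randrange(len(flags))
            rotated.extend(flags[k:] + flags[:k])
        if enrichment(all_p, rotated, threshold) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two chromosomes with -log10(p) values and 0/1 candidate flags.
toy_chromosomes = [
    ([1, 1, 8, 9, 1, 1, 1, 1, 1, 1], [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]),
    ([1, 2, 1, 1, 3, 1, 1, 1, 2, 1], [0, 0, 0, 0, 0, 0, 0, 1, 0, 0]),
]
toy_enrichment, toy_p = rotation_pvalue(toy_chromosomes, threshold=7)
```

      Because the flags are shifted as a block rather than shuffled individually, clusters of candidate SNPs stay clustered in every permutation, which is what makes the null distribution comparable to the observed data.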

      In addition, many other genes very close (few kb max) to significant SNPs were not annotated with the “defense response” GO term but still had functions relatable to it. Some examples are CAR8, involved in ABA signalling, PBL7 in stomata closure and SRF3 in cell wall building and stress response  (Fig 3D). This means that our enrichment is actually most likely underestimated compared to if we had a more complete functional annotation.

      - L247. Additional information is needed regarding sampling. It is not clear to me why methylation analyses are restricted to 20 samples, contrary to whole genome analyses.

      The sampling is best described in the original paper (on natural DNA methylation variation; Galanti et al. 2022), although the most important parts are repeated in the first chapter of the methods.

      Regarding the methylation analyses, they are not restricted to 20 samples. Only the DMR calling was restricted to the 20 vs. 20 samples with the most divergent values (of pest loads) to identify regions of variation. This analysis was used to subset the genome to potential regions associated with pest presence rather than thoroughly testing actual methylation variants associated with pest presence. The latter was done in the second step, EWAS, which was based on the whole dataset with the exclusion of samples with a high non-conversion rate. This left 188 samples for EWAS. We added this number in the new manuscript (L251 and L571).

      To clarify, we made a few additions to the results (L250) and methods (last two subchapters) sections, where we explain the above.
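
      The selection of the two contrasting groups for DMR calling can be illustrated with a short sketch. It assumes that corrected pest loads are stored per sample and that ties are broken by sort order. The function name, sample IDs and load values are hypothetical.

```python
def extreme_groups(loads, n=20):
    """Return the sample IDs with the n lowest and the n highest loads."""
    ranked = sorted(loads, key=loads.get)
    if len(ranked) < 2 * n:
        raise ValueError("need at least 2 * n samples to form two disjoint groups")
    return ranked[:n], ranked[-n:]


# Hypothetical corrected loads for 50 samples.
example_loads = {f"sample_{i:02d}": (i - 25) / 10 for i in range(50)}
low_group, high_group = extreme_groups(example_loads, n=20)
```

      The remaining samples are not discarded: in the design described above they re-enter the analysis at the EWAS step, which tests the candidate regions across the full dataset.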

      - No clear association with TEs: in L364: "Erysiphales load was associated with hypomethylated Copia TEs upstream of MAPKKK20, a gene involved in ABA-mediated signaling and stomatal closure. Since stomatal closure is a known defense mechanism to block pathogen access (21), it is tempting to conclude that hypomethylation of the MAPKKK20 promoter might induce its overexpression and consequent stomatal closure, thereby preventing mildew access to the leaf blade. Overall, we found associations between pathogen load and TE methylation that could act both in cis (eg. Copia TE methylation in MAPKKK20 promoter) and in trans, possibly through transposon reactivation (eg. LINE, Helitron, and Ty3/Gypsy TEs isolated from genes)." I find the whole discussion related to transposable elements, first, rather anecdotal, and second very speculative. To claim: "Overall, we found associations between pathogen load and TE methylation", I believe a more detailed analysis is needed. For instance, how often is there an association? In general, there are some rather anecdotal examples, several of which are presented as association with pathogen load on the basis of being "in proximity" to a particular region/peak. The same regions contain multiple other genes and annotations, but the authors limit the discussion to the particular gene or TE concordant with the hypothesis. This is for both the discussion and results sections.

      Here we are referring to associations in a purely statistical sense. The fact that “Overall, we found associations between pathogen load and TE methylation” is simply a conclusion drawn from Fig. 4b, without implying any causality. Some methylation variants are statistically associated with the traits (aphid or mildew loads), and whether they are true positives or causal is of course more difficult to assess.

      Regarding the methylation variants associated with mildew load in proximity of MAPKKK20, those are the only two significant ones, located close to each other and close to many other variants that, although not significant, have low P-values (Author response image 2 below), so it is the most obvious association warranting further exploration. The reviewer is correct that there are other genes flanking the large DMR that covers the TEs (Fig. 4D), but the DMR is downstream of these genes, so less likely to affect their transcription.

      Author response image 2.

      Regarding all other associations found with M. persicae load, we stated that these are not really reliable due to a skewed P-value distribution (L269, S5B Fig), but we think that for future reference it is still worth reporting the nearby genes and TEs.

      We slightly changed the wording of the passage the reviewer is citing above to make it clearer that we are only offering potential explanations for the associations we observe with TE methylation, and we by no means state that TE reactivation is surely what is happening.

      - One conclusion in the manuscript is that DMRs have been mostly the result of hypomethylation. This is shown for instance in supplementary Figure 4. However, no general statistic is shown of methylation distribution (not only restricted to DMRs). Was the ratio methylation over de-methylation proportional along the genome? Thus the finding in DMRs is out of the genome-wide distribution? Or on the contrary, the DMRs are just a random sampling of the global distribution. The same for different annotated regions. For instance, I would expect that in general coding regions would be less methylated (not restricted to DMRs).

      Complete and exhaustive analyses of the methylomes were already published in the original manuscript (Galanti et al 2022). However, the variation among these methylomes is complex and influenced by multiple factors including genetic background and environment of origin, and talking about these things would have been beyond the scope of our paper. In this paper, we just took advantage of the existing methylome information to identify the few genomic regions that are consistently differentially methylated between samples with extreme values of pest loads. As for the GWAS, the phenotypes are only partially associated with population structure, so the 20 samples with the lowest and the 20 with the highest pathogen loads are not e.g. all Swedish vs. all German but they are a mixture, which allowed us to correct for population structure running EWAS with a mixed model that includes a genetic distance matrix.

      In this study we called DMRs between two defined groups: samples with the lowest amounts of pathogen DNA (not-infected; the “control” group) vs. samples with the highest amounts of pathogens (infected or the “treatment” group), so we could define a directionality (“hyper” vs. “hypo” methylation). However, this is not the case for population DMRs called between many different combinations of populations. This is why the hyper- and hypomethylated regions found here cannot be compared to the genome-wide averages, which are influenced by other factors than the pathogens. Even with relaxed thresholds we indeed found very few DMRs associated with pathogen presence here.

      Specifically about coding regions, the reviewer is correct that they are less methylated, especially because T. arvense has largely lost gene body methylation (Nunn et al. 2021, Galanti et al. 2022), but this is unrelated and was discussed in the original publication (Galanti et al. 2022).

      Minor comments:

      - Figure 1B: it would be good to add also percentage values.

      As the figure is already tightly packed, we would rather keep it simple. The chart already gives a good impression of the frequencies of different kingdoms and of several relevant groups. Also, as explained in a previous answer, comparing different taxonomic groups could be imprecise (as opposed to comparing the same group between different samples), so exact percentages seem unnecessary. If needed, the exact percentages can still be calculated from S2 Table.

      - L159: It is not clear to me what "enemy variation" is referring to here.

      We are referring to variation in enemy densities (attack rates) in the field, that could potentially be carried over to the greenhouse to cause the patterns of infection we observed. We changed it to “variation in enemy densities” to make it more clear.

      - L259: "In accordance with previous studies (8,9), most DMRs were hypomethylated in the affected samples, indicating that genes needed for defense might be activated through demethylation". Not clear to me what "affected samples" is referring to. Samples with lower load?

      Affected samples have a higher load of pathogen reads. We changed it to “infested” to make it more clear.

      - L336. Figure should be Fig 3E.

      We fixed it, thanks for noticing.

      ADDITIONAL CHANGES

      We updated reference 43 to point to the published paper rather than the preprint.

      We corrected the phenotype names in S3 Fig, to make them consistent with the rest of the manuscript and increased font size on the axes to make it more readable.

    2. Reviewer #1 (Public Review):

      Galanti et al. present an innovative new method to determine the susceptibility of large collections of plant accessions towards infestations by herbivores and pathogens. This work resulted from an unplanned infestation of plants in a greenhouse that was later harvested for sequencing. When these plants were extracted for DNA, associated pest DNA was extracted and sequenced as well. In a standard analysis, all sequencing reads would be mapped to the plant reference genome and unmapped reads, most likely originating from 'exogenous' pest DNA, would be discarded. Here, the authors argue that these unmapped reads contain valuable information and can be used to quantify plant infestation loads.

      For the present manuscript, the authors re-analysed a published dataset of 207 sequenced accessions of Thlaspi arvense. In this data, 0.5% of all reads had been classified as exogenous reads, while 99.5% mapped to the T. arvense reference genome. In a first step, however, the authors repeated read mapping against other reference genomes of potential pest species and found that a substantial fraction of 'ambiguous' reads mapped to at least one such species. Removing these reads improved the results of downstream GWAs, and is in itself an interesting tool that should be adopted more widely.

      The exogenous reads were primarily mapped to the genomes of the aphid Myzus persicae and the powdery mildew Erysiphe cruciferarum, from which the authors concluded that these were the likely pests present in their greenhouse. The authors then used these mapped pest read counts as an approximate measure of infestation load and performed GWA studies to identify plant gene regions across the T. arvense accessions that were associated with higher or lower pest read counts. In principle, this is an exciting approach that extracts useful information from 'junk' reads that are usually discarded. The results seem to support the authors' arguments, with relatively high heritabilities of pest read counts among T. arvense accessions, and GWA peaks close to known defence genes. Nonetheless, I do feel that more validation would be needed to support these conclusions, and given the radical novelty of this approach, additional experiments should be performed.

      A weakness of this study is that no actual aphid or mildew infestations of plants were recorded by the authors. They only mention that they anecdotally observed differences in infestations among accessions. As systematic quantification is no longer possible in retrospect, a smaller experiment could be performed in which a few accessions are infested with different quantities of aphids and/or mildew, followed by sequencing and pest read mapping. Such an approach would have the added benefit of allowing causally linking pest read count and pest load, thereby going beyond correlational associations.

      On a technical note, it seems feasible that mildew-infested leaves would have been selected for extraction, but it is harder to explain how aphid DNA would have been extracted alongside plant DNA. Presumably, all leaves would have been cleaned of live aphids before they were placed in extraction tubes. What then is the origin of aphid DNA in these samples? Are these trace amounts from aphid saliva and faeces/honeydew that were left on the leaves? If this is the case, I would expect there to be substantially more mildew DNA than aphid DNA, yet the absolute read counts for aphids are actually higher. Presumably read counts should only be used as a relative metric within a pest organism, but this unexpected result nonetheless raises questions about what these read counts reflect. Again, having experimental data from different aphid densities would make these results more convincing.

      Comments on revised version:

      The authors have addressed many technical details in their revision, but they did not address my more fundamental concerns about validation of their results. I still believe that validation would be needed, but I also acknowledge that an additional experiment that reliably tests a causal relationship between read counts and pest abundance would go beyond the scope of a revision. Nonetheless, the authors currently only show variation in pest read counts among plant accessions, not in pest abundance. While the two measures are likely correlated, I hope that future studies will address more directly how pest abundance and read counts are causally linked, and whether pest read counts truly are a robust measure of pest abundance across a range of conditions and systems.

    3. Reviewer #2 (Public Review):

      Summary:

      Galanti et al investigate genetic variation in plant pest resistance using non-target reads from whole-genome sequencing of 207 field lines spontaneously colonized by aphids and mildew. They calculate significant differences in pest DNA load between populations and lines, with heritability and correlation with climate and glucosinolate content. By genome-wide association analyses they identify known defence genes and novel regions potentially associated with pest load variation. Additionally, they suggest that differential methylation at transposons and some genes are involved in responses to pathogen pressure. The authors present in this study the potential of leveraging non-target sequencing reads to estimate plant biotic interactions, in general for GWAS, and provide insights into the defence mechanisms of Thlaspi arvense.

      Strengths:

      The authors ask an interesting and important question. Overall, I found the manuscript very well-written, with a very concrete and clear question, a well-structured experimental design, and clear differences from previous work. Their important results could potentially have implications and utility for many systems in phenotype-genotype prediction. In particular, I think the use of unmapped reads for GWAS is intriguing.

      Comments on revised version:

      The revisions to the manuscript have significantly enhanced its clarity and scientific rigor. Methodological clarifications, especially regarding the normalization of read counts, now provide a stronger foundation for the presented results. Statistical enhancements, including more robust methods for controlling population structure and refined GWAS approaches, have solidified the reliability of the findings, effectively linking genetic variants and epigenetic modifications to pest loads. The discussion section has been improved to offer a more cautious interpretation of the correlations between transposable element (TE) methylation and pathogen load, emphasizing the associative nature of these findings. Additionally, increased transparency in data handling, particularly the treatment of ambiguous reads, has significantly reduced potential biases. These improvements have made the manuscript better suited to the readership, providing clearer insights into the genomic and epigenetic underpinnings of plant pest resistance.

    1. eLife Assessment

      This study reports important findings on identifying sequence motifs that predict substrate specificity in a class of lipid synthesis enzymes. It sheds light on a mechanism used by bacteria to modify the lipids in their membrane to develop antibiotic resistance. The evidence is compelling, with a careful application of machine learning methods, validated by mass spectrometry-based lipid analysis experiments. This interdisciplinary study will be of interest to computational biologists and to the community working on lipids and on enzymes involved in lipid synthesis or modification.

    2. Reviewer #1 (Public review):

      The manuscript by Christensen, et al. presents an application of restricted Boltzmann machines to analyze the MprF family of enzymes, which catalyze the addition of amino acids to lipid substrates in bacteria. Overall the manuscript is an interesting and very compelling combination of advanced statistical analysis of sequences and experimental determination of MprF function. One notable outcome is (as stated in the title) the identification of a novel substrate/product. I expect that other researchers interested in using advanced methods to connect sequence to lipid synthesis functions will find the work of significant value and that others interested in microbial resistance will find inspiration in the results. This is an excellent contribution that will be of great value to the field, and which is improved following revisions.

    3. Reviewer #3 (Public review):

      Summary:

      Following the earlier finding that the Streptococcus agalactiae MprF enzyme can also synthesize lysyl-glucosyl-diacylglycerol (Lys-Glc-DAG), in addition to the already known lysyl-phosphatidylglycerol (Lys-PG), the authors' aim in the current manuscript was to investigate the molecular determinants of MprF lipid substrate specificity, using MprF enzymes from a variety of bacterial species. This led to the coincidental discovery of a novel lipid species.

      The manuscript is well constructed and easy to follow, especially taking into account the multidisciplinary aspect of it (computational machine learning combined with lipid biology). The Restricted Boltzmann machines (RBM) approach enables the successful, although not perfect, classification and categorization of MprF activity. The computational approach is validated by lab experiments in which LC-MS analysis reveals the specific activity of the lipid synthesizing enzymes. In a few cases lipid synthesis activity is completely absent. Due to the lack of protein expression data, it is unclear if this is caused by enzyme inactivity or the overall absence of enzyme.

      Overall, the authors largely achieved their goals, as the applied RBM approach led to specific sequence determinants in MprF enzymes that could categorize the specificity of these enzymes. The experimental data could largely confirm this categorization, although a stronger connection between synthesized lipids and enzyme activity would have further strengthened the observations.

      The work now focuses only on MprF enzymes, but could in theory be expanded to other categories of lipid synthesizing enzymes. In other words, the RBM approach could have an impact on the lipid synthesis field if it were a tool that is easily applicable. Moreover, the lipids synthesized by MprF (Lys-PG, but also other cationic lipids) play an important role in bacterial resistance against certain antibiotics.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      The contribution of individual residues is shown in Figure 3c, which highlights one of the strengths of this RBM implementation - it is interpretable in a physically meaningful way. However, there are several decisions here, the justification of which is not entirely clear.

      i) Some of the residues in Fig 3c are stated as "relevant" for aminoacylated PG production. But is this the only such hidden unit? Or are there others that are sparse, bimodal, and involve "relevant" AA?

      Thanks for bringing this important question to our attention. In fact, this was the only hidden unit involving the combination of positions 152 and 212. Although we do not know all of the amino acids relevant to this catalytic process, the residues we uncovered were shown experimentally to be critical for the catalytic function of two MprF variants; since our protein of interest performs this function, any domain that did not contain these residues was excluded. We can't rule out that the domains we excluded from further analysis perform similar catalytic functions, but we found it unlikely, considering that the amino acids found in the negative portion of the weight were chemically unlikely to form a complex with the amino acid lysine. We have clarified in the text that this selection is probably a subset of all important amino acids; however, this selection provided predictive power.

      ii) In order to filter the sequences for the second stage, only those that produce an activation over +2.0 in this particular hidden unit were taken. How was this choice made?

      The +2.0 was chosen as it ensured that the bimodal distribution was split into two distinct distributions.
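
      The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the toy activation values, and the treatment of values exactly at the cutoff are all assumptions; only the +2.0 threshold comes from the response.

```python
def split_by_activation(activations, threshold=2.0):
    """Split sequence indices by one hidden unit's activation.

    `activations` holds one value per sequence for the chosen hidden unit.
    Sequences strictly above `threshold` are kept (assumed convention).
    """
    kept = [i for i, a in enumerate(activations) if a > threshold]
    dropped = [i for i, a in enumerate(activations) if a <= threshold]
    return kept, dropped


# Toy bimodal activations (made-up values, not the authors' data).
toy_activations = [-3.1, -2.7, 4.2, 5.0, -2.9, 3.8, 1.9]
kept, dropped = split_by_activation(toy_activations)
print(kept)     # indices retained for the second training stage
print(dropped)  # indices belonging to the lower mode
```

      With a clearly bimodal distribution, any cutoff lying in the gap between the two modes yields the same partition, which is presumably why a round value such as +2.0 suffices.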

      iii) How many sequences are in the set before and after this filtering? On the basis of the strength of the results that follow I expect that there are good reasons for these choices, but they should be more carefully discussed.  

      We started with 11,507 sequences, and after filtering we had 7,890 with which to train our model. We think this number still maintains robust statistics. This is noted in the Dataset acquisition and pre-processing section of the Methods.

      iv) Do the authors think that this gets all of the aminoacylated PG enzymes? Or are some missed?

      This is an interesting question that prompted us to do further analysis. We have added a new supplemental figure that addresses this question in more detail. Based on the Uniprot-derived annotations and the Pfam domain-based analysis of these sequences, the large majority of excluded sequences were proteins that included the LPG_synthase_C domain but not the transmembrane flippase domain required by the MprF class of enzymes, and were instead accompanied by different domains which seem less relevant to our enzyme of interest. It is true, though, and related to question (i), that variants which retain the functionality despite losing experimentally determined key catalytic residues could have been excluded by this method, but such sequences could still be reasonably excluded due to their dissimilarity with MprF from Streptococcus agalactiae.

      However, some similar criticisms from the last point occur here as well, namely the selection of which weights should be used to classify the enzymes' function. Again the approach is to identify hidden unit activations that are sparse (with respect to the input sequence), have a high overall magnitude, and "involve residues which could be plausibly linked to the lipid binding specificity."

      (i) Two hidden units are identified as useful for classification, but how many candidates are there that pass the first two criteria? Indeed, how many hidden units are there?

      We note in the Model training section of the Methods that our final model had 300 hidden units in total. As to the first part of your question, rather than systematically testing the predictive power of all other hidden units for this task, we decided to use the weights that we did because of their connection to a proposed lipid binding pocket found through Autodocking experiments. While another weight might provide predictive power, it might lack this critical secondary information. Moreover, the direction of our research necessitated finding weights which first satisfied our lipid-binding pocket plausibility before using these weights to propose MprF variants to test for our novel functionality. Given the limited information we had early in the research process, to go in reverse would have provided too many options for experimental testing with reduced mechanistic justification. We included a brief explanation of our rationale in the section “Restricted Boltzmann Machines can provide sensitive, rational guidance for sequence classification” in the updated manuscript.

      ii) The criterion "involve residues which could be plausibly linked to the lipid binding specificity" is again vague. Do all of the other candidate hidden units *not* involve significant contributions from substrate-binding residues? Maybe one of the other units does a better job of discriminating substrate specificity. (As indicated in Figure 8, there are examples of enzymes that confound the proposed classification.) Why combine the activations of two units for the classification, instead of 1 or 3 or...?

      In fact, it is true that the other hidden units do not involve significant contributions from substrate-binding residues, and we will clarify this. The weights found through this RBM methodology are biased to be probabilistically independent, meaning that, by design of the model, the residues and amino acids implicated by each weight are not shared among the other weights. We will update the Model Weight selection section to clarify that the weights we chose had more significantly weighted residues overlapping with the residues near the lipid-binding region than the other weights we checked. We combined these two because they were the only ones which had both overlap with these residues and predictive power for lipid activity with the few sequences we had detailed knowledge of at the time of decision (Figure 5b).

      The Model Weight section reads as follows:

      “Weights were chosen which involved sequence coordinates implicated in our function of interest. Specifically, locations identified through Autodock (Hebecker et al., 2015) where the lipid was likely to interact, and a small radius around this region to select a small set of coordinates. We chose the only weights which had both overlap with multiple residues in this chosen radius and predictive power (separation) for the three examples we had to start with.”
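
      One way to picture the overlap criterion quoted above is the sketch below. It is an illustration under stated assumptions rather than the authors' procedure: the per-position L1 norm as the measure of a "significantly weighted" residue, the `top_k` cutoff, the pocket positions, and the toy weights are all hypothetical.

```python
def position_norms(unit_weights):
    """L1 norm of one hidden unit's weights at each sequence position.

    unit_weights[i][a] is the weight linking position i, amino-acid state a.
    """
    return [sum(abs(w) for w in row) for row in unit_weights]


def pocket_overlap(unit_weights, pocket_positions, top_k=3):
    """Count how many of the unit's top_k positions lie in the pocket set."""
    norms = position_norms(unit_weights)
    ranked = sorted(range(len(norms)), key=lambda i: norms[i], reverse=True)
    return len(set(ranked[:top_k]) & set(pocket_positions))


# Toy example: 5 positions, 2 amino-acid states, hypothetical pocket {1, 3}.
pocket = {1, 3}
unit_a = [[0.1, -0.1], [2.0, -1.5], [0.0, 0.2], [1.2, -1.0], [0.9, -0.8]]
unit_b = [[1.5, -1.5], [0.1, 0.0], [1.0, -1.1], [0.0, 0.1], [0.2, 0.2]]
print(pocket_overlap(unit_a, pocket))  # unit_a concentrates weight on pocket positions
print(pocket_overlap(unit_b, pocket))  # unit_b does not
```

      Ranking hidden units by such a count would only be a first filter; the response also requires that a selected unit separate the sequences of known function.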

      Author Recommendations:

      The manuscript will likely be read by many membrane biologists/biochemists, and they might like to better understand how the RBM might be useful in their own approach. Here are some suggestions along these lines. The overall goal is to explain the RBM in *plain English* - the mathematical description in Eqs 2-4 is not easily interpretable.

      (1a) Explain that the RBM is a two-layer structure, in which one layer is the "visible" elements of the input sequence, and the other is called "hidden units." Connections are only made between visible and hidden units, but all such connections are made.

      (1b) The strengths of these connections are called "weights", and are determined in a statistical way based on a large set of input sequences. Once parametrized, the RBM is capable of capturing correlations among many positions in an input sequence - a significant advantage over the DCA approach.

      We agree with this assessment, and have updated the section of the text where we introduce the RBM with a non-technical explanation of what this method is doing. It reads as:

      “The design of this RBM can be seen in Figure 4, where the model architecture is represented by purple dots and green triangles. The dots are the “visible” layer, which take in input sequences and encode them into the “hidden” layer, where each triangle represents a separate hidden unit. The lines connecting the visible and hidden layers show that each hidden unit can see all the visible units (the statistics are global), but they cannot see any of the other hidden units, meaning the hidden units are mutually independent. This global model with mutually independent hidden units (see also the marginal distribution form shown in Equation 3) has the following useful properties: higher-order couplings between...”

      (1c) Although strictly true that the DCA model is a Boltzmann machine, it's not a typical Boltzmann machine, because all of the units are visible. Typically a Boltzmann machine would also include hidden units, in order to increase its capacity/power. 

      We have clarified the relationship between DCA and Boltzmann machines, and this section now reads as:

      This class of models is closely related to another model termed the Boltzmann machine. The Boltzmann machine formulation is closely related to the Potts model from physics, which was successfully applied in biology to elucidate important residues in protein structure and function (Morcos et al., 2011), and another example being the careful tuning of enzyme specificity in bacterial two-component regulatory systems (Cheng et al., 2014; Jiang et al., 2021). The Boltzmann machine-like formulation from Morcos et al. (2011), termed Direct Coupling Analysis (DCA), stores patterns...

      (1d) Throughout, the authors refer to the activation of the hidden units as weights, but this is not a typical usage of this terminology. Connections between units are weights and have two subscripts. Given an input sequence, the sum over these weights for a given hidden unit is its activation (Eq. 1). I suggest aligning the description with the typical usage in order to make the presentation easier to follow. Hereafter I will refer to these hidden unit activations as simply activations. 

      We agree with you, the hidden units are a collection of edge weights. We have modified the terminology in the text and in our figures to consistently refer to the collections of weights as hidden units and refer to the hidden unit outputs given a sequence input as activations.
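
      The distinction between weights and activations can be made concrete with a short sketch. It assumes the usual form for a sequence RBM, in which a hidden unit's input is the sum, over positions, of the weight selected by the residue at that position; the alphabet ordering, function name, and toy weights are illustrative assumptions, and any bias or nonlinearity in the authors' Eq. 1 is omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY-"  # 20 residues plus a gap symbol (assumed order)


def hidden_activation(sequence, unit_weights):
    """Activation of one hidden unit for one aligned sequence.

    unit_weights[i][a] is the weight between position i in state a and the unit.
    """
    return sum(
        unit_weights[i][AMINO_ACIDS.index(residue)]
        for i, residue in enumerate(sequence)
    )


# Toy hidden unit over a 3-position alignment; all weights are made up.
unit = [[0.0] * len(AMINO_ACIDS) for _ in range(3)]
unit[0][AMINO_ACIDS.index("K")] = 1.5
unit[2][AMINO_ACIDS.index("R")] = 1.0
unit[2][AMINO_ACIDS.index("D")] = -2.0

print(hidden_activation("KAR", unit))  # positive-weight residues at positions 0 and 2
print(hidden_activation("KAD", unit))  # a negatively weighted residue at position 2
```

      In this picture the hidden unit is the full table `unit`, while the activation is the single number it returns for a given sequence.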

      (1e) How many hidden units are there?

      The final model was trained with 300 hidden units.

      (2) It is redundant to say that lipids are both amphiphiles and hydrophobic...amphiphile already means hydrophobic plus hydrophilic. 

      This is true, we have edited the manuscript to reflect this.

      (3) What does this mean, and what's the point of this remark? "They [lipids] are relatively smaller than other complex biomolecules, such as proteins, thereby allowing a larger portion of their surface to interact with other macromolecules." 

      We have removed this sentence.

      Reviewer 2 (Author Recommendations):

      While the idea of filtering out a part of the sequence data obtained with BLAST makes sense per se, it would be nice if the authors could comment on the nature of the sequences corresponding to the left peak in Figure 3b. It is hypothesised in conclusion that these sequences could lack any catalytic function. Could the authors experimentally check that this is the case or provide further evidence for this hypothesis?

      Yes, in this revision we provide further evidence as a new supplementary figure S2. We performed domain analysis of the sequences we excluded; most of these sequences lacked the flippase domain associated with MprF function, and instead were combined with different domains. On this basis we excluded them due to their lack of relevance to the MprF from Streptococcus agalactiae we were interested in. Although there is a possibility that some relevant sequences might be excluded, our assessment is that we gained specificity by reducing the set of sequences.

      A key step in the RBM-based approach is the identification of "meaningful" hidden units, i.e. whose values are related to biological function. In Methods, the authors explain how they selected these units based on the L1 norms of the weights and the region of interaction with the lipid. While these criteria are reasonable, I wonder whether they are too stringent. In particular, one could think that regions in the proteins not in direct contact with the lipid could also be important for binding. It is known for instance that the length of loops can affect flexibility and help regulate activity in some catalytic enzymes. So my question is: if one relaxes the criterion about the coordinates of large weight values, what happens? Are other potentially interesting hidden units identified?

      We completely agree that other regions of the protein are likely involved in determining enzyme specificity, and that focusing solely on regions which interact with the lipid perhaps misses important contributions to the catalytic function; we hypothesize that the flippase domain itself and its interaction with the catalytic domain are involved, especially considering the concerted mechanism by which they must operate. We are currently investigating these hypotheses, and they will be the subject of future work. As an initial step, we present this current work with restricted information that led to concrete predictions. We focused on the lipid binding pocket because it was one of just a few bits of information we had from the start, but as the reviewer suggests, we plan to follow up our research to try to identify other relevant hidden units and domains.

      From a purely machine-learning point of view, it would be good to see more about cross-validation of the model. More precisely, could the authors show the log-likelihood of test set data compared to the one of training sequence data?

      We agree this is an important piece of information. We will update our methods section with this information. We performed a parameter sweep to search for the parameters we used in our final model, and in that testing, with a random 80/20% training/test split, we had a training log probability loss of -0.91 and a test loss of -0.98. However, for our final model we used all available data and did not perform a split; the final result did not change dramatically by including the additional data, and the weight structure and composition were consistent with the results presented in the paper.
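
      The kind of hold-out check described here can be sketched as below. This is a generic illustration, not the authors' pipeline: the function names, the fixed seed, and the use of integer indices in place of sequences are assumptions; only the 80/20 proportion and the figure of 7,890 filtered sequences come from the response.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle items reproducibly and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mean(values):
    """Average of per-sequence scores, e.g. log-probabilities under a model."""
    return sum(values) / len(values)


# Indices standing in for the filtered sequences.
train, test = train_test_split(range(7890))
print(len(train), len(test))
```

      A trained model would then be scored separately on `train` and `test`, and the two averages compared; similar values on both sets, as in the numbers reported above, suggest limited overfitting.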

      Reviewer 3 (Public Review):

      In many of the analyzed strains, the presence of the lipid species Lys-PG, Lys-Glc-DAG, and Lys-Glc2-DAG is correlated with the presence of the MprF enzyme(s), but one should keep in mind that a multitude of other membrane proteins are present that in theory could be involved in the synthesis as well. Therefore, there is no direct evidence that the MprF enzymes are linked to the synthesis of these lipid species. Although it is unlikely that other enzymes are involved, this weakens the connection between the observed lipids and the type of MprF.

      While there are a number of proteins found on the membrane that could play a role, we have specifically used a background strain that has a transposon in mprF that makes the bacteria incapable of synthesizing Lys-lipids (Figure 7B) unless complemented back with a functional MprF (Figure 7D-E). This led us to conclude that MprF is responsible for Lys-lipid synthesis.

      Related to this, in a few cases MprF activity is tested, but the manuscript does not contain any information on protein expression levels. Heterologous expression of membrane proteins is in general challenging and due to various reasons, proteins end up not being expressed at all. As an example, the absence of activity for the E. faecalis MprF1 and E. faecium MprF2 could very well be explained by the entire absence of the protein.

      The genes were expressed on the same plasmid to control for expression. While we did not run a western blot to examine expression levels, the plasmid backbone was used as a control for protein expression. Previous research supports the view that E. faecalis MprF1 and E. faecium MprF2 do not synthesize Lys-lipids and instead most likely play a different role in the cell membrane.

      The title is somewhat misleading. The sequence statistics and machine learning categorized the MprFs, but the identification of a novel lipid species was a coincidence while checking/confirming the categorization. 

      We believe the title is appropriate given that the identification of Enterococcus dispar was through computational methods that led to the discovery of Lys-Glc2-DAG. In other words, the categorization of potential organisms that produce lipids related to MprF was driven by the proposals of the computational method. We agree, however, that the discovery was unexpected, but it would not have happened without the suggested organisms coming from the methodology presented here.

      Please read the manuscript one more time to correct textual errors.  

      The example of the role of LPS in delivering siRNA to targeted cancer cells is a bit farfetched as LPS is very different from the lipids that are being discussed here. I would rather focus on the role of Lysyl-lipids in antibiotic resistance in the introduction.  

      We included LPS here to explain that natural lipids/components of the bacterial cell membrane could be used for drug delivery systems. While it is true LPS is quite different from Lys-lipid compounds, our goal was to create an emphasis on how the bacterial domain is a rich untapped source of lipids that could be used in biotechnology.  In this way we wanted our statement to be more broadly about bacterial lipids and the importance of their continued study for diverse applications like pharmaceuticals.

      The MS identification of Lys-Glc2-DAG is convincing, especially in combination with the fragmentation data, but the ion counts suggest low abundance. The observation would be strengthened if the identification of Lysyl-Glc2-DAG with different acyl-chain configurations has been observed. This should be then mentioned or visualized in the manuscript. 

      We agree and have added an updated Figure 8A to demonstrate the presence of different acyl-chain configurations in Enterococcus dispar.  

      Further analysis of the Enterococcus strains shows the presence of the three lipids Lys-PG, Lys-Glc-DAG, and Lys-Glc2-DAG, although the Lys-Glc-DAG is only detected in trace amounts. This raises questions on the specificity of the MprF for the substrate Glc-DAG. If the ratio of Glc2-DAG compared to Glc-DAG abundance is similar to the ratio of Lys-Glc2-DAG vs. Lys-Glc-DAG abundance, this would strengthen the observation that the enzyme has equal affinity. However, if there is a rather large amount of Glc-DAG but a small amount of Lys-Glc-DAG, the production of Lys-Glc-DAG might be a side-reaction.

      The reviewer raises a relevant point of discussion; however, a clear resolution might be part of future work, as we do not use spike-in controls when completing lipid extractions. Because of this, it is not possible for us to compare lipid levels across different samples. We now include a note clarifying this in the discussion section.

      The plotting of the MprF sequence variants using the chosen RBM weights reveals a rather complex distribution over the quadrants (Figure 8). It is rather unclear in Figure 8 why only 1 sequence is plotted for Enterococcus faecalis and faecium, while 2 different MprFs are present (and tested) for these two organisms. This should be clarified.  

      We agree this can be a source of confusion. We have further clarified this in the text that only the functional alleles were plotted in Figure 8 and that all Enterococcal alleles are plotted in Figure S3 regardless of function.

    1. Author response:

      Reviewer 1:

      The role of Fgf signaling in gliogenesis and Foxg1 in neurogenesis is well known. It is not clear if Fgf18 is a direct target of Foxg1.

      We agree with the reviewer: Fgf signaling is an established pro-gliogenic pathway (Duong et al 2019) and Foxg1 overexpression is known to promote neurogenesis in cultured neural stem cells (Branacaccio et al 2019). Our study links these two mechanisms, as the Reviewer has summarized: (a) we demonstrate that FOXG1 works via modulating Fgf signaling cell-autonomously within progenitors by regulating the levels of Fgfr3. (b) Loss of Foxg1 in postmitotic neurons results in the upregulation of Fgf ligand expression (possibly via indirect mechanisms) and this non-cell autonomously increases Fgf signaling in progenitors. Our study is entirely performed in vivo.

      Proposed revision: We will revise the manuscript to reflect that Fgf18 may be an indirect target of FOXG1 in postmitotic neurons.

      Reviewer 2:

      It wasn't clear to me why the authors chose postnatal day 14 to examine the effects of Foxg1 deletion at E15 - this is a long time window, giving time for indirect consequences of Foxg1 deletion to influence development and thereby potentially complicating the interpretation of findings. For example, the authors show that there is no increased proliferation of astrocytes or death of neurons lacking Foxg1 shortly after cre-mediated deletion, but it remains formally possible (if perhaps unlikely) that these processes could be affected later during the time window. The rationale underlying the choice of this time point should be explained.

      I don't agree with the statement in the very last sentence of the results section that "neurogenesis is not possible in the absence of [Foxg1]" as there are multiple reports in the literature demonstrating the presence of neurons in Foxg1-/- mice (eg: Xuan et al., 1995; Hanashima et al., 2002, Martynoga et al., 2005, Muzio and Mallamaci 2005). Perhaps the statement refers specifically to late-born cortical neurons. This point also arises in the discussion section.

      Proposed revisions:

      (a) We will revise the manuscript to explain why we chose postnatal day 14 to examine the effects of Foxg1 deletion at E15.

      ● We have examined the transcriptomic dysregulation after Foxg1 deletion at E17.5, which is a reasonable period to identify potential direct targets. Furthermore, FOXG1 occupies the Fgfr3 locus in ChIP-seq performed at E15.5. Together, these support the interpretation that Fgfr3 is a direct target of Foxg1.

      ● As the Reviewer notes, we have investigated the possibility of increased proliferation of astrocytes and death of neurons and found no evidence that suggests these phenomena occur in the 3 days after loss of Foxg1. Cortical neurons are postmitotic and differentiated by E18.5, the stage at which we examined CC3 staining and found no difference in cell death in control and mutants (Supplementary Figure S2C, C’). The majority of progenitors (PAX6+ve cells) that lose Foxg1 at E15.5 express the gliogenic transcription factor NFIA by E18.5 (Figure 2C, C’), but hardly any express intermediate (neurogenic) progenitor marker TBR2 (Supplementary Figure S2B, B’). It is therefore unlikely that neurons are born from Foxg1 mutant progenitors and then die at a later stage.

      ● The cellular consequences of loss of Foxg1 require additional time to detect e.g. it takes ~ 5 days for GFAP to be detected in astrocytes once they are born. The P14 timepoint permits the assessment of oligogenesis which begins after astrogliogenesis and therefore permits a comprehensive assessment of the lineage of E15.5 Foxg1 null progenitors.

      (b) Thank you for pointing out that the last sentence of the results section implied (incorrectly) that ALL neurogenesis is not possible in the absence of Foxg1. We will modify this (and the discussion) to reflect that this applies to E14/15 progenitors and late-born cortical neurons.

    2. eLife Assessment

      This important study provides convincing evidence that developing neurons in the neocortex regulate glial cell development. The data demonstrates that the transcription factor FOXG1 negatively regulates gliogenesis by controlling the expression of a member of the FGF ligand family and by suppressing the receptor for this ligand in developing neurons. This study leads to a new understanding of the cascade of events regulating the timing of glial development in the neocortex.

    3. Reviewer #1 (Public review):

      Summary:

      In this paper, Bose et al. investigated the role of Foxg1 transcription factor in the progenitors at late stages of cerebral cortex development. They discover that Foxg1 is a repressor of gliogenesis and has a dual function, first as a repressor of Fgfr3 receptor in progenitors, and second as a suppressor of the Fgf ligands in young neurons.

      They found that the inactivation of Foxg1 in cortical progenitors causes premature astrogliogenesis at the expense of neurogenesis. They identify Fgfr3 as a novel FOXG1 target. They show that suppression of Fgfr3 by FOXG1 in progenitors is required to maintain neurogenesis. On the other hand, they also show that FOXG1 negatively regulates the expression of Fgf gliogenic secreted factors in young neurons, suppressing gliogenesis cell-extrinsically.

      Strengths:

      The authors used time-consuming in vivo experiments utilizing several mouse strains including Foxg1-MADM in combination with RNA-Seq and ChIP to convincingly show that Foxg1 acts upstream of FGF signalling in the control of gliogenesis onset. The conclusions of this paper are mostly well supported by data.

      Weaknesses:

      The role of Fgf signaling in gliogenesis and Foxg1 in neurogenesis is well known. It is not clear if Fgf18 is a direct target of Foxg1.

    4. Reviewer #2 (Public review):

      Summary:

      We have known for some time that neural progenitors in the cerebral cortex switch their output from cortical neurons to glia at late embryonic stages, however little is known about how this switch is regulated at the molecular level. Bose et al present a convincing set of findings, demonstrating that the transcription factor Foxg1 plays a key role in this process, mediated through FGF signalling. Foxg1 cell-autonomously inhibits gliogenesis in progenitor cells (thereby promoting neuronal identity), and lower Foxg1 expression in postnatal neurons leads to increased expression of FGF ligand, promoting glial development from nearby progenitors.

      Strengths:

      The study is very well designed, having a systematic, thorough, and logical approach. The data is convincing. The authors make full use of a range of existing transgenic strains, published 'omics data, and elegant genetic approaches such as MADM. This combination of approaches is particularly rigorous, lending significant weight to the study. The manuscript is well-written, clear, and easy to follow.

      Weaknesses:

      It wasn't clear to me why the authors chose postnatal day 14 to examine the effects of Foxg1 deletion at E15 - this is a long time window, giving time for indirect consequences of Foxg1 deletion to influence development and thereby potentially complicating the interpretation of findings. For example, the authors show that there is no increased proliferation of astrocytes or death of neurons lacking Foxg1 shortly after cre-mediated deletion, but it remains formally possible (if perhaps unlikely) that these processes could be affected later during the time window. The rationale underlying the choice of this time point should be explained.

      I don't agree with the statement in the very last sentence of the results section that "neurogenesis is not possible in the absence of [Foxg1]" as there are multiple reports in the literature demonstrating the presence of neurons in Foxg1-/- mice (eg: Xuan et al., 1995; Hanashima et al., 2002, Martynoga et al., 2005, Muzio and Mallamaci 2005). Perhaps the statement refers specifically to late-born cortical neurons. This point also arises in the discussion section.

      Impact

      This manuscript identifies a previously unknown role for Foxg1 in forebrain development and a mechanism underlying the neurogenic-to-gliogenic switch that occurs at late embryonic stages of cortex development. These findings will stimulate further research to uncover more details of how this important switch is controlled and may provide useful insight into some of the symptoms experienced by children with FOXG1 Syndrome.

    1. eLife Assessment

      This manuscript establishes a mathematical model to estimate the key parameters that control the repopulation of planarian stem cells after sublethal irradiation as they undergo fate-switching as part of their differentiation and self-renewal process. The findings are valuable for future investigation of stem cell division in planarians. The methods are solid, integrating modeling with perturbations of key transcription factors known to be critical for cell fate decisions, but the authors have only shown that this is the case for a small number of stem cell types.

    2. Reviewer #1 (Public review):

      Summary:

      This is a very creative study using modeling and measurement of neoblast dynamics to gain insight into the mechanism that allows these highly potent cells to undergo fate-switching as part of their differentiation and self-renewal process. The authors estimate growth equation parameters for expanding neoblast clones based on new and prior experimental observations. These results indicate that neoblasts likely undergo symmetric self-amplifying division much more often than they are lost from the population through symmetric differentiation, in the case of clone expansion assays after sublethal irradiation. Neoblasts take on multiple distinct transcriptional fates related to their terminally differentiated cell types, and prior work indicated neoblasts have a high plasticity to switch fates in a way linked to cell cycle progression and possibly through a random process. Here, the authors explore the impact of inhibition of key transcription factors defining such states (i.e., "fate specifying transcription factors", FSTFs) plus measurement and modeling in the clone expansion assay, to find that inhibition of factors like zfp1 likely causes otherwise zfp1-fated neoblasts to fail to proliferate and differentiate without causing compensatory gains in other lineages. A mathematical model of this process, assuming that neoblasts do not retain a memory of prior states while they proliferate and transition across specified states, can mimic the experimentally determined decreased sizes of clones following inhibition of zfp1. Complementary approaches to inhibit more than one lineage (muscle plus intestine) support the idea that this is a more general process in planarian stem cells. These results provide an important advance for understanding the fate-switching process and its relationship to neoblast growth.
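
      The kind of memoryless clone-growth model summarized above can be illustrated with a minimal sketch. This is not the authors' model: the function names, the per-cycle probabilities, and the rule that a cell drawing the inhibited fate simply stalls for that cycle are all assumptions made here for illustration.

```python
import random


def expected_clone_size(p_renew, p_diff, cycles, blocked_fraction=0.0):
    """Mean neoblast count of a one-cell clone after `cycles` division rounds.

    Hypothetical parameters: p_renew is the chance of symmetric renewal
    (1 -> 2 neoblasts), p_diff the chance of symmetric differentiation
    (1 -> 0); the remainder is asymmetric division (1 -> 1).
    """
    per_cell = 1.0 + p_renew - p_diff  # mean neoblasts left by a dividing cell
    # Assumption: a blocked cell stays as one neoblast for that cycle.
    growth = blocked_fraction * 1.0 + (1.0 - blocked_fraction) * per_cell
    return growth ** cycles


def simulate_clone(p_renew, p_diff, cycles, blocked_fraction=0.0, rng=None):
    """Stochastic version of the same process for a single clone."""
    rng = rng or random.Random()
    cells = 1
    for _ in range(cycles):
        next_cells = 0
        for _ in range(cells):
            # Memoryless: the fate is redrawn every cycle, ignoring history.
            if rng.random() < blocked_fraction:
                next_cells += 1  # stalled: neither divides nor differentiates
                continue
            u = rng.random()
            if u < p_renew:
                next_cells += 2  # symmetric renewal
            elif u >= p_renew + p_diff:
                next_cells += 1  # asymmetric division
            # otherwise symmetric differentiation: no neoblast remains
        cells = next_cells
    return cells
```

      Under these assumptions, any non-zero blocked fraction lowers the expected clone size whenever renewal outweighs differentiation, and no other lineage gains cells to compensate.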

      Overall I find the evidence very well presented and the study compelling. It offers an important new perspective on the key properties of neoblasts. I do have some comments to clarify the presentation and significance of the work.

    3. Reviewer #2 (Public review):

      Summary:

      Cell cycle duration and cell fate choice are critical to understanding the cellular plasticity of neoblasts in planarians. In this study, Tamar et al. integrated experimental and computational approaches to simulate a model for neoblast behaviors during colony expansion.

      Strengths:

      The finding that "arresting differentiation into specific lineages disrupts neoblast proliferative capacities without inducing compensatory expression of other lineages" is particularly intriguing. This concept could inspire further studies on pluripotent stem cells and their application for regenerative biology.

      Weaknesses:

      However, the absence of a cell-cell feedback mechanism during colony growth and the likelihood of the difference needs to be clarified. Is there any difference in interpreting the results if this mechanism is considered? More explanation and discussion should be included to distinguish the stages controlled by the one-step model from those discussed in this study. Although hnf-4 and foxF have been silenced together to validate the model, a deeper understanding of the tgs-1+ cell type and the non-significant reduction of tgs-1+ neoblasts in zfp-1 RNAi colonies is necessary, considering a high neural lineage frequency.

    4. Author response:

      Reviewer #1:

      Overall I find the evidence very well presented and the study compelling. It offers an important new perspective on the key properties of neoblasts. I do have some comments to clarify the presentation and significance of the work.

      We thank the reviewer for the positive feedback and plan to improve the presentation of the work.

      Reviewer #2:

      However, the absence of a cell-cell feedback mechanism during colony growth and the likelihood of the difference needs to be clarified. Is there any difference in interpreting the results if this mechanism is considered?

      We will improve the description of the model assumptions and the interpretation of the data on the basis of these assumptions.

      Although hnf-4 and foxF have been silenced together to validate the model, a deeper understanding of the tgs-1+ cell type and the non-significant reduction of tgs-1+ neoblasts in zfp-1 RNAi colonies is necessary, considering a high neural lineage frequency.

      We will improve the analysis of this result in light of the experimentally determined frequency of the tgs-1+ neoblast population.

    1. eLife Assessment

      This important work attempts to establish a causal link between neurotrophin signaling and experience-induced structural plasticity in dopaminergic circuits in the adult fly brain, a topic of broad interest to the neuroscience community. While the authors provide solid evidence for the role of this signaling in regulating the structure and synapses of dopaminergic circuits, the evidence for a direct link between neurotrophin signaling and experience-induced structural plasticity remains incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      Sun et al. are interested in how experience can shape the brain and specifically investigate the plasticity of the Toll-6 receptor-expressing dopaminergic neurons (DANs). To learn more about the role of Toll-6 in the DANs, the authors examine the expression of the Toll-6 receptor ligand, DNT-2. They show that DNT-2 expressing cells connect with DANs and that loss of function of DNT-2 in these cells reduces the number of PAM DANs, while overexpression causes alterations in dendrite complexity. Finally, the authors show that alterations in the levels of DNT-2 and Toll-6 can impact DAN-driven behaviors such as climbing, arena locomotion, and learning and long-term memory.

      Strengths:

      The authors methodically test which neurotransmitters are expressed by the 4 prominent DNT-2 expressing neurons and show that they are glutamatergic. They also use Trans-Tango and Bac-TRACE to examine the connectivity of the DNT-2 neurons to the dopaminergic circuit and show that DNT-2 neurons receive dopaminergic inputs and output to a variety of neurons including MB Kenyon cells, DAL neurons, and possibly DANs.

      Weaknesses:

      (1) To identify the DNT-2 neurons, the authors use CRISPR to generate a new DNT-2-GAL4. They note that they identified at least 12 DNT-2+ neurons. In Supplementary Figure 1A, the DNT-2-GAL4 driver was used to express a UAS-histoneYFP nuclear marker. From these figures, it looks like DNT-2-GAL4 is labeling more than 12 neurons. Is there glial expression?

      (2) In Figure 2C the authors show that DNT-2 upregulation leads to an increase in TH levels using q-RT-PCR from whole heads. However, in Figure 3H they also show that DNT-2 overexpression causes an increase in the number of TH neurons. It is unclear whether TH RNA increases due to higher expression per cell or a higher number of TH neurons in the head.

      (3) DNT-2 is also known as Spz5 and has been shown to activate Toll-6 receptors in glia (McLaughlin et al., 2019), resulting in the phagocytosis of apoptotic neurons. In addition, the knockdown of DNT-2/Spz5 throughout development causes an increase in apoptotic debris in the brain, which can lead to neurodegeneration. Indeed Figure 3H shows that an adult-specific knockdown of DNT-2 using DNT2-GAL4 causes an increase in Dcp1 signal in many neurons and not just TH neurons.

    3. Reviewer #2 (Public review):

      This paper examines how structural plasticity in neural circuits, particularly in dopaminergic systems, is regulated by Drosophila neurotrophin-2 (DNT-2) and its receptors, Toll-6 and Kek-6. The authors show that these molecules are critical for modulating circuit structure and dopaminergic neuron survival, synaptogenesis, and connectivity. They show that loss of DNT-2 or Toll-6 function leads to loss of dopaminergic neurons, dendritic arborization, and synaptic impairment, whereas overexpression of DNT-2 increases dendritic complexity and synaptogenesis. In addition, DNT-2 and Toll-6 modulate dopamine-dependent behaviors, including locomotion and long-term memory, suggesting a link between DNT-2 signaling, structural plasticity, and behavior.

      A major strength of this study is the impressive cellular resolution achieved. By focusing on specific dopaminergic neurons, such as the PAM and PPL1 clusters, and using a range of molecular markers, the authors were able to clearly visualize intricate details of synapse formation, dendritic complexity, and axonal targeting within defined circuits. Given the critical role of dopaminergic pathways in learning and memory, this approach provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. However, despite the promise in the abstract and introduction of the paper, the study falls short of establishing a direct causal link between neurotrophin signaling and experience-induced plasticity.

      Simply put, this study does not provide strong evidence that experience-induced structural plasticity requires DNT-2 signaling. To support this idea, it would be necessary to observe experience-induced structural changes and demonstrate that downregulation of DNT-2 signaling prevents these changes. The closest attempt to address this in this study was the artificial activation of DNT-2 neurons using TrpA1, which resulted in overgrowth of axonal arbors and an increase in synaptic sites in both DNT-2 and PAM neurons. However, this activation method is quite artificial, and the authors did not test whether the observed structural changes were dependent on DNT-2 signaling. Although they also showed that overexpression of DNT-2FL in DNT-2 neurons promotes synaptogenesis, this phenotype was not fully consistent with the TrpA1 activation results (Figures 5C and D).

      In conclusion, this study demonstrates that DNT-2 and its receptors play a role in regulating the structure of dopaminergic circuits in the adult fly brain. However, it does not provide convincing evidence for a causal link between DNT-2 signaling and experience-dependent structural plasticity within these circuits.

    4. Reviewer #3 (Public review):

      Summary:

      The authors used the model organism Drosophila melanogaster to show that the neurotrophin Toll-6 and its ligands, DNT-2 and kek-6, play a role in maintaining the number of dopaminergic neurons and modulating their synaptic connectivity. This supports previous findings on the structural plasticity of dopaminergic neurons and suggests a molecular mechanism underlying this plasticity.

      Strengths:

      The experiments are overall very well designed and conclusive. Methods are in general state-of-the-art, the sample sizes are sufficient, the statistical analyses are sound, and all necessary controls are in place. The data interpretation is straightforward, and the relevant literature is taken into consideration. Overall, the manuscript is solid and presents novel, interesting, and important findings.

      Weaknesses:

      There are three technical weaknesses that could perhaps be improved.

      First, the model of reciprocal, inhibitory feedback loops (Figure 2F) is speculative. On the one hand, glutamate can act in flies as an excitatory or inhibitory transmitter (line 157), and either situation can be the case here. On the other hand, it is not clear how an increase or decrease in cAMP level translates into transmitter release. One can only conclude that two types of neurons potentially influence each other.

      Second, the quantification of bouton volumes (no y-axis label in Figure 5 C and D!) and dendrite complexity are not convincingly laid out. Here, the reader expects fine-grained anatomical characterizations of the structures under investigation, and a method to precisely quantify the lengths and branching patterns of individual dendritic arborizations as well as the volume of individual axonal boutons.

      Third, Figure 1C shows two neurons with the goal of demonstrating between-neuron variability. It is not convincingly demonstrated that the two neurons are actually of the very same type of neuron in different flies or two completely different neurons.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      Sun et al. are interested in how experience can shape the brain and specifically investigate the plasticity of the Toll-6 receptor-expressing dopaminergic neurons (DANs). To learn more about the role of Toll-6 in the DANs, the authors examine the expression of the Toll-6 receptor ligand, DNT-2. They show that DNT-2 expressing cells connect with DANs and that loss of function of DNT-2 in these cells reduces the number of PAM DANs, while overexpression causes alterations in dendrite complexity. Finally, the authors show that alterations in the levels of DNT-2 and Toll-6 can impact DAN-driven behaviors such as climbing, arena locomotion, and learning and long-term memory.

      Strengths:

      The authors methodically test which neurotransmitters are expressed by the 4 prominent DNT-2 expressing neurons and show that they are glutamatergic. They also use Trans-Tango and Bac-TRACE to examine the connectivity of the DNT-2 neurons to the dopaminergic circuit and show that DNT-2 neurons receive dopaminergic inputs and output to a variety of neurons including MB Kenyon cells, DAL neurons, and possibly DANs.

      We are very pleased that Reviewer 1 found our connectivity analysis a strength.

      Weaknesses:

      (1) To identify the DNT-2 neurons, the authors use CRISPR to generate a new DNT-2-GAL4. They note that they identified at least 12 DNT-2+ neurons. In Supplementary Figure 1A, the DNT-2-GAL4 driver was used to express a UAS-histoneYFP nuclear marker. From these figures, it looks like DNT-2-GAL4 is labeling more than 12 neurons. Is there glial expression?

      Indeed, we claimed that DNT-2 is expressed in at least 12 neurons (see line 141, page 6 of original manuscript), which means more than 12 could be found. The membrane tethered reporters we used – UAS-FlyBow1.1, UASmcD8-RFP, UAS-MCFO, as well as UAS-DenMark:UASsyd-1GFP – gave a consistent and reproducible pattern. However, with DNT-2GAL4>UAS-Histone-YFP more nuclei were detected that were not revealed by the other reporters. We have also found with other GAL4 lines that the patterns produced by different reporters can vary. This could be due to the signal strength (e.g. His-YFP is very strong) and perdurance of the reporter (e.g. the turnover of His-YFP may be slower than that of the other fusion proteins).

      We did not test for glial expression, as it was not directly related to the question addressed in this work.

      (2) In Figure 2C the authors show that DNT-2 upregulation leads to an increase in TH levels using q-RT-PCR from whole heads. However, in Figure 3H they also show that DNT-2 overexpression causes an increase in the number of TH neurons. It is unclear whether TH RNA increases due to higher expression per cell or a higher number of TH neurons in the head.

      Figure 3H shows that over-expression of DNT-2 FL increased the number of Dcp1+ apoptotic cells in the brain, but not significantly (p=0.0939). The ability of full-length neurotrophins to induce apoptosis, and of cleaved neurotrophins to promote cell survival, is well documented in mammals. We had previously shown that DNT-2 is naturally cleaved, and that over-expression of DNT-2 does not induce apoptosis in the various contexts tested before (McIlroy et al 2013 Nature Neuroscience; Foldi et al 2017 J Cell Biol; Ulian-Benitez et al 2017 PLoS Genetics). Similarly, throughout this work we did not find DNT-2FL to induce apoptosis.

      Instead, in Figure 3G we show that over-expression of DNT-2FL causes a mild yet statistically significant increase in the number of TH+ cells. This is an important finding that supports the plastic regulation of PAM cell number. We thank the Reviewer for highlighting this point, as we had forgotten to add the significance star in the graph. In this context, we cannot rule out the possibility that the increase in TH mRNA observed when we over-express DNT-2FL is due to an increase in cell number rather than an increase in expression per cell. Unfortunately, it is not possible for us to separate these two processes at this time. Either way, the result would still be the same: an increase in dopamine production when DNT-2 levels rise.

      (3) DNT-2 is also known as Spz5 and has been shown to activate Toll-6 receptors in glia (McLaughlin et al., 2019), resulting in the phagocytosis of apoptotic neurons. In addition, the knockdown of DNT-2/Spz5 throughout development causes an increase in apoptotic debris in the brain, which can lead to neurodegeneration. Indeed Figure 3H shows that an adult-specific knockdown of DNT-2 using DNT2-GAL4 causes an increase in Dcp1 signal in many neurons and not just TH neurons.

      Indeed, we did find Dcp1+ signal in TH-negative cells too (although not widely throughout the brain). This is not surprising, as DNT-2 neurons have large arborisations that can reach a wide range of targets; DNT-2 is secreted, and could reach beyond its immediate targets; Toll-6 is expressed in a vast number of cells in the brain; and DNT-2 can also bind promiscuously to at least Toll-7 and other Keks, which are also expressed in the adult brain (Foldi et al 2017 J Cell Biology; Ulian-Benitez et al 2017 PLoS Genetics; Li et al 2020 eLife). Together with the findings by McLaughlin et al 2019, our findings further support the notion that DNT-2 is a neuroprotective factor in the adult brain. It will be interesting to find out what other neuron types DNT-2 maintains.

      We would like to thank Reviewer 1 for their positive comments on our work and their interesting and valuable feedback.

      Reviewer #2 (Public review):

      This paper examines how structural plasticity in neural circuits, particularly in dopaminergic systems, is regulated by Drosophila neurotrophin-2 (DNT-2) and its receptors, Toll-6 and Kek-6. The authors show that these molecules are critical for modulating circuit structure and dopaminergic neuron survival, synaptogenesis, and connectivity. They show that loss of DNT-2 or Toll-6 function leads to loss of dopaminergic neurons, dendritic arborization, and synaptic impairment, whereas overexpression of DNT-2 increases dendritic complexity and synaptogenesis. In addition, DNT-2 and Toll-6 modulate dopamine-dependent behaviors, including locomotion and long-term memory, suggesting a link between DNT-2 signaling, structural plasticity, and behavior.

      A major strength of this study is the impressive cellular resolution achieved. By focusing on specific dopaminergic neurons, such as the PAM and PPL1 clusters, and using a range of molecular markers, the authors were able to clearly visualize intricate details of synapse formation, dendritic complexity, and axonal targeting within defined circuits. Given the critical role of dopaminergic pathways in learning and memory, this approach provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. However, despite the promise in the abstract and introduction of the paper, the study falls short of establishing a direct causal link between neurotrophin signaling and experience-induced plasticity.

      Simply put, this study does not provide strong evidence that experience-induced structural plasticity requires DNT-2 signaling. To support this idea, it would be necessary to observe experience-induced structural changes and demonstrate that downregulation of DNT-2 signaling prevents these changes. The closest attempt to address this in this study was the artificial activation of DNT-2 neurons using TrpA1, which resulted in overgrowth of axonal arbors and an increase in synaptic sites in both DNT-2 and PAM neurons. However, this activation method is quite artificial, and the authors did not test whether the observed structural changes were dependent on DNT-2 signaling. Although they also showed that overexpression of DNT-2FL in DNT-2 neurons promotes synaptogenesis, this phenotype was not fully consistent with the TrpA1 activation results (Figures 5C and D).

      In conclusion, this study demonstrates that DNT-2 and its receptors play a role in regulating the structure of dopaminergic circuits in the adult fly brain. However, it does not provide convincing evidence for a causal link between DNT-2 signaling and experience-dependent structural plasticity within these circuits.

      We would like to thank Reviewer 2 for their very positive assessment of our approach to investigate structural circuit plasticity. We are delighted that this Reviewer found our cellular resolution impressive. We are also very pleased that Reviewer 2 found that our work demonstrates that DNT-2 and its receptors regulate the structure of dopaminergic circuits in the adult fly brain. This is already a very important finding that contributes to demonstrating that, rather than being hardwired, the adult fly brain is plastic, like the mammalian brain.

      We are very pleased that this Reviewer acknowledges that this work provides a good opportunity to explore the role of DNT-2, Toll-6, and Kek-6 in experience-dependent structural plasticity. We provide a molecular mechanism and proof of principle, and we demonstrate a direct link between the function of DNT-2 and its receptors in circuit plasticity, and a suggestive link to neuronal activity. Finding out the direct link to lived experience is a big task, beyond the scope of this manuscript, and we will be testing this with future projects. Nevertheless, it is important to place our findings within this context, as it opens opportunities for discovery by the neuroscience community.

      We would like to thank Reviewer 2 for the positive and thoughtful evaluation of our work, and for their feedback.

      Reviewer #3 (Public review):

      Summary:

      The authors used the model organism Drosophila melanogaster to show that the neurotrophin Toll-6 and its ligands, DNT-2 and kek-6, play a role in maintaining the number of dopaminergic neurons and modulating their synaptic connectivity. This supports previous findings on the structural plasticity of dopaminergic neurons and suggests a molecular mechanism underlying this plasticity.

      Strengths:

      The experiments are overall very well designed and conclusive. Methods are in general state-of-the-art, the sample sizes are sufficient, the statistical analyses are sound, and all necessary controls are in place. The data interpretation is straightforward, and the relevant literature is taken into consideration. Overall, the manuscript is solid and presents novel, interesting, and important findings.

      We are delighted that Reviewer 3 found our work solid, novel, interesting and with important findings. We are also very pleased that this Reviewer found that all necessary controls have been carried out.

      Weaknesses:

      There are three technical weaknesses that could perhaps be improved.

      First, the model of reciprocal, inhibitory feedback loops (Figure 2F) is speculative. On the one hand, glutamate can act in flies as an excitatory or inhibitory transmitter (line 157), and either situation can be the case here. On the other hand, it is not clear how an increase or decrease in cAMP level translates into transmitter release. One can only conclude that two types of neurons potentially influence each other.

      Thank you for pointing out that glutamate can be inhibitory. In mammals, the neurotrophin BDNF has an important function in glutamatergic synapses, thus we were intrigued by a potential evolutionary conservation. Our evidence that DNT-2A neurons could be excitatory is indirect, yet supportive: exciting DNT-2 neurons with optogenetics resulted in an increase in GCaMP signal in PAMs (data not shown); over-expression of DNT-2 in DNT-2 neurons increased TH mRNA levels; optogenetic activation of DNT-2 neurons results in the Dop2R-dependent downregulation of cAMP levels in DNT-2 neurons. Dop2R signals in response to dopamine, which would be released only if dopaminergic neurons had been excited. Accordingly, glutamate released from DNT-2 neurons would have been rather unlikely to inhibit DANs.

      cAMP is a second messenger that enables the activation of PKA. PKA phosphorylates many target proteins, amongst which are various channels. This includes the voltage gated calcium channels located at the synapse, whose phosphorylation increases their opening probability. Thus, a rise in cAMP could facilitate neurotransmitter release, and a downregulation would have the opposite effect. Other targets of PKA include CREB, leading to changes in gene expression. Conceivably, a decrease in PKA activity could result in the downregulation of DNT-2 expression in DNT-2 neurons. This negative feedback loop would restore the homeostatic relationship between DNT-2 and dopamine levels.

      Our data indeed demonstrate that DNT-2 and PAM neurons influence each other, not potentially, but really. We have provided data showing that: DNT-2 and PAMs are connected through circuitry; the DNT-2 receptors Toll-6 and kek-6 are expressed in DANs, including in PAMs; alterations in the levels of DNT-2 (both loss and gain of function) and loss of function for the DNT-2 receptors Toll-6 and Kek-6 alter PAM cell number, PAM dendritic complexity and synaptogenesis in PAMs; and alterations in the levels of DNT-2, Toll-6 and kek-6 in adult flies alter the dopamine-dependent behaviours of climbing, locomotion in an arena, and learning and long-term memory. These data firmly demonstrate that the two neuron types DNT-2 and PAMs influence each other.

      We have also shown that over-expression of DNT-2 in DNT-2 neurons increases TH mRNA levels, whereas activation of DNT-2 neurons decreases cAMP levels in DNT-2 neurons in a dopamine/Dop2R-dependent manner. These data show a functional interaction between DNT-2 and PAM neurons.

      Second, the quantification of bouton volumes (no y-axis label in Figure 5 C and D!) and dendrite complexity are not convincingly laid out. Here, the reader expects fine-grained anatomical characterizations of the structures under investigation, and a method to precisely quantify the lengths and branching patterns of individual dendritic arborizations as well as the volume of individual axonal boutons.

      Figures 5C and D do contain Y-axis labels; all the graphs in the main manuscript and in the supplementary files contain Y-axis labels.

      In fact, we did use a method to precisely quantify the lengths and branching patterns of individual dendritic arborisations, the volume of individual boutons and the number of boutons. These analyses were carried out using Imaris software. For dendritic branching patterns, the “Filament Autodetect” function was used. Here, dendrites were analysed by semi-automatically tracing each dendrite branch (i.e. with manual correction of segmentation errors) to reconstruct the segmented dendrite in volume. From this segmented dendrite, Imaris provides measurements of total dendrite volume, number and length of dendrite branches, terminal points, etc. For bouton size and number, we used the Imaris “Spot” function. Here, a threshold is set to exclude small dots (e.g. of background) that do not correspond to synapses/boutons. All samples and genotypes are treated with the same threshold, thus the analysis is objective and large sample sizes can be analysed effectively. We have already provided a description of the use of Imaris in the methods section.
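
      The principle of applying one fixed threshold to every sample can be shown with a minimal sketch. It does not use or reproduce the Imaris software or its settings: the function names, the volume units, and the threshold value are hypothetical, and the input is assumed to be a plain list of detected object volumes per genotype.

```python
from statistics import mean


def summarize_boutons(spot_volumes, min_volume):
    """Count and average the detected objects at or above a size threshold.

    Objects smaller than `min_volume` are treated as background dots and
    excluded. Units are whatever the detection step reports (assumed here).
    """
    kept = [v for v in spot_volumes if v >= min_volume]
    return {"count": len(kept), "mean_volume": mean(kept) if kept else 0.0}


def summarize_genotypes(samples, min_volume):
    """Apply the same threshold to every genotype so results are comparable."""
    return {
        name: summarize_boutons(volumes, min_volume)
        for name, volumes in samples.items()
    }
```

      Because the threshold is passed once and reused for every genotype, no sample can be scored with a different cut-off, which is the sense in which such an analysis is objective.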

      Third, Figure 1C shows two neurons with the goal of demonstrating between-neuron variability. It is not convincingly demonstrated that the two neurons are actually of the very same type of neuron in different flies or two completely different neurons.

      We thank Reviewer 3 for raising this interesting point. It is not possible to prove which of the four DNT-2A neurons per hemibrain, which we visualised with DNT-2>MCFO, were the same neurons in every individual brain we looked at. This is because in every brain we have looked at, the somata of the neurons were not located in exactly the same position. Furthermore, the arborisation patterns are also different and unique for each individual brain. Thus, there is natural variability in the position of the soma and in the arborisation patterns. Such variability presumably results from the combination of developmental and activity-dependent plasticity.

      We would like to thank Reviewer 3 for the very positive evaluation of our work and the interesting and valuable feedback.

    1. Author response:

      Public Reviews: 

      Reviewer #1 (Public review): 

      Summary: 

      Here the authors present their evidence linking the mitochondrial uniporter (MCU-1) and olfactory adaptation in C. elegans. They clearly demonstrate a behavioral defect of mcu-1 mutants in adaptation over 60 minutes and present evidence that this gene functions in the AWC primary sensory neurons at, or close to, the time of adaptation. 

      Strengths: 

      The paper is very well organized and their approach to unpacking the role of mcu-1 mutants in olfactory adaptation is very reasonable. The authors lean into diverse techniques including behavior, genetics, and pharmacological manipulation in order to flesh out their model for how MCU-1 functions in AWC neurons with respect to olfaction. 

      Weaknesses: 

      I would like to see the authors strengthen the link between mitochondrial calcium and olfactory adaptation. The authors present some GCaMP data in Figure 5 but it is unclear to me why this tool is not better utilized to explore the mechanism of MCU-1 activity. I think this is very important as the title of the paper states that "mitochondrial calcium modulates.." behavior in AWC and so it would be nice to see more evidence to support this direct connection. I would also like to see the authors place their findings into a model based on previous findings and perhaps examine whether mcu-1 is required for EGL-4 nuclear translocation, which would be straightforward to examine.

      We agree that observing calcium levels inside the mitochondria would conclusively demonstrate that mitochondrial calcium directly impacts neuropeptide secretion and behavior. We will try to do this with a mitochondrially targeted calcium indicator. We will also better integrate our findings with existing models in the literature, such as EGL-4 nuclear localization in AWC in response to prolonged odor exposure. Thank you for your comments.

      Reviewer #2 (Public review): 

      Summary: 

      In their manuscript, "Mitochondrial calcium modulates odor-mediated behavioural plasticity in C. elegans", Lee et al. aim to link a mitochondrial calcium transporter to higher-order neuronal functions that mediate memory and aversive learning behaviours. The authors characterise the role of the mitochondrial calcium uniporter, and a specific subunit of this complex, MCU-1, within a single chemosensory neuron (AWCOFF) during aversive odor learning in the nematode. By genetically manipulating mcu-1 as well as using pharmacological activators and blockers of MCU activity, the study presents compelling evidence that the activity of this individual mitochondrial ion transporter in AWCOFF is sufficient to drive animal behaviour through aversive memory formation. The authors show that perturbations to mcu-1 and MCU activity prevent aversive learning to several chemical odors associated with food absence. The authors propose a model, experimentally validated at several steps, whereby an increase in MCU activity during odor conditioning stimulates mitochondrial calcium influx and an increase in mitochondrial reactive oxygen species (mtROS) production, triggering the release of the neuropeptide NLP-1 from AWC, all of which are required to mediate future avoidance behaviour of the chemical odor. 

      Strengths: 

      Overall, the authors provided robust evidence that mitochondrial function, mediated through MCU activity, contributes to behavioural plasticity. They also demonstrated that ectopic MCU activation or mtROS during odor exposure could accelerate learning. This is quite profound, as it highlights the importance of mitochondrial function in complex neuronal processes beyond their general roles in the development and maintenance of neurons through energy homeostasis and biosynthesis, amongst their other cell-non-specific roles. 

      Weaknesses: 

      While the manuscript is generally robust, there are some concerns that should be addressed to improve the strength of the proposed model: 

      (1) Throughout the manuscript, it is implied that MCU activation caused by odor conditioning changes mitochondrial calcium levels. However, there is no direct experimental evidence of this. For example, the authors write on p.10 "This shows that H2O2 production occurs downstream of MCU activation and calcium influx into the mitochondria", and on p. 11, the statement that prolonged exposure to odors causes calcium influx. Because this is a key element of the proposed model, experimental evidence would be required to support it. 

      We are planning to measure mitochondrial calcium levels directly by using a mitochondrially targeted calcium indicator. We agree that this is a key element of our model.

      (2) Some controls are missing, e.g. a heat-shock-only control in WT and mcu-1 (non-transgenic) background in Figure 1h is required to ensure the heat-shock stress does not interfere with odor learning.

      We will repeat the experiments with the necessary controls.

      (3) Lee et al. propose that mcu-1 is required at the adult stage to accomplish odor learning because inducing mcu-1 expression at larval stages did not rescue the phenotype of mcu-1 mutants during adulthood. However, the requirement of MCU for odor learning was narrowed down to a 15' window at the end of odor conditioning (Figure 5c). Is it possible that MCU-1 protein levels decline after larval induction so that MCU-1 is no longer present during adulthood when odor conditioning is performed?

      Yes, we also noted that early induction of MCU-1 is not effective in restoring learning, and hypothesized that MCU-1 protein may be subject to high turnover. It may be that MCU-1 induced during larval stages no longer exists by the time odor conditioning is performed, although we have not confirmed this. We had a brief sentence noting this in the discussion section, but we will discuss this a little further in the revision. Thank you.

      (4) There is a limited learning effect observable after 30 minutes, and a very pronounced effect in all animals after 90 minutes. The authors very carefully dissect the learning mechanism at 60 minutes of exposure and distinguish processes that are relevant at 60 minutes from those important at 30 minutes. Some explanation or speculation as to why the processes crucial at the 60-minute mark are redundant at 90 minutes of exposure would be important. 

      I think this is in line with Reviewer #1’s comments that we should discuss our findings more in relation to existing models in the literature. We will do this in our revision.

      (5) Given the presumably ubiquitous function of mcu-1/MCU in mitochondrial calcium homeostasis, it is remarkable that its perturbation impacts only a very specific neuronal process in AWC at a very specific time. The authors should elaborate on this surprising aspect of their discovery in the discussion. 

      We will discuss the implication further in our revised manuscript.

      (6) Associated with the above comment, it remains possible that mcu-1 is required in coelomocytes for their ability to absorb NLP-1::Venus (Figure 3B), and the AWC-specific role of mcu-1 for this phenotype should be determined. 

      To confirm that mcu-1 is not required for coelomocyte uptake, we can stimulate NLP-1::Venus secretion in mcu-1 worms by adding H2O2, then check whether Venus is detected in the coelomocytes. We will include this in our revised manuscript. Thank you for your comments.

      Reviewer #3 (Public review): 

      Summary: 

      This manuscript reports a role for the mitochondrial calcium uniporter gene (mcu-1) in regulating associative learning behavior in C. elegans. This regulation occurs by mcu-1-dependent secretion of the neuropeptide NLP-1 from the sensory neuron AWC. The authors report a post-developmental role for mcu-1 in AWC to promote learning. The authors further show that odor conditioning leads to increases in NLP-1 secretion from AWC, and that interfering with mcu-1 function reduces NLP-1 secretion. Finally, the authors show that NLP-1 secretion increases when ROS levels in AWC are genetically or pharmacologically elevated. The authors propose that mitochondrial calcium entry through MCU-1 in response to odor conditioning leads to the generation of ROS and the subsequent increase in neuropeptide secretion to promote conditioned behavior. 

      Strengths: 

      (1) The authors show convincingly that genetically or pharmacologically manipulating MCU function impacts chemotaxis in a conditioned learning paradigm. 

      (2) The demonstration that the secretion of a specific neuropeptide can be up-regulated by MCU, ROS and odor conditioning is an important and interesting advance that addresses mechanisms by which neuropeptide secretion can be regulated in vivo. 

      Weaknesses: 

      (1) The authors' conclusion that mcu-1 functions in the AWC-on neuron is not adequately supported by their rescue experiments. The promoter they use for rescue drives expression in AWC-on and in a number of additional neurons that are themselves implicated in adaptation, leaving open the possibility that mcu-1 may function non-autonomously instead of autonomously in AWC to regulate this behavior.

      We recognized this as well, and we now have a promoter construct more specific to AWCON (str-2). Using this more specific promoter, we will confirm that the role of mcu-1 is indeed AWCON-specific in our revised manuscript.

      (2) The authors conclude MCU promotes neuropeptide release from AWC by controlling calcium entry into mitochondria, but they did not directly examine the effects of altered MCU function on calcium dynamics either in mitochondria or in the soma, even though they conducted calcium imaging experiments in AWC of wild type animals. Examination of calcium entry in mitochondria would be a direct test of their model.

      We agree. As we stated above for reviewers #1 and #2, we will include results from mitochondrial calcium measurements in our revised manuscript.

      (3) The authors' conclusion that mitochondrial-derived ROS produced by MCU activation drives neuropeptide release does not appear to be experimentally supported. A major weakness of this paper is that experiments addressing whether mcu-1 activity indeed produces ROS are not included, leaving unanswered the question of whether MCU is the endogenous source of ROS that drives neuropeptide secretion.

      We can test this using the mitochondrially targeted redox indicator roGFP, and we will be sure to include the data in the revised manuscript. Thank you for your comments.

    2. eLife Assessment

      This study presents important findings that will allow for a better understanding of the role of mitochondria in behaviours of C. elegans. There is convincing evidence that mutants in a subunit of the mitochondrial calcium uniporter (MCU-1) show defects in olfactory adaptation and this gene regulates neuropeptide secretion and allows for behavioural modulation in C. elegans. However, the evidence that mitochondrial calcium modulates odour-based behaviour in C. elegans is incomplete. This claim would require support from calcium imaging in conditioned WT and mcu-1 animals. This work would be of interest to labs working on behaviours across phyla.

    3. Reviewer #1 (Public review):

      Summary:

      Here the authors present their evidence linking the mitochondrial uniporter (MCU-1) and olfactory adaptation in C. elegans. They clearly demonstrate a behavioral defect of mcu-1 mutants in adaptation over 60 minutes and present evidence that this gene functions in the AWC primary sensory neurons at, or close to, the time of adaptation.

      Strengths:

      The paper is very well organized and their approach to unpacking the role of mcu-1 mutants in olfactory adaptation is very reasonable. The authors lean into diverse techniques including behavior, genetics, and pharmacological manipulation in order to flesh out their model for how MCU-1 functions in AWC neurons with respect to olfaction.

      Weaknesses:

      I would like to see the authors strengthen the link between mitochondrial calcium and olfactory adaptation. The authors present some GCaMP data in Figure 5, but it is unclear to me why this tool is not better utilized to explore the mechanism of MCU-1 activity. I think this is very important as the title of the paper states that "mitochondrial calcium modulates.." behavior in AWC and so it would be nice to see more evidence to support this direct connection. I would also like to see the authors place their findings into a model based on previous findings and perhaps examine whether mcu-1 is required for EGL-4 nuclear translocation, which would be straightforward to examine.

    4. Reviewer #2 (Public review):

      Summary:

      In their manuscript, "Mitochondrial calcium modulates odor-mediated behavioural plasticity in C. elegans", Lee et al. aim to link a mitochondrial calcium transporter to higher-order neuronal functions that mediate memory and aversive learning behaviours. The authors characterise the role of the mitochondrial calcium uniporter, and a specific subunit of this complex, MCU-1, within a single chemosensory neuron (AWCOFF) during aversive odor learning in the nematode. By genetically manipulating mcu-1 as well as using pharmacological activators and blockers of MCU activity, the study presents compelling evidence that the activity of this individual mitochondrial ion transporter in AWCOFF is sufficient to drive animal behaviour through aversive memory formation. The authors show that perturbations to mcu-1 and MCU activity prevent aversive learning to several chemical odors associated with food absence. The authors propose a model, experimentally validated at several steps, whereby an increase in MCU activity during odor conditioning stimulates mitochondrial calcium influx and an increase in mitochondrial reactive oxygen species (mtROS) production, triggering the release of the neuropeptide NLP-1 from AWC, all of which are required to mediate future avoidance behaviour of the chemical odor.

      Strengths:

      Overall, the authors provided robust evidence that mitochondrial function, mediated through MCU activity, contributes to behavioural plasticity. They also demonstrated that ectopic MCU activation or mtROS during odor exposure could accelerate learning. This is quite profound, as it highlights the importance of mitochondrial function in complex neuronal processes beyond their general roles in the development and maintenance of neurons through energy homeostasis and biosynthesis, amongst their other cell-non-specific roles.

      Weaknesses:

      While the manuscript is generally robust, there are some concerns that should be addressed to improve the strength of the proposed model:

      (1) Throughout the manuscript, it is implied that MCU activation caused by odor conditioning changes mitochondrial calcium levels. However, there is no direct experimental evidence of this. For example, the authors write on p.10 "This shows that H2O2 production occurs downstream of MCU activation and calcium influx into the mitochondria", and on p. 11, the statement that prolonged exposure to odors causes calcium influx. Because this is a key element of the proposed model, experimental evidence would be required to support it.

      (2) Some controls are missing, e.g. a heat-shock-only control in WT and mcu-1 (non-transgenic) background in Figure 1h is required to ensure the heat-shock stress does not interfere with odor learning.

      (3) Lee et al. propose that mcu-1 is required at the adult stage to accomplish odor learning because inducing mcu-1 expression at larval stages did not rescue the phenotype of mcu-1 mutants during adulthood. However, the requirement of MCU for odor learning was narrowed down to a 15' window at the end of odor conditioning (Figure 5c). Is it possible that MCU-1 protein levels decline after larval induction so that MCU-1 is no longer present during adulthood when odor conditioning is performed?

      (4) There is a limited learning effect observable after 30 minutes, and a very pronounced effect in all animals after 90 minutes. The authors very carefully dissect the learning mechanism at 60 minutes of exposure and distinguish processes that are relevant at 60 minutes from those important at 30 minutes. Some explanation or speculation as to why the processes crucial at the 60-minute mark are redundant at 90 minutes of exposure would be important.

      (5) Given the presumably ubiquitous function of mcu-1/MCU in mitochondrial calcium homeostasis, it is remarkable that its perturbation impacts only a very specific neuronal process in AWC at a very specific time. The authors should elaborate on this surprising aspect of their discovery in the discussion.

      (6) Associated with the above comment, it remains possible that mcu-1 is required in coelomocytes for their ability to absorb NLP-1::Venus (Figure 3B), and the AWC-specific role of mcu-1 for this phenotype should be determined.

    5. Reviewer #3 (Public review):

      Summary:

      This manuscript reports a role for the mitochondrial calcium uniporter gene (mcu-1) in regulating associative learning behavior in C. elegans. This regulation occurs by mcu-1-dependent secretion of the neuropeptide NLP-1 from the sensory neuron AWC. The authors report a post-developmental role for mcu-1 in AWC to promote learning. The authors further show that odor conditioning leads to increases in NLP-1 secretion from AWC, and that interfering with mcu-1 function reduces NLP-1 secretion. Finally, the authors show that NLP-1 secretion increases when ROS levels in AWC are genetically or pharmacologically elevated. The authors propose that mitochondrial calcium entry through MCU-1 in response to odor conditioning leads to the generation of ROS and the subsequent increase in neuropeptide secretion to promote conditioned behavior.

      Strengths:

      (1) The authors show convincingly that genetically or pharmacologically manipulating MCU function impacts chemotaxis in a conditioned learning paradigm.

      (2) The demonstration that the secretion of a specific neuropeptide can be up-regulated by MCU, ROS and odor conditioning is an important and interesting advance that addresses mechanisms by which neuropeptide secretion can be regulated in vivo.

      Weaknesses:

      (1) The authors' conclusion that mcu-1 functions in the AWC-on neuron is not adequately supported by their rescue experiments. The promoter they use for rescue drives expression in AWC-on and in a number of additional neurons that are themselves implicated in adaptation, leaving open the possibility that mcu-1 may function non-autonomously instead of autonomously in AWC to regulate this behavior.

      (2) The authors conclude MCU promotes neuropeptide release from AWC by controlling calcium entry into mitochondria, but they did not directly examine the effects of altered MCU function on calcium dynamics either in mitochondria or in the soma, even though they conducted calcium imaging experiments in AWC of wild type animals. Examination of calcium entry in mitochondria would be a direct test of their model.

      (3) The authors' conclusion that mitochondrial-derived ROS produced by MCU activation drives neuropeptide release does not appear to be experimentally supported. A major weakness of this paper is that experiments addressing whether mcu-1 activity indeed produces ROS are not included, leaving unanswered the question of whether MCU is the endogenous source of ROS that drives neuropeptide secretion.

    1. eLife Assessment

      This manuscript presents a valuable minimal model of habituation which is quantified by information theoretic measures. The results here could be of use in interpreting habituation behavior in a range of biological systems. However, the evidence presented is incomplete and would benefit from more rigorous approaches and a fuller accounting of the hallmarks of habituation.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Nicoletti et al. presents a minimal model of habituation, a basic form of non-associative learning, addressing from both dynamical and information-theoretic perspectives how habituation can be realized. The authors identify that negative feedback provided with a slow storage mechanism is sufficient to explain habituation.

      Strengths:

      The authors combine the identification of the dynamical mechanism with information-theoretic measures to determine the onset of habituation and provide a description of how the system can gain maximum information about the environment.

      Weaknesses:

      I have several main concerns/questions about the proposed model for habituation and its plausibility. In general, habituation does not only refer to a decrease in the responsiveness upon repeated stimulation but as Thompson and Spencer discussed in Psych. Rev. 73, 16-43 (1966), there are 10 main characteristics of habituation, including (i) spontaneous recovery when the stimulus is withheld after response decrement; dependence on the frequency of stimulation such that (ii) more frequent stimulation results in more rapid and/or more pronounced response decrement and more rapid spontaneous recovery; (iii) within a stimulus modality, the less intense the stimulus, the more rapid and/or more pronounced the behavioral response decrement; (iv) the effects of repeated stimulation may continue to accumulate even after the response has reached an asymptotic level (which may or may not be zero, or no response). This effect of stimulation beyond asymptotic levels can alter subsequent behavior, for example, by delaying the onset of spontaneous recovery.

      These are only a subset of the conditions that have been experimentally observed and therefore a mechanistic model of habituation, in my understanding, should capture the majority of these features and/or discuss the absence of such features from the proposed model.

      Furthermore, the habituated response in steady-state is approximately 20% less than the initial response, which seems to be achieved already after 3-4 pulses; the subsequent change in response amplitude seems to be negligible, although the authors state "after a large number of inputs, the system reaches a time-periodic steady-state". How do the authors justify these minimal decreases in the response amplitude? Does this come from the model parametrization and is there a parameter range where more pronounced habituation responses can be observed?

      The same is true for the information content (Figure 2f) - already at the first pulse, I_U,H ~ 0.7 and only negligibly increases afterwards. In my understanding, during learning, the mutual information between the input and the internal state increases over time and the system extracts from these predictions about its responses. In the model presented by the authors, it seems the system already carries information about the environment which hardly changes with repeated stimulus presentation. The complexity of the signal is also limited, and it is very hard to clarify from the presented results, whether the proposed model can actually explain basic features of habituation, as mentioned above.

      Additionally, there have been two recent models on habituation and I strongly suggest that the authors discuss their work in relation to recent works (bioRxiv 2024.08.04.606534; arXiv:2407.18204).

    3. Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the phenomenon of increasing reduction in activity following repeated stimuli, in the context of its information-theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, the difference between the mutual information between the signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponds to an optimization strategy that maximizes the mutual information between signal and readout in the steady state, minimizes some form of dissipation, and also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation.

      The authors' simplified model might serve as a solid starting point for understanding habituation in different biological contexts as the model is simple enough to allow for some analytic understanding but at the same time exhibits all basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear or are weakly explained:

      (1) It is unclear what the finding of maximal information gain for intermediate habituation strength means for biological systems. Why is information gain as defined in the paper a relevant quantity for an organism/cell? For instance, why is a system with low mutual information after the first stimulus and intermediate mutual information after habituation better than one with consistently intermediate mutual information? Or, in other words, couldn't the system try to maximize the mutual information acquired over the whole time series, e.g., the time series mutual information between the stimulus and readout?

      (2) The model is very similar to (or a simplification of) previous models for adaptation in living systems, e.g., for adaptation in chemotaxis via activity-dependent methylation and demethylation. This should be made clearer.

      (3) It remains unclear why this optimization principle is the most relevant one. While it makes sense to maximize the mutual information between stimulus and readout, there are various choices for what kind of dissipation is minimized. Why was \delta Q_R chosen and not, for instance, \dot{\Sigma}_int or the sum of both? How would the results change in that case? And how different are the results if the mutual information is not calculated for the strong stimulation input statistics but for the background one?

      (4) The comparison to the experimental data is not too strong of an argument in favor of the model. Is the agreement between the model and the experimental data surprising? What other behavior in the PCA space could one have expected in the data? Shouldn't the 1st PC mostly reflect the "features", by construction, and other variability should be due to progressively reduced activity levels?

    4. Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment.

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be a favorable combination.

      While the generic approach of coarse-graining most biological detail is appealing and the results are of broad relevance, some aspects of the conducted studies, the problem setup, and the writing lack clarity and should be addressed:

      (1) The abstract can be further sharpened. Specifically, the "functional role" mentioned at the end can be made more explicit, as it was done in the second-to-last paragraph of the Introduction section ("its functional advantages in terms of information gain and energy dissipation"). In addition, the abstract mentions the testing against experimental measurements of neural responses but does not specify the main takeaways. I suggest the authors briefly describe the main conclusions of their experimental study in the abstract.

      (2) Several clarifications are needed on the treatment of energy dissipation.

      - When substituting the rates in Eq. (1) into the definition of δQ_R above Eq. (10), "σ" does not appear on the right-hand side. Does this mean that one of the rates in the lower pathway must include σ in its definition? Please clarify.

      - I understand that the production of storage molecules has an associated cost σ and hence contributes to dissipation. The dependence of receptor dissipation on , however, is not fully clear. If the environment were static and the memory block was absent, the term with would still contribute to dissipation. What would be the nature of this dissipation?

      - Similarly, in Eq. (9) the authors use the ratio of the rates Γ_{s → s+1} and Γ_{s+1 → s} in their expression for internal dissipation. The first rate corresponds to the synthesis reaction of memory molecules, while the second corresponds to a degradation reaction. Since the second reaction is not the microscopic reverse of the first, what would be the physical interpretation of the log of their ratio? Since the authors already use σ as the energy cost per storage unit, why not use σ times the rate of producing S as a metric for the dissipation rate?

      (3) Impact of the pre-stimulus state. The plots in Figure 2 suggest that the environment was static before the application of repeated stimuli. Can the authors comment on the impact of the pre-stimulus state on the degree of habituation and its optimality properties? Specifically, would the conclusions stay the same if the prior environment had stochastic but aperiodic dynamics?

      (4) Clarification about the memory requirement for habituation. Figure 4 and the associated section argue for the essential role that the storage mechanism plays in habituation. Indeed, Figure 4a shows that the degree of habituation decreases with decreasing memory. The graph also shows that in the limit of vanishingly small Δ⟨S⟩, the system can still exhibit a finite degree of habituation. Can the authors explain this limiting behavior; specifically, why does habituation not vanish in the limit Δ⟨S⟩ -> 0?

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Nicoletti et al. presents a minimal model of habituation, a basic form of non-associative learning, addressing from both dynamical and information-theoretic perspectives how habituation can be realized. The authors identify that negative feedback provided with a slow storage mechanism is sufficient to explain habituation.

      Strengths:

      The authors combine the identification of the dynamical mechanism with information-theoretic measures to determine the onset of habituation and provide a description of how the system can gain maximum information about the environment.

      We thank the reviewer for highlighting the strength of our work.

      Weaknesses:

      I have several main concerns/questions about the proposed model for habituation and its plausibility. In general, habituation does not only refer to a decrease in the responsiveness upon repeated stimulation but as Thompson and Spencer discussed in Psych. Rev. 73, 16-43 (1966), there are 10 main characteristics of habituation, including (i) spontaneous recovery when the stimulus is withheld after response decrement; dependence on the frequency of stimulation such that (ii) more frequent stimulation results in more rapid and/or more pronounced response decrement and more rapid spontaneous recovery; (iii) within a stimulus modality, the less intense the stimulus, the more rapid and/or more pronounced the behavioral response decrement; (iv) the effects of repeated stimulation may continue to accumulate even after the response has reached an asymptotic level (which may or may not be zero, or no response). This effect of stimulation beyond asymptotic levels can alter subsequent behavior, for example, by delaying the onset of spontaneous recovery.

      These are only a subset of the conditions that have been experimentally observed and therefore a mechanistic model of habituation, in my understanding, should capture the majority of these features and/or discuss the absence of such features from the proposed model.

      We are grateful to the reviewer for pointing out these aspects of habituation that we overlooked in the previous version of our manuscript. Indeed, our model is able to capture most of these 10 observed behaviors, specifically: 1) habituation; 2) spontaneous recovery; 3) potentiation of habituation; 4) frequency sensitivity; and 5) intensity sensitivity. Here, we are following the same terminology employed in bioRxiv 2024.08.04.606534, the paper highlighted by the referee. Regarding hallmark 6) subliminal accumulation, we believe that our model can capture it as well, but more analyses are needed to substantiate this claim. We will include the discussion of these points in the revised version.

      Notably, in line with the discussion in bioRxiv 2024.08.04.606534, we also think that feature 10) long-term habituation, is ambiguous and its appearance might be simply related to the other features discussed above. In the revised version, we will detail our take on this aspect in relation to the presented model.

      All other hallmarks require the presence of multiple stimuli and, as a consequence, they cannot be observed within our model, but are interesting lines of research for future investigations. We believe that this addition will help clarify the validity of the model and the relevance of our result, consequently improving the quality of our manuscript.

      Furthermore, the habituated response in steady-state is approximately 20% less than the initial response, which seems to be achieved already after 3-4 pulses; the subsequent change in response amplitude seems to be negligible, although the authors state "after a large number of inputs, the system reaches a time-periodic steady-state". How do the authors justify these minimal decreases in the response amplitude? Does this come from the model parametrization and is there a parameter range where more pronounced habituation responses can be observed?

      The referee is correct, but this is solely a consequence of the specific set of parameters we selected, a choice we made for visualization purposes. In the next version, when different emerging behaviors characterizing habituation are discussed, we will also present a set of parameters for which habituation can be better appreciated, justifying our new choice.

      We stated that the time-periodic steady-state is reached “after a large number of stimuli” from a mathematical perspective. However, by using a habituation threshold, as defined in bioRxiv 2024.08.04.606534 for example, we can say that the system is habituated after a few stimuli for the set of parameters selected in the first version of the manuscript. We will also discuss this aspect in the Supplemental Material of the revised version, as it will also be important to appreciate the hallmarks of habituation listed above.

      The same is true for the information content (Figure 2f) - already at the first pulse, I_U,H ~ 0.7 and only negligibly increases afterwards. In my understanding, during learning, the mutual information between the input and the internal state increases over time and the system extracts from these predictions about its responses. In the model presented by the authors, it seems the system already carries information about the environment which hardly changes with repeated stimulus presentation. The complexity of the signal is also limited, and it is very hard to clarify from the presented results, whether the proposed model can actually explain basic features of habituation, as mentioned above.

      The point about information is more subtle. We can definitely choose a set of parameters for which the information gain is higher and we will show it in the Supplemental Material of the revised version. However, as the reviewer correctly points out, it is difficult to give an interpretation of the specific value of I_U,H for such a minimal model.

      We also remark that, since the readout population and the receptor both undergo a fast dynamics (with appropriate timescales as discussed in the text), we are not observing the transient gain of information associated with the first stimulus and, as such, the mutual information presents a discontinuous behavior resembling the dynamics of the readout.

      Additionally, there have been two recent models on habituation and I strongly suggest that the authors discuss their work in relation to recent works (bioRxiv 2024.08.04.606534; arXiv:2407.18204).

      We thank the reviewer for pointing out these relevant references. We will discuss analogies and differences in the revised version of the main text. The main difference is the fact that information-theoretic aspects of habituation are not discussed in the presented references, while the idea of this work is to elucidate exactly the interplay between information gain and habituation dynamics.

      Reviewer #2 (Public review):

      In this study, the authors aim to investigate habituation, the phenomenon of increasing reduction in activity following repeated stimuli, in the context of its information-theoretic advantage. To this end, they consider a highly simplified three-species reaction network where habituation is encoded by a slow memory variable that suppresses the receptor and therefore the readout activity. Using analytical and numerical methods, they show that in their model the information gain, the difference between the mutual information between the signal and readout after and before habituation, is maximal for intermediate habituation strength. Furthermore, they demonstrate that the Pareto front corresponds to an optimization strategy that maximizes the mutual information between signal and readout in the steady state, minimizes some form of dissipation, and also exhibits similar intermediate habituation strength. Finally, they briefly compare predictions of their model to whole-brain recordings of zebrafish larvae under visual stimulation.
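
      The information gain described above can be illustrated with a minimal sketch, assuming a toy discrete signal and readout rather than the continuous variables of the manuscript. The joint probability tables below are hypothetical and chosen only to show how the difference of two mutual informations is computed; they are not taken from the paper.

```python
import math

def mutual_information(joint):
    """Mutual information in bits of a joint probability table p(x, y)."""
    px = [sum(row) for row in joint]            # marginal over columns
    py = [sum(col) for col in zip(*joint)]      # marginal over rows
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                           # 0 * log(0) is taken as 0
                mi += p * math.log2(p / (px[i] * py[j]))
    return mi

# Hypothetical tables: rows = signal (weak, strong), columns = readout (low, high).
joint_first = [[0.30, 0.20], [0.20, 0.30]]        # after the first stimulus
joint_habituated = [[0.40, 0.10], [0.10, 0.40]]   # after habituation

mi_first = mutual_information(joint_first)
mi_habituated = mutual_information(joint_habituated)

# Information gain as described in the review: MI after minus MI before habituation.
information_gain = mi_habituated - mi_first
print(f"MI first = {mi_first:.3f} bits, MI habituated = {mi_habituated:.3f} bits")
print(f"information gain = {information_gain:.3f} bits")
```

      In this toy setting the gain is positive simply because the second table is more strongly correlated; whether a large gain is preferable to consistently intermediate mutual information is exactly the question raised in point (1).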

      The authors' simplified model might serve as a solid starting point for understanding habituation in different biological contexts as the model is simple enough to allow for some analytic understanding but at the same time exhibits all basic properties of habituation in sensory systems. Furthermore, the authors' finding of maximal information gain for intermediate habituation strength via an optimization principle is, in general, interesting. However, the following points remain unclear or are weakly explained:

      We thank the reviewer for deeming our work interesting and for considering it a solid starting point for understanding habituation in biological systems.

      (1) It is unclear what the finding of maximal information gain for intermediate habituation strength means for biological systems. Why is information gain as defined in the paper a relevant quantity for an organism/cell? For instance, why is a system with low mutual information after the first stimulus and intermediate mutual information after habituation better than one with consistently intermediate mutual information? Or, in other words, couldn't the system try to maximize the mutual information acquired over the whole time series, e.g., the time series mutual information between the stimulus and readout?

      This is an important and delicate aspect to discuss. We considered the mutual information with a prolonged stimulation when building the Pareto front, by maximizing this quantity while minimizing the dissipation. The observation that the Pareto front lies in the vicinity of the maximum of the information gain hints at the fact that reducing the information gain by increasing the mutual information at each stimulation will require more energy. However, we did not thoroughly explore this aspect by considering all sources of dissipation and the fact that habituation is, anyway, a dynamical phenomenon. In the revised version, we will clarify this point, extending our analyses.

      We would like to add that, from a naive perspective, while the first stimulation will necessarily trigger a certain mutual information, multiple observations of the same stimulus must be reflected in accumulated information that consequently drives the onset of observed dynamical behaviors, such as habituation.

      (2) The model is very similar to (or a simplification of) previous models for adaptation in living systems, e.g., for adaptation in chemotaxis via activity-dependent methylation and demethylation. This should be made clearer.

      We apologize for having missed this point. Our choice has been motivated by the fact that we wanted to avoid any confusion between the usual definition of (perfect) adaptation and habituation. At any rate, we will add this clarification in the revised version.

      (3) It remains unclear why this optimization principle is the most relevant one. While it makes sense to maximize the mutual information between stimulus and readout, there are various choices for what kind of dissipation is minimized. Why was \delta Q_R chosen and not, for instance, \dot{\Sigma}_int or the sum of both? How would the results change in that case? And how different are the results if the mutual information is not calculated for the strong stimulation input statistics but for the background one?

      We thank the referee for giving us the opportunity to deepen this aspect of the manuscript. We decided to minimize \delta Q_R since this dissipation is unavoidable. In fact, considering the existence of two different pathways implementing sensing and feedback, the presence of any input will result in a dissipation produced by the receptor. This energy consumption is reflected in \delta Q_R. Conversely, the dissipation associated with the storage is always zero in the limit of a fast memory. However, we know that such a limit is pathological and leads to no habituation. As a consequence, in the revised version we will discuss other choices for our optimization approach, along with their potentialities and limitations.

      The dependence of the Pareto front on the stimulus strength is shown in the Supplemental Material, but not in relation to habituation and information gain. We will strengthen this part in the revised version of the manuscript, elaborating more on the connection between optimality, information gain, and dynamical behavior.

      (4) The comparison to the experimental data is not too strong of an argument in favor of the model. Is the agreement between the model and the experimental data surprising? What other behavior in the PCA space could one have expected in the data? Shouldn't the 1st PC mostly reflect the "features", by construction, and other variability should be due to progressively reduced activity levels?

      The agreement between data and model is not surprising - we agree on this - since the data exhibit habituation. However, the fact that, without any explicit biological details, our minimal model is able to capture the features of a complex neural system just by looking at the PCs is non-trivial. The 1st PC only reflects the feature that captures most of the variance of the data and, as such, it is difficult to have a priori expectations on what it should represent. Depending on the behavior of higher-order PCs, we may include them in the revised version if any interesting results arise.

      Reviewer #3 (Public review):

      The authors use a generic model framework to study the emergence of habituation and its functional role from information-theoretic and energetic perspectives. Their model features a receptor, readout molecules, and a storage unit, and as such, can be applied to a wide range of biological systems. Through theoretical studies, the authors find that habituation (reduction in average activity) upon exposure to repeated stimuli should occur at intermediate degrees to achieve maximal information gain. Parameter regimes that enable these properties also result in low dissipation, suggesting that intermediate habituation is advantageous both energetically and for the purpose of retaining information about the environment.

      A major strength of the work is the generality of the studied model. The presence of three units (receptor, readout, storage) operating at different time scales and executing negative feedback can be found in many domains of biology, with representative examples well discussed by the authors (e.g. Figure 1b). A key takeaway demonstrated by the authors that has wide relevance is that large information gain and large habituation cannot be attained simultaneously. When energetic considerations are accounted for, large information gain and intermediate habituation appear to be a favorable combination.

      We thank the referee for this positive assessment of our work and its generality.

      While the generic approach of coarse-graining most biological detail is appealing and the results are of broad relevance, some aspects of the conducted studies, the problem setup, and the writing lack clarity and should be addressed:

      (1) The abstract can be further sharpened. Specifically, the "functional role" mentioned at the end can be made more explicit, as it was done in the second-to-last paragraph of the Introduction section ("its functional advantages in terms of information gain and energy dissipation"). In addition, the abstract mentions the testing against experimental measurements of neural responses but does not specify the main takeaways. I suggest the authors briefly describe the main conclusions of their experimental study in the abstract.

      We thank the referee for this suggestion. The revised version will present a modified abstract in line with the reviewer’s proposal.

      (2) Several clarifications are needed on the treatment of energy dissipation.

      - When substituting the rates in Eq. (1) into the definition of δQ_R above Eq. (10), "σ" does not appear on the right-hand side. Does this mean that one of the rates in the lower pathway must include σ in its definition? Please clarify.

      We apologize to the referee for this typo. Indeed, \sigma sets the energy scale of the feedback and, as such, it appears in the energetic driving given by the feedback on the receptor, i.e., together with \kappa in Eq. (1). We will fix this issue in the revised version. Moreover, we will check the entire manuscript to be sure that all formulas are consistent.

      - I understand that the production of storage molecules has an associated cost σ and hence contributes to dissipation. The dependence of receptor dissipation on <H>, however, is not fully clear. If the environment were static and the memory block was absent, the term with <H> would still contribute to dissipation. What would be the nature of this dissipation?

      In the spirit of building a paradigmatic minimal model with a thermodynamic meaning, we considered H to act as an external thermodynamic driving. Since this driving acts on a different pathway with respect to the one affected by the storage, the receptor is driven out of equilibrium by its presence. By eliminating the memory block, we would also be necessarily eliminating the presence of the pathway associated with the storage effect (“internal pathway” in the manuscript). In this case, the receptor is a 2-state, 1-pathway system and, as such, it always satisfies an effective detailed balance. As a consequence, the definition of \delta Q_R reported in the manuscript does not hold anymore and the receptor does not exhibit any dissipation. Our choice to model two different pathways has been biologically motivated. We will make this crucial aspect clearer in the revised manuscript.

      - Similarly, in Eq. (9) the authors use the ratio of the rates Γ_{s → s+1} and Γ_{s+1 → s} in their expression for internal dissipation. The first rate corresponds to the synthesis reaction of memory molecules, while the second corresponds to a degradation reaction. Since the second reaction is not the microscopic reverse of the first, what would be the physical interpretation of the log of their ratio? Since the authors already use σ as the energy cost per storage unit, why not use σ times the rate of producing S as a metric for the dissipation rate?

      In the current version of the manuscript, we employed the scheme of a controlled birth and death process to model the coupled process of readout and storage production. Since we are not dealing with a detailed biochemical underlying network, we used this coarse-grained description to capture the main features of the dynamics. In this sense, the considered reactions produce and destroy a molecule from a certain pool even if they are controlled in different ways by the readout. However, we completely agree with the point of view of the referee and will analyze our results following their suggestion.

      (3) Impact of the pre-stimulus state. The plots in Figure 2 suggest that the environment was static before the application of repeated stimuli. Can the authors comment on the impact of the pre-stimulus state on the degree of habituation and its optimality properties? Specifically, would the conclusions stay the same if the prior environment had stochastic but aperiodic dynamics?

      The initial stimulus is indeed stochastic with an average constant in time. Model response depends on the pre-stimulus level, since it also sets the stationary storage concentration before the first “strong” stimulation arrives. This dependence is not crucial for our result but deserves proper discussion, as the referee correctly pointed out. We will clarify this point in the revised version of this study.

      (4) Clarification about the memory requirement for habituation. Figure 4 and the associated section argue for the essential role that the storage mechanism plays in habituation. Indeed, Figure 4a shows that the degree of habituation decreases with decreasing memory. The graph also shows that in the limit of vanishingly small Δ⟨S⟩, the system can still exhibit a finite degree of habituation. Can the authors explain this limiting behavior; specifically, why does habituation not vanish in the limit Δ⟨S⟩ -> 0?

      We apologize for the lack of clarity here. Actually, Δ⟨S⟩ is not strictly zero, but equal to 0.15% at the final point. However, due to rounding, this appears as 0% in the plot, and we will fix it in the revised version. Let us note that the fact that Δ⟨S⟩ is small signals a nonlinear dependence of Δ⟨U⟩ on Δ⟨S⟩, but no contradiction. We will clarify this aspect in the revised version.

    1. eLife Assessment

      The paper describes a novel approach for inferring features of synaptic networks from recordings of individual cells within the network. The paper will be a valuable contribution to those studying central pattern generators, including those involved in respiration. However, the theoretical approach to drawing inferences regarding the underlying synaptic currents is incomplete as it relies on unsupported simplifying assumptions.

    2. Reviewer #1 (Public review):

      Summary:

      The paper develops a phase method to obtain the excitatory and inhibitory afferents to certain neuron populations in the brainstem. The inferred contributions are then compared to the results of voltage clamp and current clamp experiments measuring the synaptic contributions to post-I, aug-E, and ramp-I neurons.

      Strengths:

      The electrophysiology part of the paper is sound and reports novel features with respect to earlier work by JC Smith et al 2012, Paton et al 2022 (and others) who have mapped circuits of the respiratory central pattern generator. Measurements on ramp-I neurons, late-I neurons, and two types of post-I neurons in Figure 2 besides measurements of synaptic inputs to these neurons in Figure 5 are to my knowledge new.

      Weaknesses:

      The phase method for inferring synaptic conductances fails to convince. The method rests on many layers of assumptions and the inferred connections in Figure 4 remain speculative. To be convincing, such a method ought to be tested first on a model CPG with known connectivity to assess how good it is at inferring known connections back from the analysis of spatio-temporal oscillations. For biological data, once the network connectivity has been inferred as claimed, the straightforward validation is to reconstruct the experimental oscillations (Figure 2) noting that Rybak et al (Rybak, Paton Schwaber J. Neurophysiol. 77, 1994 (1997)) have already derived models for the respiratory neurons.

      The transformation from time to phase space, unlike in the Kuramoto model, is not justified here (Line 94) and is wrong. The underpinning idea that "the synaptic conductances depend on the cycle phase and not on time explicitly" is flawed because synapses have characteristic decay times and delays to response which remain fixed when the period of network oscillations increases. Synaptic properties depend on time and not on phase in the network. One major consequence relevant to the present identification of excitatory or inhibitory behaviour is that it cannot account for the change in the behaviour of inhibitory synapses - from inhibitory to excitatory action - when the inhibitory decay time becomes commensurable with the period of network oscillations (Wang & Buzsaki, Journal of Neuroscience 16, 6402 (1996); van Vreeswijk et al., J. Comp. Neuroscience 1, 313 (1994); Borgers and Kopell, Neural Comput. 15, 509 (2003)). In addition, even small delays in the inhibitory synapse response relative to the pre-synaptic action potential also produce in-phase synchronization (Chauhan et al., Sci. Rep. 8, 11431 (2018); Borgers and Kopell, Neural Comput. 15, 509 (2003)). The present assumptions are far too simplistic because these commensurability effects cannot be accounted for with a single parameter like the network phase. There is therefore little confidence that this model can reliably distinguish excitatory from inhibitory synapses when their dynamic properties are not properly taken into account.

      Line 82, Equation 1 makes extremely crude assumptions that the displacement current (C dV/dt) is negligible and that the ion channel currents are all negligible. Vm(t) is also not defined. The assumption that the activation/inactivation times of all ion channels are small compared to the 10-20 ms decay time of synaptic currents is not true in general. The same holds for the displacement current. The leak conductance is typically g ~ 0.05-0.09 mS/cm^2 while C ~ 1 µF/cm^2. Therefore the ratio C/g_leak is in the 10-20 ms range - the same as the typical neurotransmitter docking time in synapses.
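
      The arithmetic behind the quoted 10-20 ms range can be checked with a short sketch. It assumes the leak conductance is expressed in mS/cm^2 and the capacitance in µF/cm^2, so that their ratio comes out directly in milliseconds; the function name is illustrative.

```python
def membrane_time_constant_ms(c_uF_per_cm2, g_mS_per_cm2):
    """Passive membrane time constant tau = C / g.

    With C in uF/cm^2 and g in mS/cm^2, the ratio is in ms
    (1e-6 F / 1e-3 S = 1e-3 s).
    """
    return c_uF_per_cm2 / g_mS_per_cm2

C_MEMBRANE = 1.0  # uF/cm^2, value quoted in the review

# Leak conductance range quoted in the review, in mS/cm^2.
for g_leak in (0.05, 0.09):
    tau = membrane_time_constant_ms(C_MEMBRANE, g_leak)
    print(f"g_leak = {g_leak} mS/cm^2 -> tau = {tau:.1f} ms")
```

      The two ends of the conductance range give roughly 20 ms and 11 ms, which is comparable to the synaptic decay times mentioned above and is why the capacitive term cannot simply be dropped on timescale grounds.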

      Models of brainstem CPG circuits have been known to exist for decades: JC Smith et al 2012, Paton et al 2022, Bellingham Clin. Exp. Pharm. And Physiol. 25, 847 (1998); Rubin et al., J. Neurophysiol. 101, 2146 (2009) among others. The present paper does not discuss existing knowledge on respiratory networks and gives the impression of reinventing the wheel from scratch. How will this paper add to existing knowledge?

    3. Reviewer #2 (Public review):

      Summary:

      By measuring intracellular changes in membrane voltage from a single neuron of the medulla the authors describe a method for determining the balance of excitatory and inhibitory synaptic drive onto a single neuron within this important brain region.

      Strengths:

      This approach could be valuable in describing the microcircuits that generate rhythms within this respiratory control centre. This method could more generally be used to enable microcircuits to be studied without the need for time-consuming anatomical tracing or other more involved electrophysiological techniques.

      Weaknesses:

      This approach involves assuming the reversal potential that is associated with the different permeant ions that underlie the excitation and inhibition as well as the application of Ohm's law to estimate the contribution of excitatory and inhibitory conductances. My first concern is that this approach relies on a linear I-V relationship between the measured voltage and the estimated reversal potential. However, open rectification is a feature of any I-V relationship generated by asymmetric distributions of ions (see the GHK current equation) and will therefore be a particular issue for the inhibition resulting from asymmetrical Cl- ion gradients across GABA-A receptors. The mixed cation conductance that underlies most synaptic excitation will also generate a non-linear I-V relationship due to the inward rectification associated with the polyamine block of AMPA receptors. Could the authors please speculate what impact these non-linearities could have on results obtained using their approach?
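
      The open rectification mentioned above can be sketched numerically with the GHK current equation. The chloride concentrations (10 mM inside, 130 mM outside), the temperature (310 K), and the unit permeability are illustrative assumptions, not values from the manuscript; currents are therefore in arbitrary units and only the ratio of chord conductances is meaningful.

```python
import math

R = 8.314462618   # gas constant, J/(mol K)
F = 96485.33212   # Faraday constant, C/mol

def ghk_current(v_mV, c_in, c_out, z, temp_K=310.0, perm=1.0):
    """GHK current for one ion species (arbitrary units, outward positive)."""
    u = v_mV * 1e-3 * F / (R * temp_K)   # dimensionless membrane voltage
    x = z * u
    if abs(x) < 1e-9:                    # analytic limit at V = 0
        return perm * z * F * (c_in - c_out)
    return perm * z**2 * F * u * (c_in - c_out * math.exp(-x)) / (1.0 - math.exp(-x))

def nernst_mV(c_in, c_out, z, temp_K=310.0):
    """Reversal (Nernst) potential in mV."""
    return 1e3 * R * temp_K / (z * F) * math.log(c_out / c_in)

# Assumed chloride gradient (mM) and valence; hypothetical values for illustration.
CL_IN, CL_OUT, Z_CL = 10.0, 130.0, -1
e_cl = nernst_mV(CL_IN, CL_OUT, Z_CL)

def chord_conductance(v_mV):
    """I / (V - E): constant for an ohmic channel, voltage dependent under GHK."""
    return ghk_current(v_mV, CL_IN, CL_OUT, Z_CL) / (v_mV - e_cl)

g_depol = chord_conductance(e_cl + 20.0)   # 20 mV above reversal
g_hyper = chord_conductance(e_cl - 20.0)   # 20 mV below reversal
print(f"E_Cl = {e_cl:.1f} mV, g_depol / g_hyper = {g_depol / g_hyper:.2f}")
```

      Under these assumptions the chord conductance is larger on the depolarized side of the reversal potential than on the hyperpolarized side, so a conductance estimated with Ohm's law would depend on the voltage at which it is measured.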

      This approach has similarities to earlier studies undertaken in the visual cortex that estimated the excitatory and inhibitory synaptic conductance changes that contributed to membrane voltage changes during receptive field stimulation. However, these approaches also involved the recording of transmembrane current changes during visual stimulation that were undertaken in voltage-clamp at various command voltages to estimate the underlying conductance changes. Molkov et al have attempted to essentially deconvolve the underlying conductance changes without this information and I am concerned that this simply may not be possible. The current balance equation (1) cited in this study is based on the parallel conductance model developed by Hodgkin & Huxley. However, one key element of the HH equations is the inclusion of an estimate of the capacitive current generated due to the change in voltage across the membrane capacitance. I would always consider this to be the most important motivation for the development of the voltage-clamp technique in the 1940s. Indeed, without subtraction of the membrane capacitance, it is not possible to isolate the transmembrane current in the way that previous studies have done. In the current study, I feel it is important that the voltage change due to capacitive currents is taken into consideration in some way before the contribution of the underlying conductance changes is inferred.

      Studies using acute slicing preparations to examine circuit effects have often been limited to the study of small microcircuits - especially feedforward and feedback interneuron circuits. It is widely accepted that any information gained from this approach will always be compromised by the absence of patterned afferent input from outside the brain region being studied. In this study, descending control from the Pons and the neocortex will not be contributing much to the synaptic drive and ascending information from respiratory muscles will also be absent completely. This may not have been such a major concern if this study was limited to demonstrating the feasibility of a methodological approach. However, this limitation does need to be considered when using an approach of this type to speculate on the prevalence of specific circuit motifs within the medulla (Figure 4). Therefore, I would argue that some discussion of this limitation should be included in this manuscript.

    1. eLife Assessment

      The study by Power and colleagues is important as elucidating the dynamic immune responses to photoreceptor damage in vivo potentiates future work in the field to better understand the disease process. However, the evidence supporting the authors' claims is incomplete. The current manuscript would further benefit from validating its conclusions with additional supporting data from earlier time points (6 to 12 hours), additional markers to characterize neutrophils, larger sample sizes to strengthen the analysis, and evaluation of immune responses in mice with a stronger laser ablation, as well as further evidence to distinguish resident microglia vs. infiltrating macrophages due to the BRB breakdown. The authors should reorganize the article to make it easier and more straightforward to follow.

    2. Reviewer #1 (Public review):

      Summary:

      The authors aimed to investigate the interaction between tissue-resident immune cells (microglia) and circulating systemic neutrophils in response to acute, focal retinal injury. They induced retinal lesions using 488 nm light to ablate photoreceptor (PR) outer segments, then utilized various imaging techniques (AOSLO, SLO, and OCT) to study the dynamics of fluorescent microglia and neutrophils in mice over time. Their findings revealed that while microglia showed a dynamic response and migrated to the injury site within a day, neutrophils were not recruited to the area despite being nearby. Post-mortem confocal microscopy confirmed these in vivo results. The study concluded that microglial activation does not recruit neutrophils in response to acute, focal photoreceptor loss, a scenario common in many retinal diseases.

      Strengths:

      The primary strength of this manuscript lies in the techniques employed.

      In this study, the authors utilized advanced Adaptive Optics Scanning Laser Ophthalmoscopy (AOSLO) to document immune cell interactions in the retina accurately. AOSLO's micron-level resolution and enhanced contrast, achieved through near-infrared (NIR) light and phase-contrast techniques, allowed visualization of individual immune cells without extrinsic dyes. This method combined confocal reflectance, phase-contrast, and fluorescence modalities to reveal various cell types simultaneously. Confocal AOSLO tracked cellular changes with less than 6 μm axial resolution, while phase-contrast AOSLO provided detailed views of vascular walls, blood cells, and immune cells. Fluorescence imaging enabled the study of labeled cells and dyes throughout the retina. These techniques, integrated with conventional histology and Optical Coherence Tomography (OCT), offered a comprehensive platform to visualize immune cell dynamics during retinal inflammation and injury.

      Weaknesses:

      One significant weakness of the manuscript is the use of Cx3cr1GFP mice to specifically track GFP-expressing microglia. While this model is valuable for identifying resident phagocytic cells when the blood-retinal barrier (BRB) is intact, it is important to note that recruited macrophages also express the same marker following BRB breakdown. This overlap complicates the interpretation of results and makes it difficult to distinguish between the contributions of microglia and infiltrating macrophages, a point that is not addressed in the manuscript.

      Another major concern is the time point chosen for analyzing the neutrophil response. The authors assess neutrophil activity 24 hours after injury, which may be too late to capture the initial inflammatory response. This delayed assessment could overlook crucial early dynamics that occur shortly after injury, potentially impacting the overall findings and conclusions of the study.

    3. Reviewer #2 (Public review):

      Summary:

      This study uses in vivo multimodal high-resolution imaging to track how microglia and neutrophils respond to light-induced retinal injury from soon after injury to 2 months post-injury. The in vivo imaging finding was subsequently verified by an ex vivo study. The results suggest that despite the highly active microglia at the injury site, neutrophils were not recruited in response to acute light-induced retinal injury.

      Strengths:

      An extremely thorough examination of the cellular-level immune activity at the injury site. In vivo imaging observations being verified using ex vivo techniques is a strong plus.

      Weaknesses:

      This paper is extremely long, and in the perspective of this reviewer, needs to be better organized.

      Study weakness: though the finding prompts more questions and future studies, the findings discussed in this paper are potentially important for us to understand how the immune cells respond differently to different severity levels of injury.

    4. Reviewer #3 (Public review):

      Summary:

      This work investigated the immune response in the murine retina after focal laser lesions. These lesions are made with close to 2 orders of magnitude lower laser power than the more prevalent choroidal neovascularization model of laser ablation. Histology and OCT together show that the laser insult is localized to the photoreceptors and spares the inner retina, the vasculature, and the pigment epithelium. As early as 1 day after injury, a loss of cell bodies in the outer nuclear layer is observed. This is accompanied by strong microglial proliferation at the site of injury in the outer retina where microglia do not typically reside. The injury did not seem to result in the extravasation of neutrophils from the capillary network, which constitutes one of the main findings of the paper. The demonstrated paradigm of studying the immune response and potentially retinal remodeling in the future in vivo is valuable and would appeal to a broad audience in visual neuroscience. However, there are some issues with the conclusions drawn from the data and analysis that can be addressed to further bolster the manuscript.

      Strengths:

      Adaptive optics imaging of the murine retina is cutting edge and enables non-destructive visualization of fluorescently labeled cells in the milieu of retinal injury. As may be obvious, this in vivo approach is beneficial for studying fast and dynamic immune processes on a local time scale - minutes and hours, and also for the longer days-to-months follow-up of retinal remodeling as demonstrated in the article. In certain cases, the in vivo findings are corroborated with histology.

      The analysis is sound and accompanied by stunning video and static imagery. A few different sets of mouse models are used: (a) two different mouse lines, each with a fluorescent tag for neutrophils and microglia, and (b) two different models of inflammation, endotoxin-induced uveitis (EAU) and laser ablation, to study differences in the immune interaction.

      One of the major advances in this article is the development of the laser ablation model for 'mild' retinal damage as an alternative to the more severe neovascularization models. While not directly shown in the article, this model would potentially allow for controlling the size, depth, and severity of the laser injury opening interesting avenues for future study.

      Weaknesses:

      (1) It is unclear based on the current data/study to what extent the mild laser damage phenotype is generalizable to disease phenotypes. The outer nuclear cell loss of 28% and a complete recovery in 2 months would seem quite mild, thus the generalizability in terms of immune-mediated response in the face of retinal remodeling is not certain, specifically whether the key finding regarding the lack of neutrophil recruitment will be maintained with a stronger laser ablation.

      (2) Mouse numbers and associated statistics are insufficient to draw strong conclusions on the activity of neutrophils. Some examples are below:

      a) 2 catchup mice and 2 positive control EAU mice are used to draw inferences about immune-mediated activity in response to injury. If the goal was to show 'feasibility' of imaging these mouse models for the purposes of tracking specific cell type behavior, the case is sufficiently made and already published by the authors earlier. It is possible that a larger sample size would alter the conclusion.

      b) There are only 2 examples of extravasated neutrophils in the entire article, shown in the positive control EAU model. With the rare extravasation events of these cells and their high-speed motility, the chance of observing their exit from the vasculature is likely low overall, therefore the general conclusions made about their recruitment or lack thereof are not justified by these limited examples shown.

      c) In Figure 3, the 3-day time point post laser injury shows an 18% reduction in the density of ONL nuclei (p-value of 0.17 compared to baseline). In the case of neutrophils, it is noted that "Control locations (n = 2 mice, 4 z-stacks) had 15 ± 8 neutrophils per sq.mm of retina whereas lesioned locations (n = 2 mice, 4 z-stacks) had 23 ± 5 neutrophils per sq.mm of retina (Figure 10b). The difference between control and lesioned groups was not statistically significant (p = 0.19)." These data both come from histology. While the p-values (0.17 and 0.19) are similar, in the first case a reduction in ONL cell density is concluded while in the latter, no difference in neutrophil density is inferred in the lesioned case compared to control. Why is there a difference in the interpretation where the same statistical test and methodology are used in both cases? Besides this statistical nuance, is there an alternate possibility that there is an increased, albeit statistically insignificant, concentration of circulating neutrophils in the lesioned model? The increase is nearly 50% (15 ± 8 vs. 23 ± 5 neutrophils per sq.mm) and the reader may wonder if a larger animal number might skew the statistic towards significance.

      (3) The conclusions on the relative activity of neutrophils and microglia come from separate animals. The reader may wonder why simultaneous imaging of microglia and neutrophils is not shown in either the EAU mice or the fluorescently labeled catchup mice where the non-labeled cell type could possibly be imaged with phase-contrast as has been shown by the authors previously. One might suspect that the microglia dynamics are not substantially altered in these mice compared to the CX3CR1-GFP mice subjected to laser lesions, but for future applicability of this paradigm of in vivo imaging assessment of the laser damage model, including documenting the repeatability of the laser damage model and the immune cell behavior, acquiring these data in the same animals would be critical.

      (4) Along the same lines as above, the phase contrast ONL images at time points from 3-day to 2-month post laser injury are not shown and the absence of these data is not addressed. The data are missing only for the in vivo imaging mice; the corresponding measurements were made in histology, which adequately conveys the time-course of cell loss in the ONL. It is suggested that the authors elaborate on the reason for excluding these data and the simultaneous imaging of microglia and neutrophils mentioned above. Also, it would be valuable to further qualify and check the claims in the Discussion that "ex vivo analysis confirms in vivo findings" and "Microglial/neutrophil discrimination using label-free phase contrast"
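
      The sample-size concern in item c) above can be made concrete with a rough calculation. The sketch below is illustrative only and is not the authors' analysis. It assumes the quoted "±" values are standard deviations (the review does not say whether they are SD or SEM), assumes equal group sizes, and uses a normal approximation, which tends to underestimate the required n for small samples. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference, pooling the two SDs (equal group sizes assumed)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_b - mean_a) / pooled_sd


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided, two-sample comparison.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / abs(d)) ** 2)


# Values quoted in the review: control 15 +/- 8, lesioned 23 +/- 5 neutrophils per sq.mm.
# ASSUMPTION: +/- is a standard deviation.
d = cohens_d(15, 8, 23, 5)
print(f"effect size d = {d:.2f}, approx. n per group = {n_per_group(d)}")
```

      Under these assumptions, the observed difference corresponds to a large standardized effect, yet detecting it reliably would take more than the 4 z-stacks per group reported, which is consistent with the reviewer's point that a non-significant p-value here does not establish the absence of a difference.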

    1. eLife Assessment

      This useful study combines multiplexed RNA-FISH with downstream analyses and modelling to describe novel dendritic mRNA distribution and behavioural features. Although the downstream analysis pipeline is novel, the results from this study are as of yet incomplete. Further inclusion of key missing controls, further work to better assess the physiological relevance, or additional modelling to expand their conclusions would make this work of greater interest to RNA biologists.

    2. Reviewer #1 (Public review):

      Summary:

      Characterizing the molecular and spatial organization of dendritically localized RNAs is an important endeavor as the authors nicely articulate in their abstract and introduction. In particular, identifying patterns of mRNA distribution and colocalization between groups of RNAs could characterize new mechanisms of transport and/or reveal new functional relationships between RNAs. However, it's not clear to me how much the current study addresses those gaps in knowledge. The manuscript by Kim et al uses 8 overlapping combinations of 3-color fluorescence in situ hybridization to characterize the spatial distributions and pairwise colocalizations of six previously uncharacterized dendritically localized RNAs in cultured neurons (15 DIV). The strength of the work is in the graph-based analyses of individual RNA distances from the soma, but the conclusion reached, that spatial distributions vary per dendritic RNA, has been well known since the early 2000s (as reviewed in Schuman and Steward, 2001 & 2003); paradoxically, the authors also show that dendritic length can account for these differences. The significance of the relationship between spatial distribution and dendritic morphology is not clear to me, as distinct spatial distribution patterns (i.e. proximal expression then drop off) have been clearly shown in intact circuits with homogeneity in dendrite length governed by neuropil laminae. The colocalization results are intriguing but as currently presented they lack sufficient control analyses and contextualization to be compelling. In general, the results of the manuscript are potentially interesting but unnecessarily difficult to follow both in text and figure presentation.

      Major comments:

      The authors state that their data expand upon our understanding of dendritic RNA spatial distributions by adding high-resolution data for six newly characterized dendritic RNAs. While this is true, the lack of data for a well-known, previously characterized RNA makes it difficult for the reader to contextualize how these new data on six dendritic RNAs fit in with our understanding of the dendritic RNAs with well-described spatial distributions and colocalization analyses (Camk2a, Actb, Map1b, etc). For example, how do we interpret the 7-fold higher colocalization values between RNAs in this manuscript compared to the results of Batish et al (as referred to in the paper)? Is it because these RNAs are fundamentally different, or is it because of other experimental factors/conditions? The spatial distribution patterns described in this manuscript differ from those of Fonkeu et al, but an alternative explanation is that Fonkeu et al modeled based on Camk2a, not the six genes studied here. Is it possible that these six RNAs have similar distribution patterns (as shown) whereby dendritic morphology impacts distribution more than individual differences but inclusion of dendritic RNAs with demonstrably different distributions (Camk2a/distal localization vs Map2/proximal localization) would alter the results?

    3. Reviewer #2 (Public review):

      In the manuscript by Kim et al titled, "Characterizing the Spatial Distribution of Dendritic RNA at Single Molecule Resolution," the authors perform multiplex single-molecule FISH in cultured neurons, along with analysis and modeling, to show the spatial features, including differing mRNA densities between soma and dendrites, dendritic length-related distributions and clustering, of multiple mRNAs in dendrites. Although the clustering analyses and modeling are intriguing and offer previously underappreciated spatial association within and across mRNA molecules, the data is difficult to interpret and the conclusions lack novelty in their current form. There is a need for a stronger rationale as to why the methodology employed in the manuscript is better suited to characterize the clustering of mRNA in dendrites compared to previously published works and how such clustering or declustering can affect dendritic/neuronal function.

      (1) Validation of mRNA labeling, detection, and quantification is necessary. Single-molecule fluorescence in situ hybridization (smFISH) is the gold standard to detect RNA inside cells. The method utilizes multiple fluorescent probes (~48) designed to hybridize along a single RNA, resulting in a population of diffraction-limited fluorescent puncta with varying intensities. A histogram of cytoplasmic smFISH puncta intensities should reveal a normally distributed population with a single major peak, where the upper and lower tails indicate the maximum probe binding and the lower detection limit, respectively. Once single molecule detection (and limits) have been established, smFISH should be performed for each gene individually to obtain ground truth of detection under identical experimentally-defined conditions using the same fluorophore. Total RNA counts from different probe combinations (Figure S1A) or total mRNA density (Figure 2A) is not sufficient to inform individual gene labeling efficiency or detection. It is difficult to interpret whether observed variabilities across different probe combinations are of significance. For example, the mRNA densities of Adap2 and Dtx3L in soma seem to vary even after normalization with the pixel area (Figure 2A).

      Absolute counts and normalized counts for each gene detected should be included in the results or in supplementary data/table to provide the reader with a reference point for evaluation.

      As a control, it is recommended to perform smFISH against beta-actin or aCaMKII, which are the two most abundant mRNA in dendrites, and serve as internal validation that the technique, detection, and quantification are consistent with previously published works.

      (2) The rationale for single dendrite selection is unclear. To suggest that dendrite length, as a feature of dendritic morphology, may affect mRNA localization in dendrites, the authors manually selected segments of dendrites that have no branching or overlap, 'biased for shorter dendrites,' resulting in a subset of dendritic segments that changes mRNA distribution in raw distances (Figure S3A) into the normalized distance (Figure 4A). As a result, the distribution appears to convert from a monotonic- or exponential-decay to a more even distribution of mRNA (plateau). The rationale for this normalization is unclear, as manual curation of dendritic segments can incorporate experimenter bias. Moreover, the inclusion of short dendritic segments can stretch out their mRNA distributions following distance normalization which can give the appearance of an even distribution of mRNAs when aggregated.

      Next, the authors use pairwise Jensen-Shannon distance cluster analysis to identify 4 different patterns of clustering among mRNAs. Although the patterns are quite intriguing, i) the distributions of mRNA clusters were difficult to interpret and ii) the comparison to the protein distribution of Fonkeu et al (2019) is not a sufficient explanation for the observed clustering. For example, the clustering patterns (C1-4) are quite striking and even if the authors' analyses were an improvement in identifying mRNA clustering in dendrites, the authors need to provide better justification or modeling on what role such clustering can play on dendritic function or cellular physiology. This is important and necessary as the authors are suggesting that their analysis is different from mRNA distributions previously observed or modeled by Buxbaum et al (2014) and Fonkeu et al (2019), respectively.

      Of note, the identity-independent and dendritic length-dependent aspect of spatial distributions of mRNAs is striking (Figure S3E-F, Figure 4), and this length-related feature is one of the major take-home points in the first part of the manuscript. However, it is evident that some mRNAs (e.g. Adap2 and Dtx3L) or probe combinations (e.g. Colec12-Adap2-Nsmf) disproportionally make up the mRNA distribution clusters (Figure 4D and Figure S3F). It seems plausible that the copy numbers of mRNAs can differentially affect clusters' distribution patterns. Appropriate statistical tests among the cluster groups, therefore, will help to strengthen the interpretation of the results provided in the supplementary figures (Figures S3E and S3F).

      (3) It is not clear how Figure 5 GradCAM analysis helps the point that the authors put forth in previous sections or forthcoming sections. Unless this section and figure are more effectively linked to the general theme of the paper - the morphological features as a determinant of mRNA distribution or clustering of mRNA molecules, it may be included in the supplementary figure section.

      (4) Clustering of mRNA remains an exception rather than the rule. From their high-resolution triple smFISH data, the authors make some interesting findings regarding colocalization in dendrites. Among the six genes tested, the authors found higher incidents of colocalization between pair-wise genes (up to 23%) than previously reported (5-10%). Also, they report higher levels of colocalization within the same gene (17-23%) than previously reported (5-10%). First, to better evaluate this increased colocalization efficiency overall, the histograms of smFISH puncta intensity are necessary (as stated in 1) to determine whether a second peak is present in the population. Second, even though 23% is higher than previously reported, it remains that 77% do not colocalize and does not suggest that colocalization is the rule but remains the exception. Given the results in Table 1, it is likely that the increased colocalization could be a gene-specific effect and not transcriptome-wide as the majority of values between genes are below 10%, consistent with previous findings. Third, labeling of a control gene (i.e. b-actin or aCaMKII) would provide higher confidence that the detection and colocalization comparisons are consistent with previous findings.

      It is recommended to refrain from concluding that mRNA is 'co-transported' from smFISH results. Typically co-transport is best identified through observations in live cells where two fluorescent particles of different colors are moving together. Although stationary particles positioned in close proximity to one another could potentially be co-transported, there has been very little evidence to support this.

      The use of Ripley's K-function is an interesting way to look at clustering neighborhoods within a single or pairwise sets of genes. Previous studies from the Singer group have looked at mRNA clustering and have observed that mRNA in living cells tends to cluster within a 6-micron range for b-actin and for both b-actin and Arc after local stimulation. What was intriguing in the results in Figure 7 was that there was an exclusion zone 2-4 microns away from the area of colocalization that may suggest that mRNA are able to avoid over-clustering and maintain an even distribution throughout the dendrite, perhaps with a goal of not devoting too many resources (mRNA) to a single dendritic area. Modeling how mRNAs avoid over-clustering to a specific 2-micron segment of dendrites could provide an explanation of how dendrites can respond to multiple or simultaneous synaptic activity at different sites along the same dendrite.
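
      For readers unfamiliar with the statistic discussed above, the following is a minimal sketch of a naive one-dimensional Ripley's K estimator for puncta positions along a single unbranched dendrite. It is not the authors' pipeline: the function name, the 1D simplification, and the absence of any edge correction are all assumptions made for illustration. Under complete spatial randomness on a line, K(r) is approximately 2r, so values above 2r suggest clustering at that scale and values below suggest exclusion.

```python
def ripley_k_1d(positions_um, dendrite_length_um, radii_um):
    """Naive 1D Ripley's K (no edge correction) for points on a line segment.

    K(r) = L / (n * (n - 1)) * number of ordered pairs (i, j), i != j,
    whose separation is at most r.
    """
    n = len(positions_um)
    if n < 2:
        raise ValueError("need at least two points")
    k_values = []
    for r in radii_um:
        ordered_pairs = sum(
            1
            for i in range(n)
            for j in range(n)
            if i != j and abs(positions_um[i] - positions_um[j]) <= r
        )
        k_values.append(dendrite_length_um * ordered_pairs / (n * (n - 1)))
    return k_values


# Hypothetical puncta positions (microns from the soma) on a 10-micron segment.
positions = [1.0, 1.5, 2.0, 9.0]
radii = [0.5, 1.0]
for r, k in zip(radii, ripley_k_1d(positions, 10.0, radii)):
    print(f"r = {r} um: K = {k:.2f} (random expectation about {2 * r:.2f})")
```

      A real analysis would add an edge correction and compare the observed curve against simulated random placements on the same dendrite geometry, since the 2r reference is only a rough guide near the ends of a short segment.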

    4. Reviewer #3 (Public review):

      Summary:

      The paper by Kim et al uses the smFISH method to probe six genes in order to understand the spatial distribution of the mRNAs in dendrites and identify the spatial relationships between the transcripts. While they have delved into a high-resolution characterization of the dendritic transcripts and compared their data with existing datasets, the analysis needs more robustness, and therefore the findings are inconclusive. The rationale of the study and choosing these genes is not clear - it appears more like a validation of some of the datasets without much biological significance.

      Overall, several conclusions for spatial distribution of dendritic RNAs were based on correlations and it is difficult to understand whether this represents a true biological phenomenon or if it is an artifact of the imaging and morphological heterogeneity of neurons and difficulties in dendritic segmentation.

      Strengths:

      The authors have performed an extensive analysis of the smFISH datasets and quantified the precise localization patterns of the dendritic mRNAs in relation to the dendritic morphology. Their images and the analysis pipeline can be a resource for the community.

      Weaknesses:

      (1) The authors have attempted to identify general patterns of mRNA distribution as a function of distance, proximal vs distal, however, in many of the cases the results are a bit redundant and the size of the neurons or the length of the dendrites or image segmentation artifacts turn out to be the determining factors. A better method to normalize the morphological differences is needed to make meaningful conclusions about RNA distribution patterns.

      (2) Another concerning factor is that there are many redundancies throughout the paper. For example, to begin with, all analysis should have been done as RNA density measurements (and not absolute numbers of mRNAs) and with proper normalization and accounting for differences in length. Some of these were done only in the latter half of the paper, for example in Figure 4.

      (3) Images for the smFISH are missing. It is important to show the actual images, and the quality of the images is a crucial factor for all subsequent analyses.

      (4) The parameters used for co-localization analysis are very relaxed (2 - 6 microns), particularly the distances of interactions far exceed feasible interactions between the biomolecules. Typically, transport granules are significantly smaller than the length scales used.
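
      Weaknesses (1) and (2) above ask for density measurements that account for dendrite length. One simple way to do this is sketched below; it is an illustration under stated assumptions, not the normalization the authors used. The function name and the choice of equal-width bins along normalized distance are hypothetical. Each bin's count is divided by the bin's physical width, so dendrites of different lengths yield comparable values in mRNAs per micron.

```python
def density_profile(positions_um, dendrite_length_um, n_bins=10):
    """mRNA density (molecules per micron) in equal-width bins along one dendrite.

    Bins span the normalized distance 0..1 from the soma; dividing each count
    by the bin's physical width keeps short and long dendrites comparable.
    """
    if dendrite_length_um <= 0 or n_bins < 1:
        raise ValueError("length must be positive and n_bins at least 1")
    bin_width_um = dendrite_length_um / n_bins
    counts = [0] * n_bins
    for x in positions_um:
        if not 0 <= x <= dendrite_length_um:
            raise ValueError(f"position {x} lies outside the dendrite")
        # A point exactly at the distal tip falls into the last bin.
        index = min(int(x / bin_width_um), n_bins - 1)
        counts[index] += 1
    return [count / bin_width_um for count in counts]


# Hypothetical example: a 40-micron and a 20-micron dendrite with the same
# underlying density give the same per-micron values despite different counts.
print(density_profile([5, 15, 25, 35], 40.0, n_bins=4))
print(density_profile([5, 15], 20.0, n_bins=2))
```

      Note that with a fixed number of bins, short dendrites have narrow bins and therefore noisier density estimates, so the number of molecules per bin should be reported alongside the profile.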

    1. eLife Assessment

      The study is useful for advancing understanding of spinal cord injuries, but it presents inadequate evidence due to the use of multiple datasets. Data were collected from different models of spinal cord injury, various regions of the spinal cord, and an iPSC model, with the differences between these models making it difficult to draw reliable conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      Zhou's team performs a bioinformatics analysis of single-cell transcriptomic (scRNA), spatial transcriptomic (ST), and bulk RNA-seq data from Gene Expression Omnibus (GEO) datasets generated by other teams, some published in various journals and some unpublished, on spinal cord injury and/or human iPSC-derived microglia. Based on their analysis, the authors claim that innate microglial cells are inhibited. They postulate that TGF beta signaling pathways play a role in the regulation of migration to enhance SCI recovery and that Trem2 expression contributes to neuroinflammation response by modulating cell death in spinal cord injury. Finally, they suggest a therapeutic strategy to inhibit Trem2 responses and transplant iPSC-derived microglia with long-term TGF beta stimulation.

      Although the idea of using already available data and reanalyzing them is remarkable, I have major concerns about the paper. The authors have used data from different models of injury, regions, as well as iPSCs. It is not possible to mix and draw conclusions when the models used are different. This raises doubts about the authors' expertise in the field of spinal cord injury. Furthermore, the innovativeness of the results is of little significance, especially as no hypothesis is confirmed by experimental data.

      Strengths:

      Analysis of large-scale data that already exist.

      Weaknesses:

      Mixing data from different models, unfounded conclusions and over-interpretations, and little expertise in the field of spinal cord injury.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present an intriguing study utilizing datasets from spinal cord injury (SCI) research to identify potential microglial genes involved in SCI-induced neuronal damage. They identify that inhibiting TREM2 and enhancing the TGF-b signal pathway can inhibit reactive microglia-mediated neuroinflammation. Microglia transplantation using iPSC-derived microglia could also be beneficial for SCI recovery.

      Strengths:

      This research aims to identify potential genes and signaling pathways involved in microglia-mediated inflammation in spinal cord injury (SCI) models. Meanwhile, analyzing transplanted microglia gene expression provides an extra layer of potential in SCI therapy. The approach represents a good data mining strategy for identifying potential targets to combat neurological diseases.

      Weaknesses:

      Microglial gene expression patterns may vary significantly between these models. Without proper normalization or justification, combining these datasets to draw conclusions is problematic. Moreover, other factors also need to be considered, like the gender of the microglia source. Are there any gender differences? How were the iPSC-derived microglia generated? Different protocols may affect microglia gene expression.

      While the concept is interesting, the data presented in this study appears preliminary. Without further experiments to support their findings, the conclusions are not convincing.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors perform a meta-analysis of existing transcriptomic data describing the responses of cells in the mouse spinal cord to traumatic injury (SCI). They identify two subclasses of microglia, which they term 'innate' and 'reactive' microglia, in the dataset, with the majority of microglia in the uninjured spinal cord being 'innate' and the majority of microglia in the injured region being 'reactive'. The authors propose that, during injury, the population of innate microglia is depleted and replaced by the population of reactive microglia. Using DEG and gene ontology pipelines, the authors suggest that TGF signaling is a positive force that helps recruit healthy microglia to enhance recovery in the context of SCI. In contrast, the microglial phagocytic receptor Trem2 contributes to neuroinflammation and neuronal death. Finally, the authors suggest replacing reactive microglia with innate microglia as a potential therapeutic approach to treat SCI in humans.

      Strengths:

      The work utilizes numerous and multi-modal datasets describing transcriptomic changes in the mouse CNS following SCI.

      The topic is translationally relevant.

      Weaknesses:

      There is not enough information about how each of the datasets re-analyzed by the authors was obtained and processed both by the group generating the data and by the group re-analyzing it.

      The conclusions drawn by the authors are not sufficiently supported by the evidence.

      Whether the study represents a significant conceptual advance in our understanding of microglial contributions to SCI is not clear.

      My specific concerns and suggestions to address these weaknesses are provided below.

      Major comments:

      (1) Questions remain about the nature, quality, and features of the datasets re-analyzed in the study. For example, how were these datasets obtained? Were the same animal models and time points used in each? What modality of RNA sequencing was done? What criteria did the authors consider in deciding which datasets to include in the study? Since the study is entirely reliant on data generated elsewhere, a more thorough description of these datasets within the text is needed.

      (2) Relatedly, the authors chose to filter out some cells from the datasets based on quality, but this information is incomplete. For example, the authors omit cells with 10% mitochondrial genes, but this value is higher than most investigators use (typically between 1%-5%). Why is 10% the appropriate limit in this particular study? Further, how did the authors ensure the removal of doublets from the dataset?

      (3) A principal finding of the paper is that microglia in the uninjured CNS mostly have an 'innate' transcriptomic phenotype, while microglia in the injured CNS mostly have a 'reactive' phenotype. However, there are some issues here that require further discussion. First, while historically microglia were thought to possess distinct 'homeostatic' versus 'activated' profiles which would be consistent with the authors' interpretations here, these differences are now thought of more as changes in a given microglial cell's transcriptomic status. Thus, while the authors interpret their results as meaning that innate microglia are depleted and replaced by a different set of reactive microglia following SCI (or at least this is how the paper is written), it is equally if not more likely that the microglia within the injured regions themselves become more reactive as a result of the insult. The authors should clarify why their interpretation is more likely to be correct.

      (4) Related to the above point, the authors base the manuscript on the idea that microglia are mostly 'innate' in the uninjured CNS and 'reactive' after injury, however, the UMAP plots in Figures 1A and 1C suggest that both classes of microglia cluster together and may not actually represent distinct subclasses. Have the authors tried sub-clustering just the myeloid clusters and seeing how well they separate? Even if they do technically represent distinct clusters, the UMAP could be interpreted to mean that their transcriptomic differences are not particularly robust.

      (5) I appreciate the authors' use of loss-of-function data to explore the roles of microglial TGF and Trem2 signaling to glean some mechanistic insights into SCI. However, many of the conclusions reached by the authors in the manuscript are insufficiently supported by the data and would require additional experiments to rigorously confirm. A couple of examples are the following:

      5a. Lines 160-162: "Hence, we conclude that the cascade of injury events in SCI significantly influences microglia, leading to the replacement of innate microglial cells by reactive microglia." That SCI influences microglia is well-supported by the study, but whether reactive microglia replace innate microglia, versus whether innate microglia in the region transition to a reactive state, needs to be tested experimentally.

      5b. Lines 321-323: "Taken together, iPSC-derived microglia have the potential to replace the functions of naïve microglial cells, and they perform even more effectively in the in vivo CNS." Again, the first part of the sentence is supported, but whether iPSCs are more effective than other populations in vivo would need to be tested experimentally.

      (6) As microglia have long been appreciated as contributors to the CNS injury response, the conceptual advance here isn't particularly clear to me. For example, Gao et al, 2023 (*cited by the authors) describe the role of Trem2+ microglia in SCI versus demyelinating disease with major conceptual overlap with the current study. It would be helpful for the authors to include a discussion of what we now know about SCI based on this study that we did not know (or strongly suspect) before.
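
      To illustrate why the mitochondrial cutoff questioned in comment (2) matters, the sketch below applies two thresholds to a few made-up cells and shows how the retained set changes. It is a toy example, not the authors' pipeline: the cell records, field names, and the convention that a cell is kept when its mitochondrial fraction is at or below the cutoff are all assumptions.

```python
def mito_fraction(cell):
    """Fraction of a cell's total counts that come from mitochondrial genes."""
    return cell["mito_counts"] / cell["total_counts"]


def filter_cells(cells, max_mito_fraction):
    """Keep cells whose mitochondrial fraction is at or below the cutoff."""
    return [cell for cell in cells if mito_fraction(cell) <= max_mito_fraction]


# Hypothetical cells: 2%, 7%, and 12% mitochondrial counts.
cells = [
    {"id": "cell_a", "total_counts": 1000, "mito_counts": 20},
    {"id": "cell_b", "total_counts": 1000, "mito_counts": 70},
    {"id": "cell_c", "total_counts": 1000, "mito_counts": 120},
]

for cutoff in (0.05, 0.10):
    kept = [cell["id"] for cell in filter_cells(cells, cutoff)]
    print(f"cutoff {cutoff:.0%}: kept {kept}")
```

      Reporting how many cells each dataset retains at the stricter and the more permissive cutoff, and whether the innate and reactive clusters persist under both, would directly address the reviewer's question.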

    1. eLife Assessment

      Barzó et al. assessed the electrophysiological and anatomical properties of a large number of layer 2/3 pyramidal neurons in brain slices of human neocortex across a wide range of ages, from infancy to elderly individuals, using whole-cell patch clamp recordings and anatomical reconstructions. This large data set represents a valuable contribution to our understanding of how these properties change across the human lifespan, and although the results presented are convincing, analyzing the data by absolute age rather than age ranges as well as clarifying the methods used and some of the statistical approaches applied would strengthen the conclusions. The analysis of spine density requires additional biological replicates to support the conclusions stated. These data strengthen our understanding of how these properties change with age and will contribute to building more realistic models of human cortical function.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript co-authored by Pál Barzó et al is very clear and very well written, demonstrating the electrophysiological and morphological properties of human cortical layer 2/3 pyramidal cells across a wide age range, from age 1 month to 85 years using whole-cell patch clamp. To my knowledge, this is the first study that looks at the cross-age differences in biophysical and morphological properties of human cortical pyramidal cells. The community will also appreciate the significant effort involved in recording data from 485 cells, given the challenges associated with collecting data from human tissue. Understanding the electrophysiological properties of individual cells, which are essential for brain function, is crucial for comprehending human cortical circuits. I think this research enhances our knowledge of how biophysical properties change over time in the human cortex. I also think that by building models of human single cells at different ages using these data, we can develop more accurate representations of brain function. This, in turn, provides valuable insights into human cortical circuits and function and helps in predicting changes in biophysical properties in both health and disease.

      Strengths:

      The strength of this work lies in demonstrating how the electrophysiological and morphological features of human cortical layer 2/3 pyramidal cells change with age, offering crucial insights into brain function throughout life.

      Weaknesses:

      One potential weakness of the paper is that the methodology could be clearer, especially in how different cells were used for various electrophysiological measurements and the conditions under which the recordings were made. Clarifying these points would improve the study's rigor and make the results easier to interpret.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Barzo and colleagues aim to establish an appraisal for the development of basal electrophysiology of human layer 2/3 pyramidal cells across life and compare their morphological features at the same ages.

      Strengths:

      The authors have generated recordings from an impressive array of patient samples, allowing them to directly compare the same electrophysiological features as a function of age and other biological features. These data are extremely robust and well organised.

      Weaknesses:

      The use of spine density and shape characteristics is performed from an extremely limited sample (2 individuals). How reflective these data are of the population is not possible to interpret. Furthermore, these data assume that spines fall into discrete types - which is an increasingly controversial assumption.

      Many data are shown according to somewhat arbitrary age ranges. It would have been more informative to plot by absolute age, and then perform more rigorous statistics to test age-dependent effects.

      Overall, the authors achieve their aims by assessing the physiological and morphological properties of human L2/3 pyramidal neurons across life. Their findings have extremely important ramifications for our understanding of human life and implications for how different neuronal properties may influence neurological conditions.

    4. Reviewer #3 (Public review):

      Summary:

      To understand the specificity of age-dependent changes in the human neocortex, this paper investigated the electrophysiological and morphological characteristics of pyramidal cells in a wide age range from infants to the elderly.

      The results show that some electrophysiological characteristics change with age, particularly in early childhood. In contrast, the larger morphological structures, such as the spatial extent and branching frequency of dendrites, remained largely stable from infancy to old age. On the other hand, the shape of dendritic spines is considered immature in infancy, i.e., the proportion of mushroom-shaped spines increases with age.

      Strengths:

      Whole-cell recordings and intracellular staining of pyramidal cells in defined areas of the human neocortex allowed the authors to compare quantitative parameters of electrophysiological and morphological properties between finely divided age groups.

      They succeeded in finding symmetrical changes specific to both infants and the elderly, and asymmetrical changes specific to either infants or the elderly. The similarity of pyramidal cell characteristics between areas is unexpected.

      Weaknesses:

      Human L2/3 pyramidal cells are thought to be heterogeneous, as L2/3 has expanded to a high degree during the evolution from rodents to humans. However, the diversity (subtyping) is not revealed in this paper.

    1. eLife Assessment

      Oor and colleagues report the potentially independent effects of the spatial and feature-based selection history on visuomotor choices. They outline compelling evidence, tracking the dynamic history effects based on their extremely clever experimental design (urgent version of the search task). Their finding is of fundamental significance, broadening the framework to identify variables contributing to choice behavior and their neural correlates in future studies.

    2. Reviewer #1 (Public review):

      Summary:

      Oor et al. report the potentially independent effects of the spatial and feature-based selection history on visuomotor choices. They outline compelling evidence, tracking the dynamic history effects based on their clever experimental design (urgent version of the search task). Their finding broadens the framework to identify variables contributing to choice behavior and their neural correlates in future studies.

      Strengths:

      In their urgent search task, the variable processing time of the visual cue leads to a dichotomy in choice performance - uninformed guesses vs. informed choices. Oor et al. did rigorous analyses to find a stronger influence of the location-based selection history on the uninformed guesses and a stronger influence of the feature-based selection history on the informed choices. It is a fundamental finding that contributes to understanding the drivers of behavioral variance. The results are clear.

      Weaknesses:

      (1) In this urgent search task, as the authors stated in line 724, the variability in performance was mainly driven by the amount of time available for processing the visual cue. The authors used processing time (PT) as the proxy for this "time available for processing the visual cue." But PT itself is already a measure of behavioral variance since it is also determined by the subject's reaction time (i.e., PT = Reaction time (RT) - Gap). In that sense, it seems circular to explain the variability in performance using the variability in PT. I understand the Gap time and PT are correlated (hinted by the RT vs. Gap in Figure 1C), but Gap time seems to be more adequate to use as a proxy for the (imposed) time available for processing the visual cue, which drives the behavioral variance. Can the Gap time better explain some of the results? It would be important to describe how the results are different (or the same) if Gap time was used instead of PT and also discuss why the authors would prefer PT over Gap time (if that's the case).

      (2) The authors provide a compelling account of how the urgent search task affords (i) more pronounced selection history effects on choice and (ii) a dissociation of the spatial and feature-based history effects by comparing their different effects on the tachometric curves. However, the authors didn't discuss the limits of their task design enough. It is a contrived task (one of the "laboratory tasks"), but the behavioral variability in this simple task is certainly remarkable. Yet, is there any conclusion we should avoid from this study? For instance, can we generalize the finding in more natural settings and say, the spatial selection history influences the choice under time pressure? I wonder whether the task is simple yet general enough to make such a conclusion.

      (3) Although the authors aimed to look at both inter- and intra-trial temporal dynamics, I'm not sure if the results reflect the true within-trial dynamics. I expected to learn more about how the spatial selection history bias develops as the Gap period progresses (as the authors mentioned in line 386, the spatial history bias must develop during the Gap interval). Does Figure 3 provide some hints in this within-trial temporal dynamics?

      (4) The monkeys show significant lapse rates (enough error trials for further analyses). Do the choices in the error trials reflect the history bias? For example, if errors are divided in terms of PTs, do the errors with short PT reflect more pronounced spatial history bias (choosing the previously selected location) compared to the errors with long PT?
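
      The relation PT = RT - Gap raised in point (1) can be illustrated with a minimal sketch. It assumes each trial is recorded as a reaction time, a gap duration (both in ms) and a correct/incorrect flag, and it bins accuracy by processing time to approximate a tachometric curve. The function names, the 50 ms bin width and the example trials are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict

def processing_time(rt_ms, gap_ms):
    # PT = RT - Gap, as stated in the review; both values in milliseconds.
    return rt_ms - gap_ms

def tachometric_curve(trials, bin_ms=50):
    """Fraction of correct choices per processing-time bin.

    trials: iterable of (rt_ms, gap_ms, correct) tuples (hypothetical layout).
    bin_ms: bin width in ms (an illustrative choice, not the authors' value).
    """
    counts = defaultdict(lambda: [0, 0])  # bin start -> [n_correct, n_total]
    for rt_ms, gap_ms, correct in trials:
        pt = processing_time(rt_ms, gap_ms)
        bin_start = (pt // bin_ms) * bin_ms  # floor to the bin's lower edge
        counts[bin_start][0] += int(correct)
        counts[bin_start][1] += 1
    return {b: hits / n for b, (hits, n) in sorted(counts.items())}

# Made-up trials, for illustration only.
example_trials = [
    (300, 250, False),  # PT = 50 ms
    (320, 250, True),   # PT = 70 ms
    (400, 250, True),   # PT = 150 ms
    (410, 250, True),   # PT = 160 ms
]
curve = tachometric_curve(example_trials)
```

      Because RT enters the computation of PT directly, any trial-to-trial variance in RT carries over into PT, which is the circularity concern raised in point (1). Binning by the gap duration alone would require only replacing the value passed to the binning step.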

    3. Reviewer #2 (Public review):

      Summary:

      This is a clear and systematic study of trial history influences on the performance of monkeys in a target selection paradigm. The primary contribution of the paper is to add a twist in which the target information is revealed after, rather than before, the cue to make a foveating eye movement. This twist results in a kind of countermanding of an earlier "uninformed" saccade plan by a new one occurring right after the visual information is provided. As with countermanding tasks in general, time now plays a key factor in the success of this task, and it is time that allows the authors to quantitatively assess the parametric influences of things like previous target location, previous target identity, and previous correctness rate on choice performance. The results are logical and consistent with the prior literature, but the authors also highlight novelties in the interpretation of prior-trial effects that they argue are enabled by the use of their paradigm.

      Strengths:

      Careful analysis of a multitude of variables influencing behavior

      Weaknesses:

      Results appear largely confirmatory.

    1. eLife Assessment

      The manuscript explores how bacterial evolution in the presence of lytic phages modulates b-lactams resistance and virulence properties in methicillin-resistant Staphylococcus aureus (MRSA). The work is useful as it identifies underlying mutations that may confer sensitivity to b-lactams and alter virulence properties. While the findings are generally convincing, additional experiments linking how particular mutations regulate phenotypic changes are required to improve the work mechanistically.

    2. Reviewer #1 (Public review):

      Summary:

      These authors have asked how lytic phage predation impacts antibiotic resistance and virulence phenotypes in methicillin-resistant Staphylococcus aureus (MRSA). They report that staphylococcal phages cause MRSA strains to become sensitized to b-lactams and to display reduced virulence. Moreover, they identify mutations in a set of genes required for phage infection that may impact antibiotic resistance and virulence phenotypes.

      Strengths:

      Phage-mediated re-sensitization to antibiotics has been reported previously but the underlying mutational analyses have not been described. These studies suggest that phages and antibiotics may target similar pathways in bacteria.

      Weaknesses:

      One limitation is the lack of mechanistic investigations linking particular mutations to the phenotypes reported here. This limits the impact of the work.

      Another limitation of this work is the use of lab strains and a single pair of phages. However, while incorporation of clinical isolates would increase the translational relevance of this work it is unlikely to change the conclusions.

    3. Reviewer #2 (Public review):

      Summary:

      The work presented in the manuscript by Tran et al deals with bacterial evolution in the presence of bacteriophage. Here, the authors have taken three methicillin-resistant S. aureus strains that are also resistant to beta-lactams. Eventually, upon being exposed to phage, these strains develop beta-lactam sensitivity. Besides this, the strains also show other changes in their phenotype such as reduced binding to fibrinogen and hemolysis.

      Strengths:

      The experiments carried out are convincing to suggest such in vitro development of sensitivity to the antibiotics. Authors were also able to "evolve" phage in a similar fashion thus showing enhanced virulence against the bacterium. In the end, authors carry out DNA sequencing of both evolved bacteria and phage and show mutations occurring in various genes. Overall, the experiments that have been carried out are convincing.

      Weaknesses:

      Although more experiments are not needed, additional experiments could add more information. For example, the phage gene showing the HTH motif could be reintroduced in the bacterial genome and such a strain can then be assayed with wildtype phage infection to see enhanced virulence as suggested. At least one such experiment proves the discoveries regarding the identification of mutations and their outcome. Secondly, I also feel that authors looked for beta-lactam sensitivity and they found it. I am sure that if they look for rifampicin resistance in these strains, they will find that too. In this case, I cannot say that the evolution was directed to beta-lactam sensitivity; this is perhaps just one trait that was observed. This is the only weakness I find in the work. Nevertheless, I find the experiments convincing enough; more experiments only add value to the work.

    1. eLife Assessment

      This study investigates a dietary intervention that employs a smartphone app to promote meal regularity, which may be useful. Despite no self-reported changes in caloric intake, the authors report significant weight loss for the relatively short duration of only 6 weeks. While the concept is very interesting and deserves to be studied due to its potential clinical relevance, the study's rigor needs to be improved upon and is currently considered incomplete, notably the reliance on self-reported food intake, a highly unreliable way to assess food intake. Additionally, the study theorizes that the intervention resets the circadian clock, but the study needs more reliable methods for assessing circadian rhythms, such as actigraphy. Further, if this restrictive dietary intervention has any more promise in achieving long-term weight loss than the myriad other restrictive diets, it remains to be tested.

    2. Reviewer #1 (Public review):

      Summary:

      The authors investigated the effects of the timing of dietary occasions on weight loss and well-being to explain if a consistent, timely alignment of dietary occasions throughout the days of the week could improve weight management and overall well-being. The authors attributed these outcomes to a timely alignment of dietary occasions with the body's circadian rhythms. This concept is rooted in understanding dietary cues as a zeitgeber for the circadian system, potentially leading to more efficient energy use and weight management. The study participants self-reported the primary outcome, body weight loss.

      Strengths:

      The innovative focus of the study on the timing of dietary occasions rather than daily energy intake or diet composition presents a fresh perspective in dietary intervention research. The feasibility of the diet plan, developed based on individual profiles of the timing of dietary occasions identified before the intervention, marks a significant step towards personalised nutrition.

      Weaknesses:

      The methodology lacks some measurements that are emerging as very relevant in the field of nutritional science, such as data on body composition, and potential confounders not accounted for (e.g., age range, menstrual cycle, shift work, unmatched cohorts, inclusion of individuals with normal weight, overweight, and obesity). The primary outcome's reliance on self-reported body weight and subsequent measurement biases undermines the reliability of the findings.

      Achievement of Objectives and Support for Conclusions:

      The study's objectives were partially met; however, the interpretation of the effects of meal timing on weight loss is compromised by the aforementioned weaknesses. The evidence does not fully support most of the claims due to methodological limitations caused partially by the COVID-19 pandemic.

      Impact and Utility:

      Despite its innovative approach, the study's utility for practical application is limited by methodological and analytical shortcomings. Nevertheless, it represents a good basis for further research. If these findings were further investigated, they could have meaningful implications for dietary interventions and metabolic research. The concept of timing of dietary occasions in sync with circadian rhythms holds promise but requires further rigorous investigation.

    3. Reviewer #2 (Public review):

      The authors tested a dietary intervention focused on improving meal regularity. Participants first used a smartphone application to track their meal frequencies; they were then asked to restrict their meal intake to the times when they most often eat, in order to enhance meal regularity, for six weeks. This resulted in significant weight loss despite supposedly no changes in caloric intake.

      While the concept is appealing, and the use of a smartphone app in participants' typical everyday environment to regularize food intake is interesting, significant weaknesses severely limit the value of the study due to a lack of rigor, such as the reliance on self-reported food intake, which has been discredited in the field. The study's major conclusions are insufficiently supported, particularly that weight loss occurred even though food intake supposedly is not altered. This intervention may merely represent another restrictive diet among countless others that all seem to work for a few weeks to months, resulting in a few pounds of weight loss.

      (1) Unreliable method of caloric intake

      The trial's reliance on self-reported caloric intake is problematic, as participants tend to underreport intake. For example, as cited in the revised manuscript, the NEJM paper (DOI: 10.1056/NEJM199212313272701) reported that some participants underreported caloric intake by approximately 50%, rendering such data unreliable and hence misleading. More rigorous methods for assessing food intake should have been utilized. Further, the control group was not asked to restrict their diet in any way, and hence, to do that in the treatment group may be sufficient to reduce caloric intake and cause weight loss. Merely acknowledging the unreliability of self-reported caloric intake is insufficient, as it still leaves the reader with the impression that there is no change in food intake when, in reality, we actually have no idea if food intake was altered. A more robust approach to assessing food intake is imperative. Even if a decrease in caloric intake is observed through rigorous measurement, as I am convinced a more rigorous study testing this paradigm would reveal, this intervention may merely represent another restrictive diet among countless others that show that one may lose weight by going on a diet. Seemingly, any restrictive diet works for a few months.

      (2) Lack of objective data regarding circadian rhythm

      The assessment of circadian rhythm using the MCTQ, a self-reported measure of chronotype, is unreliable, and it is unclear why more objective methods like actigraphy were not used.

      In the revised version, the authors emphasize these limitations in the manuscript. The study's major conclusions are insufficiently supported, in particular, that weight loss occurred even though food intake supposedly is not altered and that circadian rhythm was improved.

    4. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment

      This study investigates a dietary intervention that employs a smartphone app to promote meal regularity, which may be useful. Despite no observed changes in caloric intake, the authors report significant weight loss. While the concept is very interesting and deserves to be studied due to its potential clinical relevance, the study's rigor needs to be revised, notably for its reliance on self-reported food intake, a highly unreliable way to assess food intake. Additionally, the study theorizes that the intervention resets the circadian clock, but the study needs more reliable methods for assessing circadian rhythms, such as actigraphy.

      Thank you for the positive yet critical feedback on our manuscript. We are pleased with the assessment that our study is very interesting and deserves to be continued. We have addressed the points of criticism mentioned and discussed the limitations of the study in more detail in the revised version than before.

      Nevertheless, we would like to note that one condition for our study design was that the participants were able to carry out the study in their normal everyday environment. This means that it is not possible to fully objectively record food intake - especially not over a period of eight weeks. In our view, self-reporting of food intake is therefore unavoidable and also forms the basis of comparable studies on chrononutrition. We believe that recording data with a smartphone application at the moment of eating is a reliable means of recording food consumption and is better suited than questionnaires, for example, which have to be completed retrospectively. Objectivity could be optimized by transferring photographs of the food consumed. However, even this only provides limited protection against underreporting, as photos of individual meals, snacks, or second servings could be omitted by the participants. Sporadic indirect calorimetric measurements can help to identify under-reporting, but this cannot replace real-time self-reporting via smartphone application.

      Our data show that at the behavioral level, the rhythms of food intake are significantly less variable during the intervention. Our assumption that precise mealtimes influence the circadian rhythms of the digestive system is not new and has been confirmed many times in animal and human studies. It can therefore be assumed that comparable effects also apply to the participants in our study. Of course, a measurement of physiological rhythms is also desirable for a continuation of the study. However, we suspect that cellular rhythms in tissues of the digestive tract in particular are decisive for the changes in body weight. The characterization of these rhythms in humans is at best indirectly possible via blood factors. Reduced variability of the sleep-wake rhythm, which is measured by actigraphy, may result from our intervention, but in our view is not the decisive factor for the optimization of metabolic processes.

      We have addressed the specific comments and made changes to the manuscript as indicated below.

      Reviewer #1 (Public Review):

      The authors Wilming and colleagues set out to determine the impact of regularity of feeding per se on the efficiency of weight loss. The idea was to determine whether individuals who consume 2-3 meals within individualized time frames, as opposed to those who exhibit stochastic feeding patterns throughout the circadian period, will lose weight.

      The methods are rigorous, and the research is conducted using a two-group, single-center, randomized-controlled, single-blinded study design. The participants were aged between 18 and 65 years old, and a smartphone application was used to determine preferred feeding times, which were then used as defined feeding times for the experimental group. This adds strength to the study since restricting feeding within preferred/personalized feeding windows will improve compliance and study completion. Following a 14-day exploration phase and a 6-week intervention period in a cohort of 100 participants (inclusive of both the controls and the experimental group that completed the study), the authors conclude that when meals are restricted to 45min or less durations (MTVS of 3 or less), this leads to efficient weight loss. Surprisingly, the study excludes the impact of self-reported meal composition on the efficiency of weight loss in the experimental group. In light of this, it is important to follow up on this observation and develop rigorous study designs that will comprehensively assess the impact of changes (sustained) in dietary composition on weight loss. The study also reports interesting effects of regularity of feeding on eating behavior, which appears to be independent of weight loss. Perhaps the most important observation is that personalized interventions that cater to individual circadian needs will likely result in more significant weight loss than when interventions are mismatched with personal circadian structures.

      We would like to thank the reviewer for the positive assessment of our study.

      (1) One concern for the study is its two-group design; however, single-group cross-over designs are tedious to develop, and an adequate 'wash-out' period may be difficult to predict.

      A cross-over design would of course be highly desirable and, if feasible, would be able to provide more robust data than a two-group design. However, we have strong doubts about the feasibility of a cross-over design. Not only does the determination of the length of the washout period to avoid carry-over effects of metabolic changes pose a difficulty, but also the assumption that those participants who start with the TTE intervention will consciously or unconsciously pay attention to adherence to certain eating times in the next phase, when they are asked to eat at times like before the study.

      In a certain way, however, our study fulfills at least one arm of the cross-over design. During the follow-up period of our study, there were some participants who, by their own admission, started eating at more irregular times again, which is comparable to the mock treatment of the control subjects. And these participants gained weight again.

      (2) A second weakness is not considering the different biological variables and racial and ethnic diversity and how that might impact outcomes. In sum, the authors have achieved the aims of the study, which will likely help move the field forward.

      In the meantime, we have at least added analyses regarding the age and gender of the participants and found no correlations with weight loss. The sample size of this pilot study was too small for a reliable analysis of the influence of ethnic diversity. If the study is continued with a larger sample size, this type of analysis will certainly come into play.

      We are pleased with the assessment that we have achieved our goals and are helping to advance the field.

      Reviewer #2 (Public Review):

      Summary:

      The authors investigated the effects of the timing of dietary occasions on weight loss and well-being with the aim of explaining if a consistent, timely alignment of dietary occasions throughout the days of the week could improve weight management and overall well-being. The authors attributed these outcomes to a timely alignment of dietary occasions with the body's own circadian rhythms. However, the only evidence the authors provided for this hypothesis is the assumption that the individual timing of dietary occasions of the study participants identified before the intervention reflects the body's own circadian rhythms. This concept is rooted in understanding of dietary cues as a zeitgeber for the circadian system, potentially leading to more efficient energy use and weight management. Furthermore, the primary outcome, body weight loss, was self-reported by the study participants.

      Strengths:

      The innovative focus of the study on the timing of dietary occasions rather than daily energy intake or diet composition presents a fresh perspective in dietary intervention research. The feasibility of the diet plan, developed based on individual profiles of the timing of dietary occasions identified before the intervention, marks a significant step towards personalised nutrition.

      We thank the reviewer for the generally positive assessment of our study and for sharing the view that our personalized approach represents an innovative step in chrononutrition.

      Weaknesses:

      (1) Several methodological issues detract from the study's credibility, including unclear definitions not widely recognized in nutrition or dietetics (e.g., "caloric event"), lack of comprehensive data on body composition, and potential confounders not accounted for (e.g., age range, menstrual cycle, shift work, unmatched cohorts, inclusion of individuals with normal weight, overweight, and obesity).

      We have replaced the term "caloric event" with "calorie intake occasion" and otherwise revised our manuscript with regard to other terminology in order to avoid ambiguity.

      We agree with the reviewer that the determination of body composition is a very important parameter to be investigated. Such investigations will definitely be part of the future continuation of the study. In this pilot study, we aimed to clarify in principle whether our intervention approach shows effects. Since we believe that this is certainly the case, we would like to address the question of what exactly the physiological mechanisms are that explain the observed weight loss in the future.

      Part of these future studies will also include other parameters in the analyses. However, in response to the reviewer's suggestions, we have already completed analyses regarding age and gender of the participants, which show that both variables have no influence on weight loss.

      In our view, the menstrual cycle should not have a major influence on the effectiveness of a 6-week intervention.

      The inclusion of shift workers is not a problem from our point of view. If their work shifts allow them to follow their personal eating schedule, we see no violation of our hypothesis. If this is not the case, as our data in Fig. 1G show, we do not expect any weight loss. Nevertheless, the reviewer is of course right that shift work can generally be a confounding factor and have an influence on weight loss success. To our knowledge, none of the 100 participants evaluated were shift workers. In a continuation of the study, however, shift work should be an exclusion criterion. Yet, our intervention approach could be of great interest for shift workers in particular, as they may be at a particularly high risk of obesity due to irregular eating times. A separate study with shift workers alone could therefore be of particular interest.

      The fact that it turned out that the baseline BMI of the remaining 67 EG and 33 CG participants did not match is discussed in detail in the section "3.1 Limitations". Although this is a limitation, it does not raise much doubt about the effectiveness of the intervention, as a subgroup analysis shows that intervention subjects lose more weight than control subjects of the same BMI.

      The inclusion of a wide BMI range was intentional. Our hypothesis is that reduced temporal variability in eating times optimizes metabolism and therefore excess body weight is lost (which we would like to investigate specifically in future studies). We hypothesize that people living with a high BMI will experience greater optimization than people with a lower BMI. Our data in Figs. 1H and S2I suggest that this assumption is correct.

      (2) The primary outcome's reliance on self-reported body weight and subsequent measurement biases further undermines the reliability of the findings.

      Self-reported data is always more prone to errors than objectively measured data. With regard to the collection of body weight, we were severely restricted in terms of direct contact with the participants during the conduct of the study due to the Covid-19 pandemic. At least the measurement of the initial body weight (at T0), the body weight after the end of the exploration phase (at T1) and the final body weight (at T2) were measured in video calls in the (virtual) presence of the study staff. These are the measurement points that were decisive for our analyses. Intermediate self-reported measurement points were not considered for analyses. We have added in the Materials & Methods section that video calls were undertaken to minimize the risk of misreporting.

      (3) Additionally, the absence of registration in clinical trial registries, such as the EU Clinical Trials Register or clinicaltrials.gov, and the multiple testing of hypotheses which were not listed a priori in the research protocol published on the German Register of Clinical Trials impede the study's transparency and reproducibility.

      Our study was registered in the DRKS - German Clinical Trials Register in accordance with international requirements. The DRKS fulfills the same important criteria as the EU Clinical Trial Register and clinicaltrials.gov.

      We quote from the homepage of the DRKS: „The DRKS is the approved WHO Primary Register in Germany and thus meets the requirements of the International Committee of Medical Journal Editors (ICMJE). […] The WHO brings together the worldwide activities for the registration of clinical trials on the International Clinical Trials Registry Platform (ICTRP). […] As a Primary Register, the DRKS is a member of the ICTRP network.”

      We are therefore convinced that we registered our study in the correct place.

      Furthermore, in our view, we did not provide less information on planned analyses than is usual and all our analyses were covered by the information in the study registry. We have stated the hypothesis in the study register that „strict adherence to [personalized] mealtimes will lead to a strengthening of the circadian system in the digestive tract and thus to an optimization of the utilization of nutrients and ultimately to the adjustment of body weight to an individual ideal value.“

      In our view, numerous analyses are necessary to test this hypothesis. We investigated whether it is the adherence to eating times that is related to the observed weight loss (Fig. 1), or possibly other variables resulting from adherence to the meal schedule (Fig. 3). In addition, we analyzed whether the intervention optimized the utilization of nutrients, which we did based on the food composition and number of calories during the exploration and intervention phases (Fig. 2). We investigated whether the personalization of meal schedules plays a role (Fig. 3). And we attempted to analyze whether the adjustment of body weight to an individual ideal value occurs by correlating the influence of the original BMI with weight loss. Only the hypothesis that the circadian system in the digestive tract is strengthened has not yet been directly investigated, a fact that is listed as a limitation, although it can be assumed that this has happened, as the Zeitgeber “food” has lost significant variability as a result of the intervention. The analyses on general well-being are covered in the study protocol by the listing of secondary endpoints.

      Beyond that, we did not analyze any hypotheses that were not formulated a priori.

      For these reasons, we see no restriction in transparency, reproducibility or requirements and regulations.

      Achievement of Objectives and Support for Conclusions:

      (4) The study's objectives were partially met; however, the interpretation of the effects of meal timing on weight loss is compromised by the weaknesses mentioned above. The evidence only partially supports some of the claims due to methodological flaws and unstructured data analysis.

      We hope that we have been able to dispel uncertainties regarding some interpretations through supplementary analyses and the addition of some methodological details.

      Impact and Utility:

      (5) Despite its innovative approach, significant methodological and analytical shortcomings limit the study's utility. If these issues were addressed, the research could have meaningful implications for dietary interventions and metabolic research. The concept of timing of dietary occasions in sync with circadian rhythms holds promise but requires further rigorous investigation.

      We are pleased with the assessment that our data to date is promising. We hope that the revised version will already clarify some of the doubts about the data available so far. Furthermore, we absolutely agree with the reviewer: the present study serves to verify whether our intervention approach is potentially effective for weight loss - which we believe is the case. In the next steps, we plan to include extensive metabolic studies and to address the limitations of the present study.

      Reviewer #3 (Public Review):

      The authors tested a dietary intervention focused on improving meal regularity in this interesting paper. The study, a two-group, single-center, randomized, controlled, single-blind trial, utilized a smartphone application to track participants' meal frequencies and instructed the experimental group to confine their eating to these times for six weeks. The authors concluded that improving meal regularity reduced excess body weight despite food intake not being altered and contributed to overall improvements in well-being.

      The concept is interesting, but the need for more rigor is of concern.

      We would like to thank the reviewer for the interest in our study.

      (1) A notable limitation is the reliance on self-reported food intake, with the primary outcome being self-reported body weight/BMI, indicating an average weight loss of 2.62 kg. Despite no observed change in caloric intake, the authors assert weight loss among participants.

      As already described above in the responses to Reviewer 2, the body weight assessment took place in video calls in the (virtual) presence of study staff, so that the risk of misreporting is minimized. We have added this information to the manuscript.

      When recording food intake, we had to weigh up the risk of misreporting against the risk of a lack of validity in a permanently monitored setting. It was important to us to investigate the effectiveness of the intervention in the participants' everyday environment and not in a laboratory setting in order to be able to convincingly demonstrate its applicability in everyday life. The restriction of self-reporting is therefore unavoidable in our view and must be accepted. It can possibly be reduced by photographing the food, but even this is not a complete protection against underreporting, as there is no guarantee that everything that is ingested is actually photographed.

      However, our analyses show that the reporting behavior of individual participants did not change significantly between the exploration and intervention phases. We do not assume that participants who underreported only did so during the exploration phase (and only ate more than reported in this study phase) and reported correctly in the intervention phase (and then indeed consumed fewer calories).  We discuss this point in the section "3.1 Limitations".

      (2) The trial's reliance on self-reported caloric intake is problematic, as participants tend to underreport intake; for example, in the NEJM paper (DOI: 10.1056/NEJM199212313272701), some participants underreported caloric intake by approximately 50%, rendering such data unreliable and hence misleading. More rigorous methods for assessing food intake are available and should have been utilized. Merely acknowledging the unreliability of self-reported caloric intake is insufficient as it would still leave the reader with the impression that there is no change in food intake when we actually have no idea if food intake was altered. A more robust approach to assessing food intake is imperative. Even if a decrease in caloric intake is observed through rigorous measurement, as I am convinced a more rigorous study would unveil testing this paradigm, this intervention may merely represent another short-term diet among countless others that show that one may lose weight by going on a diet, principally due to heightened dietary awareness.

      The risks of self-reporting, our considerations, and our analysis of participants' reporting behavior and caloric intake over the course of the study are discussed in detail both in our responses above and in the manuscript. 

      With regard to the reviewer's second argument, we have largely adapted the study protocol of the control group to that of the experimental group. Apart from the fact that the control subjects were not given guidelines on eating times and were instead only given a very rough time window of 18 hours for food intake, the content of the sessions and the measurement methods were the same in both groups. This means that the possibility of increased nutritional awareness was equally present in both groups, but only the participants in the experimental group lost a significant amount of body weight.

      In future continuations of the study, further follow-up after an even longer period than four weeks (e.g. after 6 months) can be included in the protocol in order to examine whether the effects can be sustained over a longer period.

      (3) Furthermore, the assessment of circadian rhythm using the MCTQ, a self-reported measure of chronotype, may not be as reliable as more objective methods like actigraphy.

      The MCTQ is a validated means of determining chronotype and its results are significantly associated with the results of actigraphic measurements. In our view, the MCTQ is sufficient to test our hypothesis that matching the chronobiological characteristics of participants is beneficial. Nevertheless, measurements using actigraphy could be of interest, for example to correlate the success of weight loss with parameters of the sleep-wake rhythm.

      (4) Given the potential limitations associated with self-reported data in both dietary intake and circadian rhythm assessment, the overall impact of this manuscript is low. Increasing rigor by incorporating more objective and reliable measurement techniques in future studies could strengthen the validity and impact of the findings.

      The body weight data was not self-reported; the measurements were taken in the presence of study staff. Although optimization might be possible (see above), we do not currently see any feasible way, other than self-report, of recording all calorie intake occasions in the natural environment of the participants over a period of several weeks (or possibly longer, as noted by the reviewer). For the future continuation of the study, we are planning occasional indirect calorimetry measurements that can provide information about the actual amount of food consumed in different phases of the study. These can reveal errors in the self-report but will not be able to replace daily data collection by means of self-report.

      Reviewer #1 (Recommendations For The Authors):

      Summary:

      This interesting and timely study by Wilming and colleagues examines the effect of regularity vs. irregularity of feeding on body weight dynamics and BMI. A rigorous assessment of the same in humans needs to be improved, which this study provides. The study is well-designed, with a 14-day exploration phase followed by 6 weeks of intervention, and it is commendable to see the number of participants (100) who completed the study. Incorporation of a follow-up assessment 4 weeks after the conclusion of the study shows maintained weight loss in a subset of Experimental Group (EG) participants who continue with regular meals. There are several key observations, including particular meal times (lunch and dinner), which, when restricted to 45min or less in duration (MTVS of 3 or less), will lead to efficient weight loss, as well as correlations between baseline BMI and weight loss. The authors also exclude the impact of self-reported meal composition on the efficiency of weight loss in the EG group in the context of this study. The study reports interesting effects of regularity of feeding on eating behavior, which appears to be independent of weight loss. Finally, the authors highlight an important point: to provide attention to personalized feeding and circadian windows and that personalized interventions that cater to individual circadian structures will result in more significant weight loss. This is an important concept that needs to be brought to light. There are only a few minor comments listed below:

      Minor comments:

      (1) The authors may provide explanations for the reduction in the MTVS in the EG and the increase in the same for the Control Group (CG). The increases in MTVS in CG are surprising (lines 105-106) because it is assumed that there is no difference in CG eating patterns prior to and during the study.

      As the reviewer correctly states, our assumption was that there should be no change in the MTVS before and during the study - but we could not rule this out, as the subjects were not given any indication of the regularity of food intake in the fixed time window in the meetings with the study staff, i.e. they were not instructed to continue eating exactly as before. Such an instruction would possibly have led to an effort on the part of the participants to adhere to a schedule as precisely as possible. As a result, there was a statistically significant worsening of the MTVS in the CG; however, it amounted to less than 0.6 MTVS, i.e. a time span of only approx. ± 7.5 min, and remained within MTVS 3. Since there were no correlations between the measured MTVS and the weight of the subjects in the CG and a change of about half an MTVS value has only a rather minor effect on weight, we do not attribute great significance to the observed deterioration in the MTVS.

      (2) There would be greater clarity for the readers if the authors clearly defined the study design in detail at the outset of the study, e.g., in section 2.1.

      We have included a brief summary of the study design at the end of the introduction so that the reader is already familiar with it at the beginning of the manuscript without having to switch to the material and methods section.

      (3) The data in Fig S2H is important and informs readers that the regularity of lunch and dinner is more related to body weight changes than breakfast. These data should be incorporated in the Main Figure. In addition, analyses of Table S7 data (indicating that an MTVS of no greater than 3, or ±45 min of the meal-timing window, is associated with efficient weight loss) should be represented in a figure panel in the Main Figures.

      As suggested by the reviewer, we have moved Fig. S2H to the main Fig. 1. In addition, Table S7 is now no longer inserted as a supplementary table but as main Table 1 in the manuscript.

      (4) The authors state in lines 222-223 that "weight changes of participants were not related to one of these changes in eating characteristics (Fig. 3B-D, Tab. S6)", referring to the shortening of feeding windows as noted in the EG group. This is a rather simplistic statement, which should be amended to include that weight changes may not relate to changes in eating characteristics per se but likely relate to changes in metabolic programming, for instance, energy expenditure increases, which have been shown to associate with these changes in eating characteristics. This is important to note.

      We have changed the wording at this point so that it is clear that we are only referring here in the results section to the results of the mathematical analysis, which showed no correlation between the eating time window and weight loss in our sample. However, we have now explicitly mentioned the change in metabolic programming correctly noted by the reviewer in the discussion at the end of section 3.

      (5) Please provide more background and details on the attributes that define individual participant chronotypes in the manuscript before discussing datasets, e.g., mSP and mEP. This is in relation to the narrative in lines 228-230: "Indeed, our data show that the later the chronotype of participants (measured by the MCTQ mid-sleep phase, mSP [24]), the later their mid-eat phase (mEP) on weekends (Fig. 3E, Tab. S6), with the mSP and mEP being almost antiphasic on average (Fig. 3F, Tab. S10)." This will help readers unfamiliar with circadian biology/chronobiology research understand the contents of this manuscript, particularly Fig 3.

      We have explained the new chronobiology terms that appear in the chapter better in the revised version so that they are easier to understand.

      Reviewer #2 (Recommendations For The Authors):

      (1) Clarify Terminology: Define or avoid using ambiguous terms such as "caloric event" to prevent confusion, especially for readers less familiar with chronobiology. Consider providing clear explanations or opting for more widely understood terms.

      We have replaced "caloric event" with “calorie intake occasion” and explain various chronobiology terms better, so that hopefully readers from other disciplines can now follow the text more easily.

      (2) Detailed Methodological Descriptions: Improve the transparency of your methods, especially concerning the measurement of primary and secondary outcomes. Address the concerns raised about the reliability of self-reported weight and the potential biases in measurement methods.

      In the section "3.1 Limitations", we have examined the aspect of the reliability of self-reported data and our measures to reduce this uncertainty in more detail. We have also added further details on the measurement of outcomes in the materials and methods section.

      (3) Address Participant Selection Criteria: Reevaluate the inclusion criteria and consider discussing the implications on the study's findings of the broad age range, the inclusion of shift work, unmatched cohorts, and inclusion of individuals with normal weight, overweight, and obesity. Provide a subgroup analysis or discuss how BMI might have influenced the results. Even though this is an additional post-hoc analysis, it would directly address one of the major weaknesses of the study design.

      We have supplemented the analyses and now show in Fig. S2G that neither age nor gender had any influence on weight loss as a result of the intervention. To our knowledge, none of the 100 participants evaluated were shift workers. Even if shift workers were part of the study without our knowledge, we do not consider this to be a problem as long as their shifts allow them to keep to certain eating times. The fact that it turned out that the baseline BMI of the remaining 67 EG and 33 CG participants did not match is discussed in detail in the section "3.1 Limitations". Our previous analysis in Fig. S2I already showed that there is a negative correlation between baseline BMI and weight loss - an interesting result, as it shows that people with a high BMI particularly benefit from the intervention. In addition, we already showed in Fig. S2J in a subgroup analysis that in all strata the BMI of EG subjects decreased more than that of CG subjects, even if they had the same initial BMI. We do not consider the wide dispersion of the BMIs of the included participants to be a weakness of the study design. On the contrary, it allows us to make a statement about which target group the intervention is particularly suitable for.

      (4) Improve Statistical Analysis: If not already done, involve a biostatistician to review the statistical analyses, particularly concerning post-hoc tests, correlation analyses, and the handling of measurement biases. Ensure that deviations from the original study protocol are clearly documented and justified.

      All analyses were decided on together with a statistician, who has also checked and approved them.

      (5) Data Interpretation and Speculation: Limit speculation and clearly distinguish between findings supported by your data from hypotheses and future directions. Ensure that discussions about the implications of meal timing on metabolism are supported by evidence with adequate references and clearly state where further research is needed.

      We have revised the discussion and, especially through the detailed discussions of the limitations, we have emphasized more clearly what has been achieved and what still needs to be proven in future studies.

      (6) Clinical Trial Registration: Address the lack of registration in the EU Clinical Trials Register and clinicaltrials.gov. Discuss its potential implications on the study's transparency and how it aligns with current requirements and regulations.

      Our study was registered in the DRKS - German Clinical Trials Register in accordance with international requirements. The DRKS fulfills the same important criteria as the EU Clinical Trials Register and clinicaltrials.gov.

      We quote from the homepage of the DRKS: “The DRKS is the approved WHO Primary Register in Germany and thus meets the requirements of the International Committee of Medical Journal Editors (ICMJE). […] The WHO brings together the worldwide activities for the registration of clinical trials on the International Clinical Trials Registry Platform (ICTRP). […] As a Primary Register, the DRKS is a member of the ICTRP network.”

      We are therefore convinced that we registered our study in the correct place before it began and see no restriction in transparency or requirements and regulations.

      (7) Use of Sensitive and Current Terminology: Update the manuscript to reflect the latest recommendations regarding the language used to describe obesity and patients living with obesity. This ensures respect and accuracy in reporting and aligns with contemporary standards in the field.

      We updated the manuscript accordingly.

      (8) Strengthen the Introduction: Expand the literature review to include more recent and relevant studies that contextualise your work within the broader field of chrononutrition. This could help clarify how your study builds upon or diverges from existing research.

      We have included further studies in the introduction that aim to reduce body weight by restricting food intake to certain time periods. We have also more clearly contrasted the designs of these studies with the design of our study.

      (9) Clarify Discrepancies and Errors: Address any inconsistencies, such as the discrepancy in meal timing instructions (90 minutes reported in the conclusion vs. 60 minutes reported in the methods), and ensure all figures, tables, and statistical analyses are correctly referenced and described.

      The first point mentioned by the reviewer is not an inconsistency. To ensure the feasibility of the intervention, each participant was initially given a time window of +/- 30 minutes (60 min) from the specified eating time. Our later analyses show that even a time window of +/- 45 minutes (90 min) around the specified eating time is sufficient to lose weight efficiently (see results in Table 1).
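
      The relationship between the two windows can be illustrated with a short sketch. It is only an illustration of the arithmetic described above, not part of the study's analysis; the function names and the example clock times are hypothetical.

```python
from datetime import datetime, timedelta


def window_width(tolerance_min: int) -> int:
    """Total width in minutes of a symmetric +/- tolerance window."""
    return 2 * tolerance_min


def within_window(scheduled: str, actual: str, tolerance_min: int) -> bool:
    """True if `actual` lies within +/- tolerance_min of `scheduled`.

    Both times are "HH:MM" strings on the same day (an assumption made
    for this sketch; meals around midnight are not handled).
    """
    fmt = "%H:%M"
    delta = datetime.strptime(actual, fmt) - datetime.strptime(scheduled, fmt)
    return abs(delta) <= timedelta(minutes=tolerance_min)


# Hypothetical example: lunch scheduled at 12:30, eaten at 13:10 (40 min late).
print(window_width(30), within_window("12:30", "13:10", 30))
print(window_width(45), within_window("12:30", "13:10", 45))
```

      In this hypothetical case the meal falls outside the ±30 min window given to participants but inside the ±45 min window that the later analysis found to be sufficient.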

      We have checked all references to figures, tables and statistical analyses and updated them if necessary.

      (10) Discuss Limitations and Bias: More thoroughly discuss the limitations of your study, including the potential impacts of biases and how they were mitigated. Additionally, consider the effects of including shift workers and how this choice impacts the applicability of your findings.

      Section “3.1 Limitations” has now been supplemented by a number of points and discussions. As described above, we do not consider the inclusion of shift workers to be a limitation as long as they are able to adhere to the specifications of the eating time plan. We cannot derive any indications to the contrary from our data.

      (11) Consider Publishing Separate Manuscripts: If the study encompasses a wide range of outcomes or post-hoc analyses, consider separating these into distinct publications to allow for a more focused and detailed exploration of each set of findings.

      We will take this advice into consideration for future publications on the continuation of the study. As this is a pilot study that is intended to clarify whether and to what extent the intervention is effective, we believe it makes sense to report all the data in a publication.

      (12) By addressing these recommendations, the authors can significantly improve their manuscript's clarity, reliability, and impact. This would not only support the dissemination of their findings but also would contribute valuable insights into the growing field of chrononutrition.

      We hope that we have satisfactorily answered, discussed and implemented the points mentioned by the reviewer in the manuscript, so that clarity, reliability, and impact have been increased and it can offer a valuable contribution to the named field.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The report describes the control of the activity of the RNA-activated protein kinase, PKR, by the Vaccinia virus K3 protein. Repressive binding of K3 to the kinase prevents phosphorylation of its recognised substrate, EIF2α (the α subunit of the Eukaryotic Initiation Factor 2). The interaction of K3 is probed by saturation mutation within four regions of PKR chosen by modelling the molecules' interaction. They identify K3-resistant PKR variants that reveal that the K3/EIF2α-binding surface of the kinase is malleable. This is reasonably interpreted as indicating the potential adaptability of this antiviral protein to combat viral virulence factors.

      Strengths:

      This is a well-conducted study that probes the versatility of the antiviral response to escape a viral inhibitor. The experimentation is very diligent, generating and screening a large number of variants to recognise the malleability of residues at the interface between PKR and K3.

      Weaknesses:

      (1) These are minor. The protein interaction between PKR and K3 has been previously well-explored through phylogenetic and functional analyses and molecular dynamics studies, as well as with more limited site-directed mutational studies using the same experimental assays.

      Accordingly, these findings largely reinforce what had been established rather than making major discoveries.

      First, thank you for your thoughtful feedback. We agree that our results are concordant with previous findings and recognize the importance of emphasizing what we find novel in our results. We have revised the introduction (lines 65-74 of the revised_manuscript.pdf) to emphasize three findings of interest: (1) the PKR kinase domain is largely pliable across its substrate-binding interface, a remarkable quality that is most fully revealed through a comprehensive screen, (2) we were able to differentiate variants that render PKR nonfunctional from those that are susceptible to Vaccinia K3, and (3) we observe a strong correlation between PKR variants that are resistant to K3 WT and K3-H47R.

      There are some presumptions:

      (2) It isn't established that the different PKR constructs are expressed equivalently so there is the contingency that this could account for some of the functional differences.

      This is an excellent point. We have revised the manuscript to raise this caveat in the discussion (lines 247-251). One indirect reason to suppose that expression differences among our PKR variants are not a dominant source of variation is that we did not observe much variation in kinase activity in the absence of K3.

      (3) Details about the conformation of PKR used to model the interaction aren't given so it isn't clear how accurately the model captures the active kinase state. This is important for the interaction with K3/EIF2α.

      We have expanded on Supplemental Figure 12 and our description of the AlphaFold2 models in the Materials and Methods section (lines 573-590). We clarify that these models may not accurately capture the phosphoacceptor loop of eIF2α (residues Glu49-Lys60) and the PKR β4-5 linker (Asp338-Asn350) as these are highly flexible regions that are absent in the existing crystal structure complex (PDB 2A1A) and have low AlphaFold2 confidence scores (pLDDT < 50). We also noted, in the Materials and Methods section and in the caption of Figure 1, that the modeled eIF2α closely resembles the crystal structure of standalone yeast eIF2α, which places the Ser51 phosphoacceptor site far from the PKR active site. Thus, we expect there are additional undetermined PKR residues that contact eIF2α.

      (4) Not all regions identified to form the interface between PKR and K3 were assessed in the experimentation. It isn't clear why residues between positions 332-358 weren't examined, particularly as this would have made this report more complete than preceding studies of this protein interaction.

      Great question. We designed and generated the PKR variant library based on the vaccinia K3 crystal structure (PDB 1LUZ) aligned to eIF2α in complex with PKR (PDB 2A1A), in which PKR residues 338-350 are absent. After the genesis of the project, we generated the AlphaFold2-predicted complex of PKR and vaccinia K3, and have become very interested in the β4-β5 linker, a highly diverse region across PKR homologs which includes residues 332-358. However, this region remains unexamined in this manuscript.

      Reviewer #2 (Public Review):

      Chambers et al. (2024) present a systematic and unbiased approach to explore the evolutionary potential of the human antiviral protein kinase R (PKR) to evade inhibition by a poxviral antagonist while maintaining one of its essential functions.

      The authors generated a library of 426 single-nucleotide polymorphism (SNP)-accessible non-synonymous variants of the PKR kinase domain and used a yeast-based heterologous virus-host system to assess PKR variants' ability to escape antagonism by the vaccinia virus pseudo-substrate inhibitor K3. The study identified determinant sites in the PKR kinase domain that harbor K3-resistant variants, as well as sites where variation leads to PKR loss of function. The authors found that multiple K3-resistant variants are readily available throughout the domain interface and are enriched at sites under positive selection. They further found some evidence of PKR resilience to viral antagonist diversification. These findings highlight the remarkable adaptability of PKR in response to viral antagonism by mimicry.

      Significance of the findings:

      The findings are important with implications for various fields, including evolutionary biology, virus-host interfaces, genetic conflicts, and antiviral immunity.

      Strength of the evidence:

      Convincing methodology using a state-of-the-art mutational scanning approach in an elegant and simple setup to address important challenges in virus-host molecular conflicts and protein adaptations.

      Strengths:

      Systematic and Unbiased Approach:

      The study's comprehensive approach to generating and characterizing a large library of PKR variants provides valuable insights into the evolutionary landscape of the PKR kinase domain. By focusing on SNP-accessible variants, the authors ensure the relevance of their findings to naturally occurring mutations.

      Identification of Key Sites:

      The identification of specific sites in the PKR kinase domain that confer resistance or susceptibility to poxvirus pseudosubstrate inhibition is a significant contribution.

      Evolutionary Implications:

      The authors performed meticulous comparative analyses throughout the study between the functional variants from their mutagenesis screen ("prospective") and the evolutionarily-relevant past adaptations ("retrospective").

      Experimental Design:

      The use of a yeast-based assay to simultaneously assess PKR capacity to induce cell growth arrest and susceptibility/resistance to various VACV K3 alleles is an efficient approach. The combination of this assay with high-throughput sequencing allows for the rapid characterization of a large number of PKR variants.

      Areas for Improvement:

      (5) Validation of the screen: The results would be strengthened by validating results from the screen on a handful of candidate PKR variants, either using a similar yeast heterologous assay, or - even more powerfully - in another experimental system assaying for similar function (cell translation arrest) or protein-protein interaction.

      Thank you for your thoughtful feedback. We agree that additional data to validate our findings would strengthen the manuscript. We have individually screened a handful of PKR variants in duplicate using serial dilution to measure yeast growth, and found that the results generally support our original findings. We have revised the manuscript to include these validation experiments (lines 117-119 of the revised_manuscript.pdf, Supplemental Figure 4).

      (6) Evolutionary Data: Beyond residues under positive selection, the screen would allow the authors to also perform a comparative analysis with PKR residues under purifying selection. Because they are assessing one of the most conserved ancestral functions of PKR (i.e. cell translation arrest), it may also be of interest to discuss these highly conserved sites.

      This is a great point. We do find that there are regions of the PKR kinase domain that are not amenable to genetic perturbation, namely in the glycine rich loop and active site. We contrast the PKR functional scores at conserved residues under purifying selection with those under positive selection in Figure 2E (lines 141-143).

      (7) Mechanistic Insights: While the study identifies key sites and residues involved in vaccinia K3 resistance, it could benefit from further investigation into the underlying molecular mechanisms. The study's reliance on a single experimental approach, deep mutational scanning, may introduce biases and limit the scope of the findings. The authors may acknowledge these limitations in the Discussion.

      We agree that further investigation into the underlying molecular mechanisms is warranted and we have revised the manuscript to acknowledge this point in the discussion (lines 284-288).

      (8) Viral Diversity: The study focuses on the viral inhibitor K3 from vaccinia. Expanding the analysis to include other viral inhibitors, or exploring the effects of PKR variants on a range of viruses would strengthen and expand the study's conclusions. Would the identified VACV K3-resistant variants also be effective against other viral inhibitors (from pox or other viruses)? or in the context of infection with different viruses? Without such evidence, the authors may check the manuscript is specific about the conclusions.

      This is a fantastic question that we are interested in exploring in our future studies. In the manuscript we note a strong correlation between PKR variants that evade vaccinia wild-type K3 and the K3-H47R enhanced allele, but we are curious to know if this holds when tested against other K3 orthologs such as variola virus C3. That said, we have revised the manuscript to clarify this limitation to our findings and specify vaccinia K3 where appropriate.

      Reviewer #3 (Public Review):

      Summary:

      -  This study investigated how genetic variation in the human protein PKR can enable sensitivity or resistance to a viral inhibitor from the vaccinia virus called K3.

      -  The authors generated a collection of PKR mutants and characterized their activity in a high-throughput yeast assay to identify 1) which mutations alter PKR's intrinsic biochemical activity, 2) which mutations allow for PKR to escape from viral K3, and 3) which mutations allow for escape from a mutant version of K3 that was previously known to inhibit PKR more efficiently.

      -  As a result of this work, the authors generated a detailed map of residues at the PKR-K3 binding surface and the functional impacts of single mutation changes at these sites.

      Strengths:

      -  Experiments assessed each PKR variant against three different alleles of the K3 antagonist, allowing for a combinatorial view of how each PKR mutant performs in different settings.

      -  Nice development of a useful, high-throughput yeast assay to assess PKR activity, with highly detailed methods to facilitate open science and reproducibility.

      -  The authors generated a very clean, high-quality, and well-replicated dataset.

      Weaknesses:

      (9) The authors chose to focus solely on testing residues in or near the PKR-K3 predicted binding interface. As a result, there was only a moderately complex library of PKR mutants tested. The residues selected for investigation were logical, but this limited the potential for observing allosteric interactions or other less-expected results.

      First, we greatly appreciate all your feedback on the manuscript, as well as raising this particular point. We agree that this is a moderately complex library of PKR variants, from which we begin to uncover a highly pliable domain with a few specific sites that cannot be altered. We have revised the manuscript to raise this limitation (lines 284-288 of the revised_manuscript.pdf) and encourage additional exploration of the PKR kinase domain.

      (10) For residues of interest, some kind of independent validation assay would have been useful to demonstrate that this yeast fitness-based assay is a reliable and quantitative readout of PKR activity.

      We agree that additional data to validate our findings would strengthen the manuscript. We have individually screened a handful of PKR variants in duplicate using serial dilution to measure yeast growth, and generally found that the results support our original findings. We have revised the manuscript to include this validation experiment (lines 117-119, Supplemental Figure 4).

      (11) As written, the current version of the manuscript could use more context to help a general reader understand 1) what was previously known about these PKR and K3 variants, 2) what was known about how other genes involved in arms races evolve, or 3) what predictions or goals the authors had at the beginning of their experiment. As a result, this paper mostly provides a detailed catalog of variants and their effects. This will be a useful reference for those carrying out detailed, biochemical studies of PKR or K3, but any broader lessons are limited.

      Thank you for bringing this to our attention. We have revised the introduction of the manuscript to provide more context regarding previous work demonstrating an evolutionary arms race between PKR and K3 and how single residue changes alter K3 resistance (lines 51-64).

      (12) I felt there was a missed opportunity to connect the study's findings to outside evolutionary genetic information, beyond asking if there was overlap with PKR sites that a single previous study had identified as positively selected. For example, are there any signals of balancing selection for PKR? How much allelic diversity is there within humans, and are people typically heterozygous for PKR variants? Relatedly, although PKR variants were tested in isolation here, would the authors expect their functional impacts to be recessive or dominant, and would this alter their interpretations? On the viral diversity side, how much variation is there among K3 sequences? Is there an elevated evolutionary rate, for example, in K3 at residues that contact PKR sites that can confer resistance? None of these additions are essential, but some kind of discussion or analysis like this would help to connect the yeast-based PKR phenotypic assay presented here back to the real-world context for these genes.

      We appreciate this suggestion to extend our findings to a broader evolutionary context. There is little allelic diversity of PKR in humans, with all nonsynonymous variation listed in gnomAD being rare. (PKR shows sequence diversity in comparisons across species, including across primates.) Thus, barring the possibility of variation being present in under-studied populations, there is unlikely to be balancing selection on PKR in humans. Our expectation is that beneficial mutations in PKR for evading a pseudosubstrate inhibitor would be dominant, as a small amount of eIF2α phosphorylation is capable of halting translation (Siekierka, PNAS, 1984). There is a recent report citing PKR missense variants associated with dystonia that can be dominantly or recessively inherited (Eemy et al. 2020 PMID 33236446). Elde et al. 2009 (PMID 19043403) notes that poxvirus K3 homologs are under positive selection but no specific residues have been cited to be under positive selection. The lack of allelic diversity in PKR in humans notwithstanding, PKR could experience future selection in the human population as evidenced by its rapid evolution in primates, so we fully agree that a connection to the real-world context is useful. We have noted these topics in the discussion section (lines 289-294).

      Reviewer #1 (Recommendations For The Authors):

      I have no major criticisms but ask for some clarifications and make some comments about the perceived weaknesses.

      (13)  If the authors disagree with my summation that the findings largely replicate what was known, could they detail how the findings differ from what was known about this protein interaction and the major new insights stemming from the study? Currently, the abstract is a little philosophical rather than listing the explicit discoveries of the study.

      Thank you again for raising the need for us to clearly convey the novelty of our findings. We have revised the final paragraph in our introduction as described in comment #1.

      (14) As the experimental approach is well reported it is unnecessary to confirm the proposed activity by, for instance, measures of Sui2 phosphorylation. However, previous reports have recognised that point mutants of PKR can be differentially expressed. The impact of this potential effect is unknown in the current experimentation as there are no measures of the expression of the different mutant PKR constructs. The large number of constructs used makes this verification onerous. The potential impact could be ameliorated by redundantly replacing each residue (hoping different residues have different effects on expression). Still, this limitation of the study should be acknowledged in the text.

      We greatly appreciate this comment and agree that this should be made clear in the text, which we have added to the discussion of the manuscript (lines 247-251).

      (15) Preceding findings and the modeling in this report recognise an involvement in the kinase insert region (residues 332 to 358) in PKR's interaction with K3 but this region is excluded from the analysis. These residues have been largely disregarded in the preceding analysis (it is absent from the molecular structure of the kinase) so its inclusion here might have lent a more novel aspect or delivered a more complete investigation. Is there a justification for excluding this flexible loop?

      The PKR variant library was designed based on the crystal structure of K3 (PDB 1LUZ) aligned to eIF2α in complex with PKR (PDB 2A1A). After the library was designed and made, we obtained complete predicted structures of PKR in complex with eIF2α and K3, which largely agree with the crystal structures but contain the additional flexible loops that were not captured in the crystal structures. Though the library studied here does not explore variation in the kinase insert region, we are very interested in doing so in our future studies.

      (16)  Could the explanation of the 'PKR functional score' be clarified? The description given within the legend of SF1 was helpful, so could this be replicated earlier in the main body of the text when introducing these experiments? e.g. As PKR activity is toxic to yeast, the number of cells in the pool expressing the functional PKR will decrease over time. Thus the associated barcode read count will also decrease, while the read count for the nonfunctional PKR will increase. This is termed the PKR function score, which will be relatively lower for cells transformed with less active PKR than those with more active PKR.

      Thank you for suggesting this clarification, we have revised the manuscript to clarify our definition of the PKR functional score (lines 106-109).
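
      As a rough illustration of the logic described in comment (16), the minimal Python sketch below derives a score from barcode read counts over time. It is not the scoring method used in the manuscript: the function names, the pseudocount, and the choice of a negated log2-frequency slope normalized to wild type are all assumptions made for illustration.

```python
import math

def log2_freqs(counts, totals, pseudocount=0.5):
    """Log2 relative frequency of one barcode at each timepoint.

    The pseudocount (an assumed value) avoids log(0) for barcodes that drop out.
    """
    return [math.log2((c + pseudocount) / t) for c, t in zip(counts, totals)]

def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def functional_score(times, counts, totals, wt_counts):
    """Hypothetical score: depletion rate of a variant relative to wild type.

    Active PKR is toxic to yeast, so its barcode is depleted over time.
    The slope is negated so that faster depletion (more active PKR) gives a
    higher score; wild type scores 0 by construction.
    """
    variant_rate = -slope(times, log2_freqs(counts, totals))
    wt_rate = -slope(times, log2_freqs(wt_counts, totals))
    return variant_rate - wt_rate

# Toy data (invented for illustration): hours sampled and reads per timepoint.
times = [0, 4, 8]
totals = [1000, 1000, 1000]
wt_counts = [100, 50, 25]         # wild-type barcode is depleted
inactive_counts = [100, 200, 400]  # nonfunctional variant is enriched

print(functional_score(times, inactive_counts, totals, wt_counts))
```

      With these toy counts the nonfunctional variant receives a negative score, matching the orientation described above: less active PKR scores lower than more active PKR.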

      (17)  Another suggestion to clarify this term is to modify the figures. Currently, the intent of the first simulated graph in Fig 1E is clear but the inversion of the response (shown by the transposition of the colours) in the next graph (to the right) is less immediately obvious. Accordingly, the orientation of the 'PKR functional score' is uncertain. Could the authors add text to the rightmost graphic in Figure 1E by, for instance, indicating the PKR activity in the vertical column with text such as 'less active' (at the bottom), 'WT' (in the centre), and 'more activity' (at the top)? Also, the position of the inactive K296R mutant might be added to Figure 2A complementing the positioning of the active WT kinase in the first data graph of this kind.

      We appreciate your specific feedback to improve the figures of the manuscript, we have made adjustments to Figure 1E to clarify how we derive the PKR functional scores.

      (18) The authors don't use existing structures of PKR in their modelling. However, there is no information about the state of the PKR molecule used for modelling. Specific elements of the kinase domain affect its interaction with K3 so it would be informative to know the orientation of these elements in the model. Could the authors detail the state of pivotal kinase elements in their models? This could involve the alignment of the N- and C-lobes, the orientation of kinase spines (C- and R-spines), and the phosphorylation status of residues in the activation loop, or at least the position of this loop in relationship to that adopted in the active dimeric kinase (e.g. PDB-2A1A, 3UIU or 6D3L). Alternatively, crystallographic structures of active and inactive PKR could be overlayed with the theoretical structure used for modelling (as supplementary information).

      We have revised the manuscript to describe the alignment of the predicted PKR-K3 complex with active and inactive PKR, and we have extended Supplemental Figure 12 with an overlay of the predicted structures with existing structures. We have also added a supplemental data file containing the RMSD values of PKR (from the predicted PKR-K3 complex) aligned to active (PDB 2A1A) and inactive (PDB 3UIU) or unphosphorylated (PDB 6D3L) PKR (5_Structure-Alignment-RMSD-Values.xlsx). We have also provided the AlphaFold2 best model predictions for the PKR-eIF2α complex (6_AF2_PKR-KD_eIF2a.pdb) and PKR-K3 complex (7_AF2_PKR-KD_VACV-K3.pdb). Looking across the RMSD values, the AlphaFold2 model of PKR most closely resembles unphosphorylated PKR (PDB 6D3L) though we note the activation loop is absent from PDB 6D3L and 3UIU. We also aligned the Ser51 phosphoacceptor loop of AlphaFold2 eIF2α model to PDB 1Q46 and we see that the model reflects the pre-phosphorylation state. This loop is expected to interact with the PKR active site, which is not captured in our model and we state this explicitly in the caption of Figure 1 (lines 665-668).

      (19) Could some specific residue in Figure 7 be labelled (numbered) to orient the findings? Also, the key in this figure doesn't title the residues coloured white (RE red/black/blue). The white also isn't distinguished from the green (outside the regions targeted for mutagenesis).

      Excellent suggestion, we have revised this figure to include labels for the sites to orient the reader and clarify our categorization of PKR residues in the kinase domain.

      (20)  Regarding the discussion, the authors adopt the convention of describing K3 as a pseudosubstrate. Although I realize it is common to refer to K3 as a pseudosubstrate, it isn't phosphorylated and binds slightly differently to PKR so alternative descriptors, such as 'a competitive binder', would more accurately present the protein's function. Possibly for this reason, the authors declared an expectation that evolution pressures should shift K3 to precisely mimic EIF2α. However, closer molecular mimicry shouldn't be expected for two reasons. The first is a risk of disrupting other interactions, such as the EIF2 complex. Secondly, equivalent binding to PKR would demote K3 to merely a stoichiometric competitor of EIF2α. In this instance, effective inhibition would require very high levels of K3 to compete with equivalent binding by EIF2α. This would be demanding particularly upon induction of PKR during the interferon response. To be an effective inhibitor K3 has to bind more avidly than EIF2α and merely requires a sufficient overlap with the EIF2α interface on PKR to disrupt this alternative association. This interpretation predicts that K3 is under pressure to bind PKR by a different mechanism than EIF2α.

      We appreciate your thoughtful point about the usage of the term pseudosubstrate. Ultimately, we’ve decided to continue using the term due to its historical usage in the field. The question of the optimal extent of mimicry in K3 is a fascinating one, and we greatly appreciate your thoughts. We wholly agree that the possibility of K3 having superior PKR binding relative to eIF2α would be preferable to perfect mimicry. In our Ideas and Speculation section, we propose that benefits towards increasing PKR affinity may need to be balanced against potential loss of host range resulting from overfitting to a given host’s PKR. However, the possibility that reduced mimicry could be selected to avoid disruption of eIF2 function had not occurred to us; thank you for pointing it out!

      (21) The discussion of the 'positive selection' of sites is also interesting in this context. To what extent has the proposed positive selection been quantified? My understanding is that all of the EIF2α kinases are conserved and so demonstrate lower levels of residue change than might be expected by random mutagenesis, i.e. variance is under negative selection. The relatively higher rate of variance in PKR orthologs compared to other EIF2α kinases could reflect some relaxation of these constraints, rather than positive selection. Greater tolerance of change may stem from PKR's more sporadic function in the immune response (infrequent and intermittent presence of its activating stimuli) rather than the ceaseless control of homeostasis by the other EIF2α kinases. Also, induction of PKR during the immune response might compensate for mutations that reduce its activity. I believe that the entire clade of extant poxviruses is young relative to the divergence between their hosts. Accordingly, genetic variance in PKR predates these viruses. Although a change in PKR may become fixed if it affords an advantage during infection, such an advantage to the host would be countered by the much higher mutation rates of the virus. This would appear to diminish the opportunity for a specific mutation to dominate a host population and, thereby, to differentiate host species. Rather, pressure to elude control by a rapidly evolving viral factor would favour variation at sites where K3 binds. This speculation offers an alternative perspective to the current discussion that the variance in PKR orthologs stems from positive selection driven by viral infection.

      We appreciate this stimulating feedback for discussion. Three of the four eIF2α kinases (HRI, PERK, and GCN2) appear to be under purifying selection (Elde et al. 2009, PMID 19043403), which stands in contrast to PKR. Residues under positive selection have been found throughout PKR, including the dsRNA binding domains, linker region, and the kinase domain. Importantly, the selection analysis from Elde et al. and Rothenburg et al. concluded that positive selection at these sites is more likely than relaxed selection. We agree that poxviruses are young, though we would guess that viral pseudosubstrate inhibition of PKR is ancient. Many viral proteins have been reported to directly interact with PKR, including herpes virus US11, influenza A virus NS1A, hepatitis C virus NS5A, and human immunodeficiency virus Tat. The PKR kinase domain does contain residues under purifying selection that are conserved among all four eIF2α kinases, but it also contains residues under positive selection that interface with the natural substrate eIF2α. Our work suggests that PKR is genetically pliable across several sites in the kinase domain, and we are curious to know if this pliability would hold at the same sites across the other three eIF2α kinases.

      (22) The manuscript is very well written but has a small number of typos; e.g. an aberrant 'e' ln 7 of the introduction, capitalise the R in ranavirus on the last line of the fourth paragraph of the discussion, and eIF2α (EIF2α?) is occasionally written as eIFα in the materials&methods.

      Thank you for bringing these typos to our attention! We’ve deleted the aberrant ‘e’ in the introduction, capitalized ‘Ranavirus’ in the discussion (line 265), and corrected ‘eIFα’ to ‘eIF2α’ throughout the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Additional minor edits or revisions:

      (23) Paragraph 3 of the Introduction gives the impression that most of the previous work on the PKR-virus arms race is speculative. However, it is one of the best-described and most convincing examples of virus-host arms races. Can the authors edit the paragraph accordingly?

      Thank you for bringing this to our attention. We have revised the third paragraph and strengthened the description of the evolutionary arms race between PKR and viral pseudosubstrate antagonists.

      (24) Introduction: PKR has "two" double-stranded RNA binding domains. Can the authors update the text accordingly?

      We have updated the manuscript to clarify PKR has two dsRNA binding domains (lines 44-45).

      (25) The authors test here for one of the key functions of PKR: cell growth/translation arrest. Because of PKR pleiotropy, the manuscript may be edited accordingly: For example, statements such as "We found few genetic variants render the PKR kinase domain nonfunctional" are too speculative as they may retain other (not tested here) functions.

      This is a great suggestion, we have revised the manuscript to specify our definition of nonfunction in the context of our experimental screen (lines 86-92 and 106-109) and acknowledge this limitation in our experimental screen (lines 304-307).

      (26) The authors should specify "vaccinia" K3 whenever appropriate.

      We appreciate this comment and have revised the manuscript to specify vaccinia K3 where appropriate (e.g. lines 62, 66, 70, 80, 108, and 226).

      (27) Ref for ACE2 diversification may include Frank et al 2022 PMID: 35892217.

      Thank you for pointing us to this paper, we have included it as a reference in the manuscript (line 277).

      (28) Positive selection of PKR as referred to by the authors corresponds to analyses performed in primates. As shown by several studies, the sites under positive selection may vary according to host orders. Can the authors specify this ("primate") in their manuscript? And/or shortly discuss this aspect.

      Thank you for raising this point. In the manuscript we performed our analysis using vertebrate sites under positive selection as identified in Rothenburg et al. 2009 PMID 19043413 (line 51 and figure legends). We performed the same analysis using sites under positive selection in primates (as identified by Elde et al. 2009 PMID 19043403) and again found a significant difference in PKR functional scores versus K3. We have revised the manuscript to clarify our use of vertebrate sites under positive selection (lines 80-81).

      (29) "We view deep mutational scanning experiments as a complementary approach to positive selection": The authors should edit this and acknowledge previous and similar work of other antiviral factors, in particular one of the first studies of this kind on MxA (Colon-Thillet et al 2019 PMID: 31574080), and TRIM5 (Tenthorey et al 2020 PMID: 32930662).

      Thank you for raising these two papers, which we acknowledge in the revised manuscript (line 299).

      (30) We believe Figure S7 brings important results and should be placed in the Main.

      We appreciate this suggestion, and have moved the contents of the former supplementary Figure 7 to the main text, in Figure 6.

      (31) The title may specify "poxvirus".

      Thank you for the suggestion to specify the nature of our experiment, we have adjusted the title to: Systematic genetic characterization of the human PKR kinase domain highlights its functional malleability to escape a poxvirus substrate mimic (line 3).

      Reviewer #3 (Recommendations For The Authors):

      (32) No line numbers or page numbers are provided, which makes it difficult to comment.

      We sincerely apologize for this oversight and have included line numbers in our revised manuscript as well as the tracked changes document.

      (33) In the introduction, I recommend defining evolutionary arms races more clearly for a broad audience.

      Thank you for this suggestion. We have revised the manuscript in the first and third paragraphs to more clearly introduce readers to the concept of an evolutionary arms race.

      (34) The introduction could use a clearer statement of the question being considered and the gap in knowledge this paper is trying to address. Currently, the third paragraph includes many facts about PKR and the fourth paragraph jumps straight into the approach and results. Some elaboration here would convey the significance of the study more clearly. As is, the introduction reads a bit like "We wanted to do deep mutational scanning. PKR seemed like an ok protein to look at", rather than conveying a scientific question.

      This is a great suggestion to improve the introduction section. We have heavily revised the third and fourth paragraphs of the introduction to clarify the motivation, approach, and significance of our work.

      (35) Relatedly, did the authors have any hypotheses at the start of the experiment about what kinds of results they expected? e.g. What parts of PKR would be most likely to generate escape mutants? Would resistant mutants be rare or common? etc? This would help the reader to understand which results are expected vs. surprising.

      These are all great questions. We have revised the introduction of the manuscript to point out that previous studies have characterized a handful of PKR variants that evade vaccinia K3, and these variants were made at sites found to be under positive selection (lines 60-64).

      (36) A description of the different K3 variants and information about why they were chosen for study should also be added to the Introduction. It was not until Figure 5 that the reader was told that K3-H47R was the same as the 'enhanced' K3 allele you are testing.

      Thank you for bringing this to our attention, we have revised the introduction to clarify the experimental conditions (lines 65-67) and specify K3-H47R as the enhanced allele earlier in the manuscript (line 100).

      (37) Does every PKR include just a single point mutation? It would be nice to see data about the number and types of mutations in each PKR window added to Supplemental Figure 1.

      Thank you for the suggestion to improve this figure. Every PKR variant that we track has a single point mutation that generates a nonsynonymous mutation. In our PacBio sequencing of the PKR variant library we identified a few off-target variants or sequences with multiple variants, but we identified the barcodes linked to those constructs and discarded those variants in our analysis. We have revised Supplemental Figure 1 to include the number and types of mutations made at each PKR window.

      (38) In terms of the paper's logical flow, personally, I would expect to begin by testing which variants break PKR's function (Figure 3) and then proceeding to see which variants allow for K3 escape (Figure 2). Consider swapping the order of these sections.

      Thank you for this suggestion, and we can appreciate how the flow of the manuscript may be improved by swapping Figures 2 and 3. We have decided to maintain the current order of the figures because we use Figure 3 to emphasize the distinction of PKR sites that are nonfunctional versus susceptible to vaccinia K3.

      (39) Figure 3A seems like a less-informative version of Figure 4A, recommend combining these two. Same comment with Figure 5A and Figure 6A.

      We appreciate this specific feedback for the figures. Though there are similarities between figure panels (e.g. 3A and 4A) we use them to emphasize different points in each figure. For example, in Figure 3 we emphasize the general lack of variants that impair PKR kinase activity, and in Figure 4 we distinguish kinase-impaired variants from K3-susceptible variants. For this reason, and given space constraints, we have chosen to maintain the figures separately. We did decide to move the former Figure 6 to the supplement.

      (40) In general, it felt like there was a lot of repetition/re-graphing of the same data in Figures 3-6. I recommend condensing some of this, and/or moving some of the panels to supplemental figures.

      Thank you for your suggestion, we have revised the manuscript and have moved Figure 6 to Supplemental Figure 7.

      (41) In contrast, Supplemental Figure 7 is helpful for understanding the distribution of the data. Recommend moving to the main text.

      This is a great recommendation, and we have moved Supplemental Figure 7 into Figure 6.

      (42) How do the authors interpret an enrichment of positively selected sites in K3-resistant variants, but not K3-H47R-resistant variants? This seems important. Please explain.

      Thank you for this suggestion to improve the manuscript; we agree that this observation warranted further exploration. We found a strong correlation in PKR functional scores between K3 WT and K3-H47R, and accordingly we find that sites under positive selection that are resistant to K3 WT are also resistant to K3-H47R. The lack of enrichment at positively selected sites appears to be caused by the collapsed dynamic range between PKR wild-type-like and nonfunctional variants in the K3-H47R screen. We have revised the manuscript to clarify this point (lines 202-204).

      (43) Discussion: The authors compare and contrast between PKR and ACE2, but it would be worth mentioning other examples of genes involved in antiviral arms races wherein flexible, unstructured loops are functionally important and are hotspots of positive selection (e.g. MxA, NLRP1, etc).

      We greatly appreciate this suggestion to improve the discussion. We note this contrast between the PKR kinase domain and the flexible linkers of MxA and NLRP1 in the revised manuscript (lines 273-274).

      (44) Speculation section: What is the host range of the vaccinia virus? Is it likely to be a generalist amongst many species' PKRs (and if so, how variable are those PKRs)? Would be worth mentioning for context if you want to discuss this topic.

      Thank you for raising this question. Vaccinia virus is the most well studied of the poxviruses, having been used as a vaccine to eradicate smallpox, and serves as a model poxvirus. Vaccinia virus has a broad host range, and though the name vaccinia derives from the Latin word “vacca” for cow, the virus's origin remains uncertain (Smith 2007 https://doi.org/10.1007/978-3-7643-7557-7_1). The natural host of vaccinia virus is unknown, though there is some evidence to suggest it may be native to rabbits. It does appear to be a generalist and a general inhibitor of vertebrate PKRs.

      (45) Many papers in this field discuss interactions between PKR and K3L, rather than K3. I understand that this is a gene vs. protein nomenclature issue, but consider matching the K3L literature to make this paper easier to find.

      Thank you for bringing this to our attention. We have revised the manuscript to specify that vaccinia K3 is expressed from the K3L gene in both the abstract (line 26) and the introduction (line 56) to help make this paper easier to find when searching for “K3L” literature.

      (46) Which PKR sequence was used as the wild-type background?

      This is a great question. We used the predominant allele circulating in the human population represented by GenBank M85294.1:31-1686. We cite this sequence in the Methods (line 421) and have added it to the results section as well (line 84).

      (47) Figure 1C: the black dashed line is difficult to see. Recommend changing the colors in 1A-1C.

      Thank you for this suggestion, we have changed the dashed lines from black to white to make them more distinguishable.

      (48) Figure 1D: Part of the point of this figure is to convey overlaps between sites under selection, K3 contact sites, and eIF2alpha contact sites, but at this scale, many of the triangles overlap. It is therefore impossible to tell if the same sites are contacted vs. nearby sites. Perhaps the zoomed-in panels showing each of the four windows in the subsequent figures are sufficient?

      Thank you for bringing this to our attention. We have scaled the triangles down to reduce their overlap in Figure 1D and list all sites of interest (predicted eIF2α and vaccinia contacts, conserved sites, and positive selection sites) in the Materials and Methods section “Predicted PKR complexes and substrate contacts”.

      (49) Figure 1E: under "1,293 Unique Combinations", there is a line between the PKR and K3 variants, which makes it look like they are expressed as a fusion protein. I believe these proteins were expressed from the same plasmid, but not as a fusion, so I recommend re-drawing. Then in the graph, the y-axis says "PKR abundance", but from the figure, it is not clear that this refers to relative abundance in a yeast pool. Perhaps "yeast growth" or similar would be clearer?

      Thank you for the specific feedback to improve Figure 1. We have made the suggested edits to clarify that PKR and vaccinia K3 are not fused but are each expressed from their own promoter. We have also changed the y-axis from “PKR Abundance” to “Yeast Growth”.

    2. Reviewer #1 (Public review):

      Summary:

      The report examines the control of the antiviral RNA-activated protein kinase, PKR, by the Vaccinia virus K3 protein. K3 binds to PKR, hindering its ability to control protein translation by blocking its phosphorylation of the eukaryotic initiation factor EIF2α. Kinase function is probed by saturation mutation of the K3/EIF2α-binding surface on PKR, guided by models of their interaction. The findings identify specific residues at the predicted interface that asymmetrically influence repression by K3 and the phosphorylation of EIF2α. This recognises the potential of PKR alleles to resist control by the viral virulence factor.

      Strengths:

      The experimentation is diligent, generating and screening many point mutants to identify residues at the interface between PKR and EIF2α or K3 that distinguish PKR's phospho-control of its substrate from the antithetical interaction with the viral virulence factor.

      Weaknesses:

      The protein interaction between PKR and K3 has already been well-explored through phylogenetic and functional analyses and molecular dynamics studies, as well as with more limited site-directed mutational studies using the same experimental assays. Accordingly, the findings are not pioneering but reinforce and extend what had previously been established.

      The authors responded to this comment by pointing out that their more comprehensive screen better defined the extent of the plasticity of the K3/EIF2α-binding surface on PKR.

      Also in their response, the authors added the caveat that the equivalent expression of the different PKR mutants has not been verified, added information clarifying the states of the model proteins compared to their determined molecular structures, and provided clarifications or responses to all other questions.

      I question eLife's assessment that the development of the yeast-based assay is a key advancement of this report, as this assay has been used for over 30 years.

    3. eLife Assessment

      This important revised report describes the control of the activity of the RNA-activated protein kinase, PKR, by the Vaccinia virus K3 protein. A strength of the manuscript is the powerful combination of a classic yeast-based assay with high-throughput sequencing and its convincing experimental use to characterize large numbers of PKR variants, now with improved controls for potential biases. A minor current limitation that the authors may address in the future is the scope of the screen in terms of the segments of PKR included.

    4. Reviewer #2 (Public review):

      Chambers et al. (2024) present a systematic and unbiased approach to explore the evolutionary potential of the kinase domain of the human antiviral protein kinase R (PKR) to evade inhibition by a poxviral antagonist while maintaining one of its essential functions.

      The authors generated a library of 426 single-nucleotide polymorphism (SNP)-accessible non-synonymous variants of PKR kinase domain and used a yeast-based heterologous virus-host system to assess PKR variants' ability to escape antagonism by the vaccinia virus pseudo-substrate inhibitor K3. The study identified determinant sites in the PKR kinase domain that harbor K3-resistant variants, as well as sites where variation leads to PKR loss of function. The authors found that multiple K3-resistant variants are readily available throughout the domain interface and are enriched at sites under positive selection. They further found some evidence of PKR resilience to viral antagonist diversification. These findings highlight the remarkable adaptability of PKR in response to viral antagonism by mimicry.

      Significance of the findings: The findings are important with implications to various fields, including evolutionary biology, virus-host interfaces, genetic conflicts, antiviral immunity.

      Strength of the evidence: Convincing methodology using state-of-the-art mutational scanning approach in an elegant and simple setup to address important challenges in virus-host molecular conflicts and protein adaptations.

      Strengths

      Systematic and Unbiased Approach: The study's comprehensive approach to generating and characterizing a large library of PKR variants provides valuable insights into the evolutionary landscape of PKR kinase domain. By focusing on SNP-accessible variants, the authors ensure the relevance of their findings to naturally occurring mutations.

      Identification of Key Sites: The identification of specific sites in the PKR kinase domain that confer resistance or susceptibility to a poxvirus pseudosubstrate inhibition is a significant contribution.

      Evolutionary Implications: The authors performed meticulous comparative analyses throughout the study between the functional variants from their mutagenesis screen ("prospective") and the evolutionarily-relevant past adaptations ("retrospective").

      Experimental Design: The use of a yeast-based assay to simultaneously assess PKR capacity to induce cell growth arrest and susceptibility/resistance to various VACV K3 alleles is an efficient approach. The combination of this assay with high-throughput sequencing allows for the rapid characterization of a large number of PKR variants.

      Areas of improvement

      Validation of the screen: In the revised version, the authors now provide the results of two independent experiments in a complete yeast growth assay on a handful of candidates to control the screen's results. This strengthens the direct findings from the screen. It would strengthen the study to complement this validation by another method to assess PKR functions; for example, in human PKR-KO cells, because results between yeast and human cells can differ. These limitations are now acknowledged in the revised version.

      Evolutionary Data: Beyond residues under positive selection, the screen allows the authors to also perform a comparative analysis with PKR residues under purifying selection. Because they are assessing one of the most conserved ancestral functions of PKR (i.e. cell translation arrest), it may also be of interest to discuss these highly conserved sites. The authors now discuss the implications for the conserved residues.

      Mechanistic insights and viral diversity: While the study identifies key sites and residues involved in vaccinia K3 resistance, it could benefit from further investigation into the underlying molecular mechanisms and the diversity of viral antagonists. The authors have now acknowledged these limitations in the Discussion and updated the manuscript to be more specific. These exciting research avenues will be the objectives of a next study.

      Overall Assessment

      The systematic approach, identification of key sites, and evolutionary implications are all notable strengths. While there is room for a stronger validation of the functions and further investigation into the mechanistic details and broader viral diversity, the findings are robust and already provide important advancements. The manuscript is well-written and clear, and the revised figures are informative and improved.

    1. eLife Assessment

      This study retrospectively analyzed clinical data to develop a risk prediction model for pulmonary hypertension in high-altitude populations. The evidence is solid, and the findings are useful and hold clinical significance as the model can be used for intuitive and individualized prediction of pulmonary hypertension risk in these populations.

    2. Reviewer #1 (Public Review):

      Summary:

      This study retrospectively analyzed clinical data to develop a risk prediction model for pulmonary hypertension in high-altitude populations. This finding holds clinical significance as it can be used for intuitive and individualized prediction of pulmonary hypertension risk in these populations. The strength of evidence is high, utilizing a large cohort of 6,603 patients and employing statistical methods such as LASSO regression. The model demonstrates satisfactory performance metrics, including AUC values and calibration curves, enhancing its clinical applicability.

      Strengths:

      (1) Large Sample Size: The study utilizes a substantial cohort of 6,603 subjects, enhancing the reliability and generalizability of the findings.

      (2) Robust Methodology: The use of advanced statistical techniques, including least absolute shrinkage and selection operator (LASSO) regression and multivariate logistic regression, ensures the selection of optimal predictive features.

      (3) Clinical Utility: The developed nomograms are user-friendly and can be easily implemented in clinical settings, particularly in resource-limited high-altitude regions.

      (4) Performance Metrics: The models demonstrate satisfactory performance, with strong AUC values and well-calibrated curves, indicating accurate predictions.

      Weaknesses:

      (1) Lack of External Validation: The models were validated internally, but external validation with cohorts from other high-altitude regions is necessary to confirm their generalizability.

      (2) Simplistic Predictors: The reliance on ECG and basic demographic data may overlook other potential predictors that could improve the models' accuracy and predictive power.

      (3) Regional Specificity: The study's cohort is limited to Tibet, and the findings may not be directly applicable to other high-altitude populations without further validation.

      Comments on revised version:

      The authors have made revisions in response to the primary concerns raised in the initial review, leading to significant improvements in the manuscript's technical accuracy, formatting consistency, and overall clarity. They have provided a detailed explanation of the selection criteria for the final model variables, which has enhanced the transparency and robustness of the study's methodology. Additionally, the authors have acknowledged the limitation of lacking external validation in cohorts from other high-altitude regions and outlined their plans for future research to address this issue.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      (1) Correct capitalization errors, ensuring the first letter of each sentence is capitalized.

      Thank you for your comment. We have corrected capitalization errors.

      (2) Ensure that all technical terms and abbreviations are introduced in full when first mentioned and consistently used throughout the text.

      Thank you for your comment. We have checked and corrected the issue.

      (3) Review the manuscript for grammatical errors and improve sentence structures to enhance readability.

      Thank you for your comment. We have checked and corrected the issue.

      (4) Ensure all figures referenced in the text, such as Fig. 3G, are appropriately discussed and integrated into the narrative.

      Thank you for your comment. We have discussed and integrated Fig. 3G into the narrative (Page 12, Lines 162-166).

      (5) Maintain consistent formatting, including first-line indentation and spacing before paragraphs, to improve the document's visual coherence.

      Thank you for your comment. We have checked and corrected the issue.

      (6) Provide additional explanations for the selection criteria of final model variables, particularly the rationale behind choosing the λ_1se criterion in the LASSO regression.

      Thank you for your comment. We have provided explanations for choosing the λ_1se criterion in the LASSO regression (Page 25, Lines 315-316; Page 27, Lines 363-364).

      (7) Conduct validation studies with cohorts from other high-altitude regions to assess the generalizability and robustness of the prediction models.

      Thank you for your comment. The lack of validation in cohorts from other high-altitude regions is a weakness of this study. In our follow-up study, we will conduct external validation with cohorts from additional high-altitude regions to assess the generalizability and robustness of our prediction models.
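
      For readers unfamiliar with the λ_1se criterion mentioned in comment (6), the rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `select_lambdas` and all numeric values are hypothetical, and the cross-validated errors are assumed to have been computed beforehand. The rule picks the largest penalty whose mean cross-validated error lies within one standard error of the minimum, which generally favors a sparser model than the error-minimizing penalty.

```python
def select_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from cross-validation summaries.

    lambdas: candidate penalty values
    cv_mean: mean cross-validated error for each candidate
    cv_se:   standard error of that mean for each candidate
    """
    # Index of the penalty with the lowest mean cross-validated error.
    i_min = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    # One-standard-error band above the minimum error.
    threshold = cv_mean[i_min] + cv_se[i_min]
    # Largest penalty (strongest shrinkage) whose error stays inside the band.
    lambda_1se = max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)
    return lambdas[i_min], lambda_1se


# Hypothetical cross-validation results, for illustration only.
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
cv_mean = [0.52, 0.50, 0.51, 0.53, 0.70]
cv_se = [0.02, 0.02, 0.02, 0.02, 0.03]

lambda_min, lambda_1se = select_lambdas(lambdas, cv_mean, cv_se)
print("lambda_min:", lambda_min, "lambda_1se:", lambda_1se)
```

      With these illustrative numbers, the error-minimizing penalty and the one-standard-error penalty differ, so the second choice would retain fewer predictors at a statistically comparable error.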

    1. eLife Assessment

      The authors analyze a comprehensive cohort of human plasma samples to identify an extracellular vesicle protein signature for early diagnosis of pancreatic cancer. The application of liquid biopsies is valuable, and the work addresses a key clinical problem as pancreatic cancer is often diagnosed in late stages. The strength of evidence is solid. Altogether, this work supports the potential use of extracellular vesicles in clinical settings, with promising value to scientists and clinicians.

    2. Reviewer #1 (Public review):

      This study presents a large cohort of plasma-derived extracellular vesicle samples from 124 individuals, including patients with PDAC, patients with benign pancreatic diseases, and controls. The authors identified a panel of protein markers for the early detection of pancreatic cancer and validated it in an external cohort.

    3. Reviewer #2 (Public review):

      This work investigates the use of extracellular vesicles (EVs) in blood as a noninvasive 'liquid biopsy' to aid in differentiation of patients with pancreatic cancer (PDAC) from those with benign pancreatic disease and healthy controls, an important clinical question where biopsies are frequently non-diagnostic. The use of extracellular vesicles as biomarkers of disease has been gaining interest in recent history, with a variety of published methods and techniques, looking at a variety of different compositions ('the molecular cargo') of EVs particularly in cancer diagnosis (Shah R, et al, N Engl J Med 2018; 379:958-966).

      This study adds to the growing body of evidence in using EVs for earlier detection of pancreatic cancer, identifying both new and known proteins of interest. Limitations in studying EVs in general include dealing with low concentrations in circulation and identifying the most relevant molecular cargo. This study provides validation of assaying EVs using the novel EVtrap method (Extracellular Vesicles Total Recovery And Purification), which the authors show to be more efficient than current standard techniques and potentially more scalable for larger clinical studies.

      The strength of this study is in its numbers - the authors worked with a cohort of 124 cases, 93 of which were PDAC samples, which is considered large for an EV study (Jia, E et al. BMC Cancer 22, 573 (2022)). The benign disease group (n=20, between chronic pancreatitis and IPMNs) and healthy control group (n=11) were relatively small, but the authors were able not only to identify candidate biomarkers for diagnosis that clearly stood out in the PDAC cohort, but also to validate them in an independent cohort of 36 new subjects. Proteins they identified as associated with pancreatic cancer over benign disease included PDCD6IP, SERPINA12 and RUVBL2. They were even able to identify a set of EV proteins associated with metastasis and poorer prognosis, which includes the proteins PSMB4, RUVBL2 and ANKAR and CRP, RALB and CD55. Their 7-EV protein signature yielded an 89% prediction accuracy for the diagnosis of PDAC against a background of benign pancreatic diseases that is compelling and comparable to other studies in the literature (Jia, E. et al. BMC Cancer 22, 573 (2022)).

      A limitation of this study is its containment within a single institution - further studies are warranted to apply the authors' 7-EV protein PDAC panel to cases at other institutions in a larger cohort.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, Bockorny, Muthuswamy, and Huang et al. performed proteomics analysis of plasma extracellular vesicles (EVs) from pancreatic ductal adenocarcinoma (PDAC) patients and patients with benign pancreatic diseases (chronic pancreatitis and intraductal papillary mucinous neoplasm, IPMN) to develop a 7-EV protein signature that predicts PDAC. Moreover, the authors identified PSMB4, RUVBL2, and ANKAR as being associated with metastasis. These studies provide important insight into alterations of EVs during PDAC progression, and the data supporting the prediction of PDAC with EV protein signatures are solid. However, there are certain concerns regarding the rigor and novelty of the data analysis and interpretation, as well as the clinical implications, as detailed below.

      (1) Plasma EVs were characterized by transmission electron microscopy and nanoparticle tracking analysis to confirm their morphology and size. The authors should also include an analysis of putative EV markers (e.g., tetraspanins, syntenin, ALIX, etc.) to confirm that the analyzed particles are EVs.

      We thank the reviewer for this comment. In the previous study from our co-authors who developed the EVtrap method (PMID:32396726), they used electron microscopy and NTA, as well as quantification of typical EV protein markers such as CD9, to confirm that particles isolated using EVtrap had the typical characteristics of extracellular vesicles. As such, these experiments were not replicated here. We added the following statement to the manuscript:

      “Previous analyses using electron microscopy and nanoparticle tracking also confirmed that the vast majority of particles isolated by EVtrap had diameters between 100-200 nm, consistent with exosomes (PMID:32396726). In addition, EVtrap isolates demonstrates higher abundance of CD9, a common exosome marker, as compared to isolates from other traditional EV isolation methods such as size exclusion chromatography and ultracentrifugation (PMID:32396726)”

      (2) The authors identified multiple over-expressed proteins in PDAC based on their fold change and p-value; however, due to the heterogeneity of PDAC, it is necessary to show a heatmap displaying their abundance in all samples. High fold change does not necessarily indicate consistently high abundance in all PDAC samples.

      We thank the reviewer for this suggestion. We have now included the heatmap in the new Supplementary Figure 3.

      (3) PSMB4, RUVBL2, and ANKAR were identified as being associated with metastasis. The authors state that they intended to distinguish early and late-stage cancer samples, but it is unclear why they chose to compare metastatic and non-metastatic samples, as the non-metastatic group also includes late-stage cancer samples. This sentence should be rephrased to more accurately reflect the sample types profiled.

      We thank the reviewer for pointing this out. We would like to clarify that the analyses shown in Figures 3B and 3C pertain to patients with metastatic vs non-metastatic disease, not early versus late stage. We edited the text to ensure this information is clear.

      (4) Non-metastatic and metastatic patients were separated based on global protein abundance. The samples within each group display significant heterogeneity, with some samples displaying similar patterns although they were classified into different groups (Figure 3A), and the samples within the same group, particularly the metastasis group, did not consistently exhibit similar patterns of protein abundance. The authors should clarify this point.

      We thank the reviewer for this comment. EV proteomic expression is not anticipated to show exactly the same pattern across all samples in each group. The purpose of the heatmap in Figure 3 is to show enrichment of expression patterns, but we acknowledge that not all samples from the same group have exactly the same proteome pattern.

      We added this statement in the discussion section:

      “As expected, the EV proteomic profiles of PDAC patients exhibited significant heterogeneity. While the above mentioned markers exhibited strong association with disease states at population levels, their abundances in individual patients varied significantly. Those observations highlight the need to develop multi-protein panels for pancreatic cancer diagnosis and prognosis.”

      (5) The authors performed the survival analysis on a set of EV proteins but did not specify the origin of these markers or how many markers were examined. The authors should show their abundances across different groups, such as different stages and metastasis status.

      We thank the reviewer for the comments. The goal of this experiment was not to identify EV proteins that performed similarly well for diagnosis and prognostication. In Figures 3A, 3B and 3C, we identified EV proteins that had better performance for diagnosis of metastatic disease. In these experiments we made a comparative analysis between patients with and without metastasis. In the experiment depicted in Figure 3D, the goal was to identify, out of the markers identified in the previous experiments from Figure 3A, the EV markers that had better performance in prognosticating outcomes as measured by overall survival. We would like to further clarify that, based on our observations and those of others, it has become clear that EV profiles from cancer patients are highly heterogeneous, and we do not anticipate that a single marker will have sufficient test performance for cancer diagnosis or prognosis assessment when measured in isolation. Rather, we anticipate that a panel of markers may yield better performance for diagnosis while a different combination of EV markers may have better performance for prognosis assessment.

      (6) The classification model yielded a 100% accuracy, which may refer to AUC, in their discovery cohort, but it decreased to 89% in the independent cohort. This suggests that the authors have encountered overfitting issues with their model, where it performed well on the discovery cohort but did not generalize well to the independent cohort. The authors should clarify this point. The AUC score of the 7-EV signature is 0.89 and is not equivalent to prediction accuracy. In order to demonstrate prediction accuracy, the authors should show the confusion matrix of training and testing data as well as other evaluation metrics, such as accuracy, precision, and recall.

      We thank the reviewer for providing these insightful comments. As you noted, the 7-biomarker signature machine learning model attained an impressive 100% accuracy within the internal Discovery Cohort, raising concerns about potential overfitting in the external validation dataset. Although the noted difference in AUROC of 0.11 in the external validation cohort surpasses the typical reported range of ~0.06-0.09, the model demonstrated a commendable AUROC of 0.89 in an independent patient cohort. Moreover, the use of an alternate technology to measure protein abundance in the validation dataset underscores the model’s reproducibility and validity. We have provided the model metrics for both the internal- and external-validation cohorts. For these, please see the updated Supplementary Figure 7, as well as the new Supplementary Figure 6 and Supplementary Figure 8. We also amended the discussion section of the manuscript to acknowledge that the validation cohort had a limited sample size and that proteins were measured using a different method. Those factors likely contributed to the lower accuracy of predictions in the validation cohort.

      (7) The authors should include more details of their model and the process of selection of signatures to enhance the reproducibility and transparency of their methods.

      We thank the reviewer for their valuable comments. To enhance clarity, we have incorporated additional information regarding the method employed for biomarker signature identification into the ‘Methods Section’ on page 23. We note that Supplementary Table 7a provides details on ‘Sensitivity, Specificity, Precision, and AUC’ for the 16 markers included in the external validation study. Additionally, Supplementary Table 7b presents the contingency table for the 7-biomarker signature, offering insights into model accuracy for both the Internal-Discovery and External Validation cohorts.
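
      The distinction the reviewer draws in comment (6) between AUC and prediction accuracy can be illustrated with a short sketch. It is not the authors' pipeline: the labels, scores, the 0.5 cutoff, and the function names are all hypothetical. Accuracy, precision, and recall come from a confusion matrix at one fixed cutoff, whereas AUC is the fraction of (case, control) pairs in which the case receives the higher score, so the two numbers need not agree.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def threshold_metrics(y_true, scores, cutoff):
    """Accuracy, precision, and recall at a single decision cutoff."""
    y_pred = [1 if s >= cutoff else 0 for s in scores]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def auc(y_true, scores):
    """Rank-based AUC: share of (case, control) pairs ordered correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = PDAC, 0 = benign) and model scores, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

accuracy, precision, recall = threshold_metrics(y_true, scores, cutoff=0.5)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
print("AUC:", auc(y_true, scores))
```

      On this toy data the AUC is higher than the accuracy at the chosen cutoff, which is why reporting a confusion matrix alongside the AUC gives a more complete picture of classifier performance.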

      Reviewer #2 (Public Review):

      The authors intended to identify a protein signature in extracellular vesicles of serum to distinguish pancreatic ductal adenocarcinoma from benign pancreatic diseases.

      A major strength of the work presented is the valuable profiling of a significant number of patient samples, with a rich cohort of patients with pancreatic cancer, benign pancreatic diseases, and healthy controls. However, despite the strong cohorts presented, the numbers of patient samples for benign pancreatic diseases as well as controls were very limited.

      Also, the method used to isolate vesicles, EVTrap, recognizes double bilayers, which means that it can detect cellular debris and apoptotic bodies, which are very common in the circulation of patients that are undergoing chemotherapy. It would be important to identify the patients that are therapy naïve and the ones that are not because of this possible bias.

      We thank the Reviewer for these comments. We want to point out that the experiments presented in Supplementary Figure 1 (Transmission electron microscopy images and Nanoparticle tracking analysis) confirm that the vesicles isolated with EVTrap are not cellular debris and apoptotic bodies. Rather, these structures are in the nano range expected for exosomes. This is further supported by the additional work from our co-author and collaborator describing the development of EVtrap and its performance in isolating exosomes when compared to other traditional methods such as ultracentrifugation and size exclusion chromatography (PMID:32396726).

      As per the Reviewer’s request, we have provided an additional heatmap figure depicting which patients are treatment naïve to differentiate them from those who have received treatment (revised Figure 2C).

      Additionally, the transmission electron microscopy data reflect this heterogeneity of the samples, also with little identification of double bilayered vesicles. It would be important to identify some extracellular vesicles markers in those preparations to strengthen the quality of the samples analyzed.

      We appreciate the comment from the Reviewer and acknowledge the importance of identifying exosome markers in the isolates from EVtrap. These experiments have already been done and are reported in the original paper describing the development of this method by our co-authors in a separate work. In the manuscript PMID: 30080416, our collaborators demonstrated the detection of CD9, a well-known exosome marker, using Western Blot from isolates using EVtrap or ultra-centrifugation, a traditional technique to isolate exosomes. This work showed that EVtrap yielded a much higher recovery rate of exosomes with lower contamination from soluble proteins. We did not repeat these already published experiments, but we amended our manuscript to reference these results.

      What is more, previously published work with this same methodology identifies around 2000 proteins per sample. It would be important to explain why in this study there seems to be a reduction of more than 50% in the number of proteins identified in the vesicles.

      We thank the Reviewer for pointing out this important detail. In the previous work in which EVtrap was developed by our co-authors, the blood samples were processed using a different protocol, with shorter centrifugation (2,500g for 10 min) (PMID: 32396726). In the current work, we employed three centrifugation steps. As detailed in the Methods section of the manuscript, blood samples were first centrifuged at 1,300g for 15 min, and plasma was removed from the top, carefully avoiding the cell pellet. The plasma was then centrifuged again at 2,500g for 15 min and again removed from the top, carefully avoiding the cell pellet. A third centrifugation at 2,500g for 15 min followed. This more extensive centrifugation process was intended to further increase the removal of platelets, apoptotic bodies, and other large particles and aggregates. Accordingly, we anticipate that the additional centrifugation steps decreased the contamination of our isolates but may have also decreased the amount of exosome proteins, hence the lower number of exosome proteins identified in our study as compared to the original study from our co-authors (PMID: 32396726).

      One of the proteins that constantly surges on the analysis is KRT20. It would be important to proceed with the analysis by first filtering out possible contaminants of the proteomics, of which keratins are the most common ones.

      We thank the Reviewer for this comment. We would like to point out that we do believe that KRT20 is, in fact, cancer related and not a contaminant. This is supported by our results presented in this manuscript showing enrichment of KRT20 in PDAC cases and lower expression in benign samples. If this protein were a contaminant, its expression would be found uniformly in all samples, and there would be no apparent reason for different expression between malignant and benign cases, as all samples were processed following the same procedures. In addition, increased expression of KRT20 in PDAC tissues has also been reported by others. For instance, in a study by Schmiz-Winnthal (PMID: 16364723), the authors showed that Cytokeratin 20 (KRT20) was expressed in 76% of PDAC patients and that expression of KRT20 was associated with poor survival after surgical resection. Based on these observations, we believe that the KRT20 identified in our study is indeed a tumor-associated EV protein rather than contamination.

      Finally, none of the 7-extracellular vesicle protein signatures has been validated by other techniques, such as western blot, in extracellular vesicles isolated by other, standard, methods, such as size exclusion chromatography.

      A distinct technique for protein analysis was done but not a different method of isolation of these vesicles. This would strengthen the results and the origin of the proteins.

      We appreciate the Reviewer’s comment. We would like to again emphasize that the goal of this manuscript was not to compare the performance of EVtrap with other traditional EV isolation approaches such as ultracentrifugation and size exclusion chromatography. The main goal of the study is to determine proteomic profiles of EVs isolated from clinical samples and to provide this information to the research community for further studies. As the Reviewer points out, proteins in EVs are highly heterogeneous, which highlights the complexity of EV biology and the interpatient heterogeneity of pancreatic cancer. We do not anticipate that the development of EV-based markers for pancreatic cancer diagnosis can be achieved by a single team, but rather by a community of researchers. We hope the information presented in the current study will help other researchers identify additional candidates for validation in future work. Nonetheless, we edited the manuscript to discuss the limitation of not cross-validating protein detection using a different isolation method.

      The conclusions that are reached do not fully meet the proposed aims of the identification of a protein signature in circulating extracellular vesicles that could improve early detection of the disease. The authors did not demonstrate the superiority of detection of these proteins in extracellular vesicles versus simply performing an ELISA, nor their superiority with respect to the current standard procedure for diagnosis.

      We would like to clarify to the Reviewer that the goal of this manuscript was not to prove superiority of the EV signature biomarker in diagnosing pancreatic cancer as compared to current standard of care (SOC) practice, i.e., CT scans, endoscopic ultrasound and CA19-9. In order to prove such superiority, one would require a large, randomized phase III trial with several hundred patients. This was not the aim of our discovery EV proteomics study, and we double-checked our manuscript to ensure no such claim was made. Rather, we aimed at developing a new pipeline for discovery of new EV biomarkers, and we believe we were able to show that this approach was successful in discovering a new class of biomarkers based on proteins expressed on extracellular vesicles that are predominantly expressed in patients with pancreatic cancer. Future studies should continue to advance this field with the goal of improving on the current standard of care diagnostic methods.

      The authors also suggest that profiling of circulating extracellular vesicles provides unique insights into systemic immune changes during pancreatic cancer development. How this is better than a regular hemogram is not clear.

      We would like to clarify that the overall goal of this study is to provide patient-relevant information for the research community to further investigate the biology of extracellular vesicles. By the statement 'unique insights into systemic immune changes' we referred to the fact that we discovered EVs carrying proteins involved in immune responses. Previous studies have shown that EVs play important roles in cell-cell communication; discoveries from our study provide candidates for future studies on cellular mechanisms underlying immune regulation during pancreatic cancer development.

      Finally, it would be important to determine how this signature compares with many others described in the literature that have the exact same aim. Why and how would this one be better?

      We would like to again clarify that comparing the diagnostic performance of the EV biomarkers discovered in the study against standard of care methods (CA19-9, ctDNA, CT scan) was beyond the scope of this discovery EV proteomics work. We reviewed the manuscript to ensure that no claims were made as far as superiority against point-of-care tests available in clinic.

      Reviewer #3 (Public Review):

      This work investigates the use of extracellular vesicles (EVs) in blood as a noninvasive 'liquid biopsy' to aid in the differentiation of patients with pancreatic cancer (PDAC) from those with benign pancreatic disease and healthy controls, an important clinical question where biopsies are frequently non-diagnostic. The use of extracellular vesicles as biomarkers of disease has been gaining interest in recent history, with a variety of published methods and techniques, looking at a variety of different compositions ('the molecular cargo') of EVs particularly in cancer diagnosis (Shah R, et al, N Engl J Med 2018; 379:958-966).

      This study adds to the growing body of evidence in using EVs for earlier detection of pancreatic cancer, identifying both new and known proteins of interest. Limitations in studying EVs, in general, include dealing with low concentrations in circulation and identifying the most relevant molecular cargo. This study provides validation of assaying EVs using the novel EVtrap method (Extracellular Vesicles Total Recovery And Purification), which the authors show to be more efficient than current standard techniques and potentially more scalable for larger clinical studies.

      The strength of this study is in its numbers - the authors worked with a cohort of 124 cases, 93 of which were PDAC samples, which is considered large for an EV study (Jia, E. et al. BMC Cancer 22, 573 (2022)). The benign disease group (n=20, between chronic pancreatitis and IPMNs) and healthy control group (n=11) were relatively small, but the authors were not only able to identify candidate biomarkers for diagnosis that clearly stood out in the PDAC cohort, but also to validate them in an independent cohort of 36 new subjects.

      Proteins they have identified as associated with pancreatic cancer over benign disease included PDCD6IP, SERPINA12, and RUVBL2. They were even able to identify a set of EV proteins associated with metastasis and poorer prognosis, which include the proteins PSMB4, RUVBL2 and ANKAR and CRP, RALB and CD55. Their 7-EV protein signature yielded an 89% prediction accuracy for the diagnosis of PDAC against a background of benign pancreatic diseases that is compelling and comparable to other studies in the literature (Jia, E. et al. BMC Cancer 22, 573 (2022)).

      A limitation of this study is its containment within a single institution - further studies are warranted to apply the authors' 7-EV protein PDAC panel to multiple other cases at other institutions in a larger cohort.

      We are very thankful to the Reviewer for the positive feedback. We are similarly optimistic that EV-based biomarkers will assist future researchers to develop better diagnostic assays for patients with pancreatic cancer, as well as other tumor types lacking accurate blood-based tests.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer #1 (Public review):

      Summary:

      In this manuscript, Herrmannova et al explore changes in translation upon individual depletion of three subunits of the eIF3 complex (d, e and h) in mammalian cells. The authors provide a detailed analysis of regulated transcripts, followed by validation by RT-qPCR and/or Western blot of targets of interest, as well as GO and KEGG pathway analysis. The authors confirm prior observations that eIF3, despite being a general translation initiation factor, functions in mRNA-specific regulation, and that eIF3 is important for translation re-initiation. They show that global effects of eIF3e and eIF3d depletion on translation and cell growth are concordant. Their results support and extend previous reports suggesting that both factors control translation of 5'TOP mRNAs. Interestingly, they identify MAPK pathway components as a group of targets coordinately regulated by eIF3 d/e. The authors also discuss discrepancies with other reports analyzing eIF3e function.

      Strengths:

      Altogether, a solid analysis of eIF3 d/e/h-mediated translation regulation of specific transcripts. The data will be useful for scientists working in the Translation field.

      Weaknesses:

      The authors could have explored in more detail some of their novel observations, as well as their impact on cell behavior.

      The manuscript has improved with the new corrections. I appreciate the authors' attention to the minor comments, which have been fully solved. The authors have not, however, provided additional experimental evidence that uORF-mediated translation of Raf-1 mRNA depends on an intact eIF3 complex, nor have they addressed the consequences of such regulation for cell physiology. While I understand that this is a subject of follow-up research, the authors could have at least included their explanations/ speculations regarding major comments 2-4, which in my opinion could have been useful for the reader.

      Our explanations/speculations regarding major comments 2 and 3 were included in the Discussion. We apologize for this misunderstanding, as we thought that we were supposed to explain our ideas only in the responses. We did not discuss comment 4, however, as we are really not sure what the true effect is and did not want to engage in wild speculation in our manuscript. We thank this reviewer for his insightful comments and understanding.


      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Major comments:

      (1) The authors report the potential translational regulation of Raf kinase by re-initiation. It would be interesting to show that Raf is indeed regulated by uORF-mediated translation, and that this is dependent on an intact eIF3 complex. Analyzing the potential consequences of Raf1 regulation for cancer cell proliferation or apoptosis would be a plus.

      We agree that this is an interesting and likely possibility. In fact, another clue that translation of Raf1 is regulated by uORFs comes from Bohlen et al. 2023 (PMID: 36869665), where they showed that RAF1 translation is dependent on PRRC2 proteins (that promote leaky scanning through these uORFs). We noted in the discussion that our results from eIF3d/e/hKD and the PRRC2A/B/CKD partly overlap. It is a subject of our follow-up research to investigate whether eIF3 and PRRC2 cooperate to regulate translation of this important mRNA.

      (2) The authors show that eIF3 d/e -but not 3h- has an effect on cell proliferation. First, this indicates that proliferation does not fully correlate with eIF3 integrity. Depletion of eIF3d does not affect the integrity of eIF3, yet the effects on proliferation are similar to those of eIF3e. What is the possibility that changes in proliferation reflect functions of eIF3d outside the eIF3 complex? What could be the real consequences of disturbing eIF3 integrity for the mammalian cell? Please, discuss.

      Yes, proliferation does not fully correlate with eIF3 integrity. Downregulation of eIF3 subunits that leads to disintegration of the eIF3 YLC core (a, b, c, g, i) has a more detrimental effect on growth and translation than downregulation of the peripheral subunits (e, k, l, f, h, m). Our previous studies (Wagner et al. 2016, PMID: 27924037 and Herrmannová et al. 2020, PMID: 31863585) indicate that the YLC core of eIF3 can partially support translation even without its peripheral subunits. In this respect eIF3d (as a peripheral subunit) is an amazing exception, suggesting it may have some specialized function(s). Whether this function resides outside of the eIF3 complex or not we do not know, but we do not think so, mainly because in the absence of eIF3e, its interaction partner, eIF3d gets rapidly degraded. Therefore, it is not very likely that eIF3d exists alone outside of the eIF3 complex with moonlighting functions elsewhere. We think that eIF3d, as a head-interacting subunit close to an important head ribosomal protein RACK1 (a landing pad for regulatory proteins), is a target of signaling pathways, which may make it important for translation of specific mRNAs. In support of these thoughts, eIF3d (in the context of the entire eIF3) together with DAP5 was shown to promote translation by an alternate cap-dependent (eIF4F-independent) mechanism (Lee et al. 2016, PMID: 27462815; de la Parra et al. 2018, PMID: 30076308). In addition, the eIF3d function (also in the context of the entire eIF3) was shown to be regulated by stress-triggered phosphorylation (Lamper et al. 2020, PMID: 33184215).

      (3) Figure 6D: Surprisingly, reduced levels of ERK1/2 upon eIF3d/e-KD are compensated by increased phosphorylation of ERK1/2 and net activation of c-Jun. Please comment on the functional consequences of buffering mechanisms that the cell deploys in order to counteract compromised eIF3 function. Why would the cell activate precisely the MAPK pathway to compensate for a compromised eIF3 function?

      This we do not know. We can only speculate that when translation is compromised, cells try to counteract it in two ways: 1) they produce more ribosomes to increase translational rates and 2) activate MAPK signaling to send pro-growth signals, which can in the end further boost ribosome biogenesis.

      (4) Regarding DAP-sensitive transcripts, can the authors discuss in more detail the role of eIF3d in alternative cap-dependent translation versus re-initiation? Are these transcripts being translated by a canonical cap- and uORF-dependent mechanism or by an alternative cap-dependent mechanism?

      This is indeed not an easy question. On one hand, it was shown that DAP5 facilitates translation re-initiation after uORF translation in a canonical cap-dependent manner. This mechanism is essential for translation of the main coding sequence (CDS) in mRNAs with structured 5' leaders and multiple uORFs (Weber et al. 2022, PMID: 36473845; David et al., 2022, PMID: 35961752). On the other hand, DAP5 was proposed to promote alternative, eIF4F-independent but cap-dependent translation, as it can substitute for the function of the eIF4F complex in cooperation with eIF3d (de la Parra et al., 2018, PMID: 30076308; Volta et al., 2021, PMID: 34848685). Overall, these observations paint a picture too complex for us to propose a clear scenario of what is going on between these two proteins on individual mRNAs. We speculate that both mechanisms are taking place and that the specific mechanism of translation initiation differs for differently arranged mRNAs.

      Minor comments:

      (5) Figure S2C: why is there a strong reduction of the stop codon peak for 3d and 3h KDs?

      We have checked the riboWaltz profiles of all replicates (in the Supplementary data we are showing only a representative replicate I) and the stop codon peak differs a lot among the replicates. We think that this way of plotting was optimized for calculation and visualization of P-sites and triplet periodicity and thus is not suitable for this type of comparison among samples. Therefore, we have performed our own analysis where the 5’ ends of reads are used instead of P-sites and triplicates are averaged and normalized to CDS (please see below), so that all samples can be compared directly in one plot (same as Fig. S13A but for the stop codon). We can see that the stop codon peak really differs and is the smallest for eIF3hKD. However, these changes are in the range of 20% and we are not sure about their biological significance. We therefore refrain from drawing any conclusions. In general, a reduced stop codon peak may signal faster termination or increased stop codon readthrough, but the latter should be accompanied by an increased ribosome density in the 3’UTR, which is not the case. A defect in termination efficiency would instead be manifested by an increased stop codon peak.

      Author response image 1.

       

      (6) Figures 5 and S8: Adding a vertical line at 'zero' in all cumulative plots will help the reader understand the author's interpretation of the data. 

      We have added a dashed grey vertical line at zero as requested. However, for interpretation of these plots, the reader should focus on the colored curve and whether or not it is shifted with respect to the grey curve (background). A shift to the right indicates increased expression, while a shift to the left indicates decreased expression. The reported p-value then indicates the statistical significance of the shift.

      (7) The entire Figure 2 are controls that can go to Supplementary Material. The clustering of Figure S3B could be shown in the main Figure, as it is a very easy read-out of the consistent effects of the KDs of the different eIF3 subunits under analysis.

      We have moved the entire Figure 2 to Supplementary Material as suggested (the original panels can be found as Supplementary Figures 1B, 1C and 3A). Figure S3B is now the main Figure 2E. 

      (8) There are 3 replicates for Ribo-Seq and four for RNA-Seq. Were these not carried out in parallel, as it is usually done in Ribo-seq experiments? Why is there an extra replicate for RNASeq?

      Yes, the three replicates were carried out in parallel. We decided to add the fourth replicate in RNA-Seq to increase the data robustness, as the RNA-Seq is used for normalization of FP to calculate the TE, which was our main analyzed metric in this article. We had the option to add the fourth replicate as we originally prepared five biological replicates for all samples, but after performing the control experiments, we selected only the 3 best replicates for the Ribo-Seq library preparation and sequencing.
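
      For readers unfamiliar with the TE metric mentioned above, the following is a minimal illustrative sketch of how footprint (FP) counts can be normalized to RNA-Seq counts. It is not the authors' pipeline: the function names, the counts-per-million scaling, and the pseudocount of 0.5 are assumptions made for illustration, and replicate handling and statistical modelling are omitted entirely.

```python
import math

def log2_te(fp_counts, rna_counts, pseudocount=0.5):
    """Per-gene log2 translational efficiency (TE) for one sample.

    Hypothetical helper: TE is taken here as footprint abundance divided by
    mRNA abundance, each scaled to counts per million (CPM) so that libraries
    of different sequencing depth are comparable.
    """
    fp_total = sum(fp_counts)
    rna_total = sum(rna_counts)
    te = []
    for fp, rna in zip(fp_counts, rna_counts):
        fp_cpm = fp / fp_total * 1e6
        rna_cpm = rna / rna_total * 1e6
        # The pseudocount (an assumption) avoids log2(0) for unexpressed genes.
        te.append(math.log2(fp_cpm + pseudocount) - math.log2(rna_cpm + pseudocount))
    return te

def delta_log2_te(fp_kd, rna_kd, fp_ctrl, rna_ctrl):
    """Change in log2 TE between a knockdown and a control sample."""
    te_kd = log2_te(fp_kd, rna_kd)
    te_ctrl = log2_te(fp_ctrl, rna_ctrl)
    return [kd - ctrl for kd, ctrl in zip(te_kd, te_ctrl)]
```

      Because RNA abundance sits in the denominator, noise in the RNA-Seq estimate propagates directly into TE, which is consistent with the stated motivation for adding a replicate on the RNA-Seq side.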

      (9) Please, add another sheet in Table S2 with the names of all genes that change only at the translation (RPF) levels.

      As requested, we have added three extra sheets (one for each downregulation) for differential FP with Padjusted <0.05 in the Spreadsheet S2. We also provide the complete unfiltered differential expression data (sheet named “all data”), so that readers can filter out any relevant data based on their interest.

      (10) Page 5, bottom: ' ...we showed that the expression of all 12 eIF3 subunits is interconnected such that perturbance of the expression of one subunit results in the down-regulation of entire modules...'. This is not true for eIF3d, as shown in Fig1B and mentioned in Results.

      This reviewer is correct. By this generalized statement, we were trying to summarize our previous results from Wagner et al., 2014, PMID: 24912683; Wagner et al., 2016, PMID: 27924037 and Herrmannova et al., 2020, PMID: 31863585. The eIF3d downregulation is the only exception that does not affect expression of any other eIF3 subunit. Therefore, we have rewritten this paragraph accordingly: “We recently reported a comprehensive in vivo analysis of the modular dynamics of the human eIF3 complex (Wagner et al, 2020; Wagner et al, 2014; Wagner et al., 2016). Using a systematic individual downregulation strategy, we showed that the expression of all 12 eIF3 subunits is interconnected such that perturbance of the expression of one subunit results in the down-regulation of entire modules leading to the formation of partial eIF3 subcomplexes with limited functionality (Herrmannova et al, 2020). eIF3d is the only exception in this respect, as its downregulation does not influence expression of any other eIF3 subunit.”

      (11) Page 10, bottom: ' The PCA plot and hierarchical clustering... These results suggest that eIF3h depletion impacts the translatome differentially than depletion of eIF3e or eIF3d.' This is already obvious in the polysome profiles of Figure S2C.

      We agree that this result is surely not surprising given the polysome profile and growth phenotype analyses of eIF3hKD. But still, we think that the PCA plot and hierarchical clustering results represent valuable controls. Nonetheless, we rephrased this section to note that this result agrees with the polysome profiles analysis: “The PCA plot and hierarchical clustering (Figure 2A and Supplementary Figure 4A) showed clustering of the samples into two main groups: Ribo-Seq and RNA-seq, and also into two subgroups; NT and eIF3hKD samples clustered on one side and eIF3eKD and eIF3dKD samples on the other. These results suggest that the eIF3h depletion has a much milder impact on the translatome than depletion of eIF3e or eIF3d, which agrees with the growth phenotype and polysome profile analyses (Supplementary Figure 1A and 1D).”

      (12) Page 12: ' As for the eIF3dKD "unique upregulated" DTEGs, we identified one interesting and unique KEGG pathway, the ABC transporters (Supplementary Figure 5A, in green).' This sentence is confusing, as there are more pathways that are significant in this group, so it is unclear why the authors consider it 'unique'.

      The eIF3dKD “unique upregulated” group comprises genes with increased TE only in eIF3dKD but not in eIF3eKD or eIF3hKD (500 genes, Fig 2G). All these 500 genes were examined for enrichment in the KEGG pathways, and the top 10 significant pathways were reported (Fig S6A). However, 8 out of these 10 pathways were also significantly enriched in other gene groups examined (e.g. eIF3d/eIF3e common). Therefore, the two remaining pathways (“ABC transporters” and “Other types of O-glycan biosynthesis”) are truly unique for eIF3dKD. We wanted to highlight the ABC transporters group in particular because we find it rather interesting (for the reasons mentioned in the article). We have corrected the sentence in question to avoid confusion: “Among the eIF3dKD “unique upregulated” DTEGs, we identified one interesting KEGG pathway, the ABC transporters, which did not show up in other gene groups (Supplementary Figure 6A, in green). A total of 12 different ABC transporters had elevated TE (9 of them are unique to eIF3dKD, while 3 were also found in eIF3eKD), 6 of which (ABCC1-5, ABCC10) belong to the C subfamily, known to confer multidrug resistance with alternative designation as multidrug resistance protein (MRP1-5, MRP7) (Sodani et al, 2012).

      Interestingly, all six of these ABCC transporters were upregulated solely at the translational level (Supplementary Spreadsheet S2).”    

      (13) Note typo ('Various') in Figure 4A.

      Corrected

      (14) The introduction could be shortened.

      This is a very subjective requirement. In fact, when this manuscript was reviewed in NAR, we were asked by two reviewers to expand it substantially. Because a number of various research topics come together in this work, e.g. translational regulation, the eIF3 structure and function, MAPK/ERK signaling, we are convinced that all of them demand a comprehensive introduction for non-experts in each of these topics. Therefore, with all due respect to this reviewer, we did not ultimately shorten it.

      Reviewer #2 (Recommendations For The Authors):

      - In Figure 2, it would be useful to know why eIF3d is destabilized by eIF3e knockdown - is it protein degradation and why do the eIF3d/e knockdowns not more completely phenocopy each other when there is the same reduction to eIF3d as in the eIF3d knockdown sample?

      Yes, we do think that protein degradation lies behind the eIF3d destabilization in the eIF3eKD, but we have not yet directly demonstrated this. However, we have shown that eIF3d mRNA levels are not altered in eIF3eKD and that Ribo-Seq data indicate no change in TE or FP for eIF3d-encoding mRNA in eIF3eKD. Nonetheless, it is important to note (and we discuss it in the article) that eIF3d levels in eIF3dKD are lower than eIF3d levels in eIF3eKD (please see Supplementary Figure 1C). In fact, we believe that this is one of the main reasons for the differences between the eIF3d and eIF3e knockdowns.

      - The western blots in Figures 4 and 6 show modest changes to target protein levels and would be strengthened by quantification.

      We have added the quantifications as requested by this reviewer and by Reviewer 3.

      - For Figure 4, this figure would be strengthened by experiments showing if the increase in ribosomal protein levels is correlated with actual changes to ribosome biogenesis.

      As suggested, we performed polysome profiling in the presence of EDTA to monitor changes in the 60S/40S ratio, indicating a potential imbalance in the biogenesis of individual ribosome subunits. We found that it was not affected (Figure 3G). In addition, we performed the same experiment, normalizing all samples to the same number of cells (cells were carefully counted before lysis). In this way, we confirmed that eIF3dKD and eIF3eKD cells indeed contain a significantly increased number of ribosomes, in agreement with the western blot analysis (Figure 3H).

      - In Figure 6, there needs to be a nuclear loading control.

      This experiment was repeated with Lamin B1 used as a nuclear loading control – it is now shown as Fig. 5F.

      - For Figure 8, these findings would be strengthened using luciferase reporter assays where the various RNA determinants are experimentally tested. Similarly, 5′ TOP RNA reporters would have been appreciated in Figure 4.

      This is indeed a logical continuation of our work, which represents the current work in progress of one of the PhD students. We apologize, but we consider this time- and resource-demanding analysis out of scope of this article.

      Reviewer #3 (Recommendations For The Authors):

      (1) Within the many effects observed, it is mentioned that eIF3d is known to be overexpressed while eIF3e is underexpressed in many cancers, but knockdown of either subunit decreases MDM2 levels, which would be expected to increase P53 activity and decrease tumor cell transformation. In contrast, they also report that 3e/3d knockdown dramatically increases levels of cJUN, presumably due to increased MAPK activity, and is expected to increase protumor gene expression. Additional discussion is needed to clarify the significance of the findings, which are a bit confusing.

      This is indeed true. However, considering the complexity of eIF3, the largest initiation factor among all, as well as the broad portfolio of its functions, it is perhaps not so surprising that the observed effects are complex and may seem even contradictory with respect to cancer. To acknowledge that, we expanded the corresponding part of the discussion as follows: “Here, we demonstrate that alterations in the eIF3 subunit stoichiometry and/or eIF3 subcomplexes have distinct effects on the translatome; for example, they affect factors that play a prominent (either positive or negative) role in cancer biology (e.g., MDM2 and cJUN), but the resulting impact is unclear so far. Considering the complex interactions between these factors as well as the complexity of the eIF3 complex per se, future studies are required to delineate the specific oncogenic and tumor suppressive pathways that play a predominant role in mediating the effects of perturbations in the eIF3 complex in the context of neoplasia.”

      (2) There are places in the text where the authors refer to changes in transcriptional control when RNA levels differ, but transcription versus RNA turnover wasn't tested, e.g. page 16 and Figure S10, qPCR does not confirm "transcriptional upregulation in all three knockdowns" and page 19 "despite apparent compensatory mechanisms that increase their transcription."

      This is indeed true, the sentences in question were corrected. The term “increased mRNA levels” was used instead of transcriptional upregulation (increased mRNA stabilization is also possible).

      (3) Similarly, the authors suggest that steady-state LARP1 protein levels are unaffected based on ribosome footprint counts (page 21). It is incorrect to assume this, because ribosome footprints can be elevated due to stalling on RNA that isn't being translated and doesn't yield more protein, and because levels of translated RNA/synthesized proteins do not always reflect steady-state protein levels, especially in mutants that could affect lysosome levels and protein turnover. Also page 12, 1st paragraph suggests protein production is down when ribosome footprints are changed.

      Yes, we are well aware of this known limitation of Ribo-seq analysis. Therefore, the steady-state protein levels of our key hits were verified by western blotting. In addition, we have removed the sentence about LARP1 because it was based on Ribo-Seq data only without experimental evaluation of the steady-state LARP1 protein levels.

      (4) The translation buffering effect is not clear in some Figures, e.g. S6, S8, 8A, and B. The authors show a scheme for translationally buffered RNAs being clustered in the upper right and lower left quadrants in S4H (translation up with transcript level down and v.v.), but in the FP versus RNA plots, the non-TOP RNAs and 4E-P-regulated RNAs don't show this behavior, and appear to show a similar distribution to the global changes. Some of the right panels in these figures show modest shifts, but it's not clear how these were determined to be significant. More information is needed to clarify, or a different presentation, such as displaying the RNA subsets in the left panels with heat map coloring to reveal whether RNAs show the buffered translation pattern defined in purple in Figure S4H, or by reporting a statistical parameter or number of RNAs that show behavior out of total for significance. Currently the conclusion that these RNAs are translationally buffered seems subjective since there are clearly many RNAs that don't show changes, or show translation-only or RNA-only changes.

      We would like to clarify that S4H does not indicate a necessity for changes in FPs in the buffered subsets. Although opposing changes in total mRNA and FPs are classified as buffering, often we also consider the scenario where there are changes to the total mRNA levels not accompanied by changes in ribosome association.

      In figure S6, the scatterplots indicate a high density of genes shifted towards negative fold changes on the x-axis (total mRNA). This is also reflected in the empirical cumulative distribution functions (ecdfs) for the log2 fold changes in total mRNA in the far right panels of A and B, and the lack of changes in log2 fold change for FPs (middle panels). Similarly, in figure S8, the scatterplots indicate a density of genes shifted towards positive fold changes on the x-axis for total mRNA. The ecdfs also demonstrate that there is a significant directional shift in log2 fold changes in the total mRNA that is not present to a similar degree in the FPs, consistent with translational offsetting. It is rightly pointed out that not all genes in these sets follow the same pattern of regulation. We have revised the title of Supplementary Figure S6 (now S7) to reflect this. However, we would like to emphasize that these figures are not intended to communicate that all genes within these sets of interest are regulated in the same manner, but rather that when considered as a whole, the predominant effect seen is that of translational offsetting (directional shifts in the log2 fold change distribution of total mRNA that are not accompanied by similar shifts in FP mRNA log2 fold changes).

      The significance of these differences was determined by comparing the ecdfs of the log2 fold changes for the genes belonging to a particular set (e.g. non-TOP mTOR-sensitive, p-eIF4E-sensitive) against all other expressed genes (background) using a Wilcoxon rank sum test. This allows identification of significant shifts in the distributions that have a clear directionality (if there is an overall increase or decrease in fold changes of FPs or total mRNA compared to background). If log2 fold changes are different from background, but without a clear directionality (equally likely to be increased or decreased), the test will not yield a significant result. This approach allows assessment of the overall behavior of gene signatures within a given dataset in a manner that is completely threshold-independent, such that it does not rely on classification of genes into different regulatory categories (translation only, buffering, etc.) based on significance or fold-change cut-offs (as in S4H). Therefore, we believe that this unbiased approach is well-suited for identifying cases when there are many genes that follow similar patterns of regulation within a given dataset.
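
      The comparison described above can be sketched as follows. This is an illustrative, standard-library implementation of a two-sided Wilcoxon rank sum (Mann-Whitney U) test using the normal approximation with a tie correction; it is not the authors' code, and the function name and example values are hypothetical. In practice an established statistics package would normally be used instead.

```python
import math

def rank_sum_test(gene_set, background):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected).

    gene_set and background are lists of log2 fold changes. Returns (z, p);
    z > 0 means the gene set is shifted towards higher values than background.
    """
    n1, n2 = len(gene_set), len(background)
    n = n1 + n2
    # Pool both groups, remembering group membership (0 = gene set).
    pooled = sorted([(v, 0) for v in gene_set] + [(v, 1) for v in background])
    rank_sum_set = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values occupy 1-based ranks i+1..j and share their average rank.
        avg_rank = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_set += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_set - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return 0.0, 1.0  # all values identical: no evidence of a shift
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p

# Directional shift: every gene-set value lies above the background values.
shifted = [0.5 + 0.01 * k for k in range(30)]
bg = [-0.5 + 0.01 * k for k in range(100)]
z_shift, p_shift = rank_sum_test(shifted, bg)

# Non-directional change: values are extreme but symmetric around background.
spread = [-10.0, 10.0, -9.0, 9.0]
bg_small = [float(v) for v in range(-5, 6)]
z_spread, p_spread = rank_sum_test(spread, bg_small)
```

      The second example illustrates the point made in the response: a gene set whose fold changes are more extreme than background but equally likely to go up or down has a rank sum equal to its expectation, so the test does not report a significant shift.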

      (5) Page 10-"These results suggest that eIF3h depletion impacts the translatome differentially than depletion of eIF3e or eIF3d" ...These results suggest that eIF3h has less impact on the translatome, not that it does so differently. If it were changing translation by a different mechanism, I would not expect it to cluster with control.

      This sentence was rewritten as follows: “The PCA plot and hierarchical clustering (Figure 2A and Supplementary Figure 4A) showed clustering of the samples into two main groups: Ribo-Seq and RNA-seq, and also into two subgroups; NT and eIF3hKD samples clustered on one side and eIF3eKD and eIF3dKD samples on the other. These results suggest that the eIF3h depletion has a much milder impact on the translatome than depletion of eIF3e or eIF3d, which agrees with the growth phenotype and polysome profile analyses (Supplementary Figure 1A and 1D).”

      Other minor issues:

      (1) There are some typos: Figure 2 leves, Figure 4 variou,

      Corrected.

      (2) Figure 3, font for genes on volcano plot too small

      Yes, maybe, however the resolution of this image is high enough to enlarge a certain part of it at will. In our opinion, a larger font would take up too much space, which would reduce the informativeness of this graph.

      (3) Figure S5, highlighting isn't defined.

      The figure legend for S5A (now S6A) states: “Less significant terms ranking 11 and below are in grey. Terms specifically discussed in the main text are highlighted in green.” Perhaps it was overlooked by this reviewer.

      (4) At several points the authors refer to "the MAPK signaling pathway", suggesting there is a single MAPK that is affected, e.g in the title, page 3, and other places when it seems they mean "MAPK signaling pathways" since several MAPK pathways appear to be affected.

      We apologize for any terminological inaccuracies. There are indeed several MAPK pathways operating in cells. In our study, we focused mainly on the MAPK/ERK pathway. The confusion probably stems from the fact that the corresponding term in the KEGG pathway database is labeled "MAPK signaling pathway" and this term, although singular, includes all MAPK pathways. We have carefully reviewed the entire article and have corrected the term used accordingly to either: 1) MAPK pathways in general, 2) the MAPK/ERK pathway for this particular pathway, or 3) "MAPK signaling pathway", where the KEGG term is meant.

      (5) Some eIF3 subunit RNAs have TOP motifs. One might expect 3e and 3h levels to change as a function of 3d knockdown due to TOP motifs but this is not observed. Can the authors speculate why the eIF3 subunit levels don't change but other TOP RNAs show TE changes? Is this true for other translation factors, or just for eIF3, or just for these subunits? Could the Western blot be out of linear range for the antibody or is there feedback affecting eIF3 levels differently than the other TOP RNAs, or a protein turnover mechanism to maintain eIF3 levels?

      This is indeed a very interesting question. In addition to the mRNAs encoding ribosomal proteins, we examined all TOP mRNAs and added an additional sheet to the S2 supplemental spreadsheet with all TOP RNAs listed in (Philippe et al., 2020, PMID: 32094190). According to our Ribo-Seq data, we could expect to see increased protein levels of eIF3a and eIF3f in eIF3dKD and eIF3eKD, but this is not the case, as judged from extensive western blot analysis performed in (Wagner et al. 2016, PMID: 27924037). Indeed, we cannot rule out the involvement of a compensatory mechanism monitoring and maintaining the levels of eIF3 subunits at steady-state – increasing or decreasing them if necessary, which could depend on the TOP motif-mediated regulation. However, we think that in our KDs, all non-targeted subunits that lose their direct binding partner in eIF3 due to siRNA treatment become rapidly degraded. For example, co-downregulation of subunits d, k and l in eIF3eKD is very likely caused by protein degradation as a result of a loss of their direct binding partner – eIF3e. Since we showed that the yeast eIF3 complex assembles co-translationally (Wagner et al. 2020, PMID: 32589964), and there is no reason to think that mammalian eIF3 differs in this regard, our working hypothesis is that free subunits that are not promptly incorporated into the eIF3 complex are rapidly degraded, and the presence or absence of the TOP motif in the 5’ UTR of their mRNAs has no effect. As for the other TOP mRNAs, translation factors eEF1B2, eEF1D, eEF1G, eEF2 have significantly increased FPs in both eIF3dKD and eIF3eKD, but we did not check their protein levels by western blotting to conclude anything specific.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      This study delineates an important set of uninjured and injured periosteal snRNAseq data that provides an overview of periosteal cell responses to fracture healing. The authors also took additional steps to validate some of the findings using immunohistochemistry and transplantation assays. This study will provide a valuable publicly accessible dataset to reexamine the expression of the reported periosteal stem and progenitor cell markers.

      Strengths: 

      (1) This is the first single-nuclei atlas of periosteal cells that are obtained without enzymatic cell dissociation or targeted cell purification by FACS. This integrated snRNAseq dataset will provide additional opportunities for the community to revisit the expression of many periosteal cell markers that have been reported to date.

      (2) The authors delved further into the dataset using cutting-edge algorithms, including CytoTrace, SCENIC, Monocle, STRING, and CellChat, to define the potential roles of identified cell populations in the context of fracture healing. These additional computation analyses generate many new hypotheses regarding periosteal cell reactions.

      (3) The authors also sought to validate some of the computational findings using immunohistochemistry and transplantation assays to support the conclusion.

      Weaknesses: 

      (1) The current snRNAseq datasets contain only a small number of nuclei (1,189 nuclei at day 0, 6,213 nuclei on day 0-7 combined). It is unclear if the number is sufficient to discern subtle biological processes such as stem cell differentiation. 

      We analyzed a total of 6,213 nuclei from uninjured periosteum and fracture calluses at 3 stages of bone healing. We were able to describe 11 distinct cell populations, revealing the diversity of cell populations in uninjured periosteum and post-injury, including rare cell types in the fracture environment such as Schwann cells, adipocytes and pericytes. The number of nuclei was sufficient to perform extensive analysis using a combination of cutting-edge algorithms. We agree that more nuclei would allow more in-depth analyses of cell fate transitions and rare populations, such as pericytes and Schwann cells. However, we concentrated here on SSPC/fibrogenic cells that are well represented in our dataset. The robustness of our study is also reinforced by the analysis of 4 successive time points to define the SSPC/fibrogenic cell trajectories. Our validations using immunohistochemistry and transplantation assays also confirmed that our dataset is sufficient to define cell trajectories. There is no clear consensus on the number of cells needed to perform sc/snRNAseq analyses, as it depends on the cell types analyzed and the fold changes in gene expression. Previously reported single cell datasets containing a lower number of cells reached major conclusions including SSPC identification, cell differentiation trajectories and differential gene expression (658 cells in Debnath et al. 2018, 300 in Ambrosi et al. 2021, and around 175 in Remark et al. 2023).

      (2) The authors' designation of Sca1+CD34+ cells as SSPCs is not sufficiently supported by experimental evidence. It will be essential to demonstrate stem/progenitor properties of Sca1+CD34+ cells using independent biological approaches such as CFU-F assays. In addition, the putative lineage trajectory of SSPCs toward IIFCs, osteoblasts, and chondrocytes remains highly speculative without concrete supporting data. 

      We performed additional analyses to further support that Sca1+ SSPCs display stem/progenitor properties. We performed CFU assays with Prx1-GFP+ SCA1+ and Prx1-GFP+ SCA1- periosteal cells (Figure 2F-G). We showed that Prx1-GFP+ SCA1+ cells display significantly increased CFU potential compared to Prx1-GFP+ SCA1- cells. In addition, we isolated and transplanted Prx1-GFP+ Sca1+ and Prx1-GFP+ Sca1- periosteal cells at the fracture site of wild-type mice (Figure 2H). Only Sca1+ cells contributed to the callus formation, reinforcing that Sca1+ cells are the SSPC population mediating bone repair. 

      The differentiation trajectory of SSPCs presented in our study is supported by a combination of bioinformatic analyses and in vivo validation:

      - snRNAseq allowed us to identify the different populations in the uninjured periosteum. In silico, in vitro and in vivo analyses all point to Sca1+ cells as the SSPC population (Fig 2E-G).

      - At day 3 post-fracture, we did not detect Sca1+ cells in the callus (Fig 4 – Supplementary figure 2). Instead, we observed the appearance of a new population, IIFCs. This population clustered along SSPCs and pseudotime analyses indicate that SSPCs can differentiate into IIFCs (Fig 5B). We confirmed the ability of Sca1+ pSSPCs to form IIFCs, by grafting them in the fracture callus and assessing their fibrogenic fate at day 5 post-fracture (Fig 6B).

      - In silico, we observed that IIFCs clustered along osteogenic and chondrogenic cells. The pseudotime trajectory suggests that IIFCs can differentiate into both lineages (Fig 5B-C). This is consistent with the progressive expression of osteochondrogenic genes observed in IIFCs (Fig 5C, Fig 8A, C, E). In vivo, we observed the progressive expression of Runx2 and Sox9 by IIFCs undergoing differentiation (Fig 6A). We now show that IIFCs are not undergoing apoptosis, indicating that these cells further differentiate (Fig 7 – Supplementary figure 2). To functionally assess the osteochondrogenic potential of IIFCs, we used a transplantation assay and showed that Prx1-GFP+ IIFCs isolated from day 3 post-fracture form cartilage and bone when transplanted at the fracture site of wild-type mice (Fig 6C). 

      We would like to emphasize the robustness of the bioinformatic analyses performed in our study. First, we used datasets from different time points post-fracture to capture the true temporal progression of cell populations in the fracture callus. We used a large combination of tools shown to be reliable in many studies (Julien et al. 2021; Matsushita et al. 2020; Debnath et al. 2018; Baccin et al. 2020; Junyue Cao et al. 2019; Zhong et al. 2020), and all tools converge on the same trajectory. To further show the relevance of pseudotime in our model, we illustrated the distribution of the cell populations by time point (Fig. 5D). We can observe a parallel between the time points and the pseudotime, reinforcing that the pseudotime trajectory reflects the timing of SSPC differentiation. Overall, the combined in silico, in vitro and in vivo analyses support that Sca1+ Pi16+ cells are the periosteal SSPC population, specifically represented in the uninjured dataset. In response to bone fracture, these SSPCs give rise to IIFCs that are specifically represented in the intermediate stages (days 3 and 5) prior to osteochondrogenic differentiation.

      (3) The designation of POSTN+ clusters as injury-induced fibrogenic cells (IIFCs) is not fully supported by the presented data. The authors' snRNAseq datasets (Figure 1d) demonstrate that there are many POSTN+ cells prior to injury, indicating that POSTN+ cells are not specifically induced in response to injury. It has been widely recognized that POSTN is expressed in the periosteum without fracture. This raises a possibility that the main responder of fracture healing is POSTN+ cells, not SSPCs as they postulate. The authors cannot exclude the possibility that Sca1+CD34+ cells are mere bystanders and do not participate in fracture healing. 

      IIFCs are a population of cells that express high levels of ECM related genes, including Postn, Aspn and collagens. We did not claim that Postn expression is specific to IIFCs. While Postn is detected in the uninjured periosteum, snRNAseq analyses and RNAscope experiments showed that the expression of Postn is limited to a small number of cells in the cambium layer of the periosteum (Fig 4B, Figure 4 – Supplementary figure 1B). These Postn-expressing cells in the uninjured periosteum are not SSPCs, as they do not co-express/co-localize with Pi16+ and Sca1+ cells detected in the fibrous layer (Fig 4, Figure 4 – Supplementary figure 1A, Figure 6 – Supplementary figure 1). These Postn-expressing cells are undergoing osteogenic differentiation as shown by the correlation between Runx2 and Postn expression (Fig. 4 – Supplementary Figure 1C). After fracture, we observed a strong increase in ECM-related gene expression and specifically in the IIFC population. We now show the strong increase of Postn expression after injury (Fig. 4 – Supplementary Figure 1D-E, Figure 6 – Supplementary figure 1E). 

      As mentioned in our response above, we now show that SCA1+ cells form cartilage and bone after fracture, while SCA1- cells (including the POSTN+ population) from the uninjured periosteum did not contribute. These data reveal that Sca1+ CD34+ cells are the main SSPC population mediating bone healing and that POSTN+ IIFCs are a transient stage of SSPC differentiation. We added the following text to the result section: “Pi16-expressing SSPCs are located within the fibrous layer, while we observed few POSTN+ cells in the cambium layer (Fig. 4 – Supplementary Fig. 1A). Postn expression is weak in uninjured periosteum and is limited to differentiating cells. Postn expression is strongly increased in response to fracture, specifically in IIFCs (Fig. 4 – Supplementary Fig. 1B-E).”

      (4) Detailed spatial organization of Sca1+CD34+ cells and POSTN+ cells in the uninjured periosteum with respect to the cambium layer and the fibrous layer is not demonstrated. 

      We performed RNAscope experiments to locate Pi16-expressing and Postn-expressing cells in the uninjured periosteum. We observed that Pi16-expressing cells are in the external fibrous layer of the periosteum while Postn-expressing cells are located along the cortex in the cambium layer. The data are added in Fig 4B and Fig. 4- Supplementary Figure 1 and mentioned in the result section “Pi16-expressing SSPCs were located within the fibrous layer, while Postn-expressing cells were found in the cambium layer and corresponded to Runx2-expressing osteogenic cells (Fig. 4 – Supplementary Fig. 1A-C).”.

      (5) Interpretation of transplantation experiments in Figure 5 is not straightforward, as the authors did not demonstrate the purity of Prx1Cre-GFP+SCA1+ cells and Prx1Cre-GFP+CD146- cells to pSSPCs and IIFCs, respectively. It is possible that these populations contain much broader cell types beyond SSPCs or IIFCs.  

      We agree with the reviewer that our methodology for cell transplantation required more justification and validation. We decided to use a transgenic mouse line to be able to trace the cells in vivo after grafting. Prx1 marks limb mesenchyme during development and the Prx1Cre mouse model allows labeling of all SSPCs contributing to callus formation. Therefore, we used Prx1Cre; R26mTmG mice as donors for SSPC and IIFC isolation (Duchamp de Lageneste et al. 2018; Logan et al. 2002). Prx1 does not mark immune and endothelial cells but can label pericytes and fibroblastic populations (Duchamp de Lageneste et al. 2018; Logan et al. 2002; Julien et al. 2021). In the uninjured periosteum, Sca1 (Ly6a) is only expressed by SSPCs and endothelial cells (Fig 3-Supplementary figure 2, Fig 6-Supplementary figure 1). We sorted GFP+ Sca1+ cells from uninjured periosteum of Prx1Cre; R26mTmG mice to isolate only SSPCs and exclude endothelial cells and pericytes. For IIFCs, we isolated cells at day 3 post-fracture because, in our snRNAseq data, we detected IIFCs but no SSPCs, chondrocytes or osteoblasts at this stage of repair. To eliminate Prx1-derived pericytes, we sorted GFP+CD146- cells, as CD146 is specifically expressed by pericytes. We added Figure 6-supplementary Figure 1 to better illustrate the expression of Prx1, SCA1 (Ly6a) and CD146 (Mcam) in the uninjured and day 3 post-fracture datasets. We further demonstrated the purity of the SSPC and IIFC isolation by qPCR on sorted GFP+ Sca1+ cells from uninjured periosteum and GFP+ CD146- cells from day 3 post-fracture periosteum and hematoma, and confirmed the absence of contamination by other cell populations (Figure 6-Supplementary figure 1E). 
      We made the following changes in the text: “To functionally validate the steps of pSSPC activation, we isolated SCA1+ GFP+ pSSPCs from Prx1Cre; R26mTmG mice, excluding endothelial cells, and grafted them at the fracture site of wild-type hosts” and “we isolated GFP+ CD146- from the fracture callus of Prx1Cre; R26mTmG mice at day 3 post fracture, that correspond to IIFCs without contamination by pericytes (CD146+ cells) (Fig. 6C, Figure 6 – Supplementary Fig.1)”.

      Reviewer #2 (Public Review):

      Summary: 

      The authors describe cell type mapping that was conducted for both WT and fracture types. Through this, unique cell populations specific to fracture conditions were identified. To determine these, the most undifferentiated cells were initially targeted using stemness-related markers and CytoTrace scoring. This led to the identification of SSPCs differentiating into fibroblasts. It was observed that the fibroblast cell type significantly increased under fracture conditions, followed by subsequent increases in chondrocytes and osteoblasts.

      Strengths: 

      This study presented the injury-induced fibrogenic cell (IIFC) as a characteristic cell type appearing in the bone regeneration process and proposed that the IIFC is a progenitor undergoing osteochondrogenic differentiation. 

      Weaknesses: 

      This study endeavored to elucidate the role of IIFC through snRNAseq analysis and in vivo observation. However, such validation alone is insufficient to confirm that IIFC is an osteochondrogenic progenitor, and additional data presentation is required.  

      As mentioned in the response to Reviewer 1, the differentiation trajectory of SSPCs presented in our study is supported by a combination of bioinformatic analyses and in vivo validation:

      - snRNAseq allowed us to identify the different populations in the uninjured periosteum. In silico, in vitro and in vivo analyses altogether showed that Sca1+ cells are the SSPC population (Fig 2E-G).

      - At day 3 post-fracture, we did not detect Sca1+ cells in the callus (Fig 4 – Supplementary figure 2). Instead, we observed the appearance of a new population, IIFCs. This population clustered along SSPCs and pseudotime analyses indicate that SSPCs can differentiate into IIFCs (Fig 5B). We confirmed the ability of Sca1+ SSPCs to form IIFCs, by grafting them in the fracture callus and assessing their fate at day 5 post-fracture (Fig 6B).

      - In silico, we observed that IIFCs clustered along osteogenic and chondrogenic cells. The pseudotime trajectory suggests that IIFCs can differentiate into both lineages (Fig 5B-C). This is consistent with the progressive expression of osteochondrogenic genes observed in IIFCs (Fig 5C, Fig 8A, C, E). In vivo, we observed the progressive expression of Runx2 and Sox9 by IIFCs undergoing differentiation (Fig 6A). We now show that IIFCs are not undergoing apoptosis, indicating that these cells further differentiate (Fig 7 – Supp 2). To functionally assess the osteochondrogenic potential of IIFCs, we used a transplantation assay and showed that Prx1-GFP+ IIFCs from day 3 post-fracture form cartilage and bone when transplanted at the fracture site of wild-type mice (Fig 6C). 

      We would like to emphasize the robustness of the bioinformatic analyses performed in our study. First, we used datasets from different time points post-fracture to capture the true temporal progression of cell populations in the fracture callus. We used a large combination of tools shown to be reliable in many studies (Julien et al. 2021; Matsushita et al. 2020; Debnath et al. 2018; Baccin et al. 2020; Junyue Cao et al. 2019; Zhong et al. 2020), and all tools converge on the same trajectory. To further show the relevance of pseudotime in our model, we illustrate the distribution of the cell populations by time point (Fig. 5D). We can observe a parallel between the time points and the pseudotime, reinforcing that the pseudotime trajectory reflects the timing of SSPC differentiation. Overall, the combined in silico, in vitro and in vivo analyses strongly support that Sca1+ Pi16+ cells are the periosteal SSPC population, specifically represented in the uninjured dataset. In response to bone fracture, these SSPCs give rise to IIFCs that are specifically represented in the intermediate stages (days 3 and 5) prior to osteochondrogenic differentiation.

      We made the following changes in the text:

      - Line 81-87: “We performed in vitro CFU assays with sorted GFP+SCA1+ and GFP+SCA1- cells isolated from the periosteum of Prx1Cre; R26mTmG mice, as Prx1 labels all SSPCs contributing to the callus formation1. Prx1-GFP+ SCA1+ showed increased CFU potential, confirming their stem/progenitor property (Fig 2F-G). Then, we grafted Prx1-GFP+ SCA1+ and Prx1-GFP+ SCA1- periosteal cells at the fracture site of wild-type mice. Only SCA1+ cells formed cartilage and bone after fracture indicating that SCA1+ cells correspond to periosteal SSPCs with osteochondrogenic potential (Fig 2H).”

      - Line 120-122: “We did not detect Pi16-expressing SSPCs, consistent with the absence of cells expressing SSPC markers in day 3 snRNAseq dataset compared to uninjured periosteum (Fig. 4 – Supplementary Figure 2).”

      - Line 170-172: “Only a small subset of IIFCs undergo apoptosis, further supporting that IIFCs are maintained in the fracture environment giving rise to osteoblasts and chondrocytes (Fig. 7 – Supplementary Figure 2).”

      - Line 277-278: “Following this unique fibrogenic step, IIFCs do not undergo cell death but undergo either osteogenesis or chondrogenesis”

      - Line 281-283: “During bone repair, this initial fibrogenic process is an integral part of the SSPC differentiation process, and a transitional step prior to osteogenesis and chondrogenesis.”

      Reviewer #3 (Public Review): 

      In this manuscript, the authors explored the transcriptional heterogeneity of the periosteum with single nuclei RNA sequencing. Without prior enrichment of specific populations, this dataset serves as an unbiased representation of the cellular components potentially relevant to bone regeneration. By describing single-cell cluster profiles, the authors characterized over 10 different populations in combined steady state and post-fracture periosteum, including stem cells (SSPC), fibroblast, osteoblast, chondrocyte, immune cells, and so on. Specifically, a developmental trajectory was computationally inferred using the continuum of gene expression to connect SSPC, injury-induced fibrogenic cells (IIFC), chondrocyte, and osteoblast, showcasing the bipotentials of periosteal SSPCs during injury repair. Additional computational pipelines were performed to describe the possible gene regulatory network and the expected pathways involved in bone regeneration. Overall, the authors provided valuable insights into the cell state transitions during bone repair and proposed sets of genes with possible involvements in injury response. 

      While the highlights of the manuscript are the unbiased characterization of periosteal composition, and the trajectory of SSPC response in bone fracture response, many of the conclusions can be more strongly supported with additional clarifications or extensions of the analysis.  

      (1) As described in the method section, both the steady-state data and full dataset underwent integration before dimensional reduction and clustering. It would be appreciated if the authors could compare the post-integration landscapes of uninjured cells between steady state and full dataset analysis. Specifically, fibroblasts were shown in Figure 1C and 1E, and such annotations did not exist in Figure 2B. Will it be possible that the original 'fibroblasts' were part of the IIFC population? 

      As suggested, we now identified the fibroblast population from the uninjured periosteum in the integration of datasets from all time points (Figure 5B and Fig. 5 – Supplementary Figure 2). We identified 4 fibroblast populations in the uninjured periosteum: Luzp2+, Cldn1+, Hsd11b1+ and Csmd1+ fibroblasts. Luzp2+ and Cldn1+ fibroblasts are clustering distinctly from the other populations in the integrated dataset. Hsd11b1+ fibroblasts blend with SSPCs and IIFCs in the integrated dataset probably due to the low cell number. Finally, Csmd1+ fibroblasts are clustering at the interface between SSPCs and IIFCs likely because they correspond to differentiating cells both in the uninjured periosteum and in response to fracture. We modified the resolution of clustering in our subset dataset, in order to represent Luzp2+ and Cldn1+ fibroblasts as an isolated cluster (Figure 5B, cluster 10). In addition, both pseudotime (Fig. 5B) and gene regulatory network analyses (Fig. 7D), show that the fibroblast populations are distinct from the activation trajectory of SSPCs. We added the following sentence to the text “Fibroblasts from uninjured periosteum (Hsd11b1+, Cldn1+ and Luzp2+ cells corresponding to cluster 10 of Fig. 5B) clustered separately from the other populations, suggesting the absence of their contribution to bone healing.”

      (2) According to Figure 2, immune cells were taking a significant abundance within the dataset, specifically during days 3 & 5 post-fracture. It will be interesting to see the potential roles that immune cells play during bone repair. For example, what are the biological annotations of the immune clusters (B, T, NK, myeloid cells)? Are there any inflammatory genes or related signals upregulated in these immune cells? Do they interact with SSPC or IIFC during the transition?   

      In this manuscript, we report the overall dataset and focused our analyses on the response of SSPCs to injury and their differentiation trajectories. We did not include detailed analyses of the immune cell populations, which are outside the scope of this manuscript and are part of another study (Hachemi et al., bioRxiv, 2024).

      (3) The conclusion of Notch and Wnt signaling in IIFC transition was not sufficiently supported by the analysis presented in the manuscript, which was based on computational inferences. It will be great to add in references supporting these claims or provide experimental validations examining selected members of these pathways.

      The role of Wnt and Notch in bone repair has been widely studied and both signaling pathways are known to be regulators of SSPC differentiation (Lee et al. 2021; Matthews et al. 2014; Novak et al. 2020; Wang et al. 2016; Kraus et al. 2022; Dishowitz et al. 2012; Junjie Cao et al. 2017; Matsushita et al. 2020; Steven Minear et al. 2010; Steve Minear et al. 2010; Kang et al. 2007; Komatsu et al. 2010). It was previously shown that Notch inactivation at early stages of repair leads to bone non-union while Notch inactivation in chondrocytes and osteoblasts does not significantly affect healing, confirming its role in SSPC differentiation before osteochondral commitment (Wang et al. 2016). Wnt was shown to be a critical driver of osteogenesis (Matsushita et al. 2020; Steve Minear et al. 2010; Steven Minear et al. 2010; Kang et al. 2007; Komatsu et al. 2010), as Wnt inhibition alters bone formation and Wnt overactivation increases bone formation (Pinzone et al. 2009; Balemans and Van Hul 2007). The role of Wnt is specific to osteogenic engagement as Wnt inhibition promotes chondrogenesis (Hsieh et al. 2023; C.-L. Wu et al. 2021; Ruscitto et al. 2023). A study by Lee et al. recently confirmed the successive activation and crosstalk of Notch and Wnt pathways during osteogenic differentiation of SSPCs during bone healing (Lee et al. 2021). They showed a peak of Notch activation at day 3 post-injury followed by a progressive decrease that parallels an increase of Wnt signaling inducing osteogenic differentiation. These studies correlate with the sequential activation of Notch and Wnt observed in our snRNAseq analyses. Our analyses now reveal how this sequential activation of Notch and Wnt relates to the fibrogenic and osteogenic phases of SSPC differentiation, respectively. We clarified this in the discussion and added the references above to support our claims. 

      Recommendations for the authors: 

      Reviewer #1 (Recommendations For The Authors): 

      (1) The manuscript is well-written overall. However, the authors often oversimplify outcomes and overstate the results. Some of the statements (delineated below) need to be recalibrated to be in line with the presented data. 

      In addition to the suggested conclusions, we also toned down the following ones to avoid overstating our results:

      Line 24: suggesting a crucial paracrine role of this transient IIFC population

      Line 227: suggesting their central role in mediating cell interactions after fracture

      Line 243: IIFCs produce paracrine factors that can regulate SSPCs

      - Line 77 (86): The authors should add "might" before "correspond to". 

      We provided new sets of data, including CFU experiments and transplantation assays, to reinforce our conclusion. We replaced “correspond to” with “encompass”.

      - Line 102: SSPCs are obviously not "absent" in day 3 snRNAseq (Figure 2d). The percentage dropped (only) 75%, according to Figure 2e, which is far from disappearance. Overall, immunohistochemical staining is often dichotomous with snRNAseq designations. The authors should more carefully describe the results. 

      We agree that this comment may not reflect the data shown as we observe a strong decrease in the percentage of cells in SSPC clusters, but still detect a few cells in the SSPC clusters. However, when we looked at the presence of Sca1+ Pi16+ cells at different time points, we confirmed the absence of cells expressing SSPC signature genes (Sca1, Pi16, Cd34) at day 3 post-injury. Due to the clustering resolution of the combined integration, some cells in the SSPC clusters might not be Sca1+ Pi16+. We now show these results in Fig. 4 – Supplementary Figure 2. We changed the text accordingly (line 120): “We did not detect Pi16-expressing SSPCs, consistent with the absence of cells expressing SSPC markers in the day 3 snRNAseq dataset compared to uninjured periosteum (Fig. 4 – Supplementary Figure 2)”.

      - Line 134: The authors need to clearly state that GFP+IIFCs were isolated based on Prx1CreGFP+CD146-. The authors did not clearly demonstrate the relationship between POSTN+ cells and CD146- cells, which poses concerns about the interpretation of transplantation experiments. 

      As mentioned above in response to reviewer 1-public review, we have clarified and provided additional information on our strategy to isolate SSPCs and IIFCs. We used the Prx1Cre; R26mTmG mice to mark all SSPCs and their derivatives with the GFP reporter in order to trace these populations after cell grafting. In the uninjured periosteum, Sca1 (Ly6a) is only expressed by SSPCs and endothelial cells. We sorted GFP+Sca1+ cells to exclude endothelial cells. For IIFCs, we isolated cells at day 3 post-fracture because, in our snRNAseq data, we detect IIFCs but no SSPCs, chondrocytes or osteoblasts at this time point. However, we also detected pericytes that can be Prx1-derived. To eliminate potential pericyte contamination, we sorted GFP+ CD146- cells, as CD146 is specifically expressed by pericytes. We added Figure 6-supplementary Figure 1 to better illustrate the expression of Prx1, SCA1 (Ly6a) and CD146 (Mcam) in the uninjured and day 3 post-fracture datasets. We further demonstrated the purity of the SSPC and IIFC isolation by qPCR on sorted GFP+ Sca1+ cells from uninjured periosteum and GFP+ CD146- cells from day 3 post-fracture periosteum and hematoma, and confirmed the absence of contamination by other cell populations (Figure 6-Supplementary figure 1E). We made the following changes in the text (line 153): “To functionally validate the steps of pSSPC activation, we isolated SCA1+ GFP+ pSSPCs from Prx1Cre; R26mTmG mice, excluding endothelial cells, and grafted them at the fracture site of wild-type hosts” and “we isolated GFP+ CD146- from the fracture callus of Prx1Cre; R26mTmG mice at day 3 post fracture, that correspond to IIFCs without contamination by pericytes (CD146+ cells) (Fig. 6C, Figure 6 – Supplementary Fig.1)”.

      - Line 211: It is obvious from Figure 8F that ligand expression was not "specific" to the IIFC phase.

      The data only shows a slight enrichment of ligand score. 

      We corrected the text to read “ligand expression was increased during the IIFC phase”.

      (2) Some of the computational predictions are incongruent with the known lineage trajectory. For example, in vivo lineage tracing experiments, including but not limited to, PLoS Genet. 2014. 10:e1004820, demonstrate that some of the chondrocytes within fracture callus can differentiate into osteoblasts. This is incompatible with the authors' conclusion that osteoblasts and chondrocytes represent two different terminal stages of cell differentiation in fracture healing. How do the authors reconcile this apparent inconsistency? 

      In this manuscript, we generated datasets corresponding to the initial stages of bone repair, up to day 7 post-injury. Our analyses therefore encompass the stages of SSPC activation and engagement into osteogenesis and chondrogenesis. The results show that a portion of osteoblasts in the fracture callus differentiate directly from IIFCs via intramembranous ossification. The reviewer is correct that osteoblasts have also been shown to derive from transdifferentiation of chondrocytes, which occurs at later stages of repair during the active phase of endochondral ossification (Julien et al. 2020; Aghajanian and Mohan 2018; Zhou et al. 2014; Hu et al. 2017). This process of chondrocyte-to-osteoblast transdifferentiation is not represented in our integrated dataset, and capturing it may require adding later time points. However, when we analyzed the day 5 and day 7 datasets independently of days 0 and 3, we identified a cluster of hypertrophic chondrocytes (expressing Col10a1) connecting the clusters of chondrocytes and osteoblasts. This suggests that hypertrophic chondrocytes in this cluster are undergoing transdifferentiation into osteoblasts, as shown in Author response image 1. Additional time points will be needed in a future study to analyze chondrocyte transdifferentiation in depth.

      Author response image 1.

      Periosteum-derived chondrocytes undergo cartilage to bone transformation. A. UMAP projection of the subset of SSPCs, IIFCs, osteoblasts and chondrocytes in the integration of days 5 and 7 post-fracture datasets. B. Feature plots of Acan, Col10a1 and Ibsp expression.  C. UMAP projection separated by time points. D. Percentage of cells in the hypertrophic/differentiating chondrocyte cluster.

      (3) The authors did not cite some of the studies that described the roles of Notch signaling in fracture healing, for example, J Bone Miner Res. 2014. 29:1283-94. The authors should test the specificity of Notch signaling activities to IIFCs (POSTN+ cells) in vivo. 

      The role of Notch in the activation of SSPCs during bone repair has been investigated in several studies (Lee et al. 2021; Matthews et al. 2014; Novak et al. 2020; Wang et al. 2016; Kraus et al. 2022; Dishowitz et al. 2012; Junjie Cao et al. 2017). Notch signaling dynamics were previously described, with a peak at day 3 post-injury followed by a reduction when cells engage in osteogenesis and chondrogenesis (Lee et al. 2021; Dishowitz et al. 2012; Matthews et al. 2014). Notch plays a role in the early steps of SSPC activation, prior to osteochondral differentiation, as Notch inactivation in chondrocytes and osteoblasts does not affect bone repair (Wang et al. 2016). We added the references listed above to emphasize the correlation between our results and previous reports on the role of Notch and made changes in the discussion.

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions 

      (1) This research utilized snRNA seq for the basic hypothesis formation; however, the number of nuclei acquired was quite limited. Therefore, please explain the rationale for employing snRNA seq instead of scRNA seq, which includes cytoplasm, and additionally provide the markers used for cell type mapping in the scRNA analysis.  

      As mentioned in our response to reviewer #1 above, we analyzed a total of 6,213 nuclei from uninjured periosteum and fracture calluses at 3 stages of bone healing. We were able to describe 11 distinct cell populations, including rare cell types in the fracture environment such as Schwann cells, adipocytes and pericytes. The number of nuclei was sufficient to perform extensive analysis using a combination of cutting-edge algorithms. We agree that more nuclei would allow more in-depth analyses of cell fate transitions and rare populations, such as pericytes and Schwann cells. However, we concentrated here on SSPCs/fibrogenic cells, which are well represented in our dataset. The robustness of our study is also reinforced by the analysis of 4 successive time points to define the SSPC/fibrogenic cell trajectories. Our validations using immunohistochemistry and transplantation assays also confirmed that our dataset is sufficient to define cell trajectories. There is no clear consensus on the number of cells needed to perform scRNAseq analyses, as it depends on the cell types analyzed and the fold changes in gene expression. Previously reported single cell datasets containing a lower number of cells reached major conclusions, including SSPC identification, cell differentiation trajectories and differential gene expression (658 cells in Debnath et al. 2018, 300 in Ambrosi et al. 2021, and around 175 in Remark et al. 2023).

      Several studies have shown that snRNAseq provides data quality equivalent to scRNAseq in terms of cell type identification, number of detected genes and downstream analyses (Selewa et al. 2020; Wen et al. 2022; Ding et al. 2020; H. Wu et al. 2019; Machado et al. 2021). While snRNAseq does not allow the detection of cytoplasmic RNA, there are several advantages in using this technique:

      (1) better representation of the cell types. To perform scRNAseq, a step of enzymatic digestion is needed. This usually leads to an overrepresentation of some cell types loosely attached to the ECM (immune cells, endothelial cells) and a reduced representation of cell types strongly attached to the ECM, such as chondrocytes and osteoblasts. In addition, large or multinucleated cells like hypertrophic chondrocytes and osteoclasts are too big to be sorted and encapsulated using 10X technology. Here, we optimized a protocol to mechanically isolate nuclei from dissected tissues that allows us to capture the diversity of cell types in periosteum and fracture callus.

      (2) higher recovery of nuclei. We isolated both cells and nuclei from periosteum in our study and observed that nuclei extraction was the more efficient approach for the periosteum and the fracture callus.

      (3) reduction of isolation time and cell stress. Previous studies showed that enzymatic digestion causes cell stress and induces stem cell activation (Machado et al. 2021; van den Brink et al. 2017). Therefore, we decided to perform snRNAseq to analyze the transcriptome of the intact periosteum without digestion-induced bias.

      We added this sentence in the result section: “Single nuclei transcriptomics was shown to provide results equivalent to single cell transcriptomics, but with better cell type representation and reduced digestion-induced stress response (Selewa et al. 2020; Wen et al. 2022; Ding et al. 2020; H. Wu et al. 2019; Machado et al. 2021)”.

      The list of genes used for cell type mapping is presented in Figure 3 – Supplementary figure 1. We added a detailed dot plot as Figure 3 – Supplementary figure 2.

      (2) During the fracture healing process of long bones, the influx of fibroblasts is a relatively common occurrence, and the fibrous callus that forms during bone repair and regeneration is reported to disappear over time. Therefore, inferring that IIFC differentiates into osteo- and chondrogenic cells based solely on their simultaneous appearance in the same time and space is challenging. More detailed validation is necessary, beyond what is supported by bioinformatics analysis. 

      The first step of bone repair is the formation of a fibrous callus, before cartilage and bone formation. There are no data in the literature demonstrating that an influx of fibroblasts occurs at the fracture site. Several studies now show that cells involved in callus formation are recruited locally (i.e. from the bone marrow, the periosteum and the skeletal muscle surrounding the fracture site) (Duchamp de Lageneste et al. 2018; Julien et al. 2021; Colnot 2009; Jeffery et al. 2022; Debnath et al. 2018; Matsushita et al. 2020; Julien et al. 2022; Matthews et al. 2021). The contribution of locally activated SSPCs to the fibrous callus is less well understood. Lineage tracing shows that GFP+ cell populations traced in Prx1Cre-GFP mice include SSPCs, IIFCs, chondrocytes and osteoblasts.

      The timing of the cell trajectories observed in our dataset correlates with the timing of callus formation previously described in the literature: the day 3 post-fracture dataset mostly contains IIFCs, while chondrocytes and osteoblasts appear from day 5 post-fracture. We conclude that IIFCs differentiate into osteochondrogenic cells based on multiple lines of evidence besides their simultaneous appearance in time and space:

      - In silico trajectory analyses identify a trajectory from SSPCs to osteochondrogenic cells via IIFCs. We added an analysis to show that our pseudotime trajectory parallels the timepoints of the dataset, confirming that the differentiation trajectory follows the timing of cell differentiation (Figure 5D).

      - We show that IIFCs start to express chondrogenic and osteogenic genes prior to engaging in chondrogenesis and osteogenesis. In addition, we detected activation of osteo- and chondrogenic specific transcription factors in IIFCs. This shows a differentiation continuum between SSPCs, IIFCs, and osteochondrogenic cells (Figures 6-8).

      - Using a transplantation assay, we showed that IIFCs form cartilage and bone, therefore reinforcing the osteochondrogenic potential of this population (Figure 6B).

      - IIFCs do not undergo apoptosis. We assessed the expression of apoptosis-related genes by IIFCs and did not detect expression. This was confirmed by cleaved caspase 3 immunostaining showing that a very low percentage of cells in the early fibrotic tissue undergo apoptosis. 

      Therefore, the idea that the initial fibrous callus is replaced by a new influx of SSPCs or committed progenitors is not supported by recent literature and is not observed in our dataset containing all cell types from the periosteum and fracture site. Overall, our bioinformatic analyses combined with our in vivo validation strongly support that IIFCs are differentiating into chondrocytes and osteoblasts during bone repair. Additional in vivo functional studies will aim to further validate the trajectory and investigate the critical factors regulating this process.
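
      One of the points above, that pseudotime parallels the sampling time points, is in essence a claim of monotonic association. As an illustration only (not the analysis behind Figure 5D), a minimal Python sketch of such a check is a Spearman rank correlation between each nucleus's collection day and its pseudotime value. The function names and the toy values below are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data: collection day and pseudotime for six nuclei.
days = [0, 0, 3, 3, 5, 7]
pseudotime = [0.1, 0.2, 0.4, 0.5, 0.7, 0.9]
rho = spearman(days, pseudotime)
```

      A value of `rho` close to 1 would indicate that nuclei collected later tend to sit later in pseudotime; many nuclei share a collection day, which is why tied values receive average ranks.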

      (3) The influx of most osteogenic progenitors to the bone fracture site typically appears after postfracture day 7. It's essential to ascertain whether the osteogenic cells observed at the time of this study differentiated from IIFC or migrated from surrounding mesenchymal stem cells. 

      As mentioned above, there is no clear evidence in the literature indicating an influx of osteoprogenitors. Cells involved in callus formation are recruited locally and predominantly from the periosteum (Duchamp de Lageneste et al. 2018; Julien et al. 2021; Colnot 2009; Jeffery et al. 2022; Debnath et al. 2018; Matsushita et al. 2020; Matthews et al. 2021; Julien et al. 2022). Our datasets therefore include all cell populations that form the callus. Other sources of SSPCs include the surrounding muscle, which contributes mostly to cartilage, and the bone marrow, which contributes a low percentage of the callus osteoblasts in the medullary cavity (Julien et al. 2021; Jeffery et al. 2022). We provide evidence that IIFCs give rise to osteogenic cells using our bioinformatic analyses and in vivo transplantation assay (listed in the response above). As indicated in our response to reviewer #1, the steps leading to osteogenic differentiation observed in our dataset reflect the first step of callus ossification and correspond to the process of intramembranous ossification (up to day 7 post-injury). Endochondral ossification also contributes osteoblasts, including through the transdifferentiation of chondrocytes into osteoblasts (Julien et al. 2020; Zhou et al. 2014; Hu et al. 2017). While this process mostly occurs around day 14 post-fracture, we begin to detect this transition in our integrated day 5-day 7 dataset, as shown in Author response image 1.

      (4) It's crucial to determine whether the IIFC appearing at the fracture site contributes to the formation of the callus matrix or undergoes apoptosis during the fracture healing process.

      In the early steps of bone repair, the callus is mostly composed of extracellular matrix (ECM). IIFCs express high levels of ECM genes, including Postn, Aspn and collagens (Col3a1, Col5a1, Col8a1, Col12a1) (Figure 3 – Supplementary Figures 1-2 and Fig. 7 – Supplementary Figure 1B). IIFCs express the highest levels of matrix-related genes compared to the other cell types in the fracture environment (e.g. immune cells, endothelial cells, Schwann cells, pericytes), as now shown in Fig. 7 – Supplementary Figure 1A. Therefore, IIFCs are the main contributors to the callus matrix.

      We investigated if IIFCs undergo apoptosis. We observed that only a low percentage of IIFCs express apoptosis-related genes and are positive for cleaved caspase 3 immunostaining at days 3, 5 and 7 of bone repair. This shows that IIFCs do not undergo apoptosis and reinforces our model in which IIFCs further differentiate into osteoblasts and chondrocytes. We added these data in Fig. 7 – Supplementary Figure 2 and added the sentence in the results section “Only a small subset of IIFCs undergo apoptosis, further supporting that IIFCs are maintained in the fracture environment giving rise to osteoblasts and chondrocytes (Fig. 7 – Supplementary Figure 2).” 

      (5) Results from the snRNA seq highlight the paracrine role of IIFC, and verification is needed to ensure that the effect this has on surrounding osteogenic lineages is not misinterpreted.  

      To assess cell-cell interactions, we used tools such as Connectome and CellChat to infer and quantify intercellular communication networks between cell types. Studies showed the robustness of these tools combined with in vivo validation (Sinha et al. 2022; Alečković et al. 2022; Li et al. 2023). Here we used these tools to illustrate the paracrine profile of IIFCs, but in vivo validation would be required using gene inactivation to assess the requirement of individual paracrine factors. We performed extensive analyses of the crosstalk between immune cells and SSPCs using our dataset in another study combined with in vivo validation, showing the robustness of the tool and the dataset (Hachemi et al. 2024). We adjusted our conclusions to reflect our analyses: “suggesting a crucial paracrine role of this transient IIFC population during fracture healing”, “suggesting their central role in mediating cell interactions after fracture”, “suggesting that SSPCs can receive signals from IIFC”. 

      References

      Aghajanian, Patrick, and Subburaman Mohan. 2018. “The Art of Building Bone: Emerging Role of Chondrocyte-to-Osteoblast Transdifferentiation in Endochondral Ossification”. Bone Research 6 (1): 19. https://doi.org/10.1038/s41413-018-0021-z.

      Alečković, Maša, Simona Cristea, Carlos R. Gil Del Alcazar, Pengze Yan, Lina Ding, Ethan D. Krop, Nicholas W. Harper, et al. 2022. “Breast Cancer Prevention by Short-Term Inhibition of TGFβ Signaling”. Nature Communications 13 (1): 7558. https://doi.org/10.1038/s41467-022-35043-5.

      Ambrosi, Thomas H., Owen Marecic, Adrian McArdle, Rahul Sinha, Gunsagar S. Gulati, Xinming Tong, Yuting Wang, et al. 2021. “Aged Skeletal Stem Cells Generate an Inflammatory Degenerative Niche”. Nature 597 (7875): 256‑62. https://doi.org/10.1038/s41586-021-03795-7.

      Baccin, Chiara, Jude Al-Sabah, Lars Velten, Patrick M. Helbling, Florian Grünschläger, Pablo Hernández-Malmierca, César Nombela-Arrieta, Lars M. Steinmetz, Andreas Trumpp, and Simon Haas. 2020. “Combined Single-Cell and Spatial Transcriptomics Reveal the Molecular, Cellular and Spatial Bone Marrow Niche Organization”. Nature Cell Biology 22 (1): 38‑48. https://doi.org/10.1038/s41556-019-0439-6.

      Balemans, Wendy, and Wim Van Hul. 2007. “The Genetics of Low-Density Lipoprotein Receptor-Related Protein 5 in Bone: A Story of Extremes”. Endocrinology 148 (6): 2622‑29. https://doi.org/10.1210/en.2006-1352.

      Brink, Susanne C van den, Fanny Sage, Ábel Vértesy, Bastiaan Spanjaard, Josi Peterson-Maduro, Chloé S Baron, Catherine Robin, and Alexander van Oudenaarden. 2017. “Single-Cell Sequencing Reveals Dissociation-Induced Gene Expression in Tissue Subpopulations”. Nature Methods 14 (10): 935‑36. https://doi.org/10.1038/nmeth.4437.

      Cao, Junjie, Yalin Wei, Jing Lian, Lunyun Yang, Xiaoyan Zhang, Jiaying Xie, Qiang Liu, Jinyong Luo, Baicheng He, and Min Tang. 2017. “Notch Signaling Pathway Promotes Osteogenic Differentiation of Mesenchymal Stem Cells by Enhancing BMP9/Smad Signaling”. International Journal of Molecular Medicine 40 (2): 378‑88. https://doi.org/10.3892/ijmm.2017.3037.

      Cao, Junyue, Malte Spielmann, Xiaojie Qiu, Xingfan Huang, Daniel M. Ibrahim, Andrew J. Hill, Fan Zhang, et al. 2019. “The Single-Cell Transcriptional Landscape of Mammalian Organogenesis”. Nature 566 (7745): 496‑502. https://doi.org/10.1038/s41586-019-0969-x.

      Colnot, Céline. 2009. “Skeletal Cell Fate Decisions Within Periosteum and Bone Marrow During Bone Regeneration”. Journal of Bone and Mineral Research 24 (2): 274‑82. https://doi.org/10.1359/jbmr.081003.

      Debnath, Shawon, Alisha R. Yallowitz, Jason McCormick, Sarfaraz Lalani, Tuo Zhang, Ren Xu, Na Li, et al. 2018. “Discovery of a Periosteal Stem Cell Mediating Intramembranous Bone Formation”. Nature 562 (7725): 133‑39. https://doi.org/10.1038/s41586-018-0554-8.

      Ding, Jiarui, Xian Adiconis, Sean K. Simmons, Monika S. Kowalczyk, Cynthia C. Hession, Nemanja D. Marjanovic, Travis K. Hughes, et al. 2020. “Systematic Comparison of Single-Cell and Single-Nucleus RNA-Sequencing Methods”. Nature Biotechnology 38 (6): 737‑46. https://doi.org/10.1038/s41587-020-0465-8.

      Dishowitz, Michael I., Shawn P. Terkhorn, Sandra A. Bostic, and Kurt D. Hankenson. 2012. “Notch Signaling Components Are Upregulated during Both Endochondral and Intramembranous Bone Regeneration”. Journal of Orthopaedic Research 30 (2): 296‑303. https://doi.org/10.1002/jor.21518.

      Duchamp de Lageneste, Oriane, Anaïs Julien, Rana Abou-Khalil, Giulia Frangi, Caroline Carvalho, Nicolas Cagnard, Corinne Cordier, Simon J. Conway, and Céline Colnot. 2018. “Periosteum Contains Skeletal Stem Cells with High Bone Regenerative Potential Controlled by Periostin”. Nature Communications 9 (1): 773. https://doi.org/10.1038/s41467-018-03124-z.

      Hsieh, Chen-Chan, B. Linju Yen, Chia-Chi Chang, Pei-Ju Hsu, Yu-Wei Lee, Men-Luh Yen, Shaw-Fang Yet, and Linyi Chen. 2023. “Wnt Antagonism without TGFβ Induces Rapid MSC Chondrogenesis via Increasing AJ Interactions and Restricting Lineage Commitment”. iScience 26 (1): 105713. https://doi.org/10.1016/j.isci.2022.105713.

      Hu, Diane P., Federico Ferro, Frank Yang, Aaron J. Taylor, Wenhan Chang, Theodore Miclau, Ralph S. Marcucio, and Chelsea S. Bahney. 2017. “Cartilage to Bone Transformation during Fracture Healing Is Coordinated by the Invading Vasculature and Induction of the Core Pluripotency Genes”. Development 144 (2): 221‑34. https://doi.org/10.1242/dev.130807.

      Jeffery, Elise C., Terry L.A. Mann, Jade A. Pool, Zhiyu Zhao, and Sean J. Morrison. 2022. “Bone Marrow and Periosteal Skeletal Stem/Progenitor Cells Make Distinct Contributions to Bone Maintenance and Repair”. Cell Stem Cell 29 (11): 1547-1561.e6. https://doi.org/10.1016/j.stem.2022.10.002.

      Julien, Anais, Anuya Kanagalingam, Ester Martínez-Sarrà, Jérome Megret, Marine Luka, Mickaël Ménager, Frédéric Relaix, and Céline Colnot. 2021. “Direct contribution of skeletal muscle mesenchymal progenitors to bone repair”. Nature Communications 12 (1): 2860. https://doi.org/10.1038/s41467-021-22842-5.

      Julien, Anais, Simon Perrin, Oriane Duchamp de Lageneste, Caroline Carvalho, Morad Bensidhoum, Laurence Legeai-Mallet, and Céline Colnot. 2020. “FGFR3 in Periosteal Cells Drives Cartilage-to-Bone Transformation in Bone Repair”. Stem Cell Reports 15 (4): 955‑67. https://doi.org/10.1016/j.stemcr.2020.08.005.

      Julien, Anais, Simon Perrin, Ester Martínez-Sarrà, Anuya Kanagalingam, Caroline Carvalho, Marine Luka, Mickaël Ménager, and Céline Colnot. 2022. “Skeletal Stem/Progenitor Cells in Periosteum and Skeletal Muscle Share a Common Molecular Response to Bone Injury”. Journal of Bone and Mineral Research, June, jbmr.4616. https://doi.org/10.1002/jbmr.4616.

      Kang, Sona, Christina N. Bennett, Isabelle Gerin, Lauren A. Rapp, Kurt D. Hankenson, and Ormond A. MacDougald. 2007. “Wnt Signaling Stimulates Osteoblastogenesis of Mesenchymal Precursors by Suppressing CCAAT/Enhancer-Binding Protein α and Peroxisome Proliferator-Activated Receptor γ”. Journal of Biological Chemistry 282 (19): 14515‑24. https://doi.org/10.1074/jbc.M700030200.

      Komatsu, David E., Michelle N. Mary, Robert Jason Schroeder, Alex G. Robling, Charles H. Turner, and Stuart J. Warden. 2010. “Modulation of Wnt Signaling Influences Fracture Repair”. Journal of Orthopaedic Research 28 (7): 928‑36. https://doi.org/10.1002/jor.21078.

      Hachemi, Yasmine, Simon Perrin, Maria Ethel, Anais Julien, Julia Vettese, Blandine Geisler, Christian Göritz, and Céline Colnot. 2024. “Multimodal Analyses of Immune Cells during Bone Repair Identify Macrophages as a Therapeutic Target in Musculoskeletal Trauma”. https://doi.org/10.1101/2024.04.29.591608.

      Kraus, Jessica M., Dion Giovannone, Renata Rydzik, Jeremy L. Balsbaugh, Isaac L. Moss, Jennifer L. Schwedler, Julien Y. Bertrand, et al. 2022. “Notch Signaling Enhances Bone Regeneration in the Zebrafish Mandible”. Development 149 (5): dev199995. https://doi.org/10.1242/dev.199995.

      Lee, S., L. H. Remark, A. M. Josephson, K. Leclerc, E. Muiños Lopez, D. J. Kirby, Devan Mehta, et al. 2021. “Notch-Wnt Signal Crosstalk Regulates Proliferation and Differentiation of Osteoprogenitor Cells during Intramembranous Bone Healing”. Npj Regenerative Medicine 6 (1): 29. https://doi.org/10.1038/s41536-021-00139-x.

      Li, Jiaoduan, Dongyan Cao, Lixin Jiang, Yiwen Zheng, Siyuan Shao, Ai Zhuang, and Dongxi Xiang. 2023. “ITGB2-ICAM1 Axis Promotes Liver Metastasis in BAP1-Mutated Uveal Melanoma with Retained Hypoxia and ECM Signatures”. Cellular Oncology (Dordrecht), December. https://doi.org/10.1007/s13402-023-00908-4.

      Logan, Malcolm, James F. Martin, Andras Nagy, Corrinne Lobe, Eric N. Olson, and Clifford J. Tabin. 2002. “Expression of Cre Recombinase in the Developing Mouse Limb Bud Driven by a Prxl Enhancer”. Genesis 33 (2): 77‑80. https://doi.org/10.1002/gene.10092.

      Machado, Léo, Perla Geara, Jordi Camps, Matthieu Dos Santos, Fatima Teixeira-Clerc, Jens Van Herck, Hugo Varet, et al. 2021. “Tissue Damage Induces a Conserved Stress Response That Initiates Quiescent Muscle Stem Cell Activation”. Cell Stem Cell 28 (6): 1125-1135.e7. https://doi.org/10.1016/j.stem.2021.01.017.

      Matsushita, Yuki, Mizuki Nagata, Kenneth M. Kozloff, Joshua D. Welch, Koji Mizuhashi, Nicha Tokavanich, Shawn A. Hallett, et al. 2020. “A Wnt-Mediated Transformation of the Bone Marrow Stromal Cell Identity Orchestrates Skeletal Regeneration”. Nature Communications 11 (1): 332. https://doi.org/10.1038/s41467-019-14029-w.

      Matthews, Brya G, Danka Grcevic, Liping Wang, Yusuke Hagiwara, Hrvoje Roguljic, Pujan Joshi, Dong-Guk Shin, Douglas J Adams, and Ivo Kalajzic. 2014. “Analysis of αSMA-Labeled Progenitor Cell Commitment Identifies Notch Signaling as an Important Pathway in Fracture Healing”. Journal of Bone and Mineral Research 29 (5): 1283‑94. https://doi.org/10.1002/jbmr.2140.

      Matthews, Brya G, Sanja Novak, Francesca V Sbrana, Jessica L Funnell, Ye Cao, Emma J Buckels, Danka Grcevic, and Ivo Kalajzic. 2021. “Heterogeneity of Murine Periosteum Progenitors Involved in Fracture Healing”. eLife 10 (February):e58534. https://doi.org/10.7554/eLife.58534.

      Minear, Steve, Philipp Leucht, Samara Miller, and Jill A Helms. 2010. “rBMP Represses Wnt Signaling and Influences Skeletal Progenitor Cell Fate Specification during Bone Repair”. Journal of Bone and Mineral Research 25 (6): 1196‑1207. https://doi.org/10.1002/jbmr.29.

      Minear, Steven, Philipp Leucht, Jie Jiang, Bo Liu, Arial Zeng, Christophe Fuerer, Roel Nusse, and Jill A. Helms. 2010. “Wnt Proteins Promote Bone Regeneration”. Science Translational Medicine 2 (29). https://doi.org/10.1126/scitranslmed.3000231.

      Novak, Sanja, Emilie Roeder, Benjamin P. Sinder, Douglas J. Adams, Chris W. Siebel, Danka Grcevic, Kurt D. Hankenson, Brya G. Matthews, and Ivo Kalajzic. 2020. “Modulation of Notch1 Signaling Regulates Bone Fracture Healing”. Journal of Orthopaedic Research 38 (11): 2350‑61. https://doi.org/10.1002/jor.24650.

      Pinzone, Joseph J., Brett M. Hall, Nanda K. Thudi, Martin Vonau, Ya-Wei Qiang, Thomas J. Rosol, and John D. Shaughnessy. 2009. “The Role of Dickkopf-1 in Bone Development, Homeostasis, and Disease”. Blood 113 (3): 517‑25. https://doi.org/10.1182/blood-2008-03-145169.

      Remark, Lindsey H., Kevin Leclerc, Malissa Ramsukh, Ziyan Lin, Sooyeon Lee, Backialakshmi Dharmalingam, Lauren Gillinov, et al. 2023. “Loss of Notch Signaling in Skeletal Stem Cells Enhances Bone Formation with Aging”. Bone Research 11 (1): 50. https://doi.org/10.1038/s41413-023-00283-8.

      Ruscitto, Angela, Peng Chen, Ikue Tosa, Ziyi Wang, Gan Zhou, Ingrid Safina, Ran Wei, et al. 2023. “Lgr5-Expressing Secretory Cells Form a Wnt Inhibitory Niche in Cartilage Critical for Chondrocyte Identity”. Cell Stem Cell 30 (9): 1179-1198.e7. https://doi.org/10.1016/j.stem.2023.08.004.

      Selewa, Alan, Ryan Dohn, Heather Eckart, Stephanie Lozano, Bingqing Xie, Eric Gauchat, Reem Elorbany, et al. 2020. “Systematic Comparison of High-Throughput Single-Cell and Single-Nucleus Transcriptomes during Cardiomyocyte Differentiation”. Scientific Reports 10 (1): 1535. https://doi.org/10.1038/s41598-020-58327-6.

      Sinha, Sarthak, Holly D. Sparks, Elodie Labit, Hayley N. Robbins, Kevin Gowing, Arzina Jaffer, Eren Kutluberk, et al. 2022. “Fibroblast Inflammatory Priming Determines Regenerative versus Fibrotic Skin Repair in Reindeer”. Cell 185 (25): 4717-4736.e25. https://doi.org/10.1016/j.cell.2022.11.004.

      Wang, Cuicui, Jason A. Inzana, Anthony J. Mirando, Yinshi Ren, Zhaoyang Liu, Jie Shen, Regis J. O’Keefe, Hani A. Awad, and Matthew J. Hilton. 2016. “NOTCH Signaling in Skeletal Progenitors Is Critical for Fracture Repair”. The Journal of Clinical Investigation 126 (4): 1471‑81. https://doi.org/10.1172/JCI80672.

      Wen, Fei, Xiaojie Tang, Lin Xu, and Haixia Qu. 2022. “Comparison of Single‑nucleus and Single‑cell Transcriptomes in Hepatocellular Carcinoma Tissue”. Molecular Medicine Reports 26 (5): 339. https://doi.org/10.3892/mmr.2022.12855.

      Wu, Chia-Lung, Amanda Dicks, Nancy Steward, Ruhang Tang, Dakota B. Katz, Yun-Rak Choi, and Farshid Guilak. 2021. “Single Cell Transcriptomic Analysis of Human Pluripotent Stem Cell Chondrogenesis”. Nature Communications 12 (1): 362. https://doi.org/10.1038/s41467-020-20598-y.

      Wu, Haojia, Yuhei Kirita, Erinn L. Donnelly, and Benjamin D. Humphreys. 2019. “Advantages of Single-Nucleus over Single-Cell RNA Sequencing of Adult Kidney: Rare Cell Types and Novel Cell States Revealed in Fibrosis”. Journal of the American Society of Nephrology 30 (1): 23‑32. https://doi.org/10.1681/ASN.2018090912.

      Zhong, Leilei, Lutian Yao, Robert J. Tower, Yulong Wei, Zhen Miao, Jihwan Park, Rojesh Shrestha, et al. 2020. “Single Cell Transcriptomics Identifies a Unique Adipose Lineage Cell Population That Regulates Bone Marrow Environment”. eLife 9 (April):e54695. https://doi.org/10.7554/eLife.54695.

      Zhou, Xin, Klaus von der Mark, Stephen Henry, William Norton, Henry Adams, and Benoit de Crombrugghe. 2014. “Chondrocytes Transdifferentiate into Osteoblasts in Endochondral Bone during Development, Postnatal Growth and Fracture Healing in Mice”. Edited by Matthew L. Warman. PLoS Genetics 10 (12): e1004820. https://doi.org/10.1371/journal.pgen.1004820.

    2. eLife Assessment

      This fundamental study generated a single cell atlas of mouse periosteal cells under both steady-state and fracture healing conditions to address the knowledge gap regarding cellular composition of the periosteum and their responses to injury. Based on convincing transcriptome analyses and experimental validation, the authors identified the injury induced fibrogenic cell (IIFC) as a characteristic cell type appearing in the bone regeneration process and proposed that the IIFC is a progenitor undergoing osteochondrogenic differentiation. This study will provide a significant publicly accessible dataset to reexamine the expression of the reported periosteal stem and progenitor cell markers.

    3. Reviewer #1 (Public review):

      This study delineates an important set of uninjured and injured periosteal snRNAseq data that provides an overview of periosteal cell responses to fracture healing. The authors also took additional steps to validate some of the findings using immunohistochemistry and transplantation assays. This study will provide a valuable publicly accessible dataset to reexamine the expression of the reported periosteal stem and progenitor cell markers.

      Strengths:

      (1) This is the first single-nuclei atlas of periosteal cells that are obtained without enzymatic cell dissociation or targeted cell purification by FACS. This integrated snRNAseq dataset will provide additional opportunities for the community to revisit the expression of many periosteal cell markers that have been reported to date.

      (2) The authors delved further into the dataset using cutting-edge algorithms, including CytoTrace, SCENIC, Monocle, STRING and CellChat, to define potential roles of identified cell populations in the context of fracture healing. These additional computational analyses generate many new hypotheses regarding periosteal cell reactions.

      (3) The authors also sought to validate some of the computational findings using immunohistochemistry and transplantation assays to support the conclusion.

      Weaknesses:

      (1) The current snRNAseq datasets contain only a small number of nuclei (1,189 nuclei at day 0; 6,213 nuclei for days 0-7 combined). It is possible that these datasets are underpowered to discern subtle biological changes in skeletal stem/progenitor cell populations during fracture healing.

      (2) POSTN is expressed in the cambium layer of the periosteum without fracture. The current data do not exclude the possibility that these pre-existing POSTN+ cells are the main responders to fracture healing.

    4. Reviewer #2 (Public review):

      Summary:

      The authors performed cell type mapping for both WT and fracture conditions. Through this, unique cell populations specific to fracture conditions were identified. To determine these, the most undifferentiated cells were initially targeted using stemness-related markers and CytoTrace scoring. This led to the identification of SSPCs differentiating into fibroblasts. It was observed that the fibroblast cell type significantly increased under fracture conditions, followed by subsequent increases in chondrocytes and osteoblasts.

      Strengths:

      This study presented the injury-induced fibrogenic cell (IIFC) as a characteristic cell type appearing in the bone regeneration process and proposed that the IIFC is a progenitor undergoing osteochondrogenic differentiation.

      Comments on revised version:

      The authors have thoroughly addressed the reviewer's comments and have conducted additional experiments.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other or neither, and that geographic context and temp variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The methodology would be useful for other taxa and study regions with strong community/citizen science and extensive occurrence data.

      We thank Reviewer 1 for their time and expertise in reviewing our study. The suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

      We thank Reviewer 1 for this assessment.

      Weaknesses:

      (1) The habitat classifications (Table S3) are often wrong. "Both" is overused. In North America, for example, Anax junius, Cordulia shurtleffii, Epitheca cynosura, Erythemis simplicicollis, Libellula pulchella, Pachydiplax longipennis, Pantala flavescens, Perithemis tenera, Ischnura posita, the Lestes species, and several Enallagma species are not lotic breeding. These species rarely occur let alone successfully reproduce at lotic sites. Other species are arguably "both", like Rhionaeschna multicolor which is mostly lentic. Not saying this would have altered the conclusions, but it may have exacerbated the weak trait effects.

      We thank the reviewer for their expertise on this topic. We obtained these habitat classifications from field guides and trait databases, and we will review our primary sources to clarify the trait classifications. We will also reclassify the species according to the expertise of this reviewer and perform our analysis again. 

      (2) The conservative spatial resolution (100 x 100 km) limits the analysis to wide-ranging and generalist species. There's no rationale given, so not sure if this was by design or necessity, but it limits the number of analyzable species and potentially changes the inference.

      It is really helpful to have the opportunity to contextualize study design decisions like this one, and we thank the reviewer for the query. Sampling intensity is always a meaningful issue in research conducted at this scale, and we addressed it head-on in this work.

      Very small quadrats covering massive geographical areas will be critically and increasingly afflicted by sampling weaknesses, as well as creating a potentially large problem with pseudoreplication. There is no simple solution to this problem. It would be possible to create interpolated predictions of species’ distributions using Species Distribution Models, Joint Species Distribution Models, or various kinds of Occupancy Models. None of these approaches then leads to analyses that rely on directly observed patterns. Instead, they are extrapolations, and those extrapolations typically fail when tested (for example, papers by Lee-Yaw demonstrate that it is rare for SDMs to predict things well; occupancy models often perform less well than SDMs and do not capture how things change over time - Briscoe et al. 2021, Global Change Biology). The result of employing such techniques would certainly be to make all conclusions speculative, rather than directly observable.

      Rather than employing extrapolative models, we relied on transparent techniques that are used successfully in the core macroecology literature that address spatial variation in sampling explicitly and simply. Moreover, we constructed extensive null models that show that range and phenology changes, respectively, are contrary to expectations that arise from sampling difference. 100km quadrats make for a reasonable “middle-ground” in terms of the effects of sampling, and we will add a reference to the methods section to clarify this.
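      As an illustration only, the general idea of binning occurrences into 100 km quadrats and comparing an observed range-limit shift with a sampling-based null can be sketched as follows. This is a minimal sketch, not the authors' pipeline: the function names, the choice of the 10 most northerly occupied quadrats, and the permutation scheme are all assumptions made for the example, and coordinates are assumed to be already projected to kilometres.

```python
import random

CELL_KM = 100.0  # quadrat edge length (100 x 100 km grid)


def to_quadrat(x_km, y_km, cell_km=CELL_KM):
    """Map projected coordinates in km to a (column, row) quadrat index."""
    return (int(x_km // cell_km), int(y_km // cell_km))


def northern_limit(records, n_cells=10, cell_km=CELL_KM):
    """Mean northing (km) of the centres of the most northerly occupied quadrats.

    `records` is an iterable of (x_km, y_km) occurrence points. Using the
    n_cells most northerly quadrats is a hypothetical choice for this sketch.
    """
    cells = {to_quadrat(x, y, cell_km) for x, y in records}
    if not cells:
        raise ValueError("no occurrence records")
    rows = sorted((row for _, row in cells), reverse=True)[:n_cells]
    return sum((row + 0.5) * cell_km for row in rows) / len(rows)


def null_shifts(early, late, n_iter=999, n_cells=10, seed=0):
    """Permutation null: shuffle records between periods, keeping sample sizes.

    Returns the range-limit shifts expected if the two periods differed only
    in how many records were collected, not in where the species occurred.
    """
    rng = random.Random(seed)
    pooled = list(early) + list(late)
    n_early = len(early)
    shifts = []
    for _ in range(n_iter):
        rng.shuffle(pooled)
        shifts.append(
            northern_limit(pooled[n_early:], n_cells)
            - northern_limit(pooled[:n_early], n_cells)
        )
    return shifts
```

      An observed shift, computed as `northern_limit(late) - northern_limit(early)`, could then be compared with the distribution returned by `null_shifts(early, late)` to judge whether it exceeds what a change in sampling effort alone would produce.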

      (3) The objective includes a prediction about generalists vs specialists (L99-103) yet there is no further mention of this dichotomy in the abstract, methods, results, or discussion.

      Thank you for pointing this out - it is an editing error that should have been resolved prior to submission. We will replace the terms specialist and generalist with specific predictions based on traits.

      (4) Key references were overlooked or dismissed, like in the new edition of Dragonflies & Damselflies model organisms book, especially chapters 24 and 27.

      We thank Reviewer 1 for making us aware of this excellent reference. We will review this text and include it as a reference, in addition to other references recommended by Reviewer 1 and other reviewers.

      Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

      We thank Reviewer 2 for their time and expertise in reviewing our study.

      It is possible that some species with broad niches may not need to migrate, although in general failing to move with climate change is considered an indicator of “climate debt”, signaling that a species may be of concern for conservation (ex. Duchenne et al. 2021, Ecology Letters). We will revise the discussion to acknowledge potential differences in outcomes.

      We used null models to test whether our results regarding range shifts were robust, and if they varied due to increased sampling over time. We found that observed northern range limit shifts are not consistent with expectations derived from changes in sampling intensity (Figure S1, S2). 

      We thank Reviewer 2 for pointing out this error in Figure 1. This conceptual figure was a challenge to construct, as it must illustrate how phenology and range shifts can occur simultaneously or uniquely to enable a hypothetical odonate to track its thermal niche over time. In a previous version of the figure, we had a second panel and we failed to remove the reference to that panel when we simplified the figure.

      Reviewer #3 (Public review):

      Summary:

      In their article "Range geographies, not functional traits, explain convergent range and phenology shifts under climate change," the authors rigorously investigate the temporal shifts in odonate species and their potential predictors. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to avoid extreme conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited earlier phenological shifts. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      We thank Reviewer 3 for their expertise and the time they spent reviewing our study. Their suggestions are very helpful and will improve the quality of our manuscript.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution to this taxon. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings improving our understanding of the drivers of biodiversity loss.

      We thank Reviewer 3 for this assessment.

      Weaknesses:

      The introduction and discussion of the article would benefit from a stronger contextualization of recent studies on biological responses to climate change and the underpinning mechanism.

      The presentation of the results (particularly in figures) should be improved to address the integrative character of the work and help readers extract the main results. While the writing of the article is generally good, particularly the captions and results contain many inconsistencies and lack important detail. With the multitude of the relationships that were tested (the influence of traits) the article needs more coherence.

      We thank Reviewer 3 for these suggestions. We will revise the introduction and discussion to better contextualize species’ responses to climate change and the mechanisms behind them. We will carefully review all figures and captions, and we will make changes to improve the clarity of the text and the presentation of results.

    2. Reviewer #1 (Public review):

      Summary:

      This study evaluates whether species can shift geographically, temporally, or both ways in response to climate change. It also teases out the relative importance of geographic context, temperature variability, and functional traits in predicting the shifts. The study system is large occurrence datasets for dragonflies and damselflies split between two time periods and two continents. Results indicate that more species exhibited both shifts than one or the other or neither, and that geographic context and temp variability were more influential than traits. The results have implications for future analyses (e.g. incorporating habitat availability) and for choosing winner and loser species under climate change. The methodology would be useful for other taxa and study regions with strong community/citizen science and extensive occurrence data.

      Strengths:

      This is an organized and well-written paper that builds on a popular topic and moves it forward. It has the right idea and approach, and the results are useful answers to the predictions and for conservation planning (i.e. identifying climate winners and losers). There is technical proficiency and analytical rigor driven by an understanding of the data and its limitations.

      Weaknesses:

      (1) The habitat classifications (Table S3) are often wrong. "Both" is overused. In North America, for example, Anax junius, Cordulia shurtleffii, Epitheca cynosura, Erythemis simplicicollis, Libellula pulchella, Pachydiplax longipennis, Pantala flavescens, Perithemis tenera, Ischnura posita, the Lestes species, and several Enallagma species are not lotic breeding. These species rarely occur let alone successfully reproduce at lotic sites. Other species are arguably "both", like Rhionaeschna multicolor which is mostly lentic. Not saying this would have altered the conclusions, but it may have exacerbated the weak trait effects.

      (2) The conservative spatial resolution (100 x 100 km) limits the analysis to wide-ranging and generalist species. There's no rationale given, so not sure if this was by design or necessity, but it limits the number of analyzable species and potentially changes the inference.

      (3) The objective includes a prediction about generalists vs specialists (L99-103) yet there is no further mention of this dichotomy in the abstract, methods, results, or discussion.

      (4) Key references were overlooked or dismissed, like in the new edition of Dragonflies & Damselflies model organisms book, especially chapters 24 and 27.

    3. Reviewer #2 (Public review):

      Summary:

      This paper explores a highly interesting question regarding how species migration success relates to phenology shifts, and it finds a positive relationship. The findings are significant, and the strength of the evidence is solid. However, there are substantial issues with the writing, presentation, and analyses that need to be addressed. First, I disagree with the conclusion that species that don't migrate are "losers" - some species might not migrate simply because they have broad climatic niches and are less sensitive to climate change. Second, the results concerning species' southern range limits could provide valuable insights. These could be used to assess whether sampling bias has influenced the results. If species are truly migrating, we should observe northward shifts in their southern range limits. However, if this is an artifact of increased sampling over time, we would expect broader distributions both north and south. Finally, Figure 1 is missing panel B, which needs to be addressed.

    4. Reviewer #3 (Public review):

      Summary:

      In their article "Range geographies, not functional traits, explain convergent range and phenology shifts under climate change," the authors rigorously investigate the temporal shifts in odonate species and their potential predictors. Specifically, they examine whether species shift their geographic ranges poleward or alter their phenology to avoid extreme conditions. Leveraging opportunistic observations of European and North American odonates, they find that species showing significant range shifts also exhibited earlier phenological shifts. Considering a broad range of potential predictors, their results reveal that geographical factors, but not functional traits, are associated with these shifts.

      Strengths:

      The article addresses an important topic in ecology and conservation that is particularly timely in the face of reports of substantial insect declines in North America and Europe over the past decades. Through data integration the authors leverage the rich natural history record for odonates, broadening the taxonomic scope of analyses of temporal trends in phenology and distribution to this taxon. The combination of phenological and range shifts in one framework presents an elegant way to reconcile previous findings improving our understanding of the drivers of biodiversity loss.

      Weaknesses:

      The introduction and discussion of the article would benefit from a stronger contextualization of recent studies on biological responses to climate change and the underpinning mechanism.

      The presentation of the results (particularly in figures) should be improved to address the integrative character of the work and help readers extract the main results. While the writing of the article is generally good, particularly the captions and results contain many inconsistencies and lack important detail. With the multitude of the relationships that were tested (the influence of traits) the article needs more coherence.

    1. eLife Assessment

      This fundamental work by Mäkelä et al. presents compelling experimental evidence supported by a theoretical model that the amount of chromosomal DNA can become limiting for the total rate of mRNA transcription and consequently protein production in the model bacterium Escherichia coli. The work is based on a mutant that allows inhibition of DNA replication while following growth at the single-cell level due to cell filamentation. The work significantly advances our understanding of growth and of the central dogma, and will be of considerable interest within both systems biology and microbial physiology.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Mäkelä et al. presents compelling experimental evidence that the amount of chromosomal DNA can become limiting for the total rate of mRNA transcription and consequently protein production in the model bacterium Escherichia coli. Specifically, the authors demonstrate that upon inhibition of DNA replication the rate of RNA transcription and the single-cell growth rate continuously decrease, the latter in direct proportion to the concentration of active ribosomes, as measured indirectly by single-particle tracking. The decrease of ribosomal activity with filamentation is likely caused by a decrease of the concentration of mRNAs, as suggested by an observed plateau of the total number of active RNA polymerases. These observations are compatible with the hypothesis that DNA limits the total rate of transcription and thus, indirectly, translation.

      The authors also demonstrate that the decrease of RNAp activity is independent of two candidate stress response pathways, the SOS stress response and the stringent response, as well as an anti-sigma factor previously implicated in variations of RNAp activity upon variations of nutrient sources.

      Remarkably, the reduction of growth rate is observed soon after the inhibition of DNA replication, suggesting that the amount of DNA in wild-type cells is tuned to provide just as much substrate for RNA polymerase as needed to saturate most ribosomes with mRNAs. While previous studies of bacterial growth have most often focused on ribosomes and metabolic proteins, this study provides important evidence that chromosomal DNA has a previously underestimated important and potentially rate-limiting role for growth.

      Strengths:

      This article links the growth of single cells to the amount of DNA, the number of active ribosomes and to the number of RNA polymerases, combining quantitative experiments with theory. The correlations observed during depletion of DNA, notably in M9gluCAA medium, are compelling and point towards a limiting role of DNA for transcription and subsequently for protein production soon after reduction of the amount of DNA in the cell. The article also contains a theoretical model of transcription-translation that contains a Michaelis-Menten type dependency of transcription on DNA availability and is fit to the data.

      At a technical level, single-cell growth experiments and single-particle tracking experiments are well described, suggesting that different diffusive states of molecules represent different states of RNAp/ribosome activities, which reflect the reduction of growth.

      Apart from correlations in DNA-deplete cells, the article also investigates the role of candidate stress response pathways for reduced transcription, demonstrating that neither the SOS nor the stringent response are responsible for the reduced rate of growth. Equally, the anti-sigma factor Rsd recently described for its role in controlling RNA polymerase activity in nutrient-poor growth media, seems also not involved according to mass-spec data. While other (unknown) pathways might still be involved in reducing the number of active RNA polymerases, the proposed hypothesis of the DNA substrate itself being limiting for the total rate of transcription is appealing.

      Finally, the authors confirm the reduction of growth in the distant Caulobacter crescentus, which lacks overlapping rounds of replication and could thus have shown a different dependency on DNA concentration.

      Weaknesses:

      The study has no apparent weaknesses after review.

    3. Reviewer #2 (Public review):

      In this work, the authors uncovered the effects of DNA dilution on E. coli, including a decrease in growth rate and a significant change in proteome composition. The authors demonstrated that the decline in growth rate is due to the reduction of active ribosomes and active RNA polymerases because of the limited DNA copy numbers. They further showed that the change in the DNA-to-volume ratio leads to concentration changes in almost 60% of proteins, and these changes mainly stem from the change in the mRNA levels.

      Comments on revised version:

      The authors have satisfyingly answered all of our questions.

    4. Reviewer #3 (Public review):

      Mäkelä et al. here investigate genome concentration as a limiting factor on growth. Previous work has identified key roles for transcription (RNA polymerase) and translation (ribosomes) as limiting factors on growth, which enable an exponential increase in cell mass. While a potential limiting role of genome concentration under certain conditions has been explored theoretically, Mäkelä et al. here present direct evidence that when replication is inhibited, genome concentration emerges as a limiting factor.

      A major strength of this paper is the diligent and compelling combination of experiment and modeling used to address this core question. The use of origin- and ftsZ-targeted CRISPRi is a very nice approach that enables dissection of the specific effects of limiting genome dosage in the context of a growing cytoplasm. While it might be expected that genome concentration eventually becomes a limiting factor, what is surprising and novel here is that this happens very rapidly, with growth transitioning even for cells within the normal length distribution for E. coli. Fundamentally, it demonstrates the fine balance of bacterial physiology, where the concentration of the genome itself (at least under rapid growth conditions) is no higher than it needs to be. A further surprising finding of this study is that susceptibility to this genome-limiting effect is felt differently by different genes, with unstable transcripts more affected and rRNA and many essential genes being more robust to it.

      It should be noted that the authors do not identify a "smoking gun" - a gene or small number of genes that mediate the effects of genome concentration-dependent growth limitation. However, what they do achieve is to develop plausible criteria for identifying such a gene - through investigating essential genes that decrease in their abundance more rapidly than others.

      Overall, this study provides a fundamental contribution to bacterial physiology by illuminating the relationship between DNA, mRNA, and protein in determining growth rate. While coarse-grained, the work invites exciting questions about how the composition of major cellular components is fine-tuned to a cell's needs and which specific gene products mediate this connection. The work also suggests the presence of buffering mechanisms that allow essential proteins such as RNA polymerase to be robust to fluctuations in genome concentration, which is an exciting area for future exploration. This work has implications not only for biotechnology, as the authors discuss, but potentially also for our understanding of how DNA-targeted antibiotics limit bacterial growth.

      Comments on revised version:

      Nothing left to add - the authors did a fantastic job addressing my points. In some ways doing so opened up even more interesting questions, but I happily accept that those are best left to future investigations.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The manuscript by Mäkelä et al. presents compelling experimental evidence that the amount of chromosomal DNA can become limiting for the total rate of mRNA transcription and consequently protein production in the model bacterium Escherichia coli. Specifically, the authors demonstrate that upon inhibition of DNA replication the single-cell growth rate continuously decreases, in direct proportion to the concentration of active ribosomes, as measured indirectly by single-particle tracking. The decrease of ribosomal activity with filamentation, in turn, is likely caused by a decrease of the concentration of mRNAs, as suggested by an observed plateau of the total number of active RNA polymerases. These observations are compatible with the hypothesis that DNA limits the total rate of transcription and thus translation. The authors also demonstrate that the decrease of RNAp activity is independent of two candidate stress response pathways, the SOS stress response and the stringent response, as well as an anti-sigma factor previously implicated in variations of RNAp activity upon variations of nutrient sources.

      Remarkably, the reduction of growth rate is observed soon after the inhibition of DNA replication, suggesting that the amount of DNA in wild-type cells is tuned to provide just as much substrate for RNA polymerase as needed to saturate most ribosomes with mRNAs. While previous studies of bacterial growth have most often focused on ribosomes and metabolic proteins, this study provides important evidence that chromosomal DNA has a previously underestimated important and potentially rate-limiting role for growth. 

      Thank you for the excellent summary of our work.

      Strengths: 

      This article links the growth of single cells to the amount of DNA, the number of active ribosomes and to the number of RNA polymerases, combining quantitative experiments with theory. The correlations observed during depletion of DNA, notably in M9gluCAA medium, are compelling and point towards a limiting role of DNA for transcription and subsequently for protein production soon after reduction of the amount of DNA in the cell. The article also contains a theoretical model of transcription-translation that contains a Michaelis-Menten type dependency of transcription on DNA availability and is fit to the data. While the model fits well with the continuous reduction of relative growth rate in rich medium (M9gluCAA), the behavior in minimal media without casamino acids is a bit less clear (see comments below). 

      At a technical level, single-cell growth experiments and single-particle tracking experiments are well described, suggesting that different diffusive states of molecules represent different states of RNAp/ribosome activities, which reflect the reduction of growth. However, I still have a few points about the interpretation of the data and the measured fractions of active ribosomes (see below). 

      Apart from correlations in DNA-deplete cells, the article also investigates the role of candidate stress response pathways for reduced transcription, demonstrating that neither the SOS nor the stringent response are responsible for the reduced rate of growth. Equally, the anti-sigma factor Rsd recently described for its role in controlling RNA polymerase activity in nutrient-poor growth media, seems also not involved according to mass-spec data. While other (unknown) pathways might still be involved in reducing the number of active RNA polymerases, the proposed hypothesis of the DNA substrate itself being limiting for the total rate of transcription is appealing. 

      Finally, the authors confirm the reduction of growth in the distant Caulobacter crescentus, which lacks overlapping rounds of replication and could thus have shown a different dependency on DNA concentration. 

      Weaknesses: 

      There are a range of points that should be clarified or addressed, either by additional experiments/analyses or by explanations or clear disclaimers. 

      First, the continuous reduction of growth rate upon arrest of DNA replication initiation observed in rich growth medium (M9gluCAA) is not equally observed in poor media. Instead, the relative growth rate is immediately/quickly reduced by about 10-20% and then maintained for long times, as if the arrest of replication initiation had an immediate effect but would then not lead to saturation of the DNA substrate. In particular, the long plateau of a constant relative growth rate in M9ala is difficult to reconcile with the model fit in Fig 4S2. Is it possible that DNA is not limiting in poor media (at least not for the cell sizes studied here) while replication arrest still elicits a reduction of growth rate in a different way? Might this have something to do with the naturally much higher oscillations of DNA concentration in minimal medium?

      The reviewer is correct that there are interesting differences between nutrient-rich and -poor conditions. They were originally noted in the discussion, but we understand how our original presentation made it confusing. We reorganized the text and figures to better explain our results and interpretations. In the revised manuscript, the data related to the poor media are now presented separately (new Figure 6) from the data related to the rich medium (Figures 1-3). The total RNAP activity (abundance x active fraction) is significantly reduced in poor media (Figure 6A-B) similarly to rich medium (Figure 3H). Thus, DNA is limiting for transcription across conditions. However, the total ribosome activity in poor media (Figure 6C-D) and thus the growth rate (Figure 6E-F) was less affected in comparison to rich media (Figure 2H and 1C). Our interpretation of these results is that while DNA is limiting for transcription in all tested nutrient conditions (as shown by the total active RNAP data), post-transcriptional buffering activities compensate for the reduction in transcription in poor media, thereby maintaining a better scaling of growth rates under DNA limitation.

      The authors argue that DNA becomes limiting in the range of physiological cell sizes, in particular for M9glCAA (Fig. 1BC). It would be helpful to know by how much (fold-change) the DNA concentration is reduced below wild-type (or multi-N) levels at t=0 in Fig 1B and how DNA concentration decays with time or cell area, to get a sense by how many-fold DNA is essentially 'overexpressed/overprovided' in wild-type cells. 

      We now provide crude estimates in the Discussion section. The revised text reads: “Crude estimations suggest that ≤ 40% DNA dilution is sufficient to negatively affect transcription (total RNAP activity) in M9glyCAAT, whereas the same effect was observed after less than 10% dilution in nutrient-poor media (M9gly or M9ala) (see Materials and Methods).” We obtained these numbers based on calculations and estimates described in the Materials and Methods section and Appendix 1 (Appendix 1 – Table 1).

      Fig. 2: The distribution of diffusion coefficients of RpsB is fit to Gaussians on the log scale. Is this based on a model or on previous work or simply an empirical fit to the data? An exact analytical model for the distribution of diffusion constants can be found in the tool anaDDA by Vink, ..., Hohlbein Biophys J 2020. Alternatively, distributions of displacements are expressed analytically in other tools (e.g., in SpotOn). 

      We use an empirical fit of a three-state Gaussian mixture model (GMM) to the data and extract the fractions of molecules in each state. This avoids making too many assumptions on the underlying processes, e.g. a Markovian system with Brownian diffusion. The model in anaDDA (Vink et al.) is currently limited to two-transitioning states with a maximal step number of 8 steps per track for a computationally efficient solution (longer tracks are truncated). Using a short subset of the trajectories is less accurate than using the entire trajectory and because of this, we consider full tracks with at least 9 displacements. Meanwhile, Spot-On supports a three-state model but it is still based on a semi-analytical model with a pre-calculated library of parameters created by fitting of simulated data. Neither of these models considers the effect of cell confinement, which plays a major role in single-molecule diffusion in small-sized cells such as bacteria. For these reasons, we opted to use an empirical fit to the data. We note that the fractions of active ribosomes in WT cells, which we extracted from these diffusion measurements, are consistent with the range of estimates obtained by others using similar or different approaches (Forchhammer and Lindahl 1971; Mohapatra and Weisshaar, 2018; Sanamrad et al., 2014).
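      For readers unfamiliar with this kind of empirical fit, the following is a minimal sketch of a three-component Gaussian mixture fitted by expectation-maximisation to log-scale diffusion coefficients. It is not the authors' code: the function names, the quantile-based initialisation, and the synthetic test data are assumptions made purely for illustration.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_gmm_1d(values, k=3, n_iter=100, min_sigma=1e-3):
    """Fit a 1-D Gaussian mixture by EM.

    `values` are log10 diffusion coefficients, one per track. Returns a list
    of (weight, mean, sigma) tuples sorted by mean; the weights are the
    estimated fractions of molecules in each diffusive state.
    """
    xs = sorted(values)
    n = len(xs)
    if n < k:
        raise ValueError("need at least k data points")
    # Initialise means at evenly spaced quantiles (an assumption of this sketch).
    mus = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    grand_mean = sum(xs) / n
    spread = math.sqrt(sum((x - grand_mean) ** 2 for x in xs) / n)
    sigmas = [max(spread / k, min_sigma)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: update weights, means, and standard deviations.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            sigmas[j] = max(math.sqrt(var), min_sigma)
    return sorted(zip(weights, mus, sigmas), key=lambda c: c[1])


def simulate_log_d(counts, means, sigma, seed=0):
    """Hypothetical synthetic data: `counts[i]` draws around `means[i]`."""
    rng = random.Random(seed)
    return [rng.gauss(m, sigma) for c, m in zip(counts, means) for _ in range(c)]
```

      In such a fit, the component weights would be read as the fractions of molecules in the slow, intermediate, and fast states; which state corresponds to active ribosomes is a biological interpretation that the fit itself does not provide.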

      The estimated fraction of active ribosomes in wild-type cells shows a very strong reduction with decreasing growth rate (down from 75% to 30%), twice as strong as measured in bulk experiments (Dai et al Nat Microbiology 2016; decrease from 90% to 60% for the same growth rate range) and probably incompatible with measurements of growth rate, ribosome concentrations, and almost constant translation elongation rate in this regime of growth rates. Might the different diffusive fractions of RpsB not represent active/inactive ribosomes? See also the problem of quantification above. The authors should explain and compare their results to previous work. 

      We agree that our measured range is somewhat larger than the estimated range from Dai et al, 2016. However, they use different media, strains, and growth conditions. We also note that Dai et al did not make actual measurements of the active ribosome fraction. Instead, they calculate the “active ribosome equivalent” based on a model that includes growth rate, protein synthesis rate, RNA/protein abundance, and the total number of amino acids in all proteins in the cell. Importantly, our measurements show the same overall trend (a ~30% decrease) as Dai et al, 2016. Furthermore, our results are within the range of previous experimental estimates from ribosome profiling (Forchhammer and Lindhal 1971) or single-ribosome tracking (Mohapatra and Weisshaar, 2018; Sanamrad et al., 2014). We clarified this point in the revised manuscript. 

      To measure the reduction of mRNA transcripts in the cell, the authors rely on the fluorescent dye SYTO RNAselect. They argue that 70% of the dye signal represents mRNA. The argument is based on the previously observed reduction of the total signal by 70% upon treatment with rifampicin, an RNA polymerase inhibitor (Bakshi et al 2014). The idea here is presumably that mRNA should undergo rapid degradation upon rif treatment while rRNA or tRNA are stable. However, work from Hamouche et al. RNA (2021) 27:946 demonstrates that rifampicin treatment also leads to a rapid degradation of rRNA. Furthermore, the timescale of fluorescent-signal decay in the paper by Bakshi et al. (half life about 10min) is not compatible with the previously reported rapid decay of mRNA (24min) but rather compatible with the slower, still somewhat rapid, decay of rRNA reported by Hamouche et al.. A bulk method to measure total mRNA as in the cited Balakrishnan et al. (Science 2022) would thus be a preferred method to quantify mRNA. Alternatively, the authors could also test whether the mass contribution of total RNA remains constant, which would suggest that rRNA decay does not contribute to signal loss. However, since rRNA dominates total RNA, this measurement requires high accuracy. The authors might thus tone down their conclusions on mRNA concentration changes while still highlighting the compelling data on RNAp diffusion. 

      Thank you for bringing the Hamouche et al 2021 paper to our attention. To address this potential issue, we have performed fluorescence in situ hybridization (FISH) microscopy using a 16S rRNA probe (EUB338) to quantify rRNA concentration in 1N cells. We found that the rRNA signal only slightly decreases with cell size (i.e., genome dilution) compared to the RNASelect signal (e.g., a ~5% decrease for rRNA signal vs. 50% for RNASelect for a cell size range of 4 to 10 µm²). We have revised the text and added a figure to include the new rRNA FISH data (Figure 4). In addition, as a control, we validated our rRNA FISH method by comparing the intracellular concentration of 16S rRNA in poor vs. rich media (new Figure 4 – Figure supplement 3).

      The proteomics experiments are a great addition to the single-cell studies, and the correlations between distance from ori and protein abundance is compelling. However, I was missing a different test, the authors might have already done but not put in the manuscript: If DNA is indeed limiting the initiation of transcription, genes that are already highly transcribed in non-perturbed conditions might saturate fastest upon replication inhibition, while genes rarely transcribed should have no problem to accommodate additional RNA polymerases. One might thus want to test, whether the (unperturbed) transcription initiation rate is a predictor of changes in protein composition. This is just a suggestion the authors may also ignore, but since it is an easy analysis, I chose to mention it here. 

      We did not find any correlation when we examined the potential relation between RNA slopes and mRNA abundance (from our first CRISPRi oriC time point) or the transcription initiation rate (from Balakrishnan et al., 2022, PMID: 36480614) across genes. These new plots are presented in Figure 7 – Figure supplement 2B. In contrast, we found a small but significant correlation between RNA slopes and mRNA decay rates (from Balakrishnan et al., 2022, PMID: 36480614), specifically for genes with short mRNA lifetimes (new Figure 7F). This effect is consistent with our model prediction (Figure 5 – Figure supplement 2). 

      Related to the proteomics, in l. 380 the authors write that the reduced expression close to the ori might reflect a gene-dosage compensatory mechanism. I don't understand this argument. Can the authors add a sentence to explain their hypothesis? 

      We apologize for the confusion. While performing additional analyses for the revisions, we realized that while the proteins encoded by genes close to oriC tend to display subscaling behavior, this is not true at the mRNA level (new Figure 7 – Figure supplement 3B). In light of this result, we no longer have a hypothesis for the observed negative correlation at the protein level (originally Figure 5D, now Figure 7 – Figure supplement 3A). The text was revised accordingly.  

      In Fig. 1E the authors show evidence that growth rate increases with cell length/area. While this is not a main point of the paper it might be cited by others in the future. There are two possible artifacts that could influence this experiment: a) segmentation: an overestimation of the physical length of the cell based on phase-contrast images (e.g., 200 nm would cause a 10% error in the relative rate of 2 µm cells, but not of longer cells). b) time-dependent changes of growth rate, e.g., due to change from liquid to solid or other perturbations. To test for the latter, one could measure growth rate as a function of time, restricting the analysis to short or long cells, or measuring growth rate for short/long cells at selected time points. For the former, I recommend comparison of phase-contrast segmentation with FM4-64-stained cell boundaries.

      As the reviewer notes, the small increase in relative growth was just a minor observation that does not affect our story whether it is biologically meaningful or the result of a technical artefact. But we agree with the reviewer that others might cite it in future works and thus should be interpreted with caution.

      An artefact associated with time-dependent changes (e.g. changing from liquid cultures to more solid agarose pads) is unlikely for two reasons. 1. We show that varying the time that cells spend on agarose pads relative to liquid cultures does not affect the cell size-dependent growth rate results (Figure 1 – supplement 5A). 2. We show that the growth rate is stable from the beginning of the time-lapse with no transient effects upon cell placement on agarose pads for imaging (Figure 1 – supplement 1). These results were described in the Methods section where they could easily be missed. We revised the text to discuss these controls more prominently in the Results section.

      As for cell segmentation, we have run simulations and agree with the reviewer that a small overestimation of cell area (which is possible with any cell segmentation method, including ours) could lead to a small increase in relative growth with increasing cell area (new Figure 1 – Figure supplement 3). Since the finding is not important to our story, we simply revised the text and added the simulation results to alert the readers to the possibility that the observation may be due to a small cell segmentation bias.
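
The segmentation argument can be made concrete with a small calculation. This is only a sketch under simple assumptions (true size grows exponentially at a constant rate, and segmentation adds a constant offset to the measured size); it is not the simulation reported in the new supplementary figure, and the function and parameter names are hypothetical.

```python
def apparent_relative_growth(true_size, rate, size_bias):
    """Relative growth rate inferred from a biased size measurement.

    Assumptions: the true size S grows as dS/dt = rate * S, and the
    measured size is S + size_bias (a constant overestimate).
    The inferred relative rate is (dS/dt) / (S + size_bias).
    """
    measured_size = true_size + size_bias
    return rate * true_size / measured_size


rate = 1.0   # true relative growth rate (arbitrary units)
bias = 0.2   # constant overestimate, e.g. 0.2 µm on cell length (assumed)
for size in (2.0, 4.0, 8.0):
    ratio = apparent_relative_growth(size, rate, bias) / rate
    print(f"true size {size:.1f}: apparent/true relative rate = {ratio:.3f}")
```

With a constant offset, the apparent relative rate is underestimated most for the smallest cells and approaches the true rate for larger ones, which would look like a growth rate that increases with cell size.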

      Reviewer #2 (Public Review): 

      In this work, the authors uncovered the effects of DNA dilution on E. coli, including a decrease in growth rate and a significant change in proteome composition. The authors demonstrated that the decline in growth rate is due to the reduction of active ribosomes and active RNA polymerases because of the limited DNA copy numbers. They further showed that the change in the DNA-to-volume ratio leads to concentration changes in almost 60% of proteins, and these changes mainly stem from the change in the mRNA levels. 

      Thank you for the support and accurate summary!

      Reviewer #3 (Public Review): 

      Summary: 

      Mäkelä et al. here investigate genome concentration as a limiting factor on growth.

      Previous work has identified key roles for transcription (RNA polymerase) and translation (ribosomes) as limiting factors on growth, which enable an exponential increase in cell mass. While a potential limiting role of genome concentration under certain conditions has been explored theoretically, Mäkelä et al. here present direct evidence that when replication is inhibited, genome concentration emerges as a limiting factor. 

      Strengths: 

      A major strength of this paper is the diligent and compelling combination of experiment and modeling used to address this core question. The use of origin- and ftsZ-targeted CRISPRi is a very nice approach that enables dissection of the specific effects of limiting genome dosage in the context of a growing cytoplasm. While it might be expected that genome concentration eventually becomes a limiting factor, what is surprising and novel here is that this happens very rapidly, with growth transitioning even for cells within the normal length distribution for E. coli. Fundamentally, it demonstrates the fine balance of bacterial physiology, where the concentration of the genome itself (at least under rapid growth conditions) is no higher than it needs to be. 

      Thank you!

      Weaknesses: 

      One limitation of the study is that genome concentration is largely treated as a single commodity. While this facilitates their modeling approach, one would expect that the growth phenotypes observed arise due to copy number limitation in a relatively small number of rate-limiting genes. The authors do report shifts in the composition of both the proteome and the transcriptome in response to replication inhibition, but while they report a positional effect of distance from the replication origin (reflecting loss of high-copy, origin-proximal genes), other factors shaping compositional shifts and their functional effects on growth are not extensively explored. This is particularly true for ribosomal RNA itself, which the authors assume to grow proportionately with protein. More generally, understanding which genes exert the greatest copy number-dependent influence on growth may aid both efforts to enhance (biotechnology) and inhibit (infection) bacterial growth. 

      We agree but feel that identifying the specific limiting genes is beyond the scope of the study. This said, we carried out additional experiments and analyses to address the reviewer’s comment and identify potential contributing factors and limiting gene candidates. First, we examined the intracellular concentration of 16S ribosomal RNA (rRNA) by rRNA FISH microscopy and found that it decays much slower than the bulk of mRNAs as measured using RNASelect staining (new Figure 4 and Figure 4 – Figure supplements 1 and 3). We found that the rRNA signal is far more stable in 1N cells than the RNASelect signal, the former decreasing by only ~5% versus ~50% for the latter in response to the same range of genome dilution (Figure 4C). Second, we carried out new correlation analyses between our proteomic/transcriptomic datasets and published genome-wide datasets that report various variables under unperturbed conditions (e.g., mRNA abundance, mRNA degradation rates, fitness cost, transcription initiation rates, essentiality for viability); see new Figure 7E-G and Figure 7 – Figure supplement 2. In the process, we found that genes essential for viability tend, on average, to display superscaling behavior (Figure 7G). This suggests that cells have evolved mechanisms that prioritize expression of essential genes over nonessential ones during DNA-limited growth. Furthermore, this analysis identified a small number of essential genes that display strong negative RNA slopes (Figure 7C, Datasets 1 and 2), indicating that the concentration of their mRNA decreases rapidly relative to the rest of the transcriptome upon genome dilution. These essential genes with strong subscaling behavior are candidates for being growth-limiting. 

      The text and figures were revised to include these new results.

      Overall, this study provides a fundamental contribution to bacterial physiology by illuminating the relationship between DNA, mRNA, and protein in determining growth rate. While coarse-grained, the work invites exciting questions about how the composition of major cellular components is fine-tuned to a cell's needs and which specific gene products mediate this connection. This work has implications not only for biotechnology, as the authors discuss, but potentially also for our understanding of how DNA-targeted antibiotics limit bacterial growth. 

      Thank you!

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Below are my comments. 

      (1) I noticed that a paper by Li et al. on biorxiv has found similar results as this work ("Scaling between DNA and cell size governs bacterial growth homeostasis and resource allocation," https://doi.org/10.1101/2021.11.12.468234), including the linear growth of E. coli when the DNA concentration is low. This relevant reference was not cited or discussed in the current manuscript. 

      We agree that authors should cite and discuss relevant peer-reviewed literature. But broadly speaking, we feel that extending this responsibility to all preprints (and by extension any online material) that have not been reviewed is a bit dangerous. It would effectively legitimize unreviewed claims and risk their propagation in future publications. We think that while imperfect, the peer-reviewing process still plays an important role. 

      Regarding the specific 2021 preprint that the reviewer pointed out, we think that the presented growth rate data are quite noisy and that the experiments lack a critical control (multi-N cells), making interpretation difficult. Their report that plasmid-borne expression is enhanced when DNA is severely diluted is certainly interesting and makes sense in light of our measurements that the activities, but not the concentrations, of RNA polymerases and ribosomes are reduced in 1N cells. However, we do not know why this preprint has not yet been published since 2021. There could be many possible reasons for this. Therefore, we feel that it is safer to limit our discussion to peer-reviewed literature.

      (2) I think the kinetic Model B in the Appendix has been studied in previous works, such as Klumpp & Hwa, PNAS 2008, https://doi.org/10.1073/pnas.0804953105

      Indeed, Klumpp & Hwa 2008 modeled the kinetics of RNA polymerase and promoter association prior to our study. But there is a difference between their model and ours. Their model is based on Michaelis-Menten-type (MM) functions in which the RNAP is analogous to the “substrate” and the promoter to the “enzyme” in the MM equation. In contrast, our model uses functions based on the law of mass action (instead of MM-type functions). We have revised the text, included the Klumpp & Hwa 2008 reference, and revised the Materials & Methods section to clarify these points. 

      (3) On lines 284-285, if I understand correctly, the fractions of active RNAPs and active ribosomes are relative to the total protein number. It would be helpful if the authors could mention this explicitly to avoid confusion. 

      The fractions of active RNAPs and active ribosomes are expressed as the percentage of the total RNAPs and ribosomes. We have revised the text to be more explicit. Thank you.

      (4) On line 835, I am not sure what the bulk transcription/translation rate means. I guess it is the maximum transcription/translation rate if all RNAPs/ribosomes are working according to Eq. (1,2). It would be helpful if the authors could explain the meaning of r_1 and r_2 more explicitly. 

      Our apology for the lack of clarity. We have added the following equations:

      (5) Regarding the changes in protein concentrations due to genome dilution, a recent theoretical paper showed that it may come from the heterogeneity in promoter strengths (Wang & Lin, Nature Communications 2021). 

      In the Wang and Lin model, the heterogeneity in promoter strength predicts that the “mRNA production rate equivalent”, which is the mRNA abundance multiplied by the mRNA decay rate, will correlate with the RNA slopes. However, we found these two variables to be uncorrelated (see below): the Spearman correlation coefficient ρ was 0.02 with a p-value of 0.24, indicating non-significance (NS).

      Author response image 1.

      The mRNA production rate equivalent (mRNA abundance at the first time point after CRISPRi oriC induction multiplied by the mRNA degradation rate measured by Balakrishnan et al., 2022, PMID: 36480614, expressed in transcript counts per minute) does not correlate (Spearman correlation’s p-value = 0.24) with the RNA slope in 1N-rich cells.  Data from 2570 genes are shown (grey markers, Gaussian kernel density estimation - KDE), and their binned statistics (mean +/- SEM, ~280 genes per bin, orange markers). 

      In addition, we found no significant correlation between RNA slopes and mRNA abundance or transcription initiation rate. These plots are now included in Figure 7E and Figure 7 –Figure supplement 2B. Thus, the promoter strength does not appear to be a predictor of the RNA (and protein) scaling behavior under DNA limitation. 
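
For readers who want to reproduce this kind of check, the rank correlation can be sketched as follows. The code assumes that the “mRNA production rate equivalent” is computed per gene as abundance multiplied by decay rate, as stated above; the helper names and the toy numbers are hypothetical, and no p-value is computed here.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based positions i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy per-gene values (made up for illustration only).
abundance = [120.0, 15.0, 300.0, 42.0, 8.0]   # transcript counts
decay_rate = [0.30, 0.10, 0.25, 0.50, 0.20]   # per minute
rna_slope = [0.05, -0.20, 0.10, -0.40, 0.00]

production_equivalent = [a * d for a, d in zip(abundance, decay_rate)]
print(f"Spearman rho = {spearman_rho(production_equivalent, rna_slope):.2f}")
```

Because Spearman's ρ depends only on ranks, it is insensitive to the large dynamic range of transcript abundances.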

      Reviewer #3 (Recommendations For The Authors): 

      One general area that could be developed further is analysis of changes in the proteome/transcriptome composition, given that there may be specific clues here as to the phenotypic effects of genome concentration limitation. Specifically: 

      • In Figure 5D, the authors demonstrate an effect of origin distance on sensitivity to replication inhibition, presumably as a copy number effect. However, the authors note that the effect was only slight and postulated a compensatory mechanism. Due to the stability of proteins, one should expect relatively small effects - even if synthesis of a protein stopped completely, its concentration would only decrease twofold with a doubling of cell area (slope = -1, if I'm interpreting things correctly). It would be helpful to display the same information shown in Figure 5D at the mRNA level, since I would anticipate that higher mRNA turnover rates mean that effects on transcription rate should be felt more rapidly. 

      We thank the reviewer for this suggestion. To our surprise, we found that there is no correlation between gene location relative to the origin and RNA slope across genes. This suggests that the observed correlation between gene location and protein slopes does not occur at the mRNA level. Given that we do not have an explanation for the underlying mechanism, we decided to present these data (the original data in Figure 5D and the new data for the RNA slope) in a supplementary figure (Figure 7 – Figure supplement 3).

      • Related to this, did the authors see any other general trends? For example, do highly expressed genes hit saturation faster, making them more sensitive to limited genome concentration? 

      We found that the RNA slopes do not correlate with mRNA abundance or transcription initiation rates. However, they do correlate with mRNA decay. That is, short-lived mRNAs tend to have negative RNA slopes. The new analyses have been added as Figure 7E-F and Figure 7 – Figure supplement 2B. The text has been revised to incorporate this information. 

      • Presumably loss of growth is primarily driven by a subset of genes whose copy number becomes limiting. Previously, it has been reported that there is a wide variety among "essential" genes in their expression-fitness relationship - i.e. how much of a reduction in expression you need before growth is reduced (e.g. PMID 33080209). It would be interesting to explore the shifts in proteome/transcriptome composition to see whether any genes particularly affected by restricted genome concentration are also especially sensitive to reduced expression - overlap in these datasets may reveal which genes drive the loss of growth. 

      This is a very interesting idea – thank you! We did not find a correlation between the protein/RNA slope and the relative gene fitness as previously calculated (PMID 33080209), as shown below.

      Author response image 2.

      The relative fitness of each gene (data by Hawkins et al., 2020, PMID: 33080209, median fitness from the highest sgRNA activity bin) plotted versus the gene-specific RNA and protein slopes that we measured in 1N-rich cells after CRISPRi oriC induction. More than 260 essential genes are shown (262 RNA slopes and 270 protein slopes, grey markers), and their binned statistics (mean +/- SEM, 43-45 essential genes per bin, orange markers). The Spearman correlations (ρ) with p-values above 10⁻³ are considered not significant (NS). In our analyses, we only considered correlations significant if they have a Spearman correlation p-value below 10⁻¹⁰.

      However, while doing this suggested analysis, we noticed that the essential genes that were included in the aforementioned study have RNA slopes above zero on average. This led us to compare the RNA slope distributions of essential genes relative to all genes (now included in Figure 7G). We found that they tend to display superscaling behavior (positive RNA slopes), suggesting the existence of regulatory mechanisms that prioritize the expression of essential genes over less important ones when genome concentration becomes limiting for growth. The text has been revised to include this new information.

      Other suggestions: 

      • In Figure 3 the authors report that total RNAP concentration increases with increasing cytoplasmic volume. This is in itself an interesting finding as it may imply a compensatory mechanism - can the authors offer an explanation for this? 

      We do not have a straightforward explanation. But we agree that it is very interesting and should be investigated in future studies given that this superscaling behavior is common among essential genes. 

      • The explanation of the modeling within the main text could be improved. Specifically, equations 1 and 2, as well as a discussion of models A and B (lines 290-301), do not explicitly relate DNA concentration to downstream effects. The authors provide the key information in Appendix 1, but for a general reader, it would be helpful to provide some intuition within the main text about how genome concentration influences transcription rate (i.e. via 𝛼RNAP).  

      We apologize for the lack of clarity. We have added information that hopefully improves clarity.

    1. eLife Assessment

      This valuable study uses dynamic metabolic models to compare perturbation responses in a bacterial system, analyzing whether they return to their steady state or amplify beyond the initial perturbation. The evidence supporting the emergent properties of perturbed metabolic systems to network topology and sensitivity to specific metabolites is solid, although the authors do not explain the origin of some significant inconsistencies between models.

    2. Reviewer #1 (Public review):

      (1a) Summary:

      The author studied metabolic networks for central metabolism, focusing on how system trajectories return to their steady state. To quantify the response, systematic perturbations were performed in simulation and the maximal destabilization away from steady state (compared with the initial perturbation distance) was characterized. The author analyzed the perturbation response and found that sparse networks and networks with more cofactors are more "stable", in the sense that the perturbed trajectories show smaller deviations along the path back to the steady state.

      (1b) Strengths and major contributions:

      The author compared three metabolic models and performed systematic perturbation analysis in simulation. This is the first work to characterize how perturbed trajectories deviate from equilibrium in large biochemical systems, and it illustrates interesting findings about the difference between sparse biological systems and randomly simulated reaction networks.

      (1c) Weaknesses:

      There are two main weaknesses in this study:

      First, the metabolic network in this study is incomplete. For example, amino acid synthesis and lipid synthesis are important for biomass and growth, but they are not included in the three models used in this study. NADH and NADPH are as important as ATP/ADP/AMP, but they are not included in the models. In the future, a more comprehensive metabolic and biosynthesis model is required.

      Second, this work does not provide a mathematical explanation of the perturbation response χ. Since the perturbation analysis is performed close to the steady state (or at least belongs to the attractor of a single steady state), local linear analysis would provide useful information. By complementing it with other analyses from dynamical systems (described below), we can gain more logical insights about the perturbation response.

      (1d) Discussion and impact for the field:

      Metabolic perturbation is an important topic in cell biology and has important clinical implication in pharmacodynamics. The computational analysis in this study provides an initiative for future quantitative analysis on metabolism and homeostasis.

      Comments on revised version:

      The revised version of this manuscript makes some clarifications, but I think the analysis of response coefficients is still numerical and model-specific, and remains unclear from a dynamical-systems point of view.

    3. Reviewer #2 (Public review):

      The authors have conducted a valuable comparative analysis of perturbation responses in three nonlinear kinetic models of E. coli central carbon metabolism found in the literature. They aimed to uncover commonalities and emergent properties in the perturbation responses of bacterial metabolism. They discovered that perturbations in the initial concentrations of specific metabolites, such as adenylate cofactors and pyruvate, significantly affect the maximal deviation of the responses from steady-state values. Furthermore, they explored whether the network connectivity (sparse versus dense connections) influences these perturbation responses. The manuscript is reasonably well written.

      Comments on revised version:

      The authors have addressed my concerns to a large extent. However, a few minor issues remain, as listed below:

      (1) The authors identified key metabolites affecting responses to perturbations in two ways: (i) by fixing a metabolite's value and (ii) by performing a sensitivity analysis. It would be helpful for the modeling community to understand better the differences and similarities in the obtained results. Do both methods identify substrate-level regulators? Is freezing a metabolite's dynamics dramatically changing the metabolic response (and if yes, which ones are so different in the two cases)? Does the scope of the network affect these differences and similarities?

      (2) Regarding the issues the authors encountered when performing the sensitivity analysis, they can be approached in two ways. First, the authors can check the methods for computing conserved moieties nicely explained by Sauro's group (doi:10.1093/bioinformatics/bti800) and compute them for large-scale networks (but beware of metabolites that belong to several conserved pools). Otherwise, the conserved pools of metabolites can be considered as variables in the sensitivity analysis; grouping multiple parameters is a common approach in sensitivity analysis.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Reviews):  

      First, the metabolic network in this study is incomplete. For example, amino acid synthesis and lipid synthesis are important for biomass and growth, but they are not included in the three models used in this study. NADH and NADPH are as important as ATP/ADP/AMP, but they are not included in the models. In the future, a more comprehensive metabolic and biosynthesis model is required.  

      Thank you for the critical comment on the weakness of the present study. We actually tried to study a larger model, that of Turnborg et al (2021), which is a model of JCVI-syn3A, but we gave up on including it in our list of models to study in depth. This is because we noticed that the concentration of ATP in the model can become negative (we confirmed this with one of the authors of the paper). Another "big" kinetic model of metabolism that we could list would be Khodayari et al (2017). However, we could not find other models with which to compare the dynamics of this big model. Therefore, we decided to use the model only for the central carbon metabolism for now. We would like to leave a more extended study for the near future.  

      We would like to mention that NADH and NADPH are included in the Khodayari model and the Boecker model, although NADH and NADPH are lumped together as NADH in the latter model.  

      Second, this work does not provide a mathematical explanation of the perturbation response χ. Since the perturbation analysis is performed close to the steady state (or at least belongs to the attractor of single-steady-state), local linear analysis would provide useful information. By complementing with other analysis in dynamical systems (described below) we can gain more logical insights about perturbation response.  

      We tried a linear stability analysis. However, with the perturbation strength we used here, the linearization of the model is no longer valid, in the sense that the linearized model leads to negative concentrations of the metabolites (xst + Δx < 0 for some metabolites). We have added a scatter plot comparing the response coefficients of trajectories that share the same initial condition but whose dynamics are computed by the original model and the linearized model, respectively (Fig. S1). 

      Since the response coefficient is based on the logarithm of the concentrations, as the metabolite concentrations approach zero, the response coefficient becomes larger. The high response coefficient in the Boecker and Chassagnole model would be explained by this artifact.  The linearized Khodayari model shows either χ~1 or χ = 0 (one or more metabolite concentrations become negative). This could be due to the number of variables in the model. For the response coefficient to have a larger value, the perturbation should be along the eigenvector that leads to oscillatory dynamics with long relaxation time (i.e., the corresponding eigenvalue has a small real part in terms of absolute value and a non-zero imaginary part). However, since the Khodayari model has about 800 variables, if perturbations are along such directions, there is a high probability that one or more metabolite concentrations will become negative.

      We fully agree that if the perturbations on the metabolite concentrations are in the linear regime, the response to the perturbations can be estimated by checking the eigenvalues and eigenvectors. However, we would say that the relationship between the linearized model (and thus the spectrum of eigenvalues) and the original model is unclear in this regime. We remarked on this in Lines 158-160.

      Recommendations for the authors:

      My major suggestion is about understanding the key quantity in this study: the response coefficient χ. When the perturbed state is close to the fixed point, one could adopt local stability analysis and consider the linearized system. For a linear system with one stable fixed point P, we consider the Jacobian matrix M at P. If all eigenvalues of M are real and negative, the perturbed trajectory will return to P with each component varying monotonically. If some eigenvalues have a negative real part and a nonzero imaginary part, then the perturbed trajectory will spiral inward to the fixed point. Depending on the spiral trajectory and the initially perturbed state, some components would transiently deviate further from the fixed point along the spiral trajectory. This explains why the response coefficient χ can be greater than 1.
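
      The spiral argument above can be sketched numerically. The following is a minimal illustration on two hypothetical 2x2 Jacobians (not taken from any of the metabolic models), assuming a diagonalizable matrix. The ratio it computes is a per-component overshoot chosen for illustration; it is not the manuscript's definition of χ.

```python
import numpy as np

def linear_trajectory(M, dx0, times):
    """Deviation dx(t) = exp(M t) dx0, via the eigendecomposition of M.

    Assumes M is diagonalizable (true for the toy matrices below).
    """
    w, V = np.linalg.eig(M)                       # eigenvalues, eigenvectors
    c = np.linalg.solve(V, dx0.astype(complex))   # dx0 expressed in the eigenbasis
    modes = np.exp(np.outer(times, w)) * c        # shape (len(times), n)
    return np.real(modes @ V.T)                   # back to the original coordinates

def max_relative_deviation(M, dx0, times):
    """Largest |dx_j(t)| along the trajectory, relative to |dx_j(0)|."""
    dx = linear_trajectory(M, dx0, times)
    return np.max(np.abs(dx), axis=0) / np.abs(dx0)

# Hypothetical Jacobians: both are stable, only the first has complex eigenvalues.
spiral = np.array([[-0.1, -1.0],
                   [ 1.0, -0.1]])   # eigenvalues -0.1 +/- 1j
node = np.array([[-0.1,  0.0],
                 [ 0.0, -1.0]])     # eigenvalues -0.1 and -1.0

times = np.linspace(0.0, 50.0, 5001)
dx0 = np.array([1.0, 0.1])          # illustrative initial deviation

overshoot_spiral = max_relative_deviation(spiral, dx0, times)
overshoot_node = max_relative_deviation(node, dx0, times)
print("spiral:", overshoot_spiral)  # second component transiently exceeds its initial deviation
print("node:  ", overshoot_node)    # no component exceeds its initial deviation
```

      In this toy case the overshoot comes entirely from rotation in the complex eigen-directions; as the authors note in their reply, such a linear picture need not carry over to strong perturbations.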

      Mathematically, a locally linearized system behaves similarly to the linear system, and the examples in this study can be analyzed in a similar way. Specifically, if a system has many complex eigenvalues, then the perturbed trajectory is more likely to deviate further. The metabolic network models investigated in this work are not extremely large, and hence the authors could analyze the spectrum of the Jacobian matrix at the steady state. Since the steady state is stable, I expect the spectrum to be located in the left half of the complex plane. If the spectrum spreads out away from the real axis, we expect to see more spiral trajectories under perturbation. I think the spectrum analysis will provide a view complementary to the analysis of χ. The authors' major findings, about the network sparsity and cofactors, can also be investigated under the framework of the spectrum analysis.

      Of course, when the nonlinear system is perturbed far away from the fixed point, there are other geometrical properties of the vector field that can cause the response coefficient χ to be greater than 1. This could also be investigated in the future by testing the behavior of small and large perturbations and observing if the systems have signatures of nonlinearity.  

      Since all perturbed states return to the steady state, the eigenvalues of the Jacobi matrix of the system linearized around the steady state lie in the left half of the complex plane (negative real parts). Also, some eigenvalues have non-zero imaginary parts.

      The reason we emphasize the "nonlinear regime" is that the linearization is no longer valid in this regime, i.e. the metabolite concentrations can be negative when we calculate the linearized system. Certainly, there are complex eigenvalues in the Jacobi matrix of any model. However, we would say that there is no clear relationship between the eigenvalues and the response coefficient.      

      Minor suggestions:  

      Line 127: Regarding the source of perturbation, cell division also generates unequal concentration of proteins and metabolites for two daughter cells, and it is an interesting mechanism to create metabolic perturbation. 

      Thank you for the insightful suggestion. We now mention cell division as another source of perturbation (Lines 130-131).

      Line 175: I do not quite understand the statement "fixing each metabolite concentration...", since the metabolite concentration in the ODE simulation would change immediately after this fixing.  

      We meant in the sentence that we fixed the concentration of the selected metabolite as the steady state concentration and set the dx/dt of that metabolite to zero. We have rewritten the sentences to avoid confusion (Lines 180-181).
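
      The clamping procedure described above can be sketched as follows. This is a minimal illustration on a hypothetical two-variable chain, integrated with a plain explicit Euler scheme; the function names `clamp`, `toy_rhs`, and `integrate` are assumptions introduced here and do not come from the manuscript.

```python
import numpy as np

def clamp(rhs, index):
    """Wrap an ODE right-hand side so that variable `index` is held constant."""
    def clamped_rhs(x):
        dxdt = np.array(rhs(x), dtype=float)
        dxdt[index] = 0.0              # dx/dt of the selected metabolite set to zero
        return dxdt
    return clamped_rhs

def toy_rhs(x):
    # Hypothetical chain: source -> x0 -> x1 -> sink, steady state at (1, 1)
    return np.array([1.0 - x[0], x[0] - x[1]])

def integrate(rhs, x0, dt=0.01, steps=2000):
    """Explicit Euler integration (sufficient for this non-stiff toy system)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * rhs(x)
    return x

x_st = np.array([1.0, 1.0])                  # steady state of the toy model
x_init = np.array([x_st[0], 0.5 * x_st[1]])  # x0 kept at steady state, x1 perturbed

x_final = integrate(clamp(toy_rhs, index=0), x_init)
print(x_final)  # x0 never moves; x1 relaxes back toward its steady-state value
```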

      Figure 2: There are a lot of inconsistencies between the three models. Could we learn which model is more reasonable, or the conclusion here is that the cellular response under perturbation is model-specific? The latter explanation may not be quite satisfactory since we expect the overall cellular property should not be sensitive to the model details. 

      Ideally, the overall cellular property should be insensitive to model details. However, the reality is that the behavior of the models (e.g., steady-state properties, relaxation dynamics, etc.) depends on the specific parameter choices, including what regulation is implemented. I think this situation is part of the motivation for the ensemble modeling (by J. Liao and colleagues) that has been developed.

      Detailed responsiveness would be model specific. For example, FBP has a fairly strong effect in the Boecker model, but less so in the Khodayari model, and the opposite effect in the Chassagnole model (Fig. 2). Our question was whether there are common tendencies among kinetic models that tend to show model-specific behavior.  

      Reviewer 2 (Public Review):

      (1) In the study on determining key metabolites affecting responses to perturbations (starting from line 171), the authors fix the values of individual concentrations to their steady-state values and observe the responses. Such a procedure adds artificial constraints to the network because, in the natural responses of cells (and models) to perturbations, it is highly unlikely that metabolites will not evolve in time. By fixing the values of specific metabolites, the authors prohibit the metabolic network from evolving in the most optimal way to compensate for the perturbation. Instead of this procedure, have the authors considered for this task applying techniques from variance-based sensitivity analysis (Sobol, global sensitivity analysis), where they can calculate the first-order sensitivity index and total effect index? Using this technique, the authors would be able to determine the key metabolites while allowing for metabolic responses to perturbations without unnatural constraints. 

      Thank you for the useful suggestion for studying the role of each metabolite in responsiveness. We have computed the total sensitivity index (Homma and Saltelli, 1996) for each metabolite of each model (Fig. S5). The total sensitivity index of ATP is high-ranked in the Khodayari and Chassagnole models, while it is middle-ranked in the Boecker model. We believe that the importance of the adenyl cofactors is also highlighted in terms of the Sobol’ sensitivity analysis (the figure is referred to in Lines 193-195).

      We have encountered a minor difficulty for computing the sensitivity index. For the computation of the sensitivity index, we need to carry out the following Monte Carlo integral, 

      where the superscript (m) is the sample number index. The subscript i represents the ith element of the vector x, and ~i represents the vector x except for the ith element. The tilde stands for resampling.  
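
      A minimal sketch of a Monte Carlo estimate of the total sensitivity index is given below, using the same ingredients as described above: samples indexed by m, and resampling of the ith element while the remaining elements are kept. Note the assumptions: the sketch uses the Jansen squared-difference estimator, which targets the same total-effect index but is not necessarily the formula the authors evaluated, and both the toy function and the uniform sampler are hypothetical stand-ins for a model's response.

```python
import numpy as np

def total_sensitivity(f, sample, n_samples=20000, seed=0):
    """Estimate the total sensitivity index of each input of f.

    f      : maps a 1-D input vector x to a scalar output
    sample : sample(rng, n) returns an (n, d) array of independent inputs
    Uses the Jansen form: S_Ti = E[(f(x) - f(x with x_i resampled))^2] / (2 Var f).
    """
    rng = np.random.default_rng(seed)
    A = sample(rng, n_samples)        # base samples x^(m)
    B = sample(rng, n_samples)        # independent draws used for resampling
    fA = np.array([f(x) for x in A])
    variance = fA.var()
    indices = np.empty(A.shape[1])
    for i in range(A.shape[1]):
        AB = A.copy()
        AB[:, i] = B[:, i]            # resample only element i, keep the rest
        fAB = np.array([f(x) for x in AB])
        indices[i] = 0.5 * np.mean((fA - fAB) ** 2) / variance
    return indices

def toy_response(x):
    # Hypothetical additive response; analytic total indices are 0.2 and 0.8
    return x[0] + 2.0 * x[1]

def uniform_sampler(rng, n):
    # Independent inputs, uniform on [0, 1)
    return rng.random((n, 2))

S_total = total_sensitivity(toy_response, uniform_sampler)
print(S_total)
```

      The column-wise resampling step is where independence of the inputs matters, which is why conserved quantities need special treatment, as discussed next.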

      There are several conserved quantities in each model. For independent resampling, we need to deal with the conserved quantities. For the Boecker and Chassagnole models, we picked a single metabolite from each conservation law and solved its concentration algebraically to make the metabolite concentration the dependent variable. Then, we can resample the metabolite concentration of one metabolite without changing the concentrations of other metabolites, which are independent variables.  
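
      As an illustration of how conserved quantities can be identified and used to eliminate a dependent variable, the sketch below computes the left null space of a small stoichiometric matrix. The four-metabolite network is hypothetical and much simpler than the models discussed here.

```python
import numpy as np

# Hypothetical network (rows: ATP, ADP, X, Y; columns: reactions)
#   r1: ATP + X -> ADP + Y
#   r2: ADP -> ATP
S = np.array([[-1.0,  1.0],
              [ 1.0, -1.0],
              [-1.0,  0.0],
              [ 1.0,  0.0]])

def conservation_laws(S, tol=1e-10):
    """Orthonormal basis of vectors l with l @ S = 0 (left null space of S)."""
    _, sing, vt = np.linalg.svd(S.T)      # full SVD of the transposed matrix
    rank = int(np.sum(sing > tol))
    return vt[rank:]                      # rows span the conserved combinations

laws = conservation_laws(S)
print(laws.shape[0])                      # number of independent conservation laws

# Using the law ATP + ADP = const, ADP becomes a dependent variable:
atp, adp = 2.0, 1.0                       # illustrative concentrations
total_adenylate = atp + adp
atp_resampled = 0.5                       # ATP can now be resampled on its own ...
adp_dependent = total_adenylate - atp_resampled   # ... and ADP follows algebraically
```

      With many coupled conservation laws, as in a model of about 800 variables, this elimination is far less straightforward than in the toy case.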

      However, in the Khodayari model, it was difficult to solve for the dependent variables because the model has about 800 variables. Therefore, we did not compute the sensitivity indices of the metabolites whose concentrations are part of any conserved quantity, namely NAD, NADH, NADP, NADPH, Q8, and Q8H2.

      (2) To follow up on the previous remark, the authors state that the metabolites that augment the response coefficient when their concentration is fixed tend to be allosteric regulators. The authors should report which allosteric regulations are implemented in each of the models so that one can compare against Figure 2. Again, the effect of allosteric regulation by a specific metabolite that is quantified the way the authors did is biased by fixing the concentration value - it is true that negative feedback is broken when the metabolite concentration is fixed, however, in the rate law, there is still the fixed inhibition term with its value corresponding to the inhibition at the steady state. To see the effect of allosteric regulation by a metabolite, one can change the inhibition constants instead of constraining the responses with fixed concentrations.  

      We have listed the substrate-level regulations (Tables S1-3). Also, instead of fixing the concentrations, we re-ran the simulations with a reduced effect of the substrate-level regulations for the reactions that are suspected to influence the change of the response coefficient (Fig. S6).

      The impact of substrate-level regulations is discussed in Lines 203-212.   

      We replaced "allosteric regulation" with "substrate-level regulation" because we noticed that some regulations are not necessarily allosteric.

      (3) Given the role of ATP in metabolic processes, the authors' finding of the sensitivity of the three networks' responses to perturbations in the AXP concentrations seems reasonable. However, drawing such firm conclusions from only three models, with each of them built around one steady state and having one kinetic parameter set despite that they were built for different physiologies, raises some questions. It is well-known in studies related to basins of attraction of the steady states that the nonlinear responses also depend on the actual steady states, the values of kinetic parameters, and implemented kinetic rate law, i.e., not only on the topology of the underlying systems. In the population of only three models, we cannot exclude the possibility of overlaps and strong similarities in the values of kinetic parameters, steady states, and enzyme saturations that all affect and might bias the observed responses. Ideally, to eliminate the possibility of such biases, one should simulate responses of a large population of models for multiple physiologies (and the corresponding steady states) and multiple parameter sets per physiology. This can be a difficult task, but having more kinetic models in this work would go a long way toward more convincing results. Recently, E. coli nonlinear kinetic models from several groups appeared that might help in this task, e.g., Haiman et al., PLoS Comput Biol, 17(1): e1008208, (2021), Choudhury et al., Nat Mach Intell, 4, 710-719, (2022); Hu et al., Metab Eng, 82, 123-133 (2024), Narayanan et al., Nat Commun, 15:723, (2024). 

      We have computed the responsiveness of 215 models generated by the MASSpy package (Haiman et al., 2021). Several model realizations showed a strong responsiveness, i.e. a broader distribution of the response coefficient (Fig. S8); this is mentioned in Lines 339-341.

      We would like to mention that the three models studied in the present manuscript have limited overlap in terms of kinetic rate laws and, accordingly, parameter values. In the Khodayari model, all reactions are bi-uni or uni-uni reactions implemented by mass-action kinetics, while the Boecker and Chassagnole models use generalized Michaelis-Menten type rate laws. Also, the relationship between the response coefficients of the original model and the linearized model highlights the differences between the models (Fig. S1). If the models were effectively similar, the scatter plots of the response coefficients of the original and linearized models should look similar among the three models. However, the three panels show completely different trends. Thus, the three models show little similarity even when they are linearized around the steady states.

      (4) Can the authors share their insights on what could be the underlying reasons for the bimodal distribution in Figure 1E? Even after adding random reactions, the distribution still has two modes - why is that?  

      We have not yet resolved why only the Khodayari model shows the bimodal distribution of the response coefficient. However, judging from the time courses, the dynamics of the Khodayari model look like those of excitable systems. This feature may contribute to the bimodal distribution of the response coefficient. In the future, we would like to show whether the system is indeed an excitable system and which reactions contribute to such dynamics.

      (5) Considering the effects of the sparsity of the networks on the perturbation responses (from line 223 onwards), when we compare the three analyzed models, it is clear that the Khodayari et al. model is a superset of the other two models. Therefore, this model can be considered as, e.g., the Chassagnole model with Nadd reactions (though not randomly added). Based on Figures 1b and S2, one can observe that the Khodayari model shows stronger responses, which is exactly opposite to the authors' conclusion that adding the reactions weakens the responses.

      The authors should comment on this.  

      The sparsity of the network is defined by the ratio of the number of metabolites to the number of reactions. Note that the Khodayari model is a superset of the Boecker and Chassagnole models in terms of the number of reactions, but also in terms of the number of metabolites (Boecker does not have the pentose phosphate pathway, Chassagnole does not have the TCA cycle, and neither has oxidative phosphorylation). Thus, even if we manually add reactions to the Boecker model, for example, we cannot obtain a network that is equivalent to the Khodayari model. We added one sentence to clarify the point (Lines 254-255).

      Recommendations for the authors: 

      (1) Some typos: Line 57, remove ?; Line 134, correct "relaxation". 

      Thank you for pointing these out. We fixed the typos.

      (2) Lines 510-515, please rewrite/clarify, it is confusing what are you doing. 

      We rewrote the sentences (Lines 529-532). We are sorry for the confusion.

      (3) Line 522, where are the expressions above Leq and K*? 

      Leq appears in the original paper of the Boecker model, but we decided not to use Leq. We apologize for not removing Leq from the present manuscript. The * in K* is the wildcard for representing the subscripts. We added a description for the role of “*”. 

      (4) Lines 525-530, based on the wording, it seems like you test first for 128 initial concentrations if the models converge back to the steady state and then you generate another set of 128 initial concentrations - is this what you are doing, or you simply use the 128 initial concentrations that have passed the test? 

      We apologize for the confusion. We did the first thing. We have rewritten the sentence to make it clearer. 

      (5) Figure 3, caption, by "broken line," did the authors mean "dashed line"? 

      We meant dashed line. We changed “broken line” to “dashed line”.

    1. eLife Assessment

      This manuscript describes an important study of a new lipid-mediated regulation mechanism of adenylyl cyclases. The biochemical evidence provided is convincing and will trigger more research in this mechanism. This manuscript will be of interest to all scientists working on lipid regulation and adenylyl cyclases.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that the Gαs-stimulated activity of human membrane adenylyl cyclases (mAC) can be enhanced or inhibited by certain unsaturated fatty acids (FA) in an isoform-specific fashion. Thus, with IC50s in the 10-20 micromolar range, oleic acid effects a 3-fold stimulation of membrane preparations of mAC isoform 3 (mAC3) but it does not act on mAC5. Oleic acid enhanced the Gαs-stimulated activities of isoforms 2, 7, and 9, while mAC1 was slightly attenuated and isoforms 4, 5, 6, and 8 were unaffected. Certain other unsaturated octadecanoic FAs act similarly. FA effects were not observed in AC catalytic domain constructs in which TM domains are not present. Oleic acid also enhances the AC activity of isoproterenol-stimulated HEK293 cells stably transfected with mAC3, although with lower efficacy but much higher potency. Gαs-stimulated mAC1 and 4 cyclase activities were significantly attenuated by arachidonic acid in the 20-40 micromolar range, with similar effects in transfected HEK cells, again with higher potency but lower efficacy. While the activity of mAC5 was not affected by unsaturated FAs, neutral anandamide attenuated Gαs-stimulation of mAC5 and 6 by about 50%. In HEK cells, inhibition by anandamide is low in potency and efficacy. To demonstrate isoform specificity, the authors were able to show that membrane preparations of a domain-swapped AC bearing the catalytic domains of mAC3 and the TM regions of mAC5 are unaffected by oleic acid but inhibited by anandamide. To verify in vivo activity, in mouse brain cortical membranes 20 μM oleic acid enhanced Gαs-stimulated cAMP formation 1.5-fold with an EC50 in the low micromolar range.

      Strengths:

      (1) A convincing demonstration that certain unsaturated FAs are capable of regulating membrane adenylyl cyclases in an isoform-specific manner, and the demonstration that these act at the AC transmembrane domains.

      (2) Confirmation of activity in HEK293 cell models and towards endogenous AC activity in mouse cortical membranes.

      (3) Opens up a new direction of research to investigate the physiological significance of FA regulation of mACs and investigate their mechanisms as tonic or regulated enhancers or inhibitors of catalytic activity.

      (4) Suggests a novel scheme for the classification of mAC isoforms.

      Comments on revised version:

      The issues I raised have largely been addressed. A minor concern relates to the legend for Figure 2C, where, according to the authors' rebuttal, the vertical axis is "The ratio would be (Gsα + oleic acid stimulation) / (Gsα stimulation)". Otherwise, my general evaluation of the importance of the manuscript stands as stated in my initial review, namely, that the manuscript presents data and results that add a new dimension to existing paradigms for AC regulation, and will prompt future research into the role of physiological lipids in isoform-specific activation or inhibition of AC in tissues.

    3. Reviewer #3 (Public review):

      Summary:

      Landau et al. have submitted a manuscript describing for the first time that mammalian adenylyl cyclases can serve as membrane receptors. They have also identified the respective endogenous ligands which act via AC membrane linkers to modify and control Gs-stimulated AC activity either towards enhancement or inhibition of ACs which is family and ligand-specific. Overall, they have used classical assays such as adenylyl cyclase and cAMP accumulation assays combined with molecular cloning and mutagenesis to provide exceptionally strong biochemical evidence for the mechanism of the involved pathway regulation.

      Strengths:

      The authors have gone the whole long classical way from having a hypothesis that ACs could be receptors to a series of MS studies aimed at ligand identification, to functional studies of how these candidate substances affect the activity of various AC families in intact cells. They have used a large array of techniques, and the paper has a clear conceptual story and several strong lines of evidence.

      Comments on revised version:

      In general, the authors have addressed my comments satisfactorily apart from the suggestion to use a lower ISO concentration in their assay or at least to discuss this issue, cite relevant literature etc. Pending this small amendment, I would be fine to proceed.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors show that the Gαs-stimulated activity of human membrane adenylyl cyclases (mAC) can be enhanced or inhibited by certain unsaturated fatty acids (FA) in an isoform-specific fashion. Thus, with IC50s in the 10-20 micromolar range, oleic acid effects a 3-fold stimulation of membrane preparations of mAC isoform 3 (mAC3) but it does not act on mAC5. Oleic acid enhanced the Gαs-stimulated activities of isoforms 2, 7, and 9, while mAC1 was slightly attenuated and isoforms 4, 5, 6, and 8 were unaffected. Certain other unsaturated octadecanoic FAs act similarly. FA effects were not observed in AC catalytic domain constructs in which TM domains are not present. Oleic acid also enhances the AC activity of isoproterenol-stimulated HEK293 cells stably transfected with mAC3, although with lower efficacy but much higher potency. Gαs-stimulated mAC1 and 4 cyclase activities were significantly attenuated by arachidonic acid in the 20-40 micromolar range, with similar effects in transfected HEK cells, again with higher potency but lower efficacy. While the activity of mAC5 was not affected by unsaturated FAs, neutral anandamide attenuated Gαs-stimulation of mAC5 and 6 by about 50%. In HEK cells, inhibition by anandamide is low in potency and efficacy. To demonstrate isoform specificity, the authors were able to show that membrane preparations of a domain-swapped AC bearing the catalytic domains of mAC3 and the TM regions of mAC5 are unaffected by oleic acid but inhibited by anandamide. To verify in vivo activity, in mouse brain cortical membranes 20 μM oleic acid enhanced Gαs-stimulated cAMP formation 1.5-fold with an EC50 in the low micromolar range.

      Strengths:

      (1) A convincing demonstration that certain unsaturated FAs are capable of regulating membrane adenylyl cyclases in an isoform-specific manner, and the demonstration that these act at the AC transmembrane domains.

      (2) Confirmation of activity in HEK293 cell models and towards endogenous AC activity in mouse cortical membranes.

      (3) Opens up a new direction of research to investigate the physiological significance of FA regulation of mACs and investigate their mechanisms as tonic or regulated enhancers or inhibitors of catalytic activity.

      (4) Suggests a novel scheme for the classification of mAC isoforms.

      Weaknesses:

      (1) Important methodological details regarding the treatment of mAC membrane preps with fatty acids are missing.

      We will address this issue in more detail.

      (2) It is not evident that fatty acid regulators can be considered as "signaling molecules" since it is not clear (at least to this reviewer) how concentrations of free fatty acids in plasma or endocytic membranes are hormonally or otherwise regulated.

      Although this question is not the subject of this ms., we will address this question in more detail in the discussion of the revision.

      Reviewer #2 (Public review):

      Summary:

      The authors extend their earlier findings with bacterial adenylyl cyclases to mammalian enzymes. They show that certain aliphatic lipids activate adenylyl cyclases in the absence of stimulatory G proteins and that lipids can modulate activation by G proteins. Adding lipids to cells expressing specific isoforms of adenylyl cyclases could regulate cAMP production, suggesting that adenylyl cyclases could serve as 'receptors'.

      Strengths:

      This is the first report of lipids regulating mammalian adenylyl cyclases directly. The evidence is based on biochemical assays with purified proteins, or in cells expressing specific isoforms of adenylyl cyclases.

      Weaknesses:

      It is not clear if the concentrations of lipids used in assays are physiologically relevant. Nor is there evidence to show that the specific lipids that activate or inhibit adenylyl cyclases are present at the concentrations required in cell membranes. Nor is there any evidence to indicate that this method of regulation is seen in cells under relevant stimuli.

      Although this question is not the subject of this ms., we will address this question in more detail in the discussion of the revision.

      Reviewer #3 (Public review):

      Summary:

      Landau et al. have submitted a manuscript describing for the first time that mammalian adenylyl cyclases can serve as membrane receptors. They have also identified the respective endogenous ligands which act via AC membrane linkers to modify and control Gs-stimulated AC activity either towards enhancement or inhibition of ACs which is family and ligand-specific. Overall, they have used classical assays such as adenylyl cyclase and cAMP accumulation assays combined with molecular cloning and mutagenesis to provide exceptionally strong biochemical evidence for the mechanism of the involved pathway regulation.

      Strengths:

      The authors have gone the whole long classical way from having a hypothesis that ACs could be receptors to a series of MS studies aimed at ligand identification, to functional studies of how these candidate substances affect the activity of various AC families in intact cells. They have used a large array of techniques, and the paper has a clear conceptual story and several strong lines of evidence.

      Weaknesses:

      (1) At the beginning of the results section, the authors say "We have expected lipids as ligands". It is not quite clear why these could not have been other substances. It is because they were expected to bind in the lipophilic membrane anchors? Various lipophilic and hydrophilic ligands are known for GPCR which also have transmembrane domains. Maybe 1-2 additional sentences could be helpful here.

      Will be done as suggested.

      (2) In stably transfected HEK cells expressing mAC3 or mAC5, they have used only one dose of isoproterenol (2.5 uM) for submaximal AC activation. The reference 28 provided here (PMID: 33208818) did not specifically look at Iso and endogenous beta2 adrenergic receptors expressed in HEK cells. As far as I remember from the old pharmacological literature, this concentration is indeed submaximal in receptor binding assays but regarding AC activity and cAMP generation (which happen after signal amplification with a so-called receptor reserve), lower Iso amounts would be submaximal. When we measure cAMP, these are rather 10 to 100 nM but no more than 1 uM at which concentration response dependencies usually saturate. Have the authors tried lower Iso concentrations to prestimulate intracellular cAMP formation? I am asking this because, with lower Iso prestimulation, the subsequent stimulatory effects of AC ligands could be even greater.

      The best way to address this issue is to establish a concentration-response curve for Iso-stimulated cAMP formation using the permanently transfected cells. We note that in the past isoproterenol concentrations used in biochemical or electrophysiological experiments differed substantially.

      (3) The authors refer to HEK cell models as "in vivo". I agree that these are intact cells and an important model to start with. It would be very nice to see the effects of the new ligands in other physiologically relevant types of cells, and how they modulate cAMP production under even more physiological conditions. Probably, this is a topic for follow-up studies.

      The last sentence is correct.

      Appraisal of whether the authors achieved their aims, and whether the results support their conclusions:

      The authors have achieved their aims to a very high degree, their results do nicely support their conclusions. There is only one point (various classical GPCR concentrations, please see above) that would be beneficial to address.

      Without any doubt, this is a groundbreaking study that will have profound implications in the field for the next years/decades. Since it is now clear that mammalian adenylyl cyclases are receptors for aliphatic fatty acids and anandamide, this will change our view on the whole signaling pathway and initiate many new studies looking at the biological function and pathophysiological implications of this mechanism. The manuscript is outstanding.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      It is not clear from the methods section how free FAs were applied to membrane preparations or HEK293 cells. Were FAs solubilized in organic solvents, or introduced as micelles?

      The requested information has been inserted into the M&M section.

      Could the authors comment on what is known about the concentration of oleic acid and other non-saturated fatty acids in plasma membranes relative to those required to produce allosteric effects on cyclase activity?

      This info is now included in the last paragraph of the discussion.

      It would be worthwhile to test the effect of FAs on basal (not Gαs-stimulated) activity of mACs.

      This has been carried out with mAC isoforms 2, 3, 7, and 9, in which oleic acid enhances Gsα-stimulated activity. Due to the low levels of basal activities, interpretable data were not obtained.

      Do triglycerides esterified with oleic acid stimulate mAC3 and other sensitive isoforms?

      Experiments were done with triolein and 2-oleoyl-glycerol (the answer is no). The data are presented in Fig. 3 and in the appendix Figs. 8, 9, 14; structural formulas in appendix 2 Fig. 4 were updated.

      Does the quantity plotted on the vertical axis of Figure 1, right panel represent "Fractional Stimulation by Oleic acid" rather than simply "Fold Stimulation"? Clearly, as shown in the two left-most panels, Gαs stimulates both mAC and mAC5. Rather it seems that the ratio (oleic acid stimulation) / (Gαs stimulation) remains constant. This observation supports the statement in the discussion that "We suppose that in mAC3 the equilibrium of two differing ground states favors a Gαs-unresponsive state and the effector oleic acid concentration-dependently shifts this equilibrium to a Gαs-responsive state". It could also be said that the effect of oleic acid is additive, and in constant proportion to that of Gαs.

      This comment certainly is related to Fig. 2:

      The ratio would be (Gsα + oleic acid stimulation) / (Gsα-stimulation), i.e., fractional stimulation by addition of oleic acid is identical to fold stimulation.

      We have amended the legend to fig. 2C for clarification.

      The last sentence is wrong because oleic acid alone does not stimulate.

      It is stated on page 3, 2nd to last line that "The action of oleic acid on mAC3 was instantaneous...". Since the earliest time point is taken at 5 minutes, the claim that the action of the lipid is instantaneous cannot be made. Information about kinetics would be useful to have, since it is possible that the lipid must be released from a micelle and be incorporated into the AC membrane fraction before it is active.

      The first point is 3 min.

      We deleted the word “instantaneous” and added the correlation coefficients for both conditions in the legend to appendix 2; fig. 1 for clarification.

      The data spread in Figure 4 and other figures showing similar data is significant, to the extent that the computed value for EC50 may not be of high precision. Authors should cite the correlation coefficient for the overall fit and uncertainty for the EC50 value (in addition to significances by t-test of individual data points).

      This will not add valuable information. Pearson's correlation coefficients are only for linear relationships.

      (cf. N.N. Kachouie, W. Deebani (2020) Association Factor for Identifying Linear and Nonlinear Correlations in Noisy Conditions. Entropy 22:440)

      The "switch" between relatively low potency and high efficacy in membrane preps to high potency and low efficacy in cells is remarkable. Could this have a methodological basis or is it reflective of the mechanism by which FAs access mACs in membrane preps vs. cell membranes, or perhaps some biochemical transformation of the lipid in cells?

      Honestly, we do not know.

      The authors should note that there is some precedence for this work:

      J Nakamura, N Okamura, S Usuki, S Bannai, Inhibition of adenylyl cyclase activity in brain membrane fractions by arachidonic acid and related unsaturated fatty acids. Arch Biochem Biophys. 2001 May 1;389(1):68-76. doi: 10.1006/abbi.2001.2315.

      The effects of FA deficiencies on AC and related activities have been noted:

      Alam SQ, Mannino SJ, Alam BS, McDonough K Effect of essential fatty acid deficiency on forskolin binding sites, adenylate cyclase, and cyclic AMP-dependent protein kinase activity, the levels of G proteins and ventricular function in rat heart. J Mol Cell Cardiol. 1995 Aug;27(8):1593-604. doi: 10.1016/s0022-2828(95)90491-3. PMID: 8523422

      The latter publications are supportive of, and provide context to, the authors' findings.

      Both references are mentioned and cited.

      Minor points:

      The significance of the coloring scheme in Figure 5C bar graph should be stated in the legend.

      Done.

      In the introduction, it is stated that "The protein displayed two similar catalytic domains (C1 and C2) and two dissimilar hexahelical membrane anchors (TM1 and TM2)". In both cases, the respective domains can be said to be similar in overall fold, but - certainly in the case of the catalytic domains - different in amino acid sequence in functionally important regions of the domain.

      Done: Changed wording.

      The statement in the introduction that "The domain architecture, TM1-C1-TM2-C2, clearly indicated a pseudoheterodimeric protein composed of two concatenated bacterial precursor proteins" The authors refer to the fact that mammalian enzymes are pseudo heterodimers whereas bacterial type III cyclases are dimers of identical subunits.

      Done.

      Reviewer #2 (Recommendations for the authors):

      The title need not state that a 'new class of receptors' has been identified. There is no direct evidence that the lipids bind to the enzymes, and the affinities can only be surmised from the EC50 graphs. To call a protein a receptor requires evidence to show that the binding is specific by showing that binding can be inhibited by a large excess of 'unlabelled' ligand. This could have been done by procuring labelled lipids for experimental verification.

      As is well known, lipids easily bind to proteins. In this study no purified proteins were used. Therefore, binding assays most likely would result in unreliable data.

      The paper would have benefitted from showing sequence alignments in the TM domains of the ACs discussed in the paper. Further, a phylogenetic tree of mammalian ACs would also reveal which enzymes from other species may be regulated similarly to those described in the paper. This would be important for researchers who use other model organisms to study cAMP signalling.

      Such data are in multiple papers accessible in the literature. Where deemed appropriate we inserted references.

      Figures 1A and 1B show data from only two experiments. A third experiment would have been useful in order to show the statistical significance of the data.

      At this stage more experiments would not have affected further experimental plans.

      Statements made in the text (for example, the last paragraph on page 6) state only the mean value and not the SDs. This would have been important to include even if the data is shown in the appendix. The same is true in the Legend of Figure 2. Why have the authors decided to use SEM and not SDs?

      The reason is specified in M&M.

      Concentrations of lipids used in biochemical assays are in the micromolar range. This suggests that we have moderate affinity binding, more in the range of an enzyme for a substrate rather than a receptor-ligand interaction.

      We happen to disagree. Clearly, the differential activities, enhancing or attenuating Gsα-stimulated mAC activities, are most plausibly explained by mAC receptor properties. mACs have enzyme activities using fatty acids as substrates.

      The authors add lipids to cells and show changes in cAMP levels in their presence and absence. They also discuss how these extracellular lipids could be produced. Do you think this is necessary in vivo, though? Could the lipids present in membranes naturally act as regulators? Do specific lipid concentrations differ in different cell types, suggesting tissue-specific regulation of these mammalian ACs?

      These are things that could be discussed in the manuscript.

      The last paragraph of the discussion deals with these questions.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank you for sending our manuscript for the second round of review. We are encouraged by Reviewer #2's comments that our supplementary work on naïve T cells and antibody blockade satisfied their previous concerns and is important for our work.

      The Editors raised concerns that we have shared preliminary data on Nrn1 and AMPAR double knockout mice.  We apologize for our enthusiasm for these studies.  Because of the publication model by eLife, we shared that data not because we needed to persuade the reviewer for publication purposes but rather to agree with the reviewer that the molecular target of Nrn1 is important, and we are progressing in understanding this subject.


      The following is the authors’ response to the original reviews.

      To Reviewer #1:

      Thank you for your thorough review and comments on our work, which you described as “the role of neuritin in T cell biology studied here is new and interesting.” We have summarized your comments into two categories: (1) biology and investigation approach, and (2) experimental rigor and data presentation.

      Biology and Investigation approach comments:

      (1) Questions regarding the T cell anergy model:

      Major point “(4) Figure 1E-H. The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this. It would be useful to show that T cells are indeed anergic in this model, especially those that are OVA-specific. The lack of IL-2 production by Ctrl cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status.”

      T cell anergy is a well-established concept first described by Schwartz’s group. It refers to the hyporesponsive T cell functional state in antigen-experienced CD4 T cells (Chappert and Schwartz, 2010; Fathman and Lineberry, 2007; Jenkins and Schwartz, 1987; Quill and Schwartz, 1987). Anergic T cells are characterized by their inability to expand and to produce IL2 upon subsequent antigen re-challenge. In this paper, we have borrowed the existing in vivo T cell anergy induction model used by Mueller’s group (Vanasek et al., 2006). Specifically, Thy1.1+ Ctrl or Nrn1-/- TCR transgenic OTII cells were co-transferred with the congenically marked Thy1.2+ WT polyclonal Treg cells into TCR-/- mice. After anergy induction, the congenically marked TCR transgenic T cells were recovered by sorting based on the Thy1.1+ congenic marker and subsequently restimulated ex vivo with OVA323-339 peptide. We evaluated the T cell anergic state based on OTII cell expansion in vivo and IL2 production upon OVA323-339 restimulation ex vivo.

      “The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this.”

      Because the anergy model by Mueller's group is well established (Vanasek et al., 2006), we did not feel that additional effort was required to validate this model as the reviewer suggested. Moreover, the limited IL2 production among the control cells upon restimulation confirms the validity of this model.

      “The lack of IL-2 production by Ctrl cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status.”

      Cells from Ctrl and Nrn1-/- mice on a homogeneous TCR transgenic (OTII) background were used in these experiments. The possibility that substantial variability of TCR expression or different expression levels of the transgenic TCR could have impacted IL2 production rather than anergy induction is unlikely.

      Overall, we used this in vivo anergy model to evaluate the Nrn1-/- T cell functional state in comparison to Ctrl cells under the anergy induction condition following the evaluation of Nrn1 expression, particularly in anergic T cells.  Through studies using this anergy model, we observed a significant change in Treg induction among OTII cells. We decided to pursue the role of Nrn1 in Treg cell development and function rather than the biology of T cell anergy as evidenced by subsequent experiments.

      Minor points “(6) On which markers are anergic cells sorted for RNAseq analysis?”

      Cells were sorted out based on their congenic marker marking Ctrl or Nrn1-/- OTII cells transferred into the host mice.  We did not specifically isolate anergic cells for sequencing.

      (2) Question regarding the validity of iTreg differentiation model.

      Major point: “(5) Figure 2A-C and Figure 3. The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance. In any case, they are different from pTreg cells generated in vivo. Working with pTreg may be challenging, that is why I would suggest generating data with purified nTreg. Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript. Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctrl and Nrn1 KO cells.”

      We thank Reviewer #1 for their feedback. While it is true that iTregs made in vitro and pTregs generated in vivo display several distinctions (e.g., differences in Foxp3 expression stability), we strongly disagree with this statement by Reviewer #1: “The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that have probably no equivalent in vivo, and so may have no physiological relevance.” The induced Treg cell (iTreg) model was established over 20 years ago (Chen et al., 2003; Zheng et al., 2002), and the model is widely adopted with over 2000 citations. Further, it has been instrumental in understanding different aspects of regulatory T cell biology (Hurrell et al., 2022; John et al., 2022; Schmitt and Williams, 2013; Sugiura et al., 2022).

      Because we have observed reduced pTreg generation in vivo, we chose to use the in vitro iTreg model system to understand the mechanistic changes involved in Treg cell differentiation and function, specifically, neuritin’s role in this process. We have made no claim that iTreg cell biology is identical to pTreg generated in vivo or nTreg cells. However, the iTreg culture system has proved to be a good in vitro system for deciphering molecular events involved in complex processes. As such, it remains a commonly used approach by many research groups in the Treg cell field (Hurrell et al., 2022; John et al., 2022; Sugiura et al., 2022). Moreover, applying the iTreg in vitro culture system has been instrumental in helping us identify the cell electrical state change in Nrn1-/- CD4 cells and revealed the biological link between Nrn1 and the ionotropic AMPA receptor (AMPAR), which we discuss below. It is technically challenging to use nTreg cells for T cell electrical state studies due to their heterogeneous nature from development in an in vivo environment and the effect of manipulation during the nTreg cell isolation process, which can both affect the T cell electrical state.

      “Moreover, it was shown in the article of Gonzalez-Figueroa 2021 that Nrn1-/- nTreg retained a normal suppressive function, which would not be what is concluded by the authors of this manuscript.” 

      We have also carried out nTreg studies in vitro in addition to iTreg cells. Similar to Gonzalez-Figueroa et al.'s findings, we did not observe differences in suppression function between Nrn1-/- and WT nTreg using the in vitro suppression assay. However, Nrn1-/- nTreg cells revealed reduced suppression function in vivo (Fig. 2D-L). In fact, Gonzalez-Figueroa et al. observed reduced plasma cell formation after OVA immunization in Treg-specific Nrn1-/- mice, implicating reduced suppression from Nrn1-/- follicular regulatory T (Tfr) cells. Thus, our observation of the reduced suppression function of Nrn1-/- nTreg toward effector T cell expansion, as presented in Fig. 2D-L, does not contradict the results from Gonzalez-Figueroa et al. Rather, the conclusions of these two studies agree that Nrn1 can play important roles in immune suppression observable in vivo that are not captured readily by the in vitro suppression assay.

      “Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctrl and Nrn1 KO cells.”

      We have stated in the manuscript on page 7 line 208 that “Similar proportions of Foxp3+ cells were observed in Nrn1-/- and Ctrl cells under the iTreg culture condition, suggesting that Nrn1 deficiency does not significantly impact Foxp3+ cell differentiation”. In the revised manuscript, we will include the data on the proportion of Foxp3+ cells before iTreg restimulation.

      (3) Confirmation of transcriptomic data regarding amino acids or electrolytes transport change

      Minor point “(3) Would not it be possible to perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane? This would be a more interesting demonstration than transcriptomic data.”

      We appreciate Reviewer #1’s suggestion regarding “perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane”. We have indeed already performed such experiments corroborating the transcriptomics data on differential amino acid and nutrient transporter expression. Specifically, we loaded either iTreg or Th0 cells with membrane potential (MP) dye and measured MP level change after adding the complete set of amino acids (complete AA). Upon entry, the charge carried by AAs may transiently affect cell membrane potential. Different AA transporter expression patterns may show different MP change patterns upon AA entry, as shown in Author response image 1. We observed reduced MP change in Nrn1-/- iTreg compared to the Ctrl, whereas in the context of Th0 cells, Nrn1-/- cells showed a greater MP change than the Ctrl. We can certainly include these data in the revised manuscript.

      Author response image 1.

      Membrane potential change induced by amino acid entry. a. Nrn1-/- or WT iTreg cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs. b. Nrn1-/- or WT Th0 cells were loaded with MP dye, and MP change was measured upon the addition of a complete set of AAs.

      (4) EAE experiment data assessment

      Minor point “(5) Figure 5F. How are cells re-stimulated? If polyclonal stimulation is used, the experiment is not interesting because the analysis is done with lymph node cells. This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.”

      In the EAE study, the Nrn1-/- mice exhibit similar disease onset but a protracted non-resolving disease phenotype compared to the WT control mice. Several reasons may contribute to this phenotype: 1. Enhanced T effector cell infiltration/persistence in the central nervous system (CNS); 2. Reduced Treg cell-mediated suppression of the T effector cells in the CNS; 3. Protracted non-resolving inflammation at the immunization site has the potential to continue sending T effector cells into the CNS, contributing to persistent inflammation. Based on this reasoning, we examined the infiltrating T effector cell number and Treg cell proportion in the CNS. We also restimulated cells from draining lymph nodes close to the inflammation site, looking for evidence of persistent inflammation. When mice were harvested around day 16 after immunization, the inflammation at the local draining lymph node should be at the contraction stage. We stimulated cells with PMA and ionomycin, intending to observe all potential T effector cells involved in the draining lymph node rather than only MOG antigen-specific cells. We disagree with Reviewer #1’s assumption that “This analysis should either be performed with cells from the CNS or with MOG restimulation with lymph node cells.” We think the experimental approach we have taken has been appropriately tailored to the biological questions we intended to answer.

      Experimental rigor and data presentation.

      (1) data labeling and additional supporting data

      Major points

      (2) The authors use Nrn1+/+ and Nrn1+/- cells indiscriminately as control cells on the basis of similar biology between Nrn1+/+ and Nrn1+/- cells at homeostasis. However, it is quite possible that the Nrn1+/- cells have a phenotype in situations of in vitro activation or in vivo inflammation (cancer, EAE). It would be important to discriminate Nrn1+/- and Nrn1+/+ cells in the data or to show that both cell types have the same phenotype in these conditions too.

      (3) Figure 1A-D. Since the authors are using the Nrp1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. Once verified, it would be important to add FACS results with this mAb in Figures 1A-C to have single-cell and quantitative data as well.

      Minor points  

      (1) Line 119, 120 of the text. It is said that one of the most up-regulated genes in anergic cells is Nrn1 but the data is not shown.

      (2) For all figures showing %, the titles of the Y axes are written in an odd way. For example, it is written "Foxp3% CD4". It would be more conventional and clearer to write "% Foxp3+ / CD4+" or "% Foxp3+ among CD4+".

      (4) For certain staining (Figure 3E, H) it would be important to show the raw data, in addition to MFI or % values.

      We can adapt the labeling and provide additional data, including Nrn1 staining on Treg cells and flow graphs for pmTOR and pS6 staining (Fig. 3H), as requested by Reviewer #1.

      (2) Experimental rigor:

      General comments:

      “However, it is disappointing that reading this manuscript leaves an impression of incomplete work done too quickly.”

      We were discouraged to receive the comment, “this manuscript leaves an impression of incomplete work done too quickly.” Our study of this novel molecule began without any existing biological tools such as antibodies, knockout mice, etc. Over the past several years, we have established our own antibodies for Nrn1 detection, obtained and characterized Nrn1 knockout mice, and utilized multiple approaches to identify the molecular mechanism of Nrn1 function. Through the use of the in vitro iTreg system described in this manuscript, we identified the association of Nrn1 deficiency with cell electrical state change, potentially connected to AMPAR function. We have further corroborated our findings by generating Nrn1 and AMPAR T cell-specific double knockout mice and confirmed that T cell-specific AMPAR deletion could abrogate the phenotype caused by the Nrn1 deficiency (see Support Figure 2). We did not include the double knockout data in the current manuscript because AMPAR function has not yet been studied thoroughly in T cell biology, and we feel this topic warrants examination in its own right. However, the unpublished data support the finding that Nrn1 modulates the T cell electrical state and, consequently, metabolism, ultimately influencing tolerance and immunity. In its current form, the manuscript represents the first characterization of the novel molecule Nrn1 in anergic cells, Tregs, and effector T cells. While this work has led to several exciting additional questions, we disagree that the novel characterization we have presented is incomplete. We feel that our present data set, which squarely highlights Nrn1’s role as an important immune regulator while shedding unprecedented light on the molecular events involved, will be of considerable interest to a broad field of researchers.

      “Multiple models have been used, but none has been studied thoroughly enough to provide really conclusive and unambiguous data. For example, 5 different models were used to study T cells in vivo. It would have been preferable to use fewer, but to go further in the study of mechanisms.”

      We have indeed used multiple in vivo models to reveal Nrn1's function in Treg differentiation, Treg suppression function, T effector cell differentiation and function, and the overall impact on autoimmune disease. Because the impact of ion channel function is often context-dependent, we examined the biological outcome of Nrn1 deficiency in several in vivo contexts. We would appreciate it if Reviewer #1 would provide a specific example, given the Nrn1 phenotype, of how to proceed deeper to investigate the electrical change in the in vivo models.

      “Major points

      (1) A real weakness of this work is the fact that in most of the results shown, there are few biological replicates with differences that are often small between Ctrl and Nrn1 -/-. The systematic use of student's t-test may lead to thinking that the differences are significant, which is often misleading given the small number of samples, which makes it impossible to know whether the distributions are Gaussian and whether a parametric test can be used. RNAseq bulk data are based on biological duplicates, which is open to criticism.”

      We respectfully disagree with Reviewer #1 on the question of statistical power and significance in our work. We have used 5-8 mice/group for each in vivo model and 3-4 technical replicates for the in vitro studies, with a minimum of 2-3 replicate experiments. These group sizes and replication numbers are in line with those seen in high-impact publications. While some differences between Ctrl and Nrn1-/- appear small, they have significant biological consequences, as evidenced by the various Nrn1-/- in vivo phenotypes. Furthermore, we believe we have subjected our data to the appropriate statistical tests to ensure rigorous analysis and representation of our findings.
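
      One distribution-free alternative to the t-test discussed in this exchange is an exact permutation test, which does not require the values to be Gaussian. The Python sketch below is an illustration only: the function name and the measurements are hypothetical and are not the analysis or data of the manuscript. It also shows how group size bounds the attainable p-value: with five values per group there are only 252 possible splits, so the smallest two-sided p-value is 2/252 (about 0.008).

```python
from itertools import combinations


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_diff(sum_a):
        # Difference in means when the first group sums to sum_a.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_diff(sum(group_a)))
    extreme = 0
    n_splits = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        n_splits += 1
        sum_a = sum(pooled[i] for i in idx)
        # Small tolerance guards against floating-point rounding.
        if abs(mean_diff(sum_a)) >= observed - 1e-12:
            extreme += 1
    return extreme / n_splits


# Hypothetical measurements for five control and five knockout animals.
ctrl = [12.1, 13.4, 11.8, 12.9, 13.0]
ko = [9.7, 10.2, 10.9, 9.9, 10.4]

p_value = exact_permutation_pvalue(ctrl, ko)
print(f"exact two-sided p = {p_value:.4f}")
```

      In this invented example the two groups do not overlap, so the observed split and its mirror image are the only splits at least as extreme as the data, and the test returns the minimum possible p-value for this group size.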

      To Reviewer #2.

      We thank Reviewer #2 for the careful review of the manuscript. We especially appreciate the comments that “The characterizations of T cell Nrn1 expression both in vitro and in vivo are comprehensive and convincing. The in vivo functional studies of anergy development, Treg suppression, and EAE development are also well done to strengthen the notion that Nrn1 is an important regulator of CD4 responsiveness.”

      “The major weakness of this study stems from a lack of a clear molecular mechanism involving Nrn1.”

      We fully understand this comment from Reviewer #2. The main mechanism we identified contributing to the functional defect of Nrn1-/- T cells involves novel effects on the electric and metabolic state of the cells. Although we referenced neuronal studies that indicate Nrn1 is the auxiliary protein for the ionotropic AMPA-type glutamate receptor (AMPAR) and may affect AMPAR function, we did not provide any evidence in this manuscript as the topic requires further in-depth study.   

      For the benefit of this discussion, we include our preliminary Nrn1 and AMPAR double knockout data (Author response image 2), which indicates that abrogating AMPAR expression can compensate for the defect caused by Nrn1 deficiency in vitro and in vivo. This preliminary data supports the notion that Nrn1 modulates AMPAR function, which causes changes in T cell electric and metabolic state, influencing T cell differentiation and function.  

      Author response image 2.

      Deletion of AMPAR expression in T cells compensates for the defect caused by Nrn1 deficiency. Nrn1-/- mice were crossed with T cell-specific AMPAR knockout (AMPARfl/flCD4Cre+) mice. The following mice were generated and used in the experiment: T cell-specific AMPAR knockout and Nrn1 knockout mice (AKONKO), Nrn1 knockout mice (AWTNKO), and Ctrl mice (AWTNWT). a. Deletion of AMPAR compensates for the iTreg cell defect observed in Nrn1-/- CD4 cells. iTreg live cell proportion, cell number, and Ki67 expression among Foxp3+ cells 3 days after aCD3 restimulation. b. Deletion of AMPAR in T cells abrogates the enhanced autoimmune response in Nrn1-/- mice in the EAE disease model. Mouse relative weight change and disease score progression after EAE disease induction.

      Ion channels can influence cell metabolism through multiple means (Vaeth and Feske, 2018; Wang et al., 2020). First, ion channels are involved in maintaining cell resting membrane potential. This electrical potential difference across the cell membrane is essential for various cellular processes, including metabolism (Abdul Kadir et al., 2018; Blackiston et al., 2009; Nagy et al., 2018; Yu et al., 2022). Second, ion channels facilitate the movement of ions across cell membranes. These ions are essential for various metabolic processes. For example, ions like calcium (Ca2+), potassium (K+), and sodium (Na+) play crucial roles in signaling pathways that regulate metabolism (Kahlfuss et al., 2020). Third, ion channel activity can influence cellular energy balance due to ATP consumption associated with ion transport to maintain ion balances (Erecińska and Dagani, 1990; Gerkau et al., 2019). This, in turn, can impact processes like ATP production, which is central to cellular metabolism. Thus, ion channel expression and function determine the cell’s bioelectric state and contribute to cell metabolism (Levin, 2021).

      Because the AMPAR function has not been thoroughly studied using a genetic approach in T cells, we do not intend to include the double knockout data in this manuscript before fully characterizing the T cell-specific AMPAR knockout mice.  

      “Although the biochemical and informatics studies are well-performed, it is my opinion that these results are inconclusive in part due to the absence of key "naive" control groups. This limits my ability to understand the significance of these data.

      Specifically, studies of the electrical and metabolic state of Nrn1-/- inducible Treg cells (iTregs) would benefit from similar data collected from wild-type and Nrn1-/- naive CD4 T cells.”

      We appreciate the reviewer’s comments. This comment reflects two concerns in data interpretation:

      (1) Are Nrn1-/- naïve T cells fundamentally different from WT cells? Does this fundamental difference contribute to the observed electrical and metabolic phenotype in iTreg or Th0 cells? This is a very good question, and we will perform the experiments as the reviewer suggested. While Nrn1 is expressed at a basal (low) level in naïve T cells, deletion of Nrn1 may cause changes in naïve T cell phenotype.

      (2) Is the Nrn1-/- phenotype caused by Nrn1 functional deficiency or due to the secondary effect of Nrn1 deletion, such as non-physiological cell membrane structure changes?

      We have done the following experiment to address this concern.  We have cultured WT T cells in the presence of Nrn1 antibody and compared the outcome with Nrn1-/- iTreg cells (Figure 3-figure supplement 2D,E,F). WT iTreg cells under antibody blockade exhibited similar changes as Nrn1-/- iTreg cells, confirming the physiological relevance of the Nrn1-/- phenotype.

      Manuscript Revision based on the Reviewer’s suggestions:

      Reviewer #1:

      Major points (3) Figure 1A-D. Since the authors are using the Nrp1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. 

      Following the suggestion by Reviewer #1, we have included the Nrn1 Ab staining on activated Nrn1-/- CD4 cells in Figure 1D. We have also added the staining of cell surface Nrn1 on Treg cells in Figure 1-figure supplement 1D.

      Major point: (5) “Moreover, we do not even know what the % of Foxp3 cells is in the iTreg used (after differentiation and 20h of re-stimulation) and whether this % is the same between Ctrl and Nrn1 KO cells.”

      In the revised manuscript, we have included the proportion of Foxp3+ cells among Nrn1-/- and ctrl iTreg cells developed under the iTreg culture condition in Figure 2A.

      Minor points  

      (2) For all figures showing %, the titles of the Y axes are written in an odd way. For example, it is written "Foxp3% CD4". It would be more conventional and clearer to write "% Foxp3+ / CD4+" or "% Foxp3+ among CD4+".

      Following reviewer#1’s suggestion, we have changed the Y-axis label in all the relevant figures.

      (3) “Would not it be possible to perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane? This would be a more interesting demonstration than transcriptomic data.”

      We appreciate Reviewer #1’s suggestion regarding “perform experiments showing the ability of cells to transport amino acids or electrolytes across the plasma membrane”. We have used AA-induced cellular MP changes to confirm differential AA transporter expression patterns and their impact on cellular MP levels. The data are included in the revised manuscript in Figure 3H and Figure 4K.

      (4) For certain staining (Figure 3E, H) it would be important to show the raw data, in addition to MFI or % values.

      We appreciated Reviewer #1’s suggestion and have included the histogram staining data for Figure 3E. We have moved the original Figure 3H to the supplemental figure and included the histogram staining data in Figure 3-figure supplement 1C.  Similarly, we have included the histogram staining data in Figure 4-figure supplement 1C.

      Reviewer#2:

      “Although the biochemical and informatics studies are well-performed, it is my opinion that these results are inconclusive in part due to the absence of key "naive" control groups. This limits my ability to understand the significance of these data.

      Specifically, studies of the electrical and metabolic state of Nrn1-/- inducible Treg cells (iTregs) would benefit from similar data collected from wild-type and Nrn1-/- naive CD4 T cells.”

      We greatly appreciate Reviewer #2’s suggestion and have carried out experiments on naïve CD4 cells derived from Nrn1-/- and WT mice. We have compared membrane potential and AA-induced MP change between Nrn1-/- and WT naïve T cells, as well as the metabolic state of Nrn1-/- and WT naïve T cells, by carrying out glucose stress tests and mitochondrial stress tests using a Seahorse assay. Moreover, to investigate whether the phenotype revealed in Nrn1-/- CD4 cells was caused by a secondary effect of cell membrane structure change due to Nrn1 deletion, we carried out Nrn1 antibody blockade in WT CD4 cells and investigated the phenotypic change. These new results are included in Figure 3-figure supplement 2.

      References:

      Abdul Kadir, L., M. Stacey, and R. Barrett-Jolley. 2018. Emerging Roles of the Membrane Potential: Action Beyond the Action Potential. Front Physiol 9:1661.

      Blackiston, D.J., K.A. McLaughlin, and M. Levin. 2009. Bioelectric controls of cell proliferation: ion channels, membrane voltage and the cell cycle. Cell Cycle 8:3527-3536.

      Chappert, P., and R.H. Schwartz. 2010. Induction of T cell anergy: integration of environmental cues and infectious tolerance. Current opinion in immunology 22:552-559.

      Chen, W., W. Jin, N. Hardegen, K.J. Lei, L. Li, N. Marinos, G. McGrady, and S.M. Wahl. 2003. Conversion of peripheral CD4+CD25- naive T cells to CD4+CD25+ regulatory T cells by TGF-beta induction of transcription factor Foxp3. The Journal of experimental medicine 198:1875-1886.

      Erecińska, M., and F. Dagani. 1990. Relationships between the neuronal sodium/potassium pump and energy metabolism. Effects of K+, Na+, and adenosine triphosphate in isolated brain synaptosomes. J Gen Physiol 95:591-616.

      Fathman, C.G., and N.B. Lineberry. 2007. Molecular mechanisms of CD4+ T-cell anergy. Nat Rev Immunol 7:599-609.

      Gerkau, N.J., R. Lerchundi, J.S.E. Nelson, M. Lantermann, J. Meyer, J. Hirrlinger, and C.R. Rose. 2019. Relation between activity-induced intracellular sodium transients and ATP dynamics in mouse hippocampal neurons. The Journal of physiology 597:5687-5705.

      Hurrell, B.P., D.G. Helou, E. Howard, J.D. Painter, P. Shafiei-Jahani, A.H. Sharpe, and O. Akbari. 2022. PD-L2 controls peripherally induced regulatory T cells by maintaining metabolic activity and Foxp3 stability. Nature communications 13:5118.

      Jenkins, M.K., and R.H. Schwartz. 1987. Antigen presentation by chemically modified splenocytes induces antigen-specific T cell unresponsiveness in vitro and in vivo. The Journal of experimental medicine 165:302-319.

      John, P., M.C. Pulanco, P.M. Galbo, Jr., Y. Wei, K.C. Ohaegbulam, D. Zheng, and X. Zang. 2022. The immune checkpoint B7x expands tumor-infiltrating Tregs and promotes resistance to anti-CTLA-4 therapy. Nature communications 13:2506.

      Kahlfuss, S., U. Kaufmann, A.R. Concepcion, L. Noyer, D. Raphael, M. Vaeth, J. Yang, P. Pancholi, M. Maus, J. Muller, L. Kozhaya, A. Khodadadi-Jamayran, Z. Sun, P. Shaw, D. Unutmaz, P.B. Stathopulos, C. Feist, S.B. Cameron, S.E. Turvey, and S. Feske. 2020. STIM1-mediated calcium influx controls antifungal immunity and the metabolic function of nonpathogenic Th17 cells. EMBO molecular medicine 12:e11592.

      Levin, M. 2021. Bioelectric signaling: Reprogrammable circuits underlying embryogenesis, regeneration, and cancer. Cell 184:1971-1989.

      Nagy, E., G. Mocsar, V. Sebestyen, J. Volko, F. Papp, K. Toth, S. Damjanovich, G. Panyi, T.A. Waldmann, A. Bodnar, and G. Vamosi. 2018. Membrane Potential Distinctly Modulates Mobility and Signaling of IL-2 and IL-15 Receptors in T Cells. Biophys J 114:2473-2482.

      Quill, H., and R.H. Schwartz. 1987. Stimulation of normal inducer T cell clones with antigen presented by purified Ia molecules in planar lipid membranes: specific induction of a long-lived state of proliferative nonresponsiveness. Journal of immunology (Baltimore, Md. : 1950) 138:3704-3712.

      Schmitt, E.G., and C.B. Williams. 2013. Generation and function of induced regulatory T cells. Frontiers in immunology 4:152.

      Sugiura, A., G. Andrejeva, K. Voss, D.R. Heintzman, X. Xu, M.Z. Madden, X. Ye, K.L. Beier, N.U. Chowdhury, M.M. Wolf, A.C. Young, D.L. Greenwood, A.E. Sewell, S.K. Shahi, S.N. Freedman, A.M. Cameron, P. Foerch, T. Bourne, J.C. Garcia-Canaveras, J. Karijolich, D.C. Newcomb, A.K. Mangalam, J.D. Rabinowitz, and J.C. Rathmell. 2022. MTHFD2 is a metabolic checkpoint controlling effector and regulatory T cell fate and function. Immunity 55:65-81.e69.

      Vaeth, M., and S. Feske. 2018. Ion channelopathies of the immune system. Current opinion in immunology 52:39-50.

      Vanasek, T.L., S.L. Nandiwada, M.K. Jenkins, and D.L. Mueller. 2006. CD25+Foxp3+ regulatory T cells facilitate CD4+ T cell clonal anergy induction during the recovery from lymphopenia. Journal of immunology (Baltimore, Md. : 1950) 176:5880-5889.

      Wang, Y., A. Tao, M. Vaeth, and S. Feske. 2020. Calcium regulation of T cell metabolism. Current opinion in physiology 17:207-223.

      Yu, W., Z. Wang, X. Yu, Y. Zhao, Z. Xie, K. Zhang, Z. Chi, S. Chen, T. Xu, D. Jiang, X. Guo, M. Li, J. Zhang, H. Fang, D. Yang, Y. Guo, X. Yang, X. Zhang, Y. Wu, W. Yang, and D. Wang. 2022. Kir2.1-mediated membrane potential promotes nutrient acquisition and inflammation through regulation of nutrient transporters. Nature communications 13:3544.

      Zheng, S.G., J.D. Gray, K. Ohtsuka, S. Yamagiwa, and D.A. Horwitz. 2002. Generation ex vivo of TGF-beta-producing regulatory T cells from CD4+CD25- precursors. Journal of immunology (Baltimore, Md. : 1950) 169:4183-4189.

    2. eLife Assessment

      The neurotrophic factor Neuritin can moderate T-cell tolerance and immunity through both regulatory T (Treg) and effector T cells, promoting Treg cell expansion and suppression while dampening effector T cells to mediate the inflammatory response. Neuritin expression influences the membrane potential, ion channels, and nutrient transporter expression patterns of CD4+ T cells, contributing to differential metabolic states in Treg and effector T cells. These findings are solid and important for understanding immune regulation involving Treg cells and effector T cells.

    3. Reviewer #1 (Public review):

      The manuscript by Yu et al seeks to investigate the role of neuritin (Nrn1), identified as a marker of anergic cells, in the biology of regulatory (Tregs) and conventional (Tconv) T cells. Although the role of Nrn1 expressed by Tregs has already been explored (Gonzalez-Figueroa 2021 cited in the manuscript), this manuscript shows original new data suggesting that this molecule would be important in promoting Treg function and inhibiting Tconv effector function by acting at the level of membrane potential and molecule transport across the plasma membrane. However, multiple models have been used, but none has been studied thoroughly enough to provide really conclusive and unambiguous data. For example, 5 different models were used to study T cells in vivo. It would have been preferable to use fewer, but to go further in the study of mechanisms. In the absence of more in-depth study, the conclusions drawn by the authors are often open to questions. Major points concern the fact that there are not enough biological replicates for most experiments and some critical controls and data are lacking. Also, the authors have used iTregs rather than nTregs for many experiments (see below). This is unfortunate because the role of neuritin in T cell biology studied here is new and interesting.

      Major points (in the order in which they appear in the text).

      (1) A real weakness of this work is that most of the results shown are based on few biological replicates, with differences between Ctrl and Nrn1-/- that are often small. The systematic use of Student's t-test may lead readers to think that the differences are significant, which is often misleading: with so few samples, it is impossible to know whether the distributions are Gaussian and therefore whether a parametric test can be used. The bulk RNAseq data are based on biological duplicates, which is open to criticism.

      (2) The authors use Nrn1+/+ and Nrn1+/- cells indiscriminately as control cells on the basis of similar biology between Nrn1+/+ and Nrn1+/- cells at homeostasis. However, it is quite possible that the Nrn1+/- cells have a phenotype in situations of in vitro activation or in vivo inflammation (cancer, EAE). It would be important to discriminate Nrn1+/- and Nrn1+/+ cells in the data or to show that both cell types have the same phenotype in these conditions too.

      (3) Fig 1A-D. Since the authors are using the Nrn1 KO mice, it would be important to confirm the specificity of the anti-Nrn1 mAb by FACS. Once verified, it would be important to add FACS results with this mAb in Figs 1A-C to have single-cell and quantitative data as well.

      (4) Fig 1E-H. The authors assume that this immunization protocol induces anergic cells, but they provide no experimental evidence for this. It would be useful to show that T cells are indeed anergic in this model, especially those that are OVA-specific. The lack of IL-2 production by Ctrl cells could be explained by the presence of fewer OVA-specific cells, rather than by an anergic status.

      (5) Fig 2A-C and Fig 3. The use of iTregs to try to understand what is happening in vivo is problematic. iTregs are cells that probably have no equivalent in vivo, and so may have no physiological relevance. In any case, they are different from pTreg cells generated in vivo. Working with pTreg may be challenging, which is why I would suggest generating data with purified nTreg.

      (6) Fig 2D-L. The model is designed to study the role of Nrn1 in nTreg. However, the % of Foxp3+ among CD45.2 nTreg cells fell to 5-15% of CD4+ cells (Fig 2F). Since we do not know the % of Foxp3+ cells among the injected cells, we do not know whether this very low % is due to very high Treg instability or to preferential expansion of contaminating Tconvs. It is possible that the % of Tconv contaminants is high, since Treg were sorted using beads and not FACS in some experiments. As it is very likely that there are Tconv contaminants that would be Nrn1-/- in the group transferred with Nrn1-/- "nTreg", the higher tumor rejection could be due to an overactivation of Nrn1-/- Tconvs (rather than a defect in Nrn1-/- Treg function).

    4. Reviewer #2 (Public review):

      Summary:

      This manuscript explores the role of Nrn1 in T cell tolerance. A previous study has demonstrated that Nrn1 is up-regulated in the Tfr fraction of Foxp3+ T regulatory cells. These authors now confirm expression of Nrn1 in iTregs as well as report here that Nrn1 is also greatly over-expressed in anergic CD4 T cells, and this is the stepping off point for this investigation.

      Most remarkably, experiments show that anergy induction is defective when T cells cannot express Nrn1. Furthermore, differentiation to a Foxp3+ iTreg phenotype is inhibited in the absence of Nrn1, and the iTregs that do develop appear functionally defective. On the other hand, the differentiation and expansion of Teff cells appear to be enhanced following deletion of Nrn1. With such defects in anergy induction as well as dysregulated Treg and Teff cell survival and function, autoreactive effector T cell activation becomes unrestrained and Nrn1-/- mice are more susceptible to severe EAE development.

      Strengths:

      The characterizations of T cell Nrn1 expression both in vitro and in vivo are comprehensive and convincing. The authors' use of both Nrn1-/- T cells and an anti-Nrn1 neutralizing Ab to achieve similar results is a strength. The in vivo functional studies of anergy development, Treg suppression, and EAE development are also well performed and strengthen the notion that Nrn1 is an important regulator of CD4 responsiveness.

      Weaknesses:

      The major weakness of this study stems from the lack of a clear molecular mechanism involving Nrn1. Previous studies of Nrn1 have suggested its role as a soluble molecule involved in intercellular communication, perhaps influencing cellular ion channel function and/or triggering downstream NFAT and mTOR activation. However, a unique receptor for Nrn1 has not been discovered and it remains unclear whether it acts in a cell-intrinsic or cell-extrinsic fashion for any particular cell type.

      Data shown here provide evidence for alterations in the electrical and metabolic state of iTreg and Teff cells when the Nrn1 gene is deleted. Nrn1-/- Tregs and Teff cells each express a unique pattern of genes associated with Neurotransmitter receptor, Metal ion transmembrane transport, Amino acid transport, and mTORC1 signaling activities, different than that seen in wild-type mice. It remains unclear how Nrn1 reinforces the membrane potential and facilitates aerobic glycolysis during and after iTreg differentiation, and yet suppresses the membrane potential and restrains aerobic glycolysis during Teff cell differentiation. Importantly, naive cells lacking Nrn1 expression show normal electrical and metabolic behaviors.

    1. eLife Assessment

      In this valuable study, Li and others identified cell membrane receptors for juvenile hormone (JH), a terpenoid hormone in insects that regulates their development and reproduction. While intracellular receptors for JH have been well characterized, membrane receptors for JH remained elusive for many years. Although the authors provide solid evidence to indicate that the receptor tyrosine kinases they identified bind to JH in vitro and induce non-genomic responses in cultured cells, their loss-of-function phenotypes are not consistent with known JH functions, so additional work is required to define physiological roles of these receptors.

    2. Reviewer #1 (Public review):

      Summary:

      Juvenile Hormone (JH) plays a key role in insect development and physiology. Although the intracellular receptor for JH was identified long ago, a number of studies have shown that part of JH functions should be fulfilled through binding to an unknown membrane receptor, which was proposed to belong to the RTK family. In this study, the authors screened all RTKs from the H. armigera genome for their ability to mediate responses to JH III treatment both in cultured cells and in developing animals. They also present convincing evidence that CAD96CA and FGFR1 directly bind JH III, and that their role might be conserved in other insect species.

      Strengths:

      Altogether, the experimental approach is very complete and elegant, providing evidence for the role of CAD96CA and FGFR1 in JH signalling using different techniques and in different contexts. I believe that this work will open new perspectives to study the role of JH and better understand what is the contribution of signalling through membrane receptors for JH-dependent developmental processes.

      Weaknesses:

      Unfortunately, the revised manuscript does not show significant improvement. While the identification of the receptors is highly convincing, important issues about the biological relevance remain unaddressed.

      First, the main point I raised about the first version of this article is that the redundancy and/or specificity of the two receptors should be clarified, even though I understand that it cannot be deeply investigated here. I believe that this point, shared by all reviewers, is highly relevant for the scope of this work. In this revised version, it is still unclear how to reconcile gain and loss-of-function experiments and the different expression profiles of the receptors.

      Second, the newly added explanations and pieces of discussion provided about the mild in vivo phenotypes of early pupation upon Cad96ca or Fgfr1 knock-out do not clarify the issue but instead put emphasis on methodological issues. Indeed, it is not clear whether the mild phenotypes reflect the biological role of Cad96ca and Fgfr1, or the redundancy of these two RTKs (and/or others), or some issue with the knock-out strategy (partial efficiency, mosaicism...).

      Finally, parts of the updated discussion and the modifications to the figures are confusing.

    3. Reviewer #2 (Public review):

      Summary:

      Juvenile hormone (JH) is a pleiotropic terpenoid hormone in insects that mainly regulates their development and reproduction. In particular, its developmental functions are described as the "status quo" action, as its presence in the hemolymph (the insect blood) prevents metamorphosis-initiating effects of ecdysone, another important hormone in insect development, and maintains the juvenile status of insects.

      While such canonical functions of JH are known to be mediated by its intracellular receptor complex composed of Met and Tai, there have been multiple reports suggesting the presence of cell membrane receptor(s) for JH, which mediate non-genomic effects of this terpenoid hormone. In particular, the presence of receptor tyrosine kinase(s) that phosphorylate Met/Tai in response to JH and thus indirectly affect the canonical JH signaling pathway has been strongly suggested. Given the importance of JH in insect physiology and the fact that the JH signaling pathway is a major target of insect growth regulators, elucidating the identity and functions of putative JH membrane receptors is of great significance from both basic and applied perspectives.

      In the present study, the authors identified candidate receptors for such cell membrane JH receptors, CAD96CA and FGFR1, in the cotton bollworm Helicoverpa armigera.

      Strengths:

      Their in vitro analyses are conducted thoroughly using multiple methods, which overall supports their claim that these receptors can bind to JH and mediate their non-genomic effects.

      Weaknesses:

      Results of their in vivo experiments, particularly those of their loss-of-function analyses using CRISPR mutants are still preliminary, and the results rather indicate that these membrane receptors do not have any physiologically significant roles in vivo. More specifically, previous studies in lepidopteran species have clearly and repeatedly shown that precocious metamorphosis is the hallmark phenotype for all JH signaling-deficient larvae. In contrast, the present study showed that Cad96ca and Fgfr1 G0 mutants only showed slight acceleration in their pupation timing, which is not a typical phenotype one would expect from JH signaling deficiency. This is inconsistent with their working model provided in Figure 6, which indicates that these cell membrane JH receptors promote the canonical JH signaling by phosphorylating Met/Tai.

      If the authors argue that this slight acceleration of pupation is indeed a major JH signaling-deficient phenotype in Helicoverpa, they need to provide more data to support their claim by analyzing CRISPR mutants of other genes involved in JH signaling, such as Jhamt and Met. An alternative explanation is that there is functional redundancy between CAD96CA and FGFR1 in mediating phosphorylation of Met/Tai. This possibility can be tested by analyzing double knockouts of these two receptors.

      Currently, the validity of their calcium imaging analysis in Figure 5 is also questionable. When performing calcium imaging in cultured cells, it is critically important to treat all the cells at the end of each experiment with a hormone or other chemical reagents that universally induce calcium increase in each particular cell line. Without such positive control, the validity of calcium imaging data remains unknown, and readers cannot properly evaluate their results.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, Li et al. identified CAD96CA and FGFR1 among 20 receptor tyrosine kinases as mediators of JH signaling. By performing a screen in HaEpi cells with overactivated JH signaling, the authors pinpointed two main RTKs that contribute to the transduction of JH. Using the CRISPR/Cas9 system to generate mutants, the authors confirmed that these RTKs are required for normal JH activation, as precocious pupariation was observed in their absence. Additionally, the authors demonstrated that both CAD96CA and FGFR1 exhibit a high affinity for JH, and their activation is necessary for the proper phosphorylation of Tai and Met, transcription factors that promote the transcriptional response. Finally, the authors provided evidence suggesting that the function of CAD96CA and FGFR1 as JH receptors is conserved across insects.

      Strengths:

      The data provided by the authors are convincing and support the main conclusions of the study, providing ample evidence to demonstrate that phosphorylation of the transducers Met and Tai mainly depends on the activity of two RTKs. Additionally, the binding assays conducted by the authors support the function of CAD96CA and FGFR1 as membrane receptors of JH. The study's results validate, at least in H. armigera, the predicted existence of membrane receptors for JH.

      Weaknesses:

      The authors have used CRISPR/Cas9 to provide evidence that the CAD96CA and FGFR1 RTK receptors contribute to JH signaling: their loss induces precocious metamorphosis, although not to the same extent as the absence of JH. Therefore, it still remains unclear whether these RTKs are completely required for pathway activation or only necessary for high activation levels during the last larval stage.

      While the authors have included some additional data, the mechanism by which different RTKs function in transducing JH signaling in a tissue specific manner is still unclear. As the authors note in the discussion, it is possible that other RTKs may also play a role in facilitating the transduction of JH signaling.

      Lastly, the study does not yet explain how RTKs with known ligands could also bind JH and contribute to JH signaling activation. Although receptor promiscuity has been suggested as a possible mechanism, future studies could explore whether activation of RTK pathways by their known ligands induces certain levels of JH transducer phosphorylation, which, in the presence of JH, could contribute to full pathway activation without the need for direct JH-RTK binding.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Juvenile Hormone (JH) plays a key role in insect development and physiology. Although the intracellular receptor for JH was identified long ago, a number of studies have shown that part of JH functions should be fulfilled through binding to an unknown membrane receptor, which was proposed to belong to the RTK family. In this study, the authors screened all RTKs from the H. armigera genome for their ability to mediate responses to JH III treatment both in cultured cells and in developing animals. They also present convincing evidence that CAD96CA and FGFR1 directly bind JH III, and that their role might be conserved in other insect species.

      Strengths:

      Altogether, the experimental approach is very complete and elegant, providing evidence for the role of CAD96CA and FGFR1 in JH signalling using different techniques and in different contexts. I believe that this work will open new perspectives to study the role of JH and better understand what is the contribution of signalling through membrane receptors for JH-dependent developmental processes.

      Weaknesses:

      I don't see major weaknesses in this study. However, I think that the manuscript would benefit from further information or discussion regarding the relationship between the two newly identified receptors. Experiments (especially in HEK-293T cells) suggest that CAD96CA and FGFR1 are sufficient on their own to transduce JH signalling. However, they are also necessary since loss-of-function conditions for each of them are sufficient to trigger strong effects (while the other is supposed to be still present).

      Thank you for the suggestion. We have added the following discussion to the text: "CAD96CA and FGFR1 have similar functions in JH signaling, including transmitting the JH signal for Kr-h1 expression, maintaining larval status, the rapid intracellular calcium increase, phosphorylation of the transcription factors MET1 and TAI, and high affinity for JH III. CAD96CA and FGFR1 are essential in the JH signal pathway, and loss-of-function for each is sufficient to trigger strong effects on pupation. The difference is that CAD96CA expression has no tissue specificity, whereas the Fgfr1 gene is highly expressed in the midgut; possibly, it plays a significant role in the midgut. Another possibility is that they play roles by forming heterodimers with each other or with other RTKs, which needs to be addressed in future studies. CAD96CA and FGFR1 transmit JH III signals in three different insect cell lines, suggesting their conserved roles in other insects.".

      In addition, despite showing different expression patterns, the two receptors seem to display similar developmental functions according to loss-of-function phenotypes. It is therefore unclear how to draw a model for membrane receptor-mediated JH signalling that includes both CAD96CA and FGFR1.

      Thank you for your question. We have modified the figure and the legends to make the concept clear.

      Reviewer #2 (Public Review):

      Summary:

      Juvenile hormone (JH) is a pleiotropic terpenoid hormone in insects that mainly regulates their development and reproduction. In particular, its developmental functions are described as the "status quo" action, as its presence in the hemolymph (the insect blood) prevents metamorphosis-initiating effects of ecdysone, another important hormone in insect development, and maintains the juvenile status of insects. While such canonical functions of JH are known to be mediated by its intracellular receptor complex composed of Met and Tai, there have been multiple reports suggesting the presence of cell membrane receptor(s) for JH, which mediate non-genomic effects of this terpenoid hormone. In particular, the presence of receptor tyrosine kinase(s) that phosphorylate Met/Tai in response to JH and thus indirectly affect the canonical JH signaling pathway has been strongly suggested. Given the importance of JH in insect physiology and the fact that the JH signaling pathway is a major target of insect growth regulators, elucidating the identity and functions of putative JH membrane receptors is of great significance from both basic and applied perspectives. In the present study, the authors identified candidate receptors for such cell membrane JH receptors, CAD96CA and FGFR1, in the cotton bollworm Helicoverpa armigera.

      Strengths:

      Their in vitro analyses are conducted thoroughly using multiple methods, which overall supports their claim that these receptors can bind to JH and mediate their non-genomic effects.

      Weaknesses:

      Results of their in vivo experiments, particularly those of their loss-of-function analyses using CRISPR mutants are still preliminary, and the results rather indicate that these membrane receptors do not have any physiologically significant roles in vivo. More specifically, previous studies in lepidopteran species have clearly and repeatedly shown that precocious metamorphosis is the hallmark phenotype for all JH signaling-deficient larvae. In contrast, the present study showed that Cad96ca and Fgfr1 G0 mutants only showed a slight acceleration in their pupation timing, which is not a typical phenotype one would expect from JH signaling deficiency. This is inconsistent with their working model provided in Figure 6, which indicates that these cell membrane JH receptors promote the canonical JH signaling by phosphorylating Met/Tai.

      If the authors argue that this slight acceleration of pupation is indeed a major JH signaling-deficient phenotype in Helicoverpa, they need to provide more data to support their claim by analyzing CRISPR mutants of other genes involved in JH signaling, such as Jhamt and Met. An alternative explanation is that there is functional redundancy between CAD96CA and FGFR1 in mediating phosphorylation of Met/Tai. This possibility can be tested by analyzing double knockouts of these two receptors.

      Thank you for your question and suggestion. The cadherin 96ca (CAD96CA) and fibroblast growth factor receptor 1 (FGFR1) were finally determined as JH cell membrane receptors by their roles in JH-regulated gene expression, maintaining larval status, the JH-induced rapid increase of intracellular calcium levels, JH-induced phosphorylation of MET and TAI, and their JH-binding affinity. Their roles as JH cell membrane receptors were further confirmed by knocking them down and knocking them out in vivo and in cell lines, and by overexpressing them heterologously in mammalian HEK-293T cells. Figure 6 was drawn on the basis of this solid evidence.

      The slight acceleration of pupation in Cad96ca and Fgfr1 G0 mutants is one line of evidence of deficient JH signaling. Other evidence includes the changed expression of a set of genes and the block of the JH-induced rapid increase in intracellular calcium.

      Kr-h1 is a typical indicator gene downstream of Jhamt in JH signaling, so we used it as an indicator to examine JH signaling. Jhamt, Met, or other genes might be affected in Cad96ca and Fgfr1 G0 mutants, which can be examined in future studies.

      We have discussed why Cad96ca and Fgfr1 G0 mutants only showed a slight acceleration in their pupation timing: "Homozygous Cad96ca null Drosophila die at late pupal stages (Wang et al., 2009). However, we found that 86% of the larvae of the Cad96ca mutant successfully pupated in the G0 generation, although earlier than the control. Similarly, null mutation of Fgfr1 or Fgfr2 in mouse is embryonic lethal (Arman et al., 1998; Deng et al., 1994; Yamaguchi et al., 1994). In D. melanogaster, homozygous Htl (Fgfr) mutant embryos also die during late embryogenesis (Beati et al., 2020; Beiman et al., 1996; Gisselbrecht et al., 1996). However, in H. armigera, 91% of larvae successfully pupated in the G0 generation after Fgfr1 knockout. The low death rate after Cad96ca and Fgfr1 knockout might have several causes: the editing efficiency (67% and 61% for the Cad96ca mutant and the Fgfr1 mutant, respectively), the chimerism of the gene knockout in the G0 generation, and redundant RTKs that play similar roles in JH signaling, similar to the redundant roles of MET and Germ-cell expressed bHLH-PAS (GCE) in JH signaling (Liu et al., 2009). Resolving this will require viable G1 homozygous mutants and double knockout of these two receptors in future studies. We did observe that the eggs did not hatch successfully after mixed mating of G0 Cad96ca mutants or Fgfr1 mutants, respectively, but the reason was not addressed further because of the embryonic death. For similar reasons, most of the Cad96ca and Fgfr1 mutants showed a slight acceleration of pupation (about one day) without the typical precocious metamorphosis phenotype (at least one instar earlier) caused by JH signaling defects (Daimon et al., 2012; Fukuda, 1944; Riddiford et al., 2010) and JH pathway gene deletions (Abdou et al., 2011; Liu et al., 2009). On the other hand, JH can regulate gene transcription by diffusing into cells and binding to the intracellular receptor MET to conduct the JH signal, which might affect the results of gene knockdown and knockout.".

      Currently, the validity of their calcium imaging analysis in Figure 5 is also questionable. When performing calcium imaging in cultured cells, it is critically important to treat all the cells at the end of each experiment with a hormone or other chemical reagents that universally induce calcium increase in each particular cell line. Without such positive control, the validity of calcium imaging data remains unknown, and readers cannot properly evaluate their results.

      Thank you for your question. For Figure 5, our goal was to demonstrate that JH can induce calcium mobilization through CAD96CA and FGFR1. Controls have been established between different experimental groups within the same cell, as well as between different cells. Adding a positive-control group would make the results more complex.

      Reviewer #3 (Public Review):

      Summary:

      In this study, Li et al. identified CAD96CA and FGFR1 among 20 receptor tyrosine kinases as mediators of JH signaling. By performing a screen in HaEpi cells with overactivated JH signaling, the authors pinpointed two main RTKs that contribute to the transduction of JH. Using the CRISPR/Cas9 system to generate mutants, the authors confirmed that these RTKs are required for normal JH activation, as precocious pupariation was observed in their absence. Additionally, the authors demonstrated that both CAD96CA and FGFR1 exhibit a high affinity for JH, and their activation is necessary for the proper phosphorylation of Tai and Met, transcription factors that promote the transcriptional response. Finally, the authors provided evidence suggesting that the function of CAD96CA and FGFR1 as JH receptors is conserved across insects.

      Strengths:

      The data provided by the authors are convincing and support the main conclusions of the study, providing ample evidence to demonstrate that phosphorylation of the transducers Met and Tai mainly depends on the activity of two RTKs. Additionally, the binding assays conducted by the authors support the function of CAD96CA and FGFR1 as membrane receptors of JH. The study's results validate, at least in H. armigera, the predicted existence of membrane receptors for JH.

      Weaknesses:

      The study has several weaknesses that need to be addressed. Firstly, it is not clear what criteria were used by the authors to discard several other RTKs that were identified as repressors of JH signaling. For example, while NRK and Wsck may not fulfill all the requirements to become JH receptors, other evidence, such as depletion analysis and target gene expression, suggests they are involved in proper JH signaling activation.

      Thank you for your question. We screened the RTKs sequentially. We first examined the roles of the 20 RTKs identified in the H. armigera genome in JH-regulated gene expression to obtain primary candidates, and then screened the candidates by their roles in maintaining larval status, the JH-induced rapid increase in intracellular calcium levels, JH-induced phosphorylation of MET and TAI, and affinity for JH. WSCK was not involved in the phosphorylation of MET and TAI and was discarded during subsequent screening. NRK did not bind to JH III, so it did not meet the screening criteria and was also discarded.

      We added this information to the Introduction: "We screened the RTKs sequentially, including examining the roles of 20 RTKs identified in the H. armigera genome in JH regulated-gene expression to obtain primary candidates, followed by screening of the candidates by their roles in maintaining larval status, JH induced-rapid increase of intracellular calcium levels, JH induced-phosphorylation of MET and TAI, and affinity to JH. The cadherin 96ca (CAD96CA) and fibroblast growth factor receptor 1 (FGFR1) were finally determined as JH cell membrane receptors by their roles in JH regulated-gene expression, maintaining larval status, JH induced-rapid increase of intracellular calcium levels, JH induced-phosphorylation of MET and TAI, and their JH-binding affinity. Their roles as JH cell membrane receptors were further determined by knockdown and knockout of them in vivo and cell lines, and overexpression of them in mammal HEK-293T heterogeneously.".

      We expanded the Discussion: "This study found six RTKs that respond to JH induction by participating in JH induced-gene expression and intracellular calcium increase, however; they exert different functions in JH signaling, and finally CAD96CA and FGFR1 are determined as JH cell membrane receptors by their roles in JH induced-phosphorylation of MET and TAI and binding to JH III. We screen the RTKs transmitting JH signal primarily by examining some of JH induced-gene expression. By examining other genes or by other strategies to screen the RTKs might find new RTKs functioning as JH cell membrane receptors; however, the key evaluation indicators, such as the binding affinity of the RTKs to JH and the function in transmitting JH signal to maintain larval status are essential.".

      Secondly, the expression of the six RTKs, which, when knocked down, were able to revert JH signaling activation, was mainly detected in the last larval stage of H. armigera. However, since JH signaling is active throughout larval development, it is unclear whether these RTKs are completely required for pathway activation or only needed for high activation levels at the last larval stage.

      Thank you for the question. We knocked down the genes at the last larval stage to observe pupation, which is a relatively simple and easily observed readout for examining the role of a gene in JH-maintained larval status. The results from the CRISPR/Cas9 experiments showed: "Most wild-type larvae showed a phenotype of pupation on time. However, in the Cad96ca mutant, 86% of the larvae (an editing efficiency of 67% by TA clone analysis) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 24 h earlier. In the Fgfr1 mutant, 91% of the larvae (an editing efficiency of 61%) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 23 h earlier (Figure 4D and E). The data suggested that CAD96CA and FGFR1 support larval growth and prevent pupation in vivo.".

      Additionally, the mechanism by which different RTKs exert their functions in a specific manner is not clear. According to the expression profile of the different RTKs, one might expect some redundant role of those receptors. In fact, the lack of reversion of the phosphorylation of Tai and Met upon depletion of Wsck in cells with overactivated JH signaling seems to support this idea.

      Nevertheless, and despite the overlapping expression of the different receptors, all RTKs seem to be required for proper pathway activation, even in the case of FGFR1, which seems to be expressed only in the midgut. This is an intriguing point left unresolved in the study.

      Thank you for your comments. Yes, our study indicates that different RTKs exert their functions in a specific manner. We have expanded the Discussion: "This study found six RTKs that respond to JH induction by participating in JH induced-gene expression and intracellular calcium increase, however; they exert different functions in JH signaling, and finally CAD96CA and FGFR1 are determined as JH cell membrane receptors by their roles in JH induced-phosphorylation of MET and TAI and binding to JH III. We screen the RTKs transmitting JH signal primarily by examining some of JH induced-gene expression. By examining other genes or by other strategies to screen the RTKs might find new RTKs functioning as JH cell membrane receptors; however, the key evaluation indicators, such as the binding affinity of the RTKs to JH and the function in transmitting JH signal to maintain larval status are essential.".

      Finally, the study does not explain how RTKs with known ligands could also bind JH and contribute to JH signaling activation. In Drosophila, FGFR1 is activated by pyramus and thisbe for mesoderm development, while CAD96CA is activated by collagen during wound healing. Now the authors claim that, in addition to these ligands, the receptors also bind to JH. However, it is unclear whether these RTKs are activated by JH independently of their known ligands, which would suggest a specific binding site for JH, or whether they are induced by JH only when those ligands are present, in a synergistic manner. Alternatively, activation of the RTK pathways by their known ligands may induce a certain level of JH transducer phosphorylation, which, in the presence of JH, contributes to full pathway activation without JH-RTK binding being necessary.

      Thank you for your professional questions. Exploring the molecular mechanism by which multiple ligands transmit signals through the same receptor is exciting and challenging. It requires a long-term research plan and in-depth studies. We added discussion in the text: "CAD96CA (also known as Stitcher, Ret-like receptor tyrosine kinase) activates upon epidermal wounding in Drosophila embryos (Tsarouhas et al., 2014) and promotes growth and suppresses autophagy in the Drosophila epithelial imaginal wing discs (O'Farrell et al., 2013). There is a CAD96CA in the genome of the H. armigera, which is without function study. Here, we reported that CAD96CA prevents pupation by transmitting JH signal as a JH cell membrane receptor. We also showed that CAD96CA of other insects has a universal function of transmitting JH signal to trigger Ca2+ mobilization, as demonstrated by the study in Sf9 cell lines of S. frugiperda and S2 cell lines of D. melanogaster.

      FGFRs control cell migration and differentiation in the developing embryo of D. melanogaster (Muha and Muller, 2013). The ligand of FGFR is FGF in D. melanogaster (Du et al., 2018). FGF binds FGFR and triggers cell proliferation, differentiation, migration, and survival (Beenken and Mohammadi, 2009; Lemmon and Schlessinger, 2010). Three FGF ligands and two FGF receptors (FGFRs) are identified in Drosophila (Huang and Stern, 2005). The Drosophila FGF-FGFR interaction is specific. Different ligands have different functions. The activation of FGFRs by specific ligands can affect specific biological processes (Kadam et al., 2009). The FGFR in the membrane of Sf9 cells can bind to Vip3Aa (Jiang et al., 2018). One FGF and one FGFR are in the H. armigera genome, which has yet to be studied functionally. The study found that FGFR prevents insect pupation by transmitting JH signal as a JH cell membrane receptor. Exploring the molecular mechanism and output by which multiple ligands transmit signals through the same receptor is exciting and challenging.".

      Reviewer #1 (Recommendations For The Authors):

      As an experimental suggestion, I will only propose that authors test the double knock-down/knock-out or overexpression of CAD96CA and FGFR1 to give some hints into how redundant/independent the two receptors are.

      Thank you very much for your professional advice. We agree that a double knockout of CAD96CA and FGFR1 is very important for resolving whether the two receptors act redundantly or independently, and that it would make our research more complete. Unfortunately, due to experimental difficulty and time constraints, we did not perform these supplementary experiments. In this study, we aimed to screen for the cell membrane receptors of JH, so we focused on which RTKs can function as receptors. This article is a preliminary study to identify the cell membrane receptors of JH. We will conduct in-depth research on the relationship between the two membrane receptors in future work.

      Apart from that, here are some minor points about the manuscript:

      Figure 2A: changing the scale on the y-axis would help to better see the different genotypes (similar to the way it is presented in Figure 5).

      Thank you for the reminder. We have changed the scale in Figure 2A.

      Figure 4J: image settings could be improved to better highlight the green fluorescence.

      Thank you for your advice. We have improved the image in Figure 4J.

      In general, the manuscript would benefit from some proofreading since a number of sentences are incorrect.

      Thank you for the reminder. We have carefully revised the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      (1) Although the authors note that there are 21 RTK genes in Drosophila (line 55), I can only see 16 Drosophila RTKs in Figure 1 - Figure Supplement 1. Some important Drosophila RTKs such as breathless are missing. The authors need to redraw the phylogenetic tree.

      Thank you for the reminder. We have presented the new phylogenetic tree in Figure 1—figure supplement 1.

      (2) The accelerated pupation phenotype in Cad96ca and Fgfr1 G0 mutants needs to be better described. In particular, it is critical to examine which developmental stage(s) are shortened in these mutant larvae. Refer to a similar study on a JH biosynthetic enzyme in Bombyx (PMID: 22412378) regarding how to describe the developmental timing phenotype.

      Thank you for your advice. We have re-shown Figure 4E and added the explanation in the text: "In 61 survivors of Cas9 protein plus Cad96ca-gRNA injection, 30 mutants were sequenced, and a mutation efficiency was 49.2%. Similarly, in the 65 survivors of Cas9 protein plus Fgfr1-gRNA injection, 35 mutants were sequenced, and a mutation efficiency was 53.8% (Figure 4C). The DNA sequences, deduced amino acids and off–target were analyzed (Figure 4—figure supplement 1). Most wild-type larvae showed a phenotype of pupation on time. However, in the Cad96ca mutant, 86% of the larvae (an editing efficiency of 67% by TA clone analysis) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 24 h earlier. In the Fgfr1 mutant, 91% of the larvae (an editing efficiency of 61%) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 23 h earlier (Figure 4D and E). The data suggested that CAD96CA and FGFR1 support larval growth and prevent pupation in vivo.".

      (3) The editing efficiency described in lines 211-213 is obscure. Does this indicate the percentage of animals with noisy sequencing spectra or the percentage of mutation rates analyzed by TA cloning?

      Thanks for your reminder. We have revised the description in the text: "In 61 survivors of Cas9 protein plus Cad96ca-gRNA injection, 30 mutants were sequenced, and a mutation efficiency was 49.2%. Similarly, in the 65 survivors of Cas9 protein plus Fgfr1-gRNA injection, 35 mutants were sequenced, and a mutation efficiency was 53.8% (Figure 4C). The DNA sequences, deduced amino acids and off–target were analyzed (Figure 4—figure supplement 1). Most wild-type larvae showed a phenotype of pupation on time. However, in the Cad96ca mutant, 86% of the larvae (an editing efficiency of 67% by TA clone analysis) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 24 h earlier. In the Fgfr1 mutant, 91% of the larvae (an editing efficiency of 61%) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 23 h earlier (Figure 4D and E). The data suggested that CAD96CA and FGFR1 support larval growth and prevent pupation in vivo.".
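The mutation-efficiency percentages quoted above can be checked with simple arithmetic. The sketch below assumes that "mutation efficiency" means the number of sequenced mutants divided by the number of injection survivors, which is how the quoted counts appear to be used; the function name is hypothetical and is not taken from the manuscript. The separate TA-clone editing efficiencies (67% and 61%) are a different measure and are not recomputed here.

```python
def mutation_efficiency(mutants: int, survivors: int) -> float:
    """Percentage of injection survivors confirmed as mutants (assumed definition)."""
    if survivors <= 0:
        raise ValueError("survivors must be a positive count")
    return round(100.0 * mutants / survivors, 1)


# Counts quoted in the response: 30 of 61 (Cad96ca) and 35 of 65 (Fgfr1).
cad96ca_eff = mutation_efficiency(30, 61)
fgfr1_eff = mutation_efficiency(35, 65)

print(f"Cad96ca: {cad96ca_eff}%")
print(f"Fgfr1: {fgfr1_eff}%")
```

Under that assumed definition, the computed values agree with the 49.2% and 53.8% reported in the revised text.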

      (4) In Figures 4F and G, the authors examined expression levels of some JH/ecdysone responsive genes only at 0 hr-old 6th instar larvae. This single developmental stage is not enough for this analysis. In particular, the expression level of Fgfr1 only goes up in the mid-6th instar according to their own data (Figure 1-Figure Supplement 4), so it is critical to examine expression levels of these genes at least throughout the 6th larval instar.

      Thank you for your advice. Indeed, it is essential to detect the expression levels of JH/ecdysone response genes throughout the sixth instar. Because we observed that the mutants have a shorter feeding stage in the sixth instar, we examined the expression levels of the JH/ecdysone response genes at the early sixth instar. Because the number of mutants obtained in the experiment was small and non-destructive sampling could not be performed during the sixth instar, there were not enough samples to test. In the future, we will generate Cad96ca Fgfr1 double mutants and detect the expression levels of JH/ecdysone response genes throughout the sixth instar.

      (5) As mentioned above, some important Drosophila RTKs such as breathless are missing in their analyses. As breathless is a close paralog of heartless (Htl), I am sure that Drosophila breathless is also orthologous to Helicoverpa FGFR1. The authors therefore need to analyze breathless in Figure 5B in addition to Htl.

      Thank you for your advice. We added experiments and the results are shown in Figure 5B and Figure 5—figure supplement 1.

      (6) More discussion about the reason why dsNrk and dsWsck can provide resistance to JHIII in Figure 1 is required.

      Thank you for your advice. We added explanation in the discussion: "It is generally believed that the primary role of JH is to antagonize 20E during larval molting (Riddiford, 2008). The knockdown of Cad96ca, Nrk, Fgfr1, and Wsck showed phenotypes resistant to JH III induction and the decrease of Kr-h1 and increase of Br-z7 expression, but knockdown of Vegfr and Drl only decrease Kr-h1, without increase of Br-z7. Br-z7 is involved in 20E-induced metamorphosis in H. armigera (Cai et al., 2014), whereas, Kr-h1 is a JH early response gene that mediates JH action (Minakuchi et al., 2009) and represses Br expression (Riddiford et al., 2010). The high expression of Br-z7 is possible due to the down-regulation of Kr-h1 in Cad96ca, Nrk, Fgfr1 and Wsck knockdown larvae. The different expression profiles of Br-z7 in Vegfr and Drl knockdown larvae suggest other roles of Vegfr and Drl in JH signaling, which need further study."

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors should consider optimizing their experimental approach by depleting the six candidate RTKs in an early larval stage rather than using a sensitized background with JH application in the last larval stage.

      Thank you for your valuable suggestion. We knocked down the genes at the last larval stage to observe pupation, which is a relatively simple and easily observed readout for examining the role of a gene in JH-maintained larval status. The results from the CRISPR/Cas9 experiments showed: "Most wild-type larvae showed a phenotype of pupation on time. However, in the Cad96ca mutant, 86% of the larvae (an editing efficiency of 67% by TA clone analysis) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 24 h earlier. In the Fgfr1 mutant, 91% of the larvae (an editing efficiency of 61%) had a shortened feeding stage in the sixth instar and entered the metamorphic molting stage earlier, showing early pupation, with the pupation time being 23 h earlier (Figure 4D and E). The data suggested that CAD96CA and FGFR1 support larval growth and prevent pupation in vivo.". Determining the roles of the other RTKs throughout larval development will require many additional experiments and is left for future work.

      (2) Including a positive control for JH signaling, such as met or tai, would strengthen the assays and provide a benchmark for evaluating the downregulation of target genes and phenotype reversion upon JH application. This addition, especially in Figure 1, would enhance the interpretability of the results.

      Thank you for your suggestion. We agree that adding the detection of Met or Tai as a positive control would be valuable. Our laboratory has reported in previous studies that knockdown of Met leads to decreased expression of genes in the JH signaling pathway and precocious pupation (PMID: 24872508), so we did not repeat this experiment in this study. In the future, when performing Cad96ca and Fgfr1 double mutant experiments, a Met mutant can be generated as a control to provide more reference points for interpreting the results.

      (3) I recommend revising the manuscript to improve readability, particularly in the Results section, where descriptions of the binding part are particularly dense.

      Thank you for your advice. We have carefully revised the manuscript.

      (4) In line 122, please add the reference Wang et al., 2016.

      Thank you for the reminder. We have added the reference in line 125 of the new manuscript.

      (5) The authors should clarify why they chose to test the possible binding to JH of only Cad96CA, FGFR1, and NRK after conducting various assays while including OTK in the study as a negative control. This explanation should be included in the text.

      Thank you for the suggestion. We added the explanation, as described in the text: "We screened the RTKs sequentially, including examining the roles of 20 RTKs identified in the H. armigera genome in JH regulated-gene expression to obtain primary candidates, followed by screening of the candidates by their roles in maintaining larval status, JH induced-rapid increase of intracellular calcium levels, JH induced-phosphorylation of MET and TAI, and affinity to JH. The cadherin 96ca (CAD96CA) and fibroblast growth factor receptor 1 (FGFR1) were finally determined as JH cell membrane receptors by their roles in JH regulated-gene expression, maintaining larval status, JH induced-rapid increase of intracellular calcium levels, JH induced-phosphorylation of MET and TAI, and their JH-binding affinity. Their roles as JH cell membrane receptors were further determined by knockdown and knockout of them in vivo and cell lines, and overexpression of them in mammal HEK-293T heterogeneously.".

      "Since Cad96CA, FGFR1, and NRK were not only involved in JH-regulated Kr-h1 expression, JH III-induced delayed pupation, and calcium levels increase, but also involved in MET and TAI phosphorylation, we further analyzed their binding affinity to JH III. OTK did not respond to JH III, so we used it as a control protein on the cell membrane to exclude the possibility of nonspecific binding.".

      (6) The observed embryonic lethality of Cad96ca and Fgfr1 mutants in Drosophila contrasts with the ability of the respective mutants in H. armigera to reach the pupal stage. The authors should discuss this significant difference.

      Thank you for the suggestion. We added the explanation in the discussion, as described in the text: "Homozygous Cad96ca null Drosophila die at late pupal stages (Wang et al., 2009). However, we found that 86% of the larvae of the Cad96ca mutant successfully pupated in G0 generation, although earlier than the control. Similarly, null mutation of Fgfr1 or Fgfr2 in mouse is embryonic lethal (Arman et al., 1998; Deng et al., 1994; Yamaguchi et al., 1994). In D. melanogaster, homozygous Htl (Fgfr) mutant embryos die during late embryogenesis, too (Beati et al., 2020; Beiman et al., 1996; Gisselbrecht et al., 1996). However, in H. armigera, 91% of larvae successfully pupated in G0 generation after Fgfr1 knockout. The low death rate after Cad96ca and Fgfr1 knockout might be because of following reasons, including the editing efficiency (67% and 61% for Cad96ca mutant and Fgfr1 mutant, respectively), the chimera of the gene knockout at the G0 generation, and the redundant RTKs that play similar roles in JH signaling, similar to the redundant roles of MET and Germ-cell expressed bHLH-PAS (GCE) in JH signaling (Liu et al., 2009), which needs to obtain alive G1 homozygote mutants and double knockout of these two receptors in future study. We indeed observed that the eggs did not hatch successfully after mixed-mating of G0 Cad96ca mutant or Fgfr1 mutant, respectively, but the reason was not addressed further due to the embryonic death. By the similar reasons, most of the Cad96ca and Fgfr1 mutants showed a slight acceleration of pupation (about one day) without the typical precocious metamorphosis (at least one instar earlier) phenotype caused by JH signaling defects (Daimon et al., 2012; Fukuda, 1944; Riddiford et al., 2010) and JH pathway gene deletions (Abdou et al., 2011; Liu et al., 2009). 
On other side, JH can regulate gene transcription by diffusing into cells and binding to the intracellular receptor MET to conduct JH signal, which might affect the results of gene knockdown and knockout.".

      (7) Building upon the previous point, it is noteworthy that the Cad96ca and Fgfr1 mutants exhibit only a 24-hour early pupation phenotype, contrasting with the 48-hour early pupation induced by Kr-h1 depletion. This discrepancy suggests that while the function of these RTKs is necessary, it may not be sufficient to fully activate JH signaling. The expression profile of these receptors, primarily observed in the last larval stage, supports this hypothesis.

      Thank you for your suggestion. We added the explanation in the discussion, as described in the text: "Homozygous Cad96ca null Drosophila die at late pupal stages (Wang et al., 2009). However, we found that 86% of the larvae of the Cad96ca mutant successfully pupated in G0 generation, although earlier than the control. Similarly, null mutation of Fgfr1 or Fgfr2 in mouse is embryonic lethal (Arman et al., 1998; Deng et al., 1994; Yamaguchi et al., 1994). In D. melanogaster, homozygous Htl (Fgfr) mutant embryos die during late embryogenesis, too (Beati et al., 2020; Beiman et al., 1996; Gisselbrecht et al., 1996). However, in H. armigera, 91% of larvae successfully pupated in G0 generation after Fgfr1 knockout. The low death rate after Cad96ca and Fgfr1 knockout might be because of following reasons, including the editing efficiency (67% and 61% for Cad96ca mutant and Fgfr1 mutant, respectively), the chimera of the gene knockout at the G0 generation, and the redundant RTKs that play similar roles in JH signaling, similar to the redundant roles of MET and Germ-cell expressed bHLH-PAS (GCE) in JH signaling (Liu et al., 2009), which needs to obtain alive G1 homozygote mutants and double knockout of these two receptors in future study. We indeed observed that the eggs did not hatch successfully after mixed-mating of G0 Cad96ca mutant or Fgfr1 mutant, respectively, but the reason was not addressed further due to the embryonic death. By the similar reasons, most of the Cad96ca and Fgfr1 mutants showed a slight acceleration of pupation (about one day) without the typical precocious metamorphosis (at least one instar earlier) phenotype caused by JH signaling defects (Daimon et al., 2012; Fukuda, 1944; Riddiford et al., 2010) and JH pathway gene deletions (Abdou et al., 2011; Liu et al., 2009). 
On other side, JH can regulate gene transcription by diffusing into cells and binding to the intracellular receptor MET to conduct JH signal, which might affect the results of gene knockdown and knockout.".

      (8) The expression profile of the RTK hits described in Supplementary Figure 4A appears to be limited to the last larval stage until pupation. The authors should clarify whether these receptors are expressed earlier, and the meaning of the letters in the plot should be described in the figure legend.

      Thank you for the suggestion. We added the explanation in the Figure 1—figure supplement 4 legend, as described in the text: "The expression profiles of Vegfr1, Drl, Cad96ca, Nrk, Fgfr1, and Wsck during development. 5F: fifth instar feeding larvae; 5M: fifth instar molting larvae; 6th-6 h to 6th-120 h: sixth instar at 6 h to sixth instar 120 h larvae; P0 d to P8 d: pupal stage at 0-day to pupal stage at 8-day F: feeding stage; M: molting stage; MM: metamorphic molting stage; P: pupae.".

      We are very sorry, but due to time limitations, we will investigate the expression profile of RTK throughout the larval stage in future work.

      (9) In Figure 4, panels F and G, the levels of Kr-h1 are shown in Cad96ca and Fgfr1 mutants in the last larval stage. The authors should indicate whether Kr-h1 levels are also low in earlier larval stages or only detected in the last larval stage, as this would imply that these RTKs are only required at this stage.

      Thank you for your suggestion. In this study, the feeding stage of the Cad96ca and Fgfr1 mutants was shortened in the sixth instar, and they entered the metamorphic molting stage earlier, so we detected the expression of Kr-h1 in the sixth instar. It is an excellent idea to detect the expression of Kr-h1 at various larval stages to determine the stages in which CAD96CA and FGFR1 play a role and to study the relationship between CAD96CA and FGFR1 in the future.

      (10) While Figure 5 demonstrates JH-triggered calcium ion mobilization in Sf9 cells and S2 cells, the authors should also include data on JH signaling target genes, such as Kr-h1, for a more comprehensive analysis.

      Thank you for your advice. We added experiments, as described in the text: "To demonstrate the universality of CAD96CA and FGFR1 in JH signaling in different insect cells, we investigated JH-triggered calcium ion mobilization and Kr-h1 expression in Sf9 cells developed from S. frugiperda and S2 cells developed from D. melanogaster. Knockdown of Cad96ca and Fgfr1 (named Htl or Btl in D. melanogaster), respectively, significantly decreased JH III-induced intracellular Ca2+ release and extracellular Ca2+ influx, and Kr-h1 expression (Figure 5A, B, Figure 5—figure supplement 1A and B). The efficacy of RNAi of Cad96ca and Fgfr1 was confirmed in the cells (Figure 5—figure supplement 1C and D), suggesting that CAD96CA and FGFR1 had a general function to transmit JH signal in S. frugiperda and D. melanogaster.".

      (11) The authors should consider improving the quality of images and some plots, particularly enlarging panels showing larval and pupal phenotypes, such as Figure 1B and Supplementary Figure C. Additionally, adding a plot showing the statistical analysis of the phenotype in Supplementary Figure C would enhance clarity. Some plots are overly busy and difficult to read due to small size, such as Figure 1C, Figure 2A, and all the plots in Figure 3. Figure 4E also requires improvement for better readability.

      Thank you for your suggestion. We have adjusted Figure 1B, Figure 1C, Figure 1—figure supplement 1C, Figure 2A and Figure 4E. However, for Figure 3, we have not found a better way to arrange and adapt them, considering the overall arrangement of the results and the page space, so we keep them in their original state.

    1. eLife Assessment

      This important work presents data showing that all non-proneural phenotypes of the Inhibitor of DNA binding (Id) protein Emc are mediated through inappropriate nonapoptotic caspase activity. Using the developing Drosophila retina as a model the authors show that Emc acts by transcriptionally regulating the Death-Associated Inhibitor of Apoptosis 1 (diap1) gene, which impacts on Notch signaling by caspase-dependent increase of Delta protein. These are compelling findings, interesting for the caspase/apoptosis field as they add more non-apoptotic functions of caspases to the list, as well as for the Id field, which examines how Id proteins inhibit cell differentiation.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The extra macrochaetae (emc) gene encodes the only Inhibitor of DNA binding protein (Id protein) in Drosophila. Its best-known function is to inhibit proneural genes during development. However, the emc mutants also display nonproneural phenotypes. In this manuscript, the authors examined four non-proneural phenotypes of the emc mutants and reported that they are all caused by inappropriate non-apoptotic caspase activity. These non-neuronal phenotypes are: reduced growth of imaginal discs, increased speed of the morphogenetic furrow, and failure to specify R7 photoreceptor neurons and cone cells during eye development. Double mutants between emc and either H99 (which deletes the three pro-apoptotic genes reaper, grim, and hid) or the initiator caspase dronc suppress these mutant phenotypes of emc suggesting that the cell death pathway and caspase activity are mediating these emc phenotypes. In previous work, the authors have shown that emc mutations elevate the expression of ex which activates the SHW pathway (aka the Hippo pathway). One known function of the SHW pathway is to inhibit Yorkie which controls the transcription of the inhibitor of apoptosis, Diap1. Consistently, in emc clones the levels of Diap1 protein are reduced which might explain why caspase activity is increased in emc clones giving rise to the four non-neural phenotypes of emc mutants.

      However, this increased caspase activity is not causing ectopic apoptosis, hence the authors propose that this is nonapoptotic caspase activity. In the last part of the manuscript, the authors ruled out that Wg, Dpp, and Hh signaling are the target of caspases, but instead identified Notch signaling as the target of caspases, specifically the Notch ligand Delta. Protein levels of Delta are increased in emc clones in an H99- and dronc-dependent manner. The authors conclude that caspase-dependent non-apoptotic signaling underlies multiple roles of emc that are independent of proneural bHLH proteins.

      Strengths:

      Overall, this is an interesting manuscript and the findings are intriguing. It adds to the growing number of non-apoptotic functions of apoptotic proteins and caspases in particular. The manuscript is well written and the data are usually convincingly presented.

      Weaknesses:

      (1)  One major concern I have is the observation by the authors in Figure 3C in which protein levels of Diap1 are still reduced in emc H99 double mutant clones. If Diap1 is still reduced in these clones, shouldn't caspases still be derepressed? Given that emc H99 double mutants rescue all emc phenotypes examined, the observation that Diap1 levels are still reduced in emc H99 clones is inconsistent with the authors' model. The authors need to address this inconsistency.

      The effect of H99 emc clones on Diap1 protein levels is consistent with our conclusions.  The reviewer’s concern probably relates to previous work showing that RHG proteins act by antagonizing DIAP1, so that Diap1 is epistatic to RHG (PMID:10481910), and that RHG proteins affect DIAP1 protein levels, in particular that HID promotes DIAP1 ubiquitylation leading to its destruction (PMID:12021767).  First, epistasis means that in the absence of DIAP1, RHG levels do not affect cell survival.  DIAP1 protein is not absent in emc/emc eye clones, however; it is only reduced.  It is not only possible but expected that RHG levels would affect survival when DIAP1 levels are only reduced.  Secondly, we did not see a difference in DIAP1 levels between H99/H99 clones and H99/+ cells within the same specimen, suggesting that rpr, grim and hid might not affect DIAP1 levels.  It is possible that Hid protein only affects DIAP1 levels when overexpressed, as in the aforementioned paper (PMID:12021767), and that physiological RHG levels affect DIAP1 activity.  The H99 deficiency also eliminates Rpr and Grim, which may affect DIAP1 without ubiquitylating it.  In our experiments, however, there are no cells completely wild type for the H99 region for comparison in the same specimen, so our results do not rule out the H99 deletion having a dominant effect on DIAP1 levels both inside and outside the clones.  What our data clearly showed is that emc affected DIAP1 levels independently of any potential RHG effect.  We hypothesized that this was through diap1 transcription, because we showed previously that emc affects yki, a transcriptional regulator of the diap1 gene, but we have not demonstrated transcriptional regulation of diap1 directly in emc clones.  We modified the manuscript to better delineate these issues (lines 275-284).

      (2) Are Diap1 protein levels reduced in all emc clones, including clones anterior to the furrow? This is difficult to see in Figure 3B. It is also recommended to look in emc mosaic wing discs.

      We now mention that DIAP1 levels were reduced only in emc clones posterior to the morphogenetic furrow, not in clones anterior to the morphogenetic furrow or in emc clones in wing imaginal discs (lines 284-5 and Figure 3 supplement 1).

      (3) The authors speculate that Delta may be a direct target of caspase cleavage (Figure 9B), but then rule it out for a good reason. However, I assume that the increased protein levels of Delta in emc clones (Figure 7) are the results of increased transcription. In that case, shouldn't caspases control the transcriptional machinery leading to Delta expression?

      Thank you for suggesting that caspases control the transcription of Dl.  We added this possibility to the manuscript (lines 499-500).  At one time there was a Dl-LacZ transcriptional reporter, which would have made it straightforward to assess Dl transcription in emc clones, but this strain does not seem to exist now.  We have not attempted in situ hybridization to Dl transcripts in mosaic discs.  

      (4) How does caspase activity in emc clones cause reduced growth? Is this also mediated through Delta signaling?

      We do not know which caspase target is responsible for reduced growth in wing discs.

      (5) Figure 1M: Is there a similar result with emc dronc mosaics?

      The emc dronc clones do not show as dramatic a growth advantage in a Minute background.  This is consistent with the smaller effect of emc dronc in the non-Minute background also (Figure 1N).  We mention this in the revised paper (lines 232-3).     

      Reviewer #2 (Public Review):

      Id proteins are thought to function by binding and antagonizing basic helix-loop-helix (bHLH) transcription factors but new findings demonstrate roles for emc including in tissues where no proneural (Drosophila bHLH) genes are known to function. The authors propose a new mechanism for developmental regulation that entails restraining new/novel non-apoptotic functions of apoptotic caspases.

      Specifically, the data suggest that loss of emc leads to reduced expression of diap1 and increased apoptotic caspase activity, which does not induce apoptosis but elevates Delta expression to increase N activity and cause developmental defects. Indeed, many of the phenotypes of emc mutant clones can be rescued by a chromosomal deficiency that reduces caspase activation or by mutations in the initiator caspase Dronc. A related manuscript that shows that loss of emc results in increased da, linked previously to diap1 expression, provides supporting data. There is increasing appreciation that apoptotic caspases have non-apoptotic roles. This study adds to the emerging field and should be of interest to readers.

      The data, for the most part, support the conclusions but I do have concerns about some of the data and the interpretations that should be addressed.

      Reviewer #3 (Public Review):

      The work extends earlier studies on the Drosophila Id protein EMC to uncover a potential pathway that explains several tissue-scale developmental abnormalities in emc mutants. It also describes a non-apoptotic role for caspases in cell biology.

      Strengths:

      The work adds to an emerging new set of functions for caspases beyond their canonical roles as cell death mediators. This novelty is a major strength as well as its reliance on genetic-based in vivo study. The study will be of interest to those who are curious about caspases in general.

      Weaknesses:

      The manuscript relies on imaging experiments using genetic mosaic imaginal discs. It is for the most part a qualitative analysis, showing representative samples with a small number of mutant clones in each. Although the senior author has a long track record of using experiments like this to rigorously discover regulatory mechanisms in this system, it is straightforward in 2023 to use Fiji and other image analysis tools to measure fluorescence. Such measurements could be done for all replicate clones of a given genotype as well as genetic control sampling. These could be presented in plots that would not only provide quantitative and statistical measurements, but would also be more reader-friendly to those who are not fly people.

      We added quantification of anti-Delta and anti-Diap1 levels to the manuscript (Figures 3E and 7E).  We agree that this facilitates statistical confirmation of the results and may be more accessible to non-experts.  We do have concerns that these quantifications might be given too much weight.  For example, we cannot measure the background level of anti-DIAP1 labeling by labeling diap1 null mutant cells, because such cells do not survive.  Although we measure ~20% reduction in emc clones in the eye disc, and none in the wing disc, both measures could be underestimates if some of the labeling is non-specific, as is very possible.  We discuss this in the Methods (lines 166-9).
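
      The concern about underestimation can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that a fixed fraction of the wild-type anti-DIAP1 signal is nonspecific background that is unchanged in clones, and the background fractions used are hypothetical, not measured values. Only the ~20% apparent reduction comes from the response above.

```python
def corrected_reduction(measured_reduction, nonspecific_fraction):
    """Fractional reduction of the specific signal after removing background.

    measured_reduction: apparent fractional drop in total signal inside clones
        (for example 0.20 for the ~20% reduction quoted above).
    nonspecific_fraction: hypothetical share b of the wild-type signal that is
        nonspecific labeling, assumed identical inside and outside clones.

    With wild-type signal W = S + B and B = b * W, the measured drop is
    (S - S') / W, while the drop in specific signal is (S - S') / S,
    which equals measured / (1 - b).
    """
    if not 0.0 <= nonspecific_fraction < 1.0:
        raise ValueError("nonspecific_fraction must be in [0, 1)")
    return measured_reduction / (1.0 - nonspecific_fraction)


if __name__ == "__main__":
    # Hypothetical background fractions, for illustration only.
    for b in (0.0, 0.25, 0.5):
        print(f"background {b:.2f}: specific reduction {corrected_reduction(0.20, b):.3f}")
```

      Under these assumptions, any nonzero background makes the true reduction larger than the measured one, which is the direction of bias described in the response.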

      Likewise, more details are needed to describe how clone areas were measured in Figure 1. Did they measure each clone and its twin spot, and then calculate the area ratio for each clone and its paired twin spot? This would be the correct way to analyze the data, yielding many independent measurements of the ratio. And doing so would obviate the need to log transform the data which is inexplicable unless they were averaging clones and twins within a disc and making replicates. More explanation is needed and if they indeed averaged, then they need to calculate the ratios pairwise for each clone and twin.

      We added details of clone size measurements and analysis to the methods (lines 141-6).  Although it might be useful to compare individual clones and corresponding twin spots, the only rigorous way to associate individual clones with individual twin spots, or even to determine what is one clone and what is one twin spot, is to use recombination rates low enough that significantly less than one recombination occurs per disc.  This would require many more dissections and we did not do this.  We now clarify in the manuscript that the analysis is indeed based on the ratio of total area of clones and twin spots with replicates, and that Log-transformation is to improve the normality of the ratio data suitable for parametric significance testing, not because clones and twin spots were summed from each sample.  We consulted with a statistician over this approach.  
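
      A minimal sketch of the kind of analysis described above: one ratio of total clone area to total twin-spot area per disc, log-transformed before a parametric test. The one-sample t statistic against zero, the function name, and the example areas are assumptions made for illustration; the response does not specify which test was applied.

```python
import math
import statistics


def log_ratio_summary(clone_areas, twin_areas):
    """Summarize per-disc log10(total clone area / total twin-spot area).

    Each list holds one summed area per disc (replicate). A log ratio of 0
    means clones and twin spots grew equally. Returns the geometric mean
    ratio, a one-sample t statistic against 0, and its degrees of freedom.
    """
    if len(clone_areas) != len(twin_areas) or len(clone_areas) < 2:
        raise ValueError("need matched lists with at least two discs")
    logs = [math.log10(c / t) for c, t in zip(clone_areas, twin_areas)]
    mean = statistics.mean(logs)
    sem = statistics.stdev(logs) / math.sqrt(len(logs))
    t_stat = mean / sem if sem > 0 else float("nan")
    return {
        "geometric_mean_ratio": 10 ** mean,
        "t": t_stat,
        "df": len(logs) - 1,
    }


if __name__ == "__main__":
    # Hypothetical summed areas (arbitrary units) for four discs.
    clones = [120.0, 95.0, 150.0, 80.0]
    twins = [400.0, 310.0, 520.0, 300.0]
    print(log_ratio_summary(clones, twins))
```

      Working on the log scale treats a twofold excess and a twofold deficit symmetrically, which is one reason ratio data are often closer to normal after transformation.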

      Reviewer #1 (Recommendations For The Authors):

      Lines 319/320: "Frizzled-3 RFP expression was not changed in in emc clones (Figure 4A)". This was actually not shown in Fig 4A (in fact this result was not shown at all). Fig 4A shows the result for emc nkd3 which the authors incorrectly assigned to Figure 4B (line 324).

      We apologize for labeling Figure 4A and 4B incorrectly.

      The title of Figure 6 is inaccurate. The title does not indicate what is shown in this figure. A more accurate title would be: Notch activity and function in emc mutant clones.

      We provided a new title for Figure 6. 

      Reviewer #2 (Recommendations For The Authors):

      There is no information on how reproducible the data is. How many discs were examined in each experiment and in how many technical or biological replicates? Can fluorescence signals be quantified within and outside the clones and presented to illustrate reproducibility and significance? This is especially needed for Fig 7, which shows key data that N ligand Delta is elevated in emc clones but dronc and H99 mutations rescue this phenotype. I can see that the Dl signal is brighter in the GFP- emc clone in Fig 7B but I can also see a brighter Dl signal in the small clone and perhaps also in the large clone in C. The difference between B and C could be simply disc-to-disc variation, which should be addressed with quantification and presentation of all data points.

      We added the number of samples to each figure legend.  We quantified the fluorescence signals for Figures 3 and 7.  Quantification shows that the difference between 7B and 7C is highly significant, not disc to disc variation.

      Fig 2B does not support the conclusion. It is supposed to show premature Sens expression and therefore abnormal morphogenetic furrow progression in emc clones. But the yellow arrow is pointing to GFP+ (wild type) cells and it is within this GFP+ region that most premature Sens expression is seen.

      We relocated the arrows in Figure 2B to point precisely to the premature differentiation.  When the morphogenetic furrow is accelerated in emc mutant (GFP-negative) tissue, it does not stop when wild-type (GFP+) tissue is encountered again; it continues at a normal pace.  Accordingly, emc+ regions that are anterior to emc- regions can also experience accelerated differentiation (please see lines 594-8).

      Fig 1 shows that while H99 deficiency restores the growth of emc clones to wild type level (Fig 1N), placing these in the Minute background made emc clones grow better than emc wild type but Minute neighbors (Fig 1M). The latter cells were nearly absent, suggesting elimination through cell competition. For the rest of the figures, some experiments are done in the Minute background (e.g., emc H99 clones in Fig 2D) while others are not in the Minute background (e.g., emc H99 clones in Fig 7D). Why the switch between backgrounds from experiment to experiment?

      Figure 2D shows emc H99 clones in a Minute background so that it can be compared with panels 2A-C, which show clones of other genotypes in a Minute background.  These clones almost take over the eye disc.  In Figure 7D, it was important to show the Dl expression pattern in a substantial wild type region, which could only be shown using the non-Minute background.  We have no indication that a Minute background changes the properties of the non-Minute clone, other than allowing its greater growth.

      The first 3 paragraphs of the Introduction are overly detailed and read more like a review article. These could be made more concise to focus on the founding data for this manuscript, which are the published findings that emc mutations elevate ex expression (line 129) and that ex mutants show elevated diap1 expression (line 125). These do not show up until the very end of the Introduction.

      We shortened the Introduction to focus more rapidly on the topics relevant to these experiments.

      In several places, the space between the end of the sentence and the citation is missing (e.g., lines 57, 68, and 75).

      The spacing of citations was fixed.

      Line 247. 'morphogenetic furrow that found each ommatidia...' should use a word besides 'found.'

      We corrected line 247.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors show that inhibiting caspases rescues the growth defect of emc clones. However, they did not find excessive TUNEL staining in emc clones that would explain why the clones would be so small - excessive cell death. How reliable was their TUNEL staining in being able to detect excessive apoptosis (only negative data was shown)? Could they induce excessive cell death using radiation or some other means to ensure the assay is robust? If death is not occurring in emc clones, a deficiency worth addressing is that they do not discuss or explore how the caspases then inhibit clone growth. Is it expanded cell cycle times, or smaller cells? And that phenotype does not fit with their end model of Delta being the only moderator of emc since it is not playing a significant role in tissue growth anterior to the furrow. One would assume using the commercial antibody against activated caspase would be another readout for emc clones and this would bolster their claim that excessive caspase activation occurs in the emc cells.

      We have added Dcp1 staining in Figure 2 supplement 3 to show that TUNEL staining is reliable.

      (2) Figure 3D has really large emc clones when GMR-Diap is present. But the large clones are anterior to the furrow where Diap would not be overexpressed. Is this just an unusual sample with a coincidentally big emc M+ clone? It speaks to my concerns about the qualitative nature of the data.

      We replaced Figure 3D with an example of smaller clones.  Nowhere have we suggested that  GMR-DIAP1 affects clone size.

      (3) Figure 9B is very speculative and not appropriate since the authors have zero data to support that cleavage mechanism. It is fit for the next paper if the idea is correct. The panel should be removed.

      We did not intend Figure 9B to imply that we think Dl itself is the relevant target of non-apoptotic caspases.  Since apparently we gave that impression, we removed this to a supplemental figure.  We still think it is worth showing that Dl does not contain predicted caspase sites expected to activate signaling. 

      (4) Figure 9A could be made more clear. Their pathway represents the mutant cells in the mosaic disc. Why not also outline what you think is happening in the emc+ cells as well?

      It is difficult to make a comparable diagram for normal cells, because none of this pathway happens in normal cells.  We modified the figure legend to indicate this (lines 677-8).

      (5) The one emc ci clone they show spanning the furrow has a very non-continuous furrow advance phenotype. This is unlike the emc clones where the furrow advance is graded about the clone. And it resembles the SuH clones they show. This result and the synergistic effect on clone sizes they mention need more discussion and thought put into it. It argues ci is doing something with respect to emc action. loss of ci might not rescue size and furrow advance but actually, it makes it worse! This is interesting and might suggest an inhibitory role for ci in emc or a parallel role for ci in mediating growth and progression that is redundant with emc.

      We agree that aspects of the emc ci phenotype are not clear.  We discuss this in the revised manuscript (lines 373-5).  

      (6) Related to point 7, it is a weak argument for non-autonomy that graded furrow advance in emc clones is evidence for emc acting nonautonomously through Delta. Its weakness is combined with its lack of significance relative to the other findings. It should be deleted as should the SuH data.

      We agree that the evidence that emc affects morphogenetic furrow progression non-autonomously is not compelling and have revised the manuscript to soften this conclusion (lines 426-7).  We do not want to remove this idea, because it does in fact have significance for other findings.  Specifically, it supports the idea that the emc effect in the morphogenetic furrow is due to trans-activation by Delta, whereas  the effect on R7 and cone cell differentiation is due to autonomous cis-inhibition.  We think this is important to keep in the paper.

    3. Reviewer #1 (Public review):

      Summary:

      The extra macrochaetae (emc) gene encodes the only Inhibitor of DNA binding protein (Id protein) in Drosophila. Its best-known function is to inhibit proneural genes during development. However, the emc mutants also display non-proneural phenotypes. In this manuscript, the authors examined four non-proneural phenotypes of the emc mutants and reported that they are all caused by inappropriate non-apoptotic caspase activity. These non-neuronal phenotypes are: reduced growth of imaginal discs, increased speed of the morphogenetic furrow, and failure to specify R7 photoreceptor neurons and cone cells during eye development. Double mutants between emc and either H99 (which deletes the three pro-apoptotic genes reaper, grim, and hid) or the initiator caspase dronc suppress these mutant phenotypes of emc suggesting that the cell death pathway and caspase activity are mediating these emc phenotypes. In previous work, the authors have shown that emc mutations elevate the expression of ex which activates the SHW pathway (aka the Hippo pathway). One known function of the SHW pathway is to inhibit Yorkie which controls the transcription of the inhibitor of apoptosis, Diap1. Consistently, in emc clones the levels of Diap1 protein are reduced which might explain why caspase activity is increased in emc clones giving rise to the four non-neural phenotypes of emc mutants. However, this increased caspase activity is not causing ectopic apoptosis, hence the authors propose that this is non-apoptotic caspase activity. In the last part of the manuscript, the authors ruled out that Wg, Dpp, and Hh signaling are the target of caspases, but instead identified Notch signaling as the target of caspases, specifically the Notch ligand Delta. Protein levels of Delta are increased in emc clones in an H99- and dronc-dependent manner. The authors conclude that caspase-dependent non-apoptotic signaling underlies multiple roles of emc that are independent of proneural bHLH proteins.

      Strengths:

      Overall, this is an interesting manuscript and the findings are intriguing. It adds to the growing number of non-apoptotic functions of apoptotic proteins and caspases in particular. The manuscript is well written and the data are usually convincingly presented.

      Weaknesses:

      The authors have addressed all my concerns and questions.

    4. Reviewer #2 (Public review):

      Id proteins are thought to function by binding and antagonizing basic helix-loop-helix (bHLH) transcription factors but new findings demonstrate roles for emc including in tissues where no proneural (Drosophila bHLH) genes are known to function. The authors propose a new mechanism for developmental regulation that entails restraining new/novel non-apoptotic functions of apoptotic caspases.

      Specifically, the data suggest that loss of emc leads to reduced expression of diap1 and increased apoptotic caspase activity, which does not induce apoptosis but elevates Delta expression to increase N activity and cause developmental defects. Indeed, many of the phenotypes of emc mutant clones can be rescued by a chromosomal deficiency that reduces caspase activation or by mutations in the initiator caspase Dronc. A related manuscript that shows that loss of emc results in increased da, linked previously to diap1 expression, provides supporting data. There is increasing appreciation that apoptotic caspases have non-apoptotic roles. This study adds to the emerging field and should be of interest to the readers.

      The revised manuscript addresses my concerns from the first round of review.

    5. Reviewer #3 (Public review):

      The work extends earlier studies on the Drosophila Id protein EMC to uncover a potential pathway that explains several tissue-scale developmental abnormalities in emc mutants. It also describes a non-apoptotic role for caspases in cell biology.

      Strengths:

      The work adds to an emerging new set of functions for caspases beyond their canonical roles as cell death mediators. This novelty is a major strength as well as its reliance on genetic-based in vivo study. The study will be of interest to those who are curious about caspases in general.

      Weaknesses:

      The authors did an adequate job in dealing with the limitations of the reviewed preprint. Although they could have done more, they chose not to for reasons they adequately defended.

    1. Reviewer #2 (Public review):

      Summary:

      In this study, the authors combine the study of clinical samples of antibiotic-resistant bacteria with experimental evolution and evolutionary genomics to address important questions about the propensity for reversion in two different schema: de novo resistance arising within a patient, and transmitted resistance. The authors' use of a combination of methods helps to answer the question outlined in their hypothesis, that de novo resistance mechanisms appear to revert to sensitive phenotypes more readily in a drug-free environment.

      Strengths:

      This study is exceptionally well-written and organized. The authors state their hypothesis clearly, and follow it up with an impressive effort that is truly translational: they make direct use of clinical samples of bacteria, and combine that with approaches in experimental evolution and evolutionary genomics. The conclusions follow naturally from the results, and there are no irresponsible leaps made.

      Weaknesses:

      I will divide my criticism into two areas, conceptual (most of my critique), with a very small methodological question.

      (1) In the end, the authors offer findings that appear to be correct, and (again) are reported very clearly. However, this study is very surface-level in its theoretical underpinnings and construction, which is puzzling, because the field of antibiotic resistance and adaptation more broadly, is full of relevant studies and explanatory tools. Below I'll identify several areas where this manifests.

      For one, the authors do not engage with a large recent literature on reversion, reversal, and compensation. It provides much more conceptual grounding for what the authors observe, much of it compatible with the findings from this study:

      To offer two quick examples:

      - Avrani S, Katz S, Hershberg R. Adaptations accumulated under prolonged resource exhaustion are highly transient. MSphere. 2020 Aug 26;5(4):10-128.
      - Pennings, P.S., Ogbunugafor, C.B. and Hershberg, R., 2022. Reversion is most likely under high mutation supply when compensatory mutations do not fully restore fitness costs. G3, 12(9), p.jkac190.

      Examinations of the studies on adaptation and reversion offer a richer mechanistic take on what was observed. But this literature alone is less of a problem than the general offering of different takes for the results. One can turn to a different literature - from ecology - to find a different explanation that is compatible with the findings.

      De novo evolution involves the strong selection and rapid fixation of populations that are evolving largely in a relatively simple ecological milieu: their only primary function is to promote replication and survival of populations experiencing the negative fitness effects of drug pressure. Alternatively, transmitted resistant populations must deal with a multitude of selective pressures, working dynamically across time and space. In such a scenario, one would expect populations to locate places on the fitness landscape that are commensurate with survival in both drug-poor and drug-rich environments, as this is the ecological reality of the transmitted resistant bacteria. I could envision selection for "generalism" in this setting, corresponding to populations that have fixed mutations that promote resistance, but also those that ensure replication in drug-free environments. This regime might even reflect selection for "increased niche breadth." That is, transmitted resistance may have adopted a "jack of all trades, master of none" phenotype. The de novo resistance strains, alternatively, are selected for "specialism."

      See the following for examples (there are many):

      - Kassen R. The experimental evolution of specialists, generalists, and the maintenance of diversity. Journal of evolutionary biology. 2002 Mar 1;15(2):173-90.
      - Bell TH, Bell T. Many roads to bacterial generalism. FEMS microbiology ecology. 2021 Jan;97(1):fiaa240.

      Note that this classically ecological explanation is only one of several other literatures that offer models for the findings in this study.

      To the authors' credit, their study was about the very real-world problem of antibiotic resistance, using a system that is far less tractable than the model systems research that has generated a lot of data and theory. And sure: the study is valuable because it communicates an interesting finding using a combination of methods (impressively). But in some ways, the study almost reads as a descriptive exercise: it offers a good question (does de novo or transmitted resistance revert more readily), and tells you what they found (de novo does). However, the explanatory mechanisms do not advance our understanding much. Reporting the presence of unstable and disruptive mutations in the de novo populations is hardly an explanation. It is, rather, data in support of a proper explanation. There is nothing magical about de novo evolution that should select for disruptive mutations.

      The reasons for the different sorts of mutation could have to do with the population genetic particulars of the de novo regime: large populations, strong selective pressure, relatively static fitness landscape. In such a setting, selection marches a population greedily up a peak. Alternatively, a transmitted population arises from a lineage that has observed a multitude of ecologies, across different fitness landscapes and has fixed mutations that confer survival across all of them.

      There's a literature that speaks to this:

      - Miller CM, Draghi JA. Range expansion can promote the evolution of plastic generalism in coarse-grained landscapes. Evolution Letters. 2024 Apr 1;8(2):322-30.
      - Bono LM, Draghi JA, Turner PE. Evolvability costs of niche expansion. Trends in genetics. 2020 Jan 1;36(1):14-23.

      The findings are simple enough (a testament to the strong study design and execution) that supporting population genetic simulations, or analytical descriptions (maybe not relevant) could offer insight as to what really happened here.

      (2) I recognize the challenge of working with clinical samples. It is very difficult to understand everything about them. But even having considered that, I might be missing something.

      My main question here involves the origin of the putatively transmitted strains. The authors state that "Isolates were also obtained from six patients with a putatively transmitted resistant bacteria (hereafter PT), where a daptomycin-resistant, E. faecium bacteremia was identified on their first culture."

      This seems like an awfully dubious way to identify transmitted resistance. I suppose I understand the logic (de novo evolution requires the observer to have seen the evolution happen in real-time). But this definition leaves the study wide open for an "apples to oranges" comparison that might render the other aspects questionable.

      The de novo strains are being compared to transmitted strains that may have been part of lineages that had passed between many, many patients. If this were true, then we should expect the genomic architecture of the transmitted strains to be far different. The transmitted strains might have undergone more selection in different regimes and genetic drift. Drift might have fixed mutations in transmission bottlenecks, altering the genetic architecture. In such a scenario, one might expect these populations to have a more difficult time unwinding their resistance phenotype.

      In the end, I applaud the authors on a well-done and well-written study.

    2. eLife Assessment

      This important study, which will be of interest to those studying the evolution and maintenance of antibiotic resistance, addresses the hypothesis that antibiotic resistance arising de novo during treatment will carry a higher fitness cost and will revert to susceptibility more readily than resistance that has been transmitted between hosts. There are, however, concerns that the 'putatively transmitted isolates' in this study do not necessarily represent resistant isolates that have been transmitted between hosts. The support for the central claim of different patterns of reversion between isolates with de novo resistance and putatively transmitted resistant isolates is currently incomplete.

    3. Reviewer #1 (Public review):

      Summary:

      Tracy and colleagues study the loss of daptomycin resistance in Enterococcus faecium isolates from bloodstream infections using in vitro evolution experiments in the absence of antibiotics. They test the hypothesis that antibiotic resistance arising de novo during treatment will carry a higher fitness cost and will revert more readily than resistance in isolates which have been transmitted and have therefore already survived in the absence of antibiotic selection pressure.

      Strengths:

      This is an important question as a fitness cost to resistance is typically found in lab evolution experiments and assumed in modelling studies, but often not identified in clinical isolates. Here the authors find examples of clinical isolates which do and don't revert to sensitivity in in vitro evolution in the absence of antibiotics. Sequencing of the lab evolved isolates revealed that reversal of resistance was often due to mutations in the same gene that evolved in vivo, which is nice evidence that these resistance mutations did confer a fitness cost.

      Weaknesses:

      Although this is an interesting study on an important topic, currently the results are overinterpreted and do not justify the title of the paper 'Reversion to sensitivity explains limited transmission of resistance in a hospital pathogen' for several reasons. Firstly, the patient group, i.e. 'putatively transmitted' isolates vs 'de novo' isolates, was not a significant predictor of change in MIC. Instead, the change in MIC in the absence of antibiotics was significantly associated with the starting MIC of the isolate in the evolution experiments, but this would be expected since isolates with a higher MIC have more potential to decrease in MIC in the evolution experiments. The abstract and some conclusions do not match the results in some instances; for example, the abstract states 'resistance that arose de novo within patients was higher level but exhibited greater declines in resistance in vitro'. In the discussion, they state "these findings support our hypothesis that transmitted resistance strains are less likely to revert". However, on page 14 the initial MICs between DNR and PTR were not significantly different and patient group was not a significant predictor of change in MIC. Sequencing of the lab evolved isolates revealed that reversal of resistance was often due to mutations in the same gene that evolved in vivo. However, there were also some examples of mutations in the same genes within the PTR isolates, so it remains unclear if there is a significant difference in behaviour between the DNR and PTR isolates in terms of reversion mutations. Significance testing, controlling for the starting MIC, would help confirm this.
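
      To illustrate what "controlling for the starting MIC" could look like, the sketch below fits change in MIC against starting MIC and a group indicator by ordinary least squares. It is a hypothetical illustration, not the authors' analysis: the data are invented toy values, the variable names are assumptions, and it returns only the adjusted coefficients. A formal significance test would additionally need standard errors or a permutation scheme.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def adjusted_group_effect(start_mic, change, is_de_novo):
    """Fit change = b0 + b1 * start_mic + b2 * is_de_novo and return (b0, b1, b2).
    b2 is the group difference in MIC change at equal starting MIC."""
    X = [[1.0, float(s), float(g)] for s, g in zip(start_mic, is_de_novo)]
    return tuple(ols(X, change))


# Toy data (hypothetical): change depends only on starting MIC, not on group.
START = [1, 2, 3, 4, 2, 3, 4, 5]
GROUP = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = de novo, 0 = putatively transmitted
CHANGE = [0.1 - 0.5 * s for s in START]

if __name__ == "__main__":
    b0, b1, b2 = adjusted_group_effect(START, CHANGE, GROUP)
    print(f"intercept={b0:.3f} start_mic_slope={b1:.3f} group_effect={b2:.3f}")
```

      In this toy data the de novo group starts one unit higher on average, so its raw mean change is 0.5 lower even though the adjusted group effect is zero. That is the confound the paragraph above describes.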

      Secondly, the 'putatively transmitted isolates', i.e. isolates that were resistant in the first positive blood culture, do not necessarily represent resistant isolates that have been transmitted between hosts. E. faecium is primarily a commensal of the intestinal tract, but it can cause opportunistic extra-intestinal infections. These bacteremia cases were most likely caused by within-host translocation of a strain already colonizing the intestine to the bloodstream - indeed, it has been shown that antibiotics can lead to Enterococcus overgrowth in the intestine and subsequent bloodstream invasion (DOI: 10.1172/JCI43918). The 'putatively transmitted isolates' may have initially colonised the intestine via between-host transmission in an already resistant state, as assumed by the authors, but they may also have evolved resistance de novo within the host's intestine prior to causing bloodstream infections. Since they do not have data on past daptomycin exposure in these individuals, it cannot be assumed that these isolates were transmitted with high resistance between hosts. An alternative explanation for any differences between the 'de novo' and 'putatively transmitted' isolates could be the environment where resistance evolved, e.g. the intestine with strong competition from other strains and species, or the otherwise sterile bloodstream environment. The authors hypothesise that "newly resistant population must continue to transmit between hosts in antibiotic free conditions to ensure its survival" and that "transmission acts as a filter to select for resistance with a lower cost or lower chance of reversion". Rather than transmission per se, it is equally plausible that survival of the newly resistant population within the primary niche, the intestinal microbiota, is the crucial filter for resistance with a lower cost.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) This experiment sought to determine what effect congenital/early-onset hearing loss (and associated delay in language onset) has on the degree of inter-individual variability in functional connectivity to the auditory cortex. Looking at differences in variability rather than group differences in mean connectivity itself represents an interesting addition to the existing literature. The sample of deaf individuals was large, and quite homogeneous in terms of age of hearing loss onset, which are considerable strengths of the work. The experiment appears well conducted and the results are certainly of interest. I do have some concerns with the way that the project has been conceptualized, which I share below.

      Thank you for acknowledging the strengths and novelty of our study. We have now addressed the conceptual issues raised; please see below in the specific comments.

      (2) The authors should provide careful working definitions of what exactly they think is occurring in the brain following sensory deprivation. Characterizing these changes as 'large-scale neural reorganization' and 'compensatory adaptation' gives the impression that the authors believe that there is good evidence in support of significant structural changes in the pathways between brain areas - a viewpoint that is not broadly supported (see Makin and Krakauer, 2023). The authors report changes in connectivity that amount to differences in coordinated patterns of BOLD signal across voxels in the brain; accordingly, their data could just as easily (and more parsimoniously) be explained by the unmasking of connections to the auditory cortex that are present in typically hearing individuals, but which are more obvious via MR in the absence of auditory inputs.

      We thank the Reviewer for the suggestion to clarify and better support our stance regarding reorganization. We indeed believe that the adaptive changes in the auditory cortex in deafness represent real functional recruitment for non-auditory functions, even given the relatively limited large-scale anatomical connectivity changes. This is supported by animal work showing causal evidence for the involvement of deprived auditory cortices in non-auditory tasks, in a way that is not found in hearing controls (e.g., Lomber et al., 2010, Meredith et al., 2011, reviewed in Alencar et al., 2019; Lomber et al., 2020). Whether the word “reorganization” should be used has indeed been debated recently (Makin and Krakauer, 2023). Beyond terminology, we do agree that the basis for the changes in recruitment seen in the brains of people with deafness or blindness is largely based on the typical anatomical connectivity at birth. We also agree that at the group level, there is poor evidence of large-scale anatomical connectivity differences in deprivation. However, we think there is more than ample evidence that the unmasking and, more importantly, re-weighting of non-dominant inputs gives rise to functional changes. This is supported by the relatively weaker reorganization found in late-onset deprivation as compared to early-onset deprivation. If unmasking of existing connectivity without any additional functional changes were sufficient to elicit the functional responses to atypical stimuli (e.g., non-visual in blindness and non-auditory in deafness), one would expect there to be no difference between early- and late-onset deprivation in response patterns. Therefore, we believe that the fact that these are based on functions with some innate pre-existing inputs and integration is the mechanism of reorganization, not a reason not to treat it as reorganization.
Specifically, in the case of this manuscript, we report the change in variability of FC from the auditory cortex, which is greater in deafness than in typically hearing controls. This is not an increase in response per se, but rather more divergent values of FC from the auditory cortex, which are harder to explain in terms of ‘unmasking’ alone, unless one assumes unmasking is particularly variable. The mechanistic explanation for our findings is that in the absence of auditory input’s fine-tuning and pruning of the connectivity of the auditory cortex, more divergent connectivity strength remains among the deaf. Thus, auditory input not only masks non-dominant inputs but also prunes/deactivates exuberant connectivity, in a way that generates a more consistently connected auditory system. We have added a shortened version of these clarifications to the discussion (lines 351-372).

      (3) I found the argument that the deaf use a single modality to compensate for hearing loss, and that this might predict a more confined pattern of differential connectivity than had been previously observed in the blind to be poorly grounded. The authors themselves suggest throughout that hearing loss, per se, is likely to be driving the differences observed between deaf and typically-hearing individuals; accordingly, the suggestion that the modality in which intentional behavioral compensation takes place would have such a large-scale effect on observed patterns of connectivity seems out of line.

      Thank you for your critical insight regarding our rationale on modality use and its impact on connectivity patterns in the deaf compared to the blind. After some thought, we agree that the argument presented may not be sufficiently strong and could distract from the main findings of our study. Therefore, we have decided to remove this claim from our revised manuscript.

      (4) The analyses highlighting the areas observed to be differentially connected to the auditory cortex and areas observed to be more variable in their connectivity to the auditory cortex seem somewhat circular. If the authors propose hearing loss as a mechanism that drives this variability in connectivity, then it is reasonable to propose hypotheses about the directionality of these changes. One would anticipate this directionality to be common across participants and thus, these areas would emerge as the ones that are differently connected when compared to typically hearing folks.

      We are a little uncertain how to interpret this concern.  If the question was about the logic leading to our statement that variability is driven by hearing loss, then yes, we indeed were proposing hearing loss as a mechanism that drives this variability in connectivity to the auditory cortex; we regret this was unclear in the original manuscript. This logic parallels the proposal made with regard to the increased variability in FC in blindness; deprivation leads to more variable outcomes, due to the lack of developmental environmental constraints (Sen et al., 2022). Specifically, we first analyzed the differences in within-group variability between deaf and hearing individuals (Fig. 1A), followed by examining the variability ratio (Fig. 1B) in the same regions that demonstrated differences. The first analysis does not specify which group shows higher variability; therefore, the second analysis is essential to clarify the direction of the effect and identify which group, and in which regions, exhibits greater variability. We have clarified this in the revised manuscript (lines 125-127): “To determine which group has larger individual differences in these regions (Figure 1B), we computed the ratio of variability between the two groups (deaf/hearing) in the areas that showed a significant difference in variability (Figure 1A)”. Nevertheless, this comment can also be interpreted as predicting that any change in FC due to deafness would lead to greater variability. In this case, it is also important to mention that while we would expect regions with higher variability to also show group differences between the deaf and the hearing (Figure 2), our analysis demonstrates that variability is present even in regions without significant group mean differences. Similarly, many areas that show a difference between the groups in their FC do not show a change in variability (for example, the bilateral anterior insula and sensorimotor cortex). 
In fact, the correlation between the regions with higher FC variability (Figure 1A) and those showing FC group differences (Figure 2B) is significant but rather modest, as we now acknowledge in our revised manuscript (lines 324-328). Therefore, increased FC and increased variability of FC are not necessarily linked. 
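
      The two-step logic described in this response (first locate regions where within-group variability differs, then take the deaf/hearing ratio to read off the direction) can be illustrated with a minimal sketch. This is not the authors' pipeline: the Brown-Forsythe test, the uncorrected threshold, and the names `variability_ratio`, `fc_deaf`, and `fc_hearing` are all assumptions made for illustration, since the response does not specify the test used.

```python
import numpy as np
from scipy import stats


def variability_ratio(fc_deaf, fc_hearing, alpha=0.05):
    """Compare inter-individual FC variability between two groups.

    fc_deaf, fc_hearing: arrays of shape (n_subjects, n_voxels) holding
    seed-to-voxel FC values (hypothetical inputs for this sketch).
    Returns the per-voxel variance ratio (deaf / hearing) and a boolean
    mask of voxels where the variances differ at the given alpha.
    """
    n_voxels = fc_deaf.shape[1]
    # Step 1: non-directional test for a difference in variance.
    # Brown-Forsythe = Levene's test centered on the median (assumed here).
    p_values = np.array([
        stats.levene(fc_deaf[:, v], fc_hearing[:, v], center="median").pvalue
        for v in range(n_voxels)
    ])
    significant = p_values < alpha  # uncorrected; a real analysis would correct
    # Step 2: the ratio gives the direction (> 1 means more variable in the deaf).
    ratio = fc_deaf.var(axis=0, ddof=1) / fc_hearing.var(axis=0, ddof=1)
    return ratio, significant
```

      The ratio would only be interpreted inside the `significant` mask, which mirrors the statement that the first analysis does not say which group is more variable and the second one does.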

      (5) While the authors describe collecting data on the etiology of hearing loss, hearing thresholds, device use, and rehabilitative strategies, these data do not appear in the manuscript, nor do they appear to have been included in models during data analysis. Since many of these factors might reasonably explain differences in connectivity to the auditory cortex, this seems like an omission.

      We thank the Reviewer for their comment regarding the inclusion of these variables in our manuscript. We have now included additional information in the main text and a supplementary table in the revised manuscript that elaborates further on the etiology of hearing loss and all individual information that characterizes our deaf sample. Although we initially intended to include individual factors (e.g., hearing threshold, duration of hearing aid use, and age of first use) in our models, this was not feasible for the following reasons: 1) for some subjects, we only have a level of hearing loss rather than specific values, which we could not use quantitatively as a nuisance variable (it was typical in such testing to ascertain the threshold of loss as belonging to a deafness level, such as “profound”, and not necessarily go into more elaborate testing to identify the specific threshold), and 2) this information was either not collected for the hearing participants (e.g., hearing threshold) or does not apply to them (e.g., age of hearing aid use), which made it impossible to use the complete model with all these variables. Modeling the groups separately with different variables would also be inappropriate. Last, the distribution of the values and the need for a large sample to rigorously assess a difference in variability also precluded sub-dividing the group into subgroups based on these values.

      Therefore, we opted for a different way to control for the potential influence of these variables on FC variability in the deaf. We tested the correlation between the FC from the auditory cortex and each of these parameters in the areas that showed increased FC in deafness (Figures 1A, B), to see if it could account for the increased variability. This ROI analysis did not reveal any significant correlations (all p > .05, prior to correction for multiple comparisons; see Figures S4, S5, and S6 for scatter plots). The maximal variability explained in these ROIs by the hearing factors was r² = 0.096, whereas the FC variability (Figure 1B) was increased by at least 2 in the deaf. Therefore, it does not seem like these parameters underlie the increased variability in deafness. To test if these variables had a direct effect on FC variability in other areas in the brain, we also directly computed the correlation between FC and each factor individually. At the whole-brain level, the results indicate a significant correlation between AC-FC and hearing threshold, as well as a correlation between AC-FC and the age of hearing aid use onset, but not for the duration of hearing aid use (Figure S3). While these may be interesting on their own, and are added to the revised manuscript, the regions that show significant correlations with hearing threshold and age of hearing aid use are not the same regions that exhibit FC variability in the deaf (Figures 1A, B).
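
      As a rough illustration of the ROI check described above, the share of FC variance explained by a single hearing factor is the squared Pearson correlation. The sketch below is an assumption-laden example, not the authors' code: the function name `explained_variance` is hypothetical, and dropping subjects with missing values is one possible way to handle participants for whom only a hearing-loss level is available.

```python
import numpy as np
from scipy import stats


def explained_variance(roi_fc, factor):
    """Return (r squared, p-value) for FC in one ROI against one factor.

    roi_fc: per-subject mean FC in the ROI (hypothetical input).
    factor: per-subject value of a hearing factor, NaN where not recorded.
    """
    roi_fc = np.asarray(roi_fc, dtype=float)
    factor = np.asarray(factor, dtype=float)
    # Keep only subjects with both values present (an assumption of this sketch).
    keep = ~np.isnan(roi_fc) & ~np.isnan(factor)
    r, p = stats.pearsonr(roi_fc[keep], factor[keep])
    return r ** 2, p
```

      An r² near 0.1 would mean the factor accounts for roughly a tenth of the between-subject variance in that ROI, which is the kind of comparison the response draws against the reported increase in variability.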

      Overall, these findings suggest that although some of these factors may influence FC, they do not appear to be the driving factors behind FC variability. Finally, in terms of rehabilitative strategies, only one deaf subject reported having received long-term oral training from teachers. This participant started this training at age 2, as now described in the participants’ section. We thank the reviewer for raising this concern and allowing us to show that our findings do not stem from simple differences ascribed to auditory experience in our participants. 

      Reviewer #2 (Public Review):

      (1) The paper has two main merits. Firstly, it documents a new and important characteristic of the re-organization of the brains of the deaf, namely its variability. The search for a well-defined set of functions for the deprived auditory cortex of the deaf has been largely unsuccessful, with several task-based approaches failing to deliver unanimous results. Now, one can understand why this was the case: most likely there isn't one fixed, well-defined set of functions supported by an identical set of areas in every subject, but rather a variety of functions supported by various regions. In addition, the paper extends the authors' previous findings from blind subjects to the deaf population. It demonstrates that the heightened variability of connectivity in the deprived brain is not exclusive to blindness, but rather a general principle that applies to other forms of deprivation. On a more general level, this paper shows how sensory input is a driver of the brain's reproducible organization.

      We thank the Reviewer for their observations regarding the merits of our study. We appreciate the recognition of the novelty in documenting the variability of brain reorganization in deaf individuals. 

      (2) The method and the statistics are sound, the figures are clear, and the paper is well-written. The sample size is impressively large for this kind of study.

      We thank the Reviewer for their positive feedback on the methodology, statistical analysis, clarity of figures, and the overall composition of our paper. We are also grateful for the acknowledgment of our large sample size, which we believe significantly strengthens the statistical power and the generalizability of our findings.

      (3) The main weakness of the paper is not a weakness, but rather a suggestion on how to provide a stronger basis for the authors' claims and conclusions. I believe this paper could be strengthened by including in the analysis at least one of the already published deaf/hearing resting-state fMRI datasets (e.g. Andin and Holmer, Bonna et al., Ding et al.) to see if the effects hold across different deaf populations. The addition of a second dataset could strengthen the evidence and convincingly resolve the issue of whether delayed sign language acquisition causes an increase in individual differences in functional connectivity to/from Broca's area. Currently, the authors may not have enough statistical power to support their findings.

      We thank the Reviewer for their constructive suggestion to reinforce the robustness of our findings. While we acknowledge the potential value of incorporating additional datasets to strengthen our conclusions, the datasets mentioned (Andin and Holmer, Bonna et al., Ding et al.) are not publicly available, which limits our ability to include them in our analysis. Additionally, datasets that contain comparable groups of delayed and native deaf signers are exceptionally rare, further complicating the possibility of their inclusion. Furthermore, to discern individual differences within these groups effectively, a substantially larger sample size is necessary. As such, we were unfortunately unable to perform this additional analysis. This is a challenge we acknowledge in the revised manuscript (lines 442-445), especially when the group is divided into subcategories based on the level of language acquisition, which indeed reduces our statistical power. We have, however, now integrated the individual task accuracy and reaction time parameters as nuisance variables in the variability analyses; all the results are fully replicated when accounting for task difficulty. We also report that there was no difference in activation for this task between the groups that could affect our findings.

      We would like to note that while we would like to replicate these findings in an additional cohort using resting-state, we do not anticipate the state in which the participants are scanned to greatly affect the findings. FC patterns of hearing individuals have been shown to be primarily shaped by common system and stable individual features, and not by time, state, or task (Finn et al., 2015; Gratton et al., 2018; Tavor et al., 2016). While the task may impact FC variability, we have recently shown that individual FC patterns are stable across time and state even in the context of plasticity due to visual deprivation (Amaral et al., 2024). Therefore, we expect that in deafness as well there should not be meaningful differences between resting-state and task FC networks, in terms of FC individual differences. That said, we are exploring collaborations and other avenues to access comparable datasets that might enable a more powerful analysis in future work. This feedback is very important for guiding our ongoing efforts to verify and extend our conclusions.

      (4) Secondly, the authors could more explicitly discuss the broad implications of what their results mean for our understanding of how the architecture of the brain is determined by the genetic blueprint vs. how it is determined by learning (page 9). There is currently a wave of strong evidence favoring a more "nativist" view of brain architecture, for example, face- and object-sensitive regions seem to be in place practically from birth (see e.g. Kosakowski et al., Current Biology, 2022). The current results show what is the role played by experience.

      We thank the Reviewer for highlighting the need to elaborate on the broader implications of our findings in relation to the ongoing debate of nature vs. nurture. We agree that this discussion is crucial and have expanded our manuscript to address this point more explicitly. We now incorporate a more detailed discussion of how our results contribute to understanding the significant role of experience in shaping individual neural connectivity patterns, particularly in sensory-deprived populations (lines 360-372).

      Reviewer #3 (Public Review):

      Summary:

      (1) This study focuses on changes in brain organization associated with congenital deafness. The authors investigate differences in functional connectivity (FC) and differences in the variability of FC. By comparing congenitally deaf individuals to individuals with normal hearing, and by further separating congenitally deaf individuals into groups of early and late signers, the authors can distinguish between changes in FC due to auditory deprivation and changes in FC due to late language acquisition. They find larger FC variability in deaf than normal-hearing individuals in temporal, frontal, parietal, and midline brain structures, and that FC variability is largely driven by auditory deprivation. They suggest that the regions that show a greater FC difference between groups also show greater FC variability.

      Strengths:

      -  The manuscript is well written.

      -  The methods are clearly described and appropriate.

      -  Including the three different groups enables the critical contrasts distinguishing between different causes of FC variability changes.

      -  The results are interesting and novel.

      We thank the Reviewer for their positive and detailed feedback. Their acknowledgment of the clarity of our methods and the novelty of our results is greatly appreciated.

      Weaknesses:

      (2) Analyses were conducted for task-based data rather than resting-state data. It was unclear whether groups differed in task performance. If congenitally deaf individuals found the task more difficult this could lead to changes in FC.

      We thank the Reviewer for their observation regarding possible task performance differences between deaf and hearing participants and their potential effect on the results. Indeed, there was a difference in task accuracy between these groups. To account for this variation and ensure that our findings on functional connectivity were not confounded by task performance, we now included individual task accuracy and reaction time as nuisance variables in our analyses. This approach allowed us to control for any performance differences. The results now presented in the revised manuscript account for the inclusion of these two nuisance variables (accuracy and reaction time) and completely align with our original conclusions, highlighting increased variability in deafness, which is found in both the entire deaf group at large, as well as when equating language experience and comparing the hearing and native signers. The correlation between variability and group differences also remains significant, but its significance is slightly decreased, a moderate effect we acknowledge in the revised manuscript (see comment #4). The differences between the delayed signers and native signers are also retained (Figure 3), now aligning better with language-sensitive regions, as previously predicted. The inclusion of the task difficulty predictors also introduced an additional finding in this analysis, a significant cluster in the right aIFG. Therefore, the inclusion of these predictors reaffirms the robustness of the conclusions drawn about FC variability in the deaf population.

      We would like to note that while we would like to replicate these findings in an additional cohort using resting-state if we had access to such data, we do not anticipate the state in which the participants are scanned to greatly affect the findings. FC patterns of hearing individuals have been shown to be primarily shaped by common system and stable individual features, and not by time, state, or task (Finn et al., 2015; Gratton et al., 2018; Tavor et al., 2016). While the task may impact FC variability, we have recently shown that individual FC patterns are stable across time and state even in the context of plasticity due to visual deprivation (Amaral et al., 2024). Therefore, we expect that in deafness as well there should not be meaningful differences between resting-state and task FC networks, in terms of FC individual differences. We have also addressed this point in our manuscript (lines 442-451).
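
      The nuisance-variable adjustment mentioned in this response (task accuracy and reaction time) can be sketched as an ordinary least-squares residualization of each voxel's FC values before variability is compared. This is a minimal sketch under stated assumptions: the response does not say how the covariates entered the model, and the name `residualize` and the choice to fit one regression across all subjects are illustrative only.

```python
import numpy as np


def residualize(fc, nuisance):
    """Remove the linear contribution of nuisance covariates from FC values.

    fc: array (n_subjects, n_voxels) of FC values.
    nuisance: array (n_subjects, n_covariates), e.g. accuracy and reaction time.
    The voxel-wise mean is preserved; only covariate-related variance is removed.
    """
    nuisance = nuisance - nuisance.mean(axis=0)           # center the covariates
    design = np.column_stack([np.ones(len(fc)), nuisance])  # intercept + covariates
    beta, *_ = np.linalg.lstsq(design, fc, rcond=None)    # OLS fit per voxel
    # Subtract only the covariate part of the fit, keeping the intercept.
    return fc - design[:, 1:] @ beta[1:]
```

      After this step the adjusted FC values are uncorrelated with accuracy and reaction time, so any remaining group difference in variability cannot be a linear effect of those two measures.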

      (3) No differences in overall activation between groups were reported. Activation differences between groups could lead to differences in FC. For example, lower activation may be associated with more noise in the data, which could translate to reduced FC.

      We thank the reviewer for noting the potential implications of overall activation differences on FC. In our analysis of the activation for words, we found no significant clusters showing a group difference between the deaf and hearing participants (p < .05, cluster-corrected for multiple comparisons) - we also added this information to the revised manuscript (lines 542-544). This suggests that the differences in FC observed are not confounded by variations in overall brain activation between the groups under these conditions.

      (4) Figure 2B shows higher FC for congenitally deaf individuals than normal-hearing individuals in the insula, supplementary motor area, and cingulate. These regions are all associated with task effort. If congenitally deaf individuals found the task harder (lower performance), then activation in these regions could be higher, in turn, leading to FC. A study using resting-state data could possibly have provided a clearer picture.

      We thank the Reviewer for pointing out the potential impact of task difficulty on FC differences observed in our study. As addressed in our response to comment #2, task accuracy and reaction times were incorporated as nuisance variables in our analysis. Further, these areas showed no difference in activation between the groups (see response to comment #3 above). Notably, the referred regions still showed higher FC in congenitally deaf individuals even when controlling for these performance differences. Additionally, these findings are consistent with results from studies using resting-state data in deaf populations, further validating our observations. Specifically, using resting-state data, Andin & Holmer (2022) have shown higher FC for deaf (compared to hearing) individuals from auditory regions to the cingulate cortex, insular cortex, cuneus and precuneus, supramarginal gyrus, supplementary motor area, and cerebellum. Moreover, Ding et al. (2016) have shown higher FC for the deaf between the STG and anterior insula and dorsal anterior cingulate cortex. This suggests that the observed FC differences are likely reflective of genuine neuroplastic adaptations rather than mere artifacts of task difficulty. Although we wish we could augment our study with resting-state data analyzed similarly, we could not at present acquire or access such a dataset. We acknowledge this limitation of our study (lines 442-451) in the revised manuscript and intend to confirm that similar results will be found with resting-state data in the future.

      (5) The correlation between the FC map and the FC variability map is 0.3. While significant using permutation testing, the correlation is low, and it is not clear how great the overlap is.

      We acknowledge that the correlation coefficient of 0.3, while statistically significant, indicates a moderate overlap. It's also worth noting that, using our new models that include task performance as a nuisance variable, this value has decreased somewhat, to 0.24 (which is still highly significant). It is important to note that the visual overlap between the maps is not a good estimate of the correlation, which was performed on the unthresholded maps, to estimate the link not only between the most significant peaks of the effects, but across the whole brain patterns. This correlation is meant to suggest a trend rather than a strong link, but especially due to its consistency with the findings in blindness, we believe this observation merits further investigation and discussion. As such, we kept it in the revised manuscript while moderating our claims about its strength.

      Reviewer #1 (Recommendations For The Authors):

      (1) Page 4: "Does auditory cortex FC variability..." FC is not yet defined.

      Corrected, thanks.

      (2) Page 4: "It showed lower variability..." What showed this?

      Clarified, thanks.

      (3) Page 11: "highlining the importance" should read "highlighting the importance".

      Corrected, thanks.

      (4) Page 11: Do you really mean to suggest functional connectivity does not vary as a function of task? This would not seem well supported.

      We do not suggest that FC doesn’t vary as a function of task, and have revised this section (lines 447-451). 

      (5) Page 12: "there should not to be" should read "there should not be".

      Corrected, thanks.

      (6) Page 12: "and their majority" should read "and the majority".

      Corrected, thanks.

      Reviewer #2 (Recommendations For The Authors):

      Major

      (1) Although this is a lot of work, I nonetheless have another suggestion on how to test if your results are strong and robust. Perhaps you could analyze your data using an ROI/graph-theory approach. I am not an expert in graph theory analysis, but for sure there is a simple and elegant statistic that captures the variability of edge strength within a population. This approach could not only validate your results with an independent analysis and give the audience more confidence in their robustness, but it could also provide an estimate of the size of the effect you found. That is, it could express in hard numbers how much more variable the connections from auditory cortex ROIs are, in comparison to the rest of the brain in the deaf population, relative to the hearing population.

      We thank the Reviewer for suggesting the use of graph theory as a method to further validate our findings. While we see the potential value in this approach, we believe it may be beyond the scope of the current paper, and merits a full exploration of its own, which we hope to do in the future. However, we understand the importance of showing the uniqueness of the connectivity of the auditory cortex ROI as compared to the rest of the brain. So, in order to bolster our results, we conducted an additional analysis using control regions of interest (ROIs). Specifically, we calculated the inter-individual variability using all ROIs from the CONN Atlas (except auditory and language regions) as the control seed regions for the FC. We showed that the variability of connectivity from the auditory cortex is uniquely more increased in deafness, as compared to these control ROIs (Figure S1). This additional analysis supports the specificity of our findings to the auditory cortex in the deaf population. We aim to integrate more analytic approaches, including graph theory methods, in our future work.

      Minor

      (1) Some citations display the initial of the author in addition to the last name, unless there is something I don't know about the citation system, the initial shouldn't be there.

      This is due to the citation style we're using (APA 7th edition, as suggested by eLife), which requires including the first author's initials in all in-text citations when citing multiple authors with the same last name.  

      Reviewer #3 (Recommendations For The Authors):

      (1) I recommend that the authors provide behavioral data and results for overall neural activation.

      Thanks. We have added these to the revised manuscript. Specifically, we report that there was no difference in the activation for words (p < .05, cluster-corrected for multiple comparisons) between the deaf and hearing participants. Further, we report the behavioral averages for accuracy and reaction time for each group, and have now used these individual values explicitly as nuisance variables in the revised analyses.

      (2) For the correlation between FC and FC variability, it seemed a bit odd that the permuted data were treated additionally (through Gaussian smoothing). I understand the general logic (i.e., to reintroduce smoothness), but this approach provides more smoothing to the permutation than the original data. It is hard to know what this does to the statistical distribution. I recommend using a different approach or at least also reporting the p-value for non-smoothed permutation data.

      In response to this suggestion and to ensure transparency in our results, we have now also included the p-value for the non-smoothed permutation data in our revised manuscript (still highly significant; p < .0001). Thanks for this proposal.
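
      As an illustration of what a non-smoothed permutation test of a map-to-map correlation can look like, here is a minimal sketch. It is not the authors' pipeline: the function names, the toy maps, the number of permutations, and the choice to shuffle voxel values directly are all assumptions made for the example. Plain voxel shuffling discards spatial smoothness, which is the property the smoothing step discussed above was meant to reintroduce.

```python
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(map_a, map_b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the correlation of two maps.

    Hypothetical helper: voxel values of map_b are shuffled without any
    smoothing, and the observed |r| is compared with the shuffled |r|.
    """
    rng = random.Random(seed)
    observed = pearson_r(map_a, map_b)
    shuffled = list(map_b)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(map_a, shuffled)) >= abs(observed):
            exceed += 1
    # Add-one correction so the p-value can never be exactly zero.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy "unthresholded maps" (made-up values, one number per voxel).
fc_map = [0.1 * i for i in range(30)]
variability_map = [2 * v + 0.05 * ((i * 7) % 5 - 2) for i, v in enumerate(fc_map)]

r_obs, p_val = permutation_p_value(fc_map, variability_map)
print(r_obs, p_val)
```

      With n_perm permutations the smallest reportable p-value is 1 / (n_perm + 1), so a threshold such as p < .0001 would require at least 10,000 permutations under this scheme.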

      (3) For the map comparison, a plot with different colors, showing the FC map, the FC variability map, and one map for the overlap on the same brain may be helpful.

      We thank the Reviewer for their suggestion to visualize the overlap between the maps. However, we performed the correlation analysis using the unthresholded maps, as mentioned in the methods section of our manuscript, specifically to estimate the link not only between the most significant peaks of the effects, but across the whole brain patterns. This is why the maps displayed in the figures, which are thresholded for significance, may not appear to match perfectly, and may actually obscure the correlation across the brain. This methodological detail is crucial for interpreting the relationship and overlap between these maps accurately but also explains why the visualization of the overlap is, unfortunately, not very informative.

    2. eLife Assessment

      This study presents valuable data on the increase in individual differences in functional connectivity with the auditory cortex in individuals with congenital/early-onset hearing loss compared to individuals with normal hearing. The evidence supporting the study's claims is convincing, although additional work using resting-state functional connectivity and further links to how the results align with the underlying biology could have further strengthened the study. The work will be of interest to neuroscientists working on brain plasticity and may have implications for the design of interventions and compensatory strategies.

    3. Reviewer #1 (Public review):

      This experiment sought to determine what effect congenital/early-onset hearing loss (and associated delay in language onset) has on the degree of inter-individual variability in functional connectivity to the auditory cortex. Looking at differences in variability rather than group differences in mean connectivity itself represents an interesting addition to the existing literature. The sample of deaf individuals was large, and quite homogeneous in terms of age of hearing loss onset, which are considerable strengths of the work. The experiment appears well conducted and the results are certainly of interest.

    4. Reviewer #2 (Public review):

      Summary:

      This study focuses on changes in brain organization associated with congenital deafness. The authors investigate differences in functional connectivity (FC) and differences in the variability of FC. By comparing congenitally deaf individuals to individuals with normal hearing, and by further separating congenitally deaf individuals into groups of early and late signers, the authors can distinguish between changes in FC due to auditory deprivation and changes in FC due to late language acquisition. They find larger FC variability in deaf than normal-hearing individuals in temporal, frontal, parietal, and midline brain structures, and that FC variability is largely driven by auditory deprivation. They suggest that the regions that show a greater FC difference between groups also show greater FC variability.

      Strengths:

      The manuscript is well-written, and the methods are clearly described and appropriate. Including the three different groups enables the critical contrasts distinguishing between different causes of FC variability changes. The results are interesting and novel.

      Weaknesses:

      Analyses were conducted for task-based data rather than resting-state data. The authors report behavioral differences between groups and include behavioral performance as a nuisance regressor in their analysis. This is a good approach to account for behavioral task differences, given the data. Nevertheless, additional work using resting-state functional connectivity could remove the potential confound fully.

      The authors have addressed my concerns well.

    1. Author response:

      Reviewer #1 (Public Review):

      Summary

      The authors asked if parabrachial CGRP neurons were only necessary for a threat alarm to promote freezing or were necessary for a threat alarm to promote a wider range of defensive behaviors, most prominently flight.

      Major Strengths of Methods and Results

      The authors performed careful single-unit recording and applied rigorous methodologies to optogenetically tag CGRP neurons within the PBN. Careful analyses show that single-units and the wider CGRP neuron population increase firing to a range of unconditioned stimuli. The optogenetic stimulation of experiment 2 was comparatively simpler but achieved its aim of determining the consequence of activating CGRP neurons in the absence of other stimuli. Experiment 3 used a very clever behavioral approach to reveal a setting in which both cue-evoked freezing and flight could be observed. This was done by having the unconditioned stimulus be a "robot" traveling along a circular path at a given speed. Subsequent cue presentation elicited mild flight in controls and optogenetic activation of CGRP neurons significantly boosted this flight response. This demonstrated for the first time that CGRP neuron activation does more than promote freezing. The authors conclude by demonstrating that bidirectional modulation of CGRP neuron activity bidirectionally affects freezing in a traditional fear conditioning setting and affects both freezing and flight in a setting in which the robot served as the unconditioned stimulus. Altogether, this is a very strong set of experiments that greatly expand the role of parabrachial CGRP neurons in threat alarm.

      We would like to sincerely thank the reviewer for the positive and insightful comments on our work. We greatly appreciate the acknowledgment of our new behavioral approach, which allowed us to observe a dynamic spectrum of defensive behaviors in animals. Our use of the robot-based paradigm, which enables the observation of both freezing and flight, has been instrumental in expanding our understanding of how parabrachial CGRP neurons modulate diverse threat responses. We are pleased that the reviewer found this methodological innovation to be a valuable contribution to the field.

      Weaknesses

      In all of their conditioning studies the authors did not include a control cue. For example, a sound presented the same number of times but unrelated to US (shock or robot) presentation. This does not detract from their behavioral findings. However, it means the authors do not know if the observed behavior is a consequence of pairing. Or is a behavior that would be observed to any cue played in the setting? This is particularly important for the experiments using the robot US.

      We appreciate the reviewer’s insightful comment regarding the absence of a control cue in our conditioning studies. First, we would like to mention that, in response to Reviewer 3, we have updated how we present our flight data by following methods from previously published papers (Fadok et al., 2017; Borkar et al., 2024). Instead of counting flight responses, we calculated flight scores as the ratio of the velocity during the CS to the average velocity in the 7 s before the CS on the conditioning day (or 10 s for the retention test). This method better captures both the speed and duration of fleeing during the CS. With this updated approach, we observed a significant difference in flight scores between the ChR2 and control groups, even during conditioning, which may partly address the reviewer’s concern about whether the observed behavior is a consequence of CS-US pairing.

      However, we agree with the reviewer that including an unpaired group would provide stronger evidence, and in response, we conducted an additional experiment with an unpaired group. In this unpaired group, the CS was presented the same number of times, but the robot US was delivered randomly within the inter-trial interval. The unpaired group did not exhibit any notable conditioned freezing or flight responses. We believe that this additional experiment, now reflected in Figure 3, further strengthens our conclusion that the fleeing behavior is driven by associative learning between the CS and US, rather than a reaction to the cue itself.

      The authors make claims about the contribution of CGRP neurons to freezing and fleeing behavior, however, all of the optogenetic manipulations are centered on the US presentation period. Presently, the experiments show a role for these neurons in processing aversive outcomes but show little role for these neurons in cue responding or behavior organizing. Claims of contributions to behavior should be substantiated by manipulations targeting the cue period.

      We appreciate the reviewer’s constructive comments. We would like to emphasize that our primary objective in this study was to investigate whether activating parabrachial CGRP neurons—thereby increasing the general alarm signal—would elicit different defensive behaviors beyond passive freezing. To this end, we focused on manipulating CGRP neurons during the US period rather than the cue period.

      Previous studies have shown that CGRP neurons relay US signals, and direct activation of CGRP neurons has been used as the US to successfully induce conditioned freezing responses to the CS during retention tests (Han et al., 2015; Bowen et al., 2020). In our experiments, we also observed that CGRP neurons responded exclusively to the US during conditioning with the robot (Figure 1F), and stimulating these neurons in the absence of any external stimuli elicited strong freezing responses (Figure 2B). These findings, collectively, suggest that activation of CGRP neurons during the CS period would predominantly result in freezing behavior.

      Therefore, we manipulated the activity of CGRP neurons during the US period to examine whether adjusting the perceived threat level through these neurons would result in diverse defensive behaviors when paired with a chasing robot. We observed that enhancing CGRP neuron activity while animals were chased by the robot at 70 cm/s made them react as if chased at a higher speed (90 cm/s), leading to increased fleeing behaviors. While this may not fully address the role of these neurons in cue responding or behavior organizing, we found that silencing CGRP neurons with tetanus toxin (TetTox) abolished fleeing behavior even when animals were chased at high speeds (90 cm/s), which usually elicits fleeing without CGRP manipulation (Figure 5). This supports the conclusion that CGRP neurons are necessary for processing fleeing responses.

      In summary, manipulating CGRP neurons during the US period was essential for effectively investigating their role in adjusting defensive responses, thereby expanding our understanding of their function within the general alarm system. We hope this clarifies our experimental design and addresses the concern the reviewer has raised.

      Appraisal

      The authors achieved their aims and have revealed a much greater role for parabrachial CGRP neurons in threat alarm.

      Discussion

      Understanding neural circuits for threat requires us (as a field) to examine diverse threat settings and behavioral outcomes. A commendable and rigorous aspect of this manuscript was the authors' decision to use a new behavioral paradigm and measure multiple behavioral outcomes. Indeed, this manuscript would not have been nearly as impactful had they not done that. This novel behavior was combined with excellent recording and optogenetic manipulations - a standard the field should aspire to. Studies like this are the only way that we as a field will map complete neural circuits for threat.

      We sincerely thank the reviewer for their positive and encouraging comments. We are grateful for the acknowledgment of our efforts in employing a novel behavioral paradigm to study diverse defensive behaviors. We are pleased that our work contributes to advancing the understanding of neural circuits involved in threat responses.

      Reviewer #3 (Public Review):

      Strengths:

      The study used optogenetics together with in vivo electrophysiology to monitor CGRP neuron activity in response to various aversive stimuli including robot chasing to determine whether they encode noxious stimuli differentially. The study used an interesting conditioning paradigm to investigate the role of CGRP neurons in the PBN in both freezing and flight behaviors.

      Weakness:

      The major weakness of this study is that the chasing robot threat conditioning model elicits weak unconditioned and conditioned flight responses, making it difficult to interpret the robustness of the findings. Furthermore, the conclusion that the CGRP neurons are capable of inducing flight is not substantiated by the data. No manipulations are made to influence the flight behavior of the mouse. Instead, the manipulations are designed to alter the intensity of the unconditioned stimulus.

      We sincerely thank the reviewer for the thoughtful and constructive comments on our manuscript. In response to this feedback, we revisited our analysis of the flight responses and compared our methods with those used in the previous literature examining similar behaviors.

      We reviewed a study investigating sex differences in defensive behavior using rats (Gruene et al., 2015). In that study, the CS was presented for 30 s, and active defensive behavior – referred to as ‘darting’ – was quantified as ‘Dart rate (dart/min)’. This was calculated by doubling the number of darts counted during the 30-s CS presentation to extrapolate to a per-min rate. The highest average dart rate observed was approximately 1.5. Other relevant studies using mice quantified active defensive behavior by calculating a flight score—the ratio of the average speed during each CS to the average speed during the 10 s pre-CS period (Fadok et al., 2017; Borkar et al., 2024). This method captures multiple aspects of flight behavior during CS presentation, including overall velocity, number of bouts, and duration of fleeing. Moreover, it accounts for each animal’s individual velocity prior to the CS, reflecting how fast the animals were fleeing relative to their baseline activity.

      In our original analysis, we quantified flight responses by counting rapid fleeing movements, defined as movements exceeding 8 cm/s. This approach was consistent with our previous study using the same robot paradigm to observe unique patterns of defensive behavior related to sex differences (Pyeon et al., 2023). Based on our earlier findings, where this approach effectively identified significant differences in defensive behaviors, we believed that this method was appropriate for capturing conditioned flight behavior within our specific experimental context. However, prompted by the reviewer's insightful comments, we recognized that our initial method might not fully capture the robustness of the flight responses. Therefore, we re-analyzed our data using the flight score method described by Fadok and colleagues, which provides a more sensitive measure of fleeing during the CS.
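
      To make the difference between these quantifications concrete, the sketch below computes a dart rate, a count of movements above a speed threshold, and a flight score from a speed trace. It is only an illustration under stated assumptions: the function names, the toy speed values, and the rule that a "movement" is one uninterrupted run of samples above threshold are hypothetical and are not taken from the manuscript or the cited papers.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def dart_rate_per_min(n_darts, cs_duration_s=30.0):
    """Extrapolate a dart count during the CS to a per-minute rate.

    For a 30-s CS this is the same as doubling the count.
    """
    return n_darts * 60.0 / cs_duration_s


def count_flight_bouts(speeds, threshold=8.0):
    """Count runs of consecutive samples above threshold (cm/s).

    Assumption: each uninterrupted run above threshold is one bout.
    """
    bouts = 0
    above = False
    for s in speeds:
        if s > threshold and not above:
            bouts += 1
        above = s > threshold
    return bouts


def flight_score(cs_speeds, pre_cs_speeds):
    """Mean speed during the CS divided by mean pre-CS baseline speed."""
    baseline = mean(pre_cs_speeds)
    if baseline == 0:
        raise ValueError("baseline speed is zero; flight score is undefined")
    return mean(cs_speeds) / baseline


# Toy speed samples in cm/s (made-up values for illustration only).
pre_cs = [2.0, 2.0, 2.0, 2.0]
during_cs = [2.0, 10.0, 12.0, 3.0, 9.0, 1.0]

bouts = count_flight_bouts(during_cs)
score = flight_score(during_cs, pre_cs)
print(bouts, score, dart_rate_per_min(1))
```

      In this toy trace the bout count registers only how many times the threshold was crossed, whereas the flight score also reflects how fast and for how long the animal moved relative to its own baseline.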

      Re-analyzing our data revealed a more robust flight response than previously reported, demonstrating that additional CGRP neuron stimulation promoted flight behavior in animals during conditioning, addressing the concern that the data did not substantiate the role of CGRP neurons in inducing flight. In addition, we would like to emphasize the findings from our final experiment, where silencing CGRP neurons, even under high-threat conditions (90 cm/s), prevented animals from exhibiting flight responses. This demonstrates that CGRP neurons are necessary in influencing flight responses.

      We have updated all flight data in the manuscript and revised the relevant figures and text accordingly. We appreciate the opportunity to enhance our analysis. The reviewer's insightful observation led us to adopt a better method for quantifying flight behavior, which substantiates our conclusion about the role of CGRP neurons in modulating defensive responses.

      Borkar, C.D., Stelly, C.E., Fu, X., Dorofeikova, M., Le, Q.-S.E., Vutukuri, R., et al. (2024). Top-down control of flight by a non-canonical cortico-amygdala pathway. Nature 625(7996), 743-749.

      Bowen, A.J., Chen, J.Y., Huang, Y.W., Baertsch, N.A., Park, S., and Palmiter, R.D. (2020). Dissociable control of unconditioned responses and associative fear learning by parabrachial CGRP neurons. Elife 9, e59799.

      Fadok, J.P., Krabbe, S., Markovic, M., Courtin, J., Xu, C., Massi, L., et al. (2017). A competitive inhibitory circuit for selection of active and passive fear responses. Nature 542(7639), 96-100.

      Gruene, T.M., Flick, K., Stefano, A., Shea, S.D., and Shansky, R.M. (2015). Sexually divergent expression of active and passive conditioned fear responses in rats. Elife 4, e11352.

      Han, S., Soleiman, M.T., Soden, M.E., Zweifel, L.S., and Palmiter, R.D. (2015). Elucidating an affective pain circuit that creates a threat memory. Cell 162(2), 363-374.

      Pyeon, G.H., Lee, J., Jo, Y.S., and Choi, J.-S. (2023). Conditioned flight response in female rats to naturalistic threat is estrous-cycle dependent. Scientific Reports 13(1), 20988.

    1. eLife Assessment

      So et al. present an optimized protocol for single-nuclei RNA sequencing of adipose tissue in mice, ensuring better RNA quality and nuclei integrity. The authors use this protocol to explore the cellular landscape in both lean and diet-induced obese mice, identifying a dysfunctional hypertrophic adipocyte subpopulation linked to obesity. The data analyses are solid, and the findings are supported by the evidence presented. This study provides valuable information for the field of adipose tissue biology and will be particularly helpful for researchers using single-nuclei transcriptomics in various tissues.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript from So et al. describes what is suggested to be an improved protocol for single-nuclei RNA sequencing (snRNA-seq) of adipose tissue. The authors provide evidence that modifications to the existing protocols result in better RNA quality and nuclei integrity than previously observed, with ultimately greater coverage of the transcriptome upon sequencing. Using the modified protocol, the authors compare the cellular landscape of murine inguinal and perigonadal white adipose tissue (WAT) depots harvested from animals fed a standard chow diet (lean mice) or those fed a high-fat diet (mice with obesity).

      Strengths:

      Overall, the manuscript is well written, and the data are clearly presented. The strengths of the manuscript rest in the description of an improved protocol for snRNA-seq analysis. This should be valuable for the growing number of investigators in the field of adipose tissue biology that are utilizing snRNA-seq technology, as well as those other fields attempting similar experiments with tissues possessing high levels of RNAse activity.

      Moreover, the study makes some notable observations that provide the foundation for future investigation. One observation is the correlation between nuclei size and cell size, allowing for the transcriptomes of relatively hypertrophic adipocytes in perigonadal WAT to be examined. Another notable observation is the identification of an adipocyte subcluster (Ad6) that appears "stressed" or dysfunctional and likely localizes to crown-like inflammatory structures where pro-inflammatory immune cells reside.

      Weaknesses:

      Analogous studies have been reported in the literature, including a notable study from Savari et al. (Cell Metabolism). This somewhat diminishes the novelty of some of the biological findings presented here. This is deemed a minor criticism as the primary goal is to provide a resource for the field.

    3. Reviewer #2 (Public review):

      Summary:

      In the present manuscript So et al describe an optimized method for nuclei isolation and single nucleus RNA sequencing (snRNA-Seq), which they use to characterize cell populations in lean and obese murine adipose tissues.

      Strengths:

      The detailed description of the protocol for single-nuclei isolation incorporating VRC may be useful to researchers studying adipose tissues, which contain high levels of RNAses.

      While the majority of the findings largely confirm previous published adipose data sets, the authors present a detailed description of a mature adipocyte sub-cluster that appears to represent stressed or dying adipocytes present in obesity, and which is better characterized using the described protocol.

      Weaknesses:

      The use of VRC to enhance snRNA-seq has been previously published in other tissues, somewhat diminishing the novelty of this protocol.

      The snRNA-seq data sets presented in this manuscript, when compared with numerous previously published single-cell analysis of adipose tissue, represent an incremental contribution. The nuclei-isolation protocol may represent an improvement in transcriptional analysis for mature adipocytes, however other stromal populations may be better sequenced using single intact-cell cytoplasmic RNA-Seq.

    4. Reviewer #3 (Public review):

      The authors aimed to improve single-nucleus RNA sequencing (snRNA-seq) to address current limitations and challenges with nuclei and RNA isolation quality. They successfully developed a protocol that enhances RNA preservation and yields high-quality snRNA-seq data from multiple tissues, including a challenging model of adipose tissue. They then applied this method to eWAT and iWAT from mice fed either a normal or high-fat diet, exploring depot-specific cellular dynamics and gene expression changes during obesity. Their analysis included subclustering of SVF cells and revealed that obesity promotes a transition in APCs from an early to a committed state and induces a pro-inflammatory phenotype in immune cells, particularly in eWAT. In addition to SVF cells, they discovered six adipocyte subpopulations characterized by a gradient of unique gene expression signatures. Interestingly, a novel subpopulation, termed Ad6, comprised stressed and dying adipocytes with reduced transcriptional activity, primarily found in eWAT of mice on a high-fat diet. Overall, the methodology is sound, and the data presented supports the conclusions drawn. Further research based on these findings could pave the way for potential novel interventions in obesity and metabolic disorders, or for similar studies in other tissues or conditions.

      Strengths:

      The authors have presented a compelling set of results. They have compared their data with two previously published datasets and provide novel insight into the biological processes underlying mouse adipose tissue remodeling during obesity. The results are generally consistent and robust. The revised Discussion is comprehensive and puts the work in the context of the field.

      Weaknesses:

      • The adipose tissues were collected after 10 weeks of high-fat diet treatment, lacking the intermediate time points for identifying early markers or cell populations during the transition from healthy to pathological adipose tissue.

      • The expansion of the Ad6 subpopulation in obese iWAT and gWAT is interesting. The author claims that Ad6 exhibited a substantial increase in eWAT and a moderate rise in iWAT (Figure 4C). However, this adipocyte subpopulation remains the most altered in iWAT upon obesity. Could the authors elaborate on why there is a scarcity of adipocytes with ROS reporter and B2M in obese iWAT?

      • While the study provides extensive data on mouse models, the potential translation of these findings to human obesity remains uncertain.

      Revised version: The authors have properly revised the paper in response to the above questions, and I have no other concerns.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript from So et al. describes what is suggested to be an improved protocol for single-nuclei RNA sequencing (snRNA-seq) of adipose tissue. The authors provide evidence that modifications to the existing protocols result in better RNA quality and nuclei integrity than previously observed, with ultimately greater coverage of the transcriptome upon sequencing. Using the modified protocol, the authors compare the cellular landscape of murine inguinal and perigonadal white adipose tissue (WAT) depots harvested from animals fed a standard chow diet (lean mice) or those fed a high-fat diet (mice with obesity). 

      Strengths: 

      Overall, the manuscript is well-written, and the data are clearly presented. The strengths of the manuscript rest in the description of an improved protocol for snRNA-seq analysis. This should be valuable for the growing number of investigators in the field of adipose tissue biology that are utilizing snRNA-seq technology, as well as those other fields attempting similar experiments with tissues possessing high levels of RNAse activity. 

      Moreover, the study makes some notable observations that provide the foundation for future investigation. One observation is the correlation between nuclei size and cell size, allowing for the transcriptomes of relatively hypertrophic adipocytes in perigonadal WAT to be examined. Another notable observation is the identification of an adipocyte subcluster (Ad6) that appears "stressed" or dysfunctional and likely localizes to crown-like inflammatory structures where proinflammatory immune cells reside. 

      Weaknesses:  

      Analogous studies have been reported in the literature, including a notable study from Savari et al. (Cell Metabolism). This somewhat diminishes the novelty of some of the biological findings presented here. Moreover, a direct comparison of the transcriptomic data derived from the new vs. existing protocols (i.e. fully executed side by side) was not presented. As such, the true benefit of the protocol modifications cannot be fully understood. 

      We agree with the reviewer’s comment on the limitations of our study. Following the reviewer's suggestion, we performed a new analysis by integrating our data with those from the study by Emont et al. Please refer to the Recommendation for authors section below for further details.

      Reviewer #2 (Public Review):

      Summary: 

      In the present manuscript So et al utilize single-nucleus RNA sequencing to characterize cell populations in lean and obese adipose tissues. 

      Strengths: 

      The authors utilize a modified nuclear isolation protocol incorporating VRC that results in higher-quality sequencing reads compared with previous studies. 

      Weaknesses:  

      The use of VRC to enhance snRNA-seq has been previously published in other tissues. The snRNA-seq data sets presented in this manuscript, when compared with numerous previously published single-cell analyses of adipose tissue, do not represent a significant scientific advance. 

      Figure 1-3: The snRNA-seq data obtained by the authors using their enhanced protocol does not represent a significant improvement in cell profiling for the majority of the highlighted cell types including APCs, macrophages, and lymphocytes. These cell populations have been extensively characterized by cytoplasmic scRNA-seq which can achieve sufficient sequencing depth, and thus this study does not contribute meaningful additional insight into these cell types. The authors note an increase in the number of rare endothelial cell types recovered, however this is not translated into any kind of functional analysis of these populations. 

      We acknowledge the reviewer's comments on the limitations of our study, particularly the lack of extension of our snRNA-seq data into functional studies of new biological processes. However, this manuscript has been submitted as a Tools and Resources article. As an article of this type, we provide detailed information on our snRNA-seq methods and present a valuable resource of high-quality mouse adipose tissue snRNA-seq data. In addition, we demonstrate that our improved method offers novel biological insights, including the identification of subpopulations of adipocytes categorized by size and functionality. We believe this study offers powerful tools and significant value to the research community.

      Figure 4: The authors did not provide any evidence that the relative fluorescent brightness of GFP and mCherry is a direct measure of the nuclear size, and the nuclear size is only a moderate correlation with the cell size. Thus sorting the nuclei based on GFP/mCherry brightness is not a great proxy for adipocyte diameter. Furthermore, no meaningful insights are provided about the functional significance of the reported transcriptional differences between small and large adipocyte nuclei. 

      To address the reviewer's point, we analyzed the Pearson correlation coefficient for nucleus size vs. adipocyte size and found R = 0.85, indicating a strong positive correlation. In addition, we performed a new experiment to determine the correlation between nuclear GFP intensity and adipocyte nucleus size, finding a strong correlation with R = 0.91. These results suggest that nuclear GFP intensity can be a strong proxy for adipocyte size. Furthermore, we performed gene ontology analysis on genes differentially regulated between large and small adipocyte nuclei. We found that large adipocytes promote processes involved in insulin response, vascularization and DNA repair, while inhibiting processes related to cell migration, metabolism and the cytoskeleton. We have added these new data as Figures 4E, S6E, S6G, and S6H (page 11).

      Figure 5-6: The Ad6 population is highly transcriptionally analogous to the mAd3 population from Emont et al, and is thus not a novel finding. Furthermore, in the present data set, the authors conclude that Ad6 are likely stressed/dying hypertrophic adipocytes with a global loss of gene expression, which is a well-documented finding in eWAT > iWAT, for which the snRNA-seq reported in the present manuscript does not provide any novel scientific insight. 

      As the reviewer pointed out, a new analysis integrating our data with the previous study found that Ad3 from our study is comparable to mAd3 from Emont et al. in gene expression profiles. However, significant discrepancies in population size and changes in response to obesity were observed, likely due to differences in technical robustness. The dysfunctional cellular state of this population, with compromised RNA content, may have hindered accurate capture in the previous study, while our protocol enabled precise detection. This underscores the importance of our improved snRNA-seq protocol for accurately understanding adipocyte population dynamics. We have revised the manuscript to include new data in Figure S7 (page 14).

      Reviewer #3 (Public Review): 

      Summary:  

      The authors aimed to improve single-nucleus RNA sequencing (snRNA-seq) to address current limitations and challenges with nuclei and RNA isolation quality. They successfully developed a protocol that enhances RNA preservation and yields high-quality snRNA-seq data from multiple tissues, including a challenging model of adipose tissue. They then applied this method to eWAT and iWAT from mice fed either a normal or high-fat diet, exploring depot-specific cellular dynamics and gene expression changes during obesity. Their analysis included subclustering of SVF cells and revealed that obesity promotes a transition in APCs from an early to a committed state and induces a pro-inflammatory phenotype in immune cells, particularly in eWAT. In addition to SVF cells, they discovered six adipocyte subpopulations characterized by a gradient of unique gene expression signatures. Interestingly, a novel subpopulation, termed Ad6, comprised stressed and dying adipocytes with reduced transcriptional activity, primarily found in eWAT of mice on a high-fat diet. Overall, the methodology is sound, the writing is clear, and the conclusions drawn are supported by the data presented. Further research based on these findings could pave the way for potential novel interventions in obesity and metabolic disorders, or for similar studies in other tissues or conditions. 

      Strengths:  

      • The authors developed a robust snRNA-seq technique that preserves the integrity of the nucleus and RNA across various tissue types, overcoming the challenges of existing methods. 

      • They identified adipocyte subpopulations that follow adaptive or pathological trajectories during obesity. 

      • The study reveals depot-specific differences in adipose tissues, which could have implications for targeted therapies. 

      Weaknesses: 

      • The adipose tissues were collected after 10 weeks of high-fat diet treatment, lacking the intermediate time points for identifying early markers or cell populations during the transition from healthy to pathological adipose tissue. 

      We agree with the reviewers regarding the limitations of our study. To address the reviewer’s comment, we revised the manuscript to include this in the Discussion section (page 17).  

      • The expansion of the Ad6 subpopulation in obese iWAT and gWAT is interesting. The author claims that Ad6 exhibited a substantial increase in eWAT and a moderate rise in iWAT (Figure 4C). However, this adipocyte subpopulation remains the most altered in iWAT upon obesity. Could the authors elaborate on why there is a scarcity of adipocytes with ROS reporter and B2M in obese iWAT?

      We observed an increase in the levels of H2DCFA reporter and B2M protein fluorescence in adipocytes from iWAT of HFD-fed mice, although this increase was much less compared to eWAT, as shown in Figure 6B (left panel). These increases in iWAT were not sufficient for most cells to exceed the cutoff values used to determine H2DCFA and B2M positivity in adipocytes during quantitative analysis. We have revised the manuscript to clarify these results (page 13).

      • While the study provides extensive data on mouse models, the potential translation of these findings to human obesity remains uncertain. 

      To address the reviewer’s point, we expanded our discussion on the differences in adipocyte heterogeneity between mice and humans. We attempted to identify human adipocyte subclusters that resemble the metabolically unhealthy Ad6 adipocytes found in mice in our study; however, we did not find any similar adipocyte types. It has been reported that human adipocyte heterogeneity does not correspond well to that of mouse adipocytes (Emont et al. 2022). In addition, the heterogeneity of human adipocyte populations is not reproducible between different studies (Massier et al. 2023). Interestingly, this inconsistency is unique to adipocytes, as other cell types in adipose tissues display reproducible sub cell types across species and studies (Massier et al. 2023). Our findings indicate that adipocytes may exhibit a unique pathological cellular state with significantly reduced RNA content, which may contribute to the poor consistency in adipocyte heterogeneity in prior studies with suboptimal RNA quality. Therefore, using a robust method to effectively preserve RNA quality may be critical for accurately characterizing adipocyte populations, especially in disease states. It may be important to test in future studies whether our snRNA-seq protocol can identify consistent heterogeneity in adipocyte populations across different species, studies, and individual human subjects. We have revised the manuscript to include this new discussion (page 17).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Suggested points to address: 

      (1) The authors suggest that their improved protocol for maintaining RNA/nucleus integrity results in a more comprehensive analysis of adipose tissue heterogeneity. The authors compare the quality of their snRNA-seq data to those generated in prior studies (e.g., Savari et al.). What is not clear is whether additional heterogeneity/clusters can be observed due directly to the protocol modifications. A direct head-to-head comparison of the protocols executed in parallel would of course be ideal; however, integrating their new dataset with the corresponding data from Savari et al. could help address this question and help readers understand the benefits of this new protocol vs. existing protocols. 

      The data from Savari et al. are of significantly lower quality, likely because they were generated using earlier versions of the 10X Genomics system, and this study lacks iWAT data. To address the reviewer’s point, we instead integrated our data with those from the other study by Emont et al. (2022), which used comparable tissue types and experimental systems. The integrated analysis confirmed the improved representation of all cell types present in adipose tissues in our study, with higher quality metrics such as increased Unique Molecular Identifiers (UMIs) and the number of genes per nucleus. These results indicate that our protocol offers significant advantages in generating a more accurate representation of each cell type and their gene expression profiles. New data are included in Figure S2 (page 7).

      (2) The exact frequency of the Ad6 population in eWAT of mice maintained on HFD is a little unclear. From the snRNA-seq data, it appears that roughly 47% of the adipocytes are in this "stressed state." In Figure 6, it appears that greater than 75% of the adipocytes express B2M (Ad6 marker) and greater than 75% of adipocytes are suggested to be devoid of measurable PPARg expression. The latter seems quite high as PPARg expression is essential to maintain the adipocyte phenotype. Is there evidence of de-differentiation amongst them (i.e. acquisition of progenitor cell markers)? Presenting separate UMAPs for the chow vs. HFD state may help visualize the frequency of each adipocyte population in the two states. Inclusion of the stromal/progenitor cells in the visualization may help understand if cells are de-differentiating in obesity as previously postulated by the authors. Related to Point # 1 above, is this population observed in prior studies and at a similar frequency?

      To address the reviewer’s point, we analyzed the expression of adipocyte progenitor cell (APC) markers, such as Pdgfra, in the Ad6 population. We did not detect significant expression of APC markers, suggesting that Ad6 does not represent dedifferentiating adipocytes. Instead, they are likely stressed and dying cells characterized by an aberrant state of transcription with a global decline.

      When integrating our data with the datasets by Emont et al., we observed an adipocyte population in the previous study, mAd3, comparable to Ad6 in our study, with similar marker gene expression and lower transcript abundance. However, the population size of mAd3 was much smaller than that of Ad6 in our data and did not show consistent population changes during obesity. This discrepancy may be due to different technical robustness; the dysfunctional cellular state of this population, with its severely compromised RNA contents, may have made it difficult to accurately capture using standard protocols in the previous study, while our protocol enabled robust and precise detection. We added new data in Figure S6I and S7 (page 14) and revised the Discussion (page 17).

      Additional points  

      (1) The authors should be cautious in describing subpopulations as "increasing" or "decreasing" in obesity as the data are presented as proportions of a parent population. A given cell population may be "relatively increased." 

      To address the reviewer's point, we revised the manuscript to clarify the "relative" changes in cell populations during obesity in the relevant sections (pages 8, 9, 10, 11, and 15).

      (2) The authors should also be cautious in ascribing "function" to adipocyte populations based solely on their expression signatures. Statements such as those in the abstract, "...providing novel insights into the mechanisms orchestrating adipose tissue remodeling during obesity..." should probably be toned down as no such mechanism is truly demonstrated. 

      To address the reviewer's point, we revised the manuscript by removing or replacing the indicated terms or phrases with more suitable wording in the appropriate sections (pages 2, 10, 12, and 14).

      Reviewer #3 (Recommendations For The Authors): 

      (1) The authors might consider expanding a discussion on the potential implications of their findings, especially the newly identified adipocyte subpopulations and depot-specific differences for human studies. 

      To address the reviewer’s point, we attempted to identify human adipocyte subclusters that resembled our dysfunctional Ad6 adipocytes in mice; however, we did not find any similar adipocyte types. It has been reported that human adipocyte heterogeneity does not correspond well to that of mouse adipocytes (Emont et al. 2022). In addition, the heterogeneity of human adipocyte populations is not reproducible between different studies (Massier et al. 2023). Interestingly, this inconsistency is unique to adipocytes, as other cell types in adipose tissues display reproducible sub cell types across species and studies (Massier et al. 2023). Our findings indicate that adipocytes may exhibit a unique pathological cellular state with significantly reduced RNA content, which may contribute to the poor consistency in adipocyte heterogeneity in prior studies with suboptimal RNA quality. Therefore, using a robust method to effectively preserve RNA quality may be critical for accurately characterizing adipocyte populations, especially in disease states. It may be important to test in future studies whether our snRNA-seq protocol can identify consistent heterogeneity in adipocyte populations across different species, studies, and individual human subjects. We have revised the manuscript to include this new discussion (page 17).

      (2) typo: "To generate diet-induced obesity models". 

      We revised the manuscript to correct it.

    1. Author response:

      Reviewer #1 (Public Review):

      The authors examined the hypothesis that plasma ApoM, which carries sphingosine-1-phosphate (S1P) and activates vascular S1P receptors to inhibit vascular leakage, is modulated by SGLT2 inhibitors (SGLT2i) during endotoxemia. They also propose that this mechanism is mediated by SGLT2i regulation of LRP2/megalin in the kidney and that this mechanism is critical for endotoxin-induced vascular leak and myocardial dysfunction. The hypothesis is novel and potentially exciting. However, the authors' experiments lack critical controls, lack rigor in multiple aspects, and overall do not support the conclusions.

      Thank you for these comments. We have now directly tested this hypothesis, which remains an innovative proposal for how SGLT2i can reduce vascular leak, by using proximal tubule-specific inducible megalin/Lrp2 knockout mice.

      Reviewer #2 (Public Review):

      Apolipoprotein M (ApoM) is a plasma carrier for the vascular protective lipid mediator sphingosine 1-phosphate (S1P). The plasma levels of S1P and its chaperones ApoM and albumin rapidly decline in patients with severe sepsis, but the mechanisms for such reductions and their consequences for cardiovascular health remain elusive. In this study, Ripoll and colleagues demonstrate that the sodium-glucose co-transporter inhibitor dapagliflozin (Dapa) can preserve serum ApoM levels as well as cardiac function after LPS treatment of mice with diet-induced obesity. They further provide data to suggest that Dapa preserves serum ApoM by increasing megalin-mediated reabsorption of ApoM in renal proximal tubules and that ApoM improves vascular integrity in LPS treated mice. These observations put forward a potential therapeutic approach to sustain vascular protective S1P signaling that could be relevant to other conditions of systemic inflammation where plasma levels of S1P decrease. However, although the authors are careful with their statements, the study falls short of directly implicating megalin in ApoM reabsorption and of ApoM/S1P depletion in LPS-induced cardiac dysfunction and the protective effects of Dapa.

      The observations reported in this study are exciting and potentially of broad interest. The paper is well written and concise, and the statements made are mostly supported by the data presented. However, the mechanism proposed and implied is mostly based on circumstantial evidence, and the paper could be substantially improved by directly addressing the role of megalin in ApoM reabsorption and serum ApoM and S1P levels and the importance of ApoM for the preservation of cardiac function during endotoxemia. Some observations that are not necessarily in line with the model proposed should also be discussed.

      The authors show that Dapa preserves serum ApoM and cardiac function in LPS-treated obese mice. However, the evidence they provide to suggest that ApoM may be implicated in the protective effect of Dapa on cardiac function is indirect. Direct evidence could be sought by addressing the effect of Dapa on cardiac function in LPS treated ApoM deficient and littermate control mice (with DIO if necessary).

      The authors also suggest that higher ApoM levels in mice treated with Dapa and LPS reflect increased megalin-mediated ApoM reabsorption and that this preserves S1PR signaling. This could be addressed more directly by assessing the clearance of labelled ApoM, by addressing the impact of megalin inhibition or deficiency on ApoM clearance in this context, and by measuring S1P as well as ApoM in serum samples.

      Methods: More details should be provided in the manuscript for how ApoM deficient and transgenic mice were generated, on sex and strain background, and on whether or not littermate controls were used. For intravital microscopy, more precision is needed on how vessel borders were outlined and if this was done with or without regard for FITC-dextran. Please also specify the type of vessel chosen and considerations made with regard to blood flow and patency of the vessels analyzed. For statistical analyses, data from each mouse should be pooled before performing statistical comparisons. The criteria used for choice of test should be outlined as different statistical tests are used for similar datasets. For all data, please be consistent in the use of post-tests and in the presentation of comparisons. In other words, if the authors choose to only display test results for groups that are significantly different, this should be done in all cases. And if comparisons are made between all groups, this should be done in all cases for similar sets of data.

      Thank you for these comments. We have now tested the direct role of Lrp2 with respect to SGLT2i in vivo and in vitro, and our study now shows that Lrp2 is required for the effect of dapagliflozin on ApoM. ApoM deficient and transgenic mice were previously described and published by our group (PMID: 37034289) and others (PMID: 24318881), and littermate controls were used throughout our manuscript. We agree that the effect on cardiac function is likely indirect in these models, and as yet we do not have the tools in the LPS model to separate potential endothelial protective vs cardiac effects. In addition, since the ApoM knockout has multiple abnormalities that include hypertension, secondary cardiac hypertrophy, and an adipose/browning phenotype, all of which may influence its response to Dapa in terms of cardiac function, these studies will be challenging to perform and will require additional models that are beyond the scope of this manuscript.

      For intravital microscopy, vessel borders were outlined blindly without regard for FITC-dextran. We believe it is important to show multiple blood vessels per mouse since, as the reviewer points out, there is quite a bit of vessel heterogeneity. These tests were performed in the collaborator’s laboratory; data analysis was blinded, and the collaborator was unaware of the study hypothesis when the measurements were performed and analyzed. They have previously reported that this is a valid method for showing cremaster vessel permeability (PMID: 26839042).

      We have updated our methods section and the figure legends to clearly indicate the statistical analyses we used. For two-group comparisons we used Student's t-test. For multiple groups, one-way ANOVA with Sidak's correction for multiple comparisons was used throughout the paper when the data were normally distributed, and the Kruskal-Wallis test was used when the data were not normally distributed.
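
      For reference, the Sidak adjustment named above can be written out as a short sketch. The function names and p-values are hypothetical, and the sketch covers only the p-value adjustment itself, not the ANOVA or the pairwise tests reported in the manuscript.

```python
def sidak_adjust(p_values):
    """Return Sidak-adjusted p-values for a family of m comparisons."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def sidak_alpha(alpha, m):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m)


# Made-up raw p-values from three pairwise comparisons (illustration only).
raw = [0.01, 0.04, 0.5]
adjusted = sidak_adjust(raw)
```

      Each adjusted value is at least as large as its raw value, which is why a comparison that is significant before correction may not remain so afterwards.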

      Reviewer #3 (Public Review):

      The authors have performed well-designed experiments that elucidate the protective role of Dapa in an LPS model of sepsis. This model shows that Dapa works, in part, by increasing expression of the receptor LRP2 in the kidney, which maintains circulating ApoM levels. ApoM binds to S1P, which then interacts with the S1P receptor, stimulating cardiac function and epithelial and endothelial barrier function, thereby maintaining intravascular volume and cardiac output in the setting of severe inflammation. The authors used many experimental models, including transgenic mice, as well as several rigorous and reproducible techniques to measure the relevant parameters of cardiac, renal, vascular, and immune function. Furthermore, they employ a useful inhibitor of S1P function to show pharmacologically the essential role for this agonist in most but not all the benefits of Dapa. A strength of the paper is the identification of the pathway responsible for the cardioprotective effects of SGLT2is that may yield additional therapeutic targets. There are some weaknesses in the paper, such as studying only male mice and not providing a power analysis to justify the number of animals used throughout their experimentation. Overall, the paper should have a significant impact on the scientific community because the SGLT2i drugs are likely to find many uses in inflammatory diseases and metabolic diseases. This paper provides support for an important mechanism by which they work in conditions of severe sepsis and hemodynamic compromise.

      Thank you for these comments.

    1. Author response:

      Reviewer #1 (Public Review):

      This paper proposes a novel framework for explaining patterns of generalization of force field learning to novel limb configurations. The paper considers three potential coordinate systems: cartesian, joint-based, and object-based. The authors propose a model in which the forces predicted under these different coordinate frames are combined according to the expected variability of produced forces. The authors show, across a range of changes in arm configurations, that the generalization of a specific force field is quite well accounted for by the model.

      The paper is well-written and the experimental data are very clear. The patterns of generalization exhibited by participants - the key aspect of the behavior that the model seeks to explain - are clear and consistent across participants. The paper clearly illustrates the importance of considering multiple coordinate frames for generalization, building on previous work by Berniker and colleagues (JNeurophys, 2014). The specific model proposed in this paper is parsimonious, but there remain a number of questions about its conceptual premises and the extent to which its predictions improve upon alternative models.

      A major concern is with the model's premise. It is loosely inspired by cue integration theory but is really proposed in a fairly ad hoc manner, and not really concretely founded on firm underlying principles. It's by no means clear that the logic from cue integration can be extrapolated to the case of combining different possible patterns of generalization. I think there may in fact be a fundamental problem in treating this control problem as a cue-integration problem. In classic cue integration theory, the various cues are assumed to be independent observations of a single underlying variable. In this generalization setting, however, the different generalization patterns are NOT independent; if one is true, then the others must inevitably not be. For this reason, I don't believe that the proposed model can really be thought of as a normative or rational model (hence why I describe it as 'ad hoc'). That's not to say it may not ultimately be correct, but I think the conceptual justification for the model needs to be laid out much more clearly, rather than simply by alluding to cue-integration theory and using terms like 'reliability' throughout.

      We thank the reviewer for bringing up this point. We see and treat this problem of finding the combination weights not as a cue integration problem but as an inverse optimal control problem. In this case, there can be several solutions to the same problem, i.e., what forces are expected in untrained areas, which can co-exist and give the motor system the option to switch or combine them. This is similar to other inverse optimal control problems, e.g. combining feedforward optimal control models to explain simple reaching. However, compared to these problems, which fit the weights between different models, we proposed an explanation for the underlying principle that sets these weights for the dynamics representation problem. We found that basing the combination on each motor plan's reliability can best explain the results. In this case, we refer to ‘reliability’ as execution reliability and not sensory reliability, which is common in cue integration theory. We have added further details explaining this in the manuscript.

      “We hypothesize that this inconsistency in results can be explained using a framework inspired by an inverse optimal control framework. In this framework the motor system can switch between or combine different solutions. That is, the motor system assigns different weights to each solution and calculates a weighted sum of these solutions. Usually, to support such a framework, previous studies found the weights by fitting the weighted sum solution to behavioral data (Berret, Chiovetto et al. 2011). While we treat the problem in the same manner, we propose the Reliable Dynamics Representation (Re-Dyn) mechanism that determines the weights instead of fitting them. According to our framework, the weights are calculated by considering the reliability of each representation during dynamic generalization. That is, the motor system prefers certain representations if the execution of forces based on this representation is more robust to distortion arising from neural noise. In this process, the motor system estimates the difference between the desired generalized forces and generated generalized forces while taking into consideration noise added to the state variables that equivalently define the forces.”
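
      One possible reading of reliability-based weighting can be sketched as follows. This is not the Re-Dyn model itself: the toy force functions, the single shared state variable, the noise level, and in particular the inverse-error normalization are all assumptions made for illustration.

```python
import random


def reliability_weights(representations, state, noise_sd, n_samples=2000, seed=0):
    """Weight each representation by the inverse of its noise-induced force error.

    representations: dict mapping a name to a function(state_tuple) -> force
    state: tuple of noise-free state variables
    noise_sd: standard deviation of Gaussian noise added to each state variable
    The inverse-error normalization is an assumption of this sketch.
    """
    rng = random.Random(seed)
    errors = {}
    for name, force_fn in representations.items():
        desired = force_fn(state)
        squared_error = 0.0
        for _ in range(n_samples):
            noisy_state = tuple(x + rng.gauss(0.0, noise_sd) for x in state)
            squared_error += (force_fn(noisy_state) - desired) ** 2
        errors[name] = squared_error / n_samples
    inverse = {name: 1.0 / (err + 1e-12) for name, err in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


# Hypothetical representations that differ only in how strongly state noise
# is amplified into force error (gains 1, 2, and 4).
toy_representations = {
    "cartesian": lambda s: 1.0 * s[0],
    "joint": lambda s: 2.0 * s[0],
    "object": lambda s: 4.0 * s[0],
}
weights = reliability_weights(toy_representations, state=(0.5,), noise_sd=0.1)
```

      In this toy setting the representation that amplifies noise least receives the largest weight, which mirrors the stated preference for representations that are more robust to neural noise.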

      A more rational model might be based on Bayesian decision theory. Under such a model, the motor system would select motor commands that minimize some expected loss, averaging over the various possible underlying 'true' coordinate systems in which to generalize. It's not entirely clear without developing the theory a bit exactly how the proposed noise-based theory might deviate from such a Bayesian model. But the paper should more clearly explain the principles/assumptions of the proposed noise-based model and should emphasize how the model parallels (or deviates from) Bayesian-decision-theory-type models.

      As we understand the reviewer's suggestion, the idea is to estimate the weight of each coordinate system based on minimizing a loss function that considers the cost of each weight multiplied by a posterior probability that represents the uncertainty in this weight value. While this is an interesting idea, we believe that in the current problem, there are no ‘true’ weight values. That is, the motor system can use any combination of weights which will be true due to the ambiguous nature of the environment. Since the force field was presented in one area of the entire workspace, there is no observation that will allow us to update prior beliefs regarding the force nature of the environment. In such a case, the prior beliefs might play a role in the loss function, but in our opinion, there is no clear rationale for choosing unequal priors except guessing or fitting prior probabilities, which will resemble any other previous models that used fitting rather than predictions.

      Another significant weakness is that it's not clear how closely the weighting of the different coordinate frames needs to match the model predictions in order to recover the observed generalization patterns. Given that the weighting for a given movement direction is over-parametrized (i.e., there are 3 variable weights (allowing for decay) predicting a single observed force level), it seems that a broad range of models could generate a reasonable prediction. It would be helpful to compare the predictions using the weighting suggested by the model with the predictions using alternative weightings, e.g. a uniform weighting, or the weighting for a different posture. In fact, Fig. 7 shows that uniform weighting accounts for the data just as well as the noise-based model in which the weighting varies substantially across directions. A more comprehensive analysis comparing the proposed noise-based weightings to alternative weightings would be helpful to more convincingly argue for the specificity of the noise-based predictions being necessary. The analysis in the appendix was not that clearly described, but seemed to compare various potential fitted mixtures of coordinate frames, but did not compare these to the noise-based model predictions.

      We agree with the reviewer that fitted global weights, that is, an optimal weighted average of the three coordinate systems, should outperform most of the models that are based on prediction instead of fitting the data. As we showed in Figure 7 of the submitted version of the manuscript, we used the optimal fitted model to show that our noise-based model is indeed not optimal but can predict the behavioral results and not fall too short of a fitted model. When trying to fit a model across all the reported experiments, we indeed found a set of values that gives equal weights for the joint and object coordinate systems (0.27 for both), and a lower value for the Cartesian coordinate system (0.12). Considering these values, we can see why the reviewer suggests a model that is based on equal weights across all coordinate systems. While this model will not perform as well as the fitted model, it can still generate satisfactory results.

      To better understand if a model based on global weights can explain the combination between coordinate systems, we performed an additional experiment. In this experiment, a model that is based on global fitted weights can only predict one out of two possible generalization patterns while models that are based on individual direction-predicted weights can predict a variety of generalization patterns. We show that global weights, although fitted to the data, cannot explain participants' behavior. We report these new results in Appendix 2.

      “To better understand if a model based on global weights can explain the combination between coordinate systems, we performed an additional experiment. We used the idea of experiment 3 in which participants generalize learned dynamics using a tool. That is, the arm posture does not change between the training and test areas. In such a case, the Cartesian and joint coordinate systems do not predict a shift in generalized force pattern while the object coordinate system predicts a shift that depends on the orientation of the tool. In this additional experiment, we set a test workspace in which the orientation of the tool is 90° (Appendix 2- figure 1A). In this case, for the test workspace, the force compensation pattern of the object-based coordinate system is in anti-phase with the Cartesian/joint generalization pattern. Any globally fitted weights (including equal weights) can produce only a non-shifted or a 90° shifted force compensation pattern (Appendix 2- figure 1B). Participants in this experiment (n=7) showed similar MPE reduction as in all previous experiments when adapting to the trigonometric scaled force field (Appendix 2- figure 1C). When examining the generalized force compensation patterns, we observed a shift of the pattern in the test workspace of 14.6° (Appendix 2- figure 1D). This cannot be explained by the individual coordinate system force compensation patterns or any combination of them (which will always predict either a 0° or 90° shift, Appendix 2- figure 1E). However, calculating the prediction of the Re-Dyn model we found a predicted force compensation pattern with a shift of 6.4° (Appendix 2- figure 1F). The intermediate shift in the force compensation pattern suggests that no set of global weights can explain the results.”
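
      The phase argument in the quoted passage can be illustrated with a small numerical sketch. The cos(2θ) profile, the specific weights, and the function names below are assumptions chosen only so that a 90° shift corresponds to anti-phase; they are not the force field or the weights used in the study.

```python
import math


def pattern(theta_deg, shift_deg=0.0):
    """Hypothetical force-compensation profile with a 180-degree period."""
    return math.cos(math.radians(2.0 * (theta_deg - shift_deg)))


def estimated_shift(values, thetas_deg):
    """Phase shift in degrees, in (-90, 90], of the cos(2*theta) component."""
    c = sum(v * math.cos(math.radians(2.0 * t)) for v, t in zip(values, thetas_deg))
    s = sum(v * math.sin(math.radians(2.0 * t)) for v, t in zip(values, thetas_deg))
    return math.degrees(math.atan2(s, c)) / 2.0


thetas = [i * 15.0 for i in range(24)]  # 24 reach directions over 360 degrees


def global_mix(w_cart, w_joint, w_obj):
    """One weight per coordinate system, shared by all movement directions."""
    return [
        (w_cart + w_joint) * pattern(t, 0.0) + w_obj * pattern(t, 90.0)
        for t in thetas
    ]


def direction_dependent_mix():
    """Object weight varies with direction (arbitrary illustrative choice)."""
    mixed = []
    for t in thetas:
        w_obj = 0.3 + 0.2 * math.sin(math.radians(4.0 * t))
        mixed.append((1.0 - w_obj) * pattern(t, 0.0) + w_obj * pattern(t, 90.0))
    return mixed


shift_low_obj = estimated_shift(global_mix(0.3, 0.3, 0.2), thetas)
shift_high_obj = estimated_shift(global_mix(0.1, 0.1, 0.5), thetas)
shift_varying = estimated_shift(direction_dependent_mix(), thetas)
```

      Because the two profiles are in anti-phase, a global weighting only rescales or flips the profile, so the estimated shift is 0° or 90° in magnitude. Only weights that vary with movement direction yield an intermediate shift in this sketch.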

      With regard to the suggestion that weighting is changed according to arm posture, two of our results lower the possibility that posture governs the weights:

      (1) In experiment 3, we tested generalization while keeping the same arm posture between the training and test workspaces, and we observed different force compensation profiles across the movement directions. If arm posture in the test workspaces affected the weights, we would expect identical weights for both test workspaces. However, any set of weights that can explain the results observed for workspace 1 will fail to explain the results observed in workspace 2. To better understand this point we calculated the global weights for each test workspace for this experiment and we observed an increase in the weight for the object coordinate system (0.41 vs. 0.5) and a reduction in the weights for the Cartesian and joint coordinate systems (0.29 vs. 0.24). This suggests that the arm posture cannot explain the generalization pattern in this case.

      (2) In experiments 2 and 3, we used the same arm posture in the training workspace and either changed the arm posture (experiment 2) or did not change the arm posture (experiment 3) in the test workspaces. While the arm posture for the training workspace was the same, the force generalization patterns were different between the two experiments, suggesting that the arm posture during the training phase (adaptation) does not set the generalization weights.

      Overall, this shows that it is not specifically the arm posture in either the test or the training workspaces that set the weights. Of course, all coordinate models, including our noise model, will consider posture in the determination of the weights.

      Reviewer #2 (Public Review):

      Leib & Franklin assessed how the adaptation of intersegmental dynamics of the arm generalizes to changes in different factors: areas of extrinsic space, limb configurations, and 'object-based' coordinates. Participants reached in many different directions around 360°, adapting to velocity-dependent curl fields that varied depending on the reach angle. This learning was measured via the pattern of forces expressed upon the channel wall of "error clamps" that were randomly sampled from each of these different directions. The authors employed a clever method to predict how this pattern of forces should change if the set of targets was moved around the workspace. Some sets of locations resulted in a large change in joint angles or object-based coordinates, but Cartesian coordinates were always the same. Across three separate experiments, the observed shifts in the generalized force pattern never corresponded to a change that was made relative to any one reference frame. Instead, the authors found that the observed pattern of forces could be explained by a weighted combination of the change in Cartesian, joint, and object-based coordinates across test and training contexts.

      In general, I believe the authors make a good argument for this specific mixed weighting of different contexts. I have a few questions that I hope are easily addressed.

      Movements show different biases relative to the reach direction. Although very similar across people, this function of biases shifts when the arm is moved around the workspace (Ghilardi, Gordon, and Ghez, 1995). The origin of these biases is thought to arise from several factors that would change across the different test and training workspaces employed here (Vindras & Viviani, 2005). My concern is that the baseline biases differ across these contexts and that the observed change in the force pattern across contexts is not a function of generalization, but rather a change in underlying biases. Baseline force channel measurements were taken in the different workspace locations and conditions, so these could be used to show whether such biases are meaningfully affecting the results.

      We agree with the reviewer and we followed their suggested analysis. In the following figure (Author response image 1) we plotted the baseline force compensation profiles in each workspace for each of the four experiments. As can be seen in this figure, the baseline force compensation is very close to zero and differs significantly from the force compensation profiles after adaptation to the scaled force field.

      Author response image 1.

      Baseline force compensation levels for experiments 1-4. For each experiment, we plotted the force compensation for the training, test 1, and test 2 workspaces.

      Experiment 3, Test 1 has data that seems the worst fit with the overall story. I thought this might be an issue, but this is also the test set for a potentially awkwardly long arm. My understanding of the object-based coordinate system is that it's primarily a function of the wrist angle, or perceived angle, so I am a little confused why the length of this stick is also different across the conditions instead of just a different angle. Could the length be why this data looks a little odd?

      Usually, force generalization is tested by physically moving the hand in unexplored areas. In experiment 3 we tested generalization using a tool which, as far as we know, was not tested in the past in a similar way to the present experiment. Indeed, the results look odd compared to the results of the other experiments, which were based on the ‘classic’ generalization idea. While we have some ideas regarding possible reasons for the observed behavior, it is out of the scope of the current work and still needs further examination.

      Based on the reviewer’s comment, we improved the explanation in the introduction regarding the idea behind the object-based coordinate system:

      “we could represent the forces as belonging to the hand or a hand-held object using the orientation vector connecting the shoulder and the object or hand in space (Berniker, Franklin et al. 2014).” The reviewer is right in their observation that the predictions of the object-based reference frame will look the same if we change the length of the tool. The object-based generalized forces, specifically the shift in the force pattern, depend only on the object's orientation but not its length (equation 4).

      The manuscript is written and organized in a way that focuses heavily on the noise element of the model. Other than it being reasonable to add noise to a model, it's not clear to me that the noise is adding anything specific. It seems like the model makes predictions based on how many specific components have been rotated in the different test conditions. I fear I'm just being dense, but it would be helpful to clarify whether the noise itself (and inverse variance estimation) are critical to why the model weights each reference frame how it does or whether this is just a method for scaling the weight by how much the joints or whatever have changed. It seems clear that this noise model is better than weighting by energy and smoothness.

      We have now included further details of the noise model and added to Figure 1 to highlight how noise can affect the predicted weights. In short, we agree with the reviewer that there are multiple ways to add noise to the generalized force patterns. We chose a simple option in which we simulate possible distortions to the state variables that set the direction of movement. Once we have calculated the variance of the force profile due to this distortion, one possible way to combine the force representations is to use an inverse variance estimator. Note that it has been shown that an inverse variance estimator is an ideal way to combine signals (e.g., Shahar, D.J. (2017) https://doi.org/10.4236/ojs.2017.72017). However, as we suggest, we do not claim or try to provide evidence for this specific way of calculating the weights. Instead, we suggest that giving greater weight to the less variable force representation can predict both the current experimental results and past results.
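
      The inverse variance idea mentioned above can be illustrated with a minimal sketch. The function names and the example variances are hypothetical and chosen only for illustration; this is not the authors' implementation, only the generic estimator in which each representation is weighted in proportion to the reciprocal of its variance.

```python
def inverse_variance_weights(variances):
    """Return weights proportional to 1/variance, normalized to sum to 1."""
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]


def combine(estimates, variances):
    """Weighted sum of estimates, giving more weight to less variable ones."""
    weights = inverse_variance_weights(variances)
    return sum(w * e for w, e in zip(weights, estimates))


# Hypothetical force predictions from three coordinate systems
# (Cartesian, joint, object) with made-up variances.
weights = inverse_variance_weights([1.0, 1.0, 2.0])
combined = combine([1.0, 0.0, -1.0], [1.0, 1.0, 2.0])
```

      In this toy case the two less variable representations each receive twice the weight of the more variable one, which is the qualitative behavior the response describes.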

      Are there any force profiles for individual directions that are predicted to change shape substantially across some of these assorted changes in training and test locations (rather than merely being scaled)? If so, this might provide another test of the hypotheses.

      In experiments 1-3, in which there is a large shift of the force compensation curve, we found directions in which the generalized force was flipped in direction. That is, clockwise force profiles in the training workspace could change into counter-clockwise profiles in the test workspace. For example, in experiment 2, for movement at 157.5° we can see that the force profile was clockwise for the training workspace (with a force compensation value of 0.43) and movement at the same direction was counterclockwise for test workspace 1 (force compensation equal to -0.48). Importantly, we found that the noise based model could predict this change.

      Author response image 2.

      Results of experiment 2. Force compensation profiles for the training workspace (grey solid line) and test workspace 1 (dark blue solid line). Examining the force nature for the 157.5° direction, we found a change in the applied force by the participants (change from clockwise to counterclockwise forces). This was supported by a change in force compensation value (0.43 vs. -0.48). The noise based model can predict this change as shown by the predicted force compensation profile (green dashed line).

      I don't believe the decay factor that was used to scale the test functions was specified in the text, although I may have just missed this. It would be a good idea to state what this factor is where relevant in the text.

      We added an equation describing the decay factor (new equation 7 in the Methods section) in response to this suggestion and to Reviewer 1's comment on the same issue.

      Reviewer #3 (Public Review):

      The author proposed the minimum variance principle in the memory representation in addition to two alternative theories of the minimum energy and the maximum smoothness. The strength of this paper is the matching between the prediction data computed from the explicit equation and the behavioral data taken in different conditions. The idea of the weighting of multiple coordinate systems is novel and is also able to reconcile a debate in previous literature.

      The weakness is that, although each model is based on an optimization principle, the derivation process is not written in the method section. The authors did not write about how they can derive these weighting factors from these computational principles. Thus, it is not clear whether these weighting factors are relevant to these theories or just hacking methods. Suppose the author argues that this is the result of the minimum variance principle. In that case, the authors should show a process of how to derive these weighting factors as a result of the optimization process to minimize these cost functions.

      The reviewer brings up a very important point regarding the model. As shown below, it is not trivial to derive these weights using an analytical optimization process. We demonstrate one issue with this optimization process.

      The force representation can be written as (similar to equation 6):

      We formulated the problem as minimizing the variance of the force according to the weights w:

      In this case, the variance of the force is the variance-covariance matrix which can be minimized by minimizing the matrix trace:

      We will start by calculating the variance of the force representation in joints coordinate system:

      Here, the force variance is the result of a complex function which includes the joint angles as random variables. Expanding the last expression, although very complex, is still possible. In the resulting expression, some of the terms require calculating the variance of nested trigonometric functions of the random joint angle, for example:

      In the vast majority of these cases, analytical solutions do not exist. Similar issues can also arise when calculating the variance of complex products of trigonometric functions, such as in the case of multiplication of Jacobians (and inverse Jacobians).

      To overcome this problem, we turned to numerical solutions which simulate the variance due to the different state variables.
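
      A minimal sketch of such a numerical approach is given below. It assumes Gaussian noise on a single joint angle and uses a hypothetical nested function, sin(cos θ), as a stand-in for the kind of term described above; the function, noise levels, and sample count are illustrative assumptions, not the terms or parameters of the authors' model.

```python
import math
import random
import statistics


def nested_trig(theta):
    """Hypothetical nested trigonometric term with no simple closed-form variance."""
    return math.sin(math.cos(theta))


def simulated_variance(func, mean_angle, angle_sd, n_samples=20000, seed=0):
    """Monte Carlo estimate of Var[func(theta)] for theta ~ N(mean_angle, angle_sd^2)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    samples = [func(rng.gauss(mean_angle, angle_sd)) for _ in range(n_samples)]
    return statistics.pvariance(samples)


# Variance of the nested term for small and larger angular noise (radians).
var_small = simulated_variance(nested_trig, mean_angle=1.0, angle_sd=0.05)
var_large = simulated_variance(nested_trig, mean_angle=1.0, angle_sd=0.2)
```

      The same sampling loop applies unchanged to more complicated functions of the joint angles, which is the practical advantage over an analytical derivation.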

      In addition, I am concerned that the proposed model can cancel the property of the coordinate system by the predicted variance, and it can work for any coordinate system, even one that is not used in the human brain. When the applied force is given in Cartesian coordinates, the directionality in the generalization ability of the memory of the force field is characterized by the kinematic relationship (Jacobian) between the Cartesian coordinate and the coordinate of interest (Cartesian, joint, and object) as shown in Equation 3. At the same time, when a displacement (epsilon) is considered in a space and a corresponding displacement is linked with kinematic equations (e.g., joint displacement and hand displacement in 2 joint arms in this paper), the generated variances in different coordinate systems are linked to each other by the kinematic equation (Jacobian). Thus, how a small noise in a certain coordinate system generates the hand force noise (sigma_x, sigma_j, sigma_o) is also characterized by the kinematics (Jacobian). Thus, when the predicted forcefield (F_c, F_j, F_o) was divided by the variance (F_c/sigma_c^2, F_j/sigma_j^2, F_o/sigma_o^2), the directionality of the generalization force which is characterized by the Jacobian is canceled by the directionality of the sigmas which is characterized by the Jacobian. Thus, as it has been read out from Fig*D and E top, the weight in E-top of each coordinate system is always the inverse of the shift of force from the test force by which the directionality of the generalization is always canceled.

      Once this directionality is canceled, no matter how to compute the weighted sum, it can replicate the memorized force. Thus, this model always works to replicate the test force no matter which coordinate system is assumed. Thus, I am suspicious of the falsifiability of this computational model. This model is always true no matter which coordinate system is assumed. Even though they use, for instance, the robot coordinate system, which is directly linked to the participant's hand with the kinematic equation (Jacobian), they can replicate this result. But in this case, the model would be nonsense. The falsifiability of this model was not explicitly written.

      As explained above, calculating the variability of the generalized forces given the random nature of the state variable is a complex function that is not summarized using a Jacobian. Importantly, the model is unable to reproduce or replicate the test force arbitrarily. In fact, we have already shown this (see Appendix 1- figure 1): when we attempt to explain the data with only a single coordinate system (or a combination of two coordinate systems), we are completely unable to replicate the test data despite using this model. For example, in experiment 4, when we don’t use the joint-based coordinate system, the model predicts zero shift of the force compensation pattern while the behavioral data show a shift due to the contribution of the joint coordinate system. Any arbitrary model (similar to the random model we tested, please see the response to Reviewer 1) would be completely unable to recreate the test data. Our model instead makes very specific predictions about the weighting between the three coordinate systems and therefore completely specified force predictions for every possible test posture. We added this point to the Discussion:

      “The results we present here support the idea that the motor system can use multiple representations during adaptation to novel dynamics. Specifically, we suggested that we combine three types of coordinate systems, where each is independent of the other (see Appendix 1- figure 1 for comparison with other combinations). Other combinations that include a single or two coordinate system can explain some of the results but not all of them, suggesting that force representation relies on all three with specific weights that change between generalization scenarios.”

    1. eLife Assessment

      The specific questions taken up for study by the authors, namely HDAC and Polycomb function in mice in the context of vascular endothelial cell (EC) gene expression relevant to the blood-brain barrier (BBB), are potentially useful in the context of vascular diversification in understanding and remedying situations where BBB function is compromised. The strength of the evidence presented is incomplete, and to elaborate, it is known that the culturing of endothelial cells can have a strong effect on gene expression.

    2. Reviewer #1 (Public review):

      The blood-brain barrier separates neural tissue from blood-borne factors and is important for maintaining central nervous system health and function. Endothelial cells are the site of the barrier. These cells exhibit unique features relative to peripheral endothelium and a unique pattern of gene expression. There remains much to be learned about how the transcriptome of brain endothelial cells is established in development and maintained throughout life.

      The manuscript by Sadanandan, Thomas et al. investigates this question by examining transcriptional and epigenetic changes in brain endothelial cells in embryonic and adult mice. Changes in transcript levels and histone marks for various BBB-relevant transcripts, including Cldn5, Mfsd2a and Zic3 were observed between E13.5 and adult mice. To perform these experiments, endothelial cells were isolated from E13.5 and adult mice, then cultured in vitro, then sequenced. This approach is problematic. It is well-established that brain endothelial cells rapidly lose their organotypic features in culture (https://elifesciences.org/articles/51276). Indeed, one of the primary genes investigated in this study, Cldn1, exhibits very low expression at the transcript level in vivo, but is strongly upregulated in cultured ECs.

      (https://elifesciences.org/articles/36187 ; https://markfsabbagh.shinyapps.io/vectrdb/)

      This undermines the conclusions of the study. While this manuscript is framed as investigating how epigenetic processes shape BBB formation and maintenance, they may be looking at how brain endothelial cells lose their identity in culture.

      An additional concern is that for many experiments, siRNA knockdowns are performed without validation of the efficacy of knockdown.

      Some experiments in the paper are promising, however. For example, the knockout of HDAC2 in endothelial cells resulting in BBB leakage was striking. Investigating the mechanisms underlying this phenotype in vivo could yield important insights.

    3. Reviewer #2 (Public review):

      Sadanandan et al. describe their studies in mice of HDAC and Polycomb function in the context of vascular endothelial cell (EC) gene expression relevant to the blood-brain barrier (BBB). This topic is of interest because the BBB gene expression program represents an interesting and important vascular diversification mechanism. From an applied point of view, modifying this program could have therapeutic benefits in situations where BBB function is compromised.

      The study involves comparing the transcriptomes of cultured CNS ECs at E13 and adult stages and then perturbing EC gene expression pharmacologically in cell culture (with HDAC and Polycomb inhibitors) and genetically in vivo by EC-specific conditional KO of HDAC2 and Polycomb component EZH2.

      This reviewer has several critiques of the study.

      First, based on published data, the effect of culturing CNS ECs is likely to have profound effects on their differentiation, especially as related to their CNS-specific phenotypes. Related to this, the authors do not state how long the cells were cultured.

      Second, the use of qPCR assays for quantifying ChIP and transcript levels is inferior to ChIPseq and RNAseq. Whole genome methods, such as ChIPseq, permit a level of quality assessment that is not possible with qPCR methods. The authors should use whole genome NextGen sequencing approaches, show the alignment of reads to the genome from replicate experiments, and quantitatively analyze the technical quality of the data.

      Third, the observation that pharmacologic inhibitor experiments and conditional KO experiments targeting HDAC2 and the Polycomb complex perturb EC gene expression or BBB integrity, respectively, is not particularly surprising as these proteins have broad roles in epigenetic regulation in a wide variety of cell types.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewers 1 and 2: concern about transcriptional changes in endothelial cells (ECs) in culture.

      We have now addressed this concern by FACS-sorting ECs (Fig. 7A revised) and comparing our data with previous studies (S. Fig. 1C). Our major claim was the epigenetic repression of EC genes, including those involved in BBB formation and angiogenesis, during later development. To further strengthen our claim, we knocked out HDAC2 during the later stages of development to prevent this epigenetic repression. As shown in the first version of the manuscript, this knockout results in enhanced angiogenesis and a leaky BBB.

      In the revised version, we have FACS-sorted CD31+ ECs from E-17.5 WT and HDAC2 ECKO mice, followed by ultra-low mRNA sequencing. Confirming the epigenetic repression via HDAC2, the HDAC2-deleted ECs showed high expression of BBB genes such as ZO-1, OCLN, MFSD2A, and GLUT1, and activation of the Wnt signaling pathway as indicated by the upregulation of Wnt target genes such as Axin2 and APCDD1. Additionally, to validate the increased angiogenesis phenotype observed, angiogenesis-related genes such as VEGFA, FLT1, and ENG were upregulated.

      Since the transcriptomics of brain ECs during developmental stages has already been published in Hupe et al., 2017, we did not attempt to replicate this. However, we compared our differentially regulated genes from E-13.5 versus adult stages with the transcriptome changes during development reported by Hupe et al., 2017. We found a significant overlap in important genes such as CLDN5, LEF1, ZIC3, and MFSD2A (S. Fig. 1C).

      As pointed out by the reviewer, culture-induced changes cannot be ruled out from our data. We have included a statement in the manuscript: "Even though we used similar culture conditions for both embryonic and adult cortical ECs, culture-induced changes have been reported previously and should be considered as a varying factor when interpreting our results."

      Reviewer-1 Comment 2- An additional concern is that for many experiments, siRNA knockdowns are performed without validation of the efficacy of the knockdown.

      We have now provided the protein expression data for HDAC2 and EZH2 in the revised manuscript Supplementary Figure- 2A.

      Reviewer-1 Comment 3- Some experiments in the paper are promising, however. For example, the knockout of HDAC2 in endothelial cells resulting in BBB leakage was striking. Investigating the mechanisms underlying this phenotype in vivo could yield important insights.

      We appreciate your positive comment. The in vivo HDAC2 knockout experiment serves as a validation of our in vitro findings, demonstrating that the epigenetic regulator HDAC2 can control the expression of endothelial cell (EC) genes involved in angiogenesis, blood-brain barrier (BBB) formation, and maturation. To investigate the mechanism behind the underlying phenotype of HDAC2 ECKO, we performed mRNA sequencing on HDAC2 ECKO E-17.5 ECs and discovered that vascular and BBB maturation is hindered by preventing the epigenetic repression of BBB, angiogenesis, and Wnt target genes (Fig. 7A). As a result, the HDAC2 ECKO phenotype showed increased angiogenesis and BBB leakage. This strengthens our hypothesis that HDAC2-mediated epigenetic repression is critical for BBB and vascular maturation.

      Reviewer 2 Comment-2 The use of qPCR assays for quantifying ChIP and transcript levels is inferior to ChIPseq and RNAseq. Whole genome methods, such as ChIPseq, permit a level of quality assessment that is not possible with qPCR methods. The authors should use whole genome NextGen sequencing approaches, show the alignment of reads to the genome from replicate experiments, and quantitatively analyze the technical quality of the data.

      We appreciate the reviewer's comment. While whole-genome methods like ChIP-seq offer comprehensive and high-throughput data, ChIP-qPCR assays remain valuable tools due to their sensitivity, specificity, and suitability for validation and targeted analysis. Our ChIP analysis identifies the crucial roles of HDAC2 and PRC2, two epigenetic enzymes, in CNS endothelial cells (ECs). In vivo data presented in Figure 4 further support this finding through observed phenotypic differences. We concur that a comprehensive analysis of HDAC2 and PRC2 target genes in ECs is essential. This analysis is currently underway and will be the subject of a separate publication due to the extensive nature of the data.

      Reviewer 2 Comment-3 Third, the observation that pharmacologic inhibitor experiments and conditional KO experiments targeting HDAC2 and the Polycomb complex perturb EC gene expression or BBB integrity, respectively, is not particularly surprising as these proteins have broad roles in epigenetic regulation in a wide variety of cell types.

      We appreciate the comments from the reviewers. Our results provide valuable insights into the specific epigenetic mechanisms that regulate BBB genes. It is important to recognize that different cell types possess distinct, stage-specific epigenetic landscapes and regulatory mechanisms. Rather than having broad roles across diverse cell types, it is more likely that HDAC2 (even though there are several other classes and subtypes of HDACs) and the Polycomb complex exhibit specific functions within the context of EC gene expression or BBB integrity.

      Moreover, the significance of our findings is enhanced by the fact that epigenetic modifications are often reversible with the assistance of epigenetic regulators. This makes them promising targets for BBB modulation. Targeting epigenetic regulators can have a widespread impact, as these mechanisms regulate numerous genes that collectively have the potential to promote vascular repair.

      A practical advantage is that FDA-approved HDAC2 inhibitors, as well as PRC2 inhibitors (such as those mentioned in clinical trials NCT03211988 and NCT02601950), are already available. This facilitates the repurposing of drugs and expedites their potential for clinical translation.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this study, the authors address whether the dorsal nucleus of the inferior colliculus (DCIC) in mice encodes sound source location within the front horizontal plane (i.e., azimuth). They do this using volumetric two-photon Ca2+ imaging and high-density silicon probes (Neuropixels) to collect single-unit data. Such recordings are beneficial because they allow large populations of simultaneous neural data to be collected. Their main results and the claims about those results are the following:

      (1) DCIC single-unit responses have high trial-to-trial variability (i.e., neural noise);

      (2) approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth;

      (3) single-trial population responses (i.e., the joint response across all sampled single units in an animal) encode sound source azimuth "effectively" (as stated in title) in that localization decoding error matches average mouse discrimination thresholds;

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus (as stated in Abstract);

      (5) evidence of noise correlation between pairs of neurons exists;

      (6) noise correlations between responses of neurons help reduce population decoding error.

      While simultaneous recordings are not necessary to demonstrate results #1, #2, and #4, they are necessary to demonstrate results #3, #5, and #6.

      Strengths:

      - Important research question to all researchers interested in sensory coding in the nervous system.

      - State-of-the-art data collection: volumetric two-photon Ca2+ imaging and extracellular recording using high-density probes. Large neuronal data sets.

      - Confirmation of imaging results (lower temporal resolution) with more traditional microelectrode results (higher temporal resolution).

      - Clear and appropriate explanation of surgical and electrophysiological methods. I cannot comment on the appropriateness of the imaging methods.

      Strength of evidence for claims of the study:

      (1) DCIC single-unit responses have high trial-to-trial variability - The authors' data clearly show this.

      (2) Approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth - The sensitivity of each neuron's response to sound source azimuth was tested with a Kruskal-Wallis test, which is appropriate since response distributions were not normal. Using this statistical test, only 8% of neurons (median for imaging data) were found to be sensitive to azimuth, and the authors noted this was not significantly different than the false positive rate. The Kruskal-Wallis test was not performed on electrophysiological data. The authors suggested that low numbers of azimuth-sensitive units resulting from the statistical analysis may be due to the combination of high neural noise and relatively low number of trials, which would reduce statistical power of the test. This may be true, but if single-unit responses were moderately or strongly sensitive to azimuth, one would expect them to pass the test even with relatively low statistical power. At best, if their statistical test missed some azimuth-sensitive units, they were likely only weakly sensitive to azimuth. The authors went on to perform a second test of azimuth sensitivity, a chi-squared test, and found 32% (imaging) and 40% (e-phys) of single units to have statistically significant sensitivity. This feels a bit like fishing for a lower p-value. The Kruskal-Wallis test should have been left as the only analysis. Moreover, the use of a chi-squared test is questionable because it is meant to be used between two categorical variables, and neural response had to be binned before applying the test.

      The determination of what is a physiologically relevant “moderate or strong azimuth sensitivity” is not trivial, particularly when comparing tuning across different relays of the auditory pathway like the CNIC, auditory cortex, or in our case DCIC, where physiologically relevant azimuth sensitivities might be different. This is likely the reason why azimuth sensitivity has been defined in diverse ways across the literature (see Groh, Kelly & Underhill, 2003 for an early discussion of this issue). These diverse approaches include reaching a certain percentage of maximal response modulation, as used by Day et al. (2012, 2015, 2016) in CNIC, and ANOVA tests, as used by Panniello et al. (2018) and Groh, Kelly & Underhill (2003) in auditory cortex and IC respectively. Moreover, the influence of response variability and biases in response distribution estimation due to limited sampling has not usually been accounted for in the determination of azimuth sensitivity.

      As Reviewer #1 points out, in our study we used an appropriate ANOVA test (Kruskal-Wallis) as a starting point to study response sensitivity to stimulus azimuth at DCIC. Please note that the alpha = 0.05 used for this test is not based on experimental evidence about physiologically relevant azimuth sensitivity but instead is an arbitrary p-value threshold. Using this test on the electrophysiological data, we found that ~ 21% of the simultaneously recorded single units reached significance (n = 4 mice). Nevertheless, these percentages, in our small sample (n = 4), were not significantly different from our false positive detection rate (p = 0.0625, Mann-Whitney, see Author response image 1 below). Consequently, for both our imaging (Fig. 3C) and electrophysiological data, we could not ascertain whether the percentage of neurons reaching significance in these ANOVA tests were indeed meaningfully sensitive to azimuth or whether this was due to chance.

      Author response image 1.

      Percentage of the Neuropixels-recorded DCIC single units across mice that showed significant median response tuning, compared to false positive detection rate (α = 0.05, chance level).

      We reasoned that the observed markedly variable responses from DCIC units, which frequently failed to respond in many trials (Fig. 3D, 4A), in combination with the limited number of trial repetitions we could collect, results in under-sampled response distribution estimations. This under-sampling can bias the determination of stochastic dominance across azimuth response samples in Kruskal-Wallis tests. We would like to highlight that we decided not to implement resampling strategies to artificially increase the azimuth response sample sizes with “virtual trials”, in order to avoid “fishing for a smaller p-value”, when our collected samples might not accurately reflect the actual response population variability.

      As an alternative to hypothesis testing based on ranking and determining stochastic dominance of one or more azimuth response samples (Kruskal-Wallis test), we evaluated the overall statistical dependency to stimulus azimuth of the collected responses. To do this, we implemented the Chi-square test by binning neuronal responses into categories. Binning responses into categories can reduce the influence of response variability to some extent, which constitutes an advantage of the Chi-square approach, but we note the important consideration that these response categories are arbitrary.

      Altogether, we acknowledge that our Chi-square approach to define azimuth sensitivity is not free of limitations and despite enabling the interrogation of azimuth sensitivity at DCIC, its interpretability might not extend to other brain regions like CNIC or auditory cortex. Nevertheless we hope the aforementioned arguments justify why the Kruskal-Wallis test simply could not “have been left as the only analysis”.
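
      The binning-then-Chi-square procedure described above can be sketched as follows. The bin edges, azimuth values, and function names are hypothetical, and the sketch stops at the test statistic and degrees of freedom (a p-value would additionally require the Chi-square distribution); it is an illustration of the general technique, not the analysis code used in the study.

```python
from bisect import bisect_right
from collections import Counter


def bin_response(value, edges):
    """Map a continuous response to a category index given sorted bin edges."""
    return bisect_right(edges, value)


def contingency_table(trials, edges, azimuths):
    """Count trials per (azimuth, response category); rows follow `azimuths`."""
    n_bins = len(edges) + 1
    counts = Counter((az, bin_response(r, edges)) for az, r in trials)
    return [[counts[(az, b)] for b in range(n_bins)] for az in azimuths]


def chi_square_statistic(table):
    """Pearson Chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:  # skip cells of empty rows or columns
                stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Made-up (azimuth, response) trials and arbitrary category edges.
trials = [(-30, 0.2), (-30, 1.5), (30, 2.5)]
table = contingency_table(trials, edges=[1.0, 2.0], azimuths=[-30, 30])
```

      Because the result depends on where the bin edges are placed, repeating the computation with different `edges` is one simple way to see how arbitrary the response categories are.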

      (3) Single-trial population responses encode sound source azimuth "effectively" in that localization decoding error matches average mouse discrimination thresholds - If only one neuron in a population had responses that were sensitive to azimuth, we would expect that decoding azimuth from observation of that one neuron's response would perform better than chance. By observing the responses of more than one neuron (if more than one were sensitive to azimuth), we would expect performance to increase. The authors found that decoding from the whole population response was no better than chance. They argue (reasonably) that this is because of overfitting of the decoder model (too few trials used to fit too many parameters) and provide evidence from decoding combined with principal components analysis which suggests that overfitting is occurring. What is troubling is the performance of the decoder when using only a handful of "top-ranked" neurons (in terms of azimuth sensitivity) (Fig. 4F and G). Decoder performance seems to increase when going from one to two neurons, then decreases when going from two to three neurons, and doesn't get much better for more neurons than for one neuron alone. It seems likely there is more information about azimuth in the population response, but decoder performance is not able to capture it because spike count distributions in the decoder model are not being accurately estimated due to too few stimulus trials (14, on average). In other words, it seems likely that decoder performance is underestimating the ability of the DCIC population to encode sound source azimuth.

      To get a sense of how effective a neural population is at coding a particular stimulus parameter, it is useful to compare population decoder performance to psychophysical performance. Unfortunately, mouse behavioral localization data do not exist. Therefore, the authors compare decoder error to mouse left-right discrimination thresholds published previously by a different lab. However, this comparison is inappropriate because the decoder and the mice were performing different perceptual tasks. The decoder is classifying sound sources to 1 of 13 locations from left to right, whereas the mice were discriminating between left or right sources centered around zero degrees. The errors in these two tasks represent different things. The two data sets may potentially be more accurately compared by extracting information from the confusion matrices of population decoder performance. For example, when the stimulus was at -30 deg, how often did the decoder classify the stimulus to a lefthand azimuth? Likewise, when the stimulus was +30 deg, how often did the decoder classify the stimulus to a righthand azimuth?

      The azimuth discrimination error reported by Lauer et al. (2011) comes from engaged and highly trained mice, which is a very different context to our experimental setting with untrained mice passively listening to stimuli from 13 random azimuths. Therefore we did not perform analyses or interpretations of our results based on the behavioral task from Lauer et al. (2011) and only made the qualitative observation that the errors match for discussion.

      We believe it is further important to clarify that Lauer et al. (2011) tested the ability of mice to discriminate between a positively conditioned stimulus (reference speaker at 0º center azimuth associated to a liquid reward) and a negatively conditioned stimulus (coming from one of five comparison speakers positioned at 20º, 30º, 50º, 70º and 90º azimuth, associated to an electrified lickport) in a conditioned avoidance task. In this task, mice are not precisely “discriminating between left or right sources centered around zero degrees”, making further analyses to compare the experimental design of Lauer et al. (2011) and ours even more challenging for valid interpretation.
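
      For illustration only, the left/right read-out of a confusion matrix that the reviewer describes above could be sketched as follows. This analysis was not performed in the study; the sign convention (negative azimuths are left), the toy counts, and the function names are assumptions.

```python
def side(azimuth_deg):
    """Map an azimuth in degrees to a hemifield label (negative = left, assumed)."""
    if azimuth_deg < 0:
        return "left"
    if azimuth_deg > 0:
        return "right"
    return "center"


def hemifield_accuracy(confusion, azimuths, stimulus_deg):
    """Fraction of trials at `stimulus_deg` decoded to the same side as the stimulus.

    confusion[i][j] counts trials with true azimuth azimuths[i]
    that the decoder classified as azimuths[j].
    """
    row = confusion[azimuths.index(stimulus_deg)]
    same_side = sum(count for az, count in zip(azimuths, row)
                    if side(az) == side(stimulus_deg))
    return same_side / sum(row)


# Toy three-location example with made-up counts (rows: true, columns: decoded).
toy_azimuths = [-30, 0, 30]
toy_confusion = [
    [8, 1, 1],
    [3, 4, 3],
    [2, 2, 6],
]
left_hit_rate = hemifield_accuracy(toy_confusion, toy_azimuths, -30)
right_hit_rate = hemifield_accuracy(toy_confusion, toy_azimuths, 30)
```

      With a 13-location decoder, the same function would sum all decoded azimuths on the stimulus side, which collapses the classification output into a coarser left/right judgment. As noted in the response, this would still not reproduce the conditioned avoidance task of Lauer et al. (2011).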

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus - It is unclear what exactly the authors mean by this statement in the Abstract. There are major differences in the encoding of azimuth between the two neighboring brain areas: a large majority of neurons in the CNIC are sensitive to azimuth (and strongly so), whereas the present study shows a minority of azimuth-sensitive neurons in the DCIC. Furthermore, CNIC neurons fire reliably to sound stimuli (low neural noise), whereas the present study shows that DCIC neurons fire more erratically (high neural noise).

      Since sound source azimuth is reported to be encoded by population activity patterns at CNIC (Day and Delgutte, 2013), we refer to a population activity pattern code as the “similar format” in which this information is encoded at DCIC. Please note that this is a qualitative comparison and we do not claim this is the “same format”, due to the differences the reviewer precisely describes in the encoding of azimuth at CNIC where a much larger majority of neurons show stronger azimuth sensitivity and response reliability with respect to our observations at DCIC. By this qualitative similarity of encoding format we specifically mean the similar occurrence of activity patterns from azimuth sensitive subpopulations of neurons in both CNIC and DCIC, which carry sufficient information about the stimulus azimuth for a sufficiently accurate prediction with regard to the behavioral discrimination ability.

      (5) Evidence of noise correlation between pairs of neurons exists - The authors' data and analyses seem appropriate and sufficient to justify this claim.

      (6) Noise correlations between responses of neurons help reduce population decoding error - The authors show convincing analysis that performance of their decoder was higher when simultaneously measured responses were tested (which include noise correlation) than when scrambled-trial responses were tested (eliminating noise correlation). This makes it seem likely that noise correlation in the responses improved decoder performance. The authors mention that the naïve Bayesian classifier was used as their decoder for computational efficiency, presumably because it assumes no noise correlation and, therefore, assumes responses of individual neurons are independent of each other across trials to the same stimulus. The use of a decoder that assumes independence seems key here in testing the hypothesis that noise correlation contains information about sound source azimuth. The logic of using this decoder could be more clearly spelled out to the reader. For example, if the null hypothesis is that noise correlations do not carry azimuth information, then a decoder that assumes independence should perform the same whether population responses are simultaneous or scrambled. The authors' analysis showing a difference in performance between these two cases provides evidence against this null hypothesis.

      We sincerely thank the reviewer for this careful and detailed consideration of our analysis approach. Following the reviewer’s constructive suggestion, we justified the decoder choice in the results section at the last paragraph of page 18:

      “To characterize how the observed positive noise correlations could affect the representation of stimulus azimuth by DCIC top ranked unit population responses, we compared the decoding performance obtained by classifying the single-trial response patterns from top ranked units in the modeled decorrelated datasets versus the acquired data (with noise correlations). With the intention to characterize this with a conservative approach that would be less likely to find a contribution of noise correlations as it assumes response independence, we relied on the naive Bayes classifier for decoding throughout the study. Using this classifier, we observed that the modeled decorrelated datasets produced stimulus azimuth prediction error distributions that were significantly shifted towards higher decoding errors (Fig. 5B, C) and, in our imaging datasets, were not significantly different from chance level (Fig. 5B). Altogether, these results suggest that the detected noise correlations in our simultaneously acquired datasets can help reduce the error of the IC population code for sound azimuth.”
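
      One common way to build a decorrelated dataset of the kind mentioned above is to shuffle, within a single stimulus azimuth, the trial order of each unit independently. The sketch below assumes that procedure; the exact model used in the manuscript may differ, and the function names and toy data are hypothetical. Each unit keeps its own response distribution while trial-to-trial covariation between units is broken.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def decorrelate(trials, seed=0):
    """Shuffle each unit's responses across trials of ONE azimuth, independently.

    trials: list of trials, each a list with one response per unit.
    Returns a new list of trials with per-unit marginals preserved.
    """
    rng = random.Random(seed)
    columns = []
    for unit_responses in zip(*trials):
        column = list(unit_responses)
        rng.shuffle(column)
        columns.append(column)
    return [list(trial) for trial in zip(*columns)]


# Toy data: 14 trials of two units whose responses covary perfectly.
simultaneous = [[t, 2 * t + 1] for t in range(14)]
shuffled = decorrelate(simultaneous)

r_simultaneous = pearson([t[0] for t in simultaneous], [t[1] for t in simultaneous])
r_shuffled = pearson([t[0] for t in shuffled], [t[1] for t in shuffled])
```

      In a full analysis this function would be applied separately to the trials of each azimuth, so that stimulus-driven (signal) correlations are preserved and only noise correlations are removed.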

      Minor weakness:

      - Most studies of neural encoding of sound source azimuth are done in a noise-free environment, but the experimental setup in the present study had substantial background noise. This complicates comparison of the azimuth tuning results in this study to those of other studies. One is left wondering if azimuth sensitivity would have been greater in the absence of background noise, particularly for the imaging data where the signal was only about 12 dB above the noise. The description of the noise level and signal + noise level in the Methods should be made clearer. Mice hear from about 2.5 - 80 kHz, so it is important to know the noise level within this band as well as specifically within the band overlapping with the signal.

      We agree with the reviewer that this information is useful. In our study, the background R.M.S. SPL during imaging across the mouse hearing range (2.5-80kHz) was 44.53 dB and for neuropixels recordings 34.68 dB. We have added this information to the methods section of the revised manuscript.

      Reviewer #2 (Public Review):

      In the present study, Boffi et al. investigate the manner in which the dorsal cortex of the inferior colliculus (DCIC), an auditory midbrain area, encodes sound location azimuth in awake, passively listening mice. By employing volumetric calcium imaging (scanned temporal focusing or s-TeFo), complemented with high-density electrode electrophysiological recordings (neuropixels probes), they show that sound-evoked responses are exquisitely noisy, with only a small portion of neurons (units) exhibiting spatial sensitivity. Nevertheless, a naïve Bayesian classifier was able to predict the presented azimuth based on the responses from small populations of these spatially sensitive units. A portion of the spatial information was provided by correlated trial-to-trial response variability between individual units (noise correlations). The study presents a novel characterization of spatial auditory coding in a non-canonical structure, representing a noteworthy contribution specifically to the auditory field and generally to systems neuroscience, due to its implementation of state-of-the-art techniques in an experimentally challenging brain region. However, nuances in the calcium imaging dataset and the naïve Bayesian classifier warrant caution when interpreting some of the results.

      Strengths:

      The primary strength of the study lies in its methodological achievements, which allowed the authors to collect a comprehensive and novel dataset. While the DCIC is a dorsal structure, it extends up to a millimetre in depth, making it optically challenging to access in its entirety. It is also more highly myelinated and vascularised compared to e.g., the cerebral cortex, compounding the problem. The authors successfully overcame these challenges and present an impressive volumetric calcium imaging dataset. Furthermore, they corroborated this dataset with electrophysiological recordings, which produced overlapping results. This methodological combination ameliorates the natural concerns that arise from inferring neuronal activity from calcium signals alone, which are in essence an indirect measurement thereof.

      Another strength of the study is its interdisciplinary relevance. For the auditory field, it represents a significant contribution to the question of how auditory space is represented in the mammalian brain. "Space" per se is not mapped onto the basilar membrane of the cochlea and must be computed entirely within the brain. For azimuth, this requires the comparison of minuscule differences in the timing and intensity of sounds arriving at each ear. It is now generally thought that azimuth is initially encoded in two, opposing hemispheric channels, but the extent to which this initial arrangement is maintained throughout the auditory system remains an open question. The authors observe only a slight contralateral bias in their data, suggesting that sound source azimuth in the DCIC is encoded in a more nuanced manner compared to earlier processing stages of the auditory hindbrain. This is interesting, because it is also known to be an auditory structure to receive more descending inputs from the cortex.

      Systems neuroscience continues to strive for the perfection of imaging novel, less accessible brain regions. Volumetric calcium imaging is a promising emerging technique, allowing the simultaneous measurement of large populations of neurons in three dimensions. But this necessitates corroboration with other methods, such as electrophysiological recordings, which the authors achieve. The dataset moreover highlights the distinctive characteristics of neuronal auditory representations in the brain. Its signals can be exceptionally sparse and noisy, which provide an additional layer of complexity in the processing and analysis of such datasets. This will be undoubtedly useful for future studies of other less accessible structures with sparse responsiveness.

      Weaknesses:

      Although the primary finding that small populations of neurons carry enough spatial information for a naïve Bayesian classifier to reasonably decode the presented stimulus is not called into question, certain idiosyncrasies, in particular the calcium imaging dataset and model, complicate specific interpretations of the model output, and the readership is urged to interpret these aspects of the study's conclusions with caution.

      I remain in favour of volumetric calcium imaging as a suitable technique for the study, but the presently constrained spatial resolution is insufficient to unequivocally identify regions of interest as cell bodies (and are instead referred to as "units" akin to those of electrophysiological recordings). It remains possible that the imaging set is inadvertently influenced by non-somatic structures (including neuropil), which could report neuronal activity differently than cell bodies. Due to the lack of a comprehensive ground-truth comparison in this regard (which to my knowledge is impossible to achieve with current technology), it is difficult to imagine how many informative such units might have been missed because their signals were influenced by spurious, non-somatic signals, which could have subsequently misled the models. The authors reference the original Nature Methods article (Prevedel et al., 2016) throughout the manuscript, presumably in order to avoid having to repeat previously published experimental metrics. But the DCIC is neither the cortex nor hippocampus (for which the method was originally developed) and may not have the same light scattering properties (not to mention neuronal noise levels). Although the corroborative electrophysiology data largely alleviates these concerns for this particular study, the readership should be cognisant of such caveats, in particular those who are interested in implementing the technique for their own research.

      A related technical limitation of the calcium imaging dataset is the relatively low number of trials (14) given the inherently high level of noise (both neuronal and imaging). Volumetric calcium imaging, while offering a uniquely expansive field of view, requires relatively high average excitation laser power (in this case nearly 200 mW), a level of exposure the authors may have wanted to minimise by maintaining a low number of repetitions, but I yield to them to explain.

      We assumed that the levels of heating by excitation light measured at the neocortex in Prevedel et al. (2016) were also representative for DCIC. Nevertheless, we recognize this approximation might not be very accurate, due to the differences in tissue architecture and vascularization between these two brain areas, to name just a few factors. The limiting factor preventing us from collecting more trials in our imaging sessions was that we observed signs of discomfort or slight distress in some mice after ~30 min of imaging in our custom setup, which we established as a humane end point to prevent distress. In consequence, imaging sessions were kept to 25 min in duration, limiting the number of trials collected. However, we cannot rule out that with more extensive habituation prior to experiments the imaging sessions could be prolonged without these signs of discomfort, or that some influence of our custom setup, such as potential heating of the brain by the illumination light, might be the cause of the observed distress. Nevertheless, we note that previous work has shown that ~200mW average power is a safe regime for imaging in the cortex by keeping brain heating minimal (Prevedel et al., 2016), without producing the lasting damage observed by immunohistochemistry against apoptosis markers above 250mW (Podgorski and Ranganathan 2016, https://doi.org/10.1152/jn.00275.2016).

      Calcium imaging is also inherently slow, requiring relatively long inter-stimulus intervals (in this case 5 s). This unfortunately renders any model designed to predict a stimulus (in this case sound azimuth) from particularly noisy population neuronal data like these as highly prone to overfitting, to which the authors correctly admit after a model trained on the entire raw dataset failed to perform significantly above chance level. This prompted them to feed the model only with data from neurons with the highest spatial sensitivity. This ultimately produced reasonable performance (and was implemented throughout the rest of the study), but it remains possible that if the model was fed with more repetitions of imaging data, its performance would have been more stable across the number of units used to train it. (All models trained with imaging data eventually failed to converge.) However, I also see these limitations as an opportunity to improve the technology further, which I reiterate will be generally important for volume imaging of other sparse or noisy calcium signals in the brain.

      Transitioning to the naïve Bayesian classifier itself, I first openly ask the authors to justify their choice of this specific model. There are countless types of classifiers for these data, each with their own pros and cons. Did they actually try other models (such as support vector machines), which ultimately failed? If so, these negative results (even if mentioned en passant) would be extremely valuable to the community, in my view. I ask this specifically because different methods assume correspondingly different statistical properties of the input data, and to my knowledge naïve Bayesian classifiers assume that predictors (neuronal responses) are independent within a class (azimuth). As the authors show that noise correlations are informative in predicting azimuth, I wonder why they chose a model that doesn't take advantage of these statistical regularities. It could be because of technical considerations (they mention computing efficiency), but I am left generally uncertain about the specific logic that was used to guide the authors through their analytical journey.

      One of the main reasons we chose the naïve Bayesian classifier is indeed because it assumes that the responses of the simultaneously recorded neurons are independent and therefore it does not assume a contribution of noise correlations to the estimation of the posterior probability of each azimuth. This model would represent the null hypothesis that noise correlations do not contribute to the encoding of stimulus azimuth, which would be verified by an equal decoding outcome from correlated or decorrelated datasets. Since we observed that this is not the case, the model supports the alternative hypothesis that noise correlations do indeed influence stimulus azimuth encoding. We wanted to test these hypotheses with the most conservative approach possible that would be least likely to find a contribution of noise correlations. Other relevant reasons that justify our choice of the naive Bayesian classifier are its robustness against the limited numbers of trials we could collect in comparison to other more “data hungry” classifiers like SVM, KNN, or artificial neuronal nets. We did perform preliminary tests with alternative classifiers but the obtained decoding errors were similar when decoding the whole population activity (Author response image 2A). Dimensionality reduction following the approach described in the manuscript showed a tendency towards smaller decoding errors observed with an alternative classifier like KNN, but these errors were still larger than the ones observed with the naive Bayesian classifier (median error 45º). Nevertheless, we also observe a similar tendency for slightly larger decoding errors in the absence of noise correlations (decorrelated, Author response image 2B). Sentences detailing the logic of classifier choice are now included in the results section at page 10 and at the last paragraph of page 18 (see responses to Reviewer 1).

      Author response image 2.

      A) Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using different classifiers (blue; KNN: K-nearest neighbors; SVM: support vector machine ensemble) and chance level distribution (gray) on the complete populations of imaged units. B) Cumulative distribution plots of the absolute cross-validated single-trial prediction errors obtained using a Bayes classifier (naive approximation for computation efficiency) to decode the single-trial response patterns from the 31 top ranked units in the simultaneously imaged datasets across mice (cyan), modeled decorrelated datasets (orange) and the chance level distribution associated with our stimulation paradigm (gray). Vertical dashed lines show the medians of cumulative distributions. K.S. w/Sidak: Kolmogorov-Smirnov with Sidak.
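
      To make the independence assumption discussed above concrete, here is a minimal naive Bayes decoder sketch. It is not the authors' implementation: the Gaussian likelihood, the variance floor, the toy data, and all function names are assumptions chosen for illustration. The key point is that the score of each azimuth is a sum of per-unit log-likelihoods, so covariation between units is ignored by construction.

```python
import math


def fit(trials, labels, var_floor=1e-3):
    """Estimate, per azimuth, a prior and a (mean, variance) pair for each unit."""
    model = {}
    for azimuth in sorted(set(labels)):
        rows = [t for t, lab in zip(trials, labels) if lab == azimuth]
        stats = []
        for unit_responses in zip(*rows):
            mean = sum(unit_responses) / len(unit_responses)
            var = sum((v - mean) ** 2 for v in unit_responses) / len(unit_responses)
            stats.append((mean, var + var_floor))  # floor avoids zero variance
        model[azimuth] = (len(rows) / len(labels), stats)
    return model


def log_scores(model, trial):
    """Unnormalized log posterior per azimuth, assuming independent units."""
    scores = {}
    for azimuth, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(trial, stats):
            # Independence: each unit contributes its own Gaussian log-likelihood.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        scores[azimuth] = score
    return scores


def decode(model, trial):
    """Return the azimuth with the highest score for a single-trial response."""
    scores = log_scores(model, trial)
    return max(scores, key=scores.get)


# Toy training set: two units, two azimuths, three trials each (made-up values).
train_trials = [[5, 0], [6, 1], [5, 1], [0, 5], [1, 6], [1, 5]]
train_labels = [-90, -90, -90, 90, 90, 90]
toy_model = fit(train_trials, train_labels)
predicted = decode(toy_model, [5.5, 0.5])
```

      Because the model has no covariance terms, any change in decoding error between simultaneous and trial-shuffled test data reflects the structure of the data rather than something the classifier was built to exploit, which is the conservative logic described in the response.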

      That aside, there remain other peculiarities in model performance that warrant further investigation. For example, what spurious features (or lack of informative features) in these additional units prevented the models of imaging data from converging?

      Considering the amount of variability observed throughout the neuronal responses both in imaging and neuropixels datasets, it is easy to suspect that the information about stimulus azimuth carried in different amounts by individual DCIC neurons can be mixed up with information about other factors (Stringer et al., 2019). In an attempt to study the origin of these features that could confound stimulus azimuth decoding we explored their relation to face movement (Supplemental Figure 2), finding a correlation to snout movements, in line with previous work by Stringer et al. (2019).

      In an orthogonal question, did the most spatially sensitive units share any detectable tuning features? A different model trained with electrophysiology data in contrast did not collapse in the range of top-ranked units plotted. Did this model collapse at some point after adding enough units, and how well did that correlate with the model for the imaging data?

      Our electrophysiology datasets were much smaller in size (number of simultaneously recorded neurons) compared to our volumetric calcium imaging datasets, resulting in a much smaller total number of top ranked units detected per dataset. This precluded the determination of a collapse of decoder performance due to overfitting beyond the range plotted in Fig 4G.

      How well did the form (and diversity) of the spatial tuning functions as recorded with electrophysiology resemble their calcium imaging counterparts? These fundamental questions could be addressed with more basic, but transparent analyses of the data (e.g., the diversity of spatial tuning functions of their recorded units across the population). Even if the model extracts features that are not obvious to the human eye in traditional visualisations, I would still find this interesting.

      The diversity of the azimuth tuning curves recorded with calcium imaging (Fig. 3B) was qualitatively larger than that of the ones recorded with electrophysiology (Fig. 4B), potentially due to the larger sampling obtained with volumetric imaging. We did not perform a detailed comparison of the form, or a more quantitative comparison of the diversity, of these functions because the signals compared are quite different: the calcium indicator signal is subject to nonlinearities due to Ca2+ binding cooperativity and to low-pass filtering due to binding kinetics. We feared this could lead to misleading interpretations about the similarities or differences between the azimuth tuning functions in imaged and electrophysiology datasets. Our model uses statistical response dependency to stimulus azimuth, which does not rely on features from a descriptive statistic like mean response tuning. In this context, visualizing the trial-to-trial responses as a function of azimuth shows “features that are not obvious to the human eye in traditional visualizations” (Fig. 3D, left inset).

      Finally, the readership is encouraged to interpret certain statements by the authors in the current version conservatively. How the brain ultimately extracts spatial neuronal data for perception is anyone's guess, but it is important to remember that this study only shows that a naïve Bayesian classifier could decode this information, and it remains entirely unclear whether the brain does this as well. For example, the model is able to achieve a prediction error that corresponds to the psychophysical threshold in mice performing a discrimination task (~30°). Although this is an interesting coincidental observation, it does not mean that the two metrics are necessarily related. The authors correctly do not explicitly claim this, but the manner in which the prose flows may lead a non-expert into drawing that conclusion.

      To avoid misleading the non-expert readers, we have clarified in the manuscript that the observed correspondence between decoding error and psychophysical threshold is explicitly coincidental.

      Page 13, end of middle paragraph:

      “If we consider the median of the prediction error distribution as an overall measure of decoding performance, the single-trial response patterns from subsamples of at least the 7 top ranked units produced median decoding errors that coincidentally matched the reported azimuth discrimination ability of mice (Fig 4G, minimum audible angle = 31º) (Lauer et al., 2011).”

      Page 14, bottom paragraph:

      “Decoding analysis (Fig. 4F) of the population response patterns from azimuth dependent top ranked units simultaneously recorded with neuropixels probes showed that the 4 top ranked units are the smallest subsample necessary to produce a significant decoding performance that coincidentally matches the discrimination ability of mice (31° (Lauer et al., 2011)) (Fig. 5F, G).”

      We also added to the Discussion sentences clarifying that a relationship between these two variables remains to be determined and it also remains to be determined if the DCIC indeed performs a bayesian decoding computation for sound localization.

      Page 20, bottom:

      “… Concretely, we show that sound location coding does indeed occur at DCIC on the single trial basis, and that this follows a comparable mechanism to the characterized population code at CNIC (Day and Delgutte, 2013). However, it remains to be determined if indeed the DCIC network is physiologically capable of Bayesian decoding computations. Interestingly, the small number of DCIC top ranked units necessary to effectively decode stimulus azimuth suggests that sound azimuth information is redundantly distributed across DCIC top ranked units, which points out that mechanisms beyond coding efficiency could be relevant for this population code.

      While the decoding error observed from our DCIC datasets obtained in passively listening, untrained mice coincidentally matches the discrimination ability of highly trained, motivated mice (Lauer et al., 2011), a relationship between decoding error and psychophysical performance remains to be determined. Interestingly, a primary sensory representations should theoretically be even more precise than the behavioral performance as reported in the visual system (Stringer et al., 2021).”

      Moreover, the concept of redundancy (of spatial information carried by units throughout the DCIC) is difficult for me to disentangle. One interpretation of this formulation could be that there are non-overlapping populations of neurons distributed across the DCIC that each could predict azimuth independently of each other, which is unlikely what the authors meant. If the authors meant generally that multiple neurons in the DCIC carry sufficient spatial information, then a single neuron would have been able to predict sound source azimuth, which was not the case. I have the feeling that they actually mean "complementary", but I leave it to the authors to clarify my confusion, should they wish.

      We observed that the response patterns from relatively small fractions of the azimuth sensitive DCIC units (4-7 top ranked units) are sufficient to generate an effective code for sound azimuth, while 32-40% of all simultaneously recorded DCIC units are azimuth sensitive. In light of this observation, we interpreted that the azimuth information carried by the population should be redundantly distributed across the complete subpopulation of azimuth sensitive DCIC units.

      In summary, the present study represents a significant body of work that contributes substantially to the field of spatial auditory coding and systems neuroscience. However, limitations of the imaging dataset and model as applied in the study muddle concrete conclusions about how the DCIC precisely encodes sound source azimuth and, even more so, about sound localisation in a behaving animal. Nevertheless, it presents a novel and unique dataset, which, regardless of secondary interpretation, corroborates the general notion that auditory space is encoded in an extraordinarily complex manner in the mammalian brain.

      Reviewer #3 (Public Review):

      Summary:

      Boffi and colleagues sought to quantify the single-trial, azimuthal information in the dorsal cortex of the inferior colliculus (DCIC), a relatively understudied subnucleus of the auditory midbrain. They used two complementary recording methods while mice passively listened to sounds at different locations: a large volume but slow sampling calcium-imaging method, and a smaller volume but temporally precise electrophysiology method. They found that neurons in the DCIC were variable in their activity, unreliably responding to sound presentation and responding during inter-sound intervals. Boffi and colleagues used a naïve Bayesian decoder to determine if the DCIC population encoded sound location on a single trial. The decoder failed to classify sound location better than chance when using the raw single-trial population response but performed significantly better than chance when using intermediate principal components of the population response. In line with this, when the most azimuth dependent neurons were used to decode azimuthal position, the decoder performed equivalently to the azimuthal localization abilities of mice. The top azimuthal units were not clustered in the DCIC, possessed a contralateral bias in response, and were correlated in their variability (e.g., positive noise correlations). Interestingly, when these noise correlations were perturbed by inter-trial shuffling decoding performance decreased. Although Boffi and colleagues display that azimuthal information can be extracted from DCIC responses, it remains unclear to what degree this information is used and what role noise correlations play in azimuthal encoding.

      Strengths:

      The authors should be commended for collection of this dataset. When done in isolation (which is typical), calcium imaging and linear array recordings have intrinsic weaknesses. However, those weaknesses are alleviated when done in conjunction with one another - especially when the data largely recapitulates the findings of the other recording methodology. In addition to the video of the head during the calcium imaging, this data set is extremely rich and will be of use to those interested in the information available in the DCIC, an understudied but likely important subnucleus in the auditory midbrain.

      The DCIC neural responses are complex; the units unreliably respond to sound onset, and at the very least respond to some unknown input or internal state (e.g., large inter-sound interval responses). The authors do a decent job in wrangling these complex responses: using interpretable decoders to extract information available from population responses.

      Weaknesses:

      The authors observe that neurons with the most azimuthal sensitivity within the DCIC are positively correlated, but they use a naïve Bayesian decoder, which assumes independence between units. Although this is a bit strange given their observation that some of the recorded units are correlated, it is unlikely to be a critical flaw. At one point the authors reduce the dimensionality of their data through PCA and use the loadings onto these components in their decoder. PCA incorporates the correlational structure when finding the principal components and constrains these components to be orthogonal and uncorrelated. This should alleviate some of the concern regarding the use of the naïve Bayesian decoder because the projections onto the different components are independent. Nevertheless, the decoding results are a bit strange, likely because there is not much linearly decodable azimuth information in the DCIC responses. Raw population responses failed to provide sufficient information concerning azimuth for the decoder to perform better than chance. Additionally, it only performed better than chance when certain principal components or top ranked units contributed to the decoder but not as more components or units were added. So, although there does appear to be some azimuthal information in the recorded DCIC populations, it is somewhat difficult to extract and likely not an 'effective' encoding of sound localization as their title suggests.

      As described in the responses to reviewers 1 and 2, we chose the naïve Bayes classifier as a decoder to determine the influence of noise correlations through the most conservative approach possible, as this classifier would be least likely to find a contribution of correlated noise. We also chose this decoder for its robustness to the limited number of trials collected, in comparison to “data hungry” nonlinear classifiers like KNN or artificial neural networks. Lastly, we observed that small populations of noisy, unreliable (i.e., not responding in every trial) DCIC neurons can encode stimulus azimuth in passively listening mice, matching the discrimination error of trained mice. Therefore, while this encoding is definitely not efficient, it can still be considered effective.
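
      As a rough illustration of the kind of decoder discussed here, the following Python sketch fits a naïve Bayes classifier and evaluates it with leave-one-trial-out cross-validation. The Gaussian likelihood, the variance floor, the function names and the toy responses are assumptions made for illustration only; they are not taken from the manuscript or its analysis code. Because the per-unit likelihoods are simply multiplied, the model contains no term for correlated noise between units.

```python
from math import log, pi
from statistics import mean, pvariance


def fit_gaussian_nb(X, y, var_floor=1e-3):
    """Fit per-class, per-unit Gaussian parameters (units treated as independent)."""
    model = {}
    for cls in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == cls]
        cols = list(zip(*rows))  # one tuple of trial responses per unit
        model[cls] = {
            "log_prior": log(len(rows) / len(X)),
            "mean": [mean(c) for c in cols],
            # var_floor (assumed value) avoids zero variance with few trials
            "var": [pvariance(c) + var_floor for c in cols],
        }
    return model


def predict(model, x):
    """Return the class with the highest log posterior for one trial."""
    def log_posterior(params):
        log_lik = sum(
            -0.5 * log(2 * pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, params["mean"], params["var"])
        )
        return params["log_prior"] + log_lik

    return max(model, key=lambda cls: log_posterior(model[cls]))


def leave_one_out_predictions(X, y):
    """Decode each trial with a model fitted on all remaining trials."""
    preds = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        preds.append(predict(fit_gaussian_nb(train_X, train_y), X[i]))
    return preds


# Toy data (hypothetical): 6 trials x 2 units, two azimuth classes in degrees.
X = [[5, 1], [7, 0], [6, 2], [1, 5], [0, 7], [2, 6]]
y = [-45, -45, -45, 45, 45, 45]
decoded = leave_one_out_predictions(X, y)
```

      With so few trials per class, the variance estimates are themselves noisy, which is one reason a simple model with few parameters may be preferable to more flexible classifiers.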

      Although this is quite a worthwhile dataset, the authors present relatively little about the characteristics of the units they've recorded. This may be due to the high variance in responses seen in their population. Nevertheless, the authors note that units do not respond on every trial but do not report what percentage of trials fail to evoke a response. Is it that neurons are noisy because they do not respond on every trial or is it also that when they do respond they have variable response distributions? It would be nice to gain some insight into the heterogeneity of the responses.

      The limited number of azimuth trial repetitions that we could collect precluded us from quantifying the unreliability (failures to respond) and the variability in the response distributions of the units we recorded, as we feared such quantifications could be misleading. In qualitative terms, “due to the high variance in responses seen” in the recordings and the limited trial sampling, it is hard to make any generalization. Consequently, we referred to the observed response variance altogether as neuronal noise. Considering these points, our datasets are publicly available for exploration of the response characteristics.

      Additionally, is there any clustering at all in response profiles or is each neuron they recorded in the DCIC unique?

      We attempted to qualitatively visualize response clustering using dimensionality reduction, observing different degrees of clustering or lack thereof across the azimuth classes in the datasets collected from different mice. It is likely that the limited number of azimuth trials we could collect and the high response variance contribute to an inconsistent response clustering across datasets.

      They also only report the noise correlations for their top ranked units, but it is possible that the noise correlations in the rest of the population are different.

      For this study, since our aim was to interrogate the influence of noise correlations on stimulus azimuth encoding by DCIC populations, we focused on the noise correlations from the top ranked unit subpopulation, which likely carries the bulk of the sound location information. Noise correlations can be defined as correlations in the trial-to-trial response variation of neurons. In this respect, it is hard to ascertain whether the units in the rest of the population, outside the top ranked unit percentage, are really responding and showing response variation with which to evaluate this correlation, or are simply not responding at all and show unrelated activity altogether. This makes observations about noise correlations from “the rest of the population” potentially hard to interpret.

      It would also be worth digging into the noise correlations more - are units positively correlated because they respond together (e.g., if unit x responds on trial 1 so does unit y) or are they also modulated around their mean rates on similar trials (e.g., unit x and y respond and both are responding more than their mean response rate). A large portion of trials with no response can occlude noise correlations. More transparency around the response properties of these populations would be welcome.

      Due to the limited number of azimuth trial repetitions collected, to evaluate noise correlations we used the nonparametric Kendall tau correlation coefficient, which is a measure of pairwise rank correlation or ordinal association in the responses to each azimuth. Positive rank correlation would represent neurons more likely responding together. Evaluating response modulation “around their mean rates on similar trials” would require assumptions about the response distributions, which we avoided due to the potential biases associated with limited sample sizes.
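
      To make the rank-correlation measure concrete, the sketch below computes a Kendall tau (the tau-b variant, which corrects for ties) between the trial-by-trial responses of two units, separately for each azimuth. The choice of tau-b, the data layout and the toy numbers are assumptions for illustration; the response does not state which variant or implementation was used.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx, dy = xi - xj, yi - yj
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both denominator terms
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied_in_x = concordant + discordant + tied_y_only
    untied_in_y = concordant + discordant + tied_x_only
    denom = sqrt(untied_in_x * untied_in_y)
    return (concordant - discordant) / denom if denom else float("nan")


def noise_correlation_per_azimuth(unit_a, unit_b):
    """unit_a, unit_b: dict mapping azimuth -> list of single-trial responses
    (same trial order for both units). Returns azimuth -> tau-b."""
    return {az: kendall_tau_b(unit_a[az], unit_b[az]) for az in unit_a}


# Toy responses (hypothetical): 4 trials at each of two azimuths.
unit_a = {-90: [0, 2, 1, 3], 90: [5, 1, 4, 0]}
unit_b = {-90: [1, 3, 2, 5], 90: [0, 4, 1, 6]}
taus = noise_correlation_per_azimuth(unit_a, unit_b)
```

      Because only the ordering of responses across trials enters the statistic, no assumption about the shape of the response distributions is needed.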

      It is largely unclear what the DCIC is encoding. Although the authors are interested in azimuth, sound location seems to be only a small part of DCIC responses. The authors report responses during inter-sound intervals and unreliable sound-evoked responses. Although they have video of the head during recording, we only see a correlation to snout and ear movements (which are peculiar since in the example shown it seems the head movements predict the sound presentation). Additional correlates could be eye movements or pupil size. Eye movements are of particular interest due to their known interaction with IC responses - especially if the DCIC encodes sound location in relation to eye position instead of head position (though much of the eye-position-IC work was done in primates and not rodents). Alternatively, much of the population may only encode sound location if an animal is engaged in a localization task. Ideally, the authors could perform more substantive analyses to determine if this population is truly noisy or if the DCIC is integrating un-analyzed signals.

      We unsuccessfully attempted eye tracking and pupillometry in our videos. We suspect that the reason is a generally overly dilated pupil caused by the low visible-light illumination conditions we used, which were necessary to protect the PMT of our custom scope.

      It is likely that DCIC population activity is integrating un-analyzed signals, like the signal associated with spontaneous behaviors including face movements (Stringer et al., 2019), which we observed at the level of spontaneous snout movements. However, investigating whether and how these signals are integrated into stimulus azimuth coding requires extensive behavioral testing and experimentation, which is beyond the scope of this study. For the purpose of our study, we referred to trial-to-trial response variation as neuronal noise. We note that this definition of neuronal noise can, and likely does, include an influence from un-analyzed signals like the ones from spontaneous behaviors.

      Although this critique is ubiquitous among decoding papers in the absence of behavioral or causal perturbations, it is unclear what - if any - role the decoded information may play in neuronal computations. The interpretation of the decoder means that there is some extractable information concerning sound azimuth - but not if it is functional. This information may just be epiphenomenal, leaking in from inputs, and not used in computation or relayed to downstream structures. This should be kept in mind when the authors suggest their findings implicate the DCIC functionally in sound localization.

      Our study builds upon previous reports by other independent groups relying on “causal and behavioral perturbations” and implicating DCIC in sound location learning induced experience dependent plasticity (Bajo et al., 2019, 2010; Bajo and King, 2012), which altogether argues in favor of DCIC functionality in sound localization.

      Nevertheless, we clarified in the discussion of the revised manuscript that a relationship between the observed decoding error and the psychophysical performance, or the ability of the DCIC network to perform Bayesian decoding computations, both remain to be determined (please see responses to Reviewer #2).

      It is unclear why positive noise correlations amongst similarly tuned neurons would improve decoding. A toy model exploring how positive noise correlations in conjunction with unreliable units that inconsistently respond may anchor these findings in an interpretable way. It seems plausible that inconsistent responses would benefit from strong noise correlations, simply by units responding together. This would predict that shuffling would impair performance because you would then be sampling from trials in which some units respond, and trials in which some units do not respond - and may predict a bimodal performance distribution in which some trials decode well (when the units respond) and poor performance (when the units do not respond).

      In samples with more than 2 dimensions, the relationship between signal and noise correlations is more complex than in two-dimensional samples (Montijn et al., 2016), which makes constructing simple, interpretable toy models of this challenging. Montijn et al. (2016) provide a detailed characterization and model describing how the accuracy of a multidimensional population code can improve when including “positive noise correlations amongst similarly tuned neurons”. Unfortunately, we could not successfully test their model based on Mahalanobis distances, as we could not verify that the recorded DCIC population responses followed a multivariate Gaussian distribution, due to the limited azimuth trial repetitions we could sample.

      Significance:

      Boffi and colleagues set out to parse the azimuthal information available in the DCIC on a single trial. They largely accomplish this goal and are able to extract this information when allowing the units that contain more information about sound location to contribute to their decoding (e.g., through PCA or decoding on top unit activity specifically). The dataset will be of value to those interested in the DCIC and also to anyone interested in the role of noise correlations in population coding. Although this work is a first step into parsing the information available in the DCIC, it remains difficult to interpret if/how this azimuthal information is used in localization behaviors of engaged mice.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      General:

      The manuscript is generally well written, but could benefit from a quick proof by a native English speaker (e.g., "the" inferior colliculus is conventionally used with its article). The flow of arguments is also generally easy to follow, but I would kindly ask the authors to consider elaborating or clarifying the following points (including those already mentioned in my public review).

      (1) Choice of model:

      There are countless ways one can construct a decoder or classifier that can predict a presented sensory stimulus based on a population neuronal response. Given the assumptions of independence as mentioned in my public review, I would ask the authors to explicitly justify their choice of a naïve Bayesian classifier.

      A section detailing the logic of classifier choice is now included in the results section at page 10 and the last paragraph of page 18 from the revised version of the manuscript.

      (2) Number of imaging repetitions:

      For particularly noisy datasets, 14 repetitions is indeed quite few. I reckon this was not the choice of the authors, but rather limited by the inherent experimental conditions. Despite minimisation of required average laser power during the development of s-TeFo imaging, the authors still required almost 200 mW (which is still quite a lot of exposure). Although 14 repetitions for 13 azimuthal locations every 5 s is at face value a relatively short imaging session (~15 min.), at 191 mW, with the desire to image mice multiple times, I could imagine that this is a practical limitation the authors faced (to avoid excessive tissue heating or photodamage, which was assessed in the original Nature Methods article, but not here). Nevertheless, this logic (or whatever logic they had) should be explained for non-imaging experts in the readership.

      This is now addressed in the answers to the public reviews.

      (3) Redundancy:

      It is honestly unclear to me what the authors mean by this. I don't speculate that they mean there are "redundant" (small) populations of neurons that sufficiently encode azimuth, but I'm actually not certain. If that were the case, I believe this would need further clarification, since redundant representations would be inconsistent with the general (perhaps surprising) finding that large populations are not required in the DCIC, which is thought to be the case at earlier processing stages.

      In the text we are referring to the azimuth information being redundantly distributed across DCIC top ranked units. We do not mention redundant “populations of neurons”.

      (4) Correspondence of decoding accuracy with psychometric functions in mice: While this is an interesting coincidental observation, it should not be interpreted that the neuronal detection threshold in the DCIC is somehow responsible for its psychometric counterpart (which is an interesting yet exceedingly complex question). Although I do not believe the authors intended to suggest this, I would personally be cautious in the way I describe this correspondence. I mention this because the authors point it out multiple times in the manuscript (whereas I would have just mentioned it once in passing).

      This is now clarified in the revised manuscript.

      (5) Noisy vs. sparse:

      I'm confident that the authors understand the differences between these terms, both in concept (stochastic vs. scattered) and in context (neuronal vs. experimental), but I personally would be cautious in the way I use them in the description of the study. Indeed, auditory neuronal signals are to my knowledge generally thought to be both sparse and noisy, which is in itself interesting, but the study also deals with substantial experimental (recording) noise, and I think it's important for the readership to understand when "noise" refers to the recordings (in particular the imaging data) and to neuronal activity. I mention this specifically because "noisy" appears in the title.

      We have clarified this issue at the bottom of page 5 by adding the following sentences to the revised manuscript:

      “In this section we used the word “noise” to refer to the sound stimuli used and recording setup background sound levels or recording noise in the acquired signals. To avoid confusion, from now on in the manuscript the word “noise” will be used in the context of neuronal noise, which is the trial-to-trial variation in neuronal responses unrelated to stimuli, unless otherwise noted.”

      (6) More details in the Methods:

      The Methods section is perhaps the least-well structured part of the present manuscript in my view, and I encourage the authors to carefully go through it and add the following information (in case I somehow missed it).

      a. Please also indicate the number of animals used here.

      Added.

      b. How many sessions were performed on each mouse?

      This is already specified in the methods section in page 25:

      “mice were imaged a total of 2-11 times (sessions), one to three times a week.”

      We added for clarification:

      “Datasets here analyzed and reported come from the imaging session in which we observed maximal calcium sensor signal (peak AAV expression) and maximum number of detected units.”

      c. For the imaging experiments, was it possible to image the same units from session to session?

      This is not possible for sTeFo 2P data due to low spatial resolution which makes precisely matching neuron ROIs across sessions challenging.

      d. Could the authors please add more detail to the analyses of the videos (to track facial movements) or provide a reference?

      Added citation.

      e. The same goes for the selection of subcellular regions of interest that were used as "units."

      Added to page 25:

      “We used the CaImAn package (Giovannucci et al., 2019) for automatic ROI segmentation through constrained non negative matrix factorization and selected ROIs (Units) showing clear Ca transients consistent with neuronal activity, and IC neuron somatic shape and size (Schofield and Beebe, 2019).”

      Specific: In order to maximise the efficiency of my comments and suggestions (as there are no line numbers), my numerated points are organised in sequential order.

      (1) Abstract: I wouldn't personally motivate the study with the central nucleus of the IC (i.e. I don't think this is necessary). I think the authors can motivate it simply with the knowledge gaps in spatial coding throughout the auditory system, in which large data sets such as the ones presented here are of general value.

      (2) Page 4: 15-50 kHz "white" noise is incorrect. It should be "band-passed" noise.

      Changed.

      (3) Supplemental figure 1, panel A: Since the authors could not identify cell bodies unequivocally from their averaged volume timeseries data, it would be clearer to the readership if larger images are shown, so that they can evaluate (speculate) for themselves what subcellular structures were identified as units. Even better would be to include a planar image through a cross-section. As mentioned above, not everything determined for the cortex or hippocampus can be assumed to be true for the DCIC.

      The raw images and segmentations are publicly available for detailed inspections.

      (4) Supplemental figure 2, panel A: This panel requires further explanation, in particular the panel on the right. I assume that to be a simple subtraction of sequential frames, but I'm thrown off by the "d(Grey)" colour bar. Also, if "grey" refers to the neutral colour, it is conventionally spelled "gray" in US-American English.

      Changed.

      (5) Supplemental figure 2, panel B: I'm personally curious why the animals exhibited movement just prior to a stimulus. Did they learn to anticipate the presentation of a sound after some habituation? Is that somehow a pre-emptive startle response? We observe that in our own experiments (but as we stochastically vary the inter-trial-intervals, the movement typically occurs directly after the stimulus). I don't suggest the authors dwell on this, but I find it an interesting observation.

      It is indeed interesting, but we can’t conclude much about it without comparing it to random inter-trial-intervals.

      (6) Supplemental figure 3: I personally find these data (decoding of all electrophysiological data) of central relevance to the study, since they mirror the analyses presented for the imaging data counterpart, and I encourage the authors to move them to the main text.

      Changed.

      (7) Page 12: Do the authors have any further analyses of spatial tuning functions? We all know they can be parametrically obscure (i.e., bi-lobed, non-monotonic, etc.), but having these parameters (even if just in a supplemental figure) would be informative for the spatial auditory community.

      We dedicated significant effort to attempt to parametrize and classify the azimuth response dependency functions from the recorded DCIC cells in an unbiased way. Nevertheless, given the observed response noise and the “obscure” properties of spatial tuning functions mentioned by the reviewer, we could only reach the general qualitative observation of having a more frequent contralateral selectivity.

      (8) Page 14 (end): Here, psychometric correspondence is referenced. Please add the Lauer et al., (2011) reference, or, as I would, remove the statement entirely and save it for the discussion (where it is also mentioned and referenced).

      Changed.

      (9) Figure 5, Panels B and C: Why don't the authors report the Kruskal-Wallis tests (for increasing number of units training the model), akin to e.g., Panel G of Figure 4? I think that would be interesting to see (e.g., if the number of required units to achieve statistical significance is the same).

      Within-class randomization produced a moderate effect on decoder performance, achieving statistical significance at similar numbers of units, as seen in figure 5 panels B and C. We did not include these plots to avoid cluttering the figure with dense distributions and obscuring the differences between the distributions shown.
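
      One common way to implement such a within-class randomization is to shuffle the trial order independently for each unit, within each azimuth class. This keeps every unit's response distribution per class intact while breaking the trial-by-trial pairing between units. The sketch below illustrates this under stated assumptions: the function name, the data layout and the fixed seed are hypothetical, and the authors' exact procedure may differ.

```python
import random


def shuffle_within_class(responses, labels, seed=0):
    """Shuffle trials independently per unit within each stimulus class.

    responses: list of trials, each a list with one response per unit.
    labels:    stimulus class (e.g., azimuth) of each trial.
    Returns a new trials-by-units list; the input is left unchanged.
    """
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    shuffled = [list(trial) for trial in responses]
    n_units = len(responses[0])
    for cls in sorted(set(labels)):
        trial_idx = [i for i, label in enumerate(labels) if label == cls]
        for u in range(n_units):
            values = [responses[i][u] for i in trial_idx]
            rng.shuffle(values)  # new trial order for this unit only
            for i, v in zip(trial_idx, values):
                shuffled[i][u] = v
    return shuffled


# Toy data (hypothetical): 6 trials x 3 units, two azimuth classes.
responses = [
    [1, 10, 100], [2, 20, 200], [3, 30, 300],
    [4, 40, 400], [5, 50, 500], [6, 60, 600],
]
labels = [-45, -45, -45, 45, 45, 45]
surrogate = shuffle_within_class(responses, labels)
```

      Comparing decoder performance on the original and the surrogate data then isolates the contribution of the correlation structure, since the signal (the per-class response distributions) is the same in both.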

      (10) Figure 5, Panels B and C (histograms): I see a bit of skewedness in the distributions (even after randomisation). Where does this come from? This is just a small talking point.

      We believe this is potentially due to more than one distribution of pairwise correlations combined into one histogram (like in a Gaussian mixture model).

      (11) Page 21: Could the authors please specify that the Day and Delgutte (2013) study was performed on rabbits? Since rabbits have an entirely different spectral hearing range compared to mice, spatial coding principles could very well be different in those animals (and I'm fairly certain such a study has not yet been published for mice).

      Specified.

      (12) Page 22: I'd encourage the authors to remove the reference to Rayleigh's duplex theory, since mice hardly (if at all) use interaural time differences for azimuthal sound localisation, given their generally high-frequency hearing range.

      That sentence is meant to discuss beyond the mouse model an exciting outlook of our findings in light of previous reports, which is a hypothetical functional relationship between the tonotopy in DCIC and the spatial distribution of azimuth sensitive DCIC neurons. We have clarified this now in the text.

      (13) Page 23: I believe the conventional verb for gene delivery with viruses is still "transduce" (or "infect", but not "induce"). What was the specific "syringe" used for stereotactic injections? Also, why were mice housed separately after surgery? This question pertains to animal welfare.

      Changed. The syringe was a 10 ml syringe used to generate positive or negative pressure, coupled to the glass needle through silicone tubing via a Luer 3-way T valve. Single housing was chosen to avoid mice compromising each other’s implantations. Therefore, this can be seen as a refinement of our method to maximize the chances of successful imaging per implanted mouse.

      (14) Page 25: Could the authors please indicate the refractory period violation time window here? I had to find it buried in the figure caption of Supplementary figure 1.

      Added.

      (15) Page 27: What version of MATLAB was used? This could be important for reproduction of the analyses, since The Mathworks is infamously known to add (or even more deplorably, modify) functions in particular versions (and not update older ones accordingly).

      Added.

      Reviewer #3 (Recommendations For The Authors):

      Overall I thought this was a nice manuscript and a very interesting dataset. Here are some suggestions and minor corrections:

      You may find this work of interest - 'A monotonic code for sound azimuth in primate inferior colliculus' 2003, Groh, Kelly & Underhill.

      We thank the reviewer for pointing out this extremely relevant reference, which we regrettably failed to cite. It is now included in the revised version of the manuscript.

      In your introduction, you state "our findings point to a functional role of DCIC in sound location coding". Though your results show that there is azimuthal information contained in a subset of DCIC units there's no evidence in the manuscript that shows a functional link between this representation and sound localization.

      This is now addressed in the answers to the public reviews.

      I found the variability in your DCIC population quite striking - especially during the inter-sound intervals. The entrainment of the population in the imaging dataset suggests some type of input activating the populations - maybe these are avenues for further probing the variability here:

      (1) I'm curious if you can extract eye movements from your video. Work from Jennifer Groh shows that some cells in the primate inferior colliculus are sensitive to different eye positions (Groh et al., 2001). With recent work showing eye movements in rodents, it may explain some of the variance in the DCIC responses.

      This is now addressed in the answers to the public reviews.

      (2) I was also curious if the motor that moves the speaker made noise. It could be possible that some of the 'ongoing' activity is a sound-evoked response.

      We were careful to set the stepper motor speed so that it produced low frequency noise, within a band mostly outside of the hearing range of mice (<4 kHz). Nevertheless, we cannot fully rule out that a very quiet but perhaps very salient component of the motor noise could influence the activity during the inter-trial periods. The motor was stationary and quiet for a period of at least one stimulus duration before and during stimulus presentation.

      (3) Was the sound you presented frozen or randomly generated on each trial? Could there be some type of structure in the noise you presented that sometimes led cells to respond to a particular azimuth location but not others?

      The sound presented was frozen noise. This is now clarified in the methods section.

      It may be useful to quantify the number of your units that had refractory period violations.

      Our manual curation of sorted units was very stringent to avoid mixing differently tuned neurons. The single units analyzed had very infrequent refractory period violations, in less than ~5% of the spikes, considering a 2 ms refractory period.
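
      For readers who want to run this check on the public data, a minimal sketch of such a quantification is given below. It counts inter-spike intervals shorter than a 2 ms refractory period and expresses them as a fraction of all spikes of a unit. The function name, the choice of denominator and the toy spike train are assumptions for illustration, not the authors' curation code.

```python
def refractory_violation_fraction(spike_times_s, refractory_s=0.002):
    """Fraction of spikes that follow the previous spike by less than
    the refractory period (both in seconds)."""
    times = sorted(spike_times_s)
    if len(times) < 2:
        return 0.0
    violations = sum(
        1 for earlier, later in zip(times, times[1:])
        if later - earlier < refractory_s
    )
    return violations / len(times)


# Toy spike train (hypothetical), in seconds: two intervals are below 2 ms.
spikes = [0.000, 0.001, 0.010, 0.020, 0.0215, 0.100]
fraction = refractory_violation_fraction(spikes)
```

      A unit would pass a criterion like the one described above if this fraction stays below roughly 0.05.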

      Was the video recording contralateral or ipsilateral to the recording?

      The side of the face ipsilateral to the imaged IC was recorded. Added to methods.

      I was struck by the snout and ear movements - in the example shown in Supplementary Figure 2B it appears as they are almost predicting sound onset. Was there any difference in ear movements in the habituated and non-habituated animals? Also, does the placement of the cranial window disturb any of the muscles used in ear movement?

      Mouse snout movements appear to be quite active, perhaps reflecting arousal (Stringer et al., 2019). We cannot rule out that the cranial window implantation disturbed ear movement, but while moving the mouse head-fixed we observed what could be considered normal ear movements.

      Did you correlate time-point by time-point in the average population activity and movement or did you try different temporal lags/leads in case the effect of the movements was delayed in some way?

      Point by point, due to the 250 ms time resolution of the imaging.
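
      The lag/lead analysis the reviewer asks about can be sketched as a Pearson correlation computed at a range of frame shifts. In the illustrative Python code below, a positive lag means that neural activity follows movement by that many frames; at the 250 ms frame period mentioned above, a lag of 1 corresponds to 250 ms. The function names and toy traces are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (nan if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def lagged_correlation(neural, movement, max_lag):
    """Correlate neural[t + lag] with movement[t] for lag in -max_lag..max_lag frames."""
    n = len(neural)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = neural[lag:], movement[:n - lag]
        else:
            a, b = neural[:n + lag], movement[-lag:]
        result[lag] = pearson(a, b)
    return result


# Toy traces (hypothetical): neural activity is movement delayed by one frame.
movement = [0, 1, 0, 0, 2, 0, 0, 1, 0, 3, 0, 0]
neural = [0] + movement[:-1]
by_lag = lagged_correlation(neural, movement, max_lag=2)
best_lag = max(by_lag, key=by_lag.get)
```

      With coarse temporal sampling, only delays of at least one frame period can be resolved this way, so a zero-lag analysis may already capture most faster effects.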

      Are the video recordings only available during the imaging? It would be nice to see the same type of correlations in the neuropixel-acquired data as well.

      Only imaging. For Neuropixels recordings, we were skeptical about face videography, as we suspected that face movements were likely influenced by the acute nature of the preparation procedure. Our cranial window preparation, on the other hand, involved a recovery period of at least 4 weeks. Therefore, we were inclined to perform videographic interrogation of face movements on these mice instead.

      If you left out more than 1 trial, do you think this would help your overfitting issue (e.g., leaving out 20% of the data)?

      Due to the relatively small number of trial repetitions collected, fitting the model with an even smaller training dataset is unlikely to help overfitting and will likely decrease decoder performance.

      It would be nice to see a confusion matrix - even though azimuthal error and cumulative distribution of error are a fine way to present the data - a confusion matrix would tell us which actual sounds the decoder is confusing. Just looking at errors could result in some funky things where you reduce the error generally but never actually estimate the correct location.

      We considered confusion matrices early on in our study but they were not easily interpretable or insightful, likely due to the relatively low discrimination ability of the mouse model with +/- 30º error after extensive training. Therefore, we reasoned that in passively listening mice (and likely trained mice too) with limited trial repetitions, an undersampled and diffuse confusion matrix is expected which is not an ideal means of visualizing and comparing decoding errors. Hence we relied on cumulative error distributions.
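
      The two summaries being contrasted here can be computed from the same decoded labels, as in the illustrative sketch below. The confusion matrix counts every true/decoded azimuth pair, whereas the cumulative error distribution collapses these counts onto the absolute angular error. Function names and toy labels are hypothetical.

```python
def confusion_matrix(true_az, decoded_az, classes):
    """Rows: true azimuth; columns: decoded azimuth; entries: trial counts."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, d in zip(true_az, decoded_az):
        matrix[index[t]][index[d]] += 1
    return matrix


def cumulative_error_distribution(true_az, decoded_az, classes):
    """Fraction of trials with absolute azimuth error <= each possible error."""
    errors = [abs(t - d) for t, d in zip(true_az, decoded_az)]
    levels = sorted({abs(a - b) for a in classes for b in classes})
    return {e: sum(err <= e for err in errors) / len(errors) for e in levels}


# Toy decoding result (hypothetical): three azimuths, two trials each.
classes = [-30, 0, 30]
true_az = [-30, -30, 0, 0, 30, 30]
decoded_az = [-30, 0, 0, 30, 30, -30]
cm = confusion_matrix(true_az, decoded_az, classes)
cdf = cumulative_error_distribution(true_az, decoded_az, classes)
```

      With 13 azimuths and about 14 repetitions each, every row of the confusion matrix spreads roughly 14 trials over 13 cells, which illustrates why the matrix can look diffuse while the pooled error distribution remains readable.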

      Do your top-ranked units have stronger projections onto your 10-40 principal components?

      It would be interesting to know if the components are mostly taking into account those 30ish percent of the population that is dependent upon azimuth.

      Inspection of PC loadings across units ranked by response dependency on stimulus azimuth does not show a consistently stronger projection of top ranked units onto the first 10-40 principal components (Author response image 3).

      Author response image 3.

      PC loading matrices for each recorded mouse. The units recorded in each mouse are ranked in descending order of response dependency to stimulus azimuth based on the p value of the chi square test. Units above the red dotted line display a chi square p value < 0.05, units below this line have p values >= 0.05.

      How much overlap is there in the tuning of the top-ranked units?

      This varies considerably from mouse to mouse and between imaging and electrophysiology, which makes it hard to generalize, since it might depend on the unique DCIC population sampled in each mouse.

      I'm not really sure I follow what the nS/N adds - it doesn't really measure tuning but it seems to be introduced to discuss/extract some measure of tuning.

      nS/N is used to quantify how noisy neurons are, independent of how sensitive their responses are to the stimulus azimuth.

      Is the noise correlation - observed to become more positive - for more contralateral stimuli a product of higher firing rates due to a more preferred stimulus presentation or a real effect in the data? Was there any relationship between distance and strength of observed noise correlation in the DCIC?

      We observed a consistent and homogeneous trend of pairwise noise correlation distributions either shifted or tailed towards more positive values across stimulus azimuths, for imaging and electrophysiology datasets (Author response image 3). The lower firing frequency observed in neuropixels recordings in response to ipsilateral azimuths could have affected the statistical power of the comparison between the pairwise noise correlation coefficient distribution to its randomized chance level, but the overall histogram shapes qualitatively support this consistent trend across azimuths (Author response image 4).

      Author response image 4.

      Distribution histograms for the pairwise correlation coefficients (Kendall tau) from pairs of simultaneously recorded top ranked units across mice (blue) compared to the chance level distribution obtained through randomization of the temporal structure of each unit’s activity to break correlations (purple). Vertical lines show the medians of these distributions. Imaging data comes from n = 12 mice and neuropixels data comes from n = 4 mice.

      Typos:

      'a population code consisting on the simultaneous" > should on be of?

      'half of the trails' > trails should be trials?

      'referncing the demuxed channels' > should it be demixed?

      Corrected.

    2. eLife Assessment

      The paper reports the important discovery that the mouse dorsal inferior colliculus, an auditory midbrain area, encodes sound location. The evidence supporting the claims is solid, being supported by both optical and electrophysiological recordings. The observations described should be of interest to auditory researchers studying the neural mechanisms of sound localization and the role of noise correlations in population coding.

    3. Reviewer #1 (Public review):

      Summary:

      In this study, the authors address whether the dorsal nucleus of the inferior colliculus (DCIC) in mice encodes sound source location within the front horizontal plane (i.e., azimuth). They do this using volumetric two-photon Ca2+ imaging and high-density silicon probes (Neuropixels) to collect single-unit data. Such recordings are beneficial because they allow large populations of simultaneous neural data to be collected. Their main results and the claims about those results are the following:
      (1) DCIC single-unit responses have high trial-to-trial variability (i.e., neural noise);
      (2) approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth;
      (3) single-trial population responses (i.e., the joint response across all sampled single units in an animal) encode sound source azimuth "effectively" (as stated in the title) in that localization decoding error matches average mouse discrimination thresholds;
      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus (as stated in the Abstract);
      (5) evidence of noise correlation between pairs of neurons exists;
      and (6) noise correlations between responses of neurons help reduce population decoding error.

      While simultaneous recordings are not necessary to demonstrate results #1, #2, and #4, they are necessary to demonstrate results #3, #5, and #6.

      Strengths:
      - Important research question to all researchers interested in sensory coding in the nervous system.
      - State-of-the-art data collection: volumetric two-photon Ca2+ imaging and extracellular recording using high-density probes. Large neuronal data sets.
      - Confirmation of imaging results (lower temporal resolution) with more traditional microelectrode results (higher temporal resolution).
      - Clear and appropriate explanation of surgical and electrophysiological methods. I cannot comment on the appropriateness of the imaging methods.

      Strength of evidence for the claims of the study:

      (1) DCIC single-unit responses have high trial-to-trial variability:
      The authors' data clearly shows this.

      (2) Approximately 32% to 40% of DCIC single units have responses that are sensitive to sound source azimuth:
      The sensitivity of each neuron's response to sound source azimuth was tested with a Kruskal-Wallis test, which is appropriate since response distributions were not normal. Using this statistical test, only 8% of neurons (median for imaging data) were found to be sensitive to azimuth, and the authors noted this was not significantly different than the false positive rate. The Kruskal-Wallis test was not reported for electrophysiological data. The authors suggested that low numbers of azimuth-sensitive units resulting from the statistical analysis may be due to the combination of high neural noise and relatively low number of trials, which would reduce statistical power of the test. This is likely true, and highlights a weakness in the experimental design (i.e., relatively small number of trials). The authors went on to perform a second test of azimuth sensitivity (a chi-squared test) and found 32% (imaging) and 40% (e-phys) of single units to have statistically significant sensitivity. However, the use of a chi-squared test is questionable because it is meant to be used between two categorical variables, and neural response had to be binned before applying the test.
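      To make the binning concern concrete, the following is a minimal sketch of a chi-squared test of independence between binned single-unit responses and stimulus azimuth. It is not the authors' pipeline: the equal-width binning, the number of bins, and the permutation-based p value are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random
from collections import Counter


def bin_responses(responses, n_bins=3):
    """Discretize continuous responses into equal-width bins (an assumed scheme)."""
    lo, hi = min(responses), max(responses)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width if all values are equal
    return [min(int((r - lo) / width), n_bins - 1) for r in responses]


def chi_square_stat(bins, labels):
    """Pearson chi-squared statistic for the (response bin x azimuth) contingency table."""
    n = len(bins)
    row = Counter(bins)            # trials per response bin
    col = Counter(labels)          # trials per azimuth
    cell = Counter(zip(bins, labels))
    stat = 0.0
    for b in row:
        for a in col:
            expected = row[b] * col[a] / n
            stat += (cell[(b, a)] - expected) ** 2 / expected
    return stat


def permutation_p_value(responses, labels, n_bins=3, n_perm=2000, seed=0):
    """P value from shuffling azimuth labels; no chi-squared distribution is assumed."""
    rng = random.Random(seed)
    bins = bin_responses(responses, n_bins)
    observed = chi_square_stat(bins, labels)
    shuffled = list(labels)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if chi_square_stat(bins, shuffled) >= observed:
            at_least_as_large += 1
    return (at_least_as_large + 1) / (n_perm + 1)
```

      In this sketch the outcome depends on `n_bins`, which illustrates why the choice of binning matters when a test designed for categorical variables is applied to a continuous response.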

      (3) Single-trial population responses encode sound source azimuth "effectively" in that localization decoding error matches average mouse discrimination thresholds:
      If only one neuron in a population had responses that were sensitive to azimuth, we would expect that decoding azimuth from observation of that one neuron's response would perform better than chance. By observing the responses of more than one neuron (if more than one were sensitive to azimuth), we would expect performance to increase. The authors found that decoding from the whole population response was no better than chance. They argue (reasonably) that this is because of overfitting of the decoder model (too few trials were used to fit too many parameters) and provide evidence from decoding combined with principal components analysis which suggests that overfitting is occurring. What is troubling is the performance of the decoder when using only a handful of "top-ranked" neurons (in terms of azimuth sensitivity) (Fig. 4F and G). Decoder performance seems to increase when going from one to two neurons, then decreases when going from two to three neurons, and doesn't get much better for more neurons than for one neuron alone. It seems likely there is more information about azimuth in the population response, but decoder performance is not able to capture it because spike count distributions in the decoder model are not being accurately estimated due to too few stimulus trials (14, on average). In other words, it seems likely that decoder performance is underestimating the ability of the DCIC population to encode sound source azimuth.

      To get a sense of how effective a neural population is at coding a particular stimulus parameter, it is useful to compare population decoder performance to psychophysical performance. Unfortunately, mouse behavioral localization data do not exist. Instead, the authors compare decoder error to mouse left-right discrimination thresholds published previously by a different lab. However, this comparison is inappropriate because the decoder and the mice were performing different perceptual tasks. The decoder is classifying sound sources to 1 of 13 locations from left to right, whereas the mice were discriminating between left or right sources centered around zero degrees. The errors in these two tasks represent different things. The two data sets may potentially be more accurately compared by extracting information from the confusion matrices of population decoder performance. For example, when the stimulus was at -30 deg, how often did the decoder classify the stimulus to a lefthand azimuth? Likewise, when the stimulus was +30 deg, how often did the decoder classify the stimulus to a righthand azimuth?

      (4) DCIC can encode sound source azimuth in a similar format to that in the central nucleus of the inferior colliculus:
      It is unclear what exactly the authors mean by this statement in the Abstract. There are major differences in the encoding of azimuth between the two neighboring brain areas: a large majority of neurons in the CNIC are sensitive to azimuth (and strongly so), whereas the present study shows a minority of azimuth-sensitive neurons in the DCIC. Furthermore, CNIC neurons fire reliably to sound stimuli (low neural noise), whereas the present study shows that DCIC neurons fire more erratically (high neural noise).

      (5) Evidence of noise correlation between pairs of neurons exists:
      The authors' data and analyses seem appropriate and sufficient to justify this claim.

      (6) Noise correlations between responses of neurons help reduce population decoding error:
      The authors show convincing analysis that performance of their decoder increased when simultaneously measured responses were tested (which include noise correlation) than when scrambled-trial responses were tested (eliminating noise correlation). This makes it seem likely that noise correlation in the responses improved decoder performance. The authors mention that the naïve Bayesian classifier was used as their decoder for computational efficiency, presumably because it assumes no noise correlation and, therefore, assumes responses of individual neurons are independent of each other across trials to the same stimulus. The use of a decoder that assumes independence seems key here in testing the hypothesis that noise correlation contains information about sound source azimuth. The logic of using this decoder could be more clearly spelled out to the reader. For example, if the null hypothesis is that noise correlations do not carry azimuth information, then a decoder that assumes independence should perform the same whether population responses are simultaneous or scrambled. The authors' analysis showing a difference in performance between these two cases provides evidence against this null hypothesis.
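      The scrambled-trial control described above can be sketched as a within-stimulus trial shuffle. This is a generic illustration, not the authors' code: the data layout (one list of per-unit responses per trial) and the function name are assumptions. Each unit's trials are permuted independently within each azimuth, so every unit keeps its response distribution for every stimulus while trial-by-trial covariation between units is destroyed.

```python
import random


def shuffle_trials_within_stimulus(responses, labels, seed=0):
    """Break noise correlations while preserving each unit's tuning.

    responses: list of trials, each a list with one response per unit.
    labels:    stimulus label (e.g. azimuth) for each trial.
    Returns a new trials-by-units list; the input is left unchanged.
    """
    rng = random.Random(seed)
    n_units = len(responses[0])
    shuffled = [list(trial) for trial in responses]

    # Group trial indices by stimulus label.
    trials_by_stimulus = {}
    for t, label in enumerate(labels):
        trials_by_stimulus.setdefault(label, []).append(t)

    # Permute each unit independently within each stimulus.
    for trials in trials_by_stimulus.values():
        for u in range(n_units):
            values = [responses[t][u] for t in trials]
            rng.shuffle(values)
            for t, v in zip(trials, values):
                shuffled[t][u] = v
    return shuffled
```

      Under the null hypothesis stated above, a decoder that assumes independence should reach similar accuracy on `responses` and on the shuffled copy, so a consistent drop after shuffling is evidence against that null.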

      Minor weakness:
      - Most studies of neural encoding of sound source azimuth are done in a noise-free environment, but the experimental setup in the present study had substantial background noise. This complicates comparison of the azimuth tuning results in this study to those of other studies. One is left wondering if azimuth sensitivity would have been greater in the absence of background noise, particularly for the imaging data where the signal was only about 12 dB above the noise.

    4. Reviewer #2 (Public review):

      In the present study, Boffi et al. investigate the manner in which the dorsal cortex of the inferior colliculus (DCIC), an auditory midbrain area, encodes sound location azimuth in awake, passively listening mice. By employing volumetric calcium imaging (scanned temporal focusing or s-TeFo), complemented with high-density electrode electrophysiological recordings (neuropixels probes), they show that sound-evoked responses are exquisitely noisy, with only a small portion of neurons (units) exhibiting spatial sensitivity. Nevertheless, a naïve Bayesian classifier was able to predict the presented azimuth based on the responses from small populations of these spatially sensitive units. A portion of the spatial information was provided by correlated trial-to-trial response variability between individual units (noise correlations). The study presents a novel characterization of spatial auditory coding in a non-canonical structure, representing a noteworthy contribution specifically to the auditory field and generally to systems neuroscience, due to its implementation of state-of-the-art techniques in an experimentally challenging brain region. However, nuances in the calcium imaging dataset and the naïve Bayesian classifier warrant caution when interpreting some of the results.

      Strengths:

      The primary strength of the study lies in its methodological achievements, which allowed the authors to collect a comprehensive and novel dataset. While the DCIC is a dorsal structure, it extends up to a millimetre in depth, making it optically challenging to access in its entirety. It is also more highly myelinated and vascularised compared to e.g., the cerebral cortex, compounding the problem. The authors successfully overcame these challenges and present an impressive volumetric calcium imaging dataset. Furthermore, they corroborated this dataset with electrophysiological recordings, which produced overlapping results. This methodological combination ameliorates the natural concerns that arise from inferring neuronal activity from calcium signals alone, which are in essence an indirect measurement thereof.

      Another strength of the study is its interdisciplinary relevance. For the auditory field, it represents a significant contribution to the question of how auditory space is represented in the mammalian brain. "Space" per se is not mapped onto the basilar membrane of the cochlea and must be computed entirely within the brain. For azimuth, this requires the comparison of minuscule differences in the timing and intensity of sounds arriving at each ear. It is now generally thought that azimuth is initially encoded in two, opposing hemispheric channels, but the extent to which this initial arrangement is maintained throughout the auditory system remains an open question. The authors observe only a slight contralateral bias in their data, suggesting that sound source azimuth in the DCIC is encoded in a more nuanced manner compared to earlier processing stages of the auditory hindbrain. This is interesting because it is also known to be an auditory structure to receive more descending inputs from the cortex.

      Systems neuroscience continues to strive for the perfection of imaging novel, less accessible brain regions. Volumetric calcium imaging is a promising emerging technique, allowing the simultaneous measurement of large populations of neurons in three dimensions. But this necessitates corroboration with other methods, such as electrophysiological recordings, which the authors achieve. The dataset moreover highlights the distinctive characteristics of neuronal auditory representations in the brain. Its signals can be exceptionally sparse and noisy, which provide an additional layer of complexity in the processing and analysis of such datasets. This will undoubtedly be useful for future studies of other less accessible structures with sparse responsiveness.

      Weaknesses:

      Although the primary finding that small populations of neurons carry enough spatial information for a naïve Bayesian classifier to reasonably decode the presented stimulus is not called into question, certain idiosyncrasies, in particular the calcium imaging dataset and model, complicate specific interpretations of the model output, and the readership is urged to interpret these aspects of the study's conclusions with caution.

      I remain in favour of volumetric calcium imaging as a suitable technique for the study, but the presently constrained spatial resolution is insufficient to unequivocally identify regions of interest as cell bodies (and are instead referred to as "units" akin to those of electrophysiological recordings). It remains possible that the imaging set is inadvertently influenced by non-somatic structures (including neuropil), which could report neuronal activity differently than cell bodies. Due to the lack of a comprehensive ground-truth comparison in this regard (which to my knowledge is impossible to achieve with current technology), it is difficult to imagine how many informative such units might have been missed because their signals were influenced by spurious, non-somatic signals, which could have subsequently misled the models. The authors reference the original Nature Methods article (Prevedel et al., 2016) throughout the manuscript, presumably in order to avoid having to repeat previously published experimental metrics. But the DCIC is neither the cortex nor hippocampus (for which the method was originally developed) and may not have the same light scattering properties (not to mention neuronal noise levels). Although the corroborative electrophysiology data largely alleviates these concerns for this particular study, the readership should be cognisant of such caveats, in particular those who are interested in implementing the technique for their own research.

      A related technical limitation of the calcium imaging dataset is the relatively low number of trials (14) given the inherently high level of noise (both neuronal and imaging). Volumetric calcium imaging, while offering a uniquely expansive field of view, requires relatively high average excitation laser power (in this case nearly 200 mW), a level of exposure the authors may have wanted to minimise by maintaining a low number of repetitions, but I yield to them to explain. Calcium imaging is also inherently slow, requiring relatively long inter-stimulus intervals (in this case 5 s). This unfortunately renders any model designed to predict a stimulus (in this case sound azimuth) from particularly noisy population neuronal data like these as highly prone to overfitting, to which the authors correctly admit after a model trained on the entire raw dataset failed to perform significantly above chance level. This prompted them to feed the model only with data from neurons with the highest spatial sensitivity. This ultimately produced reasonable performance (and was implemented throughout the rest of the study), but it remains possible that if the model was fed with more repetitions of imaging data, its performance would have been more stable across the number of units used to train it. (All models trained with imaging data eventually failed to converge.) However, I also see these limitations as an opportunity to improve the technology further, which I reiterate will be generally important for volume imaging of other sparse or noisy calcium signals in the brain.

      Indeed, in separate comments to these remarks, the authors confirmed that the low number of trials was technically limited, which I emphasise is through no fault of their own. However, they also do not report this as a typical imaging constraint, such as photobleaching, but rather because the animals exhibited signs of stress and discomfort at longer imaging periods. From an animal welfare perspective, I would encourage the authors to state this in the methods for transparency. It would demonstrate their adherence to animal welfare policies, which I find to be an incredibly strong argument for limiting the number of trials in their study.

      Transitioning to the naïve Bayesian classifier itself, I first openly ask the authors to justify their choice of this specific model. There are countless types of classifiers for these data, each with their own pros and cons. Did they actually try other models (such as support vector machines), which ultimately failed? If so, these negative results (even if mentioned en passant) would be extremely valuable to the community, in my view. I ask this specifically because different methods assume correspondingly different statistical properties of the input data, and to my knowledge naïve Bayesian classifiers assume that predictors (neuronal responses) are independent within a class (azimuth). As the authors show that noise correlations are informative in predicting azimuth, I wonder why they chose a model that doesn't take advantage of these statistical regularities. It could be because of technical considerations (they mention computing efficiency), but I am left generally uncertain about the specific logic that was used to guide the authors through their analytical journey.

      In a revised version of the manuscript, the authors indeed justify their choice of the naïve Bayesian classifier as a conservative approach (not taking into account noise correlations), which could only improve with other models (that do). They even tested various other commonly used models, such as support vector machines and k-nearest neighbours, to name a few, but do not report these efforts in the main manuscript. Interestingly, these models, which I supposed would perform better in fact did not overall - a finding that I have no way of interpreting but nevertheless find interesting. I would thus encourage the authors to include these results in a figure supplement and mention it en passant while justifying their selection of model (but please include detailed model parameters in the methods section).

      That aside, there remain other peculiarities in model performance that warrant further investigation. For example, what spurious features (or lack of informative features) in these additional units prevented the models of imaging data from converging? In an orthogonal question, did the most spatially sensitive units share any detectable tuning features? A different model trained with electrophysiology data in contrast did not collapse in the range of top-ranked units plotted. Did this model collapse at some point after adding enough units, and how well did that correlate with the model for the imaging data? How well did the form (and diversity) of the spatial tuning functions as recorded with electrophysiology resemble their calcium imaging counterparts? These fundamental questions could be addressed with more basic, but transparent analyses of the data (e.g., the diversity of spatial tuning functions of their recorded units across the population). Even if the model extracts features that are not obvious to the human eye in traditional visualisations, I would still find this interesting.

      Although these questions were not specifically addressed in the revised version of the manuscript, I also admit that I did not intend to assert that these should necessarily fall within the scope of the present study. I rather posed them as hypothetical directions one could pursue in future studies. Finally, further concerns I had with statements regarding the physiological meaning of the findings have been ameliorated by nicely modified statements, thus bringing transparency to the readership, which I appreciate.

      In summary, the present study represents a significant body of work that contributes substantially to the field of spatial auditory coding and systems neuroscience. However, limitations of the imaging dataset and model as applied in the study muddle concrete conclusions about how the DCIC precisely encodes sound source azimuth and even more so about sound localisation in a behaving animal. Nevertheless, it presents a novel and unique dataset, which, regardless of secondary interpretation, corroborates the general notion that auditory space is encoded in an extraordinarily complex manner in the mammalian brain.

    5. Reviewer #3 (Public review):

      Summary:

      Boffi and colleagues sought to quantify the single-trial, azimuthal information in the dorsal cortex of the inferior colliculus (DCIC), a relatively understudied subnucleus of the auditory midbrain. They accomplished this by using two complementary recording methods while mice passively listened to sounds at different locations: calcium imaging that recorded large neuronal populations but with poor temporal precision and multi-contact electrode arrays that recorded smaller neuronal populations with exact temporal precision. DCIC neurons respond variably, with inconsistent activity to sound onset and complex azimuthal tuning. Some of this variability was explained by ongoing head movements. The authors used a naïve Bayes decoder to probe the azimuthal information contained in the response of DCIC neurons on single trials. The decoder failed to classify sound location better than chance when using the raw population responses but performed significantly better than chance when using the top principal components of the population. Units with the most azimuthal tuning were distributed throughout the DCIC, possessed contralateral bias, and positively correlated responses. Interestingly, inter-trial shuffling decreased decoding performance, indicating that noise correlations contributed to decoder performance. Overall, Boffi and colleagues quantified the azimuthal information available in the DCIC while mice passively listened to sounds, a first step in evaluating if and how the DCIC could contribute to sound localization.

      Strengths:

      The authors should be commended for collection of this dataset. When done in isolation (which is typical), calcium imaging and linear array recordings have intrinsic weaknesses. However, those weaknesses are alleviated when done in conjunction - especially when the data is consistent. This data set is extremely rich and will be of use for those interested in auditory midbrain responses to variable sound locations, correlations with head movements, and neural coding.

      The DCIC neural responses are complex with variable responses to sound onset, complex azimuthal tuning and large inter-sound interval responses. Nonetheless, the authors do a decent job in wrangling these complex responses: finding non-canonical ways of determining dependence on azimuth and using interpretable decoders to extract information from the population.

      Weaknesses:

      The decoding results are a bit strange, likely because the population response is quite noisy on any given trial. Raw population responses failed to provide sufficient information concerning azimuth for significant decoding. Importantly, the decoder performed better than chance when certain principal components or top ranked units contributed but did not saturate with the addition of components or top ranked units. So, although there is azimuthal information in the recorded DCIC populations - azimuthal information appears somewhat difficult to extract.

      Although necessary given the challenges associated with sampling many conditions with technically difficult recording methods, the limited number of stimulus repeats precludes interpretable characterization of the heterogeneity across the population. Nevertheless, the dataset is public so those interested can explore the diversity of the responses.

      The observations from Boffi and colleagues raise the question: what drives neurons in the DCIC to respond? Sound azimuth appears to be a small aspect of the DCIC response. For example, the first 20 principal components which explain roughly 80% of the response variance are insufficient input for the decoder to predict sound azimuth above chance. Furthermore, snout and ear movements correlate with the population response in the DCIC (the ear movements are particularly peculiar given they seem to predict sound presentation). Other movements may be of particular interest to control for (e.g. eye movements are known to interact with IC responses in the primate). These observations, along with reported variance to sound onsets and inter-sound intervals, question the impact of azimuthal information emerging from DCIC responses. This is certainly out of scope for any one singular study to answer, but, hopefully, future work will elucidate the dominant signals in the DCIC population. It may be intuitive that engagement in a sound localization task may push azimuthal signals to the forefront of DCIC response, but azimuthal information could also easily be overtaken by other signals (e.g. movement, learning).

      Boffi and colleagues set out to parse the azimuthal information available in the DCIC on a single trial. They largely accomplish this goal and are able to extract this information when allowing the units that contain more information about sound location to contribute to their decoding (e.g., through PCA or decoding on their activity specifically). Interestingly, they also found that positive noise correlations between units with similar azimuthal preferences facilitate this decoding - which is unusual given that this is typically thought to limit information. The dataset will be of value to those interested in the DCIC and to anyone interested in the role of noise correlations in population coding. Although this work is a first step into parsing the information available in the DCIC, it remains difficult to interpret if/how this azimuthal information is used in localization behaviors of engaged mice.

    1. Author response:

      Reviewer #1 (Public Review):

      Padilha et al. aimed to find prospective metabolite biomarkers in serum of children aged 6-59 months that were indicative of neurodevelopmental outcomes. The authors leveraged data and samples from the cross-sectional Brazilian National Survey on Child Nutrition (ENANI-2019), and an untargeted multisegment injection-capillary electrophoresis-mass spectrometry (MSI-CE-MS) approach was used to measure metabolites in serum samples (n=5004) which were identified via a large library of standards. After correlating the metabolite levels against the developmental quotient (DQ), or the degree of which age-appropriate developmental milestones were achieved as evaluated by the Survey of Well-being of Young Children, serum concentrations of phenylacetylglutamine (PAG), cresol sulfate (CS), hippuric acid (HA) and trimethylamine-N-oxide (TMAO) were significantly negatively associated with DQ. Examination of the covariates revealed that the negative associations of PAG, HA, TMAO and valine (Val) with DQ were specific to younger children (-1 SD or 19 months old), whereas creatinine (Crtn) and methylhistidine (MeHis) had significant associations with DQ that changed direction with age (negative at -1 SD or 19 months old, and positive at +1 SD or 49 months old). Further, mediation analysis demonstrated that PAG was a significant mediator for the relationship of delivery mode, child's diet quality and child fiber intake with DQ. HA and TMAO were additional significant mediators of the relationship of child fiber intake with DQ.

      Strengths of this study include the large cohort size and study design allowing for sampling at multiple time points along with neurodevelopmental assessment and a relatively detailed collection of potential confounding factors including diet. The untargeted metabolomics approach was also robust and comprehensive allowing for level 1 identification of a wide breadth of potential biomarkers. Given their methodology, the authors should be able to achieve their aim of identifying candidate serum biomarkers of neurodevelopment for early childhood. The results of this work would be of broad interest to researchers who are interested in understanding the biological underpinnings of development and also for tracking development in pediatric populations, as it provides insight for putative mechanisms and targets from a relevant human cohort that can be probed in future studies. Such putative mechanisms and targets are currently lacking in the field due to challenges in conducting these kinds of studies, so this work is important.

      However, in the manuscript's current state, the presentation and analysis of data impede the reader from fully understanding and interpreting the study's findings.

      Particularly, the handling of confounding variables is incomplete. There is a different set of confounders listed in Table 1 versus Supplementary Table 1 versus Methods section Covariates versus Figure 4. For example, Region is listed in Supplementary Table 1 but not in Table 1, and Mode of Delivery is listed in Table 1 but not in Supplementary Table 1. Many factors are listed in Figure 4 that aren't mentioned anywhere else in the paper, such as gestational age at birth or maternal pre-pregnancy obesity.

      We thank the reviewer for their comment. We would like to clarify that initially, the tables had different variables because they have different purposes. Table 1 aims to characterize the sample on variables directly related to the children’s and mother’s features and their nutritional status. Supplementary File 1 (previously named Supplementary Table 1) summarizes the sociodemographic distribution of the development quotient. Neither of the tables concerns the metabolite-DQ relationships and their potential covariates; they only provide context for subsequent analyses by characterizing the sample and the outcome. Instead, the covariates included in the regression models were selected using the Directed Acyclic Graph presented in Figure 1.

      To avoid this potential confusion, however, we included the same variables in Table 1 and Supplementary File 1 (page 38), and we discussed the selection of model covariates in Figure 4 in more detail here in the letter and in the manuscript.

      The authors utilize the directed acyclic graph (DAG) in Figure 4 to justify the further investigation of certain covariates over others. However, the lack of inclusion of the microbiome in the DAG, especially considering that most of the study findings were microbial-derived metabolite biomarkers, appears to be a fundamental flaw. Sanitation and micronutrients are proposed by the authors to have no effect on the host metabolome, yet sanitation and micronutrients have both been demonstrated in the literature to affect microbiome composition which can in turn affect the host metabolome.

      Thank you for your comment. We appreciate that the use of DAG and lack of the microbiome in the DAG are concerns. This has been already discussed in reply #1 to the editor that has been pasted below for convenience:

      Thank you for the comment and suggestions. It is important to highlight that there is no data on microbiome composition. We apologize if there was an impression such data is available. The main goal of conducting this national survey was to provide qualified and updated evidence on child nutrition to revise and propose new policies and nutritional guidelines for this demographic. Therefore, collection of stool derived microbiome (metagenomic) data was not one of the objectives of ENANI-2019. This is more explicitly stated as a study limitation in the revised manuscript on page 17, lines 463-467:

      “Lastly, stool microbiome data was not collected from children in ENANI-2019 as it was not a study objective in this large population-based nutritional survey. However, the lack of microbiome data does not reduce the importance/relevance, since there is no evidence that microbiome and factors affecting microbiome composition are confounders in the association between serum metabolome and child development.”

      Besides, one must consider the difficulties and costs in collecting and analyzing microbiome composition in a large population-based survey. In contrast, the metabolome data has been considered a priority as blood specimens had already been collected to inform policy on micronutrient deficiencies in Brazil. However, due to funding limitations we had to perform the analysis in a subset of our sample, still representative and large enough to test our hypothesis with adequate study power (more details below).

      We would like to argue that there is no evidence that microbiome and factors affecting microbiome composition are confounders on the association between serum metabolome and child development. First, one should revisit the properties of a confounder according to the epidemiology literature that in short states that confounding refers to an alternative explanation for a given conclusion, thus constituting one of the main problems for causal inference (Kleinbaum, Kupper, and Morgenstern, 1991; Greenland & Robins, 1986; VanderWeele, 2019). In our study, we highlight that certain serum metabolites associated with the developmental quotient (DQ) in children were circulating metabolites (e.g., cresol sulfate, hippuric acid, phenylacetylglutamine, TMAO) previously reported to depend on dietary exposures, host metabolism and gut microbiota activity. Our discussion cites other published work, including animal models and observational studies, which have reported how these bioactive metabolites in circulation are co-metabolized by commensal gut microbiota, and may play a role in neurodevelopment and cognition as mediated by environmental exposures early in life.

      In fact, the literature on the association between microbiome and infant development is very limited. We performed a search using terms ‘microbiome’ OR ‘microbiota’ AND ‘child development’ AND ‘systematic’ OR ‘meta-analysis’ and found only one study: ‘Associations between the human immune system and gut microbiome with neurodevelopment in the first 5 years of life: A systematic scoping review’ (DOI 10.1002/dev.22360). The authors conclude: ‘while the immune system and gut microbiome are thought to have interactive impacts on the developing brain, there remains a paucity of published studies that report biomarkers from both systems and associations with child development outcomes.’ It is important to highlight that our criteria to include confounders on the directed acyclic graph (DAG) was based on the literature of systematic reviews or meta-analysis and not on single isolated studies.

      In summary, we would like to highlight that there is no microbiome data in ENANI-2019 and in the event such data was present, we are confident that based on the current stage of the literature, there is no evidence to consider such construct in the DAG, as this procedure recommends that only variables associated with the exposure and the outcome should be included. Please find more details on DAG below.

      Moreover, we would like to clarify that we have not stated that sanitation and micronutrients have no effect on the serum metabolome, instead, these constructs were not considered on the DAG.

      To make it clearer, we have modified the passage about DAG in the methods section. New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analysis of relationships with both the serum metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight was a variable with high missing data, and indicators of breastfeeding practice data (referring to exclusive breastfeeding until 6 months and/or complemented until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as w/h z-score, and the child's age in months.”
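
      To illustrate the idea of reading candidate adjustment variables off a DAG, the following is a minimal Python sketch. It is not the procedure used for the manuscript: the graph, the node names `pathway_variable` and `collider`, and the helper functions are all hypothetical, and the rule shown (keep only variables that are ancestors of both the exposure and the outcome) is a simplified heuristic rather than the full back-door criterion that dedicated DAG software applies.

```python
from typing import Dict, Set

# Hypothetical toy DAG; edges point from cause to effect.
# Only the covariate names echo the text, the structure is illustrative.
EDGES: Dict[str, Set[str]] = {
    "child_age": {"metabolite", "DQ"},
    "diet_quality": {"metabolite", "DQ"},
    "nutritional_status": {"metabolite", "DQ"},
    "metabolite": {"pathway_variable", "collider"},
    "pathway_variable": {"DQ"},   # lies on the metabolite -> DQ pathway
    "DQ": {"collider"},           # common effect of metabolite and DQ
}


def descendants(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return every node reachable from `node` by following edges."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child in graph.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def ancestors(graph: Dict[str, Set[str]], node: str) -> Set[str]:
    """Return every node from which `node` can be reached."""
    all_nodes = set(graph) | {c for children in graph.values() for c in children}
    return {n for n in all_nodes if node in descendants(graph, n)}


def common_causes(graph: Dict[str, Set[str]], exposure: str, outcome: str) -> Set[str]:
    """Variables upstream of both exposure and outcome.

    Mediators and colliders are descendants of the exposure, so in an
    acyclic graph they can never appear in this set.
    """
    return ancestors(graph, exposure) & ancestors(graph, outcome)


print(sorted(common_causes(EDGES, "metabolite", "DQ")))
```

      In this toy graph the hypothetical pathway variable and collider are left out automatically, which mirrors the rationale described above for avoiding adjustment for variables on the metabolite-DQ pathway.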

      Additionally, the authors emphasized as part of the study selection criteria the following, "Due to the costs involved in the metabolome analysis, it was necessary to further reduce the sample size. Then, samples were stratified by age groups (6 to 11, 12 to 23, and 24 to 59 months) and health conditions related to iron metabolism, such as anemia and nutrient deficiencies. The selection process aimed to represent diverse health statuses, including those with no conditions, with specific deficiencies, or with combinations of conditions. Ultimately, through a randomized process that ensured a balanced representation across these groups, a total of 5,004 children were selected for the final sample (Figure 1)."

      Therefore, anemia and nutrient deficiencies are assumed by the reader to be important covariates, yet, the data on the final distribution of these covariates in the study cohort is not presented, nor are these covariates examined further.

      Thank you for the comments. We apologize for the misunderstanding and will amend the text to make our rationale clearer in the revised version of the manuscript.

      We believed the original text was clear enough in stating that the sampling process was performed aiming to maintain the representativeness of the original sample. This sampling process considered anemia and nutritional deficiencies, among other variables. However, we did not aim to include all relevant covariates of the DQ-metabolome relationship; these were decided using the DAG, as described in the manuscript and other sections of this letter. Therefore, we would like to emphasize that our description of the sampling process does not assume that anemia and nutritional deficiencies are important covariates for the DQ-metabolome relationship.

      We rewrote this text part, page 11, lines 279-285:

      “Due to the costs involved in the metabolome analysis, it was necessary to reduce the sample size that is equivalent to 57% of total participants from ENANI-2019 with stored blood specimens. Therefore, the infants were stratified by age groups (6 to 11, 12 to 23, and 24 to 59 months) and health conditions such as anemia and micronutrient deficiencies. The selection process aimed to represent diverse health statuses to the original sample. Ultimately, 5,004 children were selected for the final sample through a random sampling process that ensured a balanced representation across these groups (Figure 2).”

      The inclusion of specific covariates in Table 1, Supplementary Table 1, the statistical models, and the mediation analysis is thus currently biased as it is not well justified.

      We appreciate the reviewer's comment. However, it would have been ideal to receive a comment/critique with clearer and more straightforward argumentation, so we could try to address it based on our interpretation.

      Please refer to our response to item #1 above regarding the variables in the tables and figures. The covariates in the statistical models were selected using the DAG, which is a cutting-edge procedure that aims to avoid bias and overfitting, a common situation when confounders are adjusted for without a clear rationale. We elaborate on the advantages of using the DAG in response to item #6 and in page 9 of the manuscript. The statistical models we use follow the best practices in the field when dealing with a large number of collinear predictors and a continuous outcome (see our response to the editor’s 4th comment). Finally, the mediation analyses were done to explore a few potential explanations for our results from the PLSR and multiple regression analyses. We only ran mediation analyses for plausible mechanisms for which the variables of interest were available in our data. Please see our response to reviewer 3’s item #1 for a more detailed explanation on the mediation analysis.

      Finally, it is unclear what the partial-least squares regression adds to the paper, other than to discard potentially interesting metabolites found by the initial correlation analysis.

      Thank you for the question. As explained in response to the editor’s item #4, PLS-based analyses are among the most commonly used analyses for parsing metabolomic data (Blekherman et al., 2011; Wold et al., 2001; Gromski et al., 2015). This procedure is especially appropriate for cases in which there are multiple collinear predictor variables as it allows us to compare the predictive value of all the variables without relying on corrections for multiple testing. Testing each metabolite in separate correlations corrected for multiple comparisons is less appropriate because the correlated nature of the metabolites means the comparisons are not truly independent and would cause the corrections (which usually assume independence) to be overly strict. As such, we only rely on the correlations as an initial, general assessment that gives context to subsequent, more specific analyses. Given that our goal is to select the most predictive metabolites, discarding the less predictive metabolites is precisely what we aim to achieve. As explained above and in response to the editor’s item #4, the PLSR allows us to reach that goal without introducing bias in our estimates or losing statistical power.

      Reviewer #2 (Public Review):

      A strength of the work lies in the number of children Padilha et al. were able to assess (5,004 children aged 6-59 months) and in the extensive screening that the Authors performed for each participant. This type of large-scale study is uncommon in low-to-middle-income countries such as Brazil.

      The Authors employ several approaches to narrow down the number of potentially causally associated metabolites.

      Could the Authors justify on what basis the minimum dietary diversity score was dichotomized? Were sensitivity analyses undertaken to assess the effect of this dichotomization on associations reported by the article? Consumption of each food group may have a differential effect that is obscured by this dichotomization.

      Thank you for the observation. We would like to emphasize that the child's diet quality was assessed using the minimum dietary diversity (MDD) indicator proposed by the WHO (World Health Organization & United Nations Children’s Fund (UNICEF), 2021). This guideline proposes the cutoff used in the present study. We understand the reviewer’s suggestion to use the consumption of healthy food groups as an evaluation of diet quality, but we chose to follow the WHO proposal to assess dietary diversity. This indicator is widely accepted and used as a marker and provides comparability and consistency with other published studies.

      Could the Authors specify the statistical power associated with each analysis?

      We are not aware of power calculation procedures for PLS-based analyses. However, given our large sample size, we do not believe power was an issue with the analyses. For our regression analyses, which typically have 4 predictors, we had 95% power to detect an f-squared of 0.003 and an r of 0.05 in a two-sided correlation test considering an alpha level of 0.05.

      New text, page 11, lines 296-298:

      “Given the size of our sample, statistical power is not an issue in our analyses. Considering an alpha of 0.05 for a two-sided test, a sample size of 5000 has 95% power to detect a correlation of r = 0.05 and an effect of f2 = 0.003 in a multiple regression model with 4 predictors.”
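
      As a rough cross-check of the correlation figure, power for a two-sided test of a correlation can be approximated with Fisher's z-transformation. The sketch below is an assumption-laden illustration using only the Python standard library; the function name `correlation_power` is hypothetical, and dedicated power software may use exact distributions and give a slightly different value. It does not cover the f2 calculation for the multiple regression model.

```python
import math
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test of H0: rho = 0 (Fisher z)."""
    std_normal = NormalDist()
    z_effect = math.atanh(r)             # Fisher z-transform of the target r
    se = 1.0 / math.sqrt(n - 3)          # approximate standard error of z
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = z_effect / se                # expected test statistic under H1
    # Probability of falling in the upper or the lower rejection region
    upper = 1 - std_normal.cdf(z_crit - shift)
    lower = std_normal.cdf(-z_crit - shift)
    return upper + lower


power = correlation_power(r=0.05, n=5000)
print(f"approximate power: {power:.3f}")
```

      Under this normal approximation the result falls in the neighbourhood of the power reported above for r = 0.05 with a sample of about 5,000.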

      Could the Authors describe in detail which metric they used to measure how predictive PLSR models are, and how they determined what the "optimal" number of components were?

      We chose the model with the fewest components that maximized R2 and minimized root mean squared error of prediction (RMSEP). In the training data, the model with 4 components had a lower R2 but a lower RMSEP; therefore, we chose the model with 3 components, which had a higher R2 than the 4-component model and a lower RMSEP than the model with 2 components. However, the number of components in the model did not meaningfully change the rank order of the metabolites on the VIP index.

      New text, page 8, lines 220-224:

      “To better assess the predictiveness of each metabolite in a single model, a PLSR was conducted. PLS-based analyses are the most commonly used analyses when determining the predictiveness of a large number of variables as they avoid issues with collinearity, sample size, and corrections for multiple-testing (Blekherman et al., 2011; Wold et al., 2001; Gromski et al. 2015).”

      New text, page 12, lines 312-314:

      “In PLSR analysis, the training data suggested that three components best predicted the data (the model with three components had the highest R2, and the root mean square error of prediction (RMSEP) was only slightly lower with four components). In comparison, the test data showed a slightly more predictive model with four components (Figure 3—figure supplement 2).”
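
      One way to make a parsimony rule of this kind explicit is sketched below in Python. This is not the selection procedure used in the manuscript: the function `pick_n_components`, the 1% tolerance, and the RMSEP values are all hypothetical placeholders chosen only to show the logic of preferring fewer components when the prediction error is nearly the same.

```python
from typing import Dict


def pick_n_components(rmsep_by_k: Dict[int, float], tolerance: float = 0.01) -> int:
    """Pick the fewest components whose RMSEP is within `tolerance`
    (relative) of the lowest RMSEP observed across candidates."""
    best = min(rmsep_by_k.values())
    threshold = best * (1 + tolerance)
    return min(k for k, rmsep in rmsep_by_k.items() if rmsep <= threshold)


# Placeholder values for illustration only, not results from the study.
rmsep = {1: 0.250, 2: 0.240, 3: 0.2310, 4: 0.2300, 5: 0.2305}
chosen = pick_n_components(rmsep)
print(chosen)
```

      With a tolerance of zero the rule reduces to picking the component count with the lowest RMSEP, so the tolerance is what encodes the preference for a simpler model.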

      The Authors use directed acyclic graphs (DAG) to identify confounding variables of the association between metabolites and DQ. Could the dataset generated by the Authors have been used instead? Not all confounding variables identified in the literature may be relevant to the dataset generated by the Authors.

      Thank you for the question. The answer is most likely no: the current dataset should not be used to define confounders, as these must be identified based on the literature. The use of DAGs has been widely explored as a valid tool for justifying the choice of confounding factors in regression models in epidemiology. This is because DAGs allow for a clear visualization of causal relationships and clarify the complex relationships between exposure and outcome. Besides, DAGs demonstrate the authors' transparency by acknowledging factors reported as important but not included/collected in the study. This has been already discussed in reply #1 to the editor that has been pasted below for convenience.

      Thank you for the comment and suggestions. It is important to highlight that there is no data on microbiome composition. We apologize if there was an impression such data is available. The main goal of conducting this national survey was to provide qualified and updated evidence on child nutrition to revise and propose new policies and nutritional guidelines for this demographic. Therefore, collection of stool derived microbiome (metagenomic) data was not one of the objectives of ENANI-2019. This is more explicitly stated as a study limitation in the revised manuscript on page 17, lines 463-467:

      “Lastly, stool microbiome data was not collected from children in ENANI-2019 as it was not a study objective in this large population-based nutritional survey. However, the lack of microbiome data does not reduce the importance/relevance, since there is no evidence that microbiome and factors affecting microbiome composition are confounders in the association between serum metabolome and child development.”

      Besides, one must consider the difficulties and costs in collecting and analyzing microbiome composition in a large population-based survey. In contrast, the metabolome data has been considered a priority as blood specimens had already been collected to inform policy on micronutrient deficiencies in Brazil. However, due to funding limitations we had to perform the analysis in a subset of our sample, still representative and large enough to test our hypothesis with adequate study power (more details below).

      We would like to argue that there is no evidence that microbiome and factors affecting microbiome composition are confounders on the association between serum metabolome and child development. First, one should revisit the properties of a confounder according to the epidemiology literature that in short states that confounding refers to an alternative explanation for a given conclusion, thus constituting one of the main problems for causal inference (Kleinbaum, Kupper, and Morgenstern, 1991; Greenland & Robins, 1986; VanderWeele, 2019). In our study, we highlight that certain serum metabolites associated with the developmental quotient (DQ) in children were circulating metabolites (e.g., cresol sulfate, hippuric acid, phenylacetylglutamine, TMAO) previously reported to depend on dietary exposures, host metabolism and gut microbiota activity. Our discussion cites other published work, including animal models and observational studies, which have reported how these bioactive metabolites in circulation are co-metabolized by commensal gut microbiota, and may play a role in neurodevelopment and cognition as mediated by environmental exposures early in life.

      In fact, the literature on the association between microbiome and infant development is very limited. We performed a search using terms ‘microbiome’ OR ‘microbiota’ AND ‘child development’ AND ‘systematic’ OR ‘meta-analysis’ and found only one study: ‘Associations between the human immune system and gut microbiome with neurodevelopment in the first 5 years of life: A systematic scoping review’ (DOI 10.1002/dev.22360). The authors conclude: ‘while the immune system and gut microbiome are thought to have interactive impacts on the developing brain, there remains a paucity of published studies that report biomarkers from both systems and associations with child development outcomes.’ It is important to highlight that our criteria to include confounders on the directed acyclic graph (DAG) was based on the literature of systematic reviews or meta-analysis and not on single isolated studies.

      In summary, we would like to highlight that there is no microbiome data in ENANI-2019 and in the event such data was present, we are confident that based on the current stage of the literature, there is no evidence to consider such construct in the DAG, as this procedure recommends that only variables associated with the exposure and the outcome should be included. Please find more details on DAG below.

      Moreover, we would like to clarify that we have not stated that sanitation and micronutrients have no effect on the serum metabolome, instead, these constructs were not considered on the DAG.

      To make it clearer, we have modified the passage about DAG in the methods section. New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analysis of relationships with both the serum metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight was a variable with high missing data, and indicators of breastfeeding practice data (referring to exclusive breastfeeding until 6 months and/or complemented until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as w/h z-score, and the child's age in months.”

      Were the systematic reviews or meta-analyses used in the DAG performed by the Authors, or were they based on previous studies? If so, more information about the methodology employed and the studies included should be provided by the Authors.

      Thank you for the question. The reviews or meta-analyses used in the DAG have been conducted by other authors in the field. This has been laid out more clearly in our methods section.

      New text, page 9, lines 234-241:

      “The subsequent step was to disentangle the selected metabolites from confounding variables. A Directed Acyclic Graph (DAG; Breitling et al., 2021) was used to more objectively determine the minimally sufficient adjustments for the regression models to account for potentially confounding variables while avoiding collider variables and variables in the metabolite-DQ causal pathways, which if controlled for would unnecessarily remove explained variance from the metabolites and hamper our ability to detect biomarkers. To minimize bias from subjective judgments of which variables should and should not be included as covariates, the DAG only included variables for which there was evidence from systematic reviews or meta-analysis of relationships with both the metabolome and DQ (Figure 1). Birth weight, breastfeeding, child's diet quality, the child's nutritional status, and the child's age were the minimal adjustments suggested by the DAG. Birth weight was a variable with high missing data, and indicators of breastfeeding practice data (referring to exclusive breastfeeding until 6 months and/or complemented until 2 years) were collected only for children aged 0–23 months. Therefore, those confounders were not included as adjustments. Child's diet quality was evaluated as MDD, the child's nutritional status as w/h z-score, and the child's age in months.”

      Approximately 72% of children included in the analyses lived in households with a monthly income superior to the Brazilian minimum wage. The cohort is also biased towards households with a higher level of education. Both of these measures correlate with developmental quotient. Could the Authors discuss how this may have affected their results and how generalizable they are?

      Thank you for your comment. This has been already discussed in reply #6 to the editor and that has been pasted below for convenience.

      Thank you for highlighting this point. The ENANI-2019 is a population-based household survey with national coverage and representativeness for macroregions, sex, and one-year age groups (< 1; 1-1.99; 2-2.99; 3-3.99; 4-5). Furthermore, income quartiles of the census sector were used in the sampling. The study included 12,524 households, 14,588 children, and 8,829 infants with blood drawn.

      Due to the costs involved in metabolome analysis, it was necessary to further reduce the sample size to around 5,000 children, which is equivalent to 57% of total participants from ENANI-2019 with stored blood specimens. To avoid a biased sample and keep the representativeness and generalizability, the 5,004 selected children were drawn from the total sample of 8,829 to keep the original distribution according to age groups (6 to 11 months, 12 to 23 months, and 24 to 59 months) and some health conditions related to iron metabolism, e.g., anemia and nutrient deficiencies. Then, they were randomly selected to constitute the final sample that aimed to represent the total number of children with blood drawn. Hence, our efforts were to preserve the original characteristics of the sample and the representativeness of the original sample.
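
      The general idea of drawing a subsample that preserves the distribution of strata can be sketched as proportional stratified random sampling. The Python example below is only an illustration under stated assumptions: the function `stratified_sample`, the synthetic population, and the strata definition (age group combined with a single anemia flag) are hypothetical and do not reproduce the ENANI-2019 sampling procedure.

```python
import random
from collections import defaultdict
from typing import Callable, Hashable, List


def stratified_sample(records: List[dict], strata_key: Callable[[dict], Hashable],
                      total: int, seed: int = 0) -> List[dict]:
    """Draw about `total` records, allocating to each stratum in
    proportion to its share of the full population."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[strata_key(rec)].append(rec)
    sample: List[dict] = []
    for key in sorted(groups):
        members = groups[key]
        # Rounded quotas may make the final size differ slightly from `total`
        quota = round(total * len(members) / len(records))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Synthetic population for illustration only.
population = []
for age_group, count in [("06-11", 200), ("12-23", 300), ("24-59", 400)]:
    for i in range(count):
        population.append({"age_group": age_group, "anemic": i % 5 == 0})

subsample = stratified_sample(
    population, lambda rec: (rec["age_group"], rec["anemic"]), total=450
)
print(len(subsample))
```

      Because every stratum is sampled at the same rate, the share of each age group and condition in the subsample matches its share in the synthetic population, up to rounding.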

      The ENANI-2019 study does not appear to present a bias towards higher socioeconomic status. Evidence from two major Brazilian population-based household surveys supports this claim. The 2017-18 Household Budget Survey (POF) reported an average monthly household income of 5,426.70 reais, while the Continuous National Household Sample Survey (PNAD) reported that in 2019, the nominal monthly per capita household income was 1,438.67 reais. In comparison, ENANI-2019 recorded a household income of 2,144.16 reais and a per capita income of 609.07 reais in infants with blood drawn, and 2,099.14 reais and 594.74 reais, respectively, in the serum metabolome analysis sample.

      In terms of maternal education, the 2019 PNAD-Education survey indicated that 48.8% of individuals aged 25 or older had at least 11 years of schooling. When analyzing ENANI-2019 under the same metric, we found that 56.26% of mothers aged ≥25 years of infants with blood drawn had 11 years of education or more, and 51.66% in the metabolome analysis sample. Although these figures are slightly higher, they remain within a reasonable range for population studies.

      It is well known that higher income and maternal education levels can influence child health outcomes, and acknowledging this, ENANI-2019 employed rigorous sampling methods to minimize selection biases. This included stratified and complex sampling designs to ensure that underrepresented groups were adequately included, reducing the risk of skewed conclusions. Therefore, the evidence strongly suggests that the ENANI-2019 sample is broadly representative of the Brazilian population in terms of both socioeconomic status and educational attainment.

      Further to this, could the Authors describe how inequalities in access to care in the Brazilian population may have affected their results? Could they have included a measure of this possible discrepancy in their analyses?

      Thank you for the concern.

      The truth is that we are not in a position to answer this question because our study focused on gathering data on infant nutritional status and there is very limited information on access to care to allow us to hypothesize. Another important piece of information is that this national survey used sampling procedures that aimed to make the sample representative of the 15 million Brazilian infants under 5 years. Therefore, the sample is balanced according to socio-economic strata, so there is no evidence to make us believe inequalities in access to health care would have played a role.

      The Authors state that the results of their study may be used to track children at risk for developmental delays. Could they discuss the potential for influencing policies and guidelines to address delayed development due to malnutrition and/or limited access to certain essential foods?

      The point raised by the reviewer is very relevant. Recognizing that dietary and microbial derived metabolites involved in the gut-brain axis could be related to children's risk of developmental delays is the first step to bringing this topic to the public policy agenda. We believe the results can contribute to the literature, which should be used to accumulate evidence to overcome knowledge gaps and support the formulation and redirection of public policies aimed at full child growth and development; the promotion of adequate and healthy nutrition and food security; the encouragement, support, and protection of breastfeeding; and the prevention and control of micronutrient deficiencies.  

      Reviewer #3 (Public Review):

      The ENANI-2019 study provides valuable insights into child nutrition, development, and metabolomics in Brazil, highlighting both challenges and opportunities for improving child health outcomes through targeted interventions and further research.

      Readers might consider the following questions:

      (1) Should investigators study the families through direct observation of diet and other factors to look for a connection between food taken in and gut microbiome and child development?

      As mentioned before, the ENANI-2019 did not collect data on stool derived microbiome. However, there is data on child dietary intake with 24-hour recall that can be further explored in other studies.

      (2) Can an examination of the mother's gut microbiome influence the child's microbiome? Can the mother or caregiver's microbiome influence early childhood development?

      The questions raised by the reviewer are interesting and have been explored by other authors. However, we do not have microbiota data from either the child or the mother/caregiver.

      (3) Is developmental quotient enough to study early childhood development? Is it comprehensive enough?

      Yes, we are confident it is comprehensive enough.

      According to the World Health Organization, the term Early Childhood Development (ECD) refers to the cognitive, physical, language, motor, social and emotional development between 0 - 8 years of age. The SWYC milestones assess the domains of cognition, language/communication and motor. Therefore, it has enough content validity to represent ECD.

      The SWYC is recommended for screening ECD by the American Society of Pediatrics. Furthermore, we assessed the internal consistency of the SWYC milestones questionnaire using ENANI-2019 data and Cronbach's alpha. The findings indicated satisfactory reliability (0.965; 95% CI: 0.963–0.968).
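For readers unfamiliar with the reliability statistic mentioned above, the following is a minimal sketch of how Cronbach's alpha is conventionally computed from an item-by-respondent score table. It is not the authors' code; the function name `cronbach_alpha` and the toy scores are hypothetical, and the ENANI-2019 data themselves are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of item scores.

    scores: list of rows, one row per respondent, each row holding the
    scores for the same k questionnaire items (hypothetical layout).
    """
    if len(scores) < 2:
        raise ValueError("need at least two respondents")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across respondents.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    # Standard formula: alpha = k/(k-1) * (1 - sum(item var) / var(total)).
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Toy data (invented for illustration): three items that agree perfectly.
toy_scores = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
print(round(cronbach_alpha(toy_scores), 3))
```

Values close to 1 indicate that the items vary together across respondents, which is the sense in which the reported 0.965 is read as satisfactory internal consistency.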

      The SWYC is a screening instrument and indicates if the ECD is not within the expected range. If one of the above-mentioned domains is not achieved as expected, the child may be at risk of ECD delay. Therefore, DQ<1 indicates that a child has not reached the expected ECD for the age group. We cannot say that children with DQ≥1 have full ECD, since we do not assess the socio-emotional domains. However, DQ can track the risk of ECD delay.

      References

      Blekherman, G., Laubenbacher, R., Cortes, D. F., Mendes, P., Torti, F. M., Akman, S., ... & Shulaev, V. (2011). Bioinformatics tools for cancer metabolomics. Metabolomics, 7, 329-343.

      Gromski, P. S., Muhamadali, H., Ellis, D. I., Xu, Y., Correa, E., Turner, M. L., & Goodacre, R. (2015). A tutorial review: Metabolomics and partial least squares-discriminant analysis–a marriage of convenience or a shotgun wedding. Analytica chimica acta, 879, 10-23.

      Wold, S., Sjöström, M., & Eriksson, L. (2001). PLS-regression: a basic tool of chemometrics. Chemometrics and intelligent laboratory systems, 58(2), 109-130.

      LUIZ, RR., and STRUCHINER, CJ. Inferência causal em epidemiologia: o modelo de respostas potenciais [online]. Rio de Janeiro: Editora FIOCRUZ, 2002. 112 p. ISBN 85-7541-010-5. Available from SciELO Books http://books.scielo.org.

      GREENLAND, S. & ROBINS, J. M. Identifiability, exchangeability, and epidemiological Confounding. International Journal of Epidemiology, 15(3):413-419, 1986.

      Freitas-Costa NC, Andrade PG, Normando P, et al. Association of development quotient with nutritional status of vitamins B6, B12, and folate in 6–59-month-old children: Results from the Brazilian National Survey on Child Nutrition (ENANI-2019). The American journal of clinical nutrition 2023;118(1):162-73. doi: https://doi.org/10.1016/j.ajcnut.2023.04.026

      Sheldrick RC, Schlichting LE, Berger B, et al. Establishing New Norms for Developmental Milestones. Pediatrics 2019;144(6) doi: 10.1542/peds.2019-0374 [published Online First: 2019/11/16]

      Drachler Mde L, Marshall T, de Carvalho Leite JC. A continuous-scale measure of child development for population-based epidemiological surveys: a preliminary study using Item Response Theory for the Denver Test. Paediatric and perinatal epidemiology 2007;21(2):138-53. doi: 10.1111/j.1365-3016.2007.00787.x [published Online First: 2007/02/17]

      VanderWeele, TJ. Principles of confounder selection. Eur J Epidemiol 34, 211–219 (2019). https://doi.org/10.1007/s10654-019-00494-6

      David G. Kleinbaum, Lawrence L. Kupper; Hal Morgenstern. Epidemiologic Research: Principles and Quantitative Methods. 1991

      Yan R, Liu X, Xue R, Duan X, Li L, He X, Cui F, Zhao J. Association between internet exclusion and depressive symptoms among older adults: panel data analysis of five longitudinal cohort studies. EClinicalMedicine 2024;75. doi: 10.1016/j.eclinm.2024.102767.

      Zhong Y, Lu H, Jiang Y, Rong M, Zhang X, Liabsuetrakul T. Effect of homemade peanut oil consumption during pregnancy on low birth weight and preterm birth outcomes: a cohort study in Southwestern China. Glob Health Action. 2024 Dec 31;17(1):2336312.

      Aristizábal LYG, Rocha PRH, Confortin SC, et al. Association between neonatal near miss and infant development: the Ribeirão Preto and São Luís birth cohorts (BRISA). BMC Pediatr. 2023;23(1):125. Published 2023 Mar 18. doi:10.1186/s12887-023-03897-3

      Al-Haddad BJS, Jacobsson B, Chabra S, et al. Long-term risk of neuropsychiatric disease after exposure to infection in utero. JAMA Psychiatry. 2019;76(6):594-602. doi:10.1001/jamapsychiatry.2019.0029

      Chan, A.Y.L., Gao, L., Hsieh, M.HC. et al. Maternal diabetes and risk of attention-deficit/hyperactivity disorder in offspring in a multinational cohort of 3.6 million mother–child pairs. Nat Med 30, 1416–1423 (2024).

      Hernan MA, Robins JM (2020). Causal Inference: What If. Boca Raton: Chapman & Hall/CRC.

      Greenland S; Pearl J; Robins JM. Confounding and collapsibility in causal inference. Statist Sci. 14 (1) 29 - 46 1999. https://doi.org/10.1214/ss/1009211805

    1. eLife Assessment

      This study investigates the role of the Cadherin Flamingo (Fmi) in cell competition in developing tissues in Drosophila melanogaster. The findings are valuable in that they show that Fmi is required in winning cells in several competitive contexts. The evidence supporting the conclusions is solid, as the authors identify Fmi as a potential new regulator of cell competition; however, they do not delve into a mechanistic understanding of how this occurs.

    2. Reviewer #1 (Public review):

      Summary:

      This paper is focused on the role of Cadherin Flamingo (Fmi) in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact out-competed (PMID: 20679206), which is something to bear in mind. They assess the role of fmi in several kinds of winners, and their data support the conclusion that fmi is required for winner status. However, they make the claim that loss of fmi from Myc winners converts them to losers, and the data supporting this conclusion is not compelling.

      Strengths:

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      I have read the revised manuscript and have found issues that need to be resolved. The biggest concern is the overstatement of the results that loss of fmi from Myc-overexpressing clones turns them into losers. This is not shown in a compelling manner in the revised manuscript and the authors need to tone down their language or perform more experiments to support their claims. Additionally, the data about apoptosis is not sufficiently explained.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a mechanistic understanding of how Fmi regulates cell competition. While induction of apoptosis and JNK activation are commonly observed outcomes in various cell competition conditions, it is crucial to determine the specific mechanisms through which they are induced in fmi-depleted clones. Furthermore, it is recommended that the authors utilize the power of fly genetics to conduct a series of genetic epistasis analyses.

    4. Reviewer #3 (Public review):

      Summary:

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in the Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones or the early clonal depletion of Scribble in eye disc. Flamingo depletion reduces the proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness up to forcing their full elimination (hence becoming now "loser"). This function of Flamingo in cell competition is specific to Flamingo as it cannot be recapitulated with other components of the PCP pathway, and does not rely on the interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long-term avenue for therapeutic purposes as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantifications and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provided some hints on a putative mechanism (specifically by comparing its localisation in winner and loser cells).

      Also, on a more interpretative note, the absence of impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      Strengths:

      - A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition

      - One of the rare genetic conditions that affects very specifically winner cells without any impact in losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective on the long term)

      Weaknesses:

      - The mechanistic understanding obviously remains quite limited at this stage especially since the signaling does not go through the PCP pathway.

    5. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Summary: 

      This paper is focused on the role of Cadherin Flamingo (Fmi) - also called Starry night (stan) - in cell competition in developing Drosophila tissues. A primary genetic tool is monitoring tissue overgrowths caused by making clones in the eye disc that express activated Ras (RasV12) and that are depleted for the polarity gene scribble (scrib). The main system that they use is ey-flp, which makes continuous clones in the developing eye-antennal disc beginning at the earliest stages of disc development. It should be noted that RasV12, scrib-i (or lgl-i) clones only lead to tumors/overgrowths when generated by continuous clones, which presumably creates a privileged environment that insulates them from competition. Discrete (hs-flp) RasV12, lgl-i clones are in fact outcompeted (PMID: 20679206), which is something to bear in mind. 

      We think it is unlikely that the outcome of RasV12, scrib (or lgl) competition depends on discrete vs. continuous clones or on creation of a privileged environment. As shown in the same reference mentioned by the reviewer, the outcome of RasV12, scrib (or lgl) tumors greatly depends on the clone being able to grow to a certain size. The authors show instances of discrete clones where larger RasV12, lgl clones outcompete the surrounding tissue and eliminate WT cells by apoptosis, whereas smaller clones behave more like losers. It is not clear what aspect of the environment determines the ability of some clones to grow larger than others, but in neither case are the clones prevented from competition. Other studies show that in mammalian cells, RasV12, scrib clones are capable of outcompeting the surrounding tissue, such as in Kohashi et al (2021), where cells carrying both mutations actively eliminate their neighbors.

      The authors show that clonal loss of Fmi by an allele or by RNAi in the RasV12, scrib-i tumors suppresses their growth in both the eye disc (continuous clones) and wing disc (discrete clones). The authors attributed this result to less killing of WT neighbors when Myc over-expressing clones lack Fmi, but another interpretation (that Fmi regulates clonal growth) is equally plausible with the current results.

      See point (1) for a discussion on this.

      Next, the authors show that scrib-RNAi clones that are normally out-competed by WT cells prior to adult stages are present in higher numbers when WT cells are depleted for Fmi. They then examine death in RasV12, scrib-i ey-FLP clones, or in discrete hsFLP UAS-Myc clones. They state that they see death in WT cells neighboring RasV12, scrib-i clones in the eye disc (Figures 4A-C). Next, they write that RasV12, scrib-i cells become losers (i.e., have apoptosis markers) when Fmi is removed. Neither of these results is quantified, and thus they are not compelling. They state that a similar result is observed for Myc over-expression clones that lack Fmi, but the image was not compelling, the results are not quantified and the controls are missing (Myc over-expressing clones alone and Fmi clones alone).

      We assayed apoptosis in UAS-Myc clones in eye discs but neglected to include the results in Figure 4. We include them in the updated manuscript. Regarding Fmi clones alone, we direct the reviewer’s attention to Fig. 2 Supplement 1 where we showed that fminull clones cause no competition. Dcp-1 staining showed low levels of apoptosis unrelated to the fminull clones or twin-spots.

      Regarding the quantification of apoptosis, we did not provide a quantification, in part because we observe a very clear visual difference between groups (Fig. 4A-K), and in part because it is challenging to come up with a rigorous quantification method. For example, how far from a winner clone can an apoptotic cell be and still be considered responsive to the clone? For UAS-Myc winner clones, we observe a modest amount of cell death both inside and outside the clones, consistent with prior observations. For fminull UAS-Myc clones, we observe vastly more cell death within the fminull UAS-Myc clones and modest death in nearby wildtype cells, and consequently a much higher ratio of cell death inside vs outside the clone. Because of the somewhat arbitrary nature of quantification, and the dramatic difference, we initially chose not to provide a quantification. However, given the request, we chose an arbitrary distance from the clone boundary in which to consider dying cells and counted the numbers for each condition. We view this as a very soft quantification, but we nevertheless report it in a way that captures the phenomenon in the revised manuscript.

      They then want to test whether Myc over-expressing clones have more proliferation. They show an image of a wing disc that has many small Myc overexpressing clones with and without Fmi. The pHH3 results support their conclusion that Myc overexpressing clones have more pHH3, but I have reservations about the many clones in these panels (Figures 5L-N). 

      As the reviewer’s reservations are not specified, we have no specific response.

      They show that the cell competition roles of Fmi are not shared by another PCP component and are not due to the Cadherin domain of Fmi. The authors appear to interpret their results as Fmi is required for winner status. Overall, some of these results are potentially interesting and at least partially supported by the data, but others are not supported by the data.

      Strengths: 

      Fmi has been studied for its role in planar cell polarity, and its potential role in competition is interesting.

      Weaknesses:

      (1) In the Myc over-expression experiments, the increased size of the Myc clones could be because they divide faster (but don't outcompete WT neighbors). If the authors want to conclude that the bigger size of the Myc clones is due to out-competition of WT neighbors, they should measure cell death across many discs with these clones. They should also assess if reducing apoptosis (like using one copy of the H99 deficiency that removes hid, rpr, and grim) suppresses winner clone size. If cell death is not addressed experimentally and quantified rigorously, then their results could be explained by faster division of Myc over-expressing clones (and not death of neighbors). This could also apply to the RasV12, scrib-i results.

      Indeed, Myc clones have been shown to divide faster than WT neighbors, but that is not the only reason clones are bigger. As shown in (de la Cova et al, 2004), Myc-overexpressing cells induce apoptosis in WT neighbors, and blocking this apoptosis results in larger wings due to increased presence of WT cells. Also, (Moreno and Basler, 2004) showed that Myc-overexpressing clones cause a reduction in WT clone size, as WT twin spots adjacent to 4xMyc clones are significantly smaller than WT twin spots adjacent to WT clones. In the same work, they show complete elimination of WT clones generated in a tub-Myc background. Since then, multiple papers have shown these same results. It is well established then that increased cell proliferation transforms Myc clones into supercompetitors and that in the absence of cell competition, Myc-overexpressing discs produce instead wings larger than usual. 

      In (de la Cova et al, 2004) the authors already showed that blocking apoptosis with H99 hinders competition and causes wings with Myc clones to be larger than those where apoptosis wasn’t blocked. As these results are well established from prior literature, there is no need to repeat them here. 

      (2) This same comment about Fmi affecting clone growth should be considered in the scrib RNAi clones in Figure 3.

      In later stages, scrib RNAi clones in the eye are eliminated by WT cells. While scrib RNAi clones are not substantially smaller in third instar when competing against fmi cells (Fig 3M), by adulthood we see that WT clones lacking Fmi have failed to remove scrib clones, unlike WT clones that have completely eliminated the scrib RNAi clones by this time. We therefore disagree that the only effect of Fmi could be related to rate of cell division. 

      (3) I don't understand why the quantifications of clone areas in Figures 2D, 2H, 6D are log values. The simple ratio of GFP/RFP should be shown. Additionally, in some of the samples only a few discs are quantified (e.g., fmiE59 >> Myc, only 5 discs, and fmiE59 vs >Myc, only 4 discs), whereas other samples have more than 10 discs. I suggest that the authors increase the number of discs that they count in each genotype to at least 20 and then standardize this number.

      Log(ratio) values are easier to interpret than a linear scale. If represented linearly, 1 means equal ratios of A and B, while 2A/B is 2 and A/2B is 0.5. And the higher the ratio difference between A and B, the starker this effect becomes, making a linear scale deceiving to the eye, especially when decreased ratios are shown. Using log(ratios), a value of 0 means equal ratios, and increased and decreased ratios deviate equally from 0.
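The symmetry argument above can be made concrete with a short sketch. This is an illustration only, not the authors' analysis code: the base of the logarithm is not stated in the response, so base 2 is assumed here, and the function name `log2_ratio` and the clone areas are hypothetical.

```python
import math


def log2_ratio(area_a, area_b):
    """Return log2(area_a / area_b); 0 means the two areas are equal.

    Base 2 is an assumption made for this sketch; any base gives the
    same symmetry around 0.
    """
    if area_a <= 0 or area_b <= 0:
        raise ValueError("areas must be positive")
    return math.log2(area_a / area_b)


# Hypothetical clone areas in arbitrary units, invented for illustration.
examples = [(100.0, 100.0), (200.0, 100.0), (100.0, 200.0),
            (800.0, 100.0), (100.0, 800.0)]
for a, b in examples:
    print(f"A={a:6.0f}  B={b:6.0f}  linear={a / b:6.3f}  "
          f"log2={log2_ratio(a, b):+.1f}")
```

On the linear scale a twofold increase (2) and a twofold decrease (0.5) sit at unequal distances from 1, whereas their log values are equal in magnitude and opposite in sign, which is the property the response relies on.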

      Statistically, either analyzing a standardized number of discs for all conditions or a variable number not determined beforehand has no effect on the p-value, as long as the variable n number is not manipulated by p-hacking techniques, such as increasing the n of samples until a significant p-value has been obtained. While some of our groups have lower numbers, all statistical analyses were performed after all samples were collected. For all results obtained by cell counts, all samples had a minimum of 10 discs due to the inherent though modest variability of our automated cell counts, and we analyzed all the discs that we obtained from a given experiment, never “cherry-picking” examples. For the sake of transparency, all our graphs show individual values in addition to the distributions so that the reader knows the n values at a glance.

      (5) Figure 4 - shows examples of cell death. Cas3 is written on the figure but Dcp-1 is written in the results. Which antibody was used? The authors need to quantify these results. They also need to show that the death of cells is part of the phenotype, like an H99 deficiency, etc (see above).

      Thank you for flagging this error. We used cleaved Dcp-1 staining to detect cell death, not Cas3 (Drice in Drosophila). We updated all panels replacing Cas3 by Dcp-1. 

      As described above, cell death is a well-established consequence of Myc overexpression-induced competition, and we feel there is no need to repeat that result. To what extent loss of Fmi induces excess cell death or reduces proliferation in “would-be” winners, and to what extent it reduces “would-be” winners’ ability to eliminate competitors are interesting mechanistic questions that are beyond the scope of the current manuscript.

      (6) It is well established that clones overexpressing Myc have increased cell death. The authors should consider this when interpreting their results.

      We are aware that Myc-overexpressing clones have increased cell death, but it has also been demonstrated that despite that fact, they behave as winners and eliminate WT neighboring cells. As mentioned in comment (1), WT clones generated in a 3x and 4x Myc background are eliminated and removed from the tissue, and blocking cell death increases the size of WT “loser” clones adjacent to Myc-overexpressing clones.

      (7) A better characterization of discrete Fmi clones would also be helpful. I suggest inducing hs-flp clones in the eye or wing disc and then determining clone size vs twin spot size and also examining cell death etc. If such experiments have already been done and published, the authors should include a description of such work in the preprint.

      We have already analyzed the size of discrete Fmi clones and showed that they did not cause any competition, with fmi-null clones having the same size as WT clones in both eye and wing discs. We direct the reviewer’s attention to Figure 2 Supplement 1.

      (8) We need more information about the expression pattern of Fmi. Is it expressed in all cells in imaginal discs? Are there any patterns of expression during larval and pupal development? 

      Fmi is equally expressed by all cells in all imaginal discs in Drosophila larva and pupa. We include this information and the relevant reference (Brown et al, 2014) in the updated manuscript.

      (9) Overall, the paper is written for specialists who work in cell competition and is fairly difficult to follow, and I suggest re-writing the results to make it accessible to a broader audience.

      We have endeavored to both provide an accessible narrative and also describe in sufficient detail the data from multiple models of competition and complex genetic systems. We hope that most readers will be able, at a minimum, to follow our interpretations and the key takeaways, while those wishing to examine the nuts and bolts of the argument will find what they need presented as simply as possible.

      Reviewer 2:

      Summary: 

      In this manuscript, Bosch et al. reveal Flamingo (Fmi), a planar cell polarity (PCP) protein, is essential for maintaining 'winner' cells in cell competition, using Drosophila imaginal epithelia as a model. They argue that tumor growth induced by scrib-RNAi and RasV12 competition is slowed by Fmi depletion. This effect is unique to Fmi, not seen with other PCP proteins. Additional cell competition models are applied to further confirm Fmi's role in 'winner' cells. The authors also show that Fmi's role in cell competition is separate from its function in PCP formation.

      We would like to thank the reviewer for their thoughtful and positive review.

      Strengths:

      (1) The identification of Fmi as a potential regulator of cell competition under various conditions is interesting.

      (2) The authors demonstrate that the involvement of Fmi in cell competition is distinct from its role in planar cell polarity (PCP) development.

      Weaknesses:

      (1) The authors provide a superficial description of the related phenotypes, lacking a comprehensive mechanistic understanding. Induction of apoptosis and JNK activation are general outcomes, but it is important to determine how they are specifically induced in Fmi-depleted clones. The authors should take advantage of the power of fly genetics and conduct a series of genetic epistasis analyses.

      We appreciate that this manuscript does not address the mechanism by which Fmi participates in cell competition. Our intent here is to demonstrate that Fmi is a key contributor to competition. We do aim to delve into the mechanism and are currently directing our efforts to exploring how Fmi regulates competition, but the size of the project and the required experiments are outside the scope of this manuscript. We feel that our current findings are sufficiently valuable to merit sharing while we continue to investigate the mechanism linking Fmi to competition.

      (2) The depletion of Fmi may not have had a significant impact on cell competition; instead, it is more likely to have solely facilitated the induction of apoptosis.

      We respectfully disagree for several reasons. First, the effect of Fmi loss is specific to winners; loss of Fmi has no effect on its own or in losers confronting winners in competition. Second, in the RasV12 tumor model, loss of Fmi did not perturb whole-eye tumors; it only impaired tumor growth when tumors were confronted with competitors. We agree that induction of apoptosis is affected, but so too is proliferation, and only in winners undergoing competition.

      (3) To make a solid conclusion for Figure 1, the authors should investigate whether complete removal of Fmi by a mutant allele affects tumor growth induced by expressing RasV12 and scrib RNAi throughout the eye.

      We agree with the reviewer that this is a worthwhile experiment, given that RNAi has its limitations. However, as fmi is homozygous lethal at the embryo stage, one cannot create whole disc tumors mutant for fmi. As an approximation to this condition, we have introduced the GMR-Hid, cell-lethal combination to eliminate non-tumor tissue in the eye disc. Following elimination of non-tumor cells, there remains essentially a whole disc harboring fminull tumor. Indeed, this shows that whole fminull tumors overgrow similar to control tumors, confirming that the lack of Fmi only affects clonal tumors. We provide those results in the updated manuscript (Figure 1 Suppl 2 C-D).

      (4) The authors should test whether the expression level of Fmi (both mRNA and protein) changes during tumorigenesis and cell competition.

      This is an intriguing point that we considered worthwhile to examine. We performed immunostaining for Fmi in clones to determine whether its levels change during competition. Fmi is expressed ubiquitously at apical plasma membranes throughout the disc, and this was unchanged by competition, including inside >>Myc clones and at the clone boundary, where competition is actively happening. We provide these results as a new supplementary figure (Figure 5 Suppl 1) in the updated manuscript.

      Reviewer 3:

      Summary: 

      In this manuscript, Bosch and colleagues describe an unexpected function of Flamingo, a core component of the planar cell polarity pathway, in cell competition in the Drosophila wing and eye disc. While Flamingo depletion has no impact on tumour growth (upon induction of Ras and depletion of Scribble throughout the eye disc), and no impact when depleted in WT cells, it specifically tunes down winner clone expansion in various genetic contexts, including the overexpression of Myc, the combination of Scribble depletion with activation of Ras in clones or the early clonal depletion of Scribble in eye disc. Flamingo depletion reduces the proliferation rate and increases the rate of apoptosis in the winner clones, hence reducing their competitiveness up to forcing their full elimination (hence becoming now "loser"). This function of Flamingo in cell competition is specific to Flamingo as it cannot be recapitulated with other components of the PCP pathway, and does not rely on the interaction of Flamingo in trans, nor on the presence of its cadherin domain. Thus, this function is likely to rely on a non-canonical function of Flamingo which may rely on downstream GPCR signaling.

      This unexpected function of Flamingo is by itself very interesting. In the framework of cell competition, these results are also important as they describe, to my knowledge, one of the only genetic conditions that specifically affect the winner cells without any impact when depleted in the loser cells. Moreover, Flamingo does not just suppress the competitive advantage of winner clones, but even turns them into putative losers. This specificity, while not clearly understood at this stage, opens a lot of exciting mechanistic questions, but also a very interesting long-term avenue for therapeutic purposes as targeting Flamingo should then affect very specifically the putative winner/oncogenic clones without any impact in WT cells.

      The data and the demonstration are very clean and compelling, with all the appropriate controls, proper quantification, and backed-up by observations in various tissues and genetic backgrounds. I don't see any weakness in the demonstration and all the points raised and claimed by the authors are all very well substantiated by the data. As such, I don't have any suggestions to reinforce the demonstration.

      While not necessary for the demonstration, documenting the subcellular localisation and levels of Flamingo in these different competition scenarios may have been relevant and provided some hints on the putative mechanism (specifically by comparing its localisation in winner and loser cells). 

      Also, on a more interpretative note, the absence of the impact of Flamingo depletion on JNK activation does not exclude some interesting genetic interactions. JNK output can be very contextual (for instance depending on Hippo pathway status), and it would be interesting in the future to check if Flamingo depletion could somehow alter the effect of JNK in the winner cells and promote downstream activation of apoptosis (which might normally be suppressed). It would be interesting to check if Flamingo depletion could have an impact in other contexts involving JNK activation or upon mild activation of JNK in clones.

      We would like to thank the reviewer for their thorough and positive review.

      Strengths: 

      - A clean and compelling demonstration of the function of Flamingo in winner cells during cell competition.

      - One of the rare genetic conditions that affects very specifically winner cells without any impact on losers, and then can completely switch the outcome of competition (which opens an interesting therapeutic perspective in the long term)

      Weaknesses: 

      - The mechanistic understanding obviously remains quite limited at this stage especially since the signaling does not go through the PCP pathway.

      Reviewer 2 made the same comment in their weakness (1), and we refer to that response. In future work, we are excited to better understand the pathways linking Fmi and competition.

    1. Author response:

      Reviewer #2 (Public Review):

      M. El Amri et al. investigated the functions of Marcks and Marcks-like 1 during spinal cord (SC) development and regeneration in Xenopus laevis. The authors rigorously performed loss-of-function experiments using morpholino knock-down and CRISPR knock-out, combined with rescue experiments, in the developing spinal cord of embryos and the regenerating spinal cord of tadpoles.

      For the assays in the developing spinal cord, a unilateral approach (knock-down/out on only one side of the embryo) allowed the authors to assess the gene functions by directly comparing one side (e.g. mutated SC) to the other (e.g. wild-type SC on the other side). For the assays in regenerating SC, the authors microinjected CRISPR reagents into 1-cell stage embryos. When the embryos (F0 crispants) grew to tadpoles (stage 50), the SC was transected. They then assessed neurite outgrowth and progenitor cell proliferation. The validation of the phenotypes was mostly based on the quantification of immunostaining images (neurite outgrowth: acetylated tubulin, neural progenitor: sox2, sox3, proliferation: EdU, PH3), which are simple but robust enough to support their conclusions. In both SC development and regeneration, the authors found that Marcks and Marcksl1 were necessary for neurite outgrowth and neural progenitor cell proliferation.

      The authors performed rescue experiments on morpholino knock-down and CRISPR knock-out conditions by Marcks and Marcksl1 mRNA injection for SC development and pharmacological treatments for SC development and regeneration. The unilateral mRNA injection rescued the loss-of-function phenotype in the developing SC. To explore the signalling role of these molecules, they rescued the loss-of-function animals with pharmacological reagents. They used S1P: PLD activator, FIPI: PLD inhibitor, NMI: PIP2 synthesis activator and ISA-2011B: PIP2 synthesis inhibitor. The authors found the activator treatment rescued neurite outgrowth and progenitor cell proliferation in loss-of-function conditions. From these results, the authors proposed PIP2 and PLD are the mediators of Marcks and Marcksl1 for neurite outgrowth and progenitor cell proliferation during SC development and regeneration. The results of the rescue experiments are particularly important to assess gene functions in loss-of-function assays; therefore, the conclusions are solid. In addition, they performed gain-of-function assays by unilateral Marcks or Marcksl1 mRNA injection showing that the injected side of the SC had more neurite outgrowth and proliferative progenitors. The conclusions are consistent with the loss-of-function phenotypes and the rescue results. Importantly, the authors showed the linkage of the phenotype and functional recovery by behavioral testing, which clearly showed that the crispants with SC injury swam less distance than wild types with SC injury at 10 days post surgery.

      Prior to the functional assays, the authors analyzed the expression pattern of the genes by in situ hybridization and immunostaining in developing embryos and regenerating SC. They confirmed that the amount of protein expression was significantly reduced in the loss-of-function samples by immunostaining with the specific antibodies that they made for Marcks and Marcksl1. Although the expression patterns are mostly known from previous work on embryogenesis, the data provided appropriate information to readers about the expression and showed the efficiency of the knock-out as well.

      MARCKS family genes have been known to be expressed in the nervous system. However, few studies focus on their function in nerves. This research introduced these genes as new players during SC development and regeneration. These findings could attract broader interest from researchers working on nervous system disease models and in the medical field. Although it is a typical requirement for loss-of-function assays in Xenopus laevis, I believe that the efficient knock-out of four genes by CRISPR/Cas9 resulted from their dedication to designing, testing and validating the gRNAs, and is exemplary.

      Weaknesses:

      (1) Why did the authors choose Marcks and Marcksl1? The authors mentioned that these genes were identified with a recent proteomic analysis comparing SC regenerative tadpoles and non-regenerative froglets (Line (L) 54-57). However, although it seems the proteomic analysis was their own dataset, the authors did not give any details on how promising genes were selected for the functional assays (this article). In the proteomic analysis, there must be other candidate genes that might be more likely factors related to SC development and regeneration based on previous studies, but it was unclear what the criteria for selecting Marcks and Marcksl1 were.

      To highlight the rationale for selecting these proteins, we reworded the sentence as follows: “A recent proteomic screen … after SCI identified a number of proteins that are highly upregulated at the tadpole stage but downregulated in froglets (Kshirsagar, 2020). These proteins included Marcks and Marcksl1, which had previously been implicated in the regeneration of other tissues (El Amri et al., 2018) suggesting a potential role for these proteins also in spinal cord regeneration.”

      (2) Gene knock-out experiments with F0 crispants

      The authors described that they designed and tested 18 sgRNAs to find the most efficient and consistent gRNA (L191-195). However, it cannot guarantee the same phenotypes practically, due to, for example, different injection timing, different strains of Xenopus laevis, etc. Although the authors mentioned the concerns of mosaicism by themselves (L180-181, L289-292) and immunostaining results nicely showed uniformly reduced Marcks and Marcksl1 expression in the crispants, they did not refer to this issue explicitly.

      To address this issue, we state explicitly in lines 208-212: “We also confirmed by immunohistochemistry that co-injection of marcks.L/S and marcksl1.L/S sgRNA, which is predicted to edit all four homeologs (henceforth denoted as 4M CRISPR) drastically reduced immunostaining for Marcks and Marcksl1 protein on the injected side (Fig. S6 B-G), indicating that protein levels are reduced in gene-edited embryos.”

      (3) Limitations of pharmacological compound rescue

      In the methods part, the authors describe that they performed titration experiments for the drugs (L702-704), which is a minimal requirement for this type of assay. However, it is known that even a well-characterized drug, if used at different concentrations, can target different molecules (Gujral TS et al., 2014 PNAS). Therefore, it is difficult to eliminate the possibility of side effects and off-target effects by testing only a few compounds.

      As explained in the responses to reviewer 1, we have completely rewritten and toned down our presentation of the pharmacological results and now explicitly mention the possibility of side effects in our discussion.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Public Reviews:

      Reviewer #2 (Public review):

      Summary:

      This work by Grogan and colleagues aimed to translate animal studies showing that acetylcholine plays a role in motivation by modulating the effects of dopamine on motivation. They tested this hypothesis with a placebo-controlled pharmacological study administering a muscarinic antagonist (trihexyphenidyl; THP) to a sample of 20 adult men performing an incentivized saccade task while undergoing electroencephalography (EEG). They found that reward increased vigor and reduced reaction times (RTs) and, importantly, these reward effects were attenuated by trihexyphenidyl. High incentives increased preparatory EEG activity (contingent negative variation), and though THP also increased preparatory activity, it also reduced this reward effect on RTs.

      Strengths:

      The researchers address a timely and potentially clinically relevant question with a within-subject pharmacological intervention and a strong task design. The results highlight the importance of the interplay between dopamine and other neurotransmitter systems in reward sensitivity and even though no Parkinson's patients were included in this study, the results could have consequences for patients with motivational deficits and apathy if validated in the future.

      Weaknesses:

      The main weakness of the study is the small sample size (N=20) that unfortunately is limited to men only. Generalizability and replicability of the conclusions remain to be assessed in future research with a larger and more diverse sample size and potentially a clinically relevant population. The EEG results do not shape a concrete mechanism of action of the drug on reward sensitivity.

      We thank the reviewer for their time and their assessment of this manuscript, and we appreciate their helpful comments on the previous version.

      We agree that the sample size being smaller than planned due to the pandemic restrictions is a weakness for this study, and hope that future studies into cholinergic effects on motivation in humans will use larger sample sizes. They should also ensure women are not excluded from sample populations, which will become even more important if the research progresses to clinical populations.

      Reviewer #3 (Public review):

      Summary:

      Grogan et al examine a role for muscarinic receptor activation in action vigor in a saccadic system. This work is motivated by a strong literature linking dopamine to vigor, and some animal studies suggesting that ACH might modulate these effects, and is important because patient populations with symptoms related to reduced vigor are prescribed muscarinic antagonists. The authors use a motivated saccade task with distractors to measure the speed and vigor of actions in humans under placebo or muscarinic antagonism. They show that muscarinic antagonism blunts the motivational effects of reward on both saccade velocity and RT, and also modulates the distractibility of participants, in particular by increasing the repulsion of saccades away from distractors. They show that preparatory EEG signals reflect both motivation and drug condition, and make a case that these EEG signals mediate the effects of the drug on behavior.

      Strengths:

      This manuscript addresses an interesting and timely question and does so using an impressive within subject pharmacological design and a task well designed to measure constructs of interest. The authors show clear causal evidence that ACH affects different metrics of saccade generation related to effort expenditure and their modulation by incentive manipulations. The authors link these behavioral effects to motor preparatory signatures, indexed with EEG, that relate to behavioral measures of interest and in at least one case statistically mediate the behavioral effects of ACH antagonism.

      Weaknesses:

      A primary weakness of this paper is the sample size - since only 20 participants completed the study. The authors address the sample size in several places and I completely understand the reason for the reduced sample size (study halt due to covid). Nonetheless, it is worth stating explicitly that this sample size is relatively small for the effect sizes typically observed in such studies highlighting the need for future confirmatory studies.

      We thank the reviewer for their time and their assessment of this manuscript, and we appreciate their helpful comments on the previous version.

      We agree that the small sample size is a weakness of the study, and hope that future work into cholinergic modulation of motivation can involve larger samples to replicate and extend this work.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      Thank you for addressing my comments and clarifying the analysis sections. Women can be included in such studies by performing a pregnancy test before each test session, but I understand how this could have added to the pandemic limitations. Best of luck with your future work!

      Thank you for your time in reviewing this paper, and your helpful comments.

      Reviewer #3 (Recommendations for the authors):

      The authors have done a great job at addressing my concerns and I think that the manuscript is now very solid. That said, I have one minor concern.

      Thank you for your time in reviewing this paper, and your helpful comments.

      For descriptions of mass univariate analyses and cluster correction, I am still a bit confused on exactly what terms were in the regression. In one place, the authors state:

      On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model 'variable ~1 + voltage + incentive*distractorPresent*THP + (1 | participant)'.

      I take this to mean that the regression model includes a voltage regressor and a three-way interaction term, along with participant level intercept terms.

      However, elsewhere, the authors state:

      "We regressed each electrode and time-point against the three behavioural variables separately, while controlling for effects of incentive, distractor, THP, the interactions of those factors, and a random effect of participant."

      I take this to mean that the regression model included regressors for incentive, distractorPresent, THP, along with their 2 and 3 way interactions. I think that this seems like the more reasonable model - but I just want to 1) verify that this is what the authors did and 2) encourage them to articulate this more clearly and consistently throughout.

      We apologise for the lack of clarity about the whole-brain regression analyses.

      We used Wilkinson notation for this formula, where ‘A*B’ denotes ‘A + B + A:B’, so all main effects and lower-order interaction terms were included in the regression, as your second interpretation says. The model written out in full would be:

      'variable ~1 + voltage + incentive + distractorPresent + THP + incentive:distractorPresent + incentive:THP + distractorPresent:THP + incentive:distractorPresent:THP + (1 | participant)'

      We will clarify this in the Version of Record.
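
      As an illustration only (not the authors' analysis code), the expansion of the crossing operator described above can be sketched in a few lines of Python. The helper name `expand_crossing` is hypothetical; the sketch assumes the usual Wilkinson convention that `A*B*C` stands for every main effect plus every interaction, with interaction-only terms written using `:`.

```python
from itertools import combinations
from typing import List


def expand_crossing(term: str) -> List[str]:
    """Expand a Wilkinson crossing term such as 'A*B*C' into its
    main effects and all interactions (interactions written with ':')."""
    factors = [f.strip() for f in term.split("*")]
    expanded = []
    # order 1 = main effects, order 2 = two-way interactions, and so on
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            expanded.append(":".join(combo))
    return expanded


terms = expand_crossing("incentive*distractorPresent*THP")
# Rebuild the full formula around the expanded fixed-effect terms
full = "variable ~1 + voltage + " + " + ".join(terms) + " + (1 | participant)"
print(full)
```

      The printed formula lists the three main effects, the three two-way interactions and the single three-way interaction, which is the set of terms the compact `incentive*distractorPresent*THP` form implies.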


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors used a motivated saccade task with distractors to measure response vigor and reaction time (RT) in healthy human males under placebo or muscarinic antagonism. They also simultaneously recorded neural activity using EEG with event-related potential (ERP) focused analyses. This study provides evidence that the muscarinic antagonist Trihexyphenidyl (THP) modulates the motivational effects of reward on both saccade velocity and RT, and also increases the distractibility of participants. The study also examined the correlational relationships between reaction time and vigor and manipulations (THP, incentives) with components of the EEG-derived ERPs. While an interesting correlation structure emerged from the analyses relating the ERP biomarkers to behavior, it is unclear how these potentially epiphenomenal biomarkers relate to relevant underlying neurophysiology.

      Strengths:

      This study is a logical translational extension from preclinical findings of cholinergic modulation of motivation and vigor and the CNV biomarker to a normative human population, utilizing a placebo-controlled, double-blind approach.

      While framed in the context of Parkinson's disease where cholinergic medications can be used, the authors do a good job in the discussion describing the limitations in generalizing their findings obtained in a normative and non-age-matched cohort to an aged PD patient population.

      The exploratory analyses suggest alternative brain targets and/or ERP components that relate to the behavior and manipulations tested. These will need to be further validated in an adequately powered study. Once validated, the most relevant biomarkers could be assessed in a more clinically relevant population.

      Weaknesses:

      The relatively weak correlations between the main experimental outcomes provide unclear insight into the neural mechanisms by which the manipulations lead to behavioral manifestations outside the context of the ERP. It would have been interesting to evaluate how other quantifications of the EEG signal through time-frequency analyses relate to the behavioral outcomes and manipulations.

      The ERP correlations to relevant behavioral outcomes were not consistent across manipulations demonstrating they are not reliable biomarkers to behavior but do suggest that multiple underlying mechanisms can give rise to the same changes in the ERP-based biomarkers and lead to different behavioral outcomes.

      We thank the reviewer for their review and their comments.

      We agree that these ERPs may not be reliable biomarkers yet, given the many-to-one mapping we observed where incentives and THP antagonism both affected the CNV in different ways, and hope that future studies will help clarify the use and limitations of the CNV as a potential biomarker of invigoration.

      Our original hypothesis was specifically about the CNV as an index of preparatory behaviour, but we plan to look at potential changes to frequency characteristics in future work. We have included this in the discussion of future investigations. (page 16, line 428):

      “Future investigations of other aspects of the EEG signals may illuminate us. Such studies could also investigate other potential signals that may be more sensitive to invigoration and/or muscarinic antagonism, including frequency-band power and phase-coherence, or measures of variability in brain signals such as entropy, which may give greater insight into processes affected by these factors.”

      Reviewer #2 (Public Review):

      Summary:

      This work by Grogan and colleagues aimed to translate animal studies showing that acetylcholine plays a role in motivation by modulating the effects of dopamine on motivation. They tested this hypothesis with a placebo-controlled pharmacological study administering a muscarinic antagonist (trihexyphenidyl; THP) to a sample of 20 adult men performing an incentivized saccade task while undergoing electroencephalography (EEG). They found that reward increased vigor and reduced reaction times (RTs) and, importantly, these reward effects were attenuated by trihexyphenidyl. High incentives increased preparatory EEG activity (contingent negative variation), and though THP also increased preparatory activity, it also reduced this reward effect on RTs.

      Strengths:

      The researchers address a timely and potentially clinically relevant question with a within-subject pharmacological intervention and a strong task design. The results highlight the importance of the interplay between dopamine and other neurotransmitter systems in reward sensitivity and even though no Parkinson's patients were included in this study, the results could have consequences for patients with motivational deficits and apathy if validated in the future.

      Weaknesses:

      The main weakness of the study is the small sample size (N=20) that unfortunately is limited to men only. The generalizability and replicability of the conclusions remain to be assessed in future research with a larger and more diverse sample size and potentially a clinically relevant population. The EEG results do not shape a concrete mechanism of action of the drug on reward sensitivity.

      We thank the reviewer for their review, and their comments.

      We agree that our study was underpowered, not reaching our target of 27 participants due to pandemic restrictions halting our recruitment, and hope that future studies into muscarinic antagonism in motivation will have larger sample sizes, and include male and female participants across a range of ages, to assess generalisability.

      We only included men to prevent the chance of administering the drug to someone pregnant. Trihexyphenidyl is categorized by the FDA as a Pregnancy Category C drug, and the ‘Summary of Product Characteristics’ states: “There is inadequate information regarding the use of trihexyphenidyl in pregnancy. Animal studies are insufficient with regard to effects on pregnancy, embryonal/foetal development, parturition and postnatal development. The potential risk for humans is unknown. Trihexyphenidyl should not be used during pregnancy unless clearly necessary.”

      While the drug can be prescribed where benefits may outweigh this risk, as there were no benefits to participants in this study, we only recruited men to keep the risk at zero.

      We have updated the Methods/Drugs section to explain this (page 17, line 494):

      “The risks of Trihexyphenidyl in pregnancy are unknown, but the Summary of Product Characteristics states that it “should not be used during pregnancy unless clearly necessary”. As this was a basic research study with no immediate clinical applications, there was no justification for any risk of administering the drug during pregnancy, so we only recruited male participants to keep this risk at zero.”

      And we refer to this in the Methods/Participants section (page 18, line 501):

      “We recruited 27 male participants (see Drugs section above),…”

      We agree that future work is needed to replicate this in different samples, and that this work cannot tell us the mechanism by which the drug is dampening invigoration, but we think that showing these effects do occur and can be linked to anticipatory/preparatory activity rather than overall reward sensitivity is a useful finding.

      Reviewer #3 (Public Review):

      Summary:

      Grogan et al examine a role for muscarinic receptor activation in action vigor in a saccadic system. This work is motivated by a strong literature linking dopamine to vigor, and some animal studies suggesting that ACH might modulate these effects, and is important because patient populations with symptoms related to reduced vigor are prescribed muscarinic antagonists. The authors use a motivated saccade task with distractors to measure the speed and vigor of actions in humans under placebo or muscarinic antagonism. They show that muscarinic antagonism blunts the motivational effects of reward on both saccade velocity and RT, and also modulates the distractibility of participants, in particular by increasing the repulsion of saccades away from distractors. They show that preparatory EEG signals reflect both motivation and drug condition, and make a case that these EEG signals mediate the effects of the drug on behavior.

      Strengths:

      This manuscript addresses an interesting and timely question and does so using an impressive within-subject pharmacological design and a task well-designed to measure constructs of interest. The authors show clear causal evidence that ACH affects different metrics of saccade generation related to effort expenditure and their modulation by incentive manipulations. The authors link these behavioral effects to motor preparatory signatures, indexed with EEG, that relate to behavioral measures of interest and in at least one case statistically mediate the behavioral effects of ACH antagonism.

      Weaknesses:

      In full disclosure, I have previously reviewed this manuscript in another journal and the authors have done a considerable amount of work to address my previous concerns. However, I have a few remaining concerns that affect my interpretation of the current manuscript.

      Some of the EEG signals (Figures 4A&C) have profiles that look like they could have ocular, rather than central nervous, origins. Given that this is an eye movement task, it would be useful if the authors could provide some evidence that these signals are truly related to brain activity and not driven by ocular muscles, either in response to explicit motor effects (i.e., blinks) or in preparation for an upcoming saccade.

      We thank the reviewer for re-reviewing the manuscript and for raising this issue.

      All the EEG analyses (both ERP and whole-brain) analyse the preparation period between the ready-cue and target appearance, when no eye movements are required. We reject trials with blinks or saccades over 1 degree in size, as detected by the Eyelink software according to the sensitive velocity and acceleration criteria specified in the manuscript (Methods/Eye-tracking, page 19, line 550). This means that there should be no overt eye movements in the data. However, microsaccades and ocular drift are still possible within this period, which indeed could drive some effects. To measure this, we counted the number of microsaccades (<1 degree in size) in the preparation period between the incentive cue and the target onset, for each trial. Further, we measured the mean absolute speed of the eye during the preparation period (excluding the periods during microsaccades) for each trial.
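
      To make the two trial-wise covariates concrete, here is a minimal sketch of how they might be computed for one trial. It is not the authors' pipeline: the `SaccadeEvent` structure, the `eye_covariates` function and the sample-index representation are assumptions made for illustration, and the eye-tracker's own event detection is taken as given.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SaccadeEvent:
    start: int            # first sample index of the event (inclusive)
    end: int              # last sample index of the event (exclusive)
    amplitude_deg: float  # event amplitude in degrees of visual angle


def eye_covariates(
    speed_deg_per_s: Sequence[float],
    events: List[SaccadeEvent],
    max_amp_deg: float = 1.0,
) -> Tuple[int, float]:
    """Return (microsaccade count, mean absolute drift speed) for one trial.

    Drift speed is averaged only over samples that fall outside
    any microsaccade, mirroring the description in the text.
    """
    micro = [e for e in events if e.amplitude_deg < max_amp_deg]
    in_micro = set()
    for e in micro:
        in_micro.update(range(e.start, e.end))
    drift = [abs(s) for i, s in enumerate(speed_deg_per_s) if i not in in_micro]
    mean_drift = sum(drift) / len(drift) if drift else float("nan")
    return len(micro), mean_drift


# Toy preparation period: six speed samples, one 0.4-degree microsaccade
speeds = [1.0, 2.0, 50.0, 60.0, 3.0, 2.0]
count, drift_speed = eye_covariates(speeds, [SaccadeEvent(2, 4, 0.4)])
print(count, drift_speed)
```

      Each trial would then contribute one count and one drift speed, which can enter a trial-wise regression as additional covariates.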

      We have run a control analysis to check whether including ocular drift speed or number of microsaccades as a covariate in the whole-brain regression analysis changes the association between EEG and the behavioural metrics at frontal or other electrodes. Below we show these ‘variable ~ EEG’ beta-coefficients when controlling for each eye-movement covariate, in the same format as Figure 4. We did not run the permutation testing on this due to time/computational costs (it takes >1 week per variable), so p-values were not calculated, only the beta-coefficients. The beta-coefficients are almost unchanged, both in time-course and topography, when controlling for either covariate. The frontal associations with velocity and distractor pull remain, suggesting they are not due to these eye movements.

      We have added this figure as a supplemental figure.

      For additional clarity in this response, we also plot the differences between these covariate-controlled beta-coefficients, and the true beta-coefficients from Figure 4 (please note the y-axis scales are -0.02:0.02, not -0.15:0.15 as in Figure 4 and Figure 4-figure supplement 2). This shows that the changes to the associations between EEG and velocity/distractor-pull were not frontally-distributed, demonstrating eye-movements were not driving these effects. Relatedly, the RT effect’s change was frontally-distributed, despite Figure 4 showing the true relationship was central in focus, again indicating that effect was also not related to these eye movements.

      Author response image 1.

      Difference in beta-coefficients when eye-movement covariates are included. This is the difference from the beta-coefficients shown in Figure 4, please note the smaller y-axis limits.

      The same pattern was seen if we controlled for the change in eye-position from the baseline period (measured by the eye-tracker) at each specific time-point, i.e., controlling for the distance the eye had moved from baseline at the time the EEG voltage is measured. The topographies and time-course plots were almost identical to the above ones:

      Author response image 2.

      Controlling for change in eye-position at each time-point does not change the regression results. Left column shows the beta-coefficients between the variable and EEG voltage, and the right column shows the difference from the main results in Figure 4 (note the smaller y-axis limits for the right-hand column).

      Therefore, we believe the brain-behaviour regressions are independent of eye-movements. We have included the first figure presented here as an additional supplemental figure, and added the following to the text (page 10, line 265):

      “An additional control analysis found that these results were not driven by microsaccades or ocular drift during the preparation period, as including these as trial-wise covariates did not substantially change the beta-coefficients (Figure 4 – Figure Supplement 2).”

      For other EEG signals, in particular, the ones reported in Figure 3, it would be nice to see what the spatial profiles actually look like - does the scalp topography match that expected for the signal of interest?

      Yes, the CNV is a central negative potential peaking around Cz, while the P3a is slightly anterior to this (peaking between Cz and FCz). We have added the topographies to the main figure (see point below).

      This is the topography of the mean CNV (1200:1500ms from the preparation cue onset), which is maximal over Cz, as expected.

      The P3a’s topography (200:280ms after preparation cue) is maximal slightly anterior to Cz, between Cz and FCz.

      A primary weakness of this paper is the sample size - since only 20 participants completed the study. The authors address the sample size in several places and I completely understand the reason for the reduced sample size (study halt due to COVID). That said, they only report the sample size in one place in the methods rather than through degrees of freedom in their statistical tests conducted throughout the results. In part because of this, I am not totally clear on whether the sample size for each analysis is the same - or whether participants were removed for specific analyses (e.g., due to poor EEG recordings).

      We apologise for the lack of clarity here. All 20 participants were included in all analyses, although the number of trials included differed between behavioural and EEG analyses. We only excluded trials with EEG artefacts from the EEG analyses, not from the purely behavioural analyses such as Figures 1&2, although trials with blinks/saccades were removed from behavioural analyses too. Removing the EEG artefactual trials from the behavioural analyses did not change the findings, despite the lower power. The degrees of freedom in the figure supplement tables are the total number of trials (less 8 fixed-effect terms) included in the single-trial / trial-wise regression analyses we used.

      We have clarified this in the Methods/Analysis (page 20, line 602):

      “Behavioural and EEG analysis included all 20 participants, although trials with EEG artefacts were included in the behavioural analyses (18585 trials in total) and not the EEG analyses (16627 trials in total), to increase power in the former. Removing these trials did not change the findings of the behavioural analyses.”

      And we state the number of participants and trials in the start of the behavioural results (page 3, line 97):

      “We used single-trial mixed-effects linear regression (20 participants, 18585 trials in total) to assess the effects of Incentive, Distractors, and THP, along with all the interactions of these (and a random-intercept per participant), on residual velocity and saccadic RT.”

      and EEG results section (page 7, line 193):

      “We used single-trial linear mixed-effects regression to see the effects of Incentive and THP on each ERP (20 participants, 16627 trials; Distractor was included too, along with all interactions, and a random intercept by participant).”
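
      As a rough illustration of the degrees of freedom mentioned in the response above, the full-factorial term incentive*distractor*THP expands to eight fixed-effect terms once the intercept is counted. The sketch below is in Python rather than the authors' own analysis code; the function names are hypothetical, and writing interactions with ':' is an assumed convention. It only reproduces the arithmetic "trials less 8 fixed-effect terms" from the trial counts quoted above.

```python
from itertools import combinations

def expand_full_factorial(factors):
    """List the intercept, every main effect, and every interaction of the factors."""
    terms = ["1"]  # intercept
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(":".join(combo))
    return terms

def residual_df(n_trials, terms):
    """Residual degrees of freedom: number of trials minus fixed-effect terms."""
    return n_trials - len(terms)

terms = expand_full_factorial(["incentive", "distractor", "THP"])
behavioural_df = residual_df(18585, terms)  # behavioural trial count from the text
eeg_df = residual_df(16627, terms)          # EEG trial count from the text
print(terms)
print(behavioural_df, eeg_df)
```

      The mediation models reported later use fewer predictors, so their degrees of freedom differ from these two values.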

      Beyond this point, but still related to the sample size, in some cases I worry that results are driven by a single subject. In particular, the interaction effect observed in Figure 1e seems like it would be highly sensitive to the single subject who shows a reverse incentive effect in the drug condition.

      Repeating that analysis after removing the participant with the large increase in saccadic RT with incentives did not remove the incentive*THP interaction effect, although it did weaken slightly from (β = 0.0218, p = .0002) to (β = 0.0197, p = .0082). This is likely because, while that participant did have slower RTs for higher incentives on THP, they were also slower for higher incentives under placebo (and similarly for distractor present/absent), making them less of an outlier in terms of effects than in raw RT terms. Author response image 3 below shows the mean figure without that participant, and Author response image 4 shows that participant separately.

      Author response image 3.

      Author response image 4.

      There are not sufficient details on the cluster-based permutation testing to understand what the authors did or whether it is reasonable. What channels were included? What metric was computed per cluster? How was null distribution generated?

      We apologise for not giving sufficient details of this, and have updated the Methods/Analysis section to include these details, along with a brief description in the Results section.

      To clarify here, we adapted the DMGroppe Mass Univariate Testing toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model ‘variable ~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour – i.e. does adding the voltage at this time/channel explain additional variance in the variable not captured in our main behavioural analyses. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution of cluster mass (across times/channels per iteration), and calculated the p-value as the proportion of this distribution further from zero than the absolute true t-statistics (two-tailed test).

      We have given greater detail for this in the Methods/Analysis section (page 20, line 614):

      “We adapted this toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model ‘~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution, and calculated the p-value as the proportion of this distribution further from zero than the true t-statistics (two-tailed test). Given the relatively small sample size here, these whole-brain analyses should not be taken as definitive.”

      And we have added a brief explanation to the Results section also (page 9, line 246):

      “We regressed each electrode and time-point against the three behavioural variables separately, while controlling for effects of incentive, distractor, THP, the interactions of those factors, and a random effect of participant. This analysis therefore asks whether trial-to-trial neural variability predicts behavioural variability. To assess significance, we used cluster-based permutation tests (DMGroppe Mass Univariate toolbox; Groppe, Urbach, & Kutas, 2011), shuffling the trials within each condition and person, and repeating it 2500 times, to build a null distribution of ‘cluster mass’ from the t-statistics (Bullmore et al., 1999; Maris & Oostenveld, 2007) which was used to calculate two-tailed p-values with a family-wise error rate (FWER) of .05 (see Methods/Analysis for details).”
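
      The cluster-mass step described above can be sketched in a few lines of Python. This is a simplified illustration, not the DMGroppe toolbox code: it treats a single channel's time series (no spatial adjacency between electrodes), the cluster-forming threshold is an assumed parameter, and taking the largest cluster mass from each permutation as the null statistic is an assumption based on the usual Maris & Oostenveld procedure. All function names are hypothetical.

```python
def cluster_masses(t_values, threshold):
    """Sum t-values over runs of adjacent samples with |t| > threshold and the same sign."""
    masses, current, sign = [], 0.0, 0
    for t in t_values:
        s = (t > threshold) - (t < -threshold)  # +1, -1, or 0 if below threshold
        if s != 0 and s == sign:
            current += t                         # extend the current cluster
        else:
            if sign != 0:
                masses.append(current)           # close the previous cluster
            current, sign = (t, s) if s != 0 else (0.0, 0)
    if sign != 0:
        masses.append(current)
    return masses

def max_abs_mass(t_values, threshold):
    """Largest absolute cluster mass in one t-map (0 if there are no clusters)."""
    return max((abs(m) for m in cluster_masses(t_values, threshold)), default=0.0)

def cluster_p_values(observed_t, null_t_maps, threshold):
    """Two-tailed p per observed cluster: share of null maxima at least as extreme."""
    null_max = [max_abs_mass(t_map, threshold) for t_map in null_t_maps]
    return [(m, sum(n >= abs(m) for n in null_max) / len(null_max))
            for m in cluster_masses(observed_t, threshold)]

# Toy t-statistics; in the real analysis each null map would come from
# refitting the regression after shuffling voltages within condition and person.
observed = [0.5, 2.5, 3.0, 1.0, -2.25, -2.5, 0.1]
null_maps = [[0.1, 2.1, 0.0], [0.3, -0.2, 0.1], [2.5, 2.5, 2.5, 0.0], [2.5, 2.5, 0.0]]
results = cluster_p_values(observed, null_maps, threshold=2.0)
print(results)
```

      With only four toy null maps the p-values are coarse; the 2500 permutations used in the study give a much finer null distribution.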

      The authors report that "muscarinic antagonism strengthened the P3a" - but I was unable to see this in the data plots. Perhaps it is because the variability related to individual differences obscures the conditional differences in the plots. In this case, event-related difference signals could be helpful to clarify the results.

      We thank the reviewer for spotting this wording error. This should refer to the incentive effect weakening the P3a, as no other significant effects were found on the P3a, as stated correctly in the previous paragraph. We have corrected this in the manuscript (page 9, line 232):

      “This suggests that while incentives strengthened the incentive-cue response and the CNV and weakened the P3a, muscarinic antagonism strengthened the CNV,”

      The reviewer’s suggestion for difference plots is very valuable, and we have added these to Figure 3, as well as increasing the y-axis scale for figure 3c to show the incentives weakening the P3a more clearly, and adding the topographies suggested in an earlier comment. The difference waves for Incentive and THP effects show that both are decreasing voltage, albeit with slightly different onset times – Incentive starts earlier, thus weakening the positive P3a, while both strengthen the negative CNV. The Incentive effects within THP and Placebo separately illustrate the THP*Incentive interaction.

      We have amended the Results text and figure (page 7, line 200):

      “The subsequent CNV was strengthened (i.e. more negative; Figure 3d) by incentive (β = -.0928, p < .0001) and THP (β = -0.0502, p < .0001), with an interaction whereby THP decreased the incentive effect (β = 0.0172, p = .0213). Figure 3h shows the effects of Incentive and THP on the CNV separately, using difference waves, and Figure 3i shows the incentive effect grows more slowly in the THP condition than the Placebo condition.”

      For mediation analyses, it would be useful in the results section to have a much more detailed description of the regression results, rather than just reporting things in a binary did/did not mediate sort of way. Furthermore, the methods should also describe how mediation was tested statistically (i.e., what is the null distribution that the difference in coefficients with/without moderator is tested against?).

      We have added a more detailed explanation of how we investigated mediation and mediated moderation, and now report the mediation effects for all tests run and the permutation-test p-values.

      We had been using the Baron & Kenny (1986) method, based on 4 tests outlined in the updated text below, which gives a single measure of change in absolute beta-coefficients when all the tests have been met, but without any indication of significance; any reduction found after meeting the other 3 tests indicates a partial mediation under this method. We now use permutation testing to generate a p-value for the likelihood of finding an equal or larger reduction in the absolute beta-coefficients if the CNV were not truly related to RT. This found that the CNV’s mediation of the Incentive effect on RT was highly significant, while the Mediated Moderation of CNV on THP*Incentive was weakly significant.

      During this re-analysis, we noticed that we had different trial-numbers in the different regression models, as EEG-artefactual trials were not excluded from the behavioural-only model (‘RT ~ 1 + Incentive’). However, this causes issues with the permutation testing as we are shuffling the ERPs and need the same trials included in all the mixed-effects models. Therefore, we have redone these mediation analyses, including only the trials with valid ERP measures (i.e. no artefactual trials) in all models. This has changed the beta-coefficients we report, but not the findings or conclusions of the mediation analyses. We have updated the figure to have these new statistics.

      We have updated the text to explain the methodology in the Results section (page 12, line 284):

      “We have found that neural preparatory activity can predict residual velocity and RT, and is also affected by incentives and THP. Finally, we ask whether the neural activity can explain the effects of incentives and THP, through mediation analyses. We used the Baron & Kenny (1986) method to assess mediation (see Methods/Analysis for full details). This tests whether the significant Incentive effect on behaviour could be partially reduced (i.e., explained) by including the CNV as a mediator in a mixed-effects single-trial regression. We measured mediation as the reduction in (absolute) beta-coefficient for the incentive effect on behaviour when the CNV was included as a mediator (i.e., RT ~ 1 + Incentive + CNV + Incentive*CNV + (1 | participant)). This is a directional hypothesis of a reduced effect, and to assess significance we ran a permutation-test, shuffling the CNV within participants, and measuring the change in absolute beta-coefficient for the Incentive effect on behaviour. This generates a distribution of mediation effects where there is no relationship between CNV and RT on a trial (i.e., a null distribution). We ran 2500 permutations, and calculated the proportion with an equal or more negative change in absolute beta-coefficient, equivalent to a one-tailed test. We ran this mediation analysis separately for the two behavioural variables of RT and residual velocity, but not for distractor pull as it was not affected by incentive, so failed the assumptions of mediation analyses (Baron & Kenny, 1986; Muller et al., 2005). We took the mean CNV amplitude from 1200:1500ms as our Mediator.

      Residual velocity passed all the assumption tests for Mediation analysis, but no significant mediation was found. That is, Incentive predicted velocity (β=0.1304, t(1,16476)=17.3280, p<.0001); Incentive predicted CNV (β=-0.9122, t(1,16476)=-12.1800, p<.0001); and CNV predicted velocity when included alongside Incentive (β=0.0015, t(1,16475)=1.9753, p=.0483). However, including CNV did not reduce the Incentive effect on velocity, and in fact strengthened it (β=0.1318, t(1,16475)=17.4380, p<.0001; change in absolute coefficient: Δβ=+0.0014). Since there was no mediation (reduction), we did not run permutation tests on this.

      However, RT did show a significant mediation of the Incentive effect by CNV: Incentive predicted RT (β=-0.0868, t(1,16476)=-14.9330, p<.0001); Incentive predicted CNV (β=-0.9122, t(1,16476)=-12.1800, p<.0001); and CNV predicted RT when included alongside Incentive (β=0.0127, t(1,16475)=21.3160, p<.0001). The CNV mediated the effect of Incentive on RT, reducing the absolute beta-coefficient (β=-0.0752, t(1,16475)=-13.0570, p<.0001; change in absolute coefficient: Δβ= -0.0116). We assessed the significance of this change via permutation testing, shuffling the CNV across trials (within participants) and calculating the change in absolute beta-coefficient for the Incentive effect on RT when the permuted CNV was included as a mediator. We repeated this 2500 times to build a null distribution of Δβ, and calculated the proportion with equal or stronger reductions for a one-tailed p-value, which was highly significant (p<.0001). This suggests that the Incentive effect on RT is partially mediated by the CNV’s amplitude during the preparation period, and this is not the case for residual velocity.

      We also investigated whether the CNV could explain the cholinergic reduction in motivation (THP*Incentive interaction) on RT – i.e., whether the CNV mediated the THP moderation. We measured Mediated Moderation as suggested by Muller et al. (2005; see Methods/Analysis for full explanation): Incentive*THP was associated with RT (β=0.0222, t(1,16474)=3.8272, p=.0001); and Incentive*THP was associated with CNV (β=0.1619, t(1,16474)=2.1671, p=.0302); and CNV*THP was associated with RT (β=0.0014, t(1,16472)=2.4061, p=.0161). Mediated Moderation was measured by the change in absolute Incentive*THP effect when THP*CNV was included in the mixed-effects model (β=0.0214, t(1,16472)=3.7298, p=.0002; change in beta-coefficient: Δβ= -0.0008), and permutation-testing (permuting the CNV as above) found a significant effect (p=.0132). This indicates cholinergic blockade changes how incentives affect preparatory negativity, and how this negativity reflects RT, which can explain some of the reduced invigoration of RT. However, this was not observed for saccade velocity.”

      And we have updated the Methods/Analysis section with a more detailed explanation too (page 21, line 627):

      “For the mediation analysis, we followed the 4-step process (Baron & Kenny, 1986; Muller et al., 2005), which requires 4 tests be met for the outcome (behavioural variable, e.g. RT), mediator (ERP, e.g., CNV) and the treatment (Incentive):

      (1) Outcome is significantly associated with the Treatment (RT ~ 1 + Incentive + (1 | participant))

      (2) Mediator is significantly associated with the Treatment (ERP ~ 1 + Incentive + (1 | participant))

      (3) Mediator is significantly associated with the Outcome (RT ~ 1 + Incentive + ERP + (1 | participant))

      (4) And the inclusion of the Mediator reduces the association between the Treatment and Outcome (Incentive effect from model #3)

      The mediation was measured by the reduction in the absolute standardised beta coefficient between incentive and behaviour when the ERP mediator was included (model #3 vs model #1 above). We used permutation-testing to quantify the likelihood of finding these mediations under the null hypothesis, achieved by shuffling the ERP across trials (within each participant) to remove any link between the ERP and behaviour. We repeated this 2500 times to build a null distribution of the change in absolute beta-coefficients for the RT ~ Incentive effect when this permuted mediator was included (model #3 vs model #1). We calculated a one-tailed p-value by finding the proportion of the null distribution that was equal or smaller than the true values (as Mediation is a one-tailed prediction).

      Mediated moderation (Muller et al., 2005) was used to see whether the effect of THP (the Moderator) on behaviour is mediated by the ERP, with the following tests (after the previous Mediation tests were already satisfied):

      (5) THP moderates the Incentive effect, via a significant Treatment*Moderator interaction on the Outcome (RT ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (6) THP moderates the Incentive effect on the Mediator, via a Treatment*Moderator interaction on the Mediator (ERP ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (7) THP’s moderation of the Incentive effect is mediated by the ERP, via a reduction in the association of Treatment*Moderator on the Outcome when the Mediator*Moderator interaction is included (RT ~ 1 + Incentive + THP + Incentive*THP + ERP + ERP*THP + (1 | participant))

      Mediated moderation is measured as the reduction in absolute beta-coefficients for ‘RT ~ Incentive*THP’ between model #5 and #7, which captures how much of this interaction could be explained by including the Mediator*Moderator interaction (ERP*THP in model #7). We tested the significance of this with permutation testing as above, permuting the ERP across trials (within participants) 2500 times, and building a null distribution of the change in the absolute beta-coefficients for RT ~ Incentive*THP between models #7 and #5. We calculated a one-tailed p-value from the proportion of these that were equal or smaller than the true change.”
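
      The permutation test for mediation described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: centring every variable within participant stands in for the random intercept of the mixed-effects models, ordinary least squares replaces the mixed-effects fit, the data are synthetic, and 500 permutations are used instead of 2500 to keep the example quick. All function names are hypothetical.

```python
import random

def center_within(values, groups):
    """Subtract each participant's mean (a stand-in for a random intercept)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def treatment_beta(y, x, m=None):
    """OLS slope of y on x; if m is given, the slope of x controlling for m (centred inputs)."""
    if m is None:
        return dot(x, y) / dot(x, x)                       # model #1
    sxx, smm, sxm = dot(x, x), dot(m, m), dot(x, m)
    return (dot(x, y) * smm - dot(m, y) * sxm) / (sxx * smm - sxm ** 2)  # model #3

def mediation_delta(y, x, m):
    """Change in |beta| for the treatment when the mediator is added (model #3 vs #1)."""
    return abs(treatment_beta(y, x, m)) - abs(treatment_beta(y, x))

def shuffle_within(values, groups, rng):
    """Shuffle values across trials separately within each participant."""
    by_group = {}
    for i, g in enumerate(groups):
        by_group.setdefault(g, []).append(i)
    out = list(values)
    for idx in by_group.values():
        vals = [values[i] for i in idx]
        rng.shuffle(vals)
        for i, v in zip(idx, vals):
            out[i] = v
    return out

def mediation_permutation_test(y, x, m, groups, n_perm=500, seed=0):
    """One-tailed p: share of permuted deltas equal to or more negative than the true delta."""
    rng = random.Random(seed)
    y, x, m = (center_within(v, groups) for v in (y, x, m))
    true_delta = mediation_delta(y, x, m)
    null = [mediation_delta(y, x, shuffle_within(m, groups, rng)) for _ in range(n_perm)]
    return true_delta, sum(d <= true_delta for d in null) / n_perm

# Synthetic data: incentive makes the CNV more negative, and the CNV drives RT.
data_rng = random.Random(1)
groups, incentive, cnv, rt = [], [], [], []
for participant in range(8):
    offset = data_rng.gauss(0, 1)                          # participant-specific intercept
    for _ in range(60):
        inc = data_rng.choice([0.0, 1.0])
        c = -1.0 * inc + data_rng.gauss(0, 1)
        groups.append(participant)
        incentive.append(inc)
        cnv.append(c)
        rt.append(offset - 0.2 * inc + 0.5 * c + data_rng.gauss(0, 1))

delta, p = mediation_permutation_test(rt, incentive, cnv, groups)
print(delta, p)
```

      A negative delta means the incentive coefficient shrank once the mediator was included. The mediated-moderation test follows the same pattern, comparing the Incentive*THP coefficient between models #5 and #7.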

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) The analysis section could benefit from greater detail. For example, how exactly did they assess that the effects of the drug on peak velocity and RT were driven by non-distracting trials? Ideally, for every outcome, the analysis approach used should be detailed and justified.

      We apologise for the confusion from this. To clarify, we found a 2-way interaction (incentive*THP) on both residual velocity and saccadic RT, and this pattern was stronger in distractor-absent trials for residual velocity, and stronger in distractor-present trials for saccadic RT, as can be seen in Figure 1d&e. However, as there was no significant 3-way interaction (incentive*THP*distractor) for either metric, and the 2-way interaction effects were in the same direction in distractor present/absent trials for both metrics, we think these effects were relatively unaffected by distractor presence.

      We have updated the Results section to make this clearer: (page 3, line 94):

      We measured vigour as the residual peak velocity of saccades within each drug session (see Figure 1c & Methods/Eye-tracking), which is each trial’s deviation of velocity from the main sequence. This removes any overall effects of the drug on saccade velocity, while still allowing incentives and distractors to have different effects within each drug condition. We used single-trial mixed-effects linear regression (20 participants, 18585 trials in total) to assess the effects of Incentive, Distractors, and THP, along with all the interactions of these (and a random-intercept per participant), on residual velocity and saccadic RT. As predicted, residual peak velocity was increased by incentives (Figure 1d; β = 0.1266, p < .0001), while distractors slightly slowed residual velocity (β = -0.0158, p = .0294; see Figure 1 – Figure supplement 1 for full behavioural statistics). THP decreased the effect of incentives on velocity (incentive * THP: β = -0.0216, p = .0030), indicating that muscarinic blockade diminished motivation by incentives. Figure 1d shows that this effect was similar in distractor absent/present trials, although slightly stronger when the distractor was absent; the 3-way (distractor*incentive*THP) interaction was not significant (p > .05), suggesting that the distractor-present trials had the same effect but weaker (Figure 1d).

      Saccadic RT (time to initiation of saccade) was slower when participants were given THP (β = 0.0244, p < .0001), faster with incentives (Figure 1e; β = -0.0767, p < .0001), and slowed by distractors (β = 0.0358, p < .0001). Again, THP reduced the effects of incentives (incentive*THP: β = 0.0218, p = .0002). Figure 1e shows that this effect was similar in distractor absent/present trials, although slightly stronger when the distractor was present; as the 3-way (distractor*incentive*THP) interaction was not significant and the direction of effects was the same in the two, it suggests the effect was similar in both conditions. Additionally, the THP*Incentive interactions were correlated between saccadic RT and residual velocity at the participant level (Figure 1 – Figure supplement 2).

      We have given more details of the analyses performed in the Methods section and the results, as requested by you and the other reviewers (page 20, line 602):

      Behavioural and EEG analysis included all 20 participants, although trials with EEG artefacts were included in the behavioural analyses (18585 trials in total) and not the EEG analyses (16627 trials in total), to increase power in the former. Removing these trials did not change the findings of the behavioural analyses.

      We used single-trial linear-mixed effects models to analyse our data, including participant as a random effect of intercept, with the formula ‘~1 + incentive*distractor*THP + (1 | participant)’. We z-scored all factors to give standardised beta coefficients.
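
      To illustrate what z-scoring does to a regression coefficient, the sketch below compares a raw slope with a standardised one for a single predictor. It is a simplified Python example, not the authors' analysis: it uses ordinary least squares with one predictor and no random intercept, it z-scores the outcome as well as the predictor (an assumption), and the incentive and RT values are made up.

```python
from statistics import mean, stdev

def zscore(values):
    """Centre on the mean and scale by the sample standard deviation."""
    mu, sd = mean(values), stdev(values)
    return [(v - mu) / sd for v in values]

def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)

# Hypothetical incentive levels and saccadic RTs (seconds) for six trials
incentive = [0, 0, 50, 50, 0, 50]
rt = [0.30, 0.32, 0.26, 0.27, 0.31, 0.25]

raw_beta = slope(incentive, rt)                   # in seconds per unit of incentive
std_beta = slope(zscore(incentive), zscore(rt))   # in SDs of RT per SD of incentive
print(raw_beta, std_beta)
```

      The standardised slope equals the raw slope multiplied by the ratio of the predictor's standard deviation to the outcome's, which is what makes coefficients comparable across predictors measured on different scales.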

      For the difference-wave cluster-based permutation tests (Figure 3 – Figure supplement 4), we used the DMGroppe Mass Univariate toolbox (Groppe et al., 2011), with 2500 permutations, to control the family-wise error rate at 0.05. This was used for looking at difference waves to test the effects of incentive, THP, and the incentive*THP interaction (using difference of difference-waves), across all EEG electrodes.

      We adapted this toolbox to also run cluster-based permutation regressions to examine the relationship between the behavioural variables and the voltages at all EEG electrodes at each time point. On each iteration we shuffled the voltages across trials within each condition and person, and regressed it against the behavioural variable, with the model ‘~1 + voltage + incentive*distractorPresent*THP + (1 | participant)’. The Voltage term measured the association between voltage and the behavioural variable, after controlling for effects of incentive*distractor*THP on behaviour. By shuffling the voltages, we removed the relationship to the behavioural variable, to build the null distribution of t-statistics across electrodes and time-samples. We used the ‘cluster mass’ method (Bullmore et al., 1999; Groppe et al., 2011; Maris & Oostenveld, 2007) to build the null distribution, and calculated the p-value as the proportion of this distribution further from zero than the true t-statistics (two-tailed test). Given the relatively small sample size here, these whole-brain analyses should not be taken as definitive.

      For the mediation analysis, we followed the 4-step process (Baron & Kenny, 1986; Muller et al., 2005), which requires 4 tests be met for the outcome (behavioural variable, e.g. RT), mediator (ERP, e.g., CNV) and the treatment (Incentive):

      (1) Outcome is significantly associated with the Treatment (RT ~ 1 + Incentive + (1 | participant))

      (2) Mediator is significantly associated with the Treatment (ERP ~ 1 + Incentive + (1 | participant))

      (3) Mediator is significantly associated with the Outcome (RT ~ 1 + Incentive + ERP + (1 | participant))

      (4) And the inclusion of the Mediator reduces the association between the Treatment and Outcome (Incentive effect from model #3)

      The mediation was measured by the reduction in the absolute standardised beta coefficient between incentive and behaviour when the ERP mediator was included (model #3 vs model #1 above). We used permutation-testing to quantify the likelihood of finding these mediations under the null hypothesis, achieved by shuffling the ERP across trials (within each participant) to remove any link between the ERP and behaviour. We repeated this 2500 times to build a null distribution of the change in absolute beta-coefficients for the RT ~ Incentive effect when this permuted mediator was included (model #3 vs model #1). We calculated a one-tailed p-value by finding the proportion of the null distribution that was equal or more negative than the true value (as Mediation is a one-tailed prediction). For this mediation analysis, we only included trials with valid ERP measures, even for the models without the ERP included (e.g., model #1), to keep the trial-numbers and degrees of freedom the same.

      Mediated moderation (Muller et al., 2005) was used to see whether the effect of THP (the Moderator) on behaviour is mediated by the ERP, with the following tests (after the previous Mediation tests were already satisfied):

      (5) THP moderates the Incentive effect, via a significant Treatment*Moderator interaction on the Outcome (RT ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (6) THP moderates the Incentive effect on the Mediator, via a Treatment*Moderator interaction on the Mediator (ERP ~ 1 + Incentive + THP + Incentive*THP + (1 | participant))

      (7) THP’s moderation of the Incentive effect is mediated by the ERP, via a reduction in the association of Treatment*Moderator on the Outcome when the Mediator*Moderator interaction is included (RT ~ 1 + Incentive + THP + Incentive*THP + ERP + ERP*THP + (1 | participant))

      Mediated moderation is measured as the reduction in absolute beta-coefficients for ‘RT ~ Incentive*THP’ between model #5 and #7, which captures how much of this interaction could be explained by including the Mediator*Moderator interaction (ERP*THP in model #7). We tested the significance of this with permutation testing as above, permuting the ERP across trials (within participants) 2500 times, and building a null distribution of the change in the absolute beta-coefficients for RT ~ Incentive*THP between models #7 and #5. We calculated a one-tailed p-value from the proportion of these that were equal or more negative than the true change.

      (2) Please explain why only men were included in this study. We are all hoping that men-only research is a practice of the past.

      We only included men to prevent any chance of administering the drug to someone pregnant. Trihexyphenidyl is categorized by the FDA as a Pregnancy Category C drug, and the ‘Summary of Product Characteristics’ states: “There is inadequate information regarding the use of trihexyphenidyl in pregnancy. Animal studies are insufficient with regard to effects on pregnancy, embryonal/foetal development, parturition and postnatal development. The potential risk for humans is unknown. Trihexyphenidyl should not be used during pregnancy unless clearly necessary.”

      While the drug can be prescribed where benefits may outweigh this risk, as there were no benefits to participants in this study, we only recruited men to keep the risk at zero.

      We have updated the Methods/Drugs section to explain this (page 17, line 494):

      “The risks of Trihexyphenidyl in pregnancy are unknown, but the Summary of Product Characteristics states that it “should not be used during pregnancy unless clearly necessary”. As this was a basic research study with no immediate clinical applications, there was no justification for any risk of administering the drug during pregnancy, so we only recruited male participants to keep this risk at zero.”

      And we have referenced this in the Methods/Participants section (page 18, line 501):

      “Our sample size calculations suggested 27 participants would detect a 0.5 effect size with .05 sensitivity and .8 power. We recruited 27 male participants (see Drugs section above)”

      (3) Please explain acronyms (eg EEG) when first used.

      Thank you for pointing this out. We have explained EEG at first use in the abstract and the main text, along with FWER, M1r, and ERP, which had also been missed at first use.

      Reviewer #3 (Recommendations For The Authors):

      The authors say: "Therefore, acetylcholine antagonism reduced the invigoration of saccades by incentives, and increased the pull of salient distractors. We next asked whether these effects were coupled with changes in preparatory neural activity." But I found this statement to be misleading since the primary effects of the drug seem to have been to decrease the frequency of distractor-repulsed saccades... so "decreased push" would probably be a better analogy than "increased pull".

      Thank you for noticing this, we agree, and have changed this to (page 5, line 165):

      “Therefore, acetylcholine antagonism reduced the invigoration of saccades by incentives, and decreased the repulsion of salient distractors. We next asked whether these effects were coupled with changes in preparatory neural activity.”

      I don't see anything in EEG preprocessing about channel rejection and interpolation. Were these steps performed? There are very few results related to the full set of electrodes.

      We did not reject or interpolate any channels, as visual inspection found no obvious outliers in terms of noisiness, and no channels had standard deviations (across time/trials) higher than our standard cutoff (of 80). The artefact rejection was applied across all EEG channels, so any trials with absolute voltages over 200μV in any channel were removed from the analysis. On average 104/120 trials were included (having passed this check, along with eye-movement artefact checks) per condition per person, and we have added the range of these, along with totals across conditions to the Analysis section and a statement about channel rejection/interpolation (page 20, line 588):

      “Epochs were from -200:1500ms around the preparation cue onset, and were baselined to the 100ms before the preparation cue appeared. Visual inspection found no channels with outlying variance, so no channel rejection or interpolation was performed. We rejected trials from the EEG analyses where participants blinked or made saccades (according to EyeLink criteria above) during the epoch, or where EEG voltage in any channel was outside -200:200μV (muscle activity). On average 104/120 trials per condition per person were included (SD = 21, range = 21-120), and 831/960 trials in total per person (SD=160, range=313-954). A repeated-measures ANOVA found there were no significant differences in number of trials excluded for any condition (p > .2).”
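
      The baselining and voltage-threshold rejection described above can be sketched as below. This is a simplified Python illustration, not the authors' pipeline: the function names are hypothetical, the data are toy values, applying the baseline before the threshold check is an assumption, and the blink/saccade rejection based on eye-tracking is omitted.

```python
def baseline_correct(epoch, times_ms, window=(-100, 0)):
    """Subtract the mean voltage in the pre-cue baseline window from one channel's epoch."""
    baseline = [v for v, t in zip(epoch, times_ms) if window[0] <= t < window[1]]
    offset = sum(baseline) / len(baseline)
    return [v - offset for v in epoch]

def keep_trial(trial, limit_uv=200.0):
    """Keep a trial only if every sample in every channel lies within +/- limit_uv microvolts."""
    return all(abs(v) <= limit_uv for channel in trial for v in channel)

# Toy example: sample times relative to the preparation cue, one channel
times_ms = [-100, -50, 0, 50]
corrected = baseline_correct([10.0, 20.0, 30.0, 40.0], times_ms)

# Two toy trials with two channels each; the second exceeds the 200 μV limit
trials = [
    [[0.0, 150.0], [-199.0, 10.0]],
    [[0.0, 250.0], [5.0, 10.0]],
]
kept = [trial for trial in trials if keep_trial(trial)]
print(corrected, len(kept))
```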

    2. eLife Assessment

      The authors have reported an important study in which they use a double-blind design to explore pharmacological manipulations in the context of a behavioral task. While the sample size is small, the use of varied methodology, including electrophysiology, behavior, and pharmacology, makes this manuscript particularly notable. Overall, the findings are solid and motivate future explorations of the relationships between acetylcholine and motivation.

    3. Reviewer #2 (Public review):

      Summary:

      This work by Grogan and colleagues aimed to translate animal studies showing that acetylcholine plays a role in motivation by modulating the effects of dopamine on motivation. They tested this hypothesis with a placebo-controlled pharmacological study administering a muscarinic antagonist (trihexyphenidyl; THP) to a sample of 20 adult men performing an incentivized saccade task while undergoing electroencephalography (EEG). They found that reward increased vigor and reduced reaction times (RTs) and, importantly, these reward effects were attenuated by trihexyphenidyl. High incentives increased preparatory EEG activity (contingent negative variation), and though THP also increased preparatory activity, it also reduced this reward effect on RTs.

      Strengths:

      The researchers address a timely and potentially clinically relevant question with a within-subject pharmacological intervention and a strong task design. The results highlight the importance of the interplay between dopamine and other neurotransmitter systems in reward sensitivity and even though no Parkinson's patients were included in this study, the results could have consequences for patients with motivational deficits and apathy if validated in the future.

      Weaknesses:

      The main weakness of the study is the small sample size (N=20), which is unfortunately limited to men. Generalizability and replicability of the conclusions remain to be assessed in future research with a larger and more diverse sample and potentially a clinically relevant population. The EEG results do not establish a concrete mechanism of action of the drug on reward sensitivity.

    4. Reviewer #3 (Public review):

      Summary:

      Grogan et al examine a role for muscarinic receptor activation in action vigor in a saccadic system. This work is motivated by a strong literature linking dopamine to vigor, and some animal studies suggesting that ACH might modulate these effects, and is important because patient populations with symptoms related to reduced vigor are prescribed muscarinic antagonists. The authors use a motivated saccade task with distractors to measure the speed and vigor of actions in humans under placebo or muscarinic antagonism. They show that muscarinic antagonism blunts the motivational effects of reward on both saccade velocity and RT, and also modulates the distractibility of participants, in particular by increasing the repulsion of saccades away from distractors. They show that preparatory EEG signals reflect both motivation and drug condition, and make a case that these EEG signals mediate the effects of the drug on behavior.

      Strengths:

      This manuscript addresses an interesting and timely question and does so using an impressive within subject pharmacological design and a task well designed to measure constructs of interest. The authors show clear causal evidence that ACH affects different metrics of saccade generation related to effort expenditure and their modulation by incentive manipulations. The authors link these behavioral effects to motor preparatory signatures, indexed with EEG, that relate to behavioral measures of interest and in at least one case statistically mediate the behavioral effects of ACH antagonism.

      Weaknesses:

      A primary weakness of this paper is the sample size, since only 20 participants completed the study. The authors address the sample size in several places and I completely understand the reason for the reduced sample size (study halt due to COVID). Nonetheless, it is worth stating explicitly that this sample size is relatively small for the effect sizes typically observed in such studies, highlighting the need for future confirmatory studies.

    1. eLife Assessment

      This useful work reveals differential activity to food and shock outcomes in central amygdala GABAergic neurons. Solid evidence supports claims of unconditioned stimulus activity that changes with learning. However, the evidence regarding claims related to valence or salience signaling in these neurons is inadequate. This work will be of interest to neuroscientists studying sensory processing and learning in the amygdala.

    2. Reviewer #1 (Public review):

      From the Reviewing Editor:

      Four reviewers have assessed your manuscript on valence and salience signaling in the central amygdala. There was universal agreement that the question being asked by the experiment is important. There was consensus that the neural population being examined (GABA neurons) was important and the circular shift method for identifying task-responsive neurons was rigorous. Indeed, observing valenced outcome signaling in GABA neurons would considerably increase the role of the central amygdala in valence. However, each reviewer brought up significant concerns about the design, analysis and interpretation of the results. Overall, these concerns limit the conclusions that can be drawn from the results. Addressing the concerns (described below) would work towards better answering the question at the outset of the experiment: how does the central amygdala represent salience vs. valence?

      A weakness noted by all reviewers was the use of the terms 'valence' and 'salience' as well as the experimental design used to reveal these signals. The two outcomes used emphasized non-overlapping sensory modalities and produced unrelated behavioral responses. Within each modality there are no manipulations that would scale either the value of the valenced outcomes or the intensity of the salient outcomes. While the food outcomes were presented many times (20 times per session over 10 sessions of appetitive conditioning) the shock outcomes were presented many fewer times (10 times in a single session). The large difference in presentations is likely to further distinguish the two outcomes. Collectively, these experimental design decisions meant that any observed differences in central amygdala GABA neuron responding are unlikely to reflect valence, but likely to reflect one or more of the above features.

      A second weakness noted by a majority of reviewers was a lack of cue-responsive units and a lack of exploration of the diversity of response types and of the relationship between cue and outcome firing. The lack of large numbers of neurons increasing firing to one or both cues is particularly surprising given the critical contribution of central amygdala GABA neurons to the acquisition of conditioned fear (which the authors measured) as well as to conditioned orienting (which the authors did not measure). Regression-like analyses would be a straightforward means of identifying neurons varying their firing in accordance with these or other behaviors. It was also noted that appetitive behavior was not measured in a rigorous way. Instead of measuring time near hopper, measures of licking would have been better. Further, measures of orienting behaviors such as startle were missing.

      The authors also missed an opportunity for clustering-like analyses which could have been used to reveal neurons uniquely signaling cues, outcomes or combinations of cues and outcomes. If the authors' calcium imaging approach is not able to detect expected central amygdala cue responding, might it be missing other critical aspects of responding?

      All reviewers point out that the evidence for salience encoding is even more limited than the evidence for valence. Although the specific concern for each reviewer varied, they all centered on an oversimplistic definition of salience. Salience ought to scale with the absolute value and intensity of the stimulus. Salience cannot simply be responding in the same direction. Further, even though the authors observed subsets of central amygdala neurons increasing or decreasing activity to both outcomes - the outcomes can readily be distinguished based on the temporal profile of responding.

      Additional concerns are raised by each reviewer. Our consensus is that this study sought to answer an important question: whether central amygdala neurons signal salience or valence in cue-outcome learning. However, the experimental design, analyses, and interpretations do not permit a rigorous and definitive answer to that question. Such an answer would require additional experiments whose designs would address the significant concerns described here. Fully addressing the concerns of each reviewer would result in a re-evaluation of the findings. For example, experimental design better revealing valence and salience, and analyses describing diversity of neuronal responding and relationship to behavior would likely make the results Important or even Fundamental.

    3. Reviewer #2 (Public review):

      In this article, Kong and colleagues sought to determine the encoding properties of central amygdala (CeA) neurons in response to oppositely valenced stimuli and cues predicting those stimuli. The amygdala and its subregional components have historically been understood to be regions that encode associative information, including valenced stimuli. The authors performed calcium imaging of GABA-ergic CeA neurons in freely-moving mice conditioned in Pavlovian appetitive and fear paradigms, and showed that CeA neurons are responsive to both appetitive and aversive unconditioned and conditioned stimuli. They used a variant of a previously published 'circular shifting' technique (Harris, 2021), which allowed them to delineate between excited/non-responsive/inhibited neurons. While there is considerable overlap of CeA neurons responding to both unconditioned stimuli (in this case, food and shock, deemed "salience-encoding" neurons), there are considerably fewer CeA neurons that respond to both conditioned stimuli that predict the food and shock. The authors finally demonstrated that the order of the Pavlovian paradigms (appetitive then fear vs. fear then appetitive) made no difference, which is an interesting result and is convincingly presented given their counterbalanced experimental design.

      In total, I find the presented study useful in understanding the dynamics of CeA neurons during a Pavlovian learning paradigm. There are many strengths of this study, including the important question and clear presentation; the circular shifting analysis was convincing to me, and the manuscript was well written. We hope the authors will find our comments constructive if they choose to revise their manuscript.

      While the experiments and data are of value, I do not agree with the authors' interpretation of their data, and take issue with the way they used the terms "salience" and "valence", whose operational definitions here differ from my reading of the literature (I would encourage them to check out Namburi et al., NPP, 2016). To be fair, a recent study from another group that reports experiments/findings which are very similar to the ones in the present study (Yang et al., 2023, describing valence coding in the CeA using a similar approach) also uses the terms valence and salience in a rather liberal way that I would also have issues with (see below). Either new experiments or revised claims would be needed here, and a more balanced discussion on this topic would be nice to see. I also felt that there were some aspects of novelty in this study that could be better highlighted (see below).

      One noteworthy point of alarm is that it seems as if two data panels including heatmaps are duplicated (perhaps that panel G of Figure 5-figure supplement 2 is a cut and paste error? It is duplicated from panel E and does not match the associated histogram).

      Major concerns:

      (1) The authors wish to make claims about salience and valence. This is my biggest gripe, so I will start here.

      (1a) Valence scales for positive and negative stimuli, as stated in Namburi et al., NPP, 2016, where we operationalize "valence" as having different responses for positive and negative values and no response for stimuli that are not motivationally significant (neutral cues that do not predict an outcome). The threshold for claiming salience, which we define as scaling with the absolute value of the stimulus, and not responding to a neutral stimulus (Namburi et al., NPP, 2016; Tye, Neuron, 2018; Li et al., Nature, 2022) would require the lack of response to a neutral cue.

      (1b) The other major issue is that the authors choose to make claims about the neural responses to the USs rather than the CSs. However, being shocked and receiving sucrose also would have very different sensorimotor representations, and any differences in responses could be attributed to those confounds rather than valence or salience. They could make claims regarding salience or valence with respect to the differences in the CSs but they should restrict analysis to the period prior to the US delivery.

      (1c) The third obstacle to using the terms "salience" or "valence" is the lack of scaling, which is perhaps a bigger ask. At minimum either the scaling or the neutral cue would be needed to make claims about valence or salience encoding. Perhaps the authors disagree - that is fine. But they should at least acknowledge that there is literature that would say otherwise.

      (1d) In order to make claims about valence, the authors must take into account the sensory confound of the modality of the US (also mentioned in Namburi et al., 2016). The claim that these CeA neurons are indeed valence-encoding (based on their responses to the unconditioned stimuli) is confounded by the fact that the appetitive US (food) is a gustatory stimulus while the aversive US (shock) is a tactile stimulus.

      (2) Many of the central findings in this manuscript have been previously described in the literature. Yang et al., 2023, for instance, show that the CeA encodes salience (as demonstrated by the scaled responses to the increased value of unconditioned stimuli, Figure 1 j-m), and that learning amplifies responsiveness to unconditioned stimuli (Figure 2). It is nice to see a reproduction of the finding that learning amplifies CeA responses, though one study is in SST::Cre and this one in VGAT::Cre. Perhaps highlighting this difference could maximize the collective utility for the scientific community?

      (3) There is at least one instance of copy-paste error in the figures that raised alarm. In the supplementary information (Figure 5-figure supplement 2E, G), the heat maps for food-responsive neurons and shock-responsive neurons are identical. While this almost certainly is a clerical error, the authors would benefit from carefully reviewing each figure to ensure that no data is incorrectly duplicated.

      (4) The authors describe experiments to compare shock and reward learning; however, there are temporal differences in what they compare in Figure 5. The authors compare the 10th day of reward learning with the 1st day of fear conditioning, which effectively represent different points of learning and retrieval. At the end of reward conditioning, animals are utilizing a learned association to the cue, which demonstrates retrieval. On the day of fear conditioning, animals are still learning the cue at the beginning of the session, but they are not necessarily retrieving an association to a learned cue. The authors would benefit from recording at a later timepoint (10 days after fear conditioning, to be consistent with reward learning), to more accurately compare these two timepoints. Or perhaps it might be easier to just make the comparison between Day 1 of reward learning and Day 1 of fear learning, since they must already have these data.

      (5) The authors make a claim of valence encoding in their title and throughout the paper, which is not possible to make given their experimental design. However, they would greatly benefit from actually using a decoder to demonstrate their encoding claim (decoding performance for shock-food versus shuffled labels) and simply make claims about decoding food-predictive cues and shock-predictive cues. Interestingly, it seems like relatively few CeA neurons actually show differential responses to the food and shock CSs, and that is interesting in itself.
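
      The decoding comparison suggested in (5), performance on true labels versus shuffled labels, could be sketched roughly as below. This is only an illustration under stated assumptions: the nearest-centroid classifier, the leave-one-out scheme, the function names, and the synthetic two-neuron data are all hypothetical choices, not anything taken from the manuscript.

```python
# Hypothetical sketch: decode CS identity (food vs. shock cue) from population
# activity and compare accuracy against a label-shuffled null distribution.
import random


def centroid(rows):
    """Mean activity vector across a list of trials."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        cents = {
            lab: centroid([x for x, l in train if l == lab])
            for lab in sorted(set(l for _, l in train))
        }
        pred = min(cents, key=lambda lab: sq_dist(X[i], cents[lab]))
        correct += pred == y[i]
    return correct / len(X)


def permutation_test(X, y, n_perm=200, seed=0):
    """Return (observed accuracy, p-value) against shuffled-label decoding."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    exceed = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)
        exceed += loo_accuracy(X, shuffled) >= observed
    return observed, (1 + exceed) / (1 + n_perm)


def make_demo_trials(n_per_class=6, seed=0):
    """Synthetic trials x neurons data: neuron 0 prefers food, neuron 1 shock."""
    rng = random.Random(seed)
    X, y = [], []
    for label, mean in (("food_cs", (1.0, 0.0)), ("shock_cs", (0.0, 1.0))):
        for _ in range(n_per_class):
            X.append([m + rng.uniform(-0.2, 0.2) for m in mean])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_trials()
    acc, p = permutation_test(X, y)
    print(f"accuracy={acc:.2f}, permutation p={p:.3f}")
```

      Restricting the input to activity recorded before US delivery would keep such an analysis focused on the cues, in line with point (1b).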

    4. Reviewer #3 (Public review):

      Summary:

      In their manuscript, Kong and colleagues investigate the role of distinct populations of neurons in the central amygdala (CeA) in encoding valence and salience during both appetitive and aversive conditioning. The study expands on the work of Yang et al. (2023), which specifically focused on somatostatin (SST) neurons of the CeA. Thus, this study broadens the scope to other neuronal subtypes, demonstrating that CeA neurons in general are predominantly tuned to valence representations rather than salience.

      Strengths:

      One of the key strengths of the study is its rigorous quantitative approach based on the "circular-shift method", which carefully assesses correlations between neural activity and behavior-related variables. The authors' findings that neuronal responses to the unconditioned stimulus (US) change with learning are consistent with previous studies (Yang et al., 2023). They also show that the encoding of positive and negative valence is not influenced by prior training order, indicating that prior experience does not affect how these neurons process valence.

      Weaknesses:

      However, there are limitations to the analysis, including the lack of population-based analyses, such as clustering approaches. The authors do not employ hierarchical clustering or other methods to extract meaning from the diversity of neuronal responses they recorded. Clustering-based approaches could provide deeper insights into how different subpopulations of neurons contribute to emotional processing. Without these methods, the study may miss patterns of functional specialization within the neuronal populations that could be crucial for understanding how valence and salience are encoded at the population level.

      Furthermore, while salience encoding is inferred based on responses to stimuli of opposite valence, the study does not test whether these neuronal responses scale with stimulus intensity, a hallmark of classical salience encoding. This limits the conclusions that can be drawn about salience encoding specifically.

      In sum, while the study makes valuable contributions to our understanding of CeA function, the lack of clustering-based population analyses and the absence of intensity scaling in the assessment of salience encoding are notable limitations.

    5. Reviewer #4 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time. The recordings appear well-executed, and employ a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.
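
      For readers unfamiliar with circular shifting, the general idea can be sketched as follows. This is a generic illustration, not the authors' exact procedure: the response window, number of shifts, two-sided threshold, and all function names are assumptions.

```python
# Hypothetical sketch of a circular-shift test: event times are shifted
# circularly relative to the activity trace to build a null distribution
# that preserves the trace's temporal autocorrelation.
import random


def mean_event_response(trace, event_idx, window):
    """Mean of the trace over `window` samples following each event."""
    n = len(trace)
    vals = [trace[(e + k) % n] for e in event_idx for k in range(window)]
    return sum(vals) / len(vals)


def circular_shift_test(trace, event_idx, window=5, n_shifts=1000,
                        alpha=0.05, seed=0):
    """Label a neuron 'excited', 'inhibited', or 'non-responsive'."""
    rng = random.Random(seed)
    n = len(trace)
    observed = mean_event_response(trace, event_idx, window)
    null = []
    for _ in range(n_shifts):
        # Exclude tiny shifts so shifted windows do not overlap the real ones.
        shift = rng.randrange(window, n - window)
        shifted_events = [(e + shift) % n for e in event_idx]
        null.append(mean_event_response(trace, shifted_events, window))
    upper = sum(v >= observed for v in null) / n_shifts
    lower = sum(v <= observed for v in null) / n_shifts
    if upper < alpha / 2:
        return "excited"
    if lower < alpha / 2:
        return "inhibited"
    return "non-responsive"


def make_trace(effect, event_idx, n=600, window=5, seed=1):
    """Synthetic trace: bounded noise plus `effect` added after each event."""
    rng = random.Random(seed)
    trace = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    for e in event_idx:
        for k in range(window):
            trace[e + k] += effect
    return trace


if __name__ == "__main__":
    events = [40, 130, 260, 370, 455, 560]  # irregularly spaced toy events
    for effect in (1.0, -1.0):
        print(effect, circular_shift_test(make_trace(effect, events), events))
```

      Because the whole trace is shifted rather than resampled, slow calcium dynamics are preserved in the null distribution, which is the main appeal of this family of tests.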

      Weaknesses:

      My main critique is that the authors didn't fully test whether neurons encode valence. While it is true that they found CeA neurons responding to stimuli that have positive or negative value, this by itself doesn't indicate that valence is the primary driver of neural activity. For example, they report that a majority of CeA neurons respond selectively to either the positive or negative US, and that this is evidence for "type I" valence encoding. However, it could also be the case that these neurons simply discriminate between motivationally relevant stimuli in a manner unrelated to valence per se. A simple test of this would be to check if neural responses generalize across more than one type of appetitive or aversive stimulus, but this was not done. The closest the authors came was to note that a small number of neurons respond to CS cues, of which some respond to the corresponding US in the same direction. This is relegated to the supplemental figures (3 and 4), and it is not noted whether the same-direction CS-US neurons are also valence-encoding with respect to different USs. For example, are the neurons excited by CS-food and US-food also inhibited by shock? If so, that would go a long way toward classifying at least a few neurons as truly encoding valence in a generalizable way.

      A second and related critique is that, although the authors correctly point out that definitions of salience and valence are sometimes confused in the existing literature, they then go on themselves to use the terms very loosely. For example, the authors define these terms in such a way that every neuron that responds to at least one stimulus is either salience- or valence-encoding. This seems far too broad, as it makes essentially unfalsifiable their assertion that the CeA encodes some mixture of salience and valence. I already noted above that simply having different responses to food and shock does not qualify as valence-encoding. It also seems to me that having same-direction responses to these two stimuli similarly does not qualify a neuron as encoding salience. Many authors define salience as being related to the ability of a stimulus to attract attention (which is itself a complex topic). However, the current paper does not state whether the authors are using this, or any other, definition of salience, nor is this explicitly tested, e.g. by comparing neural response magnitudes to any measure of attention.

      The impression I get from the authors' data is that CeA neurons respond to motivationally relevant stimuli, but in a way that is possibly more complex than what the authors currently imply. At the same time, they appear to have collected a large and high-quality dataset that could profitably be made available for additional analyses by themselves and/or others.

      Lastly, the use of 10 daily sessions of training with 20 trials each seems rather low to me. In our hands, Pavlovian training in mice requires considerably more trials in order to effectively elicit responses to the CS. I wonder if the relatively sparse training might explain the relative lack of CS responses?

    6. Author response:

      Reviewer #1 (Public review):

      From the Reviewing Editor:

      Four reviewers have assessed your manuscript on valence and salience signaling in the central amygdala. There was universal agreement that the question being asked by the experiment is important. There was consensus that the neural population being examined (GABA neurons) was important and the circular shift method for identifying task-responsive neurons was rigorous. Indeed, observing valenced outcome signaling in GABA neurons would considerably increase the role of the central amygdala in valence. However, each reviewer brought up significant concerns about the design, analysis and interpretation of the results. Overall, these concerns limit the conclusions that can be drawn from the results. Addressing the concerns (described below) would work towards better answering the question at the outset of the experiment: how does the central amygdala represent salience vs. valence?

      A weakness noted by all reviewers was the use of the terms 'valence' and 'salience' as well as the experimental design used to reveal these signals. The two outcomes used emphasized non-overlapping sensory modalities and produced unrelated behavioral responses. Within each modality there are no manipulations that would scale either the value of the valenced outcomes or the intensity of the salient outcomes. While the food outcomes were presented many times (20 times per session over 10 sessions of appetitive conditioning) the shock outcomes were presented many fewer times (10 times in a single session). The large difference in presentations is likely to further distinguish the two outcomes. Collectively, these experimental design decisions meant that any observed differences in central amygdala GABA neuron responding are unlikely to reflect valence, but likely to reflect one or more of the above features.

      We appreciate the reviewers’ comments regarding the experimental design. When assessing fear versus reward, we chose stimuli that elicit known behavioral responses, freezing versus consumption. The use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. For example, sweet or bitter tastes can be used, but even these activate different taste receptors and vary in the duration of the activation of taste-specific signaling (e.g. how long the taste lingers in the mouth). The approach we employed is similar to that of Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2), who used water reward and shock to characterize the response profiles of somatostatin neurons of the central amygdala. Similar to what was reported by Yang and colleagues, we observed that the majority of CeA GABA neurons responded selectively to one unconditioned stimulus (~52%). We observed that 15% of neurons responded in the same direction, either activated or inhibited, to both the food and shock US. These were defined as salience encoding based on the definitions of Lin and Nicolelis, 2008 (doi: 10.1016/j.neuron.2008.04.031), in which basal forebrain neurons responded similarly to reward or punishment irrespective of valence. The designation of valence encoding based on opposite responses to the food and shock is straightforward (~10% of cells); however, we agree that the designation of modality-specific encoding neurons as valence encoding is less straightforward.
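
      The response categories described in this reply can be written down as a simple rule. The sketch below is illustrative only: the category labels, the +1/0/-1 coding of each neuron's response, and the function names are assumptions rather than the authors' code.

```python
# Hypothetical sketch of the US response categories described above.
# Each neuron's response to a US is coded +1 (activated), -1 (inhibited),
# or 0 (no significant response).
from collections import Counter


def classify_us_response(food, shock):
    """Map a (food, shock) response pair onto one of four categories."""
    if food == 0 and shock == 0:
        return "non-responsive"
    if food == 0 or shock == 0:
        return "selective"            # responds to one US only
    if food == shock:
        return "same-direction"       # termed salience encoding in the reply
    return "opposite-direction"       # termed valence encoding in the reply


def tally(cells):
    """Count categories across a list of (food, shock) response pairs."""
    return dict(Counter(classify_us_response(f, s) for f, s in cells))


if __name__ == "__main__":
    demo_cells = [(1, 0), (0, -1), (1, 1), (-1, -1), (1, -1), (0, 0)]
    print(tally(demo_cells))
```

      Written this way, it is apparent that every responsive neuron falls into one of the three responsive categories by construction, which is the concern Reviewer #4 raises about falsifiability.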

      A second weakness noted by a majority of reviewers was a lack of cue-responsive units and a lack of exploration of the diversity of response types and of the relationship between cue and outcome firing. The lack of large numbers of neurons increasing firing to one or both cues is particularly surprising given the critical contribution of central amygdala GABA neurons to the acquisition of conditioned fear (which the authors measured) as well as to conditioned orienting (which the authors did not measure). Regression-like analyses would be a straightforward means of identifying neurons varying their firing in accordance with these or other behaviors. It was also noted that appetitive behavior was not measured in a rigorous way. Instead of measuring time near hopper, measures of licking would have been better. Further, measures of orienting behaviors such as startle were missing.

      The authors also missed an opportunity for clustering-like analyses which could have been used to reveal neurons uniquely signaling cues, outcomes or combinations of cues and outcomes. If the authors' calcium imaging approach is not able to detect expected central amygdala cue responding, might it be missing other critical aspects of responding?

      As stated in the manuscript, we were surprised by the relatively low number of cue-responsive cells; however, when using a less stringent statistical method (Figure 5 - Supplement 2), we observed that 13% of neurons responded to the food-associated cue and 23% responded to the shock-associated cue. The differences are therefore likely a reflection of the rigor of the statistical measure used to define the responsive units. The number of CS-responsive units is less than reported in the CeAl by Ciocchi et al., 2010 (doi: 10.1038/nature09559), who observed 30% activated by the CS and 25% inhibited, but is not that dissimilar from the results of Duvarci et al., 2011 (doi: 10.1523/JNEUROSCI.4985-10.2011), who observed 11% activated in the CeAl and 25% inhibited by the CS. These numbers are also consistent with previous single cell calcium imaging of cell types in the CeA. For example, Yang et al., 2023 (doi: 10.1038/s41586-023-05910-2) observed that 13% of somatostatin neurons responded to a reward CS and 8% responded to a shock CS. Yu et al., 2017 (doi: 10.1038/s41593-017-0009-9) observed that 26.5% of PKCdelta neurons responded to the shock CS. It should also be noted that our analysis was not restricted to the CeAl. Finally, food learning was assessed in an operant chamber in freely moving mice with reward pellet delivery. Because liquids were not used for the reward US, licking is not a metric that can be used.

      All reviewers point out that the evidence for salience encoding is even more limited than the evidence for valence. Although the specific concern for each reviewer varied, they all centered on an oversimplistic definition of salience. Salience ought to scale with the absolute value and intensity of the stimulus. Salience cannot simply be responding in the same direction. Further, even though the authors observed subsets of central amygdala neurons increasing or decreasing activity to both outcomes - the outcomes can readily be distinguished based on the temporal profile of responding.

      We thank the reviewers for their comments relating to the definition of salience and valence encoding by central amygdala neurons. We have addressed each of the concerns below.

      Additional concerns are raised by each reviewer. Our consensus is that this study sought to answer an important question: whether central amygdala neurons signal salience or valence in cue-outcome learning. However, the experimental design, analyses, and interpretations do not permit a rigorous and definitive answer to that question. Such an answer would require additional experiments whose designs would address the significant concerns described here. Fully addressing the concerns of each reviewer would result in a re-evaluation of the findings. For example, experimental design better revealing valence and salience, and analyses describing diversity of neuronal responding and relationship to behavior would likely make the results Important or even Fundamental.

      We appreciate the reviewers’ comments and have addressed each concern below.

      Reviewer #2 (Public review):

      In this article, Kong and colleagues sought to determine the encoding properties of central amygdala (CeA) neurons in response to oppositely valenced stimuli and cues predicting those stimuli. The amygdala and its subregional components have historically been understood to be regions that encode associative information, including valenced stimuli. The authors performed calcium imaging of GABA-ergic CeA neurons in freely-moving mice conditioned in Pavlovian appetitive and fear paradigms, and showed that CeA neurons are responsive to both appetitive and aversive unconditioned and conditioned stimuli. They used a variant of a previously published 'circular shifting' technique (Harris, 2021), which allowed them to delineate between excited/non-responsive/inhibited neurons. While there is considerable overlap of CeA neurons responding to both unconditioned stimuli (in this case, food and shock, deemed "salience-encoding" neurons), there are considerably fewer CeA neurons that respond to both conditioned stimuli that predict the food and shock. The authors finally demonstrated that the order of the Pavlovian paradigms (appetitive then fear vs. fear then appetitive) made no difference, which is an interesting result and is convincingly presented given their counterbalanced experimental design.

      In total, I find the presented study useful in understanding the dynamics of CeA neurons during a Pavlovian learning paradigm. There are many strengths of this study, including the important question and clear presentation; the circular shifting analysis was convincing to me, and the manuscript was well written. We hope the authors will find our comments constructive if they choose to revise their manuscript.

      While the experiments and data are of value, I do not agree with the authors' interpretation of their data, and take issue with the way they used the terms "salience" and "valence", whose operational definitions here differ from my reading of the literature (I would encourage them to check out Namburi et al., NPP, 2016). To be fair, a recent study from another group that reports experiments/findings which are very similar to the ones in the present study (Yang et al., 2023, describing valence coding in the CeA using a similar approach) also uses the terms valence and salience in a rather liberal way that I would also have issues with (see below). Either new experiments or revised claims would be needed here, and a more balanced discussion on this topic would be nice to see. I also felt that there were some aspects of novelty in this study that could be better highlighted (see below).

      One noteworthy point of alarm is that it seems as if two data panels including heatmaps are duplicated (perhaps that panel G of Figure 5-figure supplement 2 is a cut and paste error? It is duplicated from panel E and does not match the associated histogram).

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Major concerns:

      (1) The authors wish to make claims about salience and valence. This is my biggest gripe, so I will start here.

      (1a) Valence scales for positive and negative stimuli. As stated in Namburi et al., NPP, 2016, we operationalize "valence" as having different responses for positive and negative values and no response for stimuli that are not motivationally significant (neutral cues that do not predict an outcome). The threshold for claiming salience, which we define as scaling with the absolute value of the stimulus and not responding to a neutral stimulus (Namburi et al., NPP, 2016; Tye, Neuron, 2018; Li et al., Nature, 2022), would require the lack of response to a neutral cue.

      We appreciate the reviewer’s comment on the definitions of salience and valence and agree that there is not a consistent classification of these response types in the field. As stated above, we used the designation of salience encoding if the cells respond in the same direction to different stimuli regardless of the valence of the stimulus, similar to what was described previously (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031). Similar definitions of salience have also been reported elsewhere (for examples see: Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006, Zhu et al., 2018, doi: 10.1126/science.aat0481, and Comoli et al., 2003, doi: 10.1038/nn1113P). Per the suggestion of the reviewer, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five and last five head entries, and to the first five and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      Author response image 1.

      (1b) The other major issue is that the authors choose to make claims about the neural responses to the USs rather than the CSs. However, being shocked and receiving sucrose also would have very different sensorimotor representations, and any differences in responses could be attributed to those confounds rather than valence or salience. They could make claims regarding salience or valence with respect to the differences in the CSs but they should restrict analysis to the period prior to the US delivery.

      Perhaps the reviewer missed this, but analysis of valence and salience encoding to the different CSs is presented in Figure 5G, Figure 5 – Supplement 1C-D, and Figure 5 – Supplement 2N-O. Responsiveness to CSFood and CSShock was analyzed during the conditioning sessions (Figure 3E-F, Figure 4B-C, Figure 5 – Supplement 2J-O, and Figure 5 – Supplement 3K-L) and during recall probe tests for both CSFood and CSShock (Figure 5 – Supplement 1C-J).

      (1c) The third obstacle to using the terms "salience" or "valence" is the lack of scaling, which is perhaps a bigger ask. At minimum either the scaling or the neutral cue would be needed to make claims about valence or salience encoding. Perhaps the authors disagree - that is fine. But they should at least acknowledge that there is literature that would say otherwise.

      (1d) In order to make claims about valence, the authors must take into account the sensory confound of the modality of the US (also mentioned in Namburi et al., 2016). The claim that these CeA neurons are indeed valence-encoding (based on their responses to the unconditioned stimuli) is confounded by the fact that the appetitive US (food) is a gustatory stimulus while the aversive US (shock) is a tactile stimulus.

      We provided the same analysis for the US and CS. The US responses were larger and more prevalent, but similar types of encoding were observed for the CS. We agree that the food reward and the shock are very different sensory modalities. As stated above, the use of stimuli of the same modality is unlikely to elicit easily definable fear or reward responses or to be precisely matched for sensory intensity. We agree that cells that respond to only one stimulus are difficult to define in terms of valence encoding, as opposed to being specific for the sensory modality, and that without scaling of the stimulus it is difficult to fully address this issue. It should be noted, however, that if the cells in the CeA were exclusively tuned to stimuli of different sensory modalities, we would expect to see a similar number of cells responding to the CS tones (auditory) as respond to the food (taste) and shock (somatosensory), but we do not. Of the cells tracked longitudinally, 80% responded to the USs, with 65% of cells responding to food (activated or inhibited) and 44% responding to shock (activated or inhibited).

      (2) Many of the central findings in this manuscript have been previously described in the literature. Yang et al., 2023, for instance, shows that the CeA encodes salience (as demonstrated by the scaled responses to the increased value of unconditioned stimuli, Figure 1 j-m), and that learning amplifies responsiveness to unconditioned stimuli (Figure 2). It is nice to see a reproduction of the finding that learning amplifies CeA responses, though one study is in SST::Cre and this one in VGAT::cre - perhaps highlighting this difference could maximize the collective utility for the scientific community?

      We agree that the analysis performed here is similar to what was conducted by Yang et al., 2023, with the major difference being the types of neurons sampled. Yang et al. imaged only somatostatin neurons, whereas we recorded all GABAergic cell types within the CeA. Moreover, because we imaged from 10 mice, we sampled neurons that ostensibly covered the entire dorsal to ventral extent of the CeA (Figure 1 – Supplement 1). Remarkably, we found that the vast majority of CeA neurons (80%) are responsive to food or shock. Within this 80% there are 8 distinct response profiles, consistent with the heterogeneity of cell types within the CeA based on connectivity, electrophysiological properties, and gene expression. Moreover, we did not find any spatial distinction between food or shock responsive cells, with the responsive cell types being intermingled throughout the dorsal to ventral axis (Figure 5 – Supplement 3).

      (3) There is at least one instance of copy-paste error in the figures that raised alarm. In the supplementary information (Figure 5- figure supplement 2 E;G), the heat maps for food-responsive neurons and shock-responsive neurons are identical. While this almost certainly is a clerical error, the authors would benefit from carefully reviewing each figure to ensure that no data is incorrectly duplicated.

      We thank the reviewer for catching this error. It has been corrected.

      (4) The authors describe experiments to compare shock and reward learning; however, there are temporal differences in what they compare in Figure 5. The authors compare the 10th day of reward learning with the 1st day of fear conditioning, which effectively represent different points of learning and retrieval. At the end of reward conditioning, animals are utilizing a learned association to the cue, which demonstrates retrieval. On the day of fear conditioning, animals are still learning the cue at the beginning of the session, but they are not necessarily retrieving an association to a learned cue. The authors would benefit from recording at a later timepoint (to be consistent with reward learning- 10 days after fear conditioning), to more accurately compare these two timepoints. Or perhaps, it might be easier to just make the comparison between Day 1 of reward learning and Day 1 of fear learning, since they must already have these data.

      We agree that there are temporal differences between the food and shock US deliveries. This is likely a reflection of the fact that the shock delivery is passive and easily resolved based on the time of the US delivery, whereas the food responses are variable because they are dependent upon the consumption of the sucrose pellet. Because of these differences, the kinetics of the responses cannot be accurately compared. This is why we restricted our analysis to whether the cells were food or shock responsive. Aside from reporting the temporal differences in the signals, we did not draw major conclusions about the differences in kinetics. In our experimental design we counterbalanced the animals that received fear conditioning first then food conditioning, or food conditioning then fear conditioning, to ensure that order effects did not influence the outcome of the study. It is widely known that Pavlovian fear conditioning can facilitate the acquisition of conditioned stimulus responses with just a single day of conditioning. In contrast, Pavlovian reward conditioning generally progresses more slowly. Because of this, we restricted our analysis to comparing the last day of reward conditioning with the first and only day of fear conditioning. However, as stated above, we compared the responses of neurons defined as salience encoding during day 1 of reward conditioning and fear conditioning. As would be predicted based on previous definitions of salience encoding (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected.

      (5) The authors make a claim of valence encoding in their title and throughout the paper, which is not possible to make given their experimental design. However, they would greatly benefit from actually using a decoder to demonstrate their encoding claim (decoding performance for shock-food versus shuffled labels) and simply make claims about decoding food-predictive cues and shock-predictive cues. Interestingly, it seems like relatively few CeA neurons actually show differential responses to the food and shock CSs, and that is interesting in itself.

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). Interestingly, many of these studies did not vary the US intensity.
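      The decoding analysis suggested in point (5) could be sketched along the following lines. This is a minimal illustration under stated assumptions, not an analysis from the manuscript: the nearest-centroid decoder, the leave-one-out scheme, the number of shuffles, and all function names are hypothetical choices, and each row of `X` is assumed to hold one trial's population response restricted to the CS period.

```python
import random


def nearest_centroid_loo_accuracy(X, y):
    """Leave-one-out accuracy of a nearest-centroid decoder.

    X: list of trials, each a list of per-cell responses.
    y: list of trial labels (e.g. "food" or "shock").
    """
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue  # hold out the test trial
            acc = sums.setdefault(label, [0.0] * len(row))
            for k, v in enumerate(row):
                acc[k] += v
            counts[label] = counts.get(label, 0) + 1
        best_label, best_dist = None, float("inf")
        for label, acc in sums.items():
            centroid = [s / counts[label] for s in acc]
            dist = sum((a - b) ** 2 for a, b in zip(X[i], centroid))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def shuffle_test(X, y, n_shuffles=200, seed=0):
    """Compare decoding accuracy against a label-shuffled null distribution."""
    rng = random.Random(seed)
    observed = nearest_centroid_loo_accuracy(X, y)
    null = []
    for _ in range(n_shuffles):
        shuffled = list(y)
        rng.shuffle(shuffled)
        null.append(nearest_centroid_loo_accuracy(X, shuffled))
    # Permutation p-value with the usual +1 correction.
    p = (1 + sum(a >= observed for a in null)) / (1 + n_shuffles)
    return observed, p
```

      Above-chance accuracy relative to the shuffled labels would support a claim that food-predictive and shock-predictive cues can be decoded from the population; it would not by itself distinguish valence from sensory modality.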

      Reviewer #3 (Public review):

      Summary:

      In their manuscript, Kong and colleagues investigate the role of distinct populations of neurons in the central amygdala (CeA) in encoding valence and salience during both appetitive and aversive conditioning. The study expands on the work of Yang et al. (2023), which specifically focused on somatostatin (SST) neurons of the CeA. Thus, this study broadens the scope to other neuronal subtypes, demonstrating that CeA neurons in general are predominantly tuned to valence representations rather than salience.

      We thank the reviewer for their insightful comments and assessment of the manuscript.

      Strengths:

      One of the key strengths of the study is its rigorous quantitative approach based on the "circular-shift method", which carefully assesses correlations between neural activity and behavior-related variables. The authors' findings that neuronal responses to the unconditioned stimulus (US) change with learning are consistent with previous studies (Yang et al., 2023). They also show that the encoding of positive and negative valence is not influenced by prior training order, indicating that prior experience does not affect how these neurons process valence.

      Weaknesses:

      However, there are limitations to the analysis, including the lack of population-based analyses, such as clustering approaches. The authors do not employ hierarchical clustering or other methods to extract meaning from the diversity of neuronal responses they recorded. Clustering-based approaches could provide deeper insights into how different subpopulations of neurons contribute to emotional processing. Without these methods, the study may miss patterns of functional specialization within the neuronal populations that could be crucial for understanding how valence and salience are encoded at the population level.

      We appreciate the reviewer’s comments regarding clustering-based approaches. In order to classify cells as responsive to the US or CS we chose to develop a statistically rigorous method for classifying cell response types. Using this approach, we were able to define cell responses to the US and CS. Importantly, we identified 8 distinct response types to the USs. It is not clear how additional clustering analysis would improve cell classifications.

      Furthermore, while salience encoding is inferred based on responses to stimuli of opposite valence, the study does not test whether these neuronal responses scale with stimulus intensity-a hallmark of classical salience encoding. This limits the conclusions that can be drawn about salience encoding specifically.

      As stated above, we used salience classifications similar to those previously described (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). We agree that varying the stimulus intensity would provide a more rigorous assessment of salience encoding; however, several of the studies mentioned above classify cells as salience encoding without varying stimulus intensity. Additionally, the inclusion of recordings with varying US intensities on top of the Pavlovian reward and fear conditioning would further decrease the number of cells that can be longitudinally tracked and would likely decrease the number of cells that could be classified.

      In sum, while the study makes valuable contributions to our understanding of CeA function, the lack of clustering-based population analyses and the absence of intensity scaling in the assessment of salience encoding are notable limitations.

      Reviewer #4 (Public review):

      Summary:

      The authors have performed endoscopic calcium recordings of individual CeA neuron responses to food and shock, as well as to cues predicting food and shock. They claim that a majority of neurons encode valence, with a substantial minority encoding salience.

      Strengths:

      The use of endoscopic imaging is valuable, as it provides the ability to resolve signals from single cells, while also being able to track these cells across time. The recordings appear well-executed, and employ a sophisticated circular shifting analysis to avoid statistical errors caused by correlations between neighboring image pixels.

      Weaknesses:

      My main critique is that the authors didn't fully test whether neurons encode valence. While it is true that they found CeA neurons responding to stimuli that have positive or negative value, this by itself doesn't indicate that valence is the primary driver of neural activity. For example, they report that a majority of CeA neurons respond selectively to either the positive or negative US, and that this is evidence for "type I" valence encoding. However, it could also be the case that these neurons simply discriminate between motivationally relevant stimuli in a manner unrelated to valence per se. A simple test of this would be to check if neural responses generalize across more than one type of appetitive or aversive stimulus, but this was not done. The closest the authors came was to note that a small number of neurons respond to CS cues, of which some respond to the corresponding US in the same direction. This is relegated to the supplemental figures (3 and 4), and it is not noted whether the same-direction CS-US neurons are also valence-encoding with respect to different USs. For example, are the neurons excited by CS-food and US-food also inhibited by shock? If so, that would go a long way toward classifying at least a few neurons as truly encoding valence in a generalizable way.

      As stated above, valence and salience encoding were defined similar to what has been previously reported (Li et al., 2019, doi: 10.7554/eLife.41223; Yang et al., 2023, doi: 10.1038/s41586-023-05910-2; Huang et al., 2024, doi: 10.1038/s41586-024-07819; Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031; Stephenson-Jones et al., 2020, doi: 10.1016/j.neuron.2019.12.006; Zhu et al., 2018, doi: 10.1126/science.aat0481; and Comoli et al., 2003, doi: 10.1038/nn1113P). As reported in Figure 5 and Figure 5 – Supplement 3, ~29% of CeA neurons responded to both food and shock USs (15% in the same direction and 13.5% in the opposite direction). In contrast, only 6 of 303 cells responded to both the CSfood and CSshock, all in the same direction.

      A second and related critique is that, although the authors correctly point out that definitions of salience and valence are sometimes confused in the existing literature, they then go on themselves to use the terms very loosely. For example, the authors define these terms in such a way that every neuron that responds to at least one stimulus is either salience or valence-encoding. This seems far too broad, as it makes essentially unfalsifiable their assertion that the CeA encodes some mixture of salience and valence. I already noted above that simply having different responses to food and shock does not qualify as valence-encoding. It also seems to me that having same-direction responses to these two stimuli similarly does not qualify a neuron as encoding salience. Many authors define salience as being related to the ability of a stimulus to attract attention (which is itself a complex topic). However, the current paper does not acknowledge whether they are using this, or any other definition of salience, nor is this explicitly tested, e.g. by comparing neural response magnitudes to any measure of attention.

      As stated in response to reviewer 2, we longitudinally tracked cells on the first day of Pavlovian reward conditioning and the fear conditioning day. Although there were considerably fewer head entries on the first day of reward conditioning, we were able to identify 10 cells that were activated by both the food US and shock US. We compared the responses to the first five and last five head entries, and to the first five and last five shocks. Consistent with what has been reported for salience encoding neurons in the basal forebrain (Lin and Nicolelis, 2008, doi: 10.1016/j.neuron.2008.04.031), we observed that the responses were highest when the US was most unexpected and decreased in later trials.

      The impression I get from the authors' data is that CeA neurons respond to motivationally relevant stimuli, but in a way that is possibly more complex than what the authors currently imply. At the same time, they appear to have collected a large and high-quality dataset that could profitably be made available for additional analyses by themselves and/or others.

      Lastly, the use of 10 daily sessions of training with 20 trials each seems rather low to me. In our hands, Pavlovian training in mice requires considerably more trials in order to effectively elicit responses to the CS. I wonder if the relatively sparse training might explain the relative lack of CS responses?

      It is possible that learning would have occurred more quickly if we had used greater than 20 trials per session. However, we routinely used 20-25 trials for Pavlovian reward conditioning (doi: 10.1073/pnas.1007827107; doi: 10.1523/JNEUROSCI.5532-12.2013; doi: 10.1016/j.neuron.2013.07.044; and doi: 10.1016/j.neuron.2019.11.024).

    1. eLife Assessment

      This important study reveals that Excitatory Amino Acid Transporters play a role in chromatic information processing in the retina. The combination of (double) mutants, behavioral assays, immunohistochemistry, and electroretinograms provides solid evidence supporting the appropriately conservative conclusions. The work will be of interest to neurobiologists working on color vision or retinal processing.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript by Garbelli et al. investigates the roles of excitatory amino acid transporters (EAATs) in retinal bipolar cells. The group previously identified that EAAT5b and EAAT7 are expressed at the dendritic tips of bipolar cells, where they connect with photoreceptor terminals. The previous study found that the light responses of bipolar cells, measured by electroretinogram (ERG) in response to white light, were reduced in double mutants, though there was little to no reduction in light responses in single mutants of either EAAT5b or EAAT7.

      The current study further explores the roles of EAAT5b and EAAT7 in bipolar cells' chromatic responses. The authors found that bipolar cell responses to red light, but not to green or UV-blue light, were reduced in single mutants of both EAAT5b and EAAT7. In contrast, UV-blue light responses were reduced in double mutants. Additionally, the authors observed that EAAT5b, but not EAAT7, is strongly localized in the UV cone-enriched area of the eye, known as the "Strike Zone (SZ)." This led them to investigate the impact of the EAAT5b mutation on prey detection performance, which is mediated by UV cones in the SZ. Surprisingly, contrary to the predicted role of EAAT5b in prey detection, EAAT5b mutants did not show any changes in prey detection performance compared to wild-type fish. Interestingly, EAAT7 mutants exhibited enhanced prey detection performance, though the underlying mechanisms remain unclear.

      The distribution of EAAT7 protein in the outer plexiform layer across the eye correlates with the distribution of red cones. Based on this, the authors tested the behavioral performance driven by red light in EAAT5b and EAAT7 mutants. The results here were again somewhat contrary to predictions based on ERG findings and protein localization: the optomotor response was reduced in EAAT5b mutants, but not in EAAT7 mutants.

      Strengths:

      Although the paper lacks cohesive conclusions, as many results contradict initial predictions as mentioned above, the authors discuss possible mechanisms for these contradictions and suggest future avenues for study. Nevertheless, this paper demonstrates a novel mechanism underlying chromatic information processing. The manuscript is well-written, the data are well-presented, and the analysis is thorough.

      Weaknesses:

      I have only a minor comment. The authors present preliminary data on mGluR6b distribution across the eye. Since this result is based on a single fish, I recommend either adding more samples or removing this data, as it does not significantly impact the paper's main conclusions.

    3. Reviewer #2 (Public review):

      Garbelli et al. set out to elucidate the function of two glutamate transporters, EAAT5b and EAAT7, in the functional and behavioral responses to different wavelengths of light. The question is an interesting one, because these transporters are well positioned to affect responses to light, and their distribution in the retina suggests that they could play differential roles in visual behaviors. However, the low resolution of both the functional and behavioral data presented here means that the conclusions are necessarily a bit vague.

      In Figure 1, the authors show that the double KO has a decreased ERG response to UV/blue and red wavelengths. However, the individual mutations only affect the response to red light, suggesting that they might affect behaviors such as OMR which typically rely on this part of the visual spectrum. However, there was no significant change in the response to UV/blue light of any intensity, making it unclear whether the mutations could individually play roles in the detection of UV prey. Based on the later behavioral data, it seems likely that at least the EAAT7 KO should affect retinal responses to UV light, but it may be that the ERG does not have the spatial or temporal resolution to detect the difference, or that the presence of blue light overwhelmed any effect of the individual knockouts on the response to UV light.

      In Figures 5 and 6, the authors compare the two knockouts to wild-type fish in terms of their sensitivity to UV prey in a hunting assay. The EAAT5b KO showed no significant impairment in UV sensitivity, while the EAAT7 KO fish actually had an increased hunting response to UV prey. However, there is no comparison of the KO and WT responses to different UV intensities, only in bulk, so we cannot conclude that the EAAT7 KO is allowing the fish to detect weaker prey-like stimuli.

      In Figure 7, the EAAT5b KO seems to cause a decrease in OMR behavior to red grating stimuli, but only one stimulus is tested, so it is unclear whether this is due to a change in visual sensitivity or resolution.

      The conclusions made in the manuscript are appropriately conservative; the abstract states that these transporters somehow influence prey detection and motion sensing, and this is probably true. However, it is unclear to what extent and how they might be acting on these processes, so the conclusions are a bit unsatisfying.

      In terms of impact on the field, this work highlights the potential importance of these two transporters to visual processing, but further studies will be required to say how important they are and what they are doing. The methods presented here are not novel, as UV prey and red OMR stimuli and behaviors have previously been described.

    4. Author response:

      We agree with reviewer #1 to remove the mGluR6b data. It is indeed a weakness and is too preliminary. We will gladly remove it from the revised version.

      We will address the issue of the bulk responses (depicted in Figures 5 and 6) by showing the significance data, arguing that although we cannot prove that prey-detection is increased for lower intensities, the bulk effect is significant, so prey detection is effectively stronger.

    1. eLife Assessment

      This important study explores the neural basis for a well known auditory illusion, often utilized in movie soundtracks, in which a sequence of two complex tones can be perceived as either rising or falling in pitch depending on the context in which they are presented. Convincing single-neuron data and analyses are presented to show that correlates of these pitch-direction changes are found in the ferret primary auditory cortex. While these findings provide an interesting link between cortical activity and perception, the manuscript could be clearer on the wider implications of the failure of traditional decoding models to account for these results.

    2. Reviewer #1 (Public review):

      Summary:

      Previous work demonstrated a strong bias in the percept of an ambiguous Shepard tone as either ascending or descending in pitch, depending on the preceding contextual stimulus. The authors recorded human MEG and ferret A1 single-unit activity during presentation of stimuli identical to those used in the behavioral studies. They used multiple neural decoding methods to test if context-dependent neural responses to ambiguous stimulus replicated the behavioral results. Strikingly, a decoder trained to report stimulus pitch produced biases opposite to the perceptual reports. These biases could be explained robustly by a feed-forward adaptation model. Instead, a decoder that took into account direction selectivity of neurons in the population was able to replicate the change in perceptual bias.

      Strengths:

      This study explores an interesting and important link between neural activity and sensory percepts, and it demonstrates convincingly that traditional neural decoding models cannot explain percepts. Experimental design and data collection appear to have been executed carefully. Subsequent analysis and modeling appear rigorous. The conclusion that traditional decoding models cannot explain the contextual effects on percepts is quite strong.

      Weaknesses:

      Beyond the very convincing negative results, it is less clear exactly what the conclusion is or what readers should take away from this study. The presentation of the alternative, "direction aware" models is unclear, making it difficult to determine if they are presented as realistic possibilities or simply novel concepts. Does this study make predictions about how information from auditory cortex must be read out by downstream areas? There are several places where the thinking of the authors should be clarified, in particular, around how this idea of specialized readout of direction-selective neurons should be integrated with a broader understanding of auditory cortex.

    3. Reviewer #2 (Public review):

      Summary:

      This is an elegant study investigating possible mechanisms underlying the hysteresis effect in the perception of perceptually ambiguous Shepard tones. The authors make a fairly convincing case that the adaptation of pitch direction sensitive cells in auditory cortex is likely responsible for this phenomenon.

      Strengths:

      The manuscript is overall well written. My only slight criticism is that, in places, particularly for non-expert readers, it might be helpful to work a little bit more methods detail into the results section, so readers don't have to work quite so hard jumping from results to methods and back.

      The methods seem sound and the conclusions warranted and carefully stated. Overall I would rate the quality of this study as very high, and I do not have any major issues to raise.

      Weaknesses:

      I think this study is about as good as it can be with the current state of the art. Generally speaking, one has to bear in mind that this is an observational, rather than an interventional study, and therefore only able to identify plausible candidate mechanisms rather than making definitive identifications. However, the study nevertheless represents a significant advance over the current state of knowledge, and about as good as it can be with the techniques that are currently widely available.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Previous work demonstrated a strong bias in the percept of an ambiguous Shepard tone as either ascending or descending in pitch, depending on the preceding contextual stimulus. The authors recorded human MEG and ferret A1 single-unit activity during presentation of stimuli identical to those used in the behavioral studies. They used multiple neural decoding methods to test if context-dependent neural responses to ambiguous stimulus replicated the behavioral results. Strikingly, a decoder trained to report stimulus pitch produced biases opposite to the perceptual reports. These biases could be explained robustly by a feed-forward adaptation model. Instead, a decoder that took into account direction selectivity of neurons in the population was able to replicate the change in perceptual bias.

      Strengths:

      This study explores an interesting and important link between neural activity and sensory percepts, and it demonstrates convincingly that traditional neural decoding models cannot explain percepts. Experimental design and data collection appear to have been executed carefully. Subsequent analysis and modeling appear rigorous. The conclusion that traditional decoding models cannot explain the contextual effects on percepts is quite strong.

      Weaknesses:

      Beyond the very convincing negative results, it is less clear exactly what the conclusion is or what readers should take away from this study. The presentation of the alternative, "direction aware" models is unclear, making it difficult to determine if they are presented as realistic possibilities or simply novel concepts. Does this study make predictions about how information from auditory cortex must be read out by downstream areas? There are several places where the thinking of the authors should be clarified, in particular, around how this idea of specialized readout of direction-selective neurons should be integrated with a broader understanding of auditory cortex.

      While we have not used the term "direction aware", we think the reviewer refers generally to the capability of our model to use a cell's direction selectivity in the decoding. In accordance with the reviewer's interpretation, we did indeed mean that the decoder assumes that a neuron does not only have a preferred frequency, but also a preferred direction of change in frequency (ascending/descending), which is what we use to demonstrate that the decoding in this way aligns with the human percept. We have adapted the text in several places to clarify this, in particular expanding the description in the Methods substantially.

      Reviewer #2 (Public Review):

      The authors aim to better understand the neural responses to Shepard tones in auditory cortex. This is an interesting question as Shepard tones can evoke an ambiguous pitch that is manipulated by a preceding adapting stimulus, therefore it nicely disentangles pitch perception from simple stimulus acoustics.

      The authors use a combination of computational modelling, ferret A1 recordings of single neurons, and human EEG measurements.

      Their results provide new insights into neural correlates of these stimuli. However, the manuscript submitted is poorly organized, to the point where it is near impossible to review. We have provided Major Concerns below. We will only be able to understand and critique the manuscript fully after these issues have been addressed to improve the readability of the manuscript. Therefore, we have not yet reviewed the Discussion section.

      Major concerns

      Organization/presentation

      The manuscript is disorganized and therefore difficult to follow. The biggest issue is that in many figures, the figure subpanels often do not correspond to the legend, the main body, or both. Subpanels described in the text are missing in several cases.

      We have gone linearly through the text and checked that all figure subpanels are referred to in the text and the legend. As far as we can tell, this was already the case for all panels, with the exception of two subpanels of Fig. 5.

      Many figure axes are unlabelled.

      We have carefully checked the axes of all panels and all but two (Fig. 5D) were labeled. As is customary, certain panels inherit the axis label from a neighboring panel, if the label is the same, e.g. subpanels in Fig. 6F or Fig. 5E, which helps to declutter the figure. We hope that with this clarification, the reviewer can understand the labels of each panel.

      There is an inconsistent style of in-text citation between figures and the main text. The manuscript contains typos and grammatical errors. My suggestions for edits below therefore should not be taken as an exhaustive list. I ask the authors to consider the following only a "first pass" review, and I will hopefully be able to think more deeply about the science in the second round of revisions after the manuscript is better organized.

      While we are puzzled by the severity of issues that R2 indicates (see above, and R3 qualifies it as "well written", and R1 does not comment on the writing negatively), we have carefully gone through all specific issues mentioned by R2 and the other reviewers. We hope that the revised version of the paper with all corrections and clarifications made will resolve any remaining issues.

      Frequency and pitch

      The terms "frequency" and "pitch" seem to be used interchangeably at times, which can lead to major misconceptions in a manuscript on Shepard tones. It is possible that the authors confuse these concepts themselves at times (e.g. Fig 5), although this would be surprising given their expertise in this field. Please check through every use of "frequency" and "pitch" in this manuscript and make sure you are using the right term in the right place. In many places, "frequency" should actually be "fundamental frequency" to avoid misunderstanding.

      Thanks for pointing this out. We have checked every occurrence and modified where necessary.

      Insufficient detail or lack of clarity in descriptions

      There seems to be insufficient information provided to evaluate parts of these analyses, most critically the final pitch-direction decoder (Fig 6), which is a major finding. Please clarify.

      Thanks for pointing this out. We have extended the description of the pitch-direction decoder and highlighted its role for interpreting the results.

      Reviewer #3 (Public Review):

      Summary:

      This is an elegant study investigating possible mechanisms underlying the hysteresis effect in the perception of perceptually ambiguous Shepard tones. The authors make a fairly convincing case that the adaptation of pitch direction sensitive cells in auditory cortex is likely responsible for this phenomenon.

      Strengths:

      The manuscript is overall well written. My only slight criticism is that, in places, particularly for non-expert readers, it might be helpful to work a little bit more methods detail into the results section, so readers don't have to work quite so hard jumping from results to methods and back.

      Following this excellent suggestion, we have added more brief method sketches to the Results section, hopefully addressing this concern.

      The methods seem sound and the conclusions warranted and carefully stated. Overall I would rate the quality of this study as very high, and I do not have any major issues to raise.

      Thanks for your encouraging evaluation of the work.

      Weaknesses:

      I think this study is about as good as it can be with the current state of the art. Generally speaking, one has to bear in mind that this is an observational, rather than an interventional study, and therefore only able to identify plausible candidate mechanisms rather than making definitive identifications. However, the study nevertheless represents a significant advance over the current state of knowledge, and about as good as it can be with the techniques that are currently widely available.

      Thanks for your encouraging evaluation of our work. The suggestion of an interventional study has also been on our minds, however, this appears rather difficult, as it would require a specific subset of cells to be inhibited. The most suitable approach would likely be 2p imaging with holographic inhibition of a subset of cells (using ArchT for example), that has a preference for one direction of pitch change, which should then bias the percept/behavior in the opposite direction.

      Reviewer #1 (Recommendations For The Authors):

      MAJOR CONCERNS

      (1) What is the timescale used to compute direction selectivity in neural tuning? How does it compare to the timing of the Shepard tones? The basic idea of up versus down pitch is clear, the intuition for the role of direction tuning and its relation to stimulus dynamics could be laid out more clearly. Are the authors proposing that there are two "special" populations of A1 neurons that are treated differently to produce the biased percept? Or is there something specific about the dynamics of the Shepard stimuli and how direction selective neurons respond to them specifically? It would help if the authors could clarify if this result links to broader concepts of dynamic pitch coding in general or if the example reported here is specific (or idiosyncratic) to Shepard tones.

      We propose that the findings here are not specific to Shepard tones. On the contrary, only basic properties of auditory cortex neurons, i.e. frequency preference, frequency-direction (i.e. ascending or descending) preference, and local adaptation in the tuning curve, suffice. Each of these properties has been demonstrated many times before, and we only verified this in the lead-up to the results in Fig. 6. While the same effects should be observable with pure tones, the lack of ambiguity in the perceived direction of a frequency step for pure tone pairs would make them less noticeable there. Regarding the time-scale of the directional selectivity, we relied on the sequencing of tones in our paradigm, i.e. 150 ms spacing. The SSTRFs were discretized at 50 ms, and include only the bins during the stimulus, not during the pause. The directional tuning, i.e. differences in the SSTRF above and below the preferred pitchclass for stimuli before the last stimulus, typically extended only one stimulus back in time. We have clarified this in more detail now, in particular in the added Methods section on the directional decoder.

      (2) (p. 9) "weighted by each cell's directionality index ... (see Methods for details)" The direction-selective decoder is interesting and appears critical to the study. However, the details of its implementation are difficult to locate. Maybe Fig. 6A contains the key concepts? It would help greatly if the authors could describe it in parallel with the other decoders in the Methods.

      We have expanded the description of the decoder in the Methods as the reviewer suggests.

      LESSER CONCERNS

      p. 1. (L 24) "distances between the pitch representations...." It's not obvious what "distances" means without reading the main paper. Can some other term or extra context be provided?

      We have added a brief description here.

      p. 2. (L 26) "Shepard tones" Can the authors provide a citation when they first introduce this class of stimuli?

      Citation has been added.

      p. 3 (L 4) "direction selective cells" Please define or provide context for what has a direction. Selective to pitch changes in time?

      Yes, selective to pitch changes in time is what is meant. We have further clarified this in the text.

      p. 4 (L 9-19). This paragraph seems like it belongs in the Introduction?

      Given the concerns raised by R2 about the organization of the manuscript we prefer to keep this 'road-map' in the manuscript, as a guidance for the reader.

      p. 4 (L 32) "majority of cells" One might imagine that the overlap of the bias band and the frequency tuning curve of individual neurons might vary substantially. Was there some criterion about the degree of overlap for including single units in the analysis? Does overlap matter?

      We are not certain which analysis the reviewer is referring to. Generally, cells were not excluded based on the overlap between a particular Bias band and their (Shepard) tuning curve. There are several reasons for this: The Bias was located in 4 different, overlapping Shepard tone regions, and all sounds were Shepard tones. Therefore, the (Shepard) tuning curve of every cell overlapped with one or more of the Biases. For the decoding analysis, all cells were included, as both a response and the lack of a response contribute to the decoding. If the reviewer is referring only to the analysis of whether a cell adapts, then the same argument applies as above, i.e. this was an average over all Bias sequences, and therefore every responding cell was driven to respond by the Bias, and therefore it was possible to also assess whether it adapted its response for different positions inside the Bias. We acknowledge that the limited randomness of the Bias sequences in combination with the specific tuning of the cells could in a few cases create response patterns over time that are not indicative of the actual behavior for repeated stimulation. However, since the results are rather clear, with 91% of cells adapting, we do not think this would significantly change the conclusions.

      p. 5 (L 17) "desynchronization ... behaving conditions" The logic here is not clear. Is less desynchronization expected during behavior? Typically, increased attention is associated with greater desynchronization.

      Yes, we reformulated the sentence to: While this difference could be partly explained by desynchronization which is typically associated with active behavior or attention [30], general response adaptation to repeated stimuli is also typical in behaving humans [31].

      p. 7 (L 5) "separation" is this a separation in time?

      Yes, added.

      p. 7 (L 33) "local adaptation" The idea of feedforward adaptation biasing encoding has been proposed before, and it might be worth citing previous work. This includes work from Nelken specifically related to SSA. Also, this model seems similar to the one described in Lopez Espejo et al (PLoS CB 2019).

      Thanks for pointing this out. We think, however, that neither of these publications suggested this very narrow way of biasing, which we consider biologically implausible. We have therefore not added either of these citations.

      p. 11 (L. 17) The cartoon in Fig. 6G may provide some intuition, but it is quite difficult to interpret. Is there a way to indicate which neuron "votes" for which percept?

      This is an excellent idea, and we have added now the purported perceptual relation of each cell in the diagram.

      p. 12 (L. 8). "classically assumed" This statement could benefit from a citation. Or maybe "classically" is not the right word?

      We have changed 'classically' to 'typically', and now cite classical works from Deutsch and Repp. We think this description makes sense, as the whole concept of bistable percepts has been interpreted as being equidistant (in added or subtracted semitone steps) from the first tone, see e.g. Repp 1997, Fig.2.

      p. 12 (L. 12) "...previous studies" of Shepard tone percepts? Of physiology?

      We have modified it to 'Relation to previous studies of Shepard tone percepts and their underlying physiology', since this section deals with both.

      p. 12 (L. 25) "compatible with cellular mechanisms..." This paragraph seems key to the study and to Major Concern 1, above. What are the dynamics of the task stimuli? How do they compare with the dynamics of neural FM tuning and previously reported studies of bias? And can the authors be more explicit in their interpretation - should direction selective neurons respond preferentially to the Shepard tone stimuli themselves? And/or is there a conceptual framework where the same neurons inform downstream percepts of both FM sweeps and both normal (unbiased) and biased Shepard tones?

      The reviewer raises a number of different questions, which we address below:

      - Dynamics of the task stimuli in relation to previously reported cellular biasing: The timescales tested in the studies mentioned are similar to what we used in our bias, e.g. Ye et al 2010 used FM sweeps that lasted for up to 200ms, which is quite comparable to our SOA of 150ms.

      - Preferred responses to Shepard tones: no, we do not think that there should be preferred responses to Shepard tones, but rather that responses to Shepard tones can be thought of as the combined responses to the constituent tones.

      - Conceptual framework where the same neurons inform about FM sweeps and both normal (unbiased) and biased Shepard tones: Our perspective on this question is as follows: To our knowledge, the classical approach to population decoding in the auditory system, i.e. weighted based on preferred frequency, has not been directly demonstrated to be read out inside the brain, and certainly not demonstrated to be read out in only this way in all areas of the brain that receive input from the auditory cortex. Rather it has achieved its credibility by being linked directly with animal performance or match with the presented stimuli. However, these approaches were usually geared towards a representation that can be estimated based on constituent frequencies. Additional response properties of neurons, such as directional selectivity have been documented and analyzed before, however, not been used for explaining the percept. We agree that our use of this cellular response preference in the decoding implicitly assumes that the brain could utilize this as well, however, this seems just as likely or unlikely as the use of the preferred frequency of a neuron. Therefore we do not think that this decoding is any more speculative than the classical decoding. In both cases, subsequent neurons would have to implicitly 'know' the preference of the input neuron, and weigh its input correspondingly.

      We have added all the above considerations to the discussion in an abbreviated form.
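
      To make the contrast with the classical readout concrete, the following is a minimal sketch of a population decoder weighted by preferred pitch class. It is a toy illustration only and not the decoder used in the study: the population size, the von Mises tuning shape, and the adaptation rule (a gain reduction on cells whose preferred pitch class lies near the center of the Bias) are all assumptions introduced here, as are the function names.

```python
import math

N_CELLS = 48      # hypothetical population size
OCTAVE_ST = 12.0  # pitch class is circular with a period of 12 semitones


def angle(st):
    """Map a pitch class in semitones onto an angle on the circle."""
    return 2.0 * math.pi * st / OCTAVE_ST


def tuning(pref_st, stim_st, kappa=4.0):
    """Assumed von Mises tuning on the pitch-class circle, peak response 1."""
    return math.exp(kappa * (math.cos(angle(stim_st - pref_st)) - 1.0))


def adaptation_gain(pref_st, bias_center_st, depth=0.5, kappa=4.0):
    """Assumed local adaptation: cells tuned near the Bias lose up to `depth` of their gain."""
    return 1.0 - depth * tuning(pref_st, bias_center_st, kappa)


def decode(stim_st, bias_center_st=None):
    """Population-vector estimate of pitch class, weighted by preferred pitch class."""
    x = y = 0.0
    for i in range(N_CELLS):
        pref = OCTAVE_ST * i / N_CELLS
        rate = tuning(pref, stim_st)
        if bias_center_st is not None:
            rate *= adaptation_gain(pref, bias_center_st)
        x += rate * math.cos(angle(pref))
        y += rate * math.sin(angle(pref))
    return (math.atan2(y, x) / (2.0 * math.pi) * OCTAVE_ST) % OCTAVE_ST
```

      Under these assumptions, a test tone 3 st above the Bias center decodes above 3 st after adaptation, and a tone 3 st below it decodes further below, i.e. the estimate is pushed away from the adapted region. A direction-aware readout would additionally need a per-cell direction preference, which this sketch deliberately leaves out.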

      p. 15 (L. 15). Is there a citation for the drive system?

      There is no publication, but an old repository, where the files are available, which we cite now: https://code.google.com/archive/p/edds-array-drive/

      p. 16 (L. 24) "position in an octave" It is implied but not explicitly stated that the Shepard tones don't contain the fundamental frequency. Can the authors clarify the relationship between the neural tuning band and the bands of the stimulus. Did a single stimulus band typically fall in a neuron's frequency tuning curve? If not 1, how many?

      Yes, it is correct that the concept of fundamental frequency does not cleanly apply to Shepard tones, because a Shepard tone is composed of octave-spaced pure tones, with the lowest tone placed outside both the hearing range of the animal and the amplitude envelope (across frequencies). Therefore one or more constituent tones of the Shepard tone can fall into the tuning curve of a neuron and contribute to driving the neuron (or inhibiting it, if they fall within an inhibitory region of the tuning curve). The number of constituent tones that fall within the tuning curve depends on the tuning width of the neuron. The distribution of tuning widths to Shepard tones is shown in Fig. S1E, which indicates that many neurons had rather narrow tuning (close to the center), but many were also tuned widely, indicating that they would be stimulated by multiple constituent tones of the Shepard tone. As the tuning bandwidth (Q30: 30dB above threshold) of most cortical neurons in the ferret auditory cortex (see e.g. Bizley et al. Cerebral Cortex, 2005, Fig.12) is below 1, this means that typically not more than 1 tone fell into the tuning curve of a neuron. However, we also observed multimodal tuning curves w.r.t. Shepard tones, which suggests that some neurons were stimulated by 2 or more constituent tones (again consistent with the existence of more broadly tuned neurons, see same citation). We have added this information partly to the manuscript in the caption of Fig. S1E.

      p. 17 (L. 32). "Fig 4" Correct figure ref? This figure appears to be a schematic rather than one displaying data.

      Thanks for pointing this out, changed to Fig. 5.

      p. 18 (L. 25). "assign a pitchclass" Can the authors refer to a figure illustrating this process?

      Added.

      p. 19 (L. 17). Is mu the correct symbol?

      Thanks. We changed it to phi_i, as in the formula above.

      p. 19 (L 19). "convolution" in time? Frequency?

      Thanks for pointing this out, the term convolution was incorrect in this context. We have replaced it by "weighted average" and also adapted and simplified the formula.

      p. 19 (L 25) "SSTRF" this term is introduced before it is defined. Also it appears that "SSTRF" and "STRF" are sometimes interchanged.

      Apologies, we have added the definition, and also checked its usage in each location.

      p. 23 (Fig 2) There is a mismatch between panel labels in the figure and in the legend. Bottom right panel (B3), what does time refer to here?

      Thanks for pointing these out, both fixed.

      p. 24 (L 23) "shifts them away" away from what?

      We have expanded the sentence to: "After the bias, the decoded pitchclass is shifted from their actual pitchclass away from the biased pitchclass range ... "

      p. 25 (L 7) "individual properties" properties of individual subjects?

      Thanks for pointing this out, the corresponding sentence has been clarified and citations added.

      p. 26 (L 20) What is plotted in panel D? The average for all cells? What is n?

      Yes, this is an average over cells, the number of cells has now been added to each panel.

      p. 28 (L 3) How to apply the terms "right" "right" "middle" to the panel is not clear. Generally, this figure is quite dense and difficult to interpret.

      We have changed the caption of Panel A and replaced the location terms with the symbols, which helps to directly relate them to the figure. We have considered different approaches of adding or removing content from the figure to make it less dense, but none of them seemed to help. For lack of better options we have left it in its current form.

      MINOR/TYPOS

      p. 3 (L 1) "Stimulus Specific Adaptation" Capitalization seems unnecessary

      Changed.

      p. 4 (L 14) "Siple"

      Corrected.

      p. 9 (L 10) "an quantitatively"

      Corrected

      p. 9 (L 20) "directional ... direction ... directly ... directional" This is a bit confusing as "direct" seems to mean several different things in its different usages.

      We have gone through these sentences, and we think the terms are now more clearly used, especially since the term 'direction' occurs in several different forms, as it relates to different aspects (cells/percept/hypothesis). Unfortunately, some repetition is necessary to maintain clarity.

      Reviewer #2 (Recommendations For The Authors):

      Detailed critique

      Stimuli

      It would be very useful if the authors could provide demos of their stimuli on a website. Many readers will not be familiar with Shepard tones and the perceptual result of the acoustical descriptions are not intuitive. I ended up coding the stimuli myself to get some intuition for them.

      We have created some sample tones and sequences and uploaded them with the revision as supplementary documents.
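
      For readers who, like the reviewer, want to build some intuition by coding the stimuli, the following is a minimal sketch of a Shepard tone as a sum of octave-spaced pure tones under a fixed spectral envelope. It does not reproduce the uploaded supplementary stimuli: the base frequency, number of octaves, Gaussian envelope and its center and width, the duration, and the sample rate are all illustrative assumptions, and the function names are hypothetical.

```python
import math


def shepard_components(pitch_class_st, base_hz=27.5, n_octaves=9,
                       center_hz=880.0, sigma_oct=1.5):
    """Return (frequency_hz, amplitude) pairs for one Shepard tone.

    All default parameter values are assumptions chosen for illustration.
    """
    pc = pitch_class_st % 12.0  # pitch class wraps around after one octave
    components = []
    for k in range(n_octaves):
        freq = base_hz * 2.0 ** (k + pc / 12.0)  # octave-spaced constituents
        dist_oct = math.log2(freq / center_hz)   # distance from envelope center
        amp = math.exp(-0.5 * (dist_oct / sigma_oct) ** 2)
        components.append((freq, amp))
    return components


def shepard_tone(pitch_class_st, duration_s=0.1, sample_rate=44100, **kwargs):
    """Synthesize the waveform as a list of samples scaled to stay within [-1, 1]."""
    comps = shepard_components(pitch_class_st, **kwargs)
    norm = sum(amp for _, amp in comps)
    n_samples = int(round(duration_s * sample_rate))
    return [
        sum(amp * math.sin(2.0 * math.pi * freq * t / sample_rate)
            for freq, amp in comps) / norm
        for t in range(n_samples)
    ]
```

      In this sketch, shifting the pitch class by 12 st returns exactly the same set of constituents, which is the circularity that makes a 6 st (tritone) step ambiguous in direction.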

      Abstract

      P1 L27 'pitch and...selective cells' - The authors haven't provided sufficient controls to demonstrate that these are "pitch cells" or "selective" to pitch direction. They have only shown that they are sensitive to these properties in their stimuli. Controls would need to be included to ensure that the cells aren't simply responding to one frequency component in the complex sound, for example. This is not really critical to the overall findings, but the claim about pitch "selectivity" is not accurate.

      Fair point. We have removed the word 'selective' in both occurrences.

      Introduction

      P2 L14-17: I do not follow the phonetic example provided. The authors state that the second syllable of /alga/ and /arda/ are physically identical, but how is this possible that ga = da? The acoustics are clearly different. More explanation is needed, or a correction.

      Apologies for the slightly misleading description, it has now been corrected to be in line with the original reference.

      P2,L26-27: Should the two uses of "frequency" be "F0" and "pitch" here? The tones are not separated in frequency by half an octave, but "separated in [F0]" by half an octave, correct? Their frequency ranges are largely overlapping. And the second 'frequency', which refers to the percept, should presumably be "pitch".

      Indeed. This is now corrected.

      P3 L2-6: Unclear at this point in the manuscript what is the difference between the 3 percepts mentioned: perceived pitch-change direction, Shepard tone pitches, and "their respective differences". (It becomes clear later, but clarification is needed here).

      We have tried a few reformulations, however, it tends to overload the introduction with details. We believe it is preferable to present the gist of the results here, and present the complete details later in the MS.

      P3 L6-7 What does it mean that the MEG and single unit results "align in direction and dynamics"? These are very different signals, so clarification is needed.

      We have phrased the corresponding sentence more clearly.

      Results

      Throughout: Choose one of 'pitch class', 'pitchclass', or 'pitch-class' and use it consistently.

      Done.

      P4L12 - would be helpful at this point to define 'repulsive effect'

      We have added another sentence to clarify this term.

      P4, L14 "simple"

      Done

      P4, L12 - not clear here what "repulsive influence" means

      See above.

      P4, L17 - alternative to which explanation? Please clarify. In general, this paragraph is difficult to interpret because we do not yet have the details needed to understand the terms used and the results described. In my opinion, it would be better to omit this summary of the results at the very beginning, and instead reveal the findings as they come, when they can be fully explained to the Reader.

      We agree, but we also believe that a rather general description here is useful for providing a roadmap to the results. However, we have added a half-sentence to clarify what is meant by alternative.

      P4 L30 - text says that cells adapt in their onset, sustained and offset responses, but only data for onset responses are shown (I think - clarification needed for fig 2A2). Supp figure shows only 1 example cell of sustained and offset, and in fact there is no effect of adaptation in the sustained response shown there.

      Regarding the effect of adaptation and whether it can be discerned from the supplementary figure: the shown responses are for 10 repetitions of one particular Bias sequence. Since the response of the cell will depend on its tuning and the specific sequence of the Shepard tones in this Bias, it is not possible to assess adaptation for a given cell. We assess the level of adaptation, by averaging all biases (similar to what is shown in Fig. 2A2) per cell, and then fit an exponential to it, separately by response type. The step direction of the exponential, relative to the spontaneous rate is then used to assess the kind of adaptation. The vast majority of cells show adaptation. We have added this information to the Methods of the manuscript.
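
      The procedure described above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the analysis code of the study: the input is taken to be a per-cell response averaged over all Bias sequences at each position in the sequence, the exponential is fitted by a simple grid search over the time constant, and a cell is labeled as adapting when the fitted response moves toward the spontaneous rate. The classification rule, the grid of time constants, and the function names are all assumptions introduced here.

```python
import math


def fit_exponential(rates, taus=None):
    """Fit rates[n] ~ offset + amp * exp(-n / tau), n = position in the Bias.

    tau is found by grid search; offset and amp by closed-form least squares.
    """
    n = len(rates)
    if taus is None:
        taus = [0.25 * k for k in range(1, 81)]  # assumed grid, in stimulus positions
    mean_r = sum(rates) / n
    best = None
    for tau in taus:
        basis = [math.exp(-i / tau) for i in range(n)]
        mean_b = sum(basis) / n
        var_b = sum((b - mean_b) ** 2 for b in basis)
        cov = sum((b - mean_b) * (r - mean_r) for b, r in zip(basis, rates))
        amp = cov / var_b
        offset = mean_r - amp * mean_b
        sse = sum((offset + amp * b - r) ** 2 for b, r in zip(basis, rates))
        if best is None or sse < best[0]:
            best = (sse, offset, amp, tau)
    _, offset, amp, tau = best
    return offset, amp, tau


def is_adapting(rates, spontaneous_rate):
    """Assumed rule: adapting if the fit ends closer to the spontaneous rate than it starts."""
    offset, amp, _ = fit_exponential(rates)
    start = offset + amp  # fitted response at the first Bias position
    return abs(start - spontaneous_rate) > abs(offset - spontaneous_rate)
```

      The same function would be applied separately to onset, sustained, and offset responses, each with its own averaged response vector.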

      P4, L32 - please state the statistical test and criterion (alpha) used to determine that 91% of cells decreased their responses throughout the Bias sequence. Was this specifically for onset responses?

      Thanks for pointing this out, test and p-value added. Adaptation was observed for onset, sustained and offset responses, in all cases with the vast majority showing an adapting behavior, although the onset responses were adapting the most.

      P4 L36 - "response strength is reduced locally". What does "locally" mean here? Nearby frequencies?

      We have added a sentence here to clarify this question.

      Figure 1 - this appears to be the wrong version of the figure, as it doesn't match the caption or results text. It's not possible to assess this figure until these things are fixed. Figure 1A schematic of definition of f(diff) does not correspond to legend definition.

      As far as we can tell, it is all correct, only the resolution of the figure appears to be rather low. This has been improved now.

      Fig 2 A2 - is this also onset responses only?

      Yes, added to the caption.

      Fig 2 A3 - add y-axis label. The authors are comparing a very wide octave band (5.5 octaves) to a much narrower band (0.5 octaves). Could this matter? Is there something special about the cut-off of 2.5 octaves in the 2 bands, or was this an arbitrary choice?

      Interesting question. Essentially, our stimulus design left us only with this choice, i.e. comparing the internal region of the bias with the boundary region of the bias, i.e. the test tones. The internal region just corresponds to the bias, which is 5 st wide, and therefore the range is here given as 2.5 st relative to its center, while the test tones are at the boundary, as they are 3 st from the center. The axis for the bias was mislabelled, and has now been corrected. The y-axis label is the same as in the panel to the left, but has now been added to avoid any confusion.

      Fig 2A4 - does not refer to ferret single unit data, as stated in the text (p5L8). Nor does supp Fig2, as stated. Also, the figure caption does not match the figure.

      Apologies, this was an error in the code that led to this mislabelling. We have corrected the labels, which also added back the recovery from the Bias sequence in the new Panel A4.

      P5 l9 - Figure 3 is not understandable at this point in the text, and should not be referred to here. There is a lot going on in Fig 3, and it isn't clear what you are referring to.

      Removed.

      P5 L12 - by Fig 2 B1, I assume you mean A4? Also, F2B1 shows only 1 subject, not 2.

      Yes, mislabeled by mistake, and corrected now.

      Fig2B2 -What is the y-axis?

      Same as in the panel to its left, added for clarity.

      Stimuli: why are tones presented at a faster rate to ferrets than to humans?

      The main reason is that the response analysis in MEG requires more spacing in time than the neuronal analysis in the ferret brain.

      P5 L6 - there is no Fig 5 D2? I don't think it is a good idea to get the reader to skip so far ahead in the figures at this stage anyway, even if such a figure existed. It is confusing to jump around the manuscript

      Changed to 'see below'

      P5 L8 - There is no Figure 2A4, so I don't know whether this time constant is accurate.

      This was in reference to a panel that had been removed before, but we have added it back now.

      P5 L16: "in humans appears to be more substantial (40%) than for the average single units under awake conditions". One cannot directly compare magnitude of effects in MEG and single unit signals in this way and assume it is due to behavioural state. You are comparing different measures of neural activity, averaged over vastly different numbers of neurons, and recorded from different species listening to different stimuli (presentation rates).

      Yes, that's why the next sentence is: "However, comparisons between the level of adaptation in MEG and single neuron firing rates may be misleading, due to the differences in the signal measured and subsequent processing.", and all statements in the preceding sentences are phrased as 'appears' and 'may'. We think we have formulated this comparison with an appropriate level of uncertainty. Further, the main message here is that adaptation is taking place in both active and passive conditions.

      P5 L25 -I do not see any evidence regarding tuning widths in Fig s2, as stated in the text.

      Corrected to Fig. S1.

      P5 l26 - Do not skip ahead to Fig 5 here. We aren't ready to process that yet.

      OK, reference removed.

      P5 l27 - Do you mean because it could be tuning to pitch chroma, not height?

      Yes, that is a possible interpretation, although it could also arise from a combination of excitatory and inhibitory contributions across multiple octaves.

      P5 l33 - remove speculation about active vs passive for reasons given above.

      Removed.

      P6L2-6 'In the present...5 semitone step' - This is an incorrect interpretation of the minimal distance hypothesis in the context of the Shepard tone ambiguity. The percept is ambiguous because the 'true' F0 of the Shepard tones are imperceptibly low. Each constituent frequency of a single tone can therefore be perceived either as a harmonic of some lower fundamental frequency or as an independent tone. The dominant pitch of the second tone in the tritone pair may therefore be biased to be perceived at a lower constituent frequency (when the bias sequence is low) or at a higher constituent frequency (when the bias sequence is high). The text states that the minimal distance hypothesis would predict that an up-bias would make a tritone into a perfect fourth (5 semitones). This is incorrect. The MDH would predict that an up-bias would reduce the distance between the 1st tone in the ambiguous pair and the upper constituent frequency of the 2nd tone in the pair, hence making the upper constituent frequency the dominant pitch percept of the 2nd tone, causing an ascending percept.

      The reviewer here refers to a “minimal distance hypothesis”, which, without a literature reference, is hard for us to fully interpret. However, some responses are given below:

      - "The percept is ambiguous because the 'true' F0 of the Shepard tones are imperceptibly low." This statement appears to be based on a misconception: due to the octave spacing (rather than multiples/harmonics of a lowest frequency), the Shepard tones cannot be interpreted as usual harmonic tones would be. It is correct that the lowest tone in a Shepard tone is not audible, due to the envelope and the fact that it could in principle be arbitrarily small. Hence, speaking about an F0 is not well defined in the case of a Shepard tone. The closest one could get to it would be to refer to the Shepard tone that is both in the audible range and in the non-zero amplitude envelope. But again, since the envelope is fading out the highest and lowest constituent tones, it is not as easy to refer to the lowest one as F0 (as it might be much quieter than the next higher constituent).

      - "The dominant pitch of the second tone in the tritone pair may therefore be biased to be perceived at a lower constituent frequency (when the bias sequence is low) or at a higher constituent frequency (when the bias sequence is high)." This may relate to some known psychophysics, but we are unable to interpret it with certainty.

      - "The text states that the minimal distance hypothesis would predict that an up-bias would make a tritone into a perfect fourth (5 semitones). This is incorrect." We are unsure how the reviewer reaches this conclusion.

      - "The MDH would predict that an up-bias would reduce the distance between the 1st tone in the ambiguous pair and the upper constituent frequency of the 2nd tone in the pair, hence making the upper constituent frequency the dominant pitch percept of the 2nd tone, causing an ascending percept." Again, in the absence of a reference to the MDH, we are unsure of the implied rationale. We agree that this is a possible interpretation of distance, however, we believe that our interpretation of distance (i.e. distances between constituent tones) is also a possible interpretation.

      Fig 4: Given that it comes before Figure 3 in the results text, these should be switched in order in the paper.

      Switched.

      PCA decoder: The methods (p18) state that the PCA uses the first 3 dimensions, and that pitch classes are calculated from the closest 4 stimuli. The results (P6), however, state that the first 2 principal components are used, and classes are computed from the average of 10 adjacent points. Which is correct, or am I missing something?

      Thanks for pointing this out. We have made this more concrete in the Methods: "The data were projected to the first three dimensions, which represented the pitch class as well as the position in the sequence of stimuli (see Fig. 4A for a schematic). As the position in the Bias sequence was not relevant for the subsequent pitch class decoding, we only focussed on the two dimensions that spanned the pitch circle." Regarding the number of stimuli that were averaged, this might be a slight misunderstanding: each Shepard tone was decoded/projected without averaging. However, to then assign an estimated pitch class, we first had to establish an axis (here going around the circle), where each position along the axis was associated with a pitch class. This was done by stepping in 0.5 semitone steps, and finding the location in decoded space that corresponded to the median of the Shepard tones within +/- 0.25st. To increase the resolution, this circular 'axis' of 24 points was then linearly interpolated to a resolution of 0.05st. We have updated the text in the Methods accordingly. The mention of 10 points for averaging in the Results was correct, as there were 240 tones in all bias stimuli, and 24 bins in the pitch circle. The mention of an average over 4 tones in the Methods was a typo.
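
      The axis construction described in this response can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`build_axis`, `decode_pitch_class`), the nearest-point rule used to read out a pitch class, and the synthetic unit-circle data are all hypothetical; only the 0.5 st step, the +/- 0.25 st window, the median, and the 0.05 st interpolation come from the text.

```python
import math
from statistics import median

OCTAVE = 12.0  # pitch classes live on a 12-semitone circle


def circ_dist(a, b):
    """Shortest distance between two pitch classes on the circle (in st)."""
    d = abs(a - b) % OCTAVE
    return min(d, OCTAVE - d)


def build_axis(pitch_classes, points, step=0.5, half_width=0.25, fine=0.05):
    """Build a circular pitch-class axis in the 2-D decoded space.

    pitch_classes: known pitch class (st) of each tone.
    points: (x, y) position of each tone in the two retained dimensions.
    Returns a list of (pitch_class, x, y) tuples at `fine` resolution.
    """
    n_bins = round(OCTAVE / step)
    anchors = []
    for k in range(n_bins):
        center = k * step
        members = [p for pc, p in zip(pitch_classes, points)
                   if circ_dist(pc, center) <= half_width]
        if not members:
            raise ValueError(f"no tones within {half_width} st of {center} st")
        # Coordinate-wise median of the tones falling in this bin.
        anchors.append((median(x for x, _ in members),
                        median(y for _, y in members)))
    # Linear interpolation between neighbouring anchors, closing the circle.
    per_bin = round(step / fine)
    axis = []
    for k in range(n_bins):
        (x0, y0), (x1, y1) = anchors[k], anchors[(k + 1) % n_bins]
        for j in range(per_bin):
            t = j / per_bin
            axis.append(((k * step + j * fine) % OCTAVE,
                         x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return axis


def decode_pitch_class(point, axis):
    """Assumed read-out: pitch class of the nearest point on the axis."""
    x, y = point
    return min(axis, key=lambda a: (a[1] - x) ** 2 + (a[2] - y) ** 2)[0]


# Synthetic stand-in for projected responses: 240 tones on a unit circle.
tone_pcs = [i / 20 for i in range(240)]
tone_points = [(math.cos(2 * math.pi * pc / OCTAVE),
                math.sin(2 * math.pi * pc / OCTAVE)) for pc in tone_pcs]
axis = build_axis(tone_pcs, tone_points)
probe = (math.cos(2 * math.pi * 4.0 / OCTAVE),
         math.sin(2 * math.pi * 4.0 / OCTAVE))
decoded = decode_pitch_class(probe, axis)
```

      With 24 bins and 10 interpolation steps per bin, the axis has 240 entries, so a decoded value can be reported at 0.05 st resolution even though the anchors are spaced 0.5 st apart.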

      Fig 3A: axes of pink plane should be PC not PCA

      Done.

      Fig 3B: the circularity in the distribution of these points is indeed interesting! But what do the authors make of the gap in the circle between semitones 6-7? Is this showing an inherent bias in the way the ambiguous tone is represented?

      While we cannot be certain, we think that this represents an inhomogeneous sampling from the overall set of neural tuning preferences, and that if we had recorded more/all neurons, the circle would be complete and uniformly sampled (which it already nearly is, see Fig.4C, which used to be Fig. 3C).

      Fig 3B (lesser note): It'd be preferable to replace the tint (bright vs. dark) differentiation of the triangles to be filled vs. unfilled because such a subtle change in tint is not easily differentiable from a change in hue (indicating a different variable in this plot) with this particular colour palette

      We have experimented with this suggestion, and it didn't seem to improve the clarity. However, we have changed the outline of the test-pair triangles to white, which now visually separates them better.

      P6 l32 - Please indicate if cross-validation was used in this decoder, and if so, what sort. Ideally, the authors would test on a held-out data set, or at least take a leave-one-out approach. Otherwise, the classifier may be overfit to the data, and overfitting would explain the exceptional performance (r=.995) of the classifier.

      Cross-validation was not used, as the purpose of the decoder is here to create a standard against which to compare the biased responses in the ambiguous pair, which were not used for training of the decoder. We agree that if we instead used a cross-validated decoder (which would only apply to the local average to establish the pitch class circle) the correlation would be somewhat lower, however, this is less relevant for the main question, i.e. the influence of the Bias sequence on the neural representation of the ambiguous pair. We have added this information to the corresponding section.

      Fig 3D: I understood that these pitch classifications shown by the triangles were carried out on the final ambiguous pair of stimuli. I thought these were always presented at the edges of the range of other stimuli, so I do not follow how they have so many different pitchclass values on the x-axis here.

      There were 4 Biases, centered at 0,3,6 or 9 semitones, and covering [-2.5,2.5]st relative to this center. Therefore the edges of the bias ranges (3st away from their centers) happen to be the same as the centers, e.g. for the Bias centered at 3, the ambiguous pair would be a 0-6 or 6-0 step. Therefore there are 4 locations for the ambiguous tones on the x-axis of Fig. 4D (previously 3D).
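
      The arithmetic behind the four x-axis locations can be checked with a short sketch. The constant and function names are illustrative; the centers, the Bias range, and the 3 st offset are taken from the response above.

```python
OCTAVE = 12.0                          # semitones per octave (pitch-class circle)
BIAS_CENTERS = [0.0, 3.0, 6.0, 9.0]    # the four Bias centers, in st
BIAS_HALF_WIDTH = 2.5                  # Bias covers [-2.5, 2.5] st around its center
PAIR_OFFSET = 3.0                      # ambiguous tones sit 3 st below/above the center


def ambiguous_pair(center):
    """Pitch classes of the two tones of the ambiguous pair for one Bias."""
    low = (center - PAIR_OFFSET) % OCTAVE
    high = (center + PAIR_OFFSET) % OCTAVE
    return low, high


pairs = {c: ambiguous_pair(c) for c in BIAS_CENTERS}
# Distinct pitch classes at which an ambiguous tone can occur.
locations = sorted({pc for pair in pairs.values() for pc in pair})
```

      Every pair spans a tritone (6 st), and because the centers are themselves 3 st apart, the pair locations of one Bias coincide with the centers of the neighboring Biases, leaving only four distinct positions.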

      Figure 4: This demonstration of the ambiguity of Shepard pairs may be misleading. The actual musical interval is never ambiguous, as this figure suggests. Only the ascending vs descending percept is ambiguous. Therefore the predictions of the ferret A1 decoding (Fig 3D) and the model in Fig 5 are inconsistent with perception in two ways. One (which the authors mention) is the direction of the bias shift (up vs down). Another (not mentioned here) is that one never experiences a shift in the shepard tone at a fraction of a semitone - the musical note stays the same, and changes only in pitch height, not pitch chroma.

      We are unsure of the reviewer’s direction with this question. In particular the second point is not clear to us: "...one (who?) never (in this experiment? in real life?) experiences a bias shift in the Shepard tone at a fraction of a semitone" (why is this relevant in the current experiment?). Pitch chroma would actually be a possible replacement for pitch class, but the previous Shepard tone literature has referred to it as pitch class.

      P7 l12 - omit one 'consequently'

      Changed to 'Therefore'.

      P7 l24 - I encourage the authors to not use "local" and "global" without making it clear what space they refer to. One tends to automatically think of frequency space in the auditory system, but I think here they mean f0 space? What is a "cell close to the location of the bias"? Cells reside in the brain. The bias is in f0 space. The use of "local" and "global" throughout the manuscript is too vague.

      Agreed, the reference here was actually to the cell's preferred pitch class, not its physical location (which one might arguably be able to disambiguate, given the context). We have changed the wording, and also checked the use of global/local throughout the manuscript. The main use of 'global/local' is now in reference to the range of adaptation, and is properly introduced on first mention.

      P7 L26 -there is no Fig 5D1. Do you mean the left panel of 5D?

      Thanks. Changed.

      FigS3 is referred to a lot on p7-8. Should this be moved to the main text?

      The main reason why we kept it in the supplement is that it is based on a more static model, which is intended to illustrate the consequences of different encoding schemes. In order to not confuse the reader about these two models, we prefer to keep it in the supplement, which - for an online journal - makes little difference since the reader can just jump ahead to this figure in the same way as any other figure.

      Fig 5C, D - label x-axis.

      Added.

      Fig 5E - axis labels needed. I don't know what is plotted on x and y, and cannot see red and green lines in left plot

      Thanks for noticing this, colors corrected, axes labeled.

      Page 8 L3-15 - If I follow this correctly, I think the authors are confusing pitch and frequency here in a way that is fundamental to their model. They seem to equate tonotopic frequency tuning to pitch tuning, leading to confused implications of frequency adaptation on the F0 representation of complex sounds like Shepard tones. To my knowledge, the authors do not examine pure tone frequency tuning in their neurons in this study. Please clarify how you propose that frequency tuning like that shown in Fig 5A relates to representation of the F0 of Shepard tones. Or...are the authors suggesting these neural effects have little to do with pitch processing and instead are just the result of frequency tuning for a single harmonic of the Shepard tones?

      We agree that it is not trivial to describe this well, while keeping the text uncluttered, in particular, because often tuning properties to stimulus frequency contribute to tuning properties of the same neuron for pitch class, although this can be more or less straightforward: specifically, for some narrowly tuned cells, the Shepard tuning is simply a reflection of their tuning to a single octave range of the constituent tones (see Fig. S1). For more broadly tuned cells, multiple constituent tones will contribute to the overall Shepard tuning, which can be additive, subtractive, or more complex. The assumption in our approach is that we can directly estimate the Shepard tuning to evaluate the consequence for the percept. While this may seem artificial, as Shepard tones do not typically occur in nature, the same argument could be made against pure tones, on which classical tuning curves and associated decodings are often based. Relating the Shepard tuning to the classical tuning would be an interesting study in itself, although arguably relating the tuning of one artificial stimulus to another. Regarding the terminology of pitch, pitch class and frequency: The term pitch class is commonly used in the field of Shepard tones, and - as we indicated in the beginning of the results: "the term pitch is used interchangeably with pitch class as only Shepard tones are considered in this study". We agree that the term pitch, which describes the perceptual convergence/construction of a tone-height from a range of possible physical stimuli, needs to be separated from frequency as one contributor/basis for the perception of a pitch. However, we think that the term pitch can - despite its perceptual origin - also be associated with neuron/neural responses, in order to investigate the neural origin of the pitch percept. 
      At the same time, the present study is not targeted to study pitch encoding per se, as this would require the use of a variety of stimuli leading to consistent pitch percepts. Therefore, pitch (class) is here mainly used as a term to describe the neural responses to Shepard tones, based on the previous literature, and the fact that Shepard tones are composite stimuli that lead to a pitch percept. The last sentence has been added to the manuscript for clarity.

      P7-9: I wasn't left with a clear idea of how the model works from this text. I assume you have layers of neurons tuned to frequency or f0 (based on the real data?), which are connected in some way to produce some sort of output when you input a sound? More detail is needed here. How is the dynamic adaptation implemented?

      The detailed description of the model can be found in the Methods section. We have gone through the corresponding paragraph and have tried to clarify the description of the model by introducing a high-level description and the reference to the corresponding Figure (Fig. 5A) in the Results.

      Fig6A: Figure caption can't be correct. In any case, these equations cannot be understood unless you define the terms in them.

      We have clarified the description in the caption.

      Fig 6/directionality analysis: Assuming that the "F" in the STRFs here is Shepard tone f0, and not simple frequency?

      We have changed the formula in the caption and the axis labels now.

      Fig 6C - y-axis values

      In the submission, these values were left out on purpose, as the result has an arbitrary scale, but only whether it is larger or smaller than 0 counts for the evaluation of the decoded directionality (at the current level of granularity). An interesting refinement would be to relate the decoded values to animal performance. We have now scaled the values arbitrarily to fit within [-1,1], but we would like to emphasize that only their relative scale matters here, not their absolute scale.

      Fig 6E - can't both be abscissa (caption). I might be missing something here, but I don't see the "two stripes" in the data that are described in the caption.

      Thank you. The typo is fixed. The stripes are most clearly visible in the right panel of Fig. 6E, red and blue, diagonally from top left to bottom right.

      Fig 6G -I have no idea what this figure is illustrating.

      This panel is described in the text as follows: "The resulting distribution of activities in their relation to the Bias is, hence, symmetric around the Bias (Fig. 6G). Without prior stimulation, the population of cells is unadapted and thus exhibits balanced activity in response to a stimulus. After a sequence of stimuli, the population is partially adapted (Fig. 6G right), such that a subsequent stimulus now elicits an imbalanced activity. Translated concretely to the present paradigm, the Bias will locally adapt cells. The degree of adaptation will be stronger, if their tuning curve overlaps more with the biased region. Adaptation in this region should therefore most strongly influence a cell’s response. For example, if one considers two directional cells, an up- and a down-selective cell, cocentered in the same frequency location below the Bias, then the Bias will more strongly adapt the up-cell, which has its dominant, recent part of the SSTRF more inside the region of the Bias (Fig. 6G right). Consistent with the percept, this imbalance predicts the tone to be perceived as a descending step relative to the Bias. Conversely, for the second stimulus in the pair, located above the Bias, the down-selective cells will be more adapted, thus predicting an ascending step relative to the previous tone."

      I might be just confused or losing steam at this point, but I do not follow what has been done or the results in Fig 6 and the accompanying text very well at all. Can this be explained more clearly? Perhaps the authors could show spike rate responses of an example up-direction and down-direction neuron? Explain how the decoder works, not just the results of it.

      We agree that we are presenting something new here. However, it is conceptually not very different from decoding based on preferred frequencies. We have attempted to provide two illustrations of how the decoder works (Fig. 6A) and how it then leads to the percept using prototypical examples of cellular SSTRFs (Fig. 6G). We have added a complete, but accessible description to the Methods section. Showing firing rates of neurons would unfortunately not be very telling, given the usual variability in neural response and the fact that our paradigm did not have a lot of repetitions (but instead a lot of conditions), which would be able to average out the variability on a single neuron level.

      Discussion - I do not feel I can adequately critique the author's interpretation of the results until I understand their results and methods better. I will therefore save my critique of the discussion section for the next round of revisions after they have addressed the above issues of disorganization and clarity in the manuscript.

      We hope that the updated version of the manuscript provides the reviewer now with this possibility.

      Methods

      P15L7 - gender of human subjects? Age distribution? Age of ferrets?

      We have added this information.

      P16L21 - What is the justification for randomizing the phase of the constituent frequencies?

      The purpose of the randomization was to prevent idiosyncratic phase relationships for particular Shepard tones, which would depend in an orderly fashion on the included base-frequencies if non-randomized, and could have contributed to shaping the percept for each Shepard tone in a way that was only partly determined by the pitch class of the Shepard tone. Added to the section.
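
      A minimal sketch of a Shepard tone with randomized constituent phases is given below to illustrate this point. It is not the stimulus code used in the study: the base frequency, number of octaves, Gaussian log-frequency envelope, sampling rate, and duration are all assumed values chosen for illustration.

```python
import math
import random


def shepard_tone(pitch_class_st, duration_s=0.1, fs=16000, base_hz=27.5,
                 n_octaves=8, env_center_oct=4.0, env_sigma_oct=1.0, rng=None):
    """Return samples of one Shepard tone (all parameter values are assumptions).

    Constituents are spaced one octave apart; each gets an independent
    random starting phase so that no fixed phase relationship depends
    on the pitch class.
    """
    rng = rng or random.Random()
    lowest = base_hz * 2 ** (pitch_class_st / 12.0)
    components = []
    for k in range(n_octaves):
        freq = lowest * 2 ** k
        if freq >= fs / 2:          # stay below the Nyquist frequency
            break
        octs = math.log2(freq / base_hz)
        # Envelope fades out the lowest and highest constituents.
        amp = math.exp(-0.5 * ((octs - env_center_oct) / env_sigma_oct) ** 2)
        phase = rng.uniform(0.0, 2.0 * math.pi)   # randomized phase
        components.append((freq, amp, phase))
    n_samples = round(duration_s * fs)
    return [sum(a * math.sin(2.0 * math.pi * f * n / fs + ph)
                for f, a, ph in components)
            for n in range(n_samples)]


tone = shepard_tone(3.0, rng=random.Random(0))
```

      Passing a seeded `random.Random` makes a given tone reproducible, while different seeds yield different waveforms for the same pitch class.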

      P17L6 - what are the 2 randomizations? What is being randomized?

      Pitch classes and position in the Bias sequence. Added to the section.

      P16 Shepard Tuning section - What were the durations of the tones and the time between tones within a trial?

      Thanks, added!

      Equations - several undefined terms in the equations throughout the manuscript.

      Thanks. We have gone through the manuscript and all equations and have introduced additional definitions where they had been missing.

      Reviewer #3 (Recommendations For The Authors):

      P3L10: "passive" and "active" conditions come totally out of the blue. Need introducing first. (Or cut. If adaptation is always seen, why mention the two conditions if the difference is not relevant here?)

      We have added an additional sentence in the preceding paragraph, that should clarify this. The reason for mentioning it is that otherwise a possible counter-argument could be made that adaptation does not occur in the active condition, which was not tested in ferrets (but presents an interesting avenue for future research).

      P3L14 "siple" typo

      Corrected.

      P4L1 "behaving humans" you should elaborate just a little here on what sort of behavior the participants engaged in.

      Thanks for pointing this out. We have clarified this by adding an additional sentence directly thereafter.

      P4 adaptation: I wonder whether it would be useful to describe the Bias condition a bit more here before going into the observations. The reader cannot know what to expect unless they jump ahead to get a sense of what the Bias looks like in the sense of how many stimuli are in it, and how similar they are to each other. Observations such as "the average response strength decreases as a function of the position in the Bias sequence" are entirely expected if the Bias is made up of highly repetitive material, but less expected if it is not. I appreciate that it can be awkward to have Methods after Results, but with a format like that, the broad brushstroke Methods should really be incorporated into the Results and only the tedious details should be reserved for the Methods to avoid readers having to jump back and forth.

      Agreed, we have inserted a corresponding description before going into the details of the results.

      Related to this (perhaps): Bottom of P4, top of P5: "significantly less reduced (33%, p=0.0011, 2 group t-test) compared to within the bias (Fig. 2 A3, blue vs. red), relative to the first responses of the bias" ... I am at a loss as to what the red and blue symbols in Fig 2 A3 really show, and I wonder whether the "at the edges" to "within the Bias" comparison were to make sense if at this stage I had been told more about the composition of the Bias sequence. Do the ambiguous ('target') tones also occur within the Bias? As I am unclear about what is compared against what I am also not sure how sound that comparison is.

      We have added an extended description of the Bias to the beginning of this section of the manuscript. For your reference: the Shepard tones that made up the ambiguous tones were not part of the Bias sequence, as they are located at 3st distance from the center of the Bias (above and below), while the Bias has a range of only +/- 2.5st.

      Fig 2: A4 B1 B2 labels should be B1 B2 B3

      Corrected.

      Fig 2 A2, A3: consider adjusting y-axis range to have less empty space above the data. In A3 in particular, the "interesting bit" is quite compressed.

      Done, however, while still matching the axes of A2 and A3 for better comparability.

      I am under the strong impression that the human data only made it into Fig 2 and that the data from Fig 3 onwards are animal data only. That is of course fine (MEG may not give responses that are differentiated enough to perform the sort of analyses shown in the later figures). But I do think that somewhere this should be explicitly stated.

      Yes, the reviewer's observation is correct. The decoding analyses could not be conducted on the human MEG data and were therefore not further pursued. The inclusion of the MEG data in the paper has the purpose of demonstrating that even in humans and active conditions, the local adaptation is present, which is a key contributor to the two decoding models. We now state this explicitly when starting the decoding analysis.

      P5L2 "bias" not capitalized. Be consistent.

      All changed to capitalized.

      P5L8 reference to Fig 2 A4: something is amiss here. From legend of Fig 2 it seems clear that panel A4 label is mislabeled B1. Maybe some panels are missing to show recovery rates?

      Apologies for this residual text from a previous version of the manuscript. We have gone through all references and corrected them.

      P6L7 comma after "decoding".

      Changed.

      Fig 3, I like this analysis. What would be useful / needed here though is a little bit more information about how the data were preprocessed and pooled over animals. Did you do the PCA separately for each animal, then combine, or pool all units into a big matrix that went into the PCA? What about repeat presentations? Was every trial a row in the matrix, or was there some averaging over repeats? (In fact, were there repeats??)

      Thanks for bringing up these relevant aspects, which were partly insufficiently detailed in the manuscript. Briefly, cells were pooled across animals and we only used cells that could meaningfully contribute to the decoding analysis, i.e. had auditory responses and different responses to different Shepard tones. Regarding the responses, as stated in the Methods, "Each stimulus was repeated 10 times", and we computed average responses across these repetitions. Single trials were not analyzed separately. We have added this information in the Methods, and refer to it in the Results.

      Also, there doesn't appear to be a preselection of units. We would not necessarily expect all cortical neurons to have a meaningful "best pitch" as they may be coding for things other than pitch. Intuitively I suspect that, perhaps, the PCA may take care of that by simply not assigning much weight to units that don't contribute much to explained variance? In any event I think it should be possible, and would be of some interest, to pull out of this dataset some descriptive statistics on what proportion of units actually "care about pitch" in that they have a lot (or at least significantly more than zero) of response variance explained by pitch. Would it make sense to show a distribution of %VE by pitch? Would it make sense to only perform the analysis in Fig 3 on units that meet some criterion? Doing so is unlikely to change the conclusion, but I think it may be useful for other scientists who may want to build on this work to get a sense of how much VE_pitch to expect.

      We fully agree with the reviewer, which is why this information is already presented in Supplementary Fig. 1, which details the tuning properties of the recorded neurons. Overall, we recorded from 1467 neurons across all ferrets, out of which 662 were selected for the decoding analysis based on their driven firing rate (i.e. whether they responded significantly to auditory stimulation) and whether they showed a differential response to different Shepard tones. The thresholds for auditory response and tuning to Shepard tones were not very critical: setting the thresholds low led to quantitatively the same result, although with more noise. Setting the thresholds very high reduced the set of cells included in the analysis, and eventually made the results less stable, as the cells did not cover the entire range of preferences to Shepard tones. We agree that the PCA-based preprocessing would also automatically exclude many of the cells that were already excluded with the more concrete criteria beforehand. We have added further information on this issue in the Methods section under the heading 'Unit selection'.

      P9 "tones This" missing period.

      Changed.

      P10L17 comma after "analysis"

      Changed.

    1. eLife Assessment

      This revised version of the study is important, showing that age-related gut microbiota modulate uric acid metabolism through the NLRP3 inflammasome pathway and thereby regulate susceptibility to age-related gout. Several experimental approaches (mechanistic insights) and methods (data quality) are still incomplete. If strengthened, this paper would be of broad interest to researchers working on gout and the microbiota.

    2. Reviewer #2 (Public review):

      Summary:

      The revised manuscript presents interesting findings on the role of gut microbiota in gout, focusing on the interplay between age-related changes, inflammation, and microbiota-derived metabolites, particularly butyrate. The study provides valuable insights into the therapeutic potential of microbiota interventions and metabolites for managing hyperuricemia and gout. While the authors have addressed many of the previous concerns, a few areas still require clarification and improvements to strengthen the manuscript's clarity and overall impact.

      (1) While the authors mention that outliers in the data do not affect the conclusions, there remains a concern about the reliability of some figures (e.g., Figure 2D-G). It is recommended to provide a more detailed explanation of the statistical analysis used to handle outliers. Additionally, the clarity of the Western blot images, particularly IL-1β in Figure 3C, should be improved to ensure clear and supportive evidence for the conclusions.

      (2) The manuscript raises a key question about why butyrate supplementation and FMT have different effects on uric acid metabolism and excretion. While the authors have addressed this by highlighting the involvement of multiple bacterial genera, it is still recommended to expand on the differences between these interventions in the discussion, providing more mechanistic insights based on available literature.

      (3) It is noted that IL-6 and TNF-α results in foot tissue were requested and have been added to supplementary material. However, the main text should clearly reference these additions, and the supplementary figures should be thoroughly reviewed for consistency with the main findings. The use of abbreviations (e.g., ns for no significant difference) and labeling should also be carefully checked across all figures.

      (4) The manuscript presents butyrate as a key molecule in gout therapy, yet there are lingering concerns about its central role, especially given that other short-chain fatty acids (e.g., acetic and propionic acids) also follow similar trends. The authors should consider further acknowledging these other SCFAs and discussing their potential contribution to gout management. Additionally, the rationale for focusing primarily on butyrate in subsequent research should be made clearer.

      (5) The full-length uncropped Western blot images should be provided as requested, to ensure transparency and reproducibility of the data.

      (6) Despite the authors' revisions, several references still lack page numbers. Please ensure that all references are properly formatted, including complete page ranges.

      The manuscript has improved with the revisions made, particularly regarding clarifications on experimental design and the inclusion of supplementary data. However, some concerns about data quality, mechanistic insights, and clarity in the figures remain. Addressing these points will enhance the overall impact of the work and its potential contribution to the understanding of the gut microbiome in gout and hyperuricemia. A final revision, with careful attention to both major and minor points, is highly recommended before resubmission.

    3. Reviewer #1 (Public review):

      Summary:

      In their manuscript the authors report that fecal transplantation from young mice into old mice alleviates susceptibility to gout. The gut microbiota in young mice is found to inhibit activation of the NLRP3 inflammasome pathway and reduce uric acid levels in the blood in the gout model.

      Strengths:

      They focused on the butanoate metabolism pathway based on the results of metabolomics analysis after fecal transplantation and identified butyrate as the key factor in mitigating gout susceptibility. In general, this is a well-performed study.

      Weaknesses:

      The discussion on the current results and previous studies regarding the effect of butyrate on gout symptoms is insufficient. The authors need to provide a more thorough discussion of other possible mechanisms and relevant literature.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #3 (Public Review):

      Some critical comments are provided below:

      (1) The data quality still needs to be improved. There are many outliers in the experimental data shown in some figures, e.g. Figure 2D-G. The presence of these outliers makes the results unreliable. The author should thoroughly review the data analysis in the manuscript. In addition, a couple of western blot bands, such as IL-1β in Figure 3C, are not clear enough, please provide clearer western blot results again to support the conclusion.

      Following our comparative analysis, we have determined that these data do not affect our conclusions. Moreover, our experimental design included a total of six mice per group, with all mouse samples being subjected to testing.

      (2) As shown in Figure 1G-I, foot thickness and IL-1β content in foot tissues of the Aged+Abx group were significantly reduced, but there was no difference in serum uric acid level. In addition, the Abx-untreated group should be included at all ages.

      Thank you for your comment. We have included this data in Supplemental Material 4.

      (3) Since FMT (Figure 4) and butyrate supplementation (Figure 8) have different effects on uric acid synthesis enzyme and excretion, different mechanisms may lie behind these two interventions. Transplantation with significantly enriched single strains from young mice, such as Bifidobacterium and Akkermansia, is the more reliable approach to reveal the underlying mechanism between gut microbiota and gout.

      Thank you for your comment. Due to the involvement of multiple bacterial genera in gout and hyperuricemia, and the practical challenge of testing all strains, our focus shifted to the functional implications and metabolism of the microbiota. Experimental validation confirmed that butyrate exerts a dual-therapeutic effect in mitigating gout and hyperuricemia.

      (4) In Figure 2F, the results showed the IL-1β, IL-6, and TNF-α content in serum, which was inconsistent with the authors' manuscript description (Line 171).

      Thank you for your comment. The modifications to the results have been implemented.

      (5) Figures 2F-H duplicate Supplementary Figures S1B-D. The authors should prepare the article more carefully to avoid such mistakes.

      Thank you for your comment. We have corrected it in the manuscript.

      (6) In lines 202-206, the authors stated that serum uric acid levels were elevated in the Young+Old or Young+Aged groups, but there is no difference in the results shown in Figure 4A.

      Thank you for your comment. We have corrected it in the manuscript.

      (7) Please visualize the results in Table 2 in a more intuitive manner.

      The results have been presented in Table 2 with a more intuitive visual format. The detailed information is presented in Supplement 4.

      (8) The heatmap in Figure 7A cannot strongly support the conclusion "the butyric acid content in the faeces of Young+PBS group was significantly higher than that in the Aged+PBS group". The author should re-represent the visual results and provide a reasonable explanation. In addition, please provide the ordinate unit of Supplementary Figure 7A-H.

      Thank you for your comment. Figure 7A and Supplementary Figure 7A-H together illustrate "the butyric acid content in the faeces of Young+PBS group was significantly higher than that in the Aged+PBS group", and the specific units of short-chain fatty acids have been annotated in the manuscript.

      (9) Uncropped original full-length western blot should be provided.

      Thank you for your comment. We have made relevant notes in the paper.

      Reviewer #1 (Recommendations For The Authors):

      Gout, a prevalent form of arthritis among the elderly, exhibits an intricate relationship with age and gut microbiota. The authors found that gut microbiota plays a crucial role in determining susceptibility to age-related gout. They observed that age-related gut microbiota regulated the activation of the NLRP3 inflammasome pathway and modulated uric acid metabolism. "Younger" microbiota has a positive impact on the gut microbiota structure of old or aged mice, enhancing butanoate metabolism and butyric acid content. Finally, they found butyric acid exerts a dual effect, inhibiting inflammation in acute gout and reducing serum uric acid levels. This work's insights emphasize the potential of "young" gut microbiome in mitigating senile gout. The whole study was interesting, but there were some minor errors in the overall writing of the paper. The author should carefully check the spelling of the words in the text and the case consistency of the group names.

      Questions:

      (1) Line 118, line 142, and elsewhere: "24 months" should be written in the same format as before.

      Thank you for your comment. We have corrected it in the manuscript.

      (2) Line 123: "Old and Aged group" should be plural.

      Thank you for your suggestion. We have corrected it in the manuscript.

      (3) Why does line 133 mention the use of ABX? Please add a brief explanation.

      Thank you for your suggestion. The aim of utilizing ABX is to construct the linkage between gut microbiota, age, and gout.

      (4) Lines 172-175, the description of TNF does not match the description of the result figure, may be the picture placement error, please correct this.

      Thank you for your careful review. The error has been corrected and the accurate result has been inserted into the original manuscript.

      (5) Lines 183-185 and lines 193-195: the activation of Pro-Caspase-1 and Pro-IL is written redundantly.

      Thank you for your careful review. We have corrected the error at the original location.

      (6) Line 400, the text should not be written as increased.

      Thank you for your careful review. We have corrected the error at the original location.

      (7) "ns" needs to be added in the legend to indicate that there is no significant difference.

      Thank you for your careful review. We have corrected the error at the original location.

      (8) Lines 1080-1084 "Old or Aged control group and the old or aged group", group names should be case-sensitive.

      Thank you for your suggestion. We have made the correct modification to the group names.

      (9) Lines 1072-1073, "Representative western blot images of foot tissue NLRP3 pathways proteins" add band density.

      Thank you for your suggestion. We have corrected the error on lines 1072-1073 of the article.

      Reviewer #2 (Recommendations For The Authors):

      Specific comments:

      (1) In Figures 1G-H, the Aged+PBS group with antibiotic treatment shows a significant reduction in foot swelling and IL-1β compared to the Young+PBS and Old+PBS groups. The authors state that age-related changes in the gut microbiota exacerbate gout. However, why does only the Aged+PBS group improve with antibiotic treatment? It seems that butyrate alone cannot explain this phenomenon.

      We utilize antibiotics for treatment in order to establish the relationship between gut microbiota, age, and gout. Different age groups are directly given antibiotics for treatment. We found that after clearing the gut microbiota and then stimulating with MSU, the trend of inflammation factors changing with age disappears.

      (2) In Figure 2, the fecal transplantation from young mice improved the infiltration of inflammatory cells and inflammatory cytokines in the Old and Aged groups. However, in Supplementary Figure 1A, there is no improvement observed in the percentage of foot swelling. Is it appropriate to conclude that inflammation was improved even though foot swelling was not suppressed?

      Although we did not observe changes in the swelling of the mice's feet, there were changes in the inflammatory cell infiltration and inflammation factors in the slices. We rely on a comprehensive assessment of various indicators to determine whether the inflammatory condition has improved or worsened.

      (3) In line #249, the authors state that "the fecal microbiota from mice in the young group promotes uric acid elimination, inhibits reabsorption, and may contribute to the integrity of the intestinal barrier structure." However, Supplementary Figure 3F-H shows no significant alterations in Occludin and ZO-1 mRNA expression levels among all groups. Therefore, it is difficult to conclude that the fecal microbiota from the young group promotes the integrity of the intestinal barrier structure. A functional barrier assay, such as oral administration of FITC-dextran, would be necessary to verify the authors' conclusion.

      In Supplementary Figure 3F-H, we observed that the mRNA expression of Occludin and ZO-1 increased but showed no significant difference. However, after the elderly mice were transplanted with the intestinal microbiota of young mice, the mRNA expression of JAMA showed a significant upward trend. Additionally, due to the scarcity of old mice, we were unable to perform the oral administration of FITC-dextran. However, we supplemented with immunohistochemical slices of ZO-1 and Occludin to support our viewpoint.

      (4) In Figure 4, when comparing the young+PBS group with the old+PBS or aged+PBS groups, there are hardly any differences in the proteins involved in uric acid synthesis (ADA, GDA, XOD) or the genes involved in uric acid transport (URAT1, GLUT9, OAT1, OAT3, ABCG2). Since no changes in uric acid synthesis or transport pathways are observed with aging, it is questionable to conclude that fecal transplantation from young mice improves these pathways and lowers blood uric acid levels.

      In the calculation process, we used different age groups of the control group as references, instead of directly using young mice. We then compared the data of mice of different ages, and the results are in Supplementary Material 4.

      (5) In line 276, the authors describe "the Young +Old and Young+Aged groups tended to be closer to the Old+PBS and Aged+PBS groups, and the Old+Young and Aged+young groups tended to be closer to the Young+PBS group (Figure 5D)". Please conduct a statistical analysis.

      (6) In line 298, the authors hypothesize that butyrate might be the key molecule responsible for controlling gout, as Bifidobacterium and Akkermansia were abundant in the Young group, and the butyrate pathway was prominent. However, neither Bifidobacterium nor Akkermansia are butyrate-producing bacteria. Thus, the conclusion appears to be biased toward butyrate, raising questions about this interpretation.

      Upon comparison, we discovered other bacterial genera that produce butyrate, such as Lachnoclostridium. Additionally, literature reports (PMID:38126785, 26420851) have indicated that Bifidobacteria combined with other genera can enhance the production of butyrate. Meanwhile, Akkermansia, particularly the species Akkermansia muciniphila, has been found to confer several beneficial traits, as evidenced by preclinical studies. These traits include promoting the growth of butyrate-producing bacteria through the production of acetate, which leads to a decrease in the loss of the colonic bilayer and subsequent reduction in inflammation (PMID:35468952). Based on the predicted results of microbiome functions, we observed that the Butanoate_metabolism of the microbiota in young mice and the elderly mice recipients of young mouse microbiota was enhanced. Considering that Lachnoclostridium can produce butyrate, and that Bifidobacteria and Akkermansia can promote the production of butyrate by the intestinal microbiota, we speculated that butyrate might play a role in gout and hyperuricemia.

      (7) In Supplementary Figure 7, acetic acid and propionic acid also show the same behavior as butyric acid. It is possible that these metabolites may also affect the development of gout.

      Thank you for your suggestion. Indeed, Figure 7 does show a similar trend for acetic and propionic acids as for butyric acid. However, considering the predictive data of microbial function and the non-targeted metabolomic data, there is an enhancement of Butanoate_metabolism in both young mice and elderly mice receiving young mouse intestinal microbiota transplants. Therefore, we prioritized butyrate as the subject of our study. Due to the scarcity of elderly mice, we are unable to conduct subsequent experiments with acetic and propionic acids, which is one of the limitations of this study. This work will be addressed in our follow-up research.

      (8) In Figure 6, the secondary bile acid biosynthesis pathway was also changed. However, there is little mention of secondary bile acid in the discussion section. Please carefully discuss other possibilities besides butyrate.

      Thank you for your suggestion. We have incorporated a discussion about secondary bile acids into the relevant section of our manuscript.

      (9) In line #330, the authors state, 'the metabolites identified as showing differential abundance between the groups were enriched in the butanoate metabolism pathway (Figure 6A-D).' However, there does not appear to be much difference in the butanoate metabolism pathway. Specifically, in Figure 6C, the butanoate metabolism pathway in the Old group does not differ from that in the Young group. Please explain in more detail whether the butanoate metabolism pathway is relevant in the Old group.

      The metabolites identified as showing differential abundance between the groups were enriched in the butanoate metabolism pathway; however, the non-targeted metabolomics did not reveal the extent of their enrichment.

      (10) In Figure 7, the authors measured the levels of short-chain fatty acids in the Young and Aged groups. They found butyrate in the feces of mice in the Young group was higher than that in the Aged group. However, I wonder whether the Old group also had low levels of butyrate or not.

      In the experiment, we selected three representative groups to verify the hypothesis that butyrate may play a significant role in gout and hyperuricemia. Subsequently, we found that supplementing 18-month-old and 24-month-old mice with butyrate indeed reduced blood uric acid levels and alleviated gout symptoms. Since 18-month-old mice are difficult to obtain, we only conducted microbiome sequencing and non-targeted metabolomic analysis.

      Minor issues:

      (11) In line 74, what does MSU stand for? Please describe the abbreviation.

      In line 74, MSU refers to Monosodium urate crystals.

      (12) In line 136, please insert a space between "IL-1β" and "and".

      Thank you for your suggestion. We have corrected the error in the article.

      (13) In line 570, please describe the method of butyrate administration and also correct the grammatical errors.

      Thank you for your suggestion. We have corrected the error in the article.

      (14) Change the title of x axis in Figure 2F-H, "Serum ~" to "Peritoneal fluid ~", according to the legend.

      Thank you for your suggestion. We have corrected this error in the manuscript.

      (15) In line 302, "succinates" should be "butyric acid or butyrate".

      Thank you for your suggestion. We have corrected this error in the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      (1) The authors showed the results of IL-1β levels in foot tissues in Figure 1C and Figure 1H, and serum IL-1β, IL-6, and TNF-α levels in Figure 2F-H. Could the authors also provide the results of IL-6 and TNF-α in foot tissue in Figure 1?

      Thank you for your suggestion. We have added the results of IL-6 and TNF-α in foot tissue in supplementary material 4.

      (2) There are some errors in the reference citation format, such as missing page numbers.

      Thank you for your careful review. We have revised the references in our manuscript.

      (3) There are too many writing errors in the manuscript, which greatly affect the understanding of the text. The manuscript must be carefully revised to improve its readability. It's recommended that a professional English writer or native speaker proofread the paper before submission. Some errors, but not limited to these errors, are listed below.

      a. Line 107: The abbreviation for "short-chain fatty acid" should be SCFA, not SFCA.

      Thank you for your careful review. We have corrected this error in the manuscript.

      b. Line 136: There is a missing space between "IL-1β" and "and".

      Thank you for your careful review. We have corrected this error in the manuscript.

      c. Line 145, the phrase "on gout on gout", and line 471, "that transplantation" are repeated.

      Thank you for your careful review. We have corrected this error in the manuscript.

      d. Line 152: "Age+PBS" should be "Aged+PBS".

      Thank you for your careful review. We have corrected this error in the manuscript.

      e. In Figure 1e, "Aded+PBS" should be "Aged+PBS".

      Thank you for your careful review. We have corrected the error in Figure 1e.

      f. Line 152: The phrase "by via" is repeated.

      Thank you for your suggestion. We have deleted the phrase "by via" in line 152.

      g. "16S rDNA" in line 92 is inconsistent with the "16S rRNA" in line 652.

      Thank you for your suggestion. We have revised the error in the manuscript to maintain consistency in professional terminology.

    1. eLife Assessment

      This article describes a novel mechanism that allows Drosophila to combat enteric pathogens while also preserving the beneficial indigenous microbiota. The authors provide compelling evidence that oral infection of Drosophila larvae by pathogenic bacteria activates a valve that traps the intruders in the anterior midgut, allowing them to be killed by antimicrobial peptides. This is an important finding revealing a new mechanism of host defense in the gut of insects.

    2. Reviewer #1 (Public review):

      Tleiss et al. demonstrate that while commensal Lactiplantibacillus plantarum freely circulates within the intestinal lumen, pathogenic strains such as Erwinia carotovora or Bacillus thuringiensis are blocked in the anterior midgut, where they are rapidly eliminated by antimicrobial peptides. This sequestration of pathogenic bacteria in the anterior midgut requires the Duox enzyme in enterocytes, and both TrpA1 and Dh31 in enteroendocrine cells. This effect induces muscle contraction, which is marked by the formation of TARM structures (thoracic alary-related muscles). This muscle contraction-related blocking happens early after infection (15 min). On the other hand, the clearance of bacteria is carried out by the IMD pathway, possibly through antimicrobial peptide production, while this pathway is dispensable for the blockage. Genetic manipulations impairing bacterial compartmentalization result in abnormal colonization of posterior midgut regions by pathogenic bacteria. Despite a functional IMD pathway, this ectopic colonization leads to bacterial proliferation and larval death, demonstrating the critical role of anterior bacterial sequestration in larval defense.

      In general, this fundamentally important study reveals unique mechanisms in the gut immunity of Drosophila larvae. It also describes a previously understudied structure, TARM, which may play a crucial role in this process. This significant work substantially advances our understanding of pathogen clearance by identifying a new mode of pathogen eradication from the insect gut. The evidence supporting the authors' claims is compelling, and the study opens new avenues for future research in gut immunity.

    3. Reviewer #2 (Public review):

      Summary:

      This article describes a novel mechanism of host defense in the gut of Drosophila larvae. Pathogenic bacteria trigger the activation of a valve that blocks them in the anterior midgut where they are subjected to the action of antimicrobial peptides. In contrast, beneficial symbiotic bacteria do not activate the contraction of this sphincter and can access the posterior midgut, a compartment more favorable to bacterial growth.

      Strengths:

      The authors decipher the underlying mechanism of sphincter contraction, revealing that ROS production by Duox activates the release of DH31 by enteroendocrine cells that stimulate visceral muscle contractions. Use of mutations affecting the Imd pathway or lacking antimicrobial peptides reveals their contribution to pathogen elimination in the anterior midgut.

      Weaknesses:

      The mechanism allowing the discrimination between commensal and pathogenic bacteria remains unclear.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Tleiss et al. demonstrate that while commensal Lactiplantibacillus plantarum freely circulates within the intestinal lumen, pathogenic strains such as Erwinia carotovora or Bacillus thuringiensis are blocked in the anterior midgut, where they are rapidly eliminated by antimicrobial peptides. This sequestration of pathogenic bacteria in the anterior midgut requires the Duox enzyme in enterocytes, and both TrpA1 and Dh31 in enteroendocrine cells. This effect induces muscle contraction, which is marked by the formation of TARM structures (thoracic alary-related muscles). This muscle contraction-related blocking happens early after infection (15 min). On the other hand, the clearance of bacteria is carried out by the IMD pathway, possibly through antimicrobial peptide production, while this pathway is dispensable for the blockage. Genetic manipulations impairing bacterial compartmentalization result in abnormal colonization of posterior midgut regions by pathogenic bacteria. Despite a functional IMD pathway, this ectopic colonization leads to bacterial proliferation and larval death, demonstrating the critical role of anterior bacterial sequestration in larval defense.

      This important work substantially advances our understanding of the process of pathogen clearance by identifying a new mode of pathogen eradication from the insect gut. The evidence supporting the authors' claims is solid and would benefit from more rigorous experiments.

      (1) The authors performed the experiments on Drosophila larvae. I wonder whether this model could extend to adult flies since they have shown that the ROS/TRPA1/Dh31 axis is important for gut muscle contraction in adult flies. If not, how would the authors explain the discrepancy between larvae and adults?

      We have linked the adult phenotype to the larval model to explore the ROS/TrpA1/Dh31 axis in both contexts. As highlighted in the discussion, however, there are key behavioral differences between larvae and adult flies. Unlike larvae, which remain in the food environment, adult flies have the ability to move away. This difference could impact the relevance of gut muscle contraction and bacterial clearance mechanisms between the two stages. Specifically, in larvae, the rapid ejection of gut contents due to muscle contraction poses a unique risk: larvae may inadvertently re-ingest the expelled material within minutes, which could influence their immune defenses. We have clarified this distinction and our hypothesis in the final section of the discussion, as it emphasizes the adaptive nature of this mechanism in larvae.

      (2) The authors performed their experiments and proposed the models based on two pathogenic bacteria and one commensal bacterium at a relatively high bacterial dose. They showed that feeding Bt at 2 × 10^10 or Ecc15 at 4 × 10^8 did not induce a blockage phenotype.

      I wonder whether larvae die under conditions of enteric infection with low concentrations of pathogenic bacteria. 

      To address this, we have provided new data (Movie 5), in which larvae were fed a lower dose of Bt-GFP at 1.3 × 10^10 CFU/mL. In this video, we observe that when larvae ingest fewer bacteria, no blockage occurs, and the bacteria are able to reach the posterior midgut. As the bacterial load is lower, the fluorescence signal is weaker, but the movie clearly shows the excretion of bacteria. Importantly, under these conditions, no larval death was observed. These findings suggest that below a certain bacterial threshold, the pathogenicity is insufficient to: (1) trigger the blockage response, and (2) kill the larvae. In such cases, bacteria are likely eliminated through normal peristaltic movements rather than through the blockage mechanism described in our study.

      If larvae do not show mortality, what is the mechanism for resisting low concentrations of pathogenic bacteria? 

      As mentioned in our previous response, we hypothesize that the larvae’s ability to resist low concentrations of pathogenic bacteria is likely due to being below the threshold of virulence. At lower bacterial doses, the pathogenic load is insufficient to trigger the blockage mechanism or cause larval death. In these cases, it is probable that classical peristaltic movements of the gut efficiently eliminate the bacteria, preventing them from colonizing the posterior midgut or causing significant harm. Thus, the larvae rely on standard gut motility and immune mechanisms, rather than the blockage response, to clear lower doses of bacteria.

      Why is this model only applied to high-dose infections? 

      The reason this model primarily applies to high-dose infections is that lower concentrations of pathogenic bacteria do not trigger the blockage mechanism. As we mentioned in the manuscript, for low bacterial concentrations, where the GFP signal remains detectable, wild-type larvae are still able to resist live bacteria in the posterior part of the intestine.

      Regarding the bacterial doses used in our experiments, it's important to clarify that we calculate the bacterial load based on colony-forming units (CFU). In our setup, there are approximately 5 × 10^4 CFU per midgut. For each experiment, we prepare 500 µl of contaminated medium containing 4 × 10^10 CFU. Fifty larvae are placed into this 500 µl of medium, meaning each larva ingests around 5 × 10^4 CFU within one hour of feeding.

      This leads us to two key points:

      (1) Continuous feeding might trigger the blockage response even at lower doses, as extended exposure to bacteria could lead to higher accumulation within the gut.

      (2) Other defense mechanisms, such as the production of reactive oxygen species (ROS) or classical peristaltic movements, could be sufficient to eliminate lower bacterial doses (around 10^3 CFU or below).
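
      The dose figures quoted in the response above can be checked with a short calculation. The following is a minimal sketch, not part of the authors' analysis: the variable and function names are illustrative assumptions, and the only inputs are the numbers stated in the response (4 × 10^10 CFU in 500 µl of medium, fifty larvae, roughly 5 × 10^4 CFU per midgut after one hour).

```python
# Figures quoted in the author response; all names here are illustrative.
TOTAL_CFU = 4e10          # CFU present in the contaminated medium
MEDIUM_VOLUME_UL = 500    # volume of contaminated medium, in microlitres
N_LARVAE = 50             # larvae placed in the medium
CFU_PER_MIDGUT = 5e4      # approximate load reported per midgut after 1 h


def dose_summary(total_cfu, volume_ul, n_larvae, cfu_per_midgut):
    """Return (CFU per mL of medium, CFU available per larva,
    fraction of the available CFU found in one midgut)."""
    concentration_per_ml = total_cfu / (volume_ul / 1000)
    available_per_larva = total_cfu / n_larvae
    fraction_in_midgut = cfu_per_midgut / available_per_larva
    return concentration_per_ml, available_per_larva, fraction_in_midgut


concentration, available, fraction = dose_summary(
    TOTAL_CFU, MEDIUM_VOLUME_UL, N_LARVAE, CFU_PER_MIDGUT
)
print(f"Medium concentration: {concentration:.2e} CFU/mL")
print(f"CFU available per larva: {available:.2e}")
print(f"Fraction found per midgut: {fraction:.2e}")
```

      Under these stated figures, the medium works out to 8 × 10^10 CFU/mL and each larva has access to about 8 × 10^8 CFU, so the reported midgut load corresponds to only a small fraction (on the order of 10^-5 to 10^-4) of the bacteria available to it.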

      We also refer to the newly provided Movie 5, where larvae fed with Bt-GFP at 1.3 × 10^10 CFU/mL show no blockage at low ingestion levels and successfully eliminate the bacteria.

      (3) The authors claim that the lock of bacteria happens at 15 minutes while killing by AMPs happens 6-8 hours later. 

      Our CFU data indicate that it’s after 4 to 6 hours that the quantity of bacteria decreases. We fixed this in the text.

      What happened during this period? 

      During the 4 to 6-hour period, several defense mechanisms are activated. ROS play a bacteriostatic and bacteriolytic role, helping to control bacterial growth. Concurrently, the IMD pathway is activated, leading to the transcription, translation, and secretion of antimicrobial peptides. These AMPs exert both bacteriostatic and bacteriolytic effects, contributing to the eventual clearance of the pathogenic bacteria.

      More importantly, is IMD activity induced in the anterior region of the larval gut in both Ecc15 and Bt infection at 6 hours after infection? 

      We have provided new data (Supplementary Figure 6) that includes RT-qPCR analysis of the whole larval gut in wt, TrpA1- and Dh31- genetic backgrounds after feeding with Lp, Ecc15, Bt, or yeast only. We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences between the genotypes tested.

      Additionally, we included new imaging data (Supplementary Figure 11) from AMP reporter larvae (Dpt-Cherry) fed with fluorescent Lp or Bt. In larvae infected with Bt, which is blocked in the anterior part of the gut, the dpt gene is predominantly induced in this region, indicating strong IMD pathway activity in response to Bt infection. Conversely, in larvae fed with Lp-GFP, the Dpt-Cherry reporter shows weak expression in the anterior midgut, and is barely detectable in the posterior midgut where Lp-GFP establishes itself. This aligns with previous findings by Bosco-Drayon et al. (2012), which demonstrated low AMP expression in the posterior midgut due to the presence of negative regulators of the IMD pathway, such as amidases and Pirk.

      Are they mostly expressed in the anterior midgut in both bacterial infections? Several papers have shown quite different IMD activity patterns in the Drosophila gut. Zhai et al. have shown that in adult Drosophila, IMD activity was mostly absent in the R2 region as indicated by dpt-lacZ. Vodovar et al. have shown that the expression of dpt-lacZ is observable in proventriculus while Pe is not in the same region. Tzou et al. showed that Ecc15 infection induced IMD activity in the anterior midgut 24 hours after infection. 

      Based on our new data (Supplementary Figure 11), we observe that Dpt-RFP expression is primarily localized in the anterior midgut and likely at the beginning of the acidic region in larvae infected with Bt, Ecc and Lp.

      Using TrpA1 and Dh31 mutants, the authors found both Ecc15 and Bt in the posterior midgut. Why are they not evenly distributed along the gut? 

      We observe that bacteria are not evenly distributed along the gut in wild-type larvae as well, with Lp. This suggests that the transit time in the anterior part of the gut may be relatively short due to active peristalsis, which would make this region function as a "checkpoint" for bacteria that are not supposed to be blocked. Indeed, we confirmed that peristalsis is active during our intoxication experiments, which could explain the rapid movement of bacteria through the anterior midgut.

      In contrast, bacteria tend to remain longer in the posterior midgut, which corresponds to the absorptive functions of intestinal cells in this region. This would explain why we observe more bacteria in the posterior midgut for Lp in control larvae and for Ecc15 and Bt in the TrpA1- and Dh31- mutants. Although a few bacteria are still found in the anterior midgut, they are consistently in much lower numbers compared to the posterior, as shown in Figures 1A and 3A of our manuscript.

      Last but not least, does the ROS/TrpA1/Dh31 axis affect AMP expression?

      We investigated whether the ROS/TrpA1/Dh31 axis influences AMP expression by performing RT-qPCR on the whole gut of larvae in wild-type, TrpA1-, and Dh31- genetic backgrounds. Larvae were fed with Lp, Ecc, Bt, or yeast (new data: Supplementary Figure 6). We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences in AMP expression between the different genotypes.

      Additionally, we provide imaging data from AMP reporter larvae (pDpt-Cherry) fed with fluorescent Lp or Bt (new data: Supplementary Figure 11). These results further confirm that the ROS/TrpA1/Dh31 axis does not significantly affect AMP expression in our experimental conditions.

      (4) The TARM structure part is quite interesting. However, the authors did not show its relevance in their model. Is this structure the key driving force for the blocking phenotype and killing phenotype?

      We agree that the TARM structures are a fascinating aspect of this study and acknowledge the interest in their potential role in the blocking and killing phenotypes. While we are keen to explore the specific contributions of these structures during bacterial intoxication, the current genetic tools available for manipulating TARMs target both TARM T1 and T2 simultaneously, as demonstrated by Bataillé et al., 2020 (Fig. 2). Of note, these muscles are essential for proper gut positioning in larvae, and their absence leads to significant defects in food intake and transit, which would confound the results of our intoxication experiments (see Fig. 6 from Bataillé et al., 2020).

      Therefore, while TARMs are likely involved in these processes, the current limitations in selectively targeting them prevent us from definitively testing their role in bacterial blocking and killing at this stage. We hope to address this in future studies as more refined genetic tools become available.

      Is the ROS/TrpA1/Dh31 axis required to form this structure?

      To determine whether the ROS/TrpA1/Dh31 axis is required for the formation of TARM structures, we examined larval guts from control, TrpA1-, and Dh31- mutant backgrounds. Our new data (Supplementary Figure 8) show that the TARM T2 structures are still present in the mutants, indicating that the formation of these structures does not depend on the ROS/TrpA1/Dh31 axis.

      Reviewer #2 (Public Review):

      This article describes a novel mechanism of host defense in the gut of Drosophila larvae. Pathogenic bacteria trigger the activation of a valve that blocks them in the anterior midgut where they are subjected to the action of antimicrobial peptides. In contrast, beneficial symbiotic bacteria do not activate the contraction of this sphincter, and can access the posterior midgut, a compartment more favorable to bacterial growth.

      Strengths:

      The authors decipher the underlying mechanism of sphincter contraction, revealing that ROS production by Duox activates the release of DH31 by enteroendocrine cells that stimulate visceral muscle contractions. The use of mutations affecting the Imd pathway or lacking antimicrobial peptides reveals their contribution to pathogen elimination in the anterior midgut.

      Weaknesses:

      The mechanism allowing the discrimination between commensal and pathogenic bacteria remains unclear.

      Based on our findings, we hypothesize that ROS play a crucial role in this discrimination process, with uracil release by pathogenic or opportunistic bacteria potentially serving as a key signal.

      To test whether uracil could trigger this discrimination, we conducted experiments where Lp was supplemented with uracil. However, our results show that uracil supplementation alone was not sufficient to induce the blockage response (new data: Supplementary Figure 5). This suggests that while uracil may be a factor in bacterial discrimination, it is likely not the sole trigger, and additional bacterial factors or signals may be required to activate the blockage mechanism. 

      The use of only two pathogens and one symbiotic species may not be sufficient to draw a conclusion on the difference in treatment between pathogenic and symbiotic species.

      To address this concern, we performed additional intoxication experiments using Escherichia coli OP50, a bacterium considered innocuous and commonly used as a standard food source for C. elegans in laboratory settings. The results, presented in our updated data (new data: Fig 1B), show that E. coli OP50, despite being from the same genus as Ecc, does not trigger the blockage response. This further supports our conclusion that the gut’s discriminatory mechanism is specific to pathogenic bacteria, and not merely based on bacterial genus.

      We can also wonder how the process of sphincter contraction is affected by the procedure used in this study, where larvae are starved. Does the sphincter contraction occur in continuous feeding conditions? Since larvae are continuously feeding, is this process physiologically relevant?

      In our intoxication protocol, the larvae are exposed to contaminated food for 1 hour, during which the blockage ratio is quantified. Since this period involves continuous feeding with the contaminated food, we do not consider the larvae starved during the quantification process. Our observations show differences in the blockage response depending on the bacterial contaminant and the genetic background of the host. Additionally, we were able to trigger the blocking phenomenon using exogenous hCGRP.

      Regarding the experimental setup for movie observations, it is true that larvae are immobilized on tape in a humid chamber, which is not a fully physiological context. However, in the new movie we provide (Movie 3), co-treatment with fluorescent Dextran (Red) and fluorescent Bt (Green) shows that both are initially blocked, followed by the posterior release of Dextran once the bacterial clearance begins.

      Furthermore, to address the question of continuous exposure, we extended the exposure period to 20 hours instead of 1 hour. Even after prolonged exposure, we observed that pathogens are still blocked in the anterior part of the gut (new data: Supplementary Figure 2B). This supports the physiological relevance of the sphincter contraction and its ability to function under continuous feeding conditions.

      Reviewer #1 (Recommendations For The Authors):

      (1) The authors performed the experiments on Drosophila larvae. I wonder whether this model could extend to adult flies since they have shown that the ROS/TRPA1/Dh31 axis is important for gut muscle contraction in adult flies. If not, how would the authors explain the discrepancy between larvae and adults?

      We drew on the adult phenotype to guide our candidate approach toward the ROS/TrpA1/Dh31 axis in larvae. As we already mention in the discussion, whereas larvae remain in the food, adult flies can move away from it. If larvae ejected their gut content, they could re-ingest it within minutes. We have clarified this idea in the last part of the discussion.

      (2) The authors performed their experiments and proposed the models based on two pathogenic bacteria and one commensal bacterium at a relatively high bacterial dose. They showed that feeding Bt at 2 × 10^10 or Ecc15 at 4 × 10^8 did not induce a blockage phenotype.

      I wonder whether larvae die under conditions of enteric infection with low concentrations of pathogenic bacteria. 

      We provide a video with Bt-GFP at 1.3 × 10^10 CFU/mL (new data: Movie 5). When larvae eat less, there is no blockage and the bacteria can reach the posterior midgut. Note that the fluorescence is weak because of the low amount of bacteria ingested. The movie shows excretion of the bacteria. The larvae also do not die. Together, these results suggest that below a given threshold, the virulence of the bacteria is too weak to (i) trigger a blockage and (ii) kill the larva. The bacteria are likely eliminated through classical peristalsis.

      If larvae do not show mortality, what is the mechanism for resisting low concentrations of pathogenic bacteria? 

      We may be below the virulence threshold. See our response just above.

      Why is this model only applied to high-dose infections? 

      As mentioned in the manuscript, lower concentrations do not trigger the blockage. At lower concentrations for which a GFP signal is still detectable, wild-type animals resist the presence of live bacteria within the posterior part of the intestine.

      Regarding the doses, the CFU should be considered. Indeed, there are around 5 × 10^4 CFU per midgut. In our experimental procedure, we calculate the amount of bacteria for 500 µl of contaminated medium (i.e. 4 × 10^10 CFU per 500 µl of medium). Around 50 larvae are then deposited in the 500 µl of contaminated medium. Under these conditions, one larva ingests 5 × 10^4 CFU. Moreover, larvae are fed for only 1 h.

      Thus, (1) continuous feeding may also trigger the blockage even at lower doses, and (2) the other defense mechanisms (such as ROS) or peristalsis may be sufficient to eliminate lower doses (i.e. 10^3 CFU or below). See the new Movie 5 that we provide with Bt-GFP at 1.3 × 10^10 CFU/mL.

      (3) The authors claim that the lock of bacteria happens at 15 minutes while killing by AMPs happens 6-8 hours later. 

      Our CFU data indicate that the quantity of bacteria decreases after 4 to 6 hours. We have corrected this in the text.

      What happened during this period? 

      During this period, the following take place: ROS activity (bacteriostatic and bacteriolytic), IMD activation, and AMP transcription, translation and secretion, followed by the bacteriostatic and bacteriolytic activity of the AMPs.

      More importantly, is IMD activity induced in the anterior region of the larval gut in both Ecc15 and Bt infection at 6 hours after infection? 

      We provide new RT-qPCR data from whole larval guts in wt, TrpA1- and Dh31- genetic backgrounds fed with Lp, Ecc, Bt or yeast only (new data: Supplementary Figure 6). We monitored three different AMP-encoding genes and found differences related to the food content, but no differences between genotypes. In addition, we provide images from AMP reporter animals (Dpt-Cherry) fed with fluorescent Lp or Bt (new data: Supplementary Figure 11) showing that, with Bt blocked in the anterior part of the intestine, the dpt gene is mainly induced in this area. Note that in the larva infected with Lp-GFP, the Dpt-Cherry reporter is weakly expressed in the anterior midgut. In the posterior midgut, where Lp-GFP is established, Dpt-Cherry is barely detectable. This is in line with the previous observation by Bosco-Drayon et al. (2012) demonstrating the low level of AMP expression in the posterior midgut due to the expression of IMD negative regulators such as amidases and pirk. In the larva infected with Bt-GFP, note the obvious expression of Dpt-Cherry in the anterior midgut, colocalizing with the bacteria (new data: Supplementary Figure 11).

      Are they mostly expressed in the anterior midgut in both bacterial infections? Several papers have shown quite different IMD activity patterns in the Drosophila gut. Zhai et al. have shown that in adult Drosophila, IMD activity was mostly absent in the R2 region as indicated by dpt-lacZ. Vodovar et al. have shown that the expression of dpt-lacZ is observable in proventriculus while Pe is not in the same region. Tzou et al. showed that Ecc15 infection induced IMD activity in the anterior midgut 24 hours after infection. 

      In control animals fed Bt, Ecc or Lp, we see Dpt-RFP in the anterior midgut and likely at the beginning of the acidic region. See the images in Supplementary Figure 11 (new data) provided for the previous remark.

      Using TrpA1 and Dh31 mutants, the authors found both Ecc15 and Bt in the posterior midgut. Why are they not evenly distributed along the gut? 

      The same is true of Lp in wild-type larvae: it is not evenly distributed. It is as if the transit time in the anterior part were very short owing to peristalsis, which would fit with a checkpoint area that bacteria pass through quickly unless they are blocked. Indeed, peristalsis is active during our intoxications. Bacteria then stay longer in the posterior part, consistent with the absorptive function of the intestinal cells in this area. With Lp in control larvae, or Ecc and Bt in TrpA1- and Dh31- mutants, a few bacteria are always present in the anterior midgut, but always far fewer than in the posterior. See our Figures 1A and 3A.

      Last but not least, does the ROS/TrpA1/Dh31 axis affect AMP expression?

      We provide RT-qPCR data from whole larval guts in wt, TrpA1- and Dh31- genetic backgrounds fed with Lp, Ecc, Bt or yeast only (new data: Supplementary Figure 6). We monitored three different AMP-encoding genes and found differences related to the food content, but no differences between genotypes. In addition, we provide images from AMP reporter animals (pDpt-Cherry) fed with fluorescent Lp or Bt (new data: Supplementary Figure 11).

      (4) The TARM structure part is quite interesting. However, the authors did not show its relevance in their model. Is this structure the key driving force for the blocking phenotype and killing phenotype?

      Indeed, we would like to explore the roles of these structures and their putative requirement upon bacterial intoxication using some of the driver lines developed by the team that studied these muscles in vivo. However, the genetic tools currently available target TARMs T1 and T2 at the same time (see Fig. 2 from Bataillé et al., 2020). Moreover, these TARMs are, first and foremost, crucial for the correct positioning of the gut within the larva, and their absence leads to a global food intake and transit defect that would bias the outcomes of our intoxication protocol (see Fig. 6 from Bataillé et al., 2020).

      Is the ROS/TrpA1/Dh31 axis required to form this structure?

      We provide images of larval guts from control, TrpA1 and Dh31 mutants demonstrating the presence of the TARM T2 structures despite the mutations (new data: Supplementary Figure 8). In addition, we provide representative movies of peristalsis in the intestines of Dh31 mutants fed with or without Ecc to illustrate that muscular activity is not abolished (new data: Movie 9 and Movie 10).

      Minor points:

      (1) Why not use the Pros-Gal4/UAS-Dh31 strain in Figure 3B in addition to hCGRP?

      We opted for exogenous hCGRP addition because it allowed us precise timing control over Dh31 activation. Overexpression of Dh31 from embryogenesis or early larval stages could have significant and unintended effects on intestinal physiology, potentially confounding the results. While temporal control using TubG80ts could be an alternative, our focus was on identifying the specific cells responsible for the phenomenon.

      To achieve this, we perturbed Dh31 production via RNAi, specifically targeting a limited number of enteroendocrine cells (EECs) using the DJ752-Gal4 driver, as described by Lajeunesse et al., 2010. Our new data (Supplementary Figure 4) demonstrate that Dh31 expression in this subset of cells is indeed necessary for the blockage phenomenon.

      (2) Section title (line 287) refers to mortality, but no mortality data is in the figure.

      We agree that the title referenced mortality, whereas no mortality data was presented in this section. We have updated the title to better reflect the data discussed in this part of the manuscript.

      (3) It may be better to combine ROS-related contents in the same figure.

      While it is technically feasible to consolidate the ROS-related content into one figure, doing so would require splitting essential data, such as the Gal4 controls for the RNAi assays and parts of the survival phenotype data. We believe that the current structure of the study, which first explores the molecular aspects of the phenomenon and then demonstrates its relevance to the animal’s survival, provides a clearer and more logical flow. For these reasons, we prefer to maintain the current figure layout.

      Reviewer #2 (Recommendations For The Authors):

      Major recommendation

      (1) Other wild-type backgrounds should be added (including the w Drosdel background of the AMP14 deficient flies) to check the robustness of the phenotype.

      To address the concern regarding the robustness of the phenotype across different wild-type backgrounds, we have tested additional genetic backgrounds, including w1, the isogenized w1118 and Oregon animals.

      The results (new data: Figure 1C) demonstrate that Lp is able to transit freely to the posterior part of the intestine in all backgrounds, while Ecc and Bt are blocked in the anterior part. These findings confirm the robustness of the phenotype across different wild-type strains.

      (2) Although we recognize that this may be limited by the number of GFP-expressing species, other commensal and pathogenic bacteria should be tested in this assay (e.g. E. faecalis and Acetobacter).

      We performed new intoxication experiments using Escherichia coli OP50, a well-established innocuous bacterial strain. The data, presented in Figure 1B (new data), show that E. coli OP50, despite being from the same genus as Ecc, does not trigger the blockage response. This further supports our hypothesis that the blockage phenomenon is specific to pathogenic bacteria and not simply related to the bacterial genus.

      (3) It is important to test whether sphincter closure also occurs in continuous feeding conditions. This does not mean repeating all the experiments but just shows that this mechanism can take place in conditions where larvae are kept in a vial with food.

      While the movies we provide involve larvae immobilized on tape in a humid chamber, which is not a fully physiological context, we now provide new data (Movie 3) showing that, after co-treatment with fluorescent Dextran (Red) and fluorescent Bt (Green), both substances are initially blocked in the anterior midgut. Later, the dextran is released posteriorly once bacterial clearance has begun.

      Additionally, we extended the feeding period in our experiments from 1 hour to 20 hours to simulate more continuous exposure to contaminated food. Even under these prolonged conditions, we observed that pathogens are still blocked in the anterior part of the gut (new data: Supplementary Figure 2B). This confirms that the sphincter mechanism can function in continuous feeding conditions as well.

      (4) What are the molecular determinants discriminating innocuous from pathogenic bacteria? Addressing this point will increase the impact of the article. The fact that Relish mutants have normal valve constriction suggests that peptidoglycan recognition is not involved. Is there a sensing of pathogen virulence factors? 

      Our data suggest that uracil could be a key molecular determinant in discriminating between innocuous and pathogenic bacteria, as previously described by the W-J Lee team in several studies on adult Drosophila. However, in our experiments, exogenous uracil addition using the blue dye protocol (Keita et al., 2017) did not induce any significant changes in the larvae. Similarly, uracil supplementation in adult flies failed to trigger the Ecc expulsion and gut contraction phenotype, as reported by Benguettat et al., 2018. 

      To further investigate this, we tested the addition of uracil during Lp-GFP intoxication. In these experiments, we did not observe any blockage of Lp (new data: Supplementary Figure 5). These results suggest that uracil might not be the sole trigger for the blockage response, or we may not be providing uracil exogenously in the most effective way. Alternatively, there could be other pathogen-specific virulence factors that contribute to this discrimination mechanism.

      To address this question, the authors should infect larvae with Ecc15 evf- mutants or Ecc15 lacking uracil production. 

      Thank you for your suggestion to use Ecc15 evf- mutants or Ecc15 lacking uracil production to explore the role of uracil in bacterial discrimination. While we have provided some data using uracil supplementation (new data: Supplementary Figure 5), we agree that testing mutants like PyrE would be an important next step. Unfortunately, we currently lack access to fluorescent PyrE or Ecc15 evf- mutants.

      We are planning to address this by developing a new protocol involving fluorescent beads alongside bacteria. This approach will allow us to test several bacterial strains in parallel and better define the size threshold of the valve. We do not have the relevant data yet, but this will be a key focus of our future work.

      Similarly, does feeding heat-killed Ecc15 or Bt induce sequestration in the anterior midgut (larvae may be fed dextran-FITC at the same time to track bacteria)?

      Unfortunately, in our attempts to test heat-killed or ethanol-killed fluorescent Ecc15 for these experiments, we encountered an issue: while we were able to efficiently kill the bacteria, we lost the GFP signal required to track their position in the gut. This made it challenging to assess whether sequestration in the anterior midgut occurs with non-viable bacteria.

      Is uracil or Bt toxin feeding sufficient to induce valve closure? 

      As previously mentioned, uracil is a strong candidate for bacterial discrimination, and we have tested its role by adding exogenous uracil during Lp-GFP intoxication. However, in these experiments, Lp was not blocked (new data: Supplementary Figure 5). This suggests that uracil alone may not be sufficient to induce valve closure, or it may not be the only factor involved. It is also possible that our method of exogenous uracil supplementation may not be effectively mimicking the endogenous conditions.

      Regarding Bt, we used vegetative cells without Cry toxins in our experiments. Cry toxins are only produced during sporulation and are enclosed in crystals within the spore. The Bt strain we used, 4D22, has been deleted for the plasmids encoding Cry toxins. As a result, there were no Cry toxins present in the Bt-GFP vegetative cells used in our assays. This has been clarified in the Materials and Methods section of the manuscript.

      Would Bleomycin induce the same phenotype? 

      Bleomycin, as well as paraquat, has been shown to damage the gut and trigger intestinal cell proliferation in adult Drosophila through mechanisms involving TrpA1. Testing whether Bleomycin induces a similar phenotype in larvae would indeed be interesting.

      However, one challenge we face in our intoxication protocol is that larvae tend to stop feeding when chemicals are added to their food mixture. We encountered similar difficulties in our DTT experiments, which were challenging to set up for this reason. Consequently, we aim to avoid approaches that might impair the general feeding activity of the larvae, as it can significantly affect the outcomes of our experiments.

      Could this process of sphincter closure be more related to food poisoning?

      If gut damage were the primary trigger for sphincter closure, we would indeed expect the blockage phenomenon to occur later following bacterial exposure. However, in our experiments, we observe the blockage occurring early after bacterial contact, suggesting that damage may not be the main trigger for this response.

      That said, we have not yet tested bacterial mutants lacking toxins, nor have we tested a direct damaging agent such as Bleomycin, as proposed. These would be valuable future experiments to explore the potential role of gut damage more thoroughly in this process.

      (5) Is Imd activation normal in trpA1 and DH31 mutants? The authors could use a diptericin reporter gene to check if Diptericin is affected by a lack of valve closure in trpA1.

      To address this, we performed RT-qPCR on whole larval guts from wt, TrpA1^1 and Dh31^KG09001 genetic backgrounds. Larvae were fed with Lp, Ecc, Bt or yeast only (new data: Supplementary Figure 6). We monitored the expression of three different AMP-encoding genes and found that while AMP expression varied depending on the food content, there were no significant differences in AMP expression between the genotypes.

      Additionally, we provide imaging data from AMP reporter animals (pDpt-Cherry) in a wild-type background, fed with fluorescent Lp or Bt (new data: Supplementary Figure 11). These images also support the conclusion that Diptericin expression is not significantly affected by a lack of valve closure in trpA1 and Dh31 mutants.

      (6) Are the 2-6 DH31 positive cells the same cells described by Zaidman et al., Developmental and Comparative Immunology 36 (2012) 638-647.

      The cells identified as hemocytes in the midgut junctions by Zaidman et al. are likely the same cells we describe in our study, as they are located in the same region and are Dh31 positive. We have added a reference to this paper and included lines in the manuscript acknowledging this connection.

      Although confirming whether these cells are Hml+, Dh31+, and TrpA1+ would clarify their exact identity, this falls outside the scope of our current study. However, the possibility that these cells play a role in physical barrier immunity and also possess a hemocyte identity is indeed intriguing, and we hope future research will explore this further.

      Minor points

      (1) The mutations should be appropriately labelled with the allele name.

      This has been fixed in the main text, in Fig Legends, and in figures. 

      (2) Line 230-231: the sentence is unclear to me.

      We have simplified the sentence and no longer refer to expulsion in larvae.

      (3) Discussion: although the discussion is already a bit long, it would be interesting to see if this process is likely to happen/has been described in other insects (mosquito, Bactrocera, ...).

      We reviewed the available literature but were unable to find specific examples describing the blockage phenomenon in other insects. Most studies we found focused on symbiotic bacteria rather than pathogenic or opportunistic bacteria. However, as mentioned in our manuscript, the anterior localization of opportunistic or pathogenic bacteria has been observed in Drosophila by independent research groups.

      (4) Line 546: add the Caudal Won-Jae Lee paper to state the posterior midgut is less microbicidal.

      We added the reference at the right place, mentioning as well that it concerns adults. 

      (5) Figure 6: indicate what the cells shown by the arrow are.

      The sentence ‘the arrows point to TARMs’ is present in the legend of Fig. 6.

      (6) Does the sphincter closure depend on hemocytes?

      As mentioned above, the cells we identify as TrpA1+ in the midgut junction may be the same cells described by Zaidman et al., 2012, and earlier by Lajeunesse et al., 2010. Inactivating hemocytes using the Hml-Gal4 driver may also affect these Dh31+ cells, as they share similarities with hemocytes, as pointed out by Zaidman et al. However, distinguishing between hemocytes and Dh31+/TrpA1+ cells would require a genetic intersectional approach, which is beyond the scope of our current study.

      Nevertheless, the possibility that these cells play a dual role in immunity (through blockage) and share characteristics with hemocytes while functioning as enteroendocrine cells (EECs) is quite intriguing and deserves further exploration in future studies.

    1. eLife Assessment

      This valuable study proposes that protein secreted by colon cancer cells induces cells with Paneth-like properties that favor colon cancer metastasis. The evidence supporting the conclusions is solid but the study would benefit from more direct experiments to test the functional role of Paneth-like cells and to monitor metastasis from colon tumors. The work will be of interest to researchers studying colon cancer metastasis.

    2. Reviewer #1 (Public review):

      Summary:

      The authors addressed the influence of DKK2 on colorectal cancer (CRC) metastasis to the liver using an orthotopic model transferring AKP-mutant organoids into the spleens of wild-type animals. They found that DKK2 expression in tumor cells led to enhanced liver metastasis and poor survival in mice. Mechanistically, they associate Dkk2-deficiency in donor AKP tumor organoids with reduced Paneth-like cell properties, particularly Lyz1 and Lyz2, and defects in glycolysis. Quantitative gene expression analysis showed no significant changes in Hnf4a1 expression upon Dkk2 deletion. Ingenuity Pathway Analysis of RNA-Seq data and ATAC-seq data point to a Hnf4a1 motif as a potential target. They also show that HNF4a binds to the promoter region of Sox9, which leads to LYZ expression and upregulation of Paneth-like properties. By analyzing available scRNA data from human CRC data, the authors found higher expression of LYZ in metastatic and primary tumor samples compared to normal colonic tissue; reinforcing their proposed link, HNF4a was highly expressed in LYZ+ cancer cells compared to LYZ- cancer cells.

      Strengths:

      Overall, this study contributes a novel mechanistic pathway that may be related to metastatic progression in CRC.

      Weaknesses:

      The main concerns are related to incremental gains, missing in vivo support for several of their conclusions in murine models, and missing human data analyses.

      Main comments

      Novelty:

      The authors previously described the role of DKK2 in primary CRC, correlating increased DKK2 levels to higher Src phosphorylation and HNF4a1 degradation, which in turn enhances LGR5 expression and "stemness" of cancer cells, resulting in tumor progression (PMID: 33997693). A role for DKK2 in metastasis has also been previously described (sarcoma, PMID: 23204234).

      Mouse data:

      (a) The authors analyzed liver mets, but the main differences between AKP and AKP/Dkk2 KO organoids could arise during the initial tumor cell egress from the intestinal tissue (which cannot be addressed in their splenic injection model), or during pre-liver stages, such as endothelial attachment. While the analysis of liver mets is interesting, given that Paneth cells play a role in the intestinal stem cell niche, it is questionable whether a study that does not involve the intestine can appropriately address this pathway in CRC metastasis.

      (b) The overall number of Paneth cells found in the scRNA-seq analysis of liver mets was low (17 cells, Fig. 3), and assuming that these cells are driving the differences seems somewhat far-fetched.

      (c) Fig. 6 suggests a signaling cascade in which the absence of DKK2 leads to enhanced HNF4A expression, which in turn results in reduced Sox9 expression and hence reduced expression of Paneth cell properties. It is therefore crucial that the authors perform in vivo (splenic organoid injection) loss-of-function experiments, knockdown of Sox9 expression in AKP organoids, and Sox9 overexpression experiments in AKP/Dkk2 KO organoids to demonstrate Sox9 as the central downstream transcription factor regulating liver CRC metastasis.

      (d) Given the previous description of the role of DKK2 in primary CRC, it is important to define the step of liver metastasis affected by Dkk2 deficiency in the metastasis model. Does it affect extravasation, liver survival, etc.?

      Human data:

      Can the authors address whether the expression of Dkk2 changes in human CRC and whether mutations in Dkk2 are correlated with metastatic disease or CRC stage?

      Bioinformatic analysis:

      The GEO repositories are still not open (at the time of the re-review) and SRA links for the raw data are still unavailable. Without access to the raw data, it is not possible to verify the analyses or fully assess the results. Part of the article is based on re-analysis of public data, so the authors should make the raw data available as well, and not just the count tables.

    3. Reviewer #2 (Public review):

      Summary:

      The authors propose that DKK2 is necessary for the metastasis of colon cancer organoids. They then claim that DKK2 mediates this effect by permitting the generation of lysozyme-positive Paneth-like cells within the tumor microenvironmental niche. They argue that these lysozyme-positive cells have Paneth-like properties in both mouse and human contexts. They then implicate HNF4A as the causal factor responsive to DKK2 to generate lysozyme-positive cells through Sox9.

      Strengths:

      The use of a genetically defined organoid line is state-of-the-art. The data in Figure 1 and the dependence of DKK2 for splenic injection and liver engraftment, as well as the long-term effect on animal survival, are interesting and convincing. The rescue using DKK2 administration for some of their phenotype in vitro is good. The inclusion and analysis of human data sets help explore the role of DKK2 in human cancer and help ground the overall work in a clinical context.

      Remaining Weaknesses after revision:

      (1) The authors have effectively explained the regulation of HNF4A at both mRNA and protein levels. To further strengthen their findings, I recommend using CRISPR technology to generate DKK2 and HNF4A double knockout organoids. This approach would allow the authors to investigate whether the AKP liver metastasis is restored in the double knockout condition. Such an experiment would provide more direct evidence that HNF4A protein stabilization is the crucial mechanism for liver metastasis suppression following DKK2 knockout.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The authors addressed the influence of DKK2 on colorectal cancer (CRC) metastasis to the liver using an orthotopic model transferring AKP-mutant organoids into the spleens of wild-type animals. They found that DKK2 expression in tumor cells led to enhanced liver metastasis and poor survival in mice. Mechanistically, they associate Dkk2-deficiency in donor AKP tumor organoids with reduced Paneth-like cell properties, particularly Lyz1 and Lyz2, and defects in glycolysis. Quantitative gene expression analysis showed no significant changes in Hnf4a1 expression upon Dkk2 deletion. Ingenuity Pathway Analysis of RNA-Seq data and ATAC-seq data point to a Hnf4a1 motif as a potential target. They also show that HNF4a binds to the promoter region of Sox9, which leads to LYZ expression and upregulation of Paneth-like properties. By analyzing available scRNA data from human CRC data, the authors found higher expression of LYZ in metastatic and primary tumor samples compared to normal colonic tissue; reinforcing their proposed link, HNF4a was highly expressed in LYZ+ cancer cells compared to LYZ- cancer cells. 

      Strengths: 

      Overall, this study contributes a novel mechanistic pathway that may be related to metastatic progression in CRC. 

      Weaknesses: 

      The main concerns are related to incremental gains, missing in vivo support for several of their conclusions in murine models, and missing human data analyses. Additionally, methods and statistical analyses require further clarification. 

      Main comments: 

      (1) Novelty 

      The authors previously described the role of DKK2 in primary CRC, correlating increased DKK2 levels to higher Src phosphorylation and HNF4a1 degradation, which in turn enhances LGR5 expression and "stemness" of cancer cells, resulting in tumor progression (PMID: 33997693). A role for DKK2 in metastasis has also been previously described (sarcoma, PMID: 23204234). 

      (2) Mouse data 

      a) The authors analyzed liver mets, but the main differences between AKP and AKP/Dkk2 KO organoids could arise during the initial tumor cell egress from the intestinal tissue (which cannot be addressed in their splenic injection model), or during pre-liver stages, such as endothelial attachment. While the analysis of liver mets is interesting, given that Paneth cells play a role in the intestinal stem cell niche, it is questionable whether a study that does not involve the intestine can appropriately address this pathway in CRC metastasis. 

      We value the reviewer’s comment that the splenic injection model cannot represent metastasis from primary tumors, including intravasation and extravasation. We therefore performed orthotopic transplantation of AKP and KO organoids directly into the colon and then assessed metastasis.

      Author response image 1.

      Primary tumor formation and liver metastasis by orthotopic transplantation of AKP or KO colon cancer organoids. 6-8 week-old male C57BL/6J mice were treated with 2.5% DSS dissolved in drinking water for 5 days, followed by regular water for 2 days to remove gut epithelium. After recovery with the regular water, the colon was flushed with 1000 μl of 0.1% BSA in PBS. Then, 200,000 dissociated organoid cells in 200 μl of 5% Matrigel and 0.1% BSA in PBS were instilled into the colonic luminal space. After infusion, the anal verge was sealed with Vaseline. 8 weeks after transplantation, the mice were sacrificed to measure primary tumor formation and liver metastasis.

      As a result, 4 out of 6 mice in the control group formed colorectal primary tumors, whereas only 2 out of 6 mice in the KO group did (Author response image 1A). The size of tumors was reduced by about half (from 10-12 mm to 5-7 mm). Only one AKP mouse developed metastatic nodules in the liver (Author response image 1B). Next, to measure circulating tumor cells, we harvested at least 500 μl of blood from the portal vein and analyzed tdTomato-positive tumor cells (Author response image 2). Flow cytometry analysis of PBMCs showed tdTomato-high CD45-negative cells as well as tdTomato-mid CD45-positive cells in 2 out of 6 AKP mice, while no tdTomato-positive cells were observed in the PBMCs of KO organoid-transplanted mice.

      Because only a limited number of mice formed primary and metastatic tumors, we cannot provide a statistical analysis of DKK2-mediated metastasis. However, our revised data indicate a trend that DKK2 KO reduced primary tumor formation, the number of circulating tumor cells, and liver metastasis. This trend is consistent with our previous report in iScience, which showed that DKK2 KO reduced AOM/DSS-induced polyp formation by about 60%, and with the decreased metastasis in the splenic injection model in this manuscript. Further studies are necessary to confirm this trend and to clarify the underlying mechanisms of intravasation and extravasation of circulating tumor cells.
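
      To illustrate why counts of this size do not support a formal test, the primary tumor incidence quoted above (4 of 6 AKP mice versus 2 of 6 KO mice) can be put through a two-sided Fisher exact test. This is only an illustrative sketch, not an analysis reported by the authors; the function name `fisher_exact_two_sided` is a hypothetical helper written for this example.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x events in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum over every table that is no more likely than the observed one
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_obs * (1 + 1e-9)
    )

# Rows: AKP, KO. Columns: primary tumor formed, no primary tumor.
p_value = fisher_exact_two_sided(4, 2, 2, 4)
print(round(p_value, 3))
```

      With six mice per group the p-value is far above conventional thresholds, which is in line with describing the result as a trend rather than a statistically supported difference.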

      Author response image 2.

      Flow cytometry analysis of tdTomato+ circulating colon tumor cells in PBMCs. PBMCs were harvested via the portal vein after euthanasia. CD45 and tdTomato were analyzed by flow cytometry.

      b) The overall number of Paneth cells found in the scRNA-seq analysis of liver mets was strikingly low (17 cells, Figure 3), and assuming that these cells are driving the differences seems somewhat far-fetched. Adding to this concern is inappropriate gating in the flow plot shown in Figure 6. This should be addressed experimentally and in the interpretation of data. 

      We appreciate the reviewer’s comments, which helped us clarify this point. Since the number of LYZ+ cells is low in our scRNA-seq analysis, we performed flow cytometry in Figure 6H showing the clear population expressing LYZ in the same splenic injection model of metastasis. Figure 6H is a representative image of triplicates for each group, and we performed this experiment three times independently. As suggested, we changed the graph format and updated the gating and statistical analysis in Fig. 6H and 6I. This in vivo result confirmed our in vitro data showing that DKK2 KO reduced LYZ+ cells while increasing HNF4α1 protein levels.

      c) Figures 3, 5, and 6 show the individual gene analyses with unclear statistical data. It seems that the p-values were not adjusted, and it is unclear how they reached significance in several graphs. Additionally, it was not stated how many animals per group and cells per animal/group were included in the analyses. 

      In Fig. 3, mouse scRNA-seq data were generated from pooled cancer samples from 5 animals per group. The Wilcoxon signed-rank test was performed for each gene and/or regulon activity. Since multiple testing adjustments were not performed, a p-value adjustment is neither needed nor applicable.

      In Fig. 5, human data were analyzed. Cells from the same sample are dependent, but differential gene expression (DEG) analysis typically calculates statistics under the assumption that they are independent. This assumption may explain the low p-values observed in our data. To address this issue, we applied pseudobulk DEG analysis to our human single-cell data. Even after correcting for statistical error, we confirmed that the genes of interest still exhibited significantly different expression patterns (Author response image 3).

      Author response image 3.

      Pseudobulk DEG analysis confirmed the differential expression of the genes of interest.
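
      The pseudobulk step described above can be sketched as follows: counts from all cells that share a sample and a group label are summed, so that each sample contributes one profile per group and the downstream test no longer treats cells from the same sample as independent. This is a minimal sketch under stated assumptions, not the authors' pipeline; the function `pseudobulk`, the sample IDs, and the counts are hypothetical.

```python
from collections import defaultdict

def pseudobulk(cells):
    """Sum raw counts over all cells sharing a (sample, group) label."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["group"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}

# Toy cells with hypothetical sample IDs and counts (not data from the study)
cells = [
    {"sample": "S1", "group": "LYZ+", "counts": {"LYZ": 9, "HNF4A": 4}},
    {"sample": "S1", "group": "LYZ+", "counts": {"LYZ": 7, "HNF4A": 5}},
    {"sample": "S1", "group": "LYZ-", "counts": {"LYZ": 0, "HNF4A": 2}},
    {"sample": "S2", "group": "LYZ+", "counts": {"LYZ": 6, "HNF4A": 3}},
    {"sample": "S2", "group": "LYZ-", "counts": {"LYZ": 1, "HNF4A": 1}},
]

profiles = pseudobulk(cells)
for key in sorted(profiles):
    print(key, profiles[key])
```

      After aggregation, the unit of replication is the sample rather than the cell, which is why pseudobulk p-values are typically larger than per-cell p-values.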

      In Fig.6H-6I, the number of animals per group is provided in the figure legend.

      d) Figure 6 suggests a signaling cascade in which the absence of DKK2 leads to enhanced HNF4A expression, which in turn results in reduced Sox9 expression and hence reduced expression of Paneth cell properties. It is therefore crucial that the authors perform in vivo (splenic organoid injection) loss-of-function experiments, knockdown of Sox9 expression in AKP organoids, and Sox9 overexpression experiments in AKP/Dkk2 KO organoids to demonstrate Sox9 as the central downstream transcription factor regulating liver CRC metastasis. 

      Sox9 is a well-established marker gene for Paneth cell formation in the gut. Therefore, overexpression or knockout of the Sox9 gene would result in either an increase or decrease in Paneth cells in the organoids. We believe that the suggested experiments fall outside the scope of this manuscript. Instead, we demonstrated the change in the Paneth cell differentiation marker, Sox9, in the presence or absence of DKK2.

      e) Given the previous description of the role of DKK2 in primary CRC, it is important to define the step of liver metastasis affected by Dkk2 deficiency in the metastasis model. Does it affect extravasation, liver survival, etc.? 

      We appreciate the reviewer’s insights and perspectives. Regarding liver survival, it is well known that stem cell niche formation is a critical step for the outgrowth of metastasized cancer cells (Fumagalli et al. 2019, Cell Stem Cell). LYZ+ Paneth cells are recognized as stem cell niche cells in the intestine, and human scRNA-seq data have shown that LYZ+ cancer cells express stem cell niche factors such as Wnt and Notch ligands. To determine whether LYZ+ cancer cells act as stem cell niche cells, we performed confocal microscopy to assess whether LYZ+ cancer cells express WNT3A and DLL4 in AKP organoids (Author response image 4). The results show that LYZ labeling co-localizes with DLL4 and WNT3A expression, while the organoid reporter tdTomato is evenly distributed. Additionally, our in vitro and in vivo data indicate that DKK2 deficiency leads to a reduction of LYZ+ cancer cells, which may contribute to stem cell niche formation. Based on these findings, we propose that DKK2 is an essential factor for stem cell niche formation, which is required for cancer cell survival in the liver during the early stages of metastasis. Although our revised data confirmed the trend that DKK2 deficiency decreases liver metastasis, we have not yet determined whether DKK2 is involved in extravasation. This research topic should be addressed in future studies.

      Author response image 4.

      Confocal microscopy analysis for lysozyme (LYZ) and Paneth cell-derived stem cell niche factors, WNT3A and DLL4 in AKP colon cancer organoids.

      The method is described in the supplemental information. The list of antibodies used: DLL4 (delta-like 4) Polyclonal Antibody (Invitrogen, PA5-85931), WNT3A Polyclonal Antibody (Invitrogen, PA5-102317), Goat anti-Rabbit IgG (H+L) Cross-Adsorbed Secondary Antibody, Alexa Fluor™ 488 (Invitrogen, A-11008), Anti-Lysozyme C antibody (H-10, Santa Cruz, sc-518083), Goat anti-Mouse IgM (Heavy chain) Secondary Antibody, Alexa Fluor™ 647 (Invitrogen, A-21238).

      (3) Human data 

      Can the authors address whether the expression of Dkk2 changes in human CRC and whether mutations in Dkk2 are correlated with metastatic disease or CRC stage? 

      The human data were useful in identifying the presence of LYZ+ cancer cells with Paneth cell properties. However, due to the limited number of late-stage patient samples with high DKK2 expression, the results were not statistically significant. Nevertheless, the trend suggests a positive correlation between DKK2 expression and the malignant stage of CRC.

      (4) Bioinformatic analysis 

      The authors did not provide sufficient information on bioinformatic analyses. The authors did not include information about the software, cutoffs, or scripts used to make their analyses or output those figures in the manuscript, which challenges the interpretation and assessment of the results. Terms like "Quantitative gene expression analyses" (line 136) and "visualized in a Uniform Approximation and Projection" (line 178) do not explain what was inputted and the analyses that were executed. There are multiple ways to align, preprocess, and visualize bulk, single cell, ATAC, and ChIP-seq data, and depending on which was used, the results vary greatly. For example, in the single-cell data, the authors did not report how many cells were sequenced, nor how many cells remained after alignment and quality filtering (RNA count, mt count, etc.), so the result on Paneth+ to Goblet+ percent in lines 184 and 185 cannot be reached because it depends on this information. The absence of a clustering cutoff for the single-cell data is concerning since this greatly affects the resulting cluster number (https://www.nature.com/articles/s41592-023-01933-9). The authors should provide a comprehensive explanation of all the data analyses and the steps used to obtain those results. 

      We apologize for the insufficient information. Below, we provide detailed information on the data analyses, which are also available in the GEO database (Bulk RNA-seq: GSE157531, ATAC-seq: GSE157529, ChIP-seq: GSE277510). Methods are updated in the current version of supplemental information.

      (5) Clarity of methods and experimental approaches 

      The methods were incomplete and they require clarification. 

      We’ve updated our methods as requested by the reviewer.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors propose that DKK2 is necessary for the metastasis of colon cancer organoids. They then claim that DKK2 mediates this effect by permitting the generation of lysozyme-positive Paneth-like cells within the tumor microenvironmental niche. They argue that these lysozyme-positive cells have Paneth-like properties in both mouse and human contexts. They then implicate HNF4A as the causal factor responsive to DKK2 to generate lysozyme-positive cells through Sox9. 

      Strengths: 

      The use of a genetically defined organoid line is state-of-the-art. The data in Figure 1 and the dependence of DKK2 for splenic injection and liver engraftment, as well as the long-term effect on animal survival, are interesting and convincing. The rescue using DKK2 administration for some of their phenotype in vitro is good. The inclusion and analysis of human data sets help explore the role of DKK2 in human cancer and help ground the overall work in a clinical context. 

      Weaknesses: 

      In this work by Shin et al., the authors expand upon prior work regarding the role of Dickkopf-2 in colorectal cancer (CRC) progression and the necessity of a Paneth-like population in driving CRC metastasis. The general topic of metastatic requirements for colon cancer is of general interest. However, much of the work focuses on characterizing cell populations in a mouse model of hepatic outgrowth via splenic transplantation. In particular, the concept of Paneth-like cells is primarily based on transcriptional programs seen in single-cell RNA sequencing data and needs more validation. Although including human samples is important for potential generality, the strength could be improved by doing immunohistochemistry in primary and metastatic lesions for Lyz+ cancer cells. Experiments that further bolster the causal role of Paneth-like CRC cells in metastasis are needed. 

      Recommendations for the Authors:

      Reviewing Editor (Recommendations for the Authors): 

      Here we note several key concerns with regard to the main conclusions of the paper. Additional experiments to directly address these concerns would be required to substantially update the reviewer evaluation. 

      (1) Demonstration of a causal role of Paneth-like cells in CRC metastasis, for example by sorting the Paneth-like cells - either by the markers they identified in the subsequent single cell or by scatter - to establish whether the frequency of the Paneth-like cells in a culture of organoids is directly correlated with tumorigenicity and engraftment. 

      We sincerely appreciate the reviewing editor’s comment. First, as previously reported (Shin et al., iScience 2021), there is no difference in proliferation between WT and KO during in vitro organoid culture or in vivo colitis-induced tumors. However, DKK2 deficiency led to morphological changes, which we analyzed using bulk RNA-seq. As described in the manuscript, Paneth cell marker genes, such as Lysozymes and defensins, were significantly reduced in DKK2 KO AKP organoids.

      Due to the nature of these markers, it is technically challenging to isolate live LYZ+ cancer cells. To address this issue in the future, we plan to develop organoids that express a reporter gene specific for Paneth cells. In this manuscript, we demonstrated a correlation between DKK2 and the formation of LYZ+ cancer cells. In both the splenic injection model (Fig. 1) and the orthotopic transplantation model (Fig. R1-R2), we observed that transplantation of cancer organoids with reduced numbers of LYZ+ cells (KO organoids) led to decreased metastatic tumor formation. The number of LYZ+ cells in KO-transplanted mice remained low in liver metastasized tumor nodules (Fig. 6H-6I). Immunohistochemistry further confirmed that LYZ+ cancer cells were barely detectable in KO samples (Author response image 5). These data suggest that DKK2 is essential for the formation of LYZ+ cancer cells, which are necessary for outgrowth following metastasis.

      Author response image 5.

      Histology of lysozyme-positive cells in metastasized tumor nodules in the liver of colon cancer organoid-transplanted mice. Immunohistochemistry of lysozyme-positive Paneth-like cells in liver-metastasized colon cancer (upper panels, DAB staining). Identification of tumor nodules by H&E staining (lower panels, scale bar = 100 μm). Magnified tumor nodules are shown in the 2nd and 3rd columns (scale bar = 25 μm). Arrows indicate lysozyme-positive Paneth-like cells among tumor epithelial cells. Infiltration of lysozyme-positive myeloid cells is detected in both AKP and KO tumor nodules. AKP: control colon cancer organoids carrying mutations in the Apc, Kras and Tp53 genes. KO: Dkk2 knockout colon cancer organoids.

      (2) Further characterization of Lyz+/Paneth-like cells to further the authors' argument for the unique function that they have in their tumor model. Specifically, do the cells with Paneth-like cells secrete Wnt3, EGF, Notch ligand, and DLL4 as normal Paneth cells do? 

      We appreciate the reviewing editor’s comment. In response, we performed confocal microscopy analysis to examine the protein levels of LYZ, Wnt3A, and DLL4 in AKP colon cancer organoids (Author response image 4). The data presented above show that LYZ+ cancer cells express both Wnt3A and DLL4, suggesting that LYZ+ colon cancer cells may function similarly to Paneth cells, which are stem cell niche cells. Furthermore, using the Panglao database, we demonstrated that LYZ+/Paneth-like cells exhibit typical Paneth cell properties in human scRNA-seq data (Fig. 4 and Fig. 5). These findings suggest that LYZ+ colon cancer cells possess Paneth cell properties.

      (3) Experiments to test metastasis, ideally from orthotopic colonic tumors, to ensure phenotypes aren't restricted to the splenic model of hepatic colonization and outgrowth used at present. 

      We are in agreement with the reviewing editor and reviewers, which is why we conducted the orthotopic transplantation experiment. However, we encountered challenges in establishing this model effectively. After multiple trials, we observed that many mice did not form primary tumors, and the variability, particularly in metastasis, was difficult to control. Only a few AKP-transplanted mice developed liver metastasis. The representative revision data have been provided above. Nevertheless, we believe that this model needs further improvement and optimization to reliably study metastasis originating from primary tumors.

      (4) To generalize claims to human cancer, the authors should test whether loss of DKK2 impacts LYZ+ cancer cells in human organoids and affects their engraftment in immunodeficient mice compared to control. Another more correlative way to validate the LYZ+ expression in human colon cancer would be to stain for LYZ in metastatic vs. primary colon cancer, expecting metastatic lesions to be enriched for LYZ+ cells. 

      We agree with your point, and this will be addressed in future studies.

      (5) Clarifying inconsistencies regarding effect of DKK2 loss on HNF4A (Figure 1E vs Figure 6I). 

      In Figure 1E, we measured the mRNA levels of HNF4A in metastasized foci by qPCR, while in Figure 6I, we measured the protein level of HNF4A by flow cytometry. Recent studies, including our previous report, have shown that HNF4A protein levels are regulated by proteasomal degradation mediated by pSrc (Mori-Akiyama et al. 2007, Gastroenterology; Bastide et al. 2007, Journal of Cell Biology; Shin et al. 2021, iScience). Consequently, while the mRNA levels remained unchanged in Fig. 1E, we observed a reduction of HNF4A protein levels in Figure 6I.

      (6) Addressing concerns about statistics and reporting as outlined by Reviewer 1. 

      Thank you very much for your assistance in improving our manuscript. The updates have been incorporated as detailed above.

      These are the central reviewer concerns that would require additional experimentation to update the editorial summary. Other concerns should be addressed in a revision response but do not require additional experimentation. 

      Reviewer #1 (Recommendations For The Authors): 

      Specific comments: 

      • Do Dkk2-KO organoids grow normally?

      Yes, in vitro.

      Since the authors reported on the effects of Dkk2 in the induction/maintenance of the Paneth cell niche, changes in AKP organoid numbers or growth rate between Dkk2-WT and KO would be an expected outcome. 

      Disruption of Paneth cell formation in normal organoids is expected to alter growth. However, DKK2 KO colon cancer organoids with mutations in the Apc, Kras, and Tp53 genes exhibit growth rates and organoid sizes similar to those of WT AKP controls. In contrast to the in vitro observations, we observed a significant reduction in metastasized tumor growth in vivo. Further analyses of factors derived from LYZ+ cancer cells will help explain the discrepancy between the in vitro and in vivo effects of DKK2 loss.

      • Figure 1: 

      - Panel C: The legend should indicate what c.p.m. stands for.

      c.p.m. stands for counts per minute in the in vivo imaging analysis. This has been updated in the figure legend.

      - Panel E: Please comment on the possible underlying reasons for the lack of change in HNF4a1 levels. 

      This has been updated in response to the reviewing editor’s comment (5) above.

      - Panel E: Number of mice from which isolated cancer nodules were harvested. 

      Total mice per group were 5. This has been updated in the legend.

      • Figure 2: 

      - Suggestion: Panel A should be presented in Figure 1 since Dkk2 KO organoids are already used in Figure 1. 

      We added this to present the recovery of DKK2 by adding recombinant DKK2 proteins in Fig.2.

      - Panel B: Please explain why these genes are marked in blue. 

      It has been described in the legend. “Paneth cell marker genes are highlighted as blue circles (AKP=3 and KO=5 biological replicates were analyzed).”

      • Figure 3: 

      - Indicate the number of cells recovered from AKP vs. KO mice (since liver metastasis was already reduced in KO mice). This should be shown in a UMAP. 

      - Panel A: 4th line in the pathways, correct "Singel" typo. 

      We appreciate your correction. It has been fixed.

      - Panel A: There are multiple versions of PanglaoDB with different markers; a list of all that was used to determine cell type should be provided. 

      - Panel C: Bar value for the WNT pathway is not displayed, and there is no legend to indicate the direction of the analysis (that is, AKPvsKO or KOvsAKP). 

      It is KOvsAKP, described in the figure legend.

      - Panel C: Ingenuity pathway analysis is not a good tool to look at this type of result because it does not include the gene fold changes in the analysis, so it only provides a Z-score of the presence of that pathway and not the degree it is increased or fold changes - recommend substituting any type of GSEA analysis, such as fgsea.

      - Panel D: the term "Patient" to refer to mice is confusing. Use "Mice" or "Treatment" or "Condition" instead. 

      Corrected

      - Panel D: Information about the number of mice per group, cells per animal (or liver met) used, and additional clarification about the statistical analysis used is required, as differences shown in this panel appear subtle given the standard variation in each group. Box plots need to show individual/raw values. 

      • Figure 4: 

      - Panel E: It would be helpful to show the cutoff lines for the Paneth cell score and Lyz expression in the graphs. 

      It has been updated in response to the reviewer’s request.

      • Figure 5: 

      - Panel B: again, information about the number of "patients" or cells used and clarification about the statistical analysis used is required as the display of data generates concerns about the distribution within groups. Box plots need to show individual/raw values

      It has been updated in response to the reviewer’s request.

      • Figure 6: 

      - Panel A: Add a legend to inform the direction of the process (e.g., red, activation, blue, repression). We noticed the Yap1 bar data had no color. Is there a reason for that? Please explain this point in the revised manuscript. 

      Red color added for the Yap1.

      - Panel A: Ingenuity pathway analysis is not a good tool to look at this type of result because it does not include the gene fold changes in the analysis, so it only provides a Z-score of the presence of that pathway and not the degree it is increased or not. I recommend substituting any type of GSEA analysis, such as fgsea. 

      - Panels A&B: Again, only p-value scores were provided, while fold changes are necessary to define the ratio of presence increase of normal vs. AKP. 

      - Panel D: No raw or pre-processed ChIP-seq data was provided. Additionally, please indicate the exact genome location. It seems the image was edited from a raw image made on the UCSC genome browser; it should be remade by adding coordinates and other important information (surrounding genes, epigenetic marks, etc.). 

      - Panel H/I: Flow cytometry gating is inappropriate, as it is capturing cells that are negative for LYZ in both AKP and KO cells, resulting in an overestimation of the number of Lyz cells. Gating should specifically select very few LYZ-positive cells in the top/left quadrant. 

      The updates have been made, and the statistical data have been re-analyzed.

      - Panel J: Information about the number of animals/organoids or cells used and clarification about the statistical analysis used is required, as the display of data generates concerns about the distribution within groups. Box plots need to show individual/raw values. 

      • Overall: 

      - A supplementary table with all the sequenced libraries and their depth, read length/cell count should be provided.

      All of the information is now available in the GEO database. We used previously published human epithelial datasets for human single cell analysis (Joanito*, Wirapati*, Zhao*, Nawaz* et al, Nat Genetics, 2022, PMID: 35773407).

      - The Hallmark Geneset used is very broad, and the authors should confirm the results on GO bp. 

      Using Gene Ontology biological processes (GO bp), we observed that glycolysis-related genes were enriched in our newly described cell population, although the adjusted p-value did not exceed 0.05.

      Author response image 6

      GSEA with GOBP pathways highlighted glycoprotein and protein localization to extracellular region, both of which are related to Paneth cell functions. Paneth cells secrete α-defensins, angiogenin-4, lysozyme and secretory phospholipase A2. The enriched glycoprotein process and protein localization to extracellular region reflect the characteristics of Paneth cells. 

       

      - qPCR is not a good way to confirm sequencing results; while PCR data is pre-normalized, sequencing is normalized only after quantification, so results on 6 E and F should be shown on the sequencing data. 

      The expression level of Sox9 is relatively low. In our bulk RNA-seq data, the averages for Sox9 in AKP versus DKK2 KO are 28.2 and 25.1, respectively. While there is a similar trend, the difference is not statistically significant in this dataset, and we did not include an experimental group for reconstitution. Therefore, we conducted qPCR experiments for the reconstitution study by adding recombinant DKK2 (rmDKK2) protein to the culture. Furthermore, it is well established that Sox9 is an essential transcription factor for the formation of LYZ+ Paneth cells. Based on this, we assessed the levels of LYZ and Sox9 using qPCR and confocal microscopy in the presence or absence of DKK2.
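
      As a worked example of the fold change the reviewer asks for, the two Sox9 averages quoted above can be converted to a log2 fold change. This is only an arithmetic sketch: the response does not state the expression units or the replicate-level values, so it assumes the two averages are on the same linear scale, and no significance can be derived from it.

```python
from math import log2

akp_mean = 28.2  # Sox9 average in AKP, as quoted in the response
ko_mean = 25.1   # Sox9 average in DKK2 KO, as quoted in the response

# KO relative to AKP, assuming both values are on the same linear scale
log2_fc = log2(ko_mean / akp_mean)
percent_change = 100 * (ko_mean / akp_mean - 1)

print(f"log2FC = {log2_fc:.2f}, change = {percent_change:.1f}%")
```

      Under that assumption the KO average is about 11% lower than the AKP average, a small shift that fits the authors' description of a similar trend without statistical significance.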

      • Edits in the text: 

      - There are several typographical errors. Specific suggestions are provided below. 

      - Line 43: "Chromatin immunoprecipitation followed by sequencing analysis": state which cells were analyzed before continuing with "revealed..." 

      - Line 77: Recent findings have identified 

      - Line 138: were reduced in KO tumor samples → rephrase to clarify "KO-derived liver tumors" 

      - Line 167: Recombinant mouse DKK2 protein treatment in KO organoids partially rescued this effect. Add "partially" since adding rmDkk2 didn't fully restore Lyz1 and Lyz2 levels. 

      - Line 185-187: the authors should not reference Figure 6 because it has not been introduced yet. 

      - Line 198-199: The authors claimed a correlation between Dkk2 expression and Lgr5 expression; however, the graph presented in Figure 3B does not indicate this. The R-value was 0.11, which does not indicate a correlative expression between these genes. 

      - Line 232-233: the authors need to show any connection to Dkk2 gene expression in human samples in order to draw that conclusion. 

      - Line 294: expression, leading to the formation 

      - Line 347: Wnt ligand (correct Wng typo) 

      We have modified our manuscript in accordance with the reviewer’s suggestions.

      Reviewer #2 (Recommendations For The Authors): 

      Specific criticisms/suggestions: 

      Author claim 1: Dkk2 is necessary for liver metastasis of colon cancer organoids. <br /> This model is one of hepatic colonization and eventual outgrowth and not metastasis. Metastasis is optimally assessed using autochthonous models of cancer generation, with the concomitant intravasation, extravasation, and growth of cancer cells at the distant site. The authors should inject their various organoids in an orthotopic colonic transplantation assay, which permits the growth of tumors in the colon, and they can then identify metastasis in the liver that results from that primary cancer lesion (i.e., to better model physiologic metastasis from the colon to liver). 

      The orthotopic colonic transplantation data have been provided above (Author response images 1 and 2).

      Author claim 2: DKK2 is required for the formation of lysozyme-positive cells in colon cancer. 

      It would greatly strengthen the authors' claim if supraphysiologic or very high amounts of DKK2 enhance CRC organoid line engraftment (i.e., the specific experiment being pre-treatment with high levels of DKK2 and immediate transplantation to see the number of outgrowing clones). If DKK2 is causal for the engraftment of the tumors, increased DKK2 should enhance their capacity for engraftment.

      Paneth cells have physical properties permitting sorting and are readily identifiable on flow cytometry. The authors should demonstrate increased tumorigenicity and engraftment by sorting the Paneth-like cells, either by the markers they identified in the subsequent single-cell analysis or by scatter, to establish whether the frequency of the Paneth-like cells in a culture of organoids is directly correlated with engraftment potential.

      Further characterization of the Paneth-like cells would help further the authors' argument for the unique function that they have in their tumor model. Specifically, do the Paneth-like cells secrete Wnt3, EGF, Notch ligand, and Dll4 as normal Paneth cells do? Immunofluorescence, sorting, or western blots would all be reasonable methods to assess protein levels in the sorted population.

      This has been performed and provided above (Author response images 1 and 3).

      Author claim 3: Lysozyme (LYZ)+ cancer cells exhibit Paneth cell properties in both mouse and human systems.

      For the claim to be general to human cancer, the author should demonstrate that loss of DKK2 impacts LYZ+ cancer cells in human organoids and affects their engraftment in immunodeficient mice compared to control. Another more correlative way to validate the LYZ+ expression in human colon cancer would be to stain for LYZ in metastatic vs. primary colon cancer, expecting metastatic lesions to be enriched for LYZ+ cells. 

      The claims on the metabolic function of Paneth-like cells need more clarification. Do the cancer cells with Paneth features have a distinct metabolic profile compared to the other cell populations? The authors should address this through metabolic characterization of isolated LYZ+ cells with Seahorse or comparison of Dkk2 KO to WT organoids (i.e., +/-LYZ+ cancer cell population). 

      To address this question, we need to develop organoids with a Paneth cell reporter gene. We appreciate the reviewer’s comment, and this should be pursued in future studies.

      Author claim 4: HNF4A mediates the formation of Lysozyme (Lyz)-positive colon cancer cells by DKK2. 

      The authors implicate HNF4A and Sox9 as causal effectors of the Paneth-like cell phenotype and subsequent metastatic potential. There appears to be some discordance regarding the effect of DKK2 loss on HNF4A. In Figure 1E, the authors show that gene expression in metastatic colon cancer cells for HNF4A in DKK2 knockout vs AKP control is insignificant. However, in Figure 6I, there is a highly significant difference in the number of HNF4A positive cells, more than a 3-fold percentage difference, with a p-value of <0.0001. If there is the emergence of a rare but highly expressing HNF4A cell type that on aggregate bulk expression leads to no difference, but sorts differentially, why is it not identified in the single-cell data set? These data together are highly inconsistent with regard to the effect of DKK2 on HNF4A and require clarification.

      Previous studies have demonstrated that HNF4A is regulated by proteasomal degradation mediated by pSrc. As a result, the mRNA level of HNF4A remains unchanged, while the protein level is significantly reduced in colon cancer cells. DKK2 KO leads to decreased Src phosphorylation, resulting in the recovery of HNF4A protein levels. This explains why HNF4A cannot be detected in scRNA-seq datasets, which measure mRNA. We have shown this in our previous report. In this manuscript, based on ChIP-seq data using an anti-HNF4A monoclonal antibody, as well as confocal microscopy and qPCR data for the Sox9 gene, we propose that HNF4A acts as a regulator of cancer cells exhibiting Paneth cell properties.

    1. eLife Assessment

      This useful study presents a novel microscopy technique called "Expansion Tiling Light Sheet Microscopy" and an accompanying computational pipeline for the faster collection and analysis of 3D volumetric images in animals like planarians. This approach produces beautiful 3D microscopy images and is solid on a technical level. However, due to the use of antibody reagents that visualize many – but not all – neurons and muscle subtypes, the evidence for the biological conclusions in this study remains incomplete. With the claims appropriately contextualized, this paper will be of interest to cell biologists working on imaging and analyzing whole animals.

    2. Reviewer #1 (Public review):

      Summary:

      The planarian flatworm Schmidtea mediterranea is widely used as a model system for regeneration because of its remarkable ability to regenerate its entire body plan from very small fragments of tissue, including the complete and rapid regeneration of the CNS. Prior to this study, analysis of CNS regeneration in planaria has mostly been performed on a gross anatomical level. Despite its simplicity compared to vertebrates, the CNS of many invertebrates, including planaria, is nonetheless complex, intricate, and densely packed. Some invertebrate models allow the visualization of individual cellular components of the CNS using transgenic techniques. Until transgenesis becomes commonplace in planaria, the visualization and analysis of detailed CNS anatomy must rely on alternate approaches in order to capitalize on the immense promise of this system as a model for CNS regeneration. Another challenge for the study of the CNS more broadly is how to perform imaging of a complete CNS on a reasonable timescale such that multiple individuals per experimental condition can be imaged.

      Strengths:

      In this report, Lu et al. describe a careful and detailed analysis of the planarian neuroanatomy and musculature in both the homeostatic and regenerating contexts. To improve the effective resolution of their imaging, the authors optimized a tissue expansion protocol for planaria. Imaging was performed by light sheet microscopy, and the resulting optical sections were tiled to reconstruct whole worms. Labelled tissues and cells were then segmented to allow quantification of neurons and muscle fibers, as well as all cells in individual worms using a DNA dye. The resulting workflow can produce highly detailed and quantifiable 3D reconstructions at a rate that is fast enough to allow the analysis of large numbers of animals.

      Weaknesses:

      Lu et al. use their workflow to visualize RNA expression of five enzymes that are each involved in the biosynthetic pathway of different neurotransmitters/modulators, namely chat (cholinergic), gad (GABAergic), tbh (octopaminergic), th (dopaminergic), and tph (serotonergic). In this way, they generate an anatomical atlas of neurons that produce these molecules. Collectively these markers are referred to as the "neuronpool." They overstate when they write, "The combination of these five types of neurons constitutes a neuron pool that enables the labeling of all neurons throughout the entire body." This statement does not accurately represent the state of our knowledge about the diversity of neurons in S. mediterranea. There are several lines of evidence that support the presence of glutamatergic and glycinergic neurons, including the following. The glutamate receptor agonists NMDA and AMPA both produce seizure-like behaviors in S. mediterranea that are blocked by the application of glutamate receptor antagonists MK-801 and DNQX (which antagonize NMDA and AMPA glutamate receptors, respectively; Rawls et al., 2009). scRNA-Seq data indicates that neurons in S. mediterranea express a vesicular glutamate transporter, a kainate-type glutamate receptor, a glycine receptor, and a glycine transporter (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Two AMPA glutamate receptors, GluR1 and GluR2, are known to be expressed in the CNS of another planarian species, D. japonica (Cebria et al., 2002). Likewise, there is abundant evidence for the presence of peptidergic neurons in S. mediterranea (Collins et al., 2010; Fraguas et al., 2012; Ong et al., 2016; Wyss et al., 2022; among others) and in D. japonica (Shimoyama et al., 2016). For these reasons, the authors should not assume that all neurons can be assayed using the five markers that they selected. The situation is made more complex by the fact that many neurons in S. mediterranea appear to produce more than one neurotransmitter/modulator/peptide (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022), which is common among animals (Vaaga et al., 2014; Brunet Avalos and Sprecher, 2021). However, the published literature indicates that there are substantial populations of glutamatergic, glycinergic, and peptidergic neurons in S. mediterranea that do not produce other classes of neurotransmission molecule (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Thus it seems likely that the neuronpool will miss many neurons that only produce glutamate, glycine or a neuropeptide.

      The authors use their technique to image the neural network of the CNS using antibodies raised against Arrestin, Synaptotagmin, and phospho-Ser/Thr. They document examples of both contralateral and ipsilateral projections from the eyes to the brain in the optic chiasma (Figure 1C-F). These data all seem to be drawn from a single animal in which there appears to be a greater than normal number of nerve fiber defasciculations. It isn't clear how well their technique works for fibers that remain within a nerve tract or the brain. The markers used to image neural networks are broadly expressed, and it's possible that most nerve fibers are too densely packed (even after expansion) to allow for image segmentation. The authors also show a close association between estrella-positive glial cells and nerve fibers in the optic chiasma.

      The authors count all cell types, neuron pool neurons, and neurons of each class assayed. They find that the cell number to body volume ratio remains stable during homeostasis (Figure S3C), and that the brain volume steadily increases with increasing body volume (Figure S3E). They also observe that the proportion of neurons to total body cells is higher in worms 2-6 mm in length than in worms 7-9 mm in length (Figure 2D, S3F). They find that the rate at which four classes of neurons (GABAergic, octopaminergic, dopaminergic, serotonergic) increase relative to the total body cell number is constant (Figure S3G-J). They write: "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." This conclusion should not be reached without first directly counting the number of cholinergic neurons and total body cells. Given that glutamatergic, glycinergic, and peptidergic neurons were not counted, it also remains possible that the non-linear dynamics are due (in part or in whole) to one or more of these populations.

      The authors next assayed the production of different classes of neurons in regenerating post-pharyngeal tail fragments. At 14 dpa, they find significantly reduced proportions of octopaminergic, GABAergic, and dopaminergic neurons in these regenerated animals (Figure 3K). Given that these three neuron classes are primarily found in the brain region (Figure S2A), this suggests that the brains of these animals may not have finished regenerating by 14 dpa.

      The authors next applied their imaging and segmentation technique to the musculature using the 6G10 antibody. They find that the body wall muscle fibers from the dorsal and ventral body walls integrate differently at the anterior end (to form a cobweb-like arrangement) compared to the posterior end (Figure 4I). They knock down β-catenin in regenerating head anterior fragments and find that the resulting double-headed worms produce a cobweb-like arrangement at both ends (Figure 4J).

      RNAi knockdown of inr-1 is known to produce animals with mobility defects and elongated bodies relative to control animals (Lei et al., 2016; Figure S6A). To understand the nature of these defects, the authors image the muscle of inr-1 RNAi animals and find increased circular body wall muscle fibers on both dorsal and ventral sides, while β-catenin RNAi animals have increased longitudinal muscle fibers on the dorsal side (Figure 6C). The inr-1 RNAi animals also have reduced cholinergic neurons (Figure S6B), and ectopic expression of the GABAergic marker gad in the periphery (Figure S6B). Lastly, the authors simultaneously image muscle and estrella-positive glia and find that these glia lack their typically elaborate stellate morphology in inr-1 RNAi animals (Figure 6E, S6E-K). The combination of these muscle, neuronal, and glial defects may account for the mobility defects observed in inr-1 RNAi worms.

    3. Author response:

      Reviewer #1 (Public review):

      Lu et al. use their workflow to visualize RNA expression of five enzymes that are each involved in the biosynthetic pathway of different neurotransmitters/modulators, namely chat (cholinergic), gad (GABAergic), tbh (octopaminergic), th (dopaminergic), and tph (serotonergic). In this way, they generate an anatomical atlas of neurons that produce these molecules. Collectively these markers are referred to as the "neuronpool." They overstate when they write, "The combination of these five types of neurons constitutes a neuron pool that enables the labeling of all neurons throughout the entire body." This statement does not accurately represent the state of our knowledge about the diversity of neurons in S. mediterranea. There are several lines of evidence that support the presence of glutamatergic and glycinergic neurons, including the following. The glutamate receptor agonists NMDA and AMPA both produce seizure-like behaviors in S. mediterranea that are blocked by the application of glutamate receptor antagonists MK-801 and DNQX (which antagonize NMDA and AMPA glutamate receptors, respectively; Rawls et al., 2009). scRNA-Seq data indicates that neurons in S. mediterranea express a vesicular glutamate transporter, a kainate-type glutamate receptor, a glycine receptor, and a glycine transporter (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Two AMPA glutamate receptors, GluR1 and GluR2, are known to be expressed in the CNS of another planarian species, D. japonica (Cebria et al., 2002). Likewise, there is abundant evidence for the presence of peptidergic neurons in S. mediterranea (Collins et al., 2010; Fraguas et al., 2012; Ong et al., 2016; Wyss et al., 2022; among others) and in D. japonica (Shimoyama et al., 2016). For these reasons, the authors should not assume that all neurons can be assayed using the five markers that they selected. The situation is made more complex by the fact that many neurons in S. mediterranea appear to produce more than one neurotransmitter/modulator/peptide (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022), which is common among animals (Vaaga et al., 2014; Brunet Avalos and Sprecher, 2021). However, the published literature indicates that there are substantial populations of glutamatergic, glycinergic, and peptidergic neurons in S. mediterranea that do not produce other classes of neurotransmission molecule (Brunet Avalos and Sprecher, 2021; Wyss et al., 2022). Thus it seems likely that the neuronpool will miss many neurons that only produce glutamate, glycine or a neuropeptide.

      In response to your comments, we agree that our initial statement regarding the "neuron pool" overstated the extent of neuronal coverage provided by the five selected markers. We have revised the sentence to read: “The combination of these five types of neurons constitutes a neuron pool that enables the labeling of most of the neurons throughout the entire body, including the eyes, brain, and pharynx.”

      Furthermore, we chose the five neurotransmitter systems (cholinergic, GABAergic, octopaminergic, dopaminergic, and serotonergic) based on their well-characterized roles in planarian neurobiology and the availability of reliable markers. However, we acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling, which have been documented in S. mediterranea. We will also add the following text about other neuron types to our revised manuscript: “Additionally, there is considerable diversity among glutamatergic, glycinergic, and peptidergic neurons in planarians. Many neurons in S. mediterranea express more than one neurotransmitter or neuropeptide, which adds further complexity to the system.”

      The authors use their technique to image the neural network of the CNS using antibodies raised against Arrestin, Synaptotagmin, and phospho-Ser/Thr. They document examples of both contralateral and ipsilateral projections from the eyes to the brain in the optic chiasma (Figure 1C-F). These data all seem to be drawn from a single animal in which there appears to be a greater than normal number of nerve fiber defasciculations. It isn't clear how well their technique works for fibers that remain within a nerve tract or the brain. The markers used to image neural networks are broadly expressed, and it's possible that most nerve fibers are too densely packed (even after expansion) to allow for image segmentation. The authors also show a close association between estrella-positive glial cells and nerve fibers in the optic chiasma.

      Thank you for your detailed feedback. While we did not perform segmentation of all neuron fibers, we were able to segment more isolated fibers that were not densely packed within the neural tracts. We use 120 nm resolution to segment neurons along the three axes. Our data show the presence of both contralateral and ipsilateral projections of visual neurons. Although Figure 1C-F shows data from one planarian, we imaged three independent specimens to confirm the consistency of these observations. In the revised manuscript, we will include a discussion on the limitations of TLSM in reconstructing neural networks, particularly when it comes to resolving fibers within densely packed regions of the nerve tracts.

      The authors count all cell types, neuron pool neurons, and neurons of each class assayed. They find that the cell number to body volume ratio remains stable during homeostasis (Figure S3C), and that the brain volume steadily increases with increasing body volume (Figure S3E). They also observe that the proportion of neurons to total body cells is higher in worms 2-6 mm in length than in worms 7-9 mm in length (Figure 2D, S3F). They find that the rate at which four classes of neurons (GABAergic, octopaminergic, dopaminergic, serotonergic) increase relative to the total body cell number is constant (Figure S3G-J). They write: "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons." This conclusion should not be reached without first directly counting the number of cholinergic neurons and total body cells. Given that glutamatergic, glycinergic, and peptidergic neurons were not counted, it also remains possible that the non-linear dynamics are due (in part or in whole) to one or more of these populations. 

      We have removed the statement "Since the pattern of cholinergic neurons is the major cell population in the brain, these results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is likely from the cholinergic neurons" and replaced it with: “These results suggest that the above observation of the non-linear dynamics between neurons and cell numbers is not likely from the octopaminergic, GABAergic, dopaminergic and serotonergic neurons. Since our neuron pool may not include glutamatergic, glycinergic, and peptidergic neurons, we would like to add the possibility that the non-linear dynamics may be from cholinergic neurons or other neurons not included in our staining.”

      Reviewer #2 (Public review): 

      Weaknesses: 

      (1) The proprietary nature of the microscope, protected by a patent, limits the technical details provided, making the method hard to reproduce in other labs. 

      Thank you for your comment. We understand the importance of reproducibility and transparency in scientific research. We would like to point out that the detailed design and technical specifications of the TLSM are publicly available in our published work: Chen et al., Cell Reports, 2020. Additionally, the protocol for C-MAP, including the specific experimental steps, is comprehensively described in the methods section of this paper. We believe that these resources should provide sufficient information for other labs to replicate the method.

      (2) The resolution of the analyses is mostly limited to the cellular level, which does not fully leverage the advantages of expansion microscopy. Previous applications of expansion microscopy have revealed finer nanostructures in the planarian nervous system (see Fan et al. Methods in Cell Biology 2021; Wang et al. eLife 2021). It is unclear whether the current protocol can achieve a comparable resolution. 

      Thank you for raising this important point. The strength of our C-MAP protocol lies in its fluorescence-protective nature and user convenience. Notably, the sample can be expanded up to 4.5-fold linearly without the need for heating or proteinase digestion, which helps preserve fluorescence signals. In addition, the entire expansion process can be completed within 48 hours. While our current analysis focused on cellular-level structures, our method can achieve comparable or better resolution and we will add this information in the revised manuscript.

      (3) The data largely corroborate past observations, while the novel claims are insufficiently substantiated. 

      A few major issues with the claims: 

      (4) Line 303-304: While 6G10 is a widely used antibody to label muscle fibers in the planarian, it doesn't uniformly mark all muscle types (Scimone et al. Nature 2017). For a more complete view of muscle fibers, it is important to use a combination of antibodies targeting different fiber types or a generic marker such as phalloidin. This raises fundamental concerns about all the conclusions drawn from Figures 4 and 6 about differences between various muscle types. Additionally, the authors should cite the original paper that developed the 6G10 antibody (Ross et al. BMC Developmental Biology 2015).

      We appreciate the reviewer’s insightful comments and acknowledge that 6G10 does not uniformly label all muscle fiber types. We agree that this limitation should be recognized in the interpretation of our results. We will revise the manuscript to explicitly state the limitations of using 6G10 alone for muscle fiber labeling and highlight the need for additional markers. We will also clarify that the primary objective of our study was not to distinguish all muscle fiber types but rather to demonstrate the application of our 3D tissue reconstruction method in addressing traditional research questions. Nonetheless, we agree that expanding the labeling strategy in future studies would allow for a more thorough investigation of muscle fiber diversity. We will ensure all citations are properly revised and updated in our next version.

      (5) Lines 371-379: The claim that DV muscles regenerate into longitudinal fibers lacks evidence. Furthermore, previous studies have shown that TFs specifying different muscle types (DV, circular, longitudinal, and intestinal) both during regeneration and homeostasis are completely different (Scimone et al., Nature 2017 and Scimone et al., Current Biology 2018). Single-cell RNAseq data further establishes the existence of divergent muscle progenitors giving rise to different muscle fibers. These observations directly contradict the authors' claim, which is only based on images of fixed samples at a coarse time resolution. 

      Thank you for your valuable feedback. Our intent was not to suggest that DV muscles regenerate into longitudinal fibers. Our observations focused on the wound site, where DV muscle fibers appear to reconnect, and longitudinal fibers, along with other muscle types, gradually regenerate to restore the structure of the injured area. We will revise the relevant sections of the manuscript to clarify this dynamic process more accurately.

      (6) Line 423: The manuscript lacks evidence to claim glia guide muscle fiber branching. 

      We will remove this statement from the revised version. Instead, we will focus on describing our observations of the connections between glial cells and muscle fibers.

      (7) Lines 432/478: The conclusion about neuronal and muscle guidance on glial projections is similarly speculative, lacking functional evidence. It is possible that the morphological defects of estrella+ cells after bcat1 RNAi are caused by Wnt signaling directly acting on estrella+ cells independent of muscles or neurons. 

      We understand that this approach is insufficient and we will revise the manuscript to more clearly state the limitations of our data. We will describe our observations as preliminary and suggest that further experiments are required.

      (8) Finally, several technical issues make the results difficult to interpret. For example, in line 125, cell boundaries appear to be determined using nucleus images; in line 136, the current resolution seems insufficient to reliably trace neural connections, at least based on the images presented. 

      We use two setups for imaging cells and neuron projections. For cellular resolution imaging, we utilized a 1× air objective with a numerical aperture (NA) of 0.25 and a working distance of 60 mm (OLYMPUS MV PLAPO). The voxel size used was 0.8×0.8×2.5 µm³. This configuration resulted in a resolution of 2×2×5 µm³ and a spatial resolution of 0.5×0.5×1.25 µm³ with 4× isotropic expansion. Alternatively, for sub-cellular imaging, we employed a 10×0.6 SV MP water immersion objective with 0.8 NA and a working distance of 8 mm (OLYMPUS). The voxel size used in this configuration was 0.26×0.26×0.8 µm³. As a result of this configuration, we achieved a resolution of 0.5×0.5×1.6 µm³ and a spatial resolution of 0.12×0.12×0.4 µm³ with a 4.5× isotropic expansion. The higher resolution achieved with sub-cellular imaging allows us to observe finer structures and trace neural connections.
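      The relationship between the optical resolution and the post-expansion figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that the effective resolution is simply the optical resolution divided by the linear expansion factor along each axis; the function name is illustrative and is not taken from the authors' pipeline.

```python
def effective_resolution(optical_um, expansion):
    """Divide each axis of the optical resolution (in µm) by the linear
    expansion factor, assuming isotropic expansion."""
    return tuple(round(axis / expansion, 3) for axis in optical_um)


# Cellular setup: 2 x 2 x 5 µm optical resolution, 4x expansion.
cellular = effective_resolution((2.0, 2.0, 5.0), 4.0)

# Sub-cellular setup: 0.5 x 0.5 x 1.6 µm optical resolution, 4.5x expansion.
subcellular = effective_resolution((0.5, 0.5, 1.6), 4.5)

print("cellular:", cellular)
print("sub-cellular:", subcellular)
```

      Under this assumption the cellular setup reproduces the quoted 0.5×0.5×1.25 µm, while the sub-cellular setup gives roughly 0.11×0.11×0.36 µm, which the response reports in rounded form as 0.12×0.12×0.4 µm.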

      Regarding your question about cell boundaries, we will revise the manuscript to specify that the boundaries we identified are those of each nucleus, rather than entire cells. This distinction will be made clear in the revised version.

      Reviewer #3 (Public review): 

      Weaknesses: 

      (1) The work would have been strengthened by a more careful consideration of previous literature. Many papers directly relevant to this work were not cited. Such omissions do the authors a disservice because in some cases, they fail to consider relevant information that impacts the choice of reagents they have used or the conclusions they are drawing. 

      For example, when describing the antibody they use to label muscles (monoclonal 6G10), they do not cite the paper that generated this reagent (Ross et al PMCID: PMC4307677), and instead cite a paper (Cebria 2016) that does not mention this antibody. Ross et al reported that 6G10 does not label all body wall muscles equivalently, but rather "predominantly labels circular and diagonal fibers" (which is apparent in Figure S5A-D of the manuscript being reviewed here). For this reason, the authors of the paper showing different body wall muscle populations play different roles in body patterning (Scimone et al 2017, PMCID: PMC6263039, also not cited in this paper) used this monoclonal in combination with a polyclonal antibody to label all body wall muscle types. Because their "pan-muscle" reagent does not label all muscle types equivalently, it calls into question their quantification of the different body wall muscle populations throughout the manuscript. It does not help matters that their initial description of the body wall muscle types fails to mention the layer of thin (inner) longitudinal muscles between the circular and diagonal muscles (Cebria 2016 and citations therein).

      Ipsilateral and contralateral projections of the visual axons were beautifully shown by dye-tracing experiments (Okamoto et al 2005, PMID: 15930826). This paper should be cited when the authors report that they are corroborating the existence of ipsilateral and contralateral projections. 

      Thank you for your feedback. We will incorporate these citations and clarifications into the revised manuscript. We acknowledge the limitations of this approach and recognize that it does not encompass all neuron types, particularly those involved in glutamatergic, glycinergic, and peptidergic signaling. We will also add the content about other neuron types in our revised version.

      (2) The proportional decrease of neurons with growth in S. mediterranea was shown by counting different cell types in macerated planarians (Baguna and Romero, 1981; https://link.springer.com/article/10.1007/BF00026179) and earlier histological observations cited there. These results have also been validated by single-cell sequencing (Emili et al, bioRxiv 2023, https://www.biorxiv.org/content/10.1101/2023.11.01.565140v). Allometric growth of the planaria tail (the tail is proportionately longer in large vs small planaria) can explain this decrease in animal size. The authors never really discuss allometric growth in a way that would help readers unfamiliar with the system understand this. 

      Thank you for your feedback. We will incorporate these citations and clarifications into the revised manuscript.

      (3) In some cases, the authors draw stronger conclusions than their results warrant. The authors claim that they are showing glial-muscle interactions, however, they do not provide any images of triple-stained samples labeling muscle, neurons, and glia, so it is impossible for the reader to judge whether the glial cells are interacting directly with body wall muscles or instead with the well-described submuscular nerve plexus. Their conclusion that neurons are unaffected by beta-cat or inr-1 RNAi based on anti-phospho-Ser/Thr staining (Fig. 6E) is unconvincing. They claim that during regeneration "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373). They provide no evidence for such switching of muscle cell types, so it is unclear why they say this. 

      We acknowledge that some of our conclusions were overstated given the current data, and we appreciate the opportunity to clarify and refine these claims in the revised manuscript. Regarding the statement that "DV muscles initially regenerate into longitudinal fibers at the anterior tip" (line 373), as addressed in our previous response, this phrasing was unclear. Our intent was not to imply that DV muscles switch into longitudinal fibers. Instead, we observed that muscle fibers reconnect at the wound site, with longitudinal fibers and other muscle types gradually restoring the structure. We will revise this section to better describe the dynamic changes observed during regeneration.

      (4) The authors show how their automated workflow compares to manual counts using PI-stained specimens (Figure S1T). I may have missed it, but I do not recall seeing a similar ground truth comparison for their muscle fiber counting workflow. I mention this because the segmented image of the posterior muscles in Figure 4I seems to be missing the vast majority of circular fibers visible to the naked eye in the original image. 

      Thank you for raising this important point. We will include a ground truth comparison of our automated muscle fiber counting with manual counts in the supplementary figures. Regarding the observation of missing circular fibers in Figure 4I, we agree that the segmentation appears to have missed a significant number of circular fibers in this particular image. This may have been due to limitations in the current parameters of the segmentation algorithm, especially in distinguishing fibers in regions of varying intensity or overlap. We are revisiting the segmentation parameters to improve the accuracy of detecting circular fibers, and we will provide an updated version of Figure 4I in the revised manuscript.

      (5) It is unclear why the abstract says, "We found the rate of neuron cell proliferation tends to lag..." (line 25). The authors did not measure proliferation in this work and neurons do not proliferate in planaria. 

      Thank you for bringing this to our attention. What we intended to convey was the increase in neuron number during homeostasis. We will revise the abstract to remove the reference to proliferation and instead describe the increase in neuron numbers due to progenitor cell differentiation during homeostasis.

      (6) It is unclear what readers are to make of the measurements of brain lobe angles. Why is this a useful measurement and what does it tell us? 

      The measurement of brain lobe angles is intended to provide a quantitative assessment of the growth and morphological changes of the planarian brain during regeneration. Additionally, the relevance of brain lobe angles has been explored in previous studies, such as Arnold et al., Nature, 2016, further supporting its use as a meaningful parameter.

      (7) The authors repeatedly say that this work lets them investigate planarians at the single-cell level, but they don't really make the case that they are seeing things that haven't already been described at the single-cell level using standard confocal microscopy. 

      Thank you for your comment. We agree that single-cell level imaging has been previously achieved in planarians using conventional confocal microscopy. However, our goal was to extend the application of expansion microscopy by combining C-MAP with tiling light sheet microscopy (TLSM), which allows for faster and high-resolution 3D imaging of whole-mount planarians. This combination offers several key advantages over traditional confocal microscopy. For example, it enables high-throughput imaging across entire organisms with a level of detail and speed that is not easily achieved using confocal methods. This approach allows us to investigate the planarian nervous system at multiple developmental and regenerative stages in a more comprehensive manner, capturing large-scale structures while preserving fine cellular details. The ability to rapidly image whole planarians in 3D with this resolution provides a more efficient workflow for studying complex biological processes. We believe this distinction is significant and represents an advance over previous methods. We will clarify this point in the manuscript to better distinguish our approach from standard techniques.

    1. eLife Assessment

      This study provides a valuable perspective on visual cortex architecture by identifying two cortical gradients that change across the lifespan and have distinct functional and structural features. The first gradient captures well-mapped variations in cortical thickness and myelination markers from early sensory to higher-order cortex, while the second gradient shows divergence in these measures with a more localized structure, notably predicting a previously unknown cluster of visual field maps in the anterior temporal lobe. The large-scale lifespan data are compelling, but the evidence overall is incomplete with key questions around methodological checks and implementation, the standard of evidence for the new visual maps, and how the gradient model relates to sharp tissue boundaries parcellating the cortex.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript uses large-scale existing datasets that span almost the full range of human life (5-100 years) to identify two distinct architectural cortical gradients within the visual cortex. These gradients are distinct: in one, cytoarchitecture and myeloarchitecture converge and in the other, they diverge. The authors tested whether these gradients mapped onto known functional properties of the visual cortex, as well as accounting for visual behaviours that are impacted throughout the lifespan. The manuscript also reports the identification of a hitherto unknown cluster of visual field maps in the anterior temporal lobe.

      Strengths:

      A major strength of the current manuscript is the use of large-scale measurements of human brain structure throughout the lifespan, courtesy of the Human Connectome Project Initiative. The scope of this cross-sectional analysis would be rare, if not impossible to achieve through an individual project.

      The approach employed holds promise for assessing the link between large-scale anatomical gradients in the brain and functional/behavioural properties. The current manuscript focuses on the visual cortex but the approach could easily be implemented across the brain in general.

      Weaknesses:

      While the evidence in favour of the two gradients largely supports the claims, the evidence for a new visual field map cluster in the anterior temporal lobe falls short of the level used historically when identifying visual field maps in the visual cortex and is, at present, not convincing.

      More specifically, the progressions of polar angle within the putative anterior lobe cluster are highly variable across subjects. Few subjects have convincing polar angle reversals at either the horizontal or vertical meridians. In other cases, a putative border is shown that spans different polar angles, which does not align with the accepted definitions for visual field maps in the cortex.

    3. Reviewer #2 (Public review):

      Summary:

      The authors used large MRI data sets of the Human Connectome Project (HCP) and also conducted additional pRF analyses to describe the structural architecture of the human visual cortex in reference to its functional features. By conducting a PCA, they identify 2 components that explain around 50% of the variance, the first driven by a positive co-variance between cortical thickness and T1/T2 ratio, the second by their negative co-variance. The first PC spans most early visual cortex and hence shows a relation to pRF size when taking both early and late visual areas into account. The second is more variable in location and does not relate to pRF size or visual hierarchy. The relationship of these two gradients to cell body density is explored using the BigBrain.

      Strengths:

      The authors make an attempt to describe the overall architectural features of the cortex and link it to features of functional representations, and the underlying histology, using different sets of datasets and methods, including histology. They highlight that investigating the structural architecture of the cortex provides important information on their intrinsic organization and common features.

      Weaknesses:

      The neurobiological model does not take into consideration present knowledge about the microstructural organization of the visual system. This limits the way the results are interpreted correctly. Critical information on the layer-specific myeloarchitecture and cytoarchitecture (and their relation to cortical thickness), as explored for example by Sereno et al. 2013 Cereb Cortex, is missing. There is no information given with respect to how different visual areas differ in their microstructural profile. It is also not mentioned that cortical parcellation is indeed characterized by sharp boundaries between areas, rather than structural gradients, so it remains unclear why focusing on a gradient is of interest. The authors cite the parcellation atlas by Glasser et al. 2016, but do not discuss the rationale of this publication, which was not the definition of gradients, but the definition of sharp boundaries for cortex parcellation. Indeed (as explained below), the results of the authors seem to a large extent to be driven by cortex parcellation, but instead of acknowledging this fact, the authors write (line 179) that "we hypothesize that these local deviations from the canonical thickness and density of cortex underlie the finer-scale division of visual cortex into categorically distinct regions. That is, does the realization of the cortex into distinct regions involve these regions becoming more distinct from a prototypical cortical sheet (i.e., gradient 1)?" - While the first sentence is reasonable, the second sentence is pure speculation ignoring present knowledge on cortical parcellation of this area according to which there is no "prototypical cortical sheet", but each area has its distinct microstructural profile.

      Instead of building on present, detailed knowledge of brain anatomy and in-vivo cortex parcellation of the visual system and its known relation to visual maps, the authors focus on two metrics of cortex architecture (mean T1/T2 over depth and cortical thickness), and conduct a PCA to explore their shared variance. It needs to be clarified if the PCA was conducted correctly. There is no mention of standardizing the variables, which could bias the results. In addition, in a PCA, all possible features are categorized as vector components, and those are scanned through the samples, hence, one such analysis per vertex. But the authors write "in which participants are features and cortical vertices are samples" and "the thickness and tissue density maps were concatenated". This needs clarification. The architecture of the PCA should be visualized better.

      Because the PCA only contains two features, PC1 is driven by the positive relationship between cortical thickness and mean T1/T2, whereas PC2 is driven by their negative relationship. Because in the early visual cortex, cortical thickness and mean T1/T2 correlate positively, it naturally follows that PC1 relates to pRF size (but mediated by the actual cortex parcellation). However, it is unclear why this insight is interesting. I also do not share the view that "these findings demonstrate that gradient 1 acts as a global gradient enveloping the entire visual cortex (...) while gradient 2 acts as a local gradient specific to individual visual streams". I think this relationship between cortical thickness and T1/T2 ratio does not have much to do with local and global gradients. But if so, stronger arguments as to why this should be the case should be presented.
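      The point about a two-feature PCA can be illustrated with a minimal sketch. It uses synthetic data only (not the HCP data and not the authors' pipeline, whose feature/sample layout is described differently in the manuscript), and the names `two_feature_pca`, `thickness`, and `t1t2` are hypothetical. Under the assumption that both variables are z-scored, the correlation matrix is [[1, r], [r, 1]], so the components are necessarily proportional to (1, 1) and (1, -1), with variances 1 + r and 1 - r when r > 0.

```python
import numpy as np

def two_feature_pca(x, y):
    """PCA on two z-scored variables.

    Returns eigenvalues in descending order and the matching
    eigenvectors as columns.
    """
    data = np.column_stack([x, y]).astype(float)
    # Standardize each variable (zero mean, unit variance).
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    # With z-scored columns this is the Pearson correlation matrix.
    corr = (z.T @ z) / z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    return eigvals[order], eigvecs[:, order]

# Synthetic, positively correlated stand-ins for the two measures.
rng = np.random.default_rng(0)
thickness = rng.normal(size=5000)
t1t2 = 0.6 * thickness + 0.8 * rng.normal(size=5000)

eigvals, eigvecs = two_feature_pca(thickness, t1t2)
r = np.corrcoef(thickness, t1t2)[0, 1]

print("correlation r:", r)
print("explained variance:", eigvals / eigvals.sum())
print("PC1 loadings:", eigvecs[:, 0])  # same sign: positive co-variation
print("PC2 loadings:", eigvecs[:, 1])  # opposite sign: negative co-variation
```

      In this standardized two-variable setting the loadings do not depend on the data beyond the sign of r, which is one way to read the concern that the shape of the two components follows from the analysis design. Without standardization, the variable with the larger variance would dominate PC1 instead.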

      What the authors make of this result (particularly the discussion starting line 366) is not clear to me. I cannot follow the line of argumentation, which in my view is too far away from the data.

    1. eLife Assessment

      The authors present three valuable transgenic models carrying three representative exon deletions of the dystrophin gene. The findings are supported by rigorous biochemical assays and state-of-the-art microscopy methods, although the evidence, while overall solid, is only partially developed, and some points could be improved.

    2. Reviewer #1 (Public review):

      Summary:

      In this article, the authors describe mouse models of Becker muscular dystrophy; they created three transgenic models carrying three representative exon deletions: ex45-48 del., ex45-47 del., and ex45-49 del. This article is well written but needs improvement in some points.

      Strengths:

      This article is well written. The evidence supporting the authors' claims is robust, though further implementation is necessary. The experiments conducted align with the current state-of-the-art methodologies.

      Weaknesses:

      This article does not analyze atrophy in the various mouse models. Implementing this point would improve the impact of the work.

    3. Reviewer #2 (Public review):

      Miyazaki et al. established three distinct BMD mouse models by deleting different exon regions of the dystrophin gene, observed in human BMD. The authors demonstrated that these models exhibit pathophysiological changes, including variations in body weight, muscle force, muscle degeneration, and levels of fibrosis, alongside underlying molecular alterations such as changes in dystrophin and nNOS levels. Notably, these molecular and pathological changes progress at different rates depending on the specific exon deletions in the dystrophin gene. Additionally, the authors conducted extensive fiber typing, revealing a site-specific decline in type IIa fibers in BMD mice, which they suggest may be due to muscle degeneration and reduced capillary formation around these fibers.

      Strengths:

      The manuscript introduces three novel BMD mouse models with different dystrophin exon deletions, each demonstrating varying rates of disease progression similar to the human BMD phenotype. The authors also conducted extensive fiber typing across different muscles and regions within the muscles, effectively highlighting a site-specific decline in type IIa muscle fibers in BMD mice.

      Weaknesses:

      The authors have inadequate experiments to support their hypothesis that the decay of type IIa muscle fibers is likely due to muscle degeneration and reduced capillary formation. Further investigation into capillary density and histopathological changes across different muscle fibers is needed, which could clarify the mechanisms behind these observations.

    4. Author response:

      Reviewer #1 (Public review):

      Summary:

      In this article, the authors describe mouse models of Becker muscular dystrophy; they created three transgenic models carrying three representative exon deletions: ex45-48 del., ex45-47 del., and ex45-49 del. This article is well written but needs improvement in some points.

      Strengths:

      This article is well written. The evidence supporting the authors' claims is robust, though further implementation is necessary. The experiments conducted align with the current state-of-the-art methodologies.

      Weaknesses:

      This article does not analyze atrophy in the various mouse models. Implementing this point would improve the impact of the work.

      We thank the reviewer for their constructive suggestions and comments on this work. Muscle hypertrophy is shown with growth in dystrophin-deficient skeletal muscle in mdx mice; thus, we did not pay attention to the factors associated with muscle atrophy in BMD mice. As the reviewer suggested, the examination of the association between type IIa fiber reduction and muscle atrophy is important, and the result is considered to be helpful in resolving the cause of type IIa fiber reduction in BMD mice.

      Thus, we are planning to:

      (1) Evaluate the cross-sectional areas (CSA) of muscles and compare them with the changes in the proportion of type IIa fibers.

      (2) Evaluate the expression levels of Murf1 and Atrogin1 as markers of muscle atrophy using RT-PCR.

      Reviewer #2 (Public review):

      Summary:

      Miyazaki et al. established three distinct BMD mouse models by deleting different exon regions of the dystrophin gene, observed in human BMD. The authors demonstrated that these models exhibit pathophysiological changes, including variations in body weight, muscle force, muscle degeneration, and levels of fibrosis, alongside underlying molecular alterations such as changes in dystrophin and nNOS levels. Notably, these molecular and pathological changes progress at different rates depending on the specific exon deletions in the dystrophin gene. Additionally, the authors conducted extensive fiber typing, revealing a site-specific decline in type IIa fibers in BMD mice, which they suggest may be due to muscle degeneration and reduced capillary formation around these fibers.

      Strengths:

      The manuscript introduces three novel BMD mouse models with different dystrophin exon deletions, each demonstrating varying rates of disease progression similar to the human BMD phenotype. The authors also conducted extensive fiber typing across different muscles and regions within the muscles, effectively highlighting a site-specific decline in type IIa muscle fibers in BMD mice.

      Weaknesses:

      The authors have inadequate experiments to support their hypothesis that the decay of type IIa muscle fibers is likely due to muscle degeneration and reduced capillary formation. Further investigation into capillary density and histopathological changes across different muscle fibers is needed, which could clarify the mechanisms behind these observations.

      We thank the reviewer for these positive comments and the very important suggestion about type IIa fiber reduction and capillary change around muscle fibers in BMD mice. From the results of the cardiotoxin-induced muscle degeneration and regeneration model, type IIa and IIx fibers showed delayed recovery compared with that of type-IIb fibers. However, this delayed recovery of type IIa and IIx could not explain the cause of the selective muscle fiber reduction limited to type IIa fibers in BMD mice. Therefore, we considered vascular dysfunction as the reason for the selective type IIa fiber reduction, and we found morphological capillary changes from a “ring pattern” to a “dot pattern” around type IIa fibers in BMD mice. However, the association between selective type IIa fiber reduction and the capillary change around muscle fibers in BMD mice remains unclear due to the lack of information about capillaries around type IIx and IIb fibers. The reviewer pointed out this insufficient evaluation of capillaries around other muscle fibers (except for type IIa fibers), and this suggestion is very helpful for explaining the association between selective type IIa fiber reduction and vascular dysfunction in BMD mice.

      Thus, we are planning to:

      (1) Evaluate the changes in capillary formation around other muscle fibers, except for type IIa fibers (e.g., type IIx and IIb fibers).

      (2) Evaluate the endothelial area around other muscle fibers, except for type IIa fibers.

    1. eLife Assessment

      This study analyzes a cohort of small intestine neuroendocrine tumors, and the description of tumor-intrinsic programs that govern such rare cancers is felt to be valuable. The methods are for the most part felt to be solid, however, some broad concerns were raised that the possible separation of samples by a program may be impacted by fresh versus frozen sequencing. Similarly, the heterogeneity of the SiNET tumor microenvironment is unclear given a mixing of subtypes and the proliferation of NE and immune cells in SiNET could be influenced by technical factors. Recommendations were made to extend these data with other published datasets of SiNET tumors, expanding technical clarity and details, and validating findings using cell lines/PDX if available.

    2. Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile 5 samples (siAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNET 4-10).

      They identify two subtypes of siNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single-lung MiNEN sample.

      Strengths:

      Overall, this study adds interesting insights into this set of rare cancers that could be very informative for the cancer research community. The team probes an understudied cancer type and provides thoughtful investigations and observations that may have translational relevance.

      Weaknesses:

      The study could be improved by clarifying some of the technical approaches and aspects as currently presented, toward enhancing the support of the conclusions:

      (1) Methods: As currently presented, it is possible that the separation of samples by program may be impacted by tissue source (fresh vs. frozen) and/or the associated sequencing modality (single cell vs. single nuclei). For instance, two (SiNET1 and SiNET2) of the three fresh tissues are categorized into the same subtype, while the third (SiNET9) has very few neuroendocrine cells. Additionally, samples from patient 1 (SiNET1 and SiNET6) are separated into different subtypes based on fresh and frozen tissue. The current text alludes to investigations (i.e.: "Technical effects (e.g., fresh vs. frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias."), but the study would be strengthened with more detail.

      (2) Results: Heterogeneity in the SiNET tumor microenvironment: It is unclear if the current analysis of intratumor heterogeneity distinguishes the subtypes. It may be informative if patterns of tumor microenvironment (TME) heterogeneity were identified between samples of the same subtype. The team could also evaluate this in an extension cohort of published SiNET tumors (i.e. revisiting additional analyses using the SiNET bulk RNAseq from Alvarez et al 2018, a subset of single-cell data from Hoffman et al 2023, or additional bulk RNAseq validation cohorts for this cancer type if they exist [if they do not, then this could be mentioned as a need in Discussion])

      (3) Proliferation of NE and immune cells in SiNETs: The observed proliferation of NE and immune cells in SiNETs may also be influenced by technical factors (including those noted above). For instance, prior studies have shown that scRNA-seq tends to capture a higher proportion of immune cells compared to snRNA-seq, which should be considered in the interpretation of these results. Could the team clarify this element?

      (4) Putative progenitors in mixed tumors: As written, the identification of putative progenitors in a single lung MiNEN sample feels somewhat disconnected from the rest of the study. These findings are interesting - are similar progenitor cell populations identified in SiNET samples? Recognizing that ideally additional validation is needed to confidently label and characterize these cells beyond gene expression data in this rare tumor, this limitation could be addressed in a revised Discussion.

    3. Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The studies effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size, lack of clear biological interpretation in some analyses, and concerns about batch effects and statistical significance.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to profile small intestine neuroendocrine tumors (siNETs) using single-cell/nucleus RNA sequencing, an established method to characterize the diversity of cell types and states in a tumor. Leveraging this dataset, they identified distinct malignant subtypes (epithelial-like versus neuronal-like) and characterized the proliferative index of malignant neuroendocrine cells versus non-malignant microenvironment cells. They found that malignant neuroendocrine cells were far less proliferative than some of their non-malignant counterparts (e.g., B cells, plasma cells, epithelial cells) and there was a strong subtype association such that epithelial-like siNETs were linked to high B/plasma cell proliferation, potentially mediated by MIF signaling, whereas neuronal-like siNETs were correlated with low B/plasma cell proliferation. The authors also examined a single case of a mixed lung tumor (neuroendocrine and squamous) and found evidence of intermediate/mixed and stem-like progenitor states that suggest the two differentiated tumor types may arise from the same progenitor.

      Strengths:

      The strengths of the paper include the unique dataset, which is the largest to date for siNETs, and the potentially clinically relevant hypotheses generated by their analysis of the data.

      Weaknesses:

      The weaknesses of the paper include the relatively small number of independent patients (n = 8 for siNETs), lack of direct comparison to other published single-cell NET datasets, mixing of two distinct methods (single-cell and single-nucleus RNA-seq), lack of direct cell-cell interaction analyses and spatially-resolved data, and lack of in vitro or in vivo functional validation of their findings.

      The analytical methods applied in this study appear to be appropriate, but the methods used are fairly standard to the field of single-cell omics without significant methodological innovation. As the authors bring forth in the Discussion, the results of the study do raise several compelling questions related to the possibility of distinct biology underlying the epithelial-like and neuronal-like subtypes, the origin of mixed tumors, drivers of proliferation, and microenvironmental heterogeneity. However, this study was not able to further explore these questions through spatially-resolved data or functional experiments.

    5. Author response:

      Reviewer #1 (Public review):

      Summary:

      The authors have assembled a cohort of 10 SiNET, 1 SiAdeno, and 1 lung MiNEN samples to explore the biology of neuroendocrine neoplasms. They employ single-cell RNA sequencing to profile 5 samples (siAdeno, SiNETs 1-3, MiNEN) and single-nuclei RNA sequencing to profile seven frozen samples (SiNET 4-10).

      They identify two subtypes of siNETs, characterized by either epithelial or neuronal NE cells, through a series of DE analyses. They also report findings of higher proliferation in non-malignant cell types across both subtypes. Additionally, they identify a potential progenitor cell population in a single-lung MiNEN sample.

      Strengths:

      Overall, this study adds interesting insights into this set of rare cancers that could be very informative for the cancer research community. The team probes an understudied cancer type and provides thoughtful investigations and observations that may have translational relevance.

      Weaknesses:

      The study could be improved by clarifying some of the technical approaches and aspects as currently presented, toward enhancing the support of the conclusions:

      (1) Methods: As currently presented, it is possible that the separation of samples by program may be impacted by tissue source (fresh vs. frozen) and/or the associated sequencing modality (single cell vs. single nuclei). For instance, two (SiNET1 and SiNET2) of the three fresh tissues are categorized into the same subtype, while the third (SiNET9) has very few neuroendocrine cells. Additionally, samples from patient 1 (SiNET1 and SiNET6) are separated into different subtypes based on fresh and frozen tissue. The current text alludes to investigations (i.e.: "Technical effects (e.g., fresh vs. frozen samples) could also impact the capture of distinct cell types, although we did not observe a clear pattern of such bias."), but the study would be strengthened with more detail.

      We thank the reviewer for the thoughtful and constructive review. Due to the difficulty in obtaining enough SiNET samples, we used two platforms to generate data - single cell analysis of fresh samples, and single nuclei analysis of frozen samples. We opted to combine both sample types in our analysis while being fully aware of the potential for batch effects. We therefore agree that this is a limitation of our work, and that differences between samples should be interpreted with caution.

      Nevertheless, we argue that the two SiNET subtypes that we have identified are very unlikely to be due to such batch effect. First, the epithelial SiNET subtype was not only detected in two fresh samples but also in one frozen sample (albeit with relatively few cells, as the reviewer correctly noted). Second, and more importantly, the epithelial SiNET subtype was also identified in analysis of an external and much larger cohort of bulk RNA-seq SiNET samples that does not share the issue of two platforms (as seen in Fig. 2f). Moreover, the proportion of samples assigned to the two subtypes is similar between our data and the external data. We therefore argue that the identification of two SiNET subtypes cannot be explained by the use of two data platforms. However, we agree that the results should be further investigated and validated by future studies, as is often done in research on rare tumors.

      The reviewer also commented that two samples from the same patient which were profiled by different platforms (SiNET1 and SiNET6) were separated into different subtypes. We would like to clarify that this is not the case, since SiNET6 was not included in the subtype analysis due to too few detected Neuroendocrine cells, and was not assigned to any subtype, as noted in the text and as can be seen by its exclusion from Figure 2 where subtypes are defined. We apologize that our manuscript may have given the wrong impression about SiNET6 classification (it is labeled in Fig. 4a in a misleading manner). In the revised manuscript, we will correct the labeling in Fig. 4a and clarify that SiNET6 is not assigned to any subtype. We will further acknowledge the limitation of the two platforms and the arguments in favor of the existence of two SiNET subtypes.

      (2) Results: Heterogeneity in the SiNET tumor microenvironment: It is unclear if the current analysis of intratumor heterogeneity distinguishes the subtypes. It may be informative if patterns of tumor microenvironment (TME) heterogeneity were identified between samples of the same subtype. The team could also evaluate this in an extension cohort of published SiNET tumors (i.e. revisiting additional analyses using the SiNET bulk RNAseq from Alvarez et al 2018, a subset of single-cell data from Hoffman et al 2023, or additional bulk RNAseq validation cohorts for this cancer type if they exist [if they do not, then this could be mentioned as a need in Discussion])

      We agree that analysis of an independent cohort will assist in defining the association between TME and the SiNET subtype. However, the sample size required for that is significantly larger than the data available. In the revised manuscript we will note that as a direction for future studies.

      (3) Proliferation of NE and immune cells in SiNETs: The observed proliferation of NE and immune cells in SiNETs may also be influenced by technical factors (including those noted above). For instance, prior studies have shown that scRNA-seq tends to capture a higher proportion of immune cells compared to snRNA-seq, which should be considered in the interpretation of these results. Could the team clarify this element?

      We agree that different platforms could affect the observed proportions of immune cells, and more generally the proportions of specific cell types. However, the low proliferation of Neuroendocrine cells and the higher proliferation of immune cells (especially B cells, but also T cells and macrophages) are consistently observed in both platforms, as shown in Fig. 4a, and therefore appear to be reliable despite the limitations of our work. We will clarify this consistency in the revised manuscript.

      (4) Putative progenitors in mixed tumors: As written, the identification of putative progenitors in a single lung MiNEN sample feels somewhat disconnected from the rest of the study. These findings are interesting - are similar progenitor cell populations identified in SiNET samples? Recognizing that ideally additional validation is needed to confidently label and characterize these cells beyond gene expression data in this rare tumor, this limitation could be addressed in a revised Discussion.

      We agree with this comment and will add the need for additional validation for this finding in the revised Discussion.

      Reviewer #2 (Public review):

      Summary:

      The research identifies two main SiNET subtypes (epithelial-like and neuronal-like) and reveals heterogeneity in non-neuroendocrine cells within the tumor microenvironment. The study validates findings using external datasets and explores unexpected proliferation patterns. While it contributes to understanding SiNET oncogenic processes, the limited sample size and depth of analysis present challenges to the robustness of the conclusions.

      Strengths:

      The studies effectively identified two subtypes of SiNET based on epithelial and neuronal markers. Key findings include the low proliferation rates of neuroendocrine (NE) cells and the role of the tumor microenvironment (TME), such as the impact of Macrophage Migration Inhibitory Factor (MIF).

      Weaknesses:

      However, the analysis faces challenges such as a small sample size, lack of clear biological interpretation in some analyses, and concerns about batch effects and statistical significance.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to profile small intestine neuroendocrine tumors (siNETs) using single-cell/nucleus RNA sequencing, an established method to characterize the diversity of cell types and states in a tumor. Leveraging this dataset, they identified distinct malignant subtypes (epithelial-like versus neuronal-like) and characterized the proliferative index of malignant neuroendocrine cells versus non-malignant microenvironment cells. They found that malignant neuroendocrine cells were far less proliferative than some of their non-malignant counterparts (e.g., B cells, plasma cells, epithelial cells) and there was a strong subtype association such that epithelial-like siNETs were linked to high B/plasma cell proliferation, potentially mediated by MIF signaling, whereas neuronal-like siNETs were correlated with low B/plasma cell proliferation. The authors also examined a single case of a mixed lung tumor (neuroendocrine and squamous) and found evidence of intermediate/mixed and stem-like progenitor states that suggest the two differentiated tumor types may arise from the same progenitor.

      Strengths:

      The strengths of the paper include the unique dataset, which is the largest to date for siNETs, and the potentially clinically relevant hypotheses generated by their analysis of the data.

      Weaknesses:

      The weaknesses of the paper include the relatively small number of independent patients (n = 8 for siNETs), lack of direct comparison to other published single-cell NET datasets, mixing of two distinct methods (single-cell and single-nucleus RNA-seq), lack of direct cell-cell interaction analyses and spatially-resolved data, and lack of in vitro or in vivo functional validation of their findings.

      The analytical methods applied in this study appear to be appropriate, but the methods used are fairly standard to the field of single-cell omics without significant methodological innovation. As the authors bring forth in the Discussion, the results of the study do raise several compelling questions related to the possibility of distinct biology underlying the epithelial-like and neuronal-like subtypes, the origin of mixed tumors, drivers of proliferation, and microenvironmental heterogeneity. However, this study was not able to further explore these questions through spatially-resolved data or functional experiments.

    1. eLife Assessment

      The study by Chen and Phillips provides evidence for a dynamic switch in the small RNA repertoire of the Argonaute protein NRDE-3 during embryogenesis in C. elegans. The work is supported by solid experimental data, although some conclusions regarding the functional role of specific RNA granules remain uncertain. Nevertheless, this study offers valuable insights into RNA regulation and developmental biology, with broader implications for understanding small RNA pathways in other systems.

    2. Reviewer #1 (Public review):

      Summary:

      Chen and Phillips describe the dynamic appearance of cytoplasmic granules during embryogenesis analogous to SIMR germ granules, and distinct from CSR-1-containing granules, in the C. elegans germline. They show that the nuclear Argonaute NRDE-3, when mutated to abrogate small RNA binding, or in specific genetic mutants, partially colocalizes to these granules along with other RNAi factors, such as SIMR-1, ENRI-2, RDE-3, and RRF-1. Furthermore, NRDE-3 RIP-seq analysis in early vs. late embryos is used to conclude that NRDE-3 binds CSR-1-dependent 22G RNAs in early embryos and ERGO-1-dependent 22G RNAs in late embryos. These data lead to their model that NRDE-3 undergoes small RNA substrate "switching" that occurs in these embryonic SIMR granules and functions to silence two distinct sets of target transcripts - maternal, CSR-1 targeted mRNAs in early embryos and duplicated genes and repeat elements in late embryos.

      Strengths:

      The identification and function of small RNA-related granules during embryogenesis is a poorly understood area and this study will provide the impetus for future studies on the identification and potential functional compartmentalization of small RNA pathways and machinery during embryogenesis.

      Weaknesses:

      (1) While the authors acknowledge the following issue, their finding that loss of SIMR granules has no apparent impact on NRDE-3 small RNA loading puts the functional relevance of these structures into question. As they note in their Discussion, it is entirely possible that these embryonic granules may be "incidental condensates." It would be very welcome if the authors could include some evidence that these SIMR granules have some function; for example, does the loss of these SIMR granules have an effect on CSR-1 targets in early embryos and ERGO-1-dependent targets in late embryos?

      (2) The analysis of small RNA class "switching" requires some clarification. The authors re-define ERGO-1-dependent targets in this study to arrive at a very limited set of genes and their justification for doing this is not convincing. What happens if the published set of ERGO-1 targets is used? Further, the NRDE-3 RIP-seq data is used to conclude that NRDE-3 predominantly binds CSR-1 class 22G RNAs in early embryos, while ERGO-1-dependent 22G RNAs are enriched in late embryos. a) The relative ratios of each class of small RNAs are given in terms of unique targets. What is the total abundance of sequenced reads of each class in the NRDE-3 IPs? b) The "switching" model is problematic given that even in late embryos, the majority of 22G RNAs bound by NRDE-3 is in the CSR-1 class (Figure 5D). c) A major difference between NRDE-3 small RNA binding in eri-1 and simr-1 mutants appears to be that NRDE-3 robustly binds CSR-1 22G RNAs in eri-1 but not in simr-1 in late embryos. This result should be better discussed.

      (3) Ultimately, if the switching is functionally important, then its impact should be observed in the expression of their targets. RNA-seq or RT-qPCR of select CSR-1 and ERGO-1 targets should be assessed in nrde-3 mutants during early vs late embryogenesis.

    3. Reviewer #2 (Public review):

      Summary:

      NRDE-3 is a nuclear WAGO-clade Argonaute that, in somatic cells, binds small RNAs amplified in response to the ERGO-class 26G RNAs that target repetitive sequences. This manuscript reports that, in the germline and early embryos, NRDE-3 interacts with a different set of small RNAs that target mRNAs. This class of small RNAs was previously shown to bind to a different WAGO-clade Argonaute called CSR-1, which is cytoplasmic, unlike nuclear NRDE-3. The switch in NRDE-3 specificity parallels recent findings in Ascaris where the Ascaris NRDE homolog was shown to switch from sRNAs that target repetitive sequences to CSR-class sRNAs that target mRNAs.

      The manuscript also correlates the change in NRDE-3 specificity with the appearance in embryos of cytoplasmic condensates that accumulate SIMR-1, a scaffolding protein that the authors previously implicated in sRNA loading for a different nuclear Argonaute HRDE-1. By analogy, and through a set of correlative evidence, the authors argue that SIMR foci arise in embryogenesis to facilitate the change in NRDE-3 small RNA repertoire. The paper presents lots of data that beautifully documents the appearance and composition of the embryonic SIMR-1 foci, including evidence that a mutated NRDE-3 that cannot bind sRNAs accumulates in SIMR-1 foci in a SIMR-1-dependent fashion.

      Weaknesses:

      The genetic evidence, however, does not support a requirement for SIMR-1 foci: the authors detected no defect in NRDE-3 sRNA loading in simr-1 mutants. Although the authors acknowledge this negative result in the discussion, they still argue for a model (Figure 7) that is not supported by genetic data. My main suggestion is that the authors give equal consideration to other models - see below for specifics.

    4. Reviewer #3 (Public review):

      Summary:

      Chen and Phillips present intriguing work that extends our view on the C. elegans small RNA network significantly. While the precise findings are rather C. elegans-specific, there are also messages for the broader field, most notably the switching of small RNA populations bound to an argonaute, and RNA granule behavior depending on developmental stage. The work also starts to shed more light on the still poorly understood role of the CSR-1 argonaute protein and supports its role in the decay of maternal transcripts. Overall, the work is of excellent quality, and the messages have a significant impact.

      Strengths:

      Compelling evidence for a major shift in activities of an argonaute protein during development, and implications for how small RNAs affect early development. Very balanced and thoughtful discussion.

      Weaknesses:

      Claims on co-localization of specific 'granules' are not well supported by quantitative data.

    1. eLife Assessment

      This study presents valuable findings on changes in neuronal alpha activity elicited by prolonged pain in healthy human participants. The evidence supporting the claims of the authors, however, is incomplete and would benefit from clarifications of analytical strategies, additional statistical analyses, and shaping of the interpretations. With the methodological and interpretative parts strengthened, the work will be of interest to neuroscientists investigating the brain mechanisms of pain to identify new approaches to pain treatment.

    2. Reviewer #1 (Public review):

      Summary:

      Furman et al. reanalyze data from a previous study and investigate alterations of peak alpha frequency (PAF) and alpha power (AP) in the context of prolonged pain with electroencephalography (EEG). Using two experimental pain models (phasic and capsaicin heat pain), they set out to clarify if previously reported changes in alpha activity in chronic pain can already be observed during prolonged pain in healthy human participants. They conclude that PAF is reliably slowed, and AP reliably decreased in response to prolonged pain. From the patterns of their findings, they furthermore deduce that AP changes indicate the presence of ongoing pain while PAF changes reflect pain-associated states like sensitization which can outlast ongoing pain percepts and indicate a potential for experiencing pain. Lastly, they conclude that the reported changes in alpha activity are likely due to specific power decreases in the faster alpha range between 10 and 12 Hz and discuss potential clinical implications of their findings in terms of risk biomarkers and early pain interventions.

      Strengths:

      The study focuses on a timely topic with potential implications for chronic pain diagnosis and treatment, an area that urgently needs new approaches. The addressed questions nicely build upon and extend the previous work of the authors. The analyzed data set is comprehensive including two different prolonged pain paradigms, two visits following the same experimental procedures, and a total sample size of n = 61 participants. Thereby, it enabled internal replications of findings across both paradigms and visits, which is important to confirm the consistency of findings.

      Weaknesses:

      One overarching difficulty is the high number of analyses presented by the authors. They were in part developed "on the go", are not always easy to follow, and sidetrack the reader from the main findings. Only a minor part of the analyses is described in the methods section, while many analyses are outlined within the results, the supplementary material, and/or figure legends. In addition, a range of purely descriptive findings are displayed. Overall, the manuscript would clearly benefit from a more streamlined and consistent presentation of the applied methods and results.

      Concerning the main findings, the presented evidence for a slowing of PAF and a reduction of AP in the context of both phasic and capsaicin heat pain and across both visits is convincing. The location of the peak of the effect at left frontocentral areas, however, remains puzzling. The authors convincingly show that the effect cannot be explained by activity related to the pain rating procedure and provide evidence that an effect of the same direction can also be observed at corresponding electrodes contralateral to pain stimulation. However, further reasons are not discussed.

      The conclusion that PAF slowing might be more related to pain-associated states like sensitization rather than the presence of ongoing pain is deduced from a continued slowing of PAF after capsaicin-induced pain has subsided, while AP goes back to baseline values. Although this speculation is interesting, the readers should be aware that this dissociation was unexpected and resulted in changes in the main a-priori-defined statistical contrasts presented in the methods section. Further replications in future studies are needed to strengthen this finding.

      The last conclusion made by the authors, that the observed changes in alpha activity are caused by specific changes in the faster alpha range, is the least convincing. If I understand correctly, the only presented statistical evidence corroborating this conclusion is based on the single selected electrode C3 shown in Figure 5 A, D, and E. With the remaining parts of Figure 5 and Figure 6, differences are discussed but the figures do not include statistical results. Unless the discussed findings are backed up more clearly, the degree of mechanistic conclusions concerning the 10-12 Hz power changes throughout the title, abstract, and main manuscript and in relation to the multiple oscillators model seems not justified.

      Lastly, it is important to note that the current manuscript was published as a preprint in 2021. Thus, the cited literature still needs to be updated, and the present findings need to be integrated with the work published since. For example, a recent systematic review on potential M/EEG-based biomarkers of chronic pain (Zebhauser et al., 2023, Pain) revealed that previous evidence concerning changes of alpha activity in chronic pain is much less consistent than currently outlined in the manuscript.

      Overall:

      All in all, the presented findings extend previous knowledge concerning the role of alpha activity in pain and thus represent a valuable contribution towards a better understanding of the mechanisms of pain and potential new treatment targets.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigated the modulation of alpha oscillations, specifically peak alpha frequency (PAF) and alpha power, during prolonged pain. The findings suggest that the alpha rhythm consists of multiple, independent oscillators, and suggest that the modulation of a "fast" oscillator may represent a promising therapeutic target for ongoing pain management.

      Strengths:

      EEG data were collected from a relatively large sample of participants, and the experiment was conducted using two prolonged pain models - phasic heat pain and capsaicin heat pain - at two separate testing visits approximately 8 weeks apart. The study produced reliable results across different pain models and at different testing intervals.

      Weaknesses:

      There are discrepancies between the results and their interpretation, indicating a need for more appropriate data analyses. Additionally, the experimental design does not adequately control for the potential time effects, which cannot be ruled out as a confounding factor.

    4. Reviewer #3 (Public review):

      Summary:

      Furman et al. investigated how exposure to prolonged pain impacts human alpha oscillations recorded by electroencephalography (EEG). Two experimental models of prolonged pain were employed in healthy participants, phasic heat pain (PHP) and capsaicin heat pain (CHP). 61 participants completed two identical study visits separated by at least 8 weeks. Peak alpha frequency was reliably slowed by exposure to prolonged pain, whereas overall alpha power was reliably reduced. Both effects appeared to reflect a specific decrease in higher frequency (10-12Hz) alpha activity. The authors suggest that slowing of alpha oscillations is a reliable neural correlate of pain exposure and that manipulation of alpha activity may hold promise for treating chronic pain.

      Strengths:

      The study uses a within-participants design to show that exposure to pain is associated with acute changes in both the power and frequency of alpha oscillations.

      By employing two experimental models of pain exposure and two separate testing visits, the authors were able to show that the effects of pain exposure on alpha activity are replicable across models and time.

      Rigorous analysis approaches are used throughout.

      Weaknesses:

      No a priori power analysis is presented and (due to exclusions) most of the analyses conducted included (sometimes considerably) fewer participants than the overall sample size.

      It is not clear whether the power and frequency changes represent two sides of the same coin or whether they reflect distinct mechanisms. The authors suggest in the manuscript that both effects may be explained by decreased power in 'fast' (8-12 Hz) alpha activity, but at other times interpret the effects to potentially represent distinct mechanisms. It would be useful for the authors to further clarify their thoughts on this point.

      The statistical significance of some of the effects was dependent on analysis choices such as the exact frequency range chosen to identify alpha peaks.

      No control condition was used, and I was left wondering whether the effects are specific to painful stimuli or whether the same effects would also be seen for pleasant or neutral somatosensory stimuli.

    1. eLife Assessment

      The study is a valuable addition to the field, showing how particulate matter may be acting via nociceptor neurons towards neutrophilic asthma exacerbations. The solid evidence for the role of a nociceptive pathway in model systems is relevant to human asthma in its current form but would be further strengthened by mechanistic insights. This would be particularly relevant to further translational research towards blocking the exacerbating effect of air pollution on asthma.

    2. Reviewer #1 (Public review):

      Summary:

      In the presented study, the authors aim to explore the role of nociceptors in the fine particulate matter (FPM) mediated asthma phenotype, using rodent models of allergic airway inflammation. This manuscript builds on previous studies, and identifies transcriptomic reprogramming and an increased sensitivity of the jugular nodose complex (JNC) neurons, one of the major sensory ganglia for the airways, on exposure to FPM along with Ova during the challenge phase. The authors then use QX-314, a selectively permeable form of lidocaine, and TRPV1 knockouts to demonstrate that nociceptor blocking can reduce airway inflammation in their experimental setup.

      The authors further identify the presence of Gfra3 on the JNC neurons, a receptor for the protein Artemin, and demonstrate their sensitivity to Artemin as a ligand. They further show that alveolar macrophages release Artemin on exposure to FPM.

      Strengths:

      The study builds on results available from multiple previous studies, and presents important results which allow insights into the mixed phenotypes of asthma seen clinically. In addition, by identifying the role of nociceptors, they identify potential therapeutic targets which bear high translational potential.

      Weaknesses:

      While the results presented in the study are highly relevant, there is a need for further mechanistic dissection to allow better inferences. Currently, certain results seem associative. Also, certain visualisations and experimental protocols presented in the manuscript need careful assessment and interpretation.

      While asthma is a chronic disease, the presented results are particularly important to explore asthma exacerbations in response to acute exposure to air pollutants. This is relevant in today's age of increasing air pollution and increasing global travel.

    3. Reviewer #2 (Public review):

      Summary:

      The authors sought to investigate the role of nociceptor neurons in the pathogenesis of pollution-mediated neutrophilic asthma.

      Strengths:

      The authors utilize TRPV1 ablated mice to confirm effects of intranasally administered QX-314 utilized to block sodium currents.

      The authors demonstrate that artemin, which is upregulated in alveolar macrophages in response to pollution, sensitizes JNC neurons, thereby increasing their responsiveness to pollution. Ablation or inactivity of nociceptor neurons prevented the pollution-induced increase in inflammation.

      Weaknesses:

      While neutrophilic, the model used doesn't appear to truly recapitulate a Th2/Th17 phenotype. No IL-17A is evident in the BALF within the model (Figure 3F).

      The relevance of the RNAseq dataset is unclear: none of the identified DEGs were evaluated in the context of mechanism.

      The authors overall achieved the aim of demonstrating that nociceptor neurons are important to the pathogenesis of pollution-exacerbated asthma. Their results support their conclusions overall, although there are ways the study findings can be strengthened. This work further evaluates how nociceptor neurons contribute to asthma pathogenesis, which is important to consider when proposing treatment strategies for undertreated asthma endotypes.

    4. Reviewer #3 (Public review):

      Asthma is a complex disease that includes endogenous epithelial, immune, and neural components that respond awkwardly to environmental stimuli. Small airborne particles with diameters in the range of 2.5 micrometers or less, so-called PM2.5, are generally thought to contribute to some forms of asthma. These forms of asthma may have increased numbers of neutrophils and/or eosinophils present in bronchoalveolar lavage fluid and are difficult to treat effectively as they tend to be poorly responsive to steroids. Here, Wang and colleagues build on a recent model that incorporated PM2.5 which was found to have a neutrophilic component. Wang altered the model to provide an extra kick via the incorporation of ovalbumin. Building on their prior expertise linking nociceptors and inflammation, they find that silencing TRPV1-expressing neurons, either pharmacologically or genetically, abrogated inflammation and the accumulation of neutrophils. By examining bronchoalveolar lavage fluid, they found not only that levels of a number of cytokines were increased, but also that artemin, a protein that supports neuronal development and function, was elevated, which did not occur in nociceptor-ablated mice. They also found that alveolar macrophages exposed to PM2.5 particles had increased artemin transcription, suggesting a further link between pollutants, and immune and neural interactions.

      There are substantial caveats that must be attached to the suggestions by the authors that targeting nociceptors might provide an approach to the treatment of neutrophilic airway inflammation in pollution-driven asthma in general and wildfire-associated respiratory problems in particular. These caveats include the uncertainty of the relevance of the conventional source of PM2.5 to pollution and asthma. According to the National Institute of Standards and Technology (NIST), the standard reference material (SRM) 2786 is a mix obtained from an air intake system in the Czech Republic. It is not clear exactly what is in the mix, and a recent bioRxiv preprint (https://www.biorxiv.org/content/10.1101/2023.08.18.553903v3.full.pdf) reveals the presence of endotoxin. Care should thus be taken in interpreting data using particulate matter. Regarding wildfires, there is data that indicates that such exposure is toxic to macrophages. What impact might that then have on the production of cytokines, and artemin, in humans?

    1. eLife Assessment

      This important study presents the first identification of the odorant receptor for the trail pheromone in termites. The evidence supporting the conclusions is compelling, with state-of-the-art neurophysiological and genetic methods. The work will be of broad interest in multiple disciplines, such as entomology, chemical ecology, and sensory physiology.

    2. Reviewer #1 (Public review):

      Summary:

      In their comprehensive analysis, Diallo et al. deorphanise the first olfactory receptor of a non-hymenopteran eusocial insect, a termite, and identify the well-established trail pheromone neocembrene as the receptor's best ligand. By using a large set of odorants the authors convincingly show that, as expected for a pheromone receptor, PsimOR14 is very narrowly tuned. While the authors first make use of an ectopic expression system, the empty neuron of Drosophila melanogaster, to characterise the receptor's responses, they next perform single sensillum recordings with different sensilla types on the termite antenna. By that, they are able to identify a sensillum that houses three neurons, of which the B neuron exhibits the narrow responses described for PsimOR14. Hence the authors do not only identify the first pheromone receptor in a termite but can even localize its expression on the antenna. The authors in addition perform a structural analysis to explain the binding properties of the receptor and its major and minor ligands (as this is beyond my expertise, I cannot judge this part of the manuscript). Finally, they compare expression patterns of ORs in different castes and find that PsimOR14 is more strongly expressed in workers than in soldier termites, which corresponds well with stronger antennal responses in the worker caste.

      Strengths:

      The manuscript is well-written and a pleasure to read. The figures are beautiful and clear. I actually had a hard time coming up with suggestions.

      Weaknesses:

      Whenever it comes to the deorphanization of a receptor and its potential role in behaviour (in the case of the manuscript it would be trail-following of the termite) one thinks immediately of knocking out the receptor to check whether it is necessary for the behaviour. However, I definitely do not want to ask for this (especially as the establishment of CRISPR-Cas9 in eusocial insects usually turns out to be a nightmare). I also do not know whether knockdowns via RNAi have been established in termites, but maybe the authors could consider some speculation on this in the discussion.

    3. Reviewer #2 (Public review):

      Summary:

      In this manuscript, the authors performed the functional analysis of odorant receptors (ORs) of the termite Prorhinotermes simplex to identify the receptor of trail-following pheromone. The authors performed single-sensillum recording (SSR) using the transgenic Drosophila flies expressing a candidate of the pheromone receptor and revealed that PsimOR14 strongly responds to neocembrene, the major component of the pheromone. Also, the authors found that one sensillum type (S I) detects neocembrene and also performed SSR for S I in wild termite workers. Furthermore, the authors revealed the gene, transcript, and protein structures of PsimOR14, predicted the 3D model and ligand docking of PsimOR14, and demonstrated that PsimOR14 is more highly expressed in workers than in soldiers using RNA-seq for heads of workers and soldiers of P. simplex and that the EAG response to neocembrene is higher in workers than in soldiers. I consider that this study will contribute to further understanding of the molecular and evolutionary mechanisms of the chemoreception system in termites.

      Strength:

      The manuscript is well written. As far as I know, this study is the first study that identified a pheromone receptor in termites. The authors not only present a methodology for analyzing the function of termite pheromone receptors but also provide important insights in terms of the evolution of ligand selectivity of termite pheromone receptors.

      Weakness:

      As you can see in the "Recommendations to the Authors" section below, there are several things in this paper that are not fully explained about experimental methods. Except for this point, this paper appears to me to have no major weaknesses.

    4. Reviewer #3 (Public review):

      Summary:

      Chemical communication is essential for the organization of eusocial insect societies. It is used in various important contexts, such as foraging and recruiting colony members to food sources. While such pheromones have been chemically identified and their function demonstrated in bioassays, little is known about their perception. Excellent candidates are the odorant receptors that have been shown to be involved in pheromone perception in other insects including ants and bees but not termites. The authors investigated the function of the odorant receptor PsimOR14, which was one of four target odorant receptors based on gene sequences and phylogenetic analyses. They used the Drosophila empty neuron system to demonstrate that the receptor was narrowly tuned to the trail pheromone neocembrene. Similar responses to the odor panel and neocembrene in antennal recordings suggested that one specific antennal sensillum expresses PsimOR14. Additional protein modeling approaches characterized the properties of the ligand binding pocket in the receptor. Finally, PsimOR14 transcripts were found to be significantly higher in worker antennae compared to soldier antennae, which corresponds to the worker's higher sensitivity to neocembrene.

      Strengths:

      The study presents an excellent characterization of a trail pheromone receptor in a termite species. The integration of receptor phylogeny, receptor functional characterization, antennal sensilla responses, receptor structure modeling, and transcriptomic analysis is especially powerful. All parts build on each other and are well supported with a good sample size.

      Weaknesses:

      The manuscript would benefit from a more detailed explanation of the research advances this work provides. Stating that this is the first deorphanization of an odorant receptor in a clade is insufficient. The introduction primarily reviews termite chemical communication and deorphanization of olfactory receptors previously performed. Although this is essential background, it lacks a good integration into explaining what problem the current study solves.

      Selecting target ORs for deorphanization is an essential step in the approach. Unfortunately, the process of choosing these ORs has not been described. Were the authors just lucky that they found the correct OR out of the 50, or was there a specific selection process that increased the probability of success?

      The authors assigned antennal sensilla into five categories. Unfortunately, they did not support their categories well. It is not clear how they were able to differentiate SI and SII in their antennal recordings.

      The authors used a large odorant panel to determine receptor tuning. The panel included volatile polar compounds and non-volatile non-polar hydrocarbons. Usually, some heat is applied to such non-volatile odorants to increase volatility for receptor testing. It is unclear how it is possible that these non-volatile compounds can reach the tested sensilla without heat application.