  1. Oct 2025
    1. Reviewer #1 (Public review):

      Summary:

      In this study, Ana Lapao et al. investigated the roles of the Rab27 effector SYTL5 in cellular membrane trafficking pathways. The authors found that SYTL5 localizes to mitochondria in a Rab27A-dependent manner. They demonstrated that SYTL5-Rab27A-positive vesicles containing mitochondrial material are formed under hypoxic conditions, and therefore speculate that SYTL5 and Rab27A play roles in mitophagy. They also found that both SYTL5 and Rab27A are important for normal mitochondrial respiration. Cells lacking SYTL5 shift from mitochondrial oxygen consumption to glycolysis, a shift known in cancer cells as the Warburg effect. Based on a cancer patient database, the authors noticed that low SYTL5 expression is associated with reduced survival of adrenocortical carcinoma patients, suggesting that SYTL5 could be a negative regulator of the Warburg effect and potentially of tumorigenesis.

      Strengths:

      The authors take advantage of multiple techniques and novel methods to perform the experiments.

      (1) Live-cell imaging revealed that stably and inducibly expressed SYTL5 co-localized with filamentous structures positive for mitochondrial markers. This result was further confirmed by correlative light and electron microscopy (CLEM) and by western blotting of purified mitochondrial fractions.

      (2) To investigate whether SYTL5 and RAB27A are required for mitophagy under hypoxic conditions, two established mitophagy reporter U2OS cell lines were used to analyze autophagic flux.

      Weaknesses:

      This study revealed a potential function of SYTL5 in mitophagy and mitochondrial metabolism. However, the mechanistic evidence establishing the relationship between SYTL5/Rab27A and mitophagy is insufficient. The involvement of SYTL5 in adrenocortical carcinoma (ACC) needs further investigation. Furthermore, the images and results supporting the major conclusions need to be improved.

      Comments on revisions: The authors did not revise the paper as suggested.

    2. Reviewer #2 (Public review):

      Summary:

      The authors provide convincing evidence that Rab27 and SYTL5 work together to regulate mitochondrial activity and homeostasis.

      Strengths:

      The development of models that allow the function to be dissected, and the rigorous approach to testing mitochondrial activity.

      This work is carefully done and supports the importance of the roles of Rab27A and SYTL5.

    3. Reviewer #3 (Public review):

      In the manuscript by Lapao et al., the authors uncover a role for the RAB27A effector protein SYTL5 in regulating mitochondrial function and apparent selective turnover of mitochondrial components. The authors find that SYTL5 localizes to mitochondria in a RAB27A-dependent manner and that loss of SYTL5 (or RAB27A) impairs lysosomal turnover of MTCO1 (but not of a matrix-based reporter or other mitochondrial proteins). The authors go on to show that loss of SYTL5 impacts mitochondrial respiration and ECAR and as such may influence the Warburg effect and tumorigenesis. Of relevance here, the authors go on to show that SYTL5 expression is reduced in adrenocortical carcinomas and that this correlates with reduced survival rates.

      As previously reviewed, this is a very intriguing body of work and reveals a new role for SYTL5/RAB27A at the mitochondria. Unfortunately, it appears that SYTL5 is a challenging protein to detect endogenously and the authors' cell lines "comprise a heterogenous pool with high variability", which means that many of my original concerns remain. It is also still not clear whether the conventional autophagy machinery is required for this pathway, especially whether SYTL5/RAB27A mitochondrial recruitment is upstream of it. Hopefully, in future work, the authors (and/or others) will be able to address this and build on the mechanisms of this interesting and potentially important pathway.

    1. eLife Assessment

      This work provides one of the first important attempts to look at Drosophila immune responses against bacterial, viral, and fungal pathogens in a way that combines the roles of four major arms in immunity (Imd signaling, Toll signaling, phagocytosis, and melanization) rather than studying them separately. The findings are compelling and the tools provided can be used as they are, or built upon, in various contexts.

    2. Reviewer #1 (Public review):

      Summary:

      The innate immune system serves as the first line of defense against invading pathogens. Four major immune-specific modules (the Toll pathway, the Imd pathway, melanization, and phagocytosis) play critical roles in orchestrating the immune response. Traditionally, most studies have focused on the function of individual modules in isolation. However, in recent years, it has become increasingly evident that effective immune defense requires intricate interactions among these pathways.

      Despite this growing recognition, the precise roles, timing, and interconnections of these immune modules remain poorly understood. Moreover, addressing these questions represents a major scientific undertaking.

      Strengths:

      In this manuscript, Ryckebusch et al. systematically evaluate both the individual and combined contributions of these four immune modules to host defense against a range of pathogens. Their findings significantly enhance our understanding of the layered architecture of innate immunity.

    3. Reviewer #2 (Public review):

      Summary:

      In this work, the authors take a holistic view of Drosophila immunity by selecting four major components of fly immunity that are often studied separately (Toll signaling, Imd signaling, phagocytosis and melanization) and studying their combined effects on the efficiency of the immune response. They achieve this by using fly lines mutant for one of these components, or modules, or for a combination of them, and testing the survival of these flies upon infection with a wide range of pathogens (bacterial, viral and fungal).

      Strengths:

      It is clear that this manuscript has required a large amount of hands-on work, considering the number of pathogens, mutations and timepoints tested. In my opinion, this work is a very welcome addition to the literature on fly immune responses, which do not occur one type of response at a time but in parallel or in sequence, and are often interconnected. I find that the major strength of this work is the overall concept, which is made possible by mutations designed to target the specific immune function of each module without affecting other functions. I believe that the combinatorial mutants will be of use to the fly community and will enable further studies of the interplay of these immune components in various settings.

      To control for effects arising from genetic variation other than the intended mutations, the mutants have been backcrossed into a widely used, isogenized Drosophila strain, w1118. Therefore, differences due to genetic background are controlled for.

      I also appreciate that the authors have investigated the two ways a host can deal with an infection, tolerance and resistance, and how the modules contribute to each.

      Weaknesses:

      While controlling for background effects is vital, the w1118 background is problematic (an issue not limited to this manuscript) because the white mutation has wide-ranging effects on phenotypes beyond eye color and vision. It is possible that the mutation influences the functionality of the immune response components. I acknowledge that it is not reasonable to ask for data in different backgrounds that better represent a "wild type" fly, but I think this matter should be raised and discussed.

      The whole study has been conducted on male flies. Immune responses show quite extensive sex-specific variation across many of the species studied, including the fly, but the reasons for this variation are not fully understood. I therefore suggest that the authors conduct a subset of experiments on female flies to see whether the findings apply to both sexes, especially the infection specificity of the module combinations.

      Comments on the revised manuscript:

      I appreciate the authors' responses to the points I raised and the additional work they have conducted. The authors have now discussed the possible background effect and added an experiment on female flies showing that the module function is applicable to both sexes.

    1. eLife Assessment

      This potentially valuable study presents claims of evidence for coordinated membrane potential oscillations in E. coli biofilms that can be linked to a putative K+ channel and that may serve to enhance photo-protection. The finding of waves of membrane potential would be of interest to a wide audience from molecular biology to microbiology and physical biology. Unfortunately, a major issue is that it is unclear whether the dye used can act as a Nernstian membrane potential dye in E. coli. The arguments of the authors, who largely ignore previously published contradictory evidence, are not adequate in that they do not engage with the fact that the dye behaves in their hands differently than in the hands of others. In addition, the lack of proper validation of the experimental method including key control experiments leaves the evidence incomplete.

    2. Reviewer #1 (Public Review):

      (1) Significance of the findings:

      Cell-to-cell communication is essential for higher functions in bacterial biofilms. Electrical signals have proven effective in transmitting signals across biofilms. These signals are then used to coordinate cellular metabolism or to increase antibiotic tolerance. Here, the authors report for the first time coordinated oscillation of membrane potential in E. coli biofilms that may have a functional role in photoprotection.

      (2) Strengths of the manuscript:

      - The authors report original data.
      - For the first time, they showed that coordinated oscillations in membrane potential occur in E. coli biofilms.
      - The authors revealed a complex two-phase dynamic involving distinct molecular response mechanisms.
      - The authors developed two rigorous models inspired by 1) the Hodgkin-Huxley model for the temporal dynamics of membrane potential and 2) the Fire-Diffuse-Fire model for the propagation of the electrical signal.
      - Since its discovery by comparative genomics, the Kch ion channel has not been associated with any specific phenotype in E. coli. Here, the authors propose a functional role for this putative voltage-gated K+ channel: enhancing survival under phototoxic conditions.

      (3) Weakness:

      - Contrary to what is stated in the abstract, the group of B. Maier has already reported collective electrical oscillations in the Gram-negative bacterium Neisseria gonorrhoeae (Hennes et al., PLoS Biol, 2023).
      - The data presented in the manuscript are not sufficient to conclude that the Kch channel plays a photoprotective role. The authors should perform the appropriate control experiments related to Fig 4D,E, i.e. reproduce these experiments without ThT to rule out possible photoconversion of ThT that would modify its toxicity. In addition, the data reported in Fig 4E appear to be extracted from Fig 4D. If this is indeed the case, it would be more conclusive to report the percentage of PI-positive cells in the population for each condition. This percentage should be calculated independently for each replicate, and the authors should then report the mean and standard deviation of the percentage of dead cells for each condition.
      - Although Fig 4A clearly shows that light stimulation influences the dynamics of the ThT signal in the biofilm, it is important to rule out possible contributions from other environmental changes that occur when the flow is stopped at the onset of light stimulation. I understand that, for technical reasons, the flow of fresh medium must be stopped for imaging. I therefore suggest control experiments in which the flow is stopped at different times before image acquisition (30 min or 1 h before). If stopping perfusion makes no significant contribution, the ThT dynamics should be unchanged regardless of the delay between stopping the flow and starting light stimulation.
      - To clarify the role of K+ in the habituation response, I suggest using the ionophore valinomycin at sub-inhibitory concentrations (5 or 10 µM), which should abolish the habituation response. In addition, in the Kch complementation experiment the sharp drop after the first peak rests on a single time point. It would be more convincing to increase the temporal resolution (from 1 min to 10 s) to show that there are indeed a first and a second peak. Finally, the high concentration of CCCP used in this study (100 µM) completely inhibits cell activity, so it is not surprising that no ThT dynamics were observed upon light stimulation at this concentration.
      - Since the TMRM signal increases linearly after the first response peak (Supp Fig 1D), I recommend toning down the statement at line 78.
      - Electrical signal propagation is an important aspect of the manuscript, yet a detailed quantitative analysis of the spatial dynamics within the biofilm is lacking. At a minimum, I recommend plotting the spatio-temporal diagram of the ThT intensity profile averaged along the azimuthal direction in the biofilm. It is also unclear whether the electrical signal propagates within the biofilm during the second-peak regime, which is mediated by the Kch channel: I plotted the spatio-temporal diagram for Video S3 and no propagation is evident at the second peak. The authors should also provide technical details of how R^2(t) is measured in the first regime (Fig 7E).
      - In the series of images presented in Supplementary Figure 4A, no wavefront is apparent. Although the microscopy technique used in this figure differs from that used for other images (e.g. Fig 2), the wavefront should still be visible. There is also no second peak in the confocal images (Supp Fig 4B).
      - Many important technical details are missing (e.g. measurements of biofilm size, R^2, curvature and 445 nm irradiance). How these quantities are measured should be described in detail in the Materials and Methods section.
      - Fig 5C: The curve in Fig 5D seems to correspond to the biofilm case. Since the model is made for single cells, the model curve should be compared with the average curve presented in Fig 1B (i.e. the single-cell experiments).
      - For clarity, I suggest indicating on each panel whether the experiment concerns single cells or biofilms. Please also provide bright-field images alongside the ThT images to locate the bacteria.
      - In Fig 7B, the plateau is higher in the simulations than in the biofilm experiments. The authors should comment on this discrepancy in the paper.
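The per-replicate summary requested for Fig 4E can be sketched as below. This is a minimal illustration, assuming PI-positive and total cell counts are available for each independent replicate; the function name and the counts are hypothetical, not data from the manuscript.

```python
import statistics


def pi_positive_summary(replicates):
    """Per-replicate percentage of PI-positive cells, then mean and sample SD.

    replicates: list of (n_pi_positive, n_total) tuples, one per independent
    replicate (hypothetical input format).
    """
    percentages = [100.0 * positive / total for positive, total in replicates]
    mean = statistics.mean(percentages)
    sd = statistics.stdev(percentages)  # sample SD (n - 1); needs >= 2 replicates
    return percentages, mean, sd


# Hypothetical counts for one condition, three replicates.
counts = [(12, 400), (20, 500), (9, 300)]
percentages, mean, sd = pi_positive_summary(counts)
print(percentages)                   # [3.0, 4.0, 3.0]
print(round(mean, 3), round(sd, 3))  # 3.333 0.577
```

Computing the percentage per replicate first, rather than pooling all cells (here 41 of 1200, about 3.42%), keeps the replicate-to-replicate spread visible in the reported SD.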

    3. Reviewer #2 (Public Review):

      The authors use ThT dye as a Nernstian potential dye in E. coli. Quantitative measurements of membrane potential using any cationic indicator dye are based on the equilibration of the dye across the membrane according to Boltzmann's law.
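For reference, the Boltzmann (Nernst) relation behind this premise maps the equilibrium accumulation ratio of a permeant cation to a membrane potential, c_in/c_out = exp(-zeΔψ/kT). The sketch below assumes a fully equilibrated, unbound, monovalent dye at 25 °C; the 100-fold ratio is illustrative only, not a measured ThT value, and the reading is meaningful only if the equilibration assumption holds.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19


def nernst_potential_mV(c_in, c_out, temperature_K=298.15, valence=1):
    """Membrane potential (inside minus outside, in mV) at which a permeant ion
    or dye of the given valence is in passive Boltzmann equilibrium with the
    concentration ratio c_in / c_out."""
    thermal_voltage_V = BOLTZMANN_J_PER_K * temperature_K / ELEMENTARY_CHARGE_C
    return -1e3 * thermal_voltage_V / valence * math.log(c_in / c_out)


# A monovalent cation accumulated 100-fold inside the cell implies about -118 mV.
print(round(nernst_potential_mV(100.0, 1.0), 1))  # -118.3
```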

      Ideally, the dye should have high membrane permeability to ensure rapid equilibration. Others have demonstrated that E. coli cells in the presence of ThT do not load the dye unless blue light is present, and that the loading profile does not look as expected for a cationic Nernstian dye. They also showed that the loading profile of the dye is different for E. coli cells deleted for the TolC pump. I therefore objected to interpreting the ThT signal as a Vm signal in E. coli. Nothing the authors have said suggests that I should change this assessment.

      Specifically, the authors responded to my concerns as follows:

      (1) 'We are aware of this study, but believe it to be scientifically flawed. We do not cite the article because we do not think it is a particularly useful contribution to the literature.' This seems to go against ethical practice in citing the scientific literature. If the authors have identified work on the same topic that they believe is scientifically flawed, a discussion reflecting that should be included.

      (2) 'The Pilizota group invokes some elaborate artefacts to explain the lack of agreement with a simple Nernstian battery model. The model is incorrect not the fluorophore.'
      It seems the authors object to the basic principle behind the use of Nernstian dyes. If the authors wish to use ThT according to some other model, and not as a Nernstian indicator, they need to explain and develop that model. Instead, they state 'ThT is a Nernstian voltage indicator' in their manuscript and expect the dye to behave as a passive voltage indicator throughout.

      (3) 'We think the proton effect is a million times weaker than that due to potassium i.e. 0.2 M K+ versus 10^-7 M H+. We can comfortably neglect the influx of H+ in our experiments.'
      I agree with this statement by the authors. At near-neutral extracellular pH, E. coli keeps a near-neutral intracellular pH, and the contribution of the chemical concentration gradient to the electrochemical potential of protons is negligible. The main contribution is from the membrane potential. However, this has nothing to do with the criticism the authors are responding to. The criticism is that ThT has been observed not to permeate the cell without blue light. Blue light has been observed to influence the electrochemical potential of protons (and given that at near-neutral intracellular and extracellular pH this is mostly the membrane potential, as the authors note themselves, we are effectively talking about Vm). Thus, two things happen when ThT is loaded: not just the expected equilibration but also a lowering of the membrane potential. The electrochemical potential of protons is coupled via the membrane potential to the electrochemical potentials of all other ions, including the K+ mentioned.
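The coupling described here can be written with the standard chemiosmotic relation for the proton motive force. The sign convention is an assumption stated for clarity: Δψ = ψ_in - ψ_out and ΔpH = pH_in - pH_out.

```latex
\Delta p = \Delta\psi - \frac{2.303\,RT}{F}\,\Delta\mathrm{pH},
\qquad \frac{2.303\,RT}{F} \approx 59\ \mathrm{mV}\ \text{at}\ 25\,^{\circ}\mathrm{C};
\qquad
\Delta\mathrm{pH} \approx 0 \;\Longrightarrow\; \Delta p \approx \Delta\psi,
\qquad
\frac{c_{\mathrm{in}}}{c_{\mathrm{out}}} = \exp\!\left(-\frac{zF\,\Delta\psi}{RT}\right)
\ \text{for any permeant ion of valence } z \text{ at equilibrium.}
```

At near-neutral pH, anything that lowers Δp therefore lowers Δψ, and the same Δψ sets the equilibrium distribution of K+ and of a cationic dye alike.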

      (4) 'The vast majority of cells continue to be viable. We do not think membrane damage is dominating.' This was the authors' response to my question on how they demonstrated TMRM loading and under which conditions (I also reminded them that the TMRM loading profile in E. coli has been demonstrated in potassium phosphate buffer). The request was to demonstrate the TMRM loading profile in their conditions and to show that it does not depend on light. Cells could still be viable, as membrane permeabilisation by light is gradual, but the loading of ThT would then no longer be based on simple equilibration of the dye's electrochemical potential.

      (5) On the comment on the action of CCCP with references included, authors include a comment that consists of phrases like 'our understanding of the literature' with no citations of such literature. Difficult to comment further without references.

      (6) 'Shielding would provide the reverse effect, since hyperpolarization begins in the dense centres of the biofilms. For the initial 2 hours the cells receive negligible blue light. Neither of the referee's comments thus seem tenable.'
      The authors have misunderstood my comment. I am not advocating shielding (I agree that this is not it) but stating that it is not the only alternative explanation for what they see (apart from electrical signaling). The other one I proposed is that the membrane composition and/or the effective light power the cells can tolerate has changed. The authors comment only on the light power (not convincingly, though; giving the value of that power would be more appropriate), not on possible changes in membrane permeability.

      (7) 'The work that TolC provides a possible passive pathway for ThT to leave cells seems slightly niche. It just demonstrates another mechanism for the cells to equilibrate the concentrations of ThT in a Nernstian manner i.e. driven by the membrane voltage.' I am not sure what the authors mean by another mechanism. The mechanism of action of a Nernstian dye is passive equilibration according to the electrochemical potential (i.e. until the electrochemical potential of the dye is 0).

      (8) 'In the 70 years since Hodgkin and Huxley first presented their model, a huge number of similar models have been proposed to describe cellular electrophysiology. We are not being hyperbolic when we state that the HH models for excitable cells are like the Schrödinger equation for molecules. We carefully adapted our HH model to reflect the currently understood electrophysiology of E. coli.'

      I gave a very concrete comment on the fact that in the HH model the conductance and leakage terms take the form they do because they were explicitly measured. The authors state that they have carefully adapted their model based on what is currently understood about E. coli electrophysiology. It is not clear how. HH uses g_K n^4 based on Figure 2 here https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1392413/pdf/jphysiol01442-0106.pdf, i.e. the measured rise and fall of potassium conductance on millisecond time scales. I looked at the citation the authors have given and found the resistance of an entire biofilm of a given strain at 3 applied voltages. So why n^4 based on that? Why does the unknown current have the g_Q z^4 form? Sodium conductance in HH is described by m^3 h g_Na (again based on detailed conductance measurements), so why is the unknown current in E. coli described by g_Q z^4? Why does leakage take the form it does, and based on what measurement?
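To illustrate why the exponent is a data-driven choice, the sketch below uses the textbook Hodgkin-Huxley squid-axon potassium parameters (modern convention: V in mV, rest at -65 mV, time in ms) to show what g_K n^4 predicts after a voltage-clamp step: a delayed, sigmoidal rise in conductance rather than a single exponential. These are not E. coli values, and the function names are illustrative; the point is only that adopting the same form for another current implies conductance time courses of this kind.

```python
import math

import numpy as np

# Textbook HH squid-axon K+ parameters in the modern convention (V in mV,
# resting potential -65 mV, time in ms). Illustrative only, NOT E. coli values.
G_K_BAR = 36.0  # maximal K+ conductance, mS/cm^2


def alpha_n(v):
    x = v + 55.0
    if abs(x) < 1e-7:  # removable singularity: x / (1 - e^(-x/10)) -> 10
        return 0.1
    return 0.01 * x / (1.0 - math.exp(-x / 10.0))


def beta_n(v):
    return 0.125 * math.exp(-(v + 65.0) / 80.0)


def n_inf(v):
    return alpha_n(v) / (alpha_n(v) + beta_n(v))


def tau_n(v):
    return 1.0 / (alpha_n(v) + beta_n(v))


def g_k_after_step(t_ms, v_hold=-65.0, v_step=0.0):
    """K+ conductance after a voltage-clamp step: n relaxes exponentially
    toward its new steady state, and the conductance follows n^4."""
    n0, n_end = n_inf(v_hold), n_inf(v_step)
    n = n_end - (n_end - n0) * np.exp(-t_ms / tau_n(v_step))
    return G_K_BAR * n**4


t = np.arange(0.0, 10.0, 0.01)
g = g_k_after_step(t)
print(round(n_inf(-65.0), 4))          # 0.3177
print(t[np.argmax(np.diff(g))] > 0.0)  # True: steepest rise is delayed (sigmoidal onset)
```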

      Throughout their responses, the authors seem to think that collapsing the electrochemical gradient of protons is all about protons, and this is not the case. At near neutral inside and outside pH, the electrochemical potential of protons is simply membrane voltage. And membrane voltage acts on all ions in the cell.

      Authors have started their response to concrete comments on the usage of ThT dye with comments on papers from my group that are not all directly relevant to this publication. I understand that their intention is to discredit a reviewer but given that my role here is to review this manuscript, I will only address their comments to the publications/part of publications that are relevant to this manuscript and mention what is not relevant.

      Publications in the order these were commented on.

      (1) In a comment on the paper that describes the use of ThT as a Nernstian dye, the authors seem to talk about a model of an entire active cell:
      'Huge oscillations occur in the membrane potentials of E. coli that cannot be described by the SNB model.' The two have nothing to do with each other. A Nernstian dye equilibrates according to its electrochemical potential. Once that happens, it can measure the potential (under the assumption that not so much dye has entered that it substantially lowers the membrane potential being measured). The time scale of that equilibration is important, and the dye can only measure processes that are slower than it. If one wants to use a dye that acts under a different model, that model first needs to be developed and then coupled to any other active-cell model.

      (2) The part of this paper that is relevant is simply the usage of TMRM dye. It is used as Nernstian dye, so all the above said applies. The rest is a study of flagellar motor.

      (3) The authors seem not to understand that the electrochemical potential of protons is coupled to the electrochemical potentials of all other ions via the membrane potential. In the manuscript the authors refer to, PMF ~ Vm, as DeltapH ~ 0. Other than that, this publication is not relevant to their current manuscript.

      (4) The manuscript in fact states precisely that PMF cannot be generated by protons alone and that some other ions need to be moved out for the purpose. In a near-neutral environment, it states that these need to be cations (e.g. K+). The model used in that manuscript is a pump-leak model. Neither point is relevant to the use of ThT dye.

      Further comments include, along the lines of:

      'The editors stress the main issue raised was a single referee questioning the use of ThT as an indicator of membrane potential. We are well aware of the articles by the Pilizota group and we believe them to be scientifically flawed. The authors assume there are no voltage-gated ion channels in E. coli and then attempt to explain motility data based on a simple Nernstian battery model (they assume E. coli are unexcitable matter). This in turn leads them to conclude the membrane dye ThT is faulty, when in fact it is a problem with their simple battery model.'

      The only assumption made when using a cationic Nernstian dye is that it equilibrates passively across the membrane according to its electrochemical potential. As it does that, it does lower the membrane potential, which is why as little as possible is added so that this is negligible. The equilibration should be as fast as possible, but at the very least it should be known, as no change in membrane potential can be measured that is faster than that.
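The time-scale argument can be made concrete with a deliberately simple model in which the dye-reported potential relaxes toward the true potential with a single first-order time constant tau. Both this first-order model and the tau = 60 s value are hypothetical assumptions for illustration, not measured ThT properties; under them, oscillations much faster than tau are strongly attenuated in the readout.

```python
import math

import numpy as np


def dye_readout(v_true, dt, tau):
    """Potential reported by a dye that relaxes toward the true potential with
    first-order time constant tau (an assumed model). Exact update when the
    true potential is held constant within each step of length dt."""
    a = math.exp(-dt / tau)
    out = np.empty(len(v_true))
    y = float(v_true[0])
    for k, x in enumerate(v_true):
        out[k] = y
        y = a * y + (1.0 - a) * x
    return out


def analytic_gain(period, tau):
    """Amplitude ratio (readout / true) for a sinusoid of the given period."""
    omega = 2.0 * math.pi / period
    return 1.0 / math.sqrt(1.0 + (omega * tau) ** 2)


# Hypothetical numbers: tau = 60 s, oscillation period 600 s, 20 mV amplitude.
dt, tau, period = 1.0, 60.0, 600.0
t = np.arange(6000) * dt
v_true = -120.0 + 20.0 * np.sin(2.0 * math.pi * t / period)
v_dye = dye_readout(v_true, dt, tau)
last = v_dye[-int(period / dt):]  # one period after transients have decayed
print(round((last.max() - last.min()) / 2.0 / 20.0, 2), round(analytic_gain(period, tau), 2))  # 0.85 0.85
```

Under the same assumed tau, a 10 s oscillation would be reported at under 3% of its true amplitude (analytic_gain(10.0, 60.0) is about 0.027), which is why the equilibration time must be known before fast dynamics are interpreted.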

      This behaviour should be orthogonal to what the cell is doing; it is a probe, after all. If the cell is excitable, a Nernstian dye can be used, as long as it still equilibrates passively and does so faster than any changes in membrane potential due to excitations of the cells. There are absolutely no assumptions made about the active system being measured in this expected behaviour of a Nernstian dye. And there shouldn't be; it is a probe. If one wants to use a dye that is not purely Nernstian, that behaviour needs to be described and a model proposed. As far as I can find, the authors do no such thing.

      There is a comment on the use of the flagellar motor as a readout of PMF, stating that the motor can be stopped by YcgR and citing work from 2023. Indeed, there is a range of references, such as https://doi.org/10.1016/j.molcel.2010.03.001, that demonstrate this (from around 2000-2010, as far as I am aware). The timescale of such slowdown is hours (see Figure 5 here https://www.cell.com/cell/pdf/S0092-8674(10)00019-X.pdf). Needless to say, when the flagellar motor is used as a probe, it needs to remain one under the conditions used. Thus one should always be on the lookout for other such proteins, not yet known, that could slow the motor down or make its speed no longer proportional to the PMF. In the papers in which my group uses the motor, the changes are fast, often reversible, and within an observation window of 30 min. They are also the same in a ΔYcgR strain, which we have not included because, given the time scales, it seemed obvious, but we certainly can in the future (and will stay vigilant for any conditions that would render the motor no longer a suitable probe for PMF).

    4. Reviewer #3 (Public Review):

      This manuscript by Akabuogu et al. investigates membrane potential dynamics in E. coli. Membrane potential fluctuations have been observed in bacteria by several research groups in recent years, including in the context of bacterial biofilms where they have been proposed to play a role in cellular communication. Here, these authors investigate membrane potential in E. coli, in both single cells and biofilms. I have reviewed the revised manuscript provided by the authors, as well as their responses to the initial reviews; my opinion about the manuscript is largely unchanged. I have focused my public review on those issues that I believe to be most pressing, with additional comments included in the review to authors. Although these authors are working in an exciting research area, the evidence they provide for their claims is inadequate, and several key control experiments are still missing. In some cases, the authors allude to potentially relevant data in their responses to the initial reviews, but unfortunately these data are not shown. Furthermore, I cannot identify any traveling wavefronts in the data included in this manuscript. In addition to the challenges associated with the use of Thioflavin-T (ThT) raised by the second reviewer, these caveats make the work presented in this manuscript difficult to interpret.

      First, some of the key experiments presented in the paper lack required controls:

      (1) This paper asserts that the observed ThT fluorescence dynamics are induced by blue light. This is a fundamental claim in the paper, since the authors go on to argue that these dynamics are part of a blue light response. This claim must be supported by the appropriate negative control experiment measuring ThT fluorescence dynamics in the absence of blue light: if this idea is correct, these dynamics should not be observed without blue light exposure. If this experiment cannot be performed with ThT because blue light is used for its excitation, TMRM can be used instead.

      In response to this, the authors wrote that "the fluorescent baseline is too weak to measure cleanly in this experiment." If they observe no ThT signal above noise in their time lapse data in the absence of blue light, this should be reported in the manuscript- this would be a satisfactory negative control. They then wrote that "It appears the collective response of all the bacteria hyperpolarization at the same time appears to dominate the signal." I am not sure what they mean by this- perhaps that ThT fluorescence changes strongly only in response to blue light? This is a fundamental control for this experiment that ought to be presented to the reader.

      (2) The authors claim that a ∆kch mutant is more susceptible to blue light stress, as evidenced by PI staining. The premise that the cells are mounting a protective response to blue light via these channels rests on this claim. However, they do not perform the negative control experiment, conducting PI staining for WT and the ∆kch mutant in the absence of blue light. In the absence of this control it is not possible to rule out effects of the ∆kch mutation on overall viability and/or PI uptake. The authors do include a growth curve for comparison, but planktonic growth is a very different context from surface-attached biofilm growth. Additionally, the ∆kch mutation may have impacts on PI permeability specifically that are not addressed by a growth curve. The negative control experiment is of key importance here.

      Second, the ideas presented in this manuscript rely entirely on analysis of ThT fluorescence data, specifically a time course of cellular fluorescence following blue light treatment. However, alternate explanations for and potential confounders of the observed dynamics are not sufficiently addressed:

      (1) Bacterial cells are autofluorescent, and this fluorescence can change significantly in response to stress (e.g. blue light exposure). To characterize and/or rule out autofluorescence contributions to the measurement, the authors should present time lapse fluorescence traces of unstained cells for comparison, acquired under the same imaging conditions in both wild type and ∆kch mutant cells. In their response to reviewers the authors suggested that they have conducted this experiment and found that the autofluorescence contribution is negligible, which is good, but these data should be included in the manuscript along with a description of how these controls were conducted.

      (2) Similarly, in my initial review I raised a concern about the possible contributions of photobleaching to the observed fluorescence dynamics. This is particularly relevant for the interpretation of the experiment in which catalase appears to attenuate the decay of the ThT signal; this attenuation could alternatively be due to catalase decreasing ThT photobleaching. In their response, the authors indicated that photobleaching is negligible, which would be good, but they do not share any evidence to support this claim. Photobleaching can be assessed in this experiment by varying the light dosage (illumination power, frequency, and/or duration) and confirming that the observed fluorescence dynamics are unaffected.
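The suggested dose-variation control can be sketched as follows, assuming the decay phase of each background-subtracted trace is mono-exponential and that photobleaching adds a decay rate proportional to illumination power. Fitting the decay rate at several powers and extrapolating to zero power then separates a dose-independent component from bleaching. The function names and constants are hypothetical, and the example uses synthetic, noise-free traces.

```python
import numpy as np


def fitted_decay_rate(intensity, dt):
    """First-order decay rate (1/s) from a log-linear fit to a decaying trace
    (intensities must be positive, i.e. background-subtracted)."""
    t = np.arange(len(intensity)) * dt
    slope, _intercept = np.polyfit(t, np.log(intensity), 1)
    return -slope


def split_decay(doses, rates):
    """Linear fit of decay rate vs relative illumination power.
    Intercept: dose-independent decay; slope: bleaching rate per unit power."""
    slope, intercept = np.polyfit(doses, rates, 1)
    return intercept, slope


# Synthetic, noise-free example with made-up constants.
k_bio, k_bleach = 2e-3, 5e-4       # 1/s, and 1/s per unit relative power
doses = np.array([0.5, 1.0, 2.0])  # relative illumination power
dt, n_frames = 5.0, 120
rates = [
    fitted_decay_rate(np.exp(-(k_bio + k_bleach * d) * np.arange(n_frames) * dt), dt)
    for d in doses
]
intercept, slope = split_decay(doses, rates)
print(round(float(intercept), 6), round(float(slope), 6))  # 0.002 0.0005
```

If the observed dynamics are biological rather than bleaching, the fitted rates should not change with power (slope near zero).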

      Third, the paper claims in two instances that there are propagating waves of ThT fluorescence that move through biofilms, but I do not observe these waves in any case:

      (1) The first wavefront claim relates to small cell clusters, in Fig. 2A and Videos S2 and S3 (Fig. 2A and Video S2 show the same biofilm). I simply do not see any evidence of propagation in either case; rather, all cells get brighter and dimmer in tandem. I downloaded and analyzed Video S3 in several ways (plotting intensity profiles for regions at different distances from the cluster center, drawing a kymograph across the cluster, etc.), and in no case did I see any evidence of a propagating wavefront. (I attempted the same analysis on the biofilm shown in Fig. 2A and Video S2 with similar results, but the images in the figure panels, and especially the video, are still so saturated that the quantification is difficult to interpret.) If there is evidence for wavefronts, it should be demonstrated explicitly by analysis of several clusters. For example, a plot of time-to-peak vs. position in the cluster demonstrating a propagating wave would satisfy this. Currently, I do not see any wavefronts in these data.

      (2) The other wavefront claim relates to biofilms, and the relevant data are presented in Fig. S4 (and, I believe, also in what is now Video S8, but no supplemental video legends are provided and this video is not cited in the text). As before, I cannot discern any wavefronts in the image and video provided; Reviewer 1 was also not able to detect wave propagation in this video by kymograph. Some mean squared displacements are shown in Fig. 7. As before, the methods used to obtain these are not clearly documented, either in this manuscript or in the bioRxiv preprint linked in the initial response to reviewers, and since wavefronts are not evident in the video it is hard to understand what is being measured here: radial distance from where? (The methods section mentions radial distance from the substrate, which would mean Z position above the imaging surface, yet no wavefronts are evident in Z in the figure panels or movie.) Thus, a clear demonstration of these wavefronts is still missing here as well.
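
      As an illustration of the time-to-peak analysis suggested in item (1) above, the sketch below assumes that per-cell ThT intensity traces have already been segmented from unsaturated images and that each cell has an (x, y) centroid. The peak rule (first local maximum reaching half of the trace maximum) and all function names are hypothetical choices for this example, not the authors' pipeline.

```python
import math


def first_peak_time(trace, dt, frac=0.5):
    """Time of the first local maximum reaching frac * max(trace); None if absent."""
    threshold = frac * max(trace)
    for i in range(1, len(trace) - 1):
        if trace[i] >= threshold and trace[i] >= trace[i - 1] and trace[i] > trace[i + 1]:
            return i * dt
    return None


def linear_fit(xs, ys):
    """Ordinary least-squares slope, intercept and R^2 for ys ~ xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r2


def time_to_peak_vs_radius(traces, positions, center, dt):
    """Regress each cell's first-peak time on its distance from a putative initiation site."""
    radii, times = [], []
    for trace, (x, y) in zip(traces, positions):
        t = first_peak_time(trace, dt)
        if t is not None:
            radii.append(math.hypot(x - center[0], y - center[1]))
            times.append(t)
    slope, intercept, r2 = linear_fit(radii, times)
    return radii, times, slope, r2
```

      Under a propagating wave, time-to-peak should rise with distance from the initiation site (a clearly positive slope, whose inverse is an apparent speed, with a substantial R²); cells that brighten and dim in tandem give a slope near zero. Repeating this over several clusters gives the kind of explicit evidence requested.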

      Fourth, I have some specific questions about the study of blue light stress and the use of PI as a cell viability indicator:

      (1) The logic of this paper includes the premise that blue light exposure is a stressor under the experimental conditions employed in the paper. Although it is of course generally true that blue light can be damaging to bacteria, this depends on light power and dosage. The control I recommended above, staining cells with PI in the presence and absence of blue light, will also allow the authors to confirm that this blue light treatment is indeed a stressor: PI staining would be expected to increase in the presence of blue light if this is so.

      (2) The presence of ThT may complicate the study of the blue light stress response, since ThT enhances the photodynamic effects of blue light in E. coli (Bondia et al. 2021 Chemical Communications). The authors could investigate ThT toxicity under these conditions by staining cells with PI after exposing them to blue light with or without ThT staining.

      (3) In my initial review, I wrote the following: "In Figures 4D - E, the interpretation of this experiment can be confounded by the fact that PI uptake can sometimes be seen in bacterial cells with high membrane potential (Kirchhoff & Cypionka 2017 J Microbial Methods); the interpretation is that high membrane potential can lead to increased PI permeability. Because the membrane potential is largely higher throughout blue light treatment in the ∆kch mutant (Fig. 3[BC]), this complicates the interpretation of this experiment." In their response, the authors suggested that these results are not relevant in this case because "In our experiment methodology, cell death was not forced on the cells by introducing an extra burden or via anoxia." However, the logic of the paper is that the cells are in fact dying due to an imposed external stressor, which presumably also confers an increased burden as the cells try to deal with the stress. Instead, the authors should simply use a parallel method to confirm the results of PI staining. For example, the experiment could be repeated with other stains, or the viability of blue light-treated cells could be addressed more directly by outgrowth or colony-forming unit assays.

      The CFU assay suggested above has the additional advantage that it can also be performed on planktonic cells in liquid culture that are exposed to blue light. If, as the paper suggests, a protective response to blue light is being coordinated at the biofilm level by these membrane potential fluctuations, the WT strain might be expected to lose its survival advantage vs. the ∆kch mutant in the absence of a biofilm.

      Fifth, in several cases the data are presented in a way that is difficult to interpret, or the paper makes claims that are difficult to observe in the data:

      (1) The authors suggest that the ThT and TMRM traces presented in Fig. S1D have similar shapes, but this is not obvious to me: the TMRM curve has very little decrease after the initial peak and only a modest, gradual rise thereafter. The authors suggest that this is due to increased TMRM photobleaching, but I would expect photobleaching to exacerbate the signal decrease after the initial peak. Since this figure is used to support the use of ThT as a membrane potential indicator, and since this is the only alternative measurement of membrane potential presented in the text, the authors should discuss this discrepancy in more detail.

      (2) The comparison of single cells to microcolonies presented in figures 1B and D still needs revision:

      First, both reviewer 1 and I commented in our initial reviews that the ThT traces, here and elsewhere, should not be normalized; this will help with the interpretation of some of the claims throughout the manuscript.

      Second, the way these figures are shown, with all traces overlaid at full opacity, makes it very difficult to see what is being compared. Since the point of the comparison is the time to first peak (and its standard deviation), histograms of the distributions of time to first peak in both cases should be plotted as a separate figure panel.

      Third, statistical significance tests ought to be used to evaluate the statistical strength of the comparisons between these curves. The authors compare both means and standard deviations of the time to first peak, and there are appropriate statistical tests for both types of comparisons.

      (3) The authors claim that the curve shown in Fig. S4B is similar to the simulation result shown in Fig. 7B. I remain unconvinced that this is so, particularly with respect to the kinetics of the second peak; at the very least, the differences should be acknowledged and discussed. In any case, the best thing to do would be to move Fig. S4B to the main text alongside Fig. 7B so that readers can make the comparison more easily.

      (4) As I wrote in my first review, in the discussion of voltage-gated calcium channels, the authors refer to "spiking events", but these are not obvious in Figure S3E. Although the fluorescence intensity changes over time, these fluctuations cannot be distinguished from measurement noise. A no-light control could help clarify this.

      (5) In the lower irradiance conditions in Fig. 4A, the ThT dynamics are slower overall, and the ThT intensity appears to be starting to rise at the end of the measurement. The authors write that no second peak is observed below an irradiance threshold of 15.99 µW/mm². However, could a more prominent second peak be observed in these cases if the measurement time were extended? Additionally, the end of these curves looks similar to the curve in Fig. S4B, for which the authors write that the slow rise is evidence of a second peak, in contrast to their interpretation here.
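
      For the comparison of time to first peak between single cells and microcolonies in item (2) above, one assumption-light option is a two-sided permutation test, run once on the difference in means and once on the difference in standard deviations (after median-centering each group, in the spirit of the Brown-Forsythe variant of Levene's test). The sketch below uses only the Python standard library; the function names are hypothetical, and the inputs are assumed to be per-cell first-peak times. Standard parametric alternatives are Welch's t-test (`scipy.stats.ttest_ind(a, b, equal_var=False)`) and Levene's test (`scipy.stats.levene(a, b, center='median')`).

```python
import random
import statistics


def perm_test(a, b, stat, n_perm=10000, seed=0):
    """Two-sided permutation test for stat(a) - stat(b).

    Returns the observed difference and a permutation p-value,
    using the +1 correction so that p is never exactly zero.
    """
    rng = random.Random(seed)
    observed = stat(a) - stat(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = stat(pooled[:n_a]) - stat(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


def median_centered(xs):
    """Subtract the group median so the spread test is not driven by a shift in location."""
    m = statistics.median(xs)
    return [x - m for x in xs]


def compare_first_peak_times(single_cells, colonies, n_perm=10000):
    """Compare means and SDs of first-peak times between two groups of cells."""
    mean_diff, p_mean = perm_test(single_cells, colonies, statistics.mean, n_perm)
    sd_diff, p_sd = perm_test(median_centered(single_cells),
                              median_centered(colonies), statistics.stdev, n_perm)
    return {"mean_diff": mean_diff, "p_mean": p_mean,
            "sd_diff": sd_diff, "p_sd": p_sd}
```

      The permutation approach avoids assuming normally distributed first-peak times, which matters if many cells fire within the first acquisition frame and the distributions are strongly skewed.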

      Additional considerations:

      (1) The analysis and interpretation of the first peak, and particularly of the time-to-fire data, is challenging throughout the manuscript because the time resolution of the data set is quite limited. It seems that a large proportion of cells have already fired after a single acquisition frame. It would be ideal to increase the time resolution of this measurement to improve precision. This could be done by imaging more quickly, but that would perhaps necessitate more blue light exposure; an alternative is to do this experiment under lower blue light irradiance, where the first spike time is increased (Figure 4A).

      (2) The authors suggest in the manuscript that "E. coli biofilms use electrical signalling to coordinate long-range responses to light stress." In addition to the technical caveats discussed above, I am missing a discussion about what these responses might be. What constitutes a long-range response to light stress, and are there known examples of such responses in bacteria?

      (3) The presence of long-range blue light responses can also be interrogated experimentally, for example, by repeating the Live/Dead experiment in planktonic culture or the single-cell condition. If the protection from blue light specifically emerges due to coordinated activity of the biofilm, the ∆kch mutant would not be expected to show a change in Live/Dead staining in non-biofilm conditions. The CFU experiment I mentioned above could also implicate coordinated long-range responses specifically, if biofilms and liquid culture experiments can be compared (although I know that recovering cells from biofilms is challenging.)

      (4) At the end of the results section, the authors suggest a critical biofilm size of only 4 μm for wavefront propagation (not much larger than a single cell!). The authors show responses for various biofilm sizes in Fig. 2C, but these are all substantially larger (and this figure also does not contain wavefront information). Are there data for cell clusters above and below this size that could support this claim more directly?

      (5) In Fig. 4C, the overall trajectories of extracellular potassium are indeed similar, but the kinetics of the second potassium peak differ from those observed by ThT (it rises minutes earlier). Is this consistent with the idea that Kch is responsible for that peak? Additionally, the potassium dynamics also include the first ThT peak; is this surprising given that, according to the model, the Kch channel has no effect on this peak?
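
      To put item (1) above on a quantitative footing: if first-spike times are known only to the nearest frame, with frame interval Δt, and true spike times fall roughly uniformly within a frame, the rounding error has the standard deviation below and, assuming it is independent of the true timing, adds in quadrature to the real cell-to-cell spread (Sheppard's correction). This does not cover cells that have already fired before the first frame; those are left-censored and need separate treatment.

```latex
\sigma_{\text{quant}} = \frac{\Delta t}{\sqrt{12}} \approx 0.29\,\Delta t,
\qquad
\sigma^{2}_{\text{measured}} \approx \sigma^{2}_{\text{true}} + \frac{\Delta t^{2}}{12}
```

      A measured standard deviation of time to first peak comparable to about 0.3 Δt therefore cannot be distinguished from frame quantization alone.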

      Detailed comments:

      Why are Fig. 2A and Video S2 called a microcluster, whereas Video S3, which is smaller, is called a biofilm?

      "We observed a spontaneous rapid rise in spikes within cells in the center of the biofilm" (Line 140): What does "spontaneous" mean here?

      "This demonstrates that the ion-channel mediated membrane potential dynamics is a light stress relief process.", "E. coli cells employ ion-channel mediated dynamics to manage ROS-induced stress linked to light irradiation." (Line 268 and the second sentence of the Fig. 4F legend): This claim is not well-supported. There are several possible interpretations of the catalase experiment (which should be discussed); this experiment perhaps suggests that ROS impacts membrane potential but does not indicate that these membrane potential fluctuations help the cells respond to blue light stress. The loss of viability in the ∆kch mutant might indicate a link between these membrane potential experiments and viability, but it is hard to interpret without the no light controls I mention above.

      "The model also predicts... the external light stress" (Lines 338-341): Please clarify this section. Where does this prediction arise from in the modeling work? Second, I am not sure what is meant by "modulates the light stress" or "keeps the cell dynamics robust to the intensity of external light stress" (especially since the dynamics clearly vary with irradiance, as seen in Figure 4A).

      "We hypothesized that E. coli not only modulates the light-induced stress but also handles the increase of the ROS by adjusting the profile of the membrane potential dynamics" (Line 347): I am not sure what "handles the ROS by adjusting the profile of the membrane potential dynamics" means. What is meant by "handling" ROS? Is the hypothesis that membrane potential dynamics themselves are protective against ROS, or that they induce a ROS-protective response downstream, or something else? Later the authors write that changes in the response to ROS in the model agree with the hypothesis, but just showing that ROS impacts the membrane potential does not seem to demonstrate that this has a protective effect against ROS.

      "Mechanosensitive ion channels (MS) are vital for the first hyperpolarization event in E. coli." (Line 391): This is misleading. Deleting the mechanosensitive ion channels abolishes the membrane potential dynamics entirely; they do not have a specific effect on the first hyperpolarization event. The claim that mechanosensitive ion channels are specifically involved in the first event also appears in the abstract.

      Also, the apparent membrane potential is much lower even at the start of the experiment in these mutants (Fig. 6C-D)- is this expected? This seems to imply that these ion channels also have a blue light-independent effect.

      Throughout the paper, there are claims that the initial ThT spike is involved in "registering the presence of the light stress" and similar. What is the evidence for this claim?

      "We have presented much better quantitative agreement of our model with the propagating wavefronts in E. coli biofilms..." (Line 619): It is not evident to me that the agreement between model and prediction is "much better" in this work than in the cited work (reference 57, Hennes et al. 2023). The model in Figure 4 of ref. 57 seems to capture the key features of their data.

      In methods, "Only cells that are hyperpolarized were counted in the experiment as live" (Line 745): what percentage of cells did not hyperpolarize in these experiments?

      Some indication of standard deviation (error bars or shading) should be added to all figures where mean traces are plotted.

      Video S8 is very confusing- why does the video play first forwards and then backwards? It is easy to misinterpret this as a rise in the intensity at the end of the experiment.

    1. eLife Assessment

      This is a fundamental study that provides a detailed single-cell transcriptomic and epigenomic map of the mouse trabecular meshwork, identifying three distinct trabecular meshwork subtypes with specific functional roles. It links the glaucoma-associated transcription factor LMX1B to mitochondrial regulation in TM3 cells and demonstrates that nicotinamide treatment prevents IOP elevation in Lmx1bV265D/+ mutant mice, highlighting a potential metabolic therapeutic strategy for glaucoma. This convincing work would be further supported by data that link the transcriptional data with mitochondrial functional assays.

    2. Reviewer #1 (Public review):

      Summary:

      This study provides a comprehensive single-cell and multiomic characterization of trabecular meshwork (TM) cells in the mouse eye, a structure critical to intraocular pressure (IOP) regulation and glaucoma pathogenesis. Using scRNA-seq, snATAC-seq, immunofluorescence, and in situ hybridization, the authors identify three transcriptionally and spatially distinct TM cell subtypes. The study further demonstrates that mitochondrial dysfunction, specifically in one subtype (TM3), contributes to elevated IOP in a genetic mouse model of glaucoma carrying a mutation in the transcription factor Lmx1b. Importantly, treatment with nicotinamide (vitamin B3), known to support mitochondrial health, prevents IOP elevation in this model. The authors also link their findings to human datasets, suggesting the existence of analogous TM3-like cells with potential relevance to human glaucoma.

      Strengths:

      The study is methodologically rigorous, integrating single-cell transcriptomic and chromatin accessibility profiling with spatial validation and in vivo functional testing. The identification of TM subtypes is consistent across mouse strains and institutions, providing robust evidence of conserved TM cell heterogeneity. The use of a glaucoma model to show subtype-specific vulnerability, combined with a therapeutic intervention, gives the study strong mechanistic and translational significance. The inclusion of chromatin accessibility data adds further depth by implicating active transcription factors such as LMX1B, a gene known to be associated with glaucoma risk. The integration with human single-cell datasets enhances the potential relevance of the findings to human disease.

      Weaknesses:

      Although the LMX1B transcription factor is implicated as a key regulator in TM3 cells, its role in directly controlling mitochondrial gene expression is not fully explored. Additional analysis of motif accessibility or binding enrichment near relevant target genes could substantiate this mechanistic link. The therapeutic effect of vitamin B3 is clearly demonstrated phenotypically, but the underlying cellular and molecular mechanisms remain somewhat underdeveloped - for instance, changes in mitochondrial function, oxidative stress markers, or NAD+ levels are not directly measured. While the human relevance of TM3 cells is suggested through marker overlap, more quantitative approaches, such as cell identity mapping or gene signature scoring in human datasets, would strengthen the translational connection.
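
      One simple form of the gene signature scoring suggested here is sketched below, under stated assumptions: the human dataset is a log-normalized cells × genes matrix, the signature is a TM3 marker list taken from the mouse data and mapped to human orthologs, and the background is simply every non-signature gene. Published implementations (for example Seurat's AddModuleScore or Scanpy's score_genes) instead draw control genes matched by expression bin; this sketch omits that step, and the function and gene names are hypothetical placeholders.

```python
import statistics


def signature_scores(expr, genes, signature):
    """Per-cell score: mean expression of signature genes minus mean of all other genes.

    expr: list of cells, each a list of log-normalized values ordered like `genes`.
    signature: set of gene symbols (e.g. hypothetical TM3 markers mapped to human orthologs).
    """
    sig_idx = [i for i, g in enumerate(genes) if g in signature]
    ctrl_idx = [i for i, g in enumerate(genes) if g not in signature]
    if not sig_idx or not ctrl_idx:
        raise ValueError("need at least one signature gene and one background gene")
    return [
        statistics.fmean(cell[i] for i in sig_idx)
        - statistics.fmean(cell[i] for i in ctrl_idx)
        for cell in expr
    ]


genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]  # placeholder gene symbols
expr = [[2.0, 2.0, 0.0, 0.0],   # cell enriched for the signature
        [0.0, 0.0, 1.0, 1.0]]   # cell depleted for the signature
print(signature_scores(expr, genes, {"GENE_A", "GENE_B"}))  # prints [2.0, -1.0]
```

      Comparing the distribution of such scores across annotated human TM clusters would give a quantitative counterpart to the marker-overlap argument.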

      Overall, this is a compelling and carefully executed study that offers significant advances in our understanding of TM cell biology and its role in glaucoma. The integration of multimodal data, disease modeling, and therapeutic testing represents a valuable contribution to the field. With additional mechanistic depth, the study has the potential to become a foundational resource for future research into IOP regulation and glaucoma treatment.

    3. Reviewer #2 (Public review):

      Summary:

      This elegant study by Tolman and colleagues provides fundamental findings that substantially advance our knowledge of the major cell types within the limbus of the mouse eye, focusing on the aqueous humor outflow pathway. The authors used single-cell and single-nuclei RNAseq to very clearly identify 3 subtypes of the trabecular meshwork (TM) cells in the mouse eye, with each subtype having unique markers and proposed functions. The U. Columbia results are strengthened by an independent replication in a different mouse strain at a separate laboratory (Duke). Bioinformatics analyses of these expression data were used to identify cellular compartments, molecular functions, and biological processes. Although there were some common pathways among the 3 subtypes of TM cells (e.g., ECM metabolism), there also were distinct functions. For example:

      • TM1 cell expression supports heavy engagement in ECM metabolism and structure, as well as TGFβ2 signaling.

      • TM2 cells were enriched in laminin and pathways involved in phagocytosis, lysosomal function, and antigen expression, as well as End3/VEGF/angiopoietin signaling.

      • TM3 cells were enriched in actin binding and mitochondrial metabolism.

      They used high-resolution immunostaining and in situ hybridization to show that these 3 TM subtypes express distinct markers and occupy distinct locations within the TM tissue. The authors compared their expression data with other published scRNAseq studies of the mouse as well as the human aqueous outflow pathway. They used ATAC-seq to map open chromatin regions in order to predict transcription factor binding sites. Their results were also evaluated in the context of human IOP and glaucoma risk alleles from published GWAS data, with interesting and meaningful correlations. Although not discussed in their manuscript, their expression data support other signaling pathways, proteins, and genes that have been implicated in glaucoma, including TGFβ2, BMP signaling (including involvement of ID proteins), MYOC, the actin cytoskeleton (CLANs), WNT signaling, etc.

      In addition to these very impressive data, the authors used scRNAseq to examine changes in TM cell gene expression in the mouse glaucoma model of mutant Lmx1b-induced ocular hypertension. In humans, LMX1B is associated with Nail-Patella syndrome, which can include the development of glaucoma, demonstrating the clinical relevance of this mouse model. Among the gene expression changes detected, TM3 cells had altered expression of genes associated with mitochondrial metabolism. The authors drew on their previous experience using nicotinamide to metabolically protect DBA/2J mice from glaucomatous damage, and they hypothesized that nicotinamide supplementation of mutant Lmx1b mice would help restore normal mitochondrial metabolism in the TM and prevent Lmx1b-mediated ocular hypertension. Adding nicotinamide to the drinking water significantly prevented Lmx1b mutant mice from developing high intraocular pressure. This is a laudable example of dissecting the molecular pathogenic mechanisms responsible for a disease (glaucoma) and then discovering and testing a potential therapy that directly intervenes in the disease process and thereby protects from the disease.

      Strengths:

      There are numerous strengths in this comprehensive study, including:

      • Deep scRNA sequencing that was confirmed by an independent dataset in another mouse strain at another university.

      • Identification and validation of molecular markers for each mouse TM cell subset along with localization of these subsets within the mouse aqueous outflow pathway.

      • Rigorous bioinformatics analysis of these data as well as comparison of the current data with previously published mouse and human scRNAseq data.

      • Correlating their current data with GWAS glaucoma and IOP "hits".

      • Discovering gene expression changes in the 3 TM subgroups in the mouse mutant Lmx1b model of glaucoma.

      • Further pursuing the indication of dysfunctional mitochondrial metabolism in TM3 cells from Lmx1b mutant mice to test the efficacy of dietary supplementation with nicotinamide. The authors nicely demonstrate the disease-modifying efficacy of nicotinamide in preventing IOP elevation in these Lmx1b mutant mice, preventing the development of glaucoma. These results have clinical implications for new glaucoma therapies.

      Weaknesses:

      • Occasional over-interpretation of data. The authors have used changes in gene expression (RNAseq) to implicate functions and signaling pathways. For example, they have not directly measured "changes in metabolism", "mitochondrial dysfunction" or "activity of Lmx1b".

      • Their very thorough data set shows enrichment of, or changes in, gene expression that support other pathways previously reported to be associated with glaucoma (such as TGFβ2, BMP signaling, actin cytoskeletal organization (CLANs), WNT signaling, ossification, etc.). Not exploring these appears to be a lost opportunity to further enhance the significance of this work.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors perform multimodal single-cell transcriptomic and epigenomic profiling of 9,394 mouse TM cells, identifying three transcriptionally distinct TM subtypes with validated molecular signatures. TM1 cells are enriched for extracellular matrix genes, TM2 for secreted ligands supporting Schlemm's canal, and TM3 for contractile and mitochondrial/metabolic functions. The transcription factor LMX1B, previously linked to glaucoma, shows the highest expression in TM3 cells and appears to regulate mitochondrial pathways. In Lmx1bV265D mutant mice, TM3 cells exhibit transcriptional signs of mitochondrial dysfunction associated with elevated IOP. Notably, vitamin B3 treatment significantly mitigates IOP elevation, suggesting a potential therapeutic avenue.

      This is an excellent and collaborative study involving investigators from two institutions, offering the most detailed single-cell transcriptomic and epigenetic profiling of the mouse limbal tissues, including both TM and Schlemm's canal (SC), from wild-type and Lmx1bV265D mutant mice. The study defines three TM subtypes and characterizes their distinct molecular signatures, associated pathways, and transcriptional regulators. The authors also compare their dataset with previously published murine and human studies, including those by Van Zyl et al., providing valuable cross-species insights.

      Strengths:

      (1) Comprehensive dataset with high single-cell resolution

      (2) Use of multiple bioinformatic and cross-comparative approaches

      (3) Integration of 3D imaging of TM and SC for anatomical context

      (4) Convincing identification and validation of three TM subtypes using molecular markers.

      Weaknesses:

      (1) Insufficient evidence linking mitochondrial dysfunction to TM3 cells in Lmx1bV265D mice: While the identification of TM3 cells as metabolically specialized and Lmx1b-enriched is compelling, the proposed link between Lmx1b mutation and mitochondrial dysfunction remains underdeveloped. It is unclear whether mitochondrial defects are a primary consequence of Lmx1b-mediated transcriptional dysregulation or a secondary response to elevated IOP. Additional evidence is needed to clarify whether Lmx1b directly regulates mitochondrial genes (e.g., via ChIP-seq, motif analysis, or ATAC-seq), or whether mitochondrial changes are downstream effects.

      Furthermore, the protective effects of nicotinamide (NAM) are interpreted as evidence of mitochondrial involvement, but no direct mitochondrial measurements (e.g., immunostaining, electron microscopy, OCR assays) are provided. It is essential to validate mitochondrial dysfunction in TM3 cells using in vivo functional assays to support the central conclusion of the paper. Without this, the claim that mitochondrial dysfunction drives IOP elevation in Lmx1bV265D mice remains speculative. Alternatively, the authors should consider revising their claims that mitochondrial dysfunction in these mice is a central driver of TM dysfunction.

      (2) Mechanism of NAM-mediated protection is unclear: The manuscript states that NAM treatment prevents IOP elevation in Lmx1bV265D mice via metabolic support, yet no data are shown to confirm that NAM specifically rescues mitochondrial function. Do NAM-treated TM3 cells show improved mitochondrial integrity? Are reactive oxygen species (ROS) reduced? Does NAM also protect RGCs from glaucomatous damage? Addressing these points would clarify whether the therapeutic effects of NAM are indeed mitochondrial.

      (3) Lack of direct evidence that LMX1B regulates mitochondrial genes: While transcriptomic and motif accessibility analyses suggest that LMX1B is enriched in TM3 cells and may influence mitochondrial function, no mechanistic data are provided to demonstrate direct regulation of mitochondrial genes. Including ChIP-seq data, motif enrichment at mitochondrial gene loci, or perturbation studies (e.g., Lmx1b knockout or overexpression in TM3 cells) would greatly strengthen this central claim.

      (4) Focus on LMX1B in Fig. 5F lacks broader context: Figure 5F shows that several transcription factors (TFs), including Tcf21, Foxs1, Arid3b, Myc, Gli2, Patz1, Plag1, Npas2, Nr1h4, and Nfatc2, exhibit stronger positive correlations or motif accessibility changes than LMX1B. Yet the manuscript focuses almost exclusively on LMX1B. The rationale for this focus should be clarified, especially given LMX1B's relatively lower ranking in the correlation analysis. Were the functions of these other highly ranked TFs examined or considered in the context of TM biology or glaucoma? Discussing their potential roles would enhance the interpretation of the transcriptional regulatory landscape and demonstrate the broader relevance of the findings.

      Other weaknesses:

      (1) In the Abstract, the authors report 9,394 wild-type TM cell transcriptomes, but the number of Lmx1bV265D/+ TM cell transcriptomes analyzed is not provided. This information is essential for evaluating the comparative analysis and should be clearly stated in the Abstract and again in the main text (e.g., lines 121-123). Including both wild-type and mutant cell counts will help readers assess the balance and robustness of the dataset.

      (2) Did the authors monitor mouse weight or other health parameters to assess potential systemic effects of treatment? It is known that the taste of compounds in drinking water can alter fluid or food intake, which may influence general health. Also, do Lmx1bV265D/+ mice exhibit non-ocular phenotypes, and if so, does nicotinamide confer protection in those tissues as well? Additionally, since nicotinamide dosing started at postnatal day 2, the authors should state how long the mice were given nicotinamide-containing water, after how many days or weeks IOP was reduced, and how long the reduction in IOP was sustained.

      (3) While the IOP reduction observed in NAM-treated Lmx1bV265D/+ mice appears statistically significant, it is unclear whether this reflects meaningful biological protection. Several untreated mice exhibit very high IOP values, which may skew the analysis. The authors should report the mean IOP values in both untreated and NAM-treated groups to clarify the magnitude and variability of the response.

      (4) Additionally, since NAM has been shown to protect RGCs directly in other glaucoma models, the authors should assess whether RGCs are preserved in NAM-treated Lmx1bV265D/+ mice. Demonstrating RGC protection would support a synergistic effect of NAM through both IOP reduction and direct neuroprotection, strengthening the translational relevance of the treatment.

      (5) Can the authors add other functional validation studies to understand the pathways enriched in each of the TM1, TM2, and TM3 subtypes, in addition to the IHC/IF/RNAscope validation?

      (6) The authors should include a representative image of the limbal dissection. While Figure S1 provides a schematic, mouse eyes are very small, and dissecting unfixed limbal tissue is technically challenging. It is also difficult to reconcile the claim that the majority of cells in the limbal region are TM and endothelium. As shown in Figure S6, DAPI staining suggests a much higher abundance of scleral cells compared to TM cells within the limbal strip. Additional clarification or visual evidence would help validate the dissection strategy and cellular composition of the captured region.

    1. eLife Assessment

      This is a valuable methodological contribution towards accurate characterization of viral genetic diversity using long-read sequencing and unique molecular identifiers (UMIs). However, the methods are currently incomplete and the sensitivity is not rigorously demonstrated. Addressing these gaps would strengthen the manuscript and make it a key addition to the field.

    2. Reviewer #1 (Public review):

      Tamao et al. aimed to quantify the diversity and mutation rate of influenza A virus (PR8 strain) in order to establish a high-resolution method for studying intra-host viral evolution. To achieve this, the authors combined RNA sequencing with single-molecule unique molecular identifiers (UMIs) to minimize errors introduced during technical processing. They proposed an in vitro infection model with a single viral particle to represent biological genetic diversity, alongside a control model using in vitro transcribed RNA for two viral genes, PB2 and HA.

      Through this approach, the authors demonstrated that UMIs reduced technical errors by approximately tenfold. By analyzing four viral populations and comparing them to in vitro transcribed RNA controls, they estimated that ~98.1% of observed mutations originated from viral replication rather than technical artifacts. Their results further showed that most mutations were synonymous and introduced randomly. However, the distribution of mutations suggested selective pressures that favored certain variants. Additionally, comparison with a closely related influenza strain (A/Alaska/1935) revealed two positively selected mutations, though these were absent in the strain responsible for the most recent pandemic (CA01).

      Overall, the study is well-designed, and the interpretations are strongly supported by the data. However, the following clarifications are recommended:

      (1) The methods section is overly brief. Even if techniques are cited, more experimental details should be included. For example, since the study focuses heavily on methodology, details such as the number of PCR cycles in RT-PCR or the rationale for choosing HA and PB2 as representative in vitro transcripts should be provided.

      (2) Information on library preparation and sequencing metrics should be included, for example, the total number of reads, any filtering steps, and the quality score distributions and cutoffs for the analyzed reads.

      (3) In the Results section (line 115, "Quantification of error rate caused by RT"), the mutation rate attributed to viral replication is calculated. However, in line 138, it is unclear whether the reported value reflects PB2, HA, or both, and whether the comparison is based on the error rate of the same viral RNA or the mean of multiple values (as shown in Figure 3A). Please clarify whether this number applies universally to all influenza RNAs or provide the observed range.

      (4) Since errors introduced by T7 RNA polymerase apply only to the in vitro transcription control, how were they accounted for when comparing mutation rates between the transcribed RNA and the cell-culture-derived virus?

      (5) Figure 2 shows that a UMI group size of 4 has an error rate of zero, but this group size is not mentioned in the text. Please clarify.
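      To make the group-size question concrete, the following is a minimal sketch of generic UMI consensus calling: reads sharing a UMI are grouped, groups below a minimum size are discarded, and a per-position majority vote yields one consensus sequence per molecule. The function name, the `min_group_size` threshold, and the assumption that reads are already aligned and of equal length are illustrative; this is not the authors' pipeline.

```python
from collections import Counter, defaultdict


def umi_consensus(reads, min_group_size=2):
    """Collapse reads that share a UMI into one consensus sequence per UMI.

    reads: iterable of (umi, sequence) tuples. Sequences within a group are
    assumed to be pre-aligned and of equal length (a simplifying assumption).
    Groups with fewer than min_group_size reads are discarded.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        groups[umi].append(seq)

    consensus = {}
    for umi, seqs in groups.items():
        if len(seqs) < min_group_size:
            continue  # too few reads to out-vote a PCR or sequencing error
        # Majority vote at each position; ties go to the base seen first.
        consensus[umi] = "".join(
            Counter(column).most_common(1)[0][0] for column in zip(*seqs)
        )
    return consensus


# Hypothetical reads: one UMI group with a single-read error, one singleton.
reads = [("AAA", "ACGT"), ("AAA", "ACGT"), ("AAA", "ACTT"), ("CCC", "GGGG")]
print(umi_consensus(reads, min_group_size=2))
```

      In this sketch, a larger minimum group size removes more residual errors but also discards more molecules, which is why the group-size distribution matters for the reported error rates.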

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript presents a technically oriented application of UMI-based long-read sequencing to study intra-host diversity in influenza virus populations. The authors aim to minimize sequencing artifacts and improve the detection of rare variants, proposing that this approach may inform predictive models of viral evolution. While the methodology appears robust and successfully reduces sequencing error rates, key experimental and analytical details are missing, and the biological insight is modest. The study includes only four samples, with no independent biological replicates or controls, which limits the generalizability of the findings. Claims related to rare variant detection and evolutionary selection are not fully supported by the data presented.

      Strengths:

      The study addresses an important technical challenge in viral genomics by implementing a UMI-based long-read sequencing approach to reduce amplification and sequencing errors. The methodological focus is well presented, and the work contributes to improving the resolution of low-frequency variant detection in complex viral populations.

      Weaknesses:

      The application of UMI-based error correction to viral population sequencing has been established in previous studies (e.g., in HIV), and this manuscript does not introduce a substantial methodological or conceptual advance beyond its use in the context of influenza.

      The study lacks independent biological replicates or additional viral systems that would strengthen the generalizability of the conclusions. Potential sources of technical error are not explored or explicitly controlled. Key methodological details are missing, including the number of PCR cycles, the input number of molecules, and UMI family size distributions. These are essential to support the claimed sensitivity of the method.

      The assertion that variants at ≥0.1% frequency can be reliably detected is based on total read count rather than the number of unique input molecules. Without information on UMI diversity and family sizes, the detection limit cannot be reliably assessed.
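      The point about unique molecules can be illustrated with a simple binomial calculation. This is a minimal sketch assuming that template molecules are sampled independently and that a variant is called once it appears in at least `k` distinct UMI groups; the function name, the threshold `k`, and the molecule counts are hypothetical. Raw read depth does not enter the calculation, because reads from the same molecule are not independent observations.

```python
from math import comb


def prob_at_least_k(n_molecules: int, freq: float, k: int = 1) -> float:
    """Probability that at least k of n_molecules independently sampled
    template molecules carry a variant present at frequency freq."""
    # P(X < k) = sum over i < k of C(n, i) * f^i * (1 - f)^(n - i)
    p_fewer = sum(
        comb(n_molecules, i) * freq**i * (1 - freq) ** (n_molecules - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Hypothetical numbers of unique molecules (UMI groups) for a 0.1% variant.
for n in (100, 1_000, 10_000):
    print(n, round(prob_at_least_k(n, 0.001), 3), round(prob_at_least_k(n, 0.001, k=3), 3))
```

      Under these assumptions, 1,000 unique molecules contain a 0.1% variant in only about one molecule on average, so it is observed at all with a probability of roughly 0.63; with 10,000 unique molecules, detection becomes near-certain.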

      Although genetic variation is described, the functional relevance of observed mutations in HA and NA is not addressed or discussed in the context of known antigenic or evolutionary features of influenza. The manuscript is largely focused on technical performance, with limited exploration of the biological implications or mechanistic insights into influenza virus evolution.

      The experimental scale is small, with only four viral populations derived from single particles analyzed. This limited sample size restricts the ability to draw broader conclusions about quasispecies dynamics or evolutionary pressures.

    1. eLife Assessment

      This study provides important insights into the role of phosphorylated ubiquitin (pUb) in neurodegenerative diseases, elucidating how pUb promotes neurodegeneration by affecting proteasomal function. The findings not only offer a new perspective on the pathophysiology of neurodegenerative diseases but also provide potential targets for developing new therapeutic strategies. The experiments in the revised submission provide solid evidence to support the conclusions.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript discusses the role of ubiquitin phosphorylated by PINK1 kinase (pUb) in neurodegenerative diseases. It reveals that elevated levels of pUb are observed in aged human brains and those affected by Parkinson's disease (PD), as well as in Alzheimer's disease (AD), aging, and ischemic injury. The study shows that increased pUb impairs proteasomal degradation, leading to protein aggregation and neurodegeneration. The authors also demonstrate that PINK1 knockout can mitigate protein aggregation in aging and ischemic mouse brains, as well as in cells treated with a proteasome inhibitor. While this study provides some interesting data, several important points should be addressed before further consideration.

      Strengths:

      (1) Reveals a novel pathological mechanism of neurodegeneration mediated by pUb, providing a new perspective on understanding neurodegenerative diseases.

      (2) The study covers not only a single disease model but also various neurodegenerative conditions such as Alzheimer's disease, aging, and ischemic injury, enhancing the breadth and applicability of the research findings.

      Comments on revisions:

      This study, through a systematic experimental design, reveals the crucial role of pUb in forming a positive feedback loop by inhibiting proteasome activity in neurodegenerative diseases. The data are comprehensive and highly innovative. However, some of the results are not entirely convincing, particularly the staining results in Figure 1.

      In Figure 1A, the density of DAPI staining differs markedly between the control patient and the AD patient, making it difficult to conclusively demonstrate a clear increase in PINK1 in AD patients. Quantitative analysis is needed. In Figure 1C, the PINK1 staining in the mouse brain appears non-specific.

    1. eLife Assessment

      This manuscript presents an in-depth analysis of gene expression across multiple brown algal species with differing life histories, providing convincing evidence for the conservation of life cycle-specific gene expression. While largely descriptive, the study is an important step forward in understanding the core cellular processes that differ between life cycle phases, and its findings will be of broad interest to developmental and evolutionary biologists.

    2. Reviewer #2 (Public review):

      Summary:

      The manuscript by Ratchinski et al presents a comprehensive analysis of developmental and life history gene expression patterns in brown algal species. The manuscript shows that the degree of generation bias or generation-specific gene expression correlates with the degree of dimorphism. It also reports conservation of life cycle features within generations and marked changes in gene expression patterns in Ectocarpus in the transition between gamete and early sporophyte. The manuscript also reports considerable conservation of gene expression modules between two representative species, particularly in genes associated with conserved functional characteristics.

      Strengths:

      The manuscript represents a considerable "tour de force" dataset and analytical effort. While the data presented is largely descriptive, it is likely to provide a very useful resource for studies of brown algal development and for comparative studies with other developmental and life cycle systems.

      Comments on revisions:

      The authors have provided in their response (point 1) a good clarification of their rationale for excluding fucoid algae from the study, based on the diploid nature of the fucoid life cycle. Similarly, they have noted (point 2) that "the relationship between changes in gene expression during very early sporophyte development and during alternation of life cycle generations could be investigated further using a highly dimorphic kelp model system such as Saccharina latissima." For the benefit of readers who may be less familiar with the different life cycles in brown algae, I would recommend including these clarifications in the Discussion.

      Otherwise the authors have addressed my previous comments adequately.

    1. eLife Assessment

      In this preregistered study, Kunkel and colleagues set out to compare the magnitude and duration of placebo versus nocebo effects in healthy volunteers, and also to examine the different factors contributing to these effects. The authors follow a rigorous methodology in a within-subjects design, taking into consideration standard conventions for manipulation of expectations, and using an appropriate sham condition. They present compelling evidence of long-lasting placebo and nocebo effects, with nocebo responses demonstrating consistently greater strength. These valuable results have the potential for a great impact in the field of experimental and clinical pain.

    2. Reviewer #1 (Public review):

      Summary:

      The study aimed to: (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning, (2) examine the persistence of these effects one week later, and (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      Weaknesses:

      There are a limited number of trials per test condition (10) which means that the trajectory of responses to the manipulation may not be explored, which would be an interesting future study.

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about the expectation effects here. A randomisation error meant that 25 participants received an unbalanced number of trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80), although the authors accounted for this during analysis, so it is not of major concern.

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Comments on revisions:

      I am satisfied with the authors' revisions to the manuscript and have no further comments.

    3. Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large, which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      Comments on revisions:

      The authors have addressed all of my concerns and comments. One point for them to verify is that analyses that were not preregistered are indeed flagged as such; the provided preregistration link does not specify much about the analysis plans and the specific tests used.

    4. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This manuscript presents a study on expectation manipulation to induce placebo and nocebo effects in healthy participants. The study follows standard placebo experiment conventions with the use of TENS stimulation as the placebo manipulation. The authors were able to achieve their aims. A key finding is that placebo and nocebo effects were predicted by recent experience, which is a novel contribution to the literature. The findings provide insights into the differences between placebo and nocebo effects and the potential moderators of these effects.

      Specifically, the study aimed to:

      (1) assess the magnitude of placebo and nocebo effects immediately after induction through verbal instructions and conditioning

      (2) examine the persistence of these effects one week later, and

      (3) identify predictors of sustained placebo and nocebo responses over time.

      Strengths:

      An innovation was to use sham TENS stimulation as the expectation manipulation. This expectation manipulation was reinforced not only by the change in pain stimulus intensity, but also by delivery of non-painful electrical stimulation, labelled as TENS stimulation.

      Questionnaire-based treatment expectation ratings were collected before conditioning and after conditioning, and after the test session, which provided an explicit measure of participants' expectations about the manipulation.

      The finding that placebo and nocebo effects are influenced by recent experience provides a novel insight into a potential moderator of individual placebo effects.

      We thank the reviewer for their thorough evaluation of our manuscript and for highlighting the novelty and originality of our study.

      Weaknesses:

      There are a limited number of trials per test condition (10), which means that the trajectory of responses to the manipulation may not be adequately explored.

      We appreciate the reviewer’s comment regarding the number of trials in the test phase. The trial number was chosen to ensure comparability with previous studies addressing similar research questions with similar designs (e.g. Colloca et al., 2010). Our primary objective was to directly compare placebo and nocebo effects within a within-subject design and to examine their persistence one week after the first test session. While we did not specifically aim to investigate the trajectory of responses within a single testing session, we fully agree that a comprehensive analysis of the trajectories of expectation effects on pain would be a valuable extension of our work. We have now acknowledged this limitation and future direction in the revised manuscript.

      The paragraph reads as follows: “It is important to note that our study was designed in alignment with previous studies addressing similar questions (e.g., Colloca et al., 2010). Our primary aim was to directly compare placebo and nocebo effects in a within-subject design and assess the persistence of these effects one week following the first test session. One limitation of our approach is the relatively short duration of each session, which may have limited our ability to examine the trajectory of responses within a single session. Future studies could address this limitation by increasing the number of trials for a more comprehensive analysis.”

      On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60, and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. There is a potential risk of revealing the manipulation to participants during the re-familiarization process, as they were not previously briefed to expect the painful stimulus intensity to vary without the application of sham TENS stimulation.

      We thank the reviewer for the opportunity to clarify this point. Participants were informed at the beginning of the experiment that we would use different stimulation intensities to re-familiarize them with the stimuli before the second test session. We are therefore confident that participants perceived this step as part of a recalibration rather than associating it with the experimental manipulation. We have added this information to the revised version of the manuscript.

      The paragraph now reads as follows: “On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60 and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation. Note that participants were informed that these pre-test stimuli were part of the recalibration and refamiliarization procedure conducted prior to the second test session.”

      The differences between the nocebo and control conditions in pain ratings during conditioning could be explained by the differing physiological effects of the different stimulus intensities, so it is difficult to make any claims about expectation effects here.

      We appreciate the reviewer’s comment and agree that, despite the careful calibration of the three pain stimuli, we cannot entirely rule out the possibility that temporal dynamics during the conditioning session were influenced by differential physiological effects of the varying stimulus intensities (e.g., intensity-dependent habituation or sensitization). We have addressed this in the revision of the manuscript, but we would like to emphasize that the stronger nocebo effects during the test phase are statistically controlled for any differences in the conditioning session.

      The paragraph now reads: “This asymmetry is noteworthy in and of itself because it occurred despite the equidistant stimulus calibration relative to the control condition prior to conditioning. It may be the result of different physiological effects of the stimuli over time or amplified learning in the nocebo condition, consistent with its heightened biological relevance, but it could also be a stronger effect of the verbal instructions in this condition.”

      A randomisation error meant that 25 participants received an unbalanced number of trials per condition (i.e., 10 x VAS 40, 14 x VAS 60, 12 x VAS 80).

      We agree that this is indeed unfortunate. However, we would like to point out that all analyses reported in the manuscript have been controlled for the VAS ratings in the conditioning session, i.e., potential effects of the conditioned placebo and nocebo stimuli. Moreover, we have now conducted additional analyses, presented here in our response to the reviewers, to demonstrate that this imbalance did not systematically bias the results. Importantly, the key findings observed during the test phase remain robust despite this issue.

      Specifically, when excluding these 25 participants from the analyses, the reported stronger nocebo compared to placebo effects in the test session on day 1 remain unchanged. Likewise, the comparison of placebo and nocebo effects between days 1 and 8 shows the same pattern when excluding the participants in question. The only exception is the interaction between effect (placebo vs nocebo) x session (day 1 vs day 8), which changed from borderline significant (p = .049) to non-significant (p = .24). However, post hoc tests continued to show the same pattern as originally reported: a significant reduction in the nocebo effect from day 1 to day 8 and no significant change in the placebo effect.

      Reviewer #2 (Public review):

      Summary:

      Kunkel et al aim to answer a fundamental question: Do placebo and nocebo effects differ in magnitude or longevity? To address this question, they used a powerful within-participants design, with a very large sample size (n=104), in which they compared placebo and nocebo effects - within the same individuals - across verbal expectations, conditioning, testing phase, and a 1-week follow-up. With elegant analyses, they establish that different mechanisms underlie the learning of placebo vs nocebo effects, with the latter being acquired faster and extinguished slower. This is an important finding for both the basic understanding of learning mechanisms in humans and for potential clinical applications to improve human health.

      Strengths:

      Beyond the above - the paper is well-written and very clear. It lays out nicely the need for the current investigation and what implications it holds. The design is elegant, and the analyses are rich, thoughtful, and interesting. The sample size is large, which is highly appreciated, considering the longitudinal, in-lab study design. The question is super important and well-investigated, and the entire manuscript is very thoughtful with analyses closely examining the underlying mechanisms of placebo versus nocebo effects.

      We thank the reviewer for their positive evaluation of our manuscript and for acknowledging the methodological rigor and the significant implications for clinical applications and the broader research field.

      Weaknesses:

      There were two highly addressable weaknesses in my opinion:

      (1) I could not find the preregistration - this is crucial to verify what analyses the authors have committed to prior to writing the manuscript. Please provide a link leading directly to the preregistration - searching for the specified number in the suggested website yielded no results.

      We thank the reviewer for pointing this out. We included a link to the preregistration in the revised manuscript. This study was pre-registered with the German Clinical Trial Register (registration number: DRKS00029228; https://drks.de/search/de/trial/DRKS00029228).

      (2) There is a recurring issue which is easy to address: because the Methods are located after the Results, many of the constructs used, analyses conducted, and even the main placebo and nocebo inductions are unclear, making it hard to appreciate the results in full. I recommend finding a way to detail at the beginning of the results section how placebo and nocebo effects have been induced. While my background means I am familiar with these methods, other readers will lack that knowledge. Even a short paragraph or a figure (like Figure 4) could help clarify the results substantially. For example, a significant portion of the results is devoted to the conditioning part of the experiment, while it is unknown which part was involved (e.g., were temperatures lowered/increased in all trials or only in the beginning).

      We thank the reviewer for their helpful comment and agree that the Results section requires additional information that would typically be provided by the Methods section if it directly followed the Introduction. In response, we have moved the former Figure 4 from the Methods section to the beginning of the Results section as a new Figure 1, to improve clarity. Further, we have revised the Methods section to explicitly state that all trials during the conditioning phase were manipulated in the same way.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the authors):

      (1) Given that the authors are claiming (correctly) that there is only limited work comparing placebo/nocebo effects, there are some papers missing from their citations:

      Nocebo responses are stronger than placebo responses after subliminal pain conditioning: Jensen, K., Kirsch, I., Odmalm, S., Kaptchuk, T. J. & Ingvar, M. Classical conditioning of analgesic and hyperalgesic pain responses without conscious awareness. Proc. Natl. Acad. Sci. USA 112, 7863-7 (2015)

      We thank the reviewer and have now included this relevant publication in the introduction of the revised manuscript.

      Hird, E.J., Charalambous, C., El-Deredy, W. et al. Boundary effects of expectation in human pain perception. Sci Rep 9, 9443 (2019). https://doi.org/10.1038/s41598-019-45811-x

      We thank the reviewer for suggesting this relevant publication. We have now included it in the discussion of the revised manuscript by adding the following paragraph:

      “Recent work using a predictive coding framework further suggests that nocebo effects may be less susceptible to prediction error than placebo effects (Hird et al., 2019), which could contribute to their greater persistence and strength in our study.”

      (2) The trial-by-trial pain ratings could have been usefully modelled with a computational model, such as a Bayesian model (this is especially pertinent given the reference to Bayesian processing in the discussion). A multilevel model could also be used to increase the power of the analysis. This is a tentative suggestion, as I appreciate it would require a significant investment of time and work - alternatively, the authors could acknowledge it in the Discussion as a useful future avenue for investigation, if this is preferred.

      We thank the reviewer for this thoughtful suggestion. While we agree that computational modelling approaches could provide valuable insights into individual learning, our study was not designed with this in mind and the relatively small number of trials per condition and the absence of trial-by-trial expectancy ratings limit the applicability of such models. We have therefore chosen not to pursue such analysis but highlight it in the discussion as a promising direction for future research.

      “Notably, the most recent experience was the most predictive in all three analyses; for instance, the placebo effect on day 8 was predicted by the placebo effect on day 1, not by the initial conditioning. This finding supports the Bayesian inference framework, where recent experiences are weighted more heavily in the process of model updating because they are more likely to reflect the current state of the environment, providing the most relevant and immediate information needed to guide future actions and predictions [24]. Interestingly, while a change in pain predicted subsequent nocebo effects, it seemed less influential than for placebo effects. This aligns with findings that longer conditioning enhanced placebo effects, while it did not affect nocebo responses [10], and with the conclusion that nocebo instruction may be sufficient to trigger nocebo responses. Using Bayesian modeling, future studies could identify individual differences in the development of placebo and nocebo effects by integrating prior experiences and sensory inputs, providing a probabilistic framework for understanding the underlying mechanisms.”

      (3) The paper is missing any justification of sample size, i.e. power analysis - please include this.

      We apologize for the missing information on our a priori power analysis. As there is a lack of prior studies investigating within-subjects comparisons of placebo and nocebo effects that could inform precise effect size estimates for our research question, we based our calculation on the ability to detect small effects. Specifically, the study was powered to detect effect sizes in the range of d = 0.2 - 0.25 with α = .05 and power = .9, yielding a required sample size of N = 83-129. We have now added this information to the methods section of the revised manuscript.

      (4) "On day 8, one stimulus per stimulation intensity (i.e., VAS 40, 60 and 80) was applied before the start of the test session to re-familiarise participants with the thermal stimulation."

      What were the instructions about this? Was it before the electrode was applied? This runs the risk of unblinding participants, as they only expect to feel changes in stimulus intensity due to the TENS stimulation.

      We thank the reviewer for pointing out the potential risk of unblinding participants due to the re-familiarization process prior to the second test session. We would like to clarify that we followed specific procedures to prevent participants from associating this process with the experimental manipulation. The re-familiarisation with the thermal stimuli was conducted after the electrode had been applied and re-tested to ensure that both stimulus modalities were re-introduced in a consistent and neutral context. Participants were explicitly informed that both procedures were standard checks prior to the actual test session (“We will check both once again before we begin the actual measurement.”). For the thermal stimuli, we informed participants that they would experience three different intensities to allow the skin to acclimate (e.g., “...we will test the heat stimuli in 3 trials with different temperatures, allowing your skin to acclimate to the stimuli. …”), without implying any connection to the experimental conditions.

      Importantly, this re-familiarization procedure mirrored what participants had already experienced during the initial calibration session on day 1. We therefore assume that participants interpreted it as a routine technical step rather than part of the experimental manipulation. We have now clarified this procedure in the methods section of the revised manuscript.

      (5) "For a comparison of pain intensity ratings between time-points, an ANOVA with the within-subject factors Condition (placebo, nocebo, control) and Session (day 1, day 8) was carried out. For the comparison of placebo and nocebo effects between the two test days, an ANOVA with the with-subject factors Effect (placebo effect, nocebo effect) and Session (day 1, day 8) was used."

      It seems that one ANOVA is looking at raw pain scores and one is looking at difference scores, but this is a bit confusing - please rephrase/clarify this, and explain why it is useful to include both.

      We thank the reviewer for highlighting this point. Our primary analyses focus on placebo and nocebo effects, which we define as the difference in pain intensity ratings between the control and the placebo condition (placebo effect) and the nocebo and the control condition (nocebo effect), respectively.

      To examine whether condition effects were present at each time-point, we first conducted two separate repeated measures ANOVAs - one for day 1 and one for day 8 - with the within-subject factor CONDITION (placebo, nocebo, control).

      To compare the magnitude and persistence of placebo and nocebo effects over time, we then calculated the above-mentioned difference scores and submitted these to a second ANOVA with within-subject factors EFFECT (placebo vs. nocebo effect) and SESSION (day 1 vs. day 8). We have now clarified this approach on page 19 of the revised manuscript. To avoid confusion, the Condition x Session ANOVA has been removed from the manuscript.
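      The two analyses use different inputs. The CONDITION ANOVAs use raw ratings, while the EFFECT x SESSION ANOVA uses difference scores computed for each participant and session from the condition means. The minimal sketch below shows that computation using the sign conventions stated above. The function name and the ratings are hypothetical and are not the study's analysis code or data.

```python
from statistics import mean

def effect_scores(placebo, nocebo, control):
    """Return (placebo_effect, nocebo_effect) for one participant in one session.

    Each argument is a list of VAS pain ratings (0-100) for one condition.
    Both effects are signed so that a positive value indicates the expected effect:
      placebo effect = control - placebo  (pain relief)
      nocebo effect  = nocebo - control   (pain worsening)
    """
    control_mean = mean(control)
    return control_mean - mean(placebo), mean(nocebo) - control_mean

# Hypothetical ratings for one participant on one test day
# (the study used 10 trials per condition; 3 are shown for brevity).
placebo_eff, nocebo_eff = effect_scores([40, 42, 38], [70, 72, 68], [55, 57, 53])
print(placebo_eff, nocebo_eff)  # 15 15
```

      Each participant contributes two such scores per session. These scores, rather than the raw ratings, form the EFFECT (placebo, nocebo) x SESSION (day 1, day 8) design.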

      (6) Please can the authors provide a figure illustrating trial-by-trial ratings during test trials as well as during conditioning trials?

      In response to the reviewer’s point, we now provide the trial-by-trial ratings of the test phases on days 1 and 8 as an additional figure in the Supplement (Figure S1) and would like to clarify that trial-by-trial pain intensity ratings of the conditioning phase are displayed in Figure 2C of the manuscript.

      (7) "Separate multiple linear regression analyses were performed to examine the influence of expectations (GEEE ratings) and experienced effects (VAS ratings) on subsequent placebo and nocebo effects. For day 1, the placebo effect was entered as the dependent variable and the following variables as potential predictors: (i) expected improvement with placebo before conditioning, (ii) placebo effect during conditioning and (iii) the expected improvement with placebo before the test session at day 1"

      The term "placebo effect during conditioning" is a bit confusing - I believe this is just the effect of varying stimulus intensities - please could the authors be more explicit on the terminology they use to describe this? NB changes in pain rating during the conditioning trials do not count as a placebo/nocebo effect, as most of the change in rating will reflect differences in stimulation intensity.

      We agree with the reviewer that the cited paragraph refers to the actual application of lower or higher pain stimuli during the conditioning session, rather than to genuinely induced placebo or nocebo effects. We thank the reviewer for this helpful observation and have revised the terminology accordingly, now referring to these as “pain relief during conditioning” and “pain worsening during conditioning”.

      (8) Supplementary materials: "The three temperature levels were perceived as significantly different (VAS ratings; placebo condition: M= 32.90, SD= 16.17; nocebo condition: M= 56.62, SD= 17.09; control condition: M= 80.84, SD= 12.18)"

      This suggests that the VAS rating for the control condition was higher than for the nocebo condition. Please could the authors clarify/correct this?

      We thank the reviewer for spotting this error. The values for the control and the nocebo condition had accidentally been swapped. This has now been corrected in the manuscript: control condition: M= 56.62, SD= 17.09; nocebo condition: M= 80.84, SD= 12.18.

      (9) "To predict placebo responses a week later (VAScontrol - VASplacebo at day 8), the same independent variables were entered as for day 1 but with the following additional variables (i) the placebo effect at day 1 and (ii) the expected improvement with placebo before the test session at day 8."

      Here it would be much clearer to say 'pain ratings during test trials at day 1".

      We agree with the reviewer and have revised the manuscript as suggested.

      (10) For completeness, please present the pain intensity ratings during conditioning as well as calibration/test trials in the figure.

      Please see our answer to comment (6).

      (11) In Figure 1a, it looks like some participants had rated the control condition as zero by day 8. If so, it's inappropriate to include these participants in the analysis if they are not responding to the stimulus. Were these the participants who were excluded due to pain insensitivity?

      On day 8, the lowest pain intensity ratings observed were VAS 3 in the placebo condition and VAS 2 in the control condition, both from the same participant. All other participants reported minimum values of VAS 11 or higher (all on a scale from 0-100). Thus, no participant provided a pain rating of VAS 0, and all ratings indicated some level of pain perception in response to the stimulus. We did not define an exclusion criterion based on day 8 pain ratings in our preregistration, and we did not observe any technical issues with the stimulation procedure. To avoid post-hoc exclusions and maintain consistency with our preregistered analysis plan, we therefore decided to include all participants in the analysis.

      (12) "Comparison of day 1 and day 8. A direct comparison of placebo and nocebo effects on day 1 and day 8 pain intensity ratings showed a main effect of Effect with a stronger nocebo effect (F(1,97)= 53.93, p< .001, η²= .36) but no main effect of Day (F(1,97)= 2.94, p= .089, η² = .029). The significant Effect x Session interaction indicated that the placebo effect and the nocebo effect developed differently over time (F(1,97)= 3.98, p= .049, η² = .039)"

      This is confusing as it talks about a main effect of "day" and then interaction with "session" - are they two different models? The authors need to clarify.

      We thank the reviewer for pointing this out. In our analysis, “Session” is the correct term for the experimental factor, which has two factor levels, “day 1” and “day 8”. This has now been corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the authors):

      (1) More information on how "size of the effect" in Figures 1b and 2b was calculated is needed; this can be in the legend. If these are differences between control and each condition, then they were reversed for one condition (nocebo?), which is ok - but this should be clearly explained.

      We agree with the reviewer and have now revised the figure legends to improve clarity. The legends now read:

      1b: “Figure 1. Pain intensity ratings and placebo and nocebo effects during calibration and test sessions. (A) Mean pain intensity ratings in the placebo, nocebo and control condition during calibration, and during the test sessions at day 1 and day 8. (B) Placebo effect (control condition - placebo condition, i.e., positive value of difference) and nocebo effect (nocebo condition - control condition, i.e., positive value of difference) on day 1 and day 8. Error bars indicate the standard error of the mean, circles indicate mean ratings of individual participants. ***: p < .001, **: p < .01, n.s.: non-significant.”

      2b: “Figure 2. Mean and trial-by-trial pain intensity ratings, placebo and nocebo effects during conditioning. (A) Mean pain intensity ratings of the placebo, nocebo and control condition during conditioning. (B) Placebo effect (control condition - placebo condition, i.e., positive value of difference) and nocebo effect (nocebo condition - control condition, i.e., positive value of difference) during conditioning. (C) Trial-by-trial pain intensity ratings (with confidence intervals) during conditioning. Error bars indicate the standard error of the mean, circles indicate mean ratings of individual participants. ***: p < .001.”

      (2) In the methods, I was missing a clear understanding of how many trials there were in the conditioning phase, and then how many in the other testing phases. Also, how long did the experiment last in total?

      We apologize that the exact number of trials in the testing phases was not clear in the original manuscript. We now indicate on page 18 of the revised manuscript that we used 10 trials per condition in the test sessions. We have also added information on the duration of each test day (i.e., three hours on day 1 and one hour on day 8) on page 15.

      (3) In expectancy ratings, line 186 - are improvement and worsening expectations different from expected pain relief? It is implied that these are two different constructs - it would be helpful to clarify that.

      We agree that this is indeed confusing and would like to clarify that both refer to the same construct. We used the Generic rating scale for previous treatment experiences, treatment expectations, and treatment effects (GEEE questionnaire, Rief et al. 2021) that discriminates between expected symptom improvement, expected symptom worsening, and expected side effects due to a treatment. We now use the terms “expected pain relief” and “expected pain worsening” throughout the whole manuscript.

      (4) In the last section of the Results, somatosensory amplification comes out of nowhere - and could be better introduced (see point 2 above).

      We agree with the reviewer that introducing the concept of somatosensory amplification and its potential link to placebo/nocebo effects only in the Methods is unhelpful, given that this section appears at the end of the manuscript. We therefore now introduce the relevant publication (Doering et al., 2015) before reporting our findings on this concept.

      (5) In line 169, if the authors want to specify what portion of the variance was explained by expectancy, they could conduct a hierarchical regression, where they first look at R2 without the expectancy entered, and only then enter it to obtain the R2 change.

      We fully agree that hierarchical regression can be a useful approach for isolating the contribution of variables. However, in our case, expectancy was assessed at different time points (e.g., before conditioning and before the test session on day 1), and there was no principled rationale for determining the order in which these different expectancy-related variables should be entered into a hierarchical model.

      That said, in response to the reviewer’s suggestion, we have now conducted hierarchical regression analyses in which all expectancy-related variables were entered together as a single block. These analyses largely confirmed the findings reported so far, and the results are provided below. Given the exploratory nature of this grouping and the lack of an a priori hierarchy, we feel that the standard multiple regression models remain the most appropriate for addressing our research question because they allow us to evaluate the total contribution of expectancy-related predictors while also examining the individual contribution of each variable within the block. We would therefore prefer to retain these as the primary analyses in the manuscript.

      Results of the hierarchical regression analyses:

      Day 1 - Placebo response: In step 1, we entered the difference in pain intensity ratings between the control and the placebo condition during conditioning as a predictor. In step 2, we added the two variables reflecting expectations (i.e., expected improvement with placebo (i) before conditioning and (ii) before the test session on day 1). This allowed us to assess whether expectation-related variables explained additional variance beyond the effect of conditioning.

      The overall regression model at step 1 was significant, F(1, 102) = 13.42, p < .001, explaining 11.6% of the variance in the dependent variable (R² = .116). Adding the expectancy-related predictors in step 2 did not lead to a significant increase in explained variance, ΔR² = .007, F(2, 100) = 0.384, p = .682. Thus, the conditioning response significantly predicted placebo-related pain reduction on day 1, but additional information on expectations did not account for further variance.

      Day 1 - Nocebo response: The equivalent analysis was run for the nocebo response on day 1. In step 1, the pain intensity difference between the nocebo and the control condition was entered as a predictor before adding the two expectancy ratings (i.e., expected worsening with nocebo (i) before conditioning and (ii) before the test session on day 1).

      In step 1, the regression model was not statistically significant, F(1, 102) = 2.63, p = .108, and explained only 2.5% of the variance in the nocebo response (R² = .025). Adding the expectation-related predictors in step 2 slightly increased the explained variance by ΔR² = .027, but this change was also non-significant, F(2, 100) = 1.41, p = .250. The overall variance explained by the full model remained low (R² = .052). These results suggest that neither conditioning nor expectation-related variables reliably predicted nocebo-related pain increases on day 1.

      Day 8 - Placebo response: For the prediction of the placebo effect on day 8, the following variables reflecting perceived effects were entered as predictors in step 1: the difference in pain intensity ratings between the control and the placebo condition (i) during conditioning and (ii) on day 1. In step 2, the variables reflecting expectations were added: the expected improvement with placebo (i) before conditioning, (ii) before the test session on day 1 and (iii) before the test session on day 8.

      In step 1, the model was statistically significant, F(2, 95) = 14.86, p < .001, explaining 23.8% of the variance in the placebo response (R² = .238, Adjusted R² = .222). In step 2, the addition of the expectation-related predictors resulted in a non-significant improvement in model fit, ΔR² = .051, F(3, 92) = 2.21, p = .092. The overall variance explained by the full model increased modestly to 29.0%.

      Day 8 - Nocebo response: For the equivalent analyses of nocebo responses on day 8, the following variables were included in step 1: the difference in pain intensity ratings between the nocebo and the control condition (i) during conditioning and (ii) on day 1. In step 2, we entered the variables reflecting nocebo expectations including expected worsening with nocebo (i) before conditioning, (ii) before the test session on day 1 and (iii) before the test session on day 8. In step 1, the model significantly predicted the day 8 nocebo response, F(2, 95) = 6.04, p = .003, accounting for 11.3% of the variance (R² = .113, Adjusted R² = .094). However, the addition of expectation-related predictors in step 2 resulted in only a negligible and non-significant improvement, ΔR² = .006, F(3, 92) = 0.215, p = .886. The full model explained just 11.9% of the variance (R² = .119).
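      The step-2 tests above compare nested models using the change in R². The minimal sketch below shows the standard F-change computation and assumes ordinary least squares models with an intercept. The function name is hypothetical. The example plugs in the rounded day-1 placebo values reported above. N = 104 is inferred from the residual degrees of freedom (102) with one predictor.

```python
def f_change(r2_reduced, r2_full, k_reduced, k_full, n):
    """F statistic for the increase in R^2 when predictors are added to a nested OLS model.

    r2_reduced, r2_full : R^2 of the step-1 and step-2 models
    k_reduced, k_full   : number of predictors (excluding the intercept) in each model
    n                   : number of participants
    Returns (F, df1, df2).
    """
    df1 = k_full - k_reduced      # predictors added in step 2
    df2 = n - k_full - 1          # residual degrees of freedom of the full model
    delta_r2 = r2_full - r2_reduced
    f = (delta_r2 / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2

# Day 1 placebo: step 1 has 1 predictor (R^2 = .116); step 2 adds 2 expectancy
# predictors (delta R^2 = .007, so full-model R^2 = .123).
f, df1, df2 = f_change(0.116, 0.123, 1, 3, 104)
print(f"F({df1}, {df2}) = {f:.2f}")
```

      With these rounded inputs, the sketch gives F(2, 100) ≈ 0.40 rather than the reported 0.384. The small gap is presumably due to the R² values being rounded to three decimals.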

      Typos:

      (6) Abstract - 104 heathy xxx (word missing).

      (7) Line 61 - reduce or decrease - I think you meant increase.

      Thank you, we have now corrected both sentences.

      References

      Colloca L, Petrovic P, Wager TD, Ingvar M, Benedetti F. How the number of learning trials affects placebo and nocebo responses. Pain. 2010

      Doering BK, Nestoriuc Y, Barsky AJ, Glaesmer H, Brähler E, Rief W. Is somatosensory amplification a risk factor for an increased report of side effects? Reference data from the German general population. J Psychosom Res. 2015

    1. eLife Assessment

      This work describes a highly complex automated algorithm for analyzing vascular imaging data from two-photon microscopy. This tool has the potential to be extremely valuable to the field and to fill gaps in knowledge of hemodynamic activity across a regional network. The solid biological application provides a demonstration of their pipeline's capabilities and suggests intriguing hypotheses around prolonged vascular tone changes, but will need to be followed up by further experiments to be conclusively demonstrated.

    2. Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors describe a new pipeline to measure changes in vasculature diameter upon optogenetic stimulation of neurons.

      The work is interesting and the topic is quite relevant to better understand the hemodynamic response on the graph/network level.

      Strengths:

      The manuscript provides a pipeline that allows for the detection of changes in the vessel diameter as well as simultaneously allowing for the location of the neurons driven by stimulation.

      The resulting data could provide interesting insights into the graph-level mechanisms of regulating activity-dependent blood flow.

      The interesting findings include that vessel radius changes depend on depth from the cortical surface and that dilations on average happen closer to the activated neurons.

    3. Reviewer #2 (Public Review):

      Summary:

      The authors develop a highly detailed pipeline to analyze hemodynamic signals from in vivo two-photon fluorescence microscopy. This includes motion correction, segmentation of the vascular network, diameter measurements across time, mapping neuronal position relative to the vascular network, and analyzing vascular network properties (interactions between different vascular segments). For the segmentation, the authors use a Convolutional Neural Network to identify vessel (or neural) and background pixels and train it using ground truth images based on semi-automated mapping followed by human correction/annotation. Considerable processing was done on the segmented images to improve accuracy, extract vessel center lines, and compute frame-by-frame diameters. The model was tested with artificial diameter increases and Gaussian noise and proved robust to these manipulations.

      Network-level properties include Assortativity - a measure of how similar a vessel's response is to nearby vessels - and Efficiency - the ease of flow through the network (essentially, the combined resistance of a path based on diameter and vessel length between two points).
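      The efficiency measure summarized above can be illustrated with a minimal sketch. The sketch assumes Poiseuille flow, so each segment's hydraulic resistance scales as length / radius⁴ (constants dropped). It also assumes efficiency is computed like graph-theoretic global efficiency, that is, as the mean over node pairs of 1 / (lowest summed resistance between them). This is not necessarily the authors' exact definition. The function names and the toy network are hypothetical.

```python
import heapq

def segment_resistance(length, radius):
    # Poiseuille: R = 8 * mu * L / (pi * r^4); constant factors omitted (relative units)
    return length / radius ** 4

def min_resistance(graph, source):
    """Dijkstra: lowest summed (series) resistance from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, r in graph[node]:
            nd = d + r
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist

def global_efficiency(edges):
    """edges: iterable of (node_a, node_b, length, radius) for undirected vessel segments."""
    graph = {}
    for a, b, length, radius in edges:
        r = segment_resistance(length, radius)
        graph.setdefault(a, []).append((b, r))
        graph.setdefault(b, []).append((a, r))
    nodes = list(graph)
    if len(nodes) < 2:
        return 0.0
    total = 0.0
    for i, u in enumerate(nodes):
        dist = min_resistance(graph, u)
        for v in nodes[i + 1:]:
            d = dist.get(v)
            total += 1.0 / d if d is not None else 0.0  # disconnected pairs contribute 0
    n_pairs = len(nodes) * (len(nodes) - 1) / 2
    return total / n_pairs

# Toy network A-B-C; dilating segment A-B (radius 1 -> 2) raises efficiency.
print(global_efficiency([("A", "B", 1, 1), ("B", "C", 1, 1)]))
print(global_efficiency([("A", "B", 1, 2), ("B", "C", 1, 1)]))
```

      Because resistance falls with the fourth power of radius, even modest dilations raise efficiency sharply. The sketch counts only the single lowest-resistance path between two nodes and ignores parallel paths. A full hydraulic model would instead solve for flows across the whole network.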

      Strengths:

      This is a very powerful tool for cerebral vascular biologists as many of these tasks are labor intensive, prone to subjectivity, and often not performed due to the complexity of collecting and managing volumes of vascular signals. Modelling is not my specialty so I cannot speak too specifically, but the model appears to be well-designed and robust to perturbations. It has many clever features for processing the data.

      The authors rightly point out that there is a real lack in the field of knowledge of vascular network activity at single-vessel resolution. Network anatomy has been studied, but hemodynamics are typically studied either with coarse resolution or in only one or a few vessels at a time. This pipeline has the potential to change that.

      [Editors' note: this work has been through three rounds of revisions, and most recently the authors have added caveats to the discussion. This version of the paper has been assessed by the editors and the weaknesses identified previously remain with earlier versions of the work.]

    4. Author response:

      The following is the authors’ response to the previous reviews

      Reviewer #1 (Public review):

      Summary:

      In the manuscript the authors describe a new pipeline to measure changes in vasculature diameter upon optogenetic stimulation of neurons. The work is useful to better understand the hemodynamic response on a network/graph level.

      Strengths:

      The manuscript provides a pipeline that allows the detection of changes in vessel diameter while simultaneously locating the neurons driven by stimulation.

      The resulting data could provide interesting insights into the graph-level mechanisms regulating activity-dependent blood flow.

      Weaknesses:

      (1) The manuscript contains (new) wrong statements and (still) wrong mathematical formulas.

      The symbols in these formulas have been updated to disambiguate them, and the accompanying statements have been adjusted for clarity.

      (2) The manuscript does not compare results to existing pipelines for vasculature segmentation (open-source or commercial). Comparing the performance of the pipeline to a random forest classifier (ilastik) on images that are not preprocessed (i.e., corrected for background etc.) does not seem a particularly useful comparison.

      We have now included comparisons with commercial software (Imaris) for segmentation and with open-source software (VesselVio) for graph extraction.

      For the ilastik comparison, the images were preprocessed prior to ilastik segmentation, specifically by doing intensity normalization.

      Example segmentations utilizing Imaris have now been included. Imaris leaves gaps and discontinuities in the segmentation masks, as shown in Supplementary Figure 10. The Imaris segmentation masks also tend to be more circular in cross-section despite irregularities on the surface of the vessels observable in the raw data and identified in manual segmentation. This approach also requires days to months per image stack.

      A comparison to VesselVio has now also been generated, and results are visualized in Supplementary Figure 11. VesselVio generates individual graphs for each time point, resulting in potential discrepancies in the structure of the graphs from different time points. Furthermore, VesselVio uses a distance transformation to estimate the vascular radius, which renders the radius estimates highly susceptible to the user-selected method used to obtain the segmentation. Our approach instead uses intensity gradient-based boundary detection from centerlines in the image, mitigating this bias. We have added the following paragraph to the Discussion section on the comparisons with the two methods:

      “Comparison with commercial and open-source vascular analysis pipelines

      To compare our results with those achievable on these data with other pipelines for segmentation and graph network extraction, we compared segmentation results qualitatively with Imaris version 9.2.1 (Bitplane) and vascular graph extraction with VesselVio [1]. For the Imaris comparison, three small volumes were annotated by hand to label vessels. Example slices of the segmentation results are shown in Supplementary Figure 10. Imaris tended to either over- or under-segment vessels, disregard fine details of the vascular boundaries, and produce jagged edges in the vascular segmentation masks. In addition to these issues with segmentation mask quality, manual segmentation of a single volume took a rater days to annotate. To compare to VesselVio, binary segmentation masks (one before and one after photostimulation) generated with our deep learning models were loaded into VesselVio for graph extraction, as VesselVio does not have its own method for generating segmentation masks. This also facilitates a direct comparison of our graph extraction pipeline with VesselVio. Visualizations of the two graphs are shown in Supplementary Figure 11. VesselVio produced many hairs at both time points, and the total number of segments varied considerably between the two sequential stacks: while the baseline scan resulted in 546 vessel segments, the second scan had 642 vessel segments. These discrepancies are difficult to resolve in post-processing and preclude a direct comparison of individual vessel segments across time. As the segmentation masks we used in graph extraction derive from the union of multiple time points, we could better trace the vasculature and identify more connections in our extracted graph.
Furthermore, VesselVio relies on the distance transform of the user-supplied segmentation mask to estimate vascular radii; consequently, these estimates are highly susceptible to variations in the input segmentation masks. We repeatedly saw slight variations between boundary placements of all of the models we utilized (ilastik, UNet, and UNETR) and those produced by raters. Our pipeline mitigates this segmentation method bias by using intensity gradient-based boundary detection from centerlines in the image (as opposed to using the distance transform of the segmentation mask, as in VesselVio).”

      (3) The manuscript does not clearly visualize performance of the segmentation pipeline (e.g. via 2d sections, highlighting also errors etc.). Thus, it is unclear how good the pipeline is, under what conditions it fails or what kind of errors to expect.

      Following the reviewer’s comment, 2D slices have been added to Supplementary Figure 4.

      (4) The pipeline is not fully open-source due to the use of MATLAB. Also, the pipeline code was not made available during review, contrary to the authors' claims (the provided link did not lead to a repository). Thus, the utility of the pipeline was difficult to judge.

      All code has been uploaded to Github and is available at the following location: https://github.com/AICONSlab/novas3d

      The Matlab code for skeletonization is better at preserving centerline integrity during the pruning of hairs from centerlines than the currently available open-source methods.

      - Generalizability: The authors addressed the point of generalizability by applying the pipeline to other data sets. This demonstrates that their pipeline can be applied to other data sets and makes it more useful. However, the visualizations make it difficult to judge the performance of the pipeline, for example where the pipeline fails. The 3D visualizations are not particularly helpful in this respect. In addition, the Dice measure seems quite low, indicating that roughly 20-40% of voxels do not overlap between the inferred and ground-truth segmentations. I did not notice this high discrepancy earlier. In my view, a thorough discussion of the errors appearing in the segmentation pipeline would be necessary to better assess its quality.

      2D slices from the additional datasets have been added to Supplementary Figure 13 to help visualize the models’ ability to generalize to other datasets.

      The Dice range we report (0.7-0.8) compares well with the range (0.56-0.86) reported for 3D segmentations of large microscopy datasets [2], [3], [4], [5], [6]. Furthermore, we had two additional raters segment three images from the original training set and found that the raters had a mean inter-class correlation of 0.73 [7]. Our model matched or exceeded this baseline on unseen data: Dice scores from our generalizability tests on C57 mice and Fischer rats were on par with or higher than this baseline.
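      The minimal sketch below shows how the Dice coefficient discussed above is computed. It works on flattened binary masks, and the function name is hypothetical. In practice, the masks would be 3D volumes flattened in the same voxel order.

```python
def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal size.

    Masks are flat sequences of 0/1 (or booleans), e.g. a flattened 3D volume.
    Dice = 2 * |A intersect B| / (|A| + |B|); 1.0 means perfect overlap.
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    a = [bool(x) for x in mask_a]
    b = [bool(x) for x in mask_b]
    intersection = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2.0 * intersection / total  # two empty masks agree

# Two masks with 10 foreground voxels each, 8 of which coincide
predicted = [1] * 10 + [0] * 10
reference = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
print(dice(predicted, reference))
```

      In this example, Dice = 2 × 8 / (10 + 10) = 0.8. For equally sized masks, a Dice of 0.8 therefore means that 80% of each mask's voxels are shared.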

      Reviewer #2 (Public review):

      The authors have addressed most of my concerns sufficiently. There are still a few serious concerns I have. Primarily, the temporal resolution of the technique still makes me dubious about nearly all of the biological results. It is good that the authors have added some vessel diameter time courses generated by their model. But I still maintain that data sampling every 42 seconds - or even 21 seconds - is problematic. First, the evidence for long vascular responses is lacking. The authors cite several papers:

      Alarcon-Martinez et al. 2020 show and explicitly state that their responses (stimulus-evoked) returned to baseline within 30 seconds. The responses to ischemia are long lasting but this is irrelevant to the current study using activated local neurons to drive vessel signals.

      Mester et al. 2019 show responses that all seem to return to baseline by around 50 seconds post-stimulus.

      In Mester et al. 2019, diffuse stimulations with blue light showed a return to baseline around 50 seconds post-stimulus (cf. Figures 1E, 2C, and 2D). However, focal stimulations, where the stimulation light is raster scanned over a small region focused in the field of view, show longer-lasting responses (cf. Figure 4) that have not returned to baseline by 70 seconds post-stimulus [8]. Alarcon-Martinez et al. do report that their responses return to baseline within 30 seconds; however, their physiological stimulation may lead to different neuronal and vessel response kinetics than the optogenetic stimulation used in the current work.

      O'Herron et al. 2022 and Hartmann et al. 2021 use opsins expressed in vessel walls (not neurons as in the current study) and directly constrict vessels with light. So this is unrelated to neuronal activity-induced vascular signals in the current study.

      We agree that optogenetic activation of vessel-associated cells is distinct from optogenetic activation of neurons, but we do expect the effects of such perturbations on the vasculature to have some commonalities.

      There are other papers including Vazquez et al 2014 (PMID: 23761666) and Uhlirova et al 2016 (PMID: 27244241) and many others showing optogenetically-evoked neural activity drives vascular responses that return back to baseline within 30 seconds. The stimulation time and the cell types labeled may be different across these studies which can make a difference. But vascular responses lasting 300 seconds or more after a stimulus of a few seconds are just not common in the literature and so are very suspect - likely at least in part due to the limitations of the algorithm.

      Vazquez et al. 2014 used diffuse photostimulation with a fiberoptic probe, similar to the diffuse stimulation in Mester et al. 2019. In contrast, this study and the focal experiments of Mester et al. 2019 used raster-scanned focal stimulation, and in those focal experiments we observed vascular responses lasting longer than a minute. Uhlirova et al. 2016 used photostimulation powers between 0.7 and 2.8 mW, likely lower than our 4.3 mW/mm² photostimulation. Furthermore, even with focal photostimulation, we see that the duration of the vascular responses depends on light intensity. Indeed, in Supplementary Figure 2, 1.1 mW/mm² photostimulation leads to briefer dilations/constrictions than does 4.3 mW/mm². The durations of the 1.1 mW/mm² responses are in line with those in Uhlirova et al. 2016.

      Critically, as per Supplementary Figure 2, the analysis of the experimental recordings acquired at 3-second temporal resolution did likewise show responses in many vessels lasting for tens of seconds and even hundreds of seconds in some vessels.

      Another major issue is that the time courses provided show that the same vessel constricts at certain points and dilates later. So where in the time course the data is sampled will have a major effect on the direction and amplitude of the vascular response. In fact, I could not find how the "response" window is calculated. Is it from the first volume collected after the stimulation - or an average of some number of volumes? But clearly down-sampling the provided data to 42 or even 21 second sampling will lead to problems. If the major benefit to the field is the full volume over large regions that the model can capture and describe, there needs to be a better way to capture the vessel diameter in a meaningful way.

      In the main experiment (i.e. excluding the additional experiments presented in the Supplementary Figure 2 that were collected over a limited FOV at 3s per stack), we have collected one stack every 42 seconds. The first slice of the volume starts following the photostimulation, and the last slice finishes at 42 seconds. Each slice takes ~0.44 seconds to acquire. The data analysis pipeline (as demonstrated by the Supplementary Figure 2) is not in any way limited to data acquired at this temporal resolution and - provided reasonable signal-to-noise ratio (cf. Figure 5) - is applicable, as is, to data acquired at much higher sampling rates.
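      The sketch below makes the depth-versus-timing question concrete by listing when each slice of one stack is sampled after photostimulation. It uses the figures given above: about 0.44 s per slice, and a 42 s stack that starts right after stimulation. It assumes that slices are acquired back to back in one direction, so that a 42 s stack holds about 95 slices. The slice count, the mid-slice timestamp, and the function name are illustrative assumptions.

```python
def slice_times(stack_duration_s=42.0, slice_duration_s=0.44, top_to_bottom=True):
    """Post-stimulus sampling time (s) of each slice, indexed from the cortical surface.

    Assumes slices are acquired back to back starting right after photostimulation,
    and timestamps each slice at the middle of its acquisition window.
    """
    n_slices = int(stack_duration_s // slice_duration_s)  # about 95 slices per stack
    times = [(k + 0.5) * slice_duration_s for k in range(n_slices)]
    # If the stack is acquired bottom-to-top, the deepest plane is sampled first.
    return times if top_to_bottom else times[::-1]

t = slice_times()
print(len(t), round(t[0], 2), round(t[-1], 2))
```

      Under these assumptions, the first and last planes of a stack are sampled about 41 s apart (0.22 s vs 41.58 s). Reversing the acquisition direction reverses which depths are sampled early.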

      It still seems possible that if responses are bi-phasic, then depth dependencies of constrictors vs dilators may just be due to where in the response the data are being captured - maybe the constriction phase is captured in deeper planes of the volume and the dilation phase more superficially. This may also explain why nearly a third of vessels are not consistent across trials - if the direction the volume was acquired is different across trials, different phases of the response might be captured.

      Alternatively, like neuronal responses to physiological stimuli, the vascular responses elicited by increases in neuronal activity may themselves be variable in both space and time.

      I still have concerns about other aspects of the responses but these are less strong. Particularly, these bi-phasic responses are not something typically seen and I still maintain that constrictions are not common. The authors are right that some papers do show constriction. Leaving out the direct optogenetic constriction of vessels (O'Herron 2022 & Hartmann 2021), the Alarcon-Martinez et al. 2020 paper and others such as Gonzales et al 2020 (PMID: 33051294) show different capillary branches dilating and constricting. However, these are typically found either with spontaneous fluctuations or due to highly localized application of vasoactive compounds. I am not familiar with data showing activation of a large region of tissue - as in the current study - coupled with vessel constrictions in the same region. But as the authors point out, typically only a few vessels at a time are monitored so it is possible - even if this reviewer thinks it unlikely - that this effect is real and just hasn't been seen.

      Uhlirova et al. 2016 (PMID: 27244241) observed biphasic responses in the same vessel with optogenetic stimulation in anesthetized and unanesthetized animals (cf. Fig. 1b and Fig. 2, and the section “OG stimulation of INs reproduces the biphasic arteriolar response”). Devor et al. (2007) and Lindvere et al. (2013) also reported constrictions and dilations elicited by sensory stimuli.

      I also have concerns about the spatial resolution of the data. It looks like the data in Figure 7 and Supplementary Figure 7 have a resolution of about 1 micron/pixel. It isn't stated so I may be wrong. But detecting changes of less than 1 micron, especially given the noise of an in vivo prep (brain movement and so on), might just be noise in the model. This could also explain constrictions as just spurious outputs in the model's diameter estimation. The high variability in adjacent vessel segments seen in Figure 6C could also be explained the same way, since these also seem biologically and even physically unlikely.

      Thank you for your comment. To address this important issue, we performed an additional validation experiment: we special-ordered fluorescent beads with a known diameter of 7.32 ± 0.27 µm, imaged them following our imaging protocol, and subsequently used our pipeline to estimate their diameter. Our analysis converged on the manufacturer-specified diameter, estimating it to be 7.34 ± 0.32 µm. The manuscript has been updated to detail this experiment, as below:

      Methods section insert

      “Second, our boundary detection algorithm was used to estimate the diameters of fluorescent beads of a known radius imaged under similar acquisition parameters. Polystyrene microspheres labelled with Flash Red (Bangs Laboratories, Inc., CAT# FSFR007) with a nominal diameter of 7.32 µm and a specified range of 7.32 ± 0.27 µm, as determined by the manufacturer using a Coulter counter, were imaged on the same multiphoton fluorescence microscope set-up used in the experiment (identical light path, resonant scanner, objective, detector, excitation wavelength, and nominal lateral and axial resolutions, with 5x averaging). The images of the beads had a higher SNR than our images of the vasculature, so Gaussian noise was added to the bead images to degrade their SNR to that of the blood vessel images. The images of the beads were segmented with a threshold, centroids were calculated for individual spheres, and planes with a random normal vector were extracted from each bead and used to estimate the diameter of the beads. The same smoothing and PSF deconvolution steps were applied in this task. We then reported the mean and standard deviation of the distribution of the diameter estimates. A variety of planes were used to estimate the diameters.”
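      The SNR-matching step can be illustrated with a short sketch. It assumes, as a labelled simplification, that SNR is defined as mean signal divided by noise standard deviation and that the added Gaussian noise is independent of the existing noise, so their variances add; the exact SNR definition used in the study is not stated here, and the function names and the signal level in the demo are hypothetical. Under that assumption, lowering the SNR from the reported 6.79 to 5.05 requires added noise with a standard deviation of about 0.9 times the existing noise standard deviation.

```python
import numpy as np


def added_noise_sd(current_snr: float, target_snr: float, current_noise_sd: float) -> float:
    """SD of extra Gaussian noise that lowers SNR from current_snr to target_snr.

    Assumes SNR = mean signal / noise SD and independent noise, so that
    (S / target_snr)**2 = current_noise_sd**2 + added_sd**2.
    """
    if target_snr >= current_snr:
        raise ValueError("target SNR must be lower than the current SNR")
    return current_noise_sd * float(np.sqrt((current_snr / target_snr) ** 2 - 1.0))


def degrade_to_snr(image, signal_mean, current_noise_sd, target_snr, rng):
    """Add independent Gaussian noise so the image SNR drops to target_snr."""
    current_snr = signal_mean / current_noise_sd
    sd = added_noise_sd(current_snr, target_snr, current_noise_sd)
    return image + rng.normal(0.0, sd, size=image.shape), sd


# Synthetic check with a hypothetical flat "bead" image (arbitrary units).
rng = np.random.default_rng(0)
SIGNAL = 600.0                  # hypothetical mean signal level
sigma0 = SIGNAL / 6.79          # noise SD that gives the bead SNR reported in the text
image = SIGNAL + rng.normal(0.0, sigma0, size=(512, 512))
degraded, extra_sd = degrade_to_snr(image, SIGNAL, sigma0, 5.05, rng)
measured_snr = degraded.mean() / degraded.std()
print(f"added noise SD = {extra_sd:.1f}, measured SNR = {measured_snr:.2f}")
```

      The ratio `added_noise_sd(6.79, 5.05, 1.0)` is about 0.899, independent of the signal level, which is why a single noise SD can be chosen once the SNR of both image sets has been measured.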

      Results Section Insert

      “Our boundary detection algorithm successfully estimated the radius of precisely specified fluorescent beads. The bead images had a signal-to-noise ratio of 6.79 ± 0.16 (about 35% higher than our in vivo images): to match their SNR to that of the in vivo vessel data, following deconvolution, we added Gaussian noise with a standard deviation of 85 SU to the images, bringing the SNR down to 5.05 ± 0.15. The data processing pipeline was kept unaltered except for the bead segmentation, which was performed via image thresholding instead of our deep learning model (trained on vessel data). The bead boundary was computed following the same algorithm used on vessel data, i.e., as the average of the minimum intensity gradients computed along 36 radial spokes emanating from the centreline vertex in the orthogonal plane. To demonstrate an averaging-induced decrease in the uncertainty of the bead radius estimates on a scale finer than the nominal resolution of the imaging configuration, we tested four averaging levels in 289 beads. Three of these averaging levels were lower than that used on the vessels, and one matched that used on the vessels (36 spokes per orthogonal plane and a minimum of 10 orthogonal planes per vessel). As the amount of averaging increased, the uncertainty in the diameter of the beads decreased, and our estimate of the beads' diameter converged upon the manufacturer's Coulter counter-based specifications (7.32 ± 0.27 µm), as tabulated below in Table 1.”
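      The radial-spoke boundary estimate described above can be sketched on a synthetic image. This is a minimal illustration under stated assumptions, not the authors' pipeline: the "bead" is a 2D logistic edge standing in for a PSF-blurred boundary, its centre is assumed known (the pipeline uses centroids or centreline vertices), "minimum intensity gradient" is taken to mean the most negative radial derivative moving outward, no deconvolution is applied, and the helper names, radius, noise level and smoothing window are hypothetical.

```python
import numpy as np


def bilinear(img, ys, xs):
    """Bilinearly interpolate img at fractional (row, col) coordinates."""
    h, w = img.shape
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 2)
    dy, dx = ys - y0, xs - x0
    return (img[y0, x0] * (1 - dy) * (1 - dx) + img[y0, x0 + 1] * (1 - dy) * dx
            + img[y0 + 1, x0] * dy * (1 - dx) + img[y0 + 1, x0 + 1] * dy * dx)


def spoke_radii(img, center, n_spokes=36, r_max=24.0, dr=0.25, smooth=5):
    """Per-spoke edge radius: where the radial intensity gradient is most negative."""
    cy, cx = center
    radii = np.arange(0.0, r_max, dr)
    kernel = np.ones(smooth) / smooth
    half = smooth // 2
    edges = []
    for theta in np.linspace(0.0, 2 * np.pi, n_spokes, endpoint=False):
        profile = bilinear(img, cy + radii * np.sin(theta), cx + radii * np.cos(theta))
        smoothed = np.convolve(profile, kernel, mode="valid")  # light denoising along the spoke
        grad = np.gradient(smoothed, dr)                         # d(intensity)/d(radius)
        edges.append(radii[half:len(radii) - half][np.argmin(grad)])
    return np.array(edges)


# Synthetic bead: logistic edge at a hypothetical radius of 12 px, plus Gaussian noise.
rng = np.random.default_rng(1)
TRUE_RADIUS = 12.0
yy, xx = np.mgrid[0:64, 0:64]
r = np.hypot(yy - 32.0, xx - 32.0)
img = 1.0 / (1.0 + np.exp((r - TRUE_RADIUS) / 1.0))
img += rng.normal(0.0, 0.02, img.shape)

edges = spoke_radii(img, (32.0, 32.0))
diameter = 2 * edges.mean()
print(f"diameter ~ {diameter:.2f} px (true {2 * TRUE_RADIUS:.0f} px), spoke SD {edges.std(ddof=1):.2f} px")
```

      Averaging the 36 per-spoke radii is what pulls the estimate below the scatter of any single spoke; if spoke errors were independent, the standard error would shrink roughly as one over the square root of the number of spokes (and planes), although spokes within one plane share noise and are not fully independent.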

      Bibliography

      (1) J. R. Bumgarner and R. J. Nelson, “Open-source analysis and visualization of segmented vasculature datasets with VesselVio,” Cell Rep. Methods, vol. 2, no. 4, Apr. 2022, doi: 10.1016/j.crmeth.2022.100189.

      (2) G. Tetteh et al., “DeepVesselNet: Vessel Segmentation, Centerline Prediction, and Bifurcation Detection in 3-D Angiographic Volumes,” Front. Neurosci., vol. 14, Dec. 2020, doi: 10.3389/fnins.2020.592352.

      (3) N. Holroyd, Z. Li, C. Walsh, E. Brown, R. Shipley, and S. Walker-Samuel, “tUbe net: a generalisable deep learning tool for 3D vessel segmentation,” Jul. 24, 2023, bioRxiv. doi: 10.1101/2023.07.24.550334.

      (4) W. Tahir et al., “Anatomical Modeling of Brain Vasculature in Two-Photon Microscopy by Generalizable Deep Learning,” BME Front., vol. 2020, p. 8620932, Dec. 2020, doi: 10.34133/2020/8620932.

      (5) R. Damseh, P. Delafontaine-Martel, P. Pouliot, F. Cheriet, and F. Lesage, “Laplacian Flow Dynamics on Geometric Graphs for Anatomical Modeling of Cerebrovascular Networks,” arXiv:1912.10003 [cs, eess, q-bio], Dec. 2019. Accessed: Dec. 09, 2020. [Online]. Available: http://arxiv.org/abs/1912.10003

      (6) T. Jerman, F. Pernuš, B. Likar, and Ž. Špiclin, “Enhancement of Vascular Structures in 3D and 2D Angiographic Images,” IEEE Trans. Med. Imaging, vol. 35, no. 9, pp. 2107–2118, Sep. 2016, doi: 10.1109/TMI.2016.2550102.

      (7) T. B. Smith and N. Smith, “Agreement and reliability statistics for shapes,” PLOS ONE, vol. 13, no. 8, p. e0202087, Aug. 2018, doi: 10.1371/journal.pone.0202087.

      (8) J. R. Mester et al., “In vivo neurovascular response to focused photoactivation of Channelrhodopsin-2,” NeuroImage, vol. 192, pp. 135–144, May 2019, doi: 10.1016/j.neuroimage.2019.01.036.

    1. eLife Assessment

      This paper provides important insight into how early life experience shapes adult behavior in fruit bats. The authors raised juvenile bats either in an impoverished or enriched environment and studied their foraging behaviors. The evidence is convincing that bats raised in enriched environments are more active, bold, and exploratory. The work will be of interest to ethologists and developmental psychologists.

    2. Reviewer #1 (Public review):

      Summary:

      The authors show that the early-life experience of juvenile bats shapes their outdoor foraging behavior. They achieve this by raising juvenile bats in either an impoverished or an enriched environment and subsequently testing their behavior indoors and outdoors. The authors show that behavioral measures outdoors were more reliable in delineating the effect of early-life experience, as bats raised in enriched environments were bolder, more active, and more exploratory.

      Strengths:

      The major strength of the study is that it provides a quantitative study of animal "personality" and how it is likely shaped by innate and environmental conditions. The other major strength is the ability to reliably record bats outdoors over long periods, giving researchers the opportunity to study bats in their natural habitat. On this point, the study also shows that the behavioral variables measured indoors do not correlate with those measured outdoors, thus providing key insight into the importance of testing animal behavior in its natural habitat.

      Weaknesses identified in the first round of review:

      It is not clear from the analysis presented in the paper how persistent these environmentally induced changes are, i.e., whether they remain with the bats until the end of their lives.

      Comments on revisions:

      The authors have addressed those weaknesses and the paper is much stronger.

    1. eLife Assessment

      This revised manuscript presents an important characterization of mouse auditory cortex receptive field organization, utilizing two-photon imaging of specific subpopulations. They demonstrate a degradation of tonotopic organization from the input to the output neurons. The strength of the evidence is convincing.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, Gu et al. employed novel viral strategies, combined with in vivo two-photon imaging, to map the tone response properties of two groups of cortical neurons in A1: thalamocortical recipient (TR) neurons and corticothalamic (CT) neurons. They observed a clear tonotopic gradient among TR neurons but not among CT neurons. Moreover, CT neurons exhibited highly heterogeneous frequency tuning and broader bandwidths, suggesting increased synaptic integration in these neurons. By parsing out different projection-specific neurons within A1, this study provides insight into how neurons with different connectivity can exhibit different topographic organization of their frequency responses.

      Strengths:

      This study reveals the importance of studying neurons with projection specificity rather than layer specificity since neurons within the same layer have very diverse molecular, morphological, physiological, and connectional features. By utilizing a newly developed rabies virus CSN-N2c GCaMP-expressing vector, the authors can label and image specifically the neurons (CT neurons) in A1 that project to the MGB. To compare, they used an anterograde trans-synaptic tracing strategy to label and image neurons in A1 that receive input from MGB (TR neurons).

      Weaknesses:

      - As cited in the introduction, it is well known that the tonotopic gradient is well preserved across all layers within A1. However, if the authors want to highlight the specificity of their viral tracing strategy and of the populations they imaged in L2/3 (TR neurons) and L6 (CT neurons), I feel they should include control groups in which they image general excitatory neurons at the two depths and compare them to TR and CT neurons, respectively. This would show that it is not their imaging/analysis or behavioral paradigms that differ from those of other labs.

      - Fig. 1D and G: the y-axis is Distance from pia (%). I'm not exactly sure what this means. How does % translate to real cortical thickness?

      - For Fig. 2G and H, is each circle a neuron or an animal? Why are they stacked on top of each other along the x-axis? If the x-axis is the distance from caudal to rostral, shouldn't each neuron have a different distance? Also, it seems that Fig. 2H has more circles, which is why it has more variation and is thus not significant (for example, at 600 or 900 µm, 2G seems to have fewer circles than 2H).

      - Similarly, in Fig. 2J and L, why are the circles now stacked along the y-axis? And is each circle now a neuron or a trial? There seem to be many more circles than in Fig. 2G and 2H. Also, I don't think a correlation is the proper statistic for this type of plot (this point also applies to Fig. 3H and 3J).

      - What does the inter-quartile range of BF (IQRBF, in octaves) imply? What is the interpretation of this analysis? I am confused about why TR neurons showing a high IQR in HF areas compared to LF areas would mean homogeneity among TR neurons (lines 213-216). On the same note, how is this different from BF variability? Isn't a higher IQR equivalent to higher variability?

      - Fig. 4A-B: there are no clear criteria for how the authors categorize V-, I-, and O-shaped responses. The descriptions in the Methods (lines 721-725) are also very vague.

      Comments on revisions:

      The authors have addressed all my questions in the previous round.

    3. Reviewer #2 (Public review):

      Summary:

      Gu, Liang et al. investigated how auditory information is mapped and transformed as it enters and exits the auditory cortex. They use anterograde transsynaptic tracers to label and perform calcium imaging of thalamorecipient neurons in A1, and retrograde tracers to label and perform calcium imaging of corticothalamic output neurons. They demonstrate a degradation of tonotopic organization from the input to the output neurons.

      Strengths:

      The experiments appear well executed, well described, and analyzed.

      Weaknesses:

      (1) Given that the CT and TR neurons were imaged at different depths, the question of whether or not these differences could otherwise be explained by layer-specific differences is still not fully resolved. Control measurements would be needed, recording (i) CT neurons in upper layers, (ii) TR neurons in deeper layers, (iii) non-CT neurons in deeper layers, and/or (iv) non-TR neurons in upper layers.

      (2) What percentage of the neurons at the imaged depths are CT neurons? And similarly for TR neurons?

      (3) V-shaped, I-shaped, or O-shaped is not an intuitive nomenclature; consider changing it. Further, the x/y axes of Figure 4a are not labeled, so it is not clear what the heat maps are supposed to represent.

      (4) Many references about projection neurons and cortical circuits are based on studies of the visual or somatosensory cortex. Auditory cortex organization is not necessarily the same as that of other sensory areas. Auditory cortex references should be used specifically, rather than sources reporting on S1 or V1.

      Comments on revisions:

      The authors have fully addressed my concerns.

    4. Reviewer #3 (Public review):

      Summary:

      The authors performed wide-field and 2-photon imaging in vivo in awake head-fixed mice, to compare receptive fields and tonotopic organization in thalamocortical recipient (TR) neurons vs corticothalamic (CT) neurons of mouse auditory cortex. TR neurons were found in all cortical layers while CT neurons were restricted to layer 6. The TR neurons at nominal depths of 200-400 microns have a remarkable degree of tonotopy (as good if not better than tonotopic maps reported by multiunit recordings). In contrast, CT neurons were very heterogenous in terms of their best frequency (BF), even when focusing on the low vs high frequency regions of primary auditory cortex. CT neurons also had wider tuning.

      Strengths:

      This is a thorough examination using modern methods, helping to resolve a question in the field with projection-specific mapping.

      Weaknesses:

      There are some limitations due to the methods, and it's unclear what the importance of these responses are outside of behavioral context or measured at single timepoints given the plasticity, context-dependence, and receptive field 'drift' that can occur in cortex.

      (1) Probably the biggest conceptual difficulty I have with the paper is comparing these results to past studies mapping auditory cortex topography, mainly due to differences in methods. Conventionally, tonotopic organization is observed for characteristic frequency maps (not best frequency maps), as tuning precision degrades and best frequency can shift as sound intensity increases. The authors used six attenuation levels (30-80 dB SPL) and report that the background noise of the 2-photon scope is <30 dB SPL, which seems very quiet. The authors should at least describe the sound-proofing they used to get the noise level that low, and some sense of noise across the 2-40 kHz frequency range would be nice as a supplementary figure. It also remains unclear just what the 2-photon dF/F response represents in terms of spikes. Classic mapping using single-unit or multi-unit electrodes might be sensitive to single spikes (as might be emitted at characteristic frequency), but this might not be as obvious for Ca2+ imaging. This isn't a concern for the internal comparison here between TR and CT cells as conditions are similar, but is a concern for relating the tonotopy or lack thereof reported here to other studies.

      (2) It seems a bit peculiar that while 2721 CT neurons (N=10 mice) were imaged, less than half as many TR cells were imaged (n=1041 cells from N=5 mice). I would have expected there to be many more TR neurons even mouse for mouse (normalizing by number of neurons per mouse), but perhaps the authors were just interested in a comparison data set and not being as thorough or complete with the TR imaging?

      (3) The authors' definitions of neuronal response type in the Methods need more quantitative detail. The authors state: "'Irregular' neurons exhibited spontaneous activity with highly variable responses to sound stimulation. 'Tuned' neurons were responsive neurons that demonstrated significant selectivity for certain stimuli. 'Silent' neurons were defined as those that remained completely inactive during our recording period (> 30 min). For tuned neurons, the best frequency (BF) was defined as the sound frequency associated with the highest response averaged across all sound levels." The authors need to define their thresholds for 'highly variable', 'significant', and 'completely inactive'. Is the best frequency the most significant response, the global maximum (even if another stimulus evokes a response of very similar amplitude), or something else?

      Comments on revisions:

      I think the authors misunderstood my point about sound level and characteristic frequency vs. best frequency tonotopic maps. Yes, many studies of cortical responses present stimuli at higher intensities than the characteristic frequencies, but as tuning curves widen with sound level, the macroscopic tonotopic organization of primary auditory cortex breaks down at higher intensities. This is why most of the classic studies of tonotopy (e.g., from the Merzenich lab) generated maps of characteristic frequency. As I mentioned before, this isn't so much of an issue for the authors' comparisons of TR vs. CT organization in their own study, but in general, this makes it difficult to compare aspects of spatially organized tonotopy from imaging studies with the older electrophysiological 'truer' tonotopic maps. That said, this means that CT cells also might be tonotopically organized if the authors had been able to look at lower-intensity tuning properties.

    1. eLife Assessment

      This study presents a valuable assessment of, and solid evidence for, increased similarity in visual appearance combined with increased chemical differences between two butterfly species in sympatry, compared with differences between three populations of one of the two species in allopatry. The similarity in visual appearance hints at an evolutionary response to shared predators (but alternative explanations are possible). Thus, the difference in chemical signaling likely helps to avoid between-species mating in sympatry.

    2. Joint Public Review:

      Summary:

      Ledamoisel et al. examined the evolution of visual and chemical signals in closely related Morpho butterfly species to understand their role in species coexistence. Using an integrative, state-of-the-art approach combining spectrophotometry, visual modeling, and behavioral mate choice experiments, they quantified differences in wing iridescence and assessed its influence on mate preference in allopatry and sympatry. They also performed chemical analyses to determine whether sympatric species exhibit divergent chemical cues that may facilitate species recognition and mate discrimination. The authors found iridescent coloration to be similar in sympatric Morpho species. Furthermore, male mate choice experiments revealed that in sympatry, males fail to discriminate conspecific females based on coloration, reinforcing the idea that visual signal convergence is primarily driven by predation pressure. In contrast, the divergence of chemical signals among sympatric species suggests their potential role in facilitating species recognition and mate discrimination. The authors conclude that interactions between ecological pressures and signal evolution may shape species coexistence.

      Strengths:

      The study is well-designed and integrates multiple methodological approaches to provide a thorough assessment of signal evolution in the studied species. We appreciate the authors' careful consideration of multiple selective pressures and their combined influence on signal divergence and convergence. Additionally, the inclusion of both visual and chemical signals adds an interesting and valuable dimension to the study, enhancing its importance. Beyond butterflies, this research broadens our understanding of multimodal communication and signal evolution in the context of species coexistence.

      Reviewing Editor comment:

      The authors have improved their submission after revisions and responded to the previous concerns of the reviewers.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      In this study, Ledamoisel et al. examined the evolution of visual and chemical signals in closely related Morpho butterfly species to understand their role in species coexistence. Using an integrative, state-of-the-art approach combining spectrophotometry, visual modeling, and behavioral mate choice experiments, they quantified differences in wing iridescence and assessed its influence on mate preference in allopatry and sympatry. They also performed chemical analyses to determine whether sympatric species exhibit divergent chemical cues that may facilitate species recognition and mate discrimination. The authors found iridescent coloration to be similar in sympatric Morpho species. Furthermore, male mate choice experiments revealed that in sympatry, males fail to discriminate conspecific females based on coloration, reinforcing the idea that visual signal convergence is primarily driven by predation pressure. In contrast, the divergence of chemical signals among sympatric species suggests their potential role in facilitating species recognition and mate discrimination. The authors conclude that interactions between ecological pressures and signal evolution may shape species coexistence.

      Strengths:

      The study is well-designed and integrates multiple methodological approaches to provide a thorough assessment of signal evolution in the studied species. I appreciate the authors' careful consideration of multiple selective pressures and their combined influence on signal divergence and convergence. Additionally, the inclusion of both visual and chemical signals adds an interesting and valuable dimension to the study, enhancing its importance. Beyond butterflies, this research broadens our understanding of multimodal communication and signal evolution in the context of species coexistence.

      Weaknesses:

      (1) The broader significance of the findings needs to be better articulated. While the authors emphasize that comparing adaptive traits in sympatry and allopatry provides insights into selective processes shaping reproductive isolation and coexistence, it is unclear what key conceptual or theoretical questions are being addressed. Are these patterns expected under certain evolutionary scenarios? Have they been empirically demonstrated in other systems? The authors should explicitly state the overarching research question, incorporate some predictions, and better contextualize their findings within the existing literature. If the results challenge or support previous work, that should be highlighted to strengthen the study's importance in a broader context.

      We thank the reviewer for their valuable feedback. We understand that the framing of the results and the discussion may fail to convey the broader significance of our findings. In the first version of the manuscript, we framed our manuscript around the processes shaping reproductive isolation and co-existence in sympatry, but now realize that this question was too broad with regard to our results. We thus strictly focused on outlining the importance of ecological interactions in the evolution of traits in sympatric species. In the revised version of the manuscript, we rewrote the first paragraph of the introduction to introduce context regarding the effect of ecological interactions on trait evolution (lines 43-60). We then explicitly introduce the theoretical question investigated in our paper (i.e. “we investigate how ecological interactions in sympatry can constrain natural and sexual selection shaping trait evolution”, lines 62-63) and our predictions regarding the evolution of traits in sympatry vs. allopatry (lines 74-80). We also added predictions regarding our experiments on Morpho at the end of the introduction (lines 146-157). As a result, the discussion is now better aligned with the introduction, by discussing the putative effect of predation and mate choice on the evolution of wing iridescence in Morpho.

      (2) The motivation for studying visual signals and mate choice in allopatric populations (i.e., at the intraspecific level) is not well articulated, leaving their role in the broader narrative unclear. In particular, the rationale behind experiments 1, 2, and 3 is not well defined, as the authors have not made a strong case for the need for these intraspecific comparisons in the introduction. This issue is further compounded by the authors' primary focus on signal evolution in sympatry throughout both the results and the discussion. For instance, the divergence of iridescence in allopatry is a potentially interesting result. But the authors have not discussed its implications.

      We now clearly state in the introduction our motivation for studying visual signals and mate choice in allopatric populations (lines 74-80, lines 146-157). We argued that intraspecific comparisons help identify whether visual cues can be used in mate recognition between phylogenetically close subspecies, between which visual resemblance is expected to be higher than between closely related species (tetrad experiment, and experiment 1). As M. h. bristowi and M. h. theodorus have different wing patterns, we also used this comparison to identify the traits involved in male mate preference within a species, testing the importance of iridescent color (experiment 2) or iridescent patterning (experiment 3). The results of those experiments can then be used to assess whether these traits are used in species recognition between sympatric species. See also our answers to recommendations 11 and 15 from reviewer #1.

      Overall, given that the primary conclusions are based on results and analyses in sympatry, the role of allopatric populations in shaping these conclusions needs to be better integrated and justified. Without a stronger link between the comparative framework and the study's key takeaways, the use of allopatric populations feels somewhat peripheral rather than central to the study's aim. Since the primary conclusions remain valid even without the allopatric comparisons, their inclusion requires a clearer rationale.

      To make a stronger case for the use of the allopatric population in our manuscript, we strengthened the justification for studying intraspecific allopatric populations vs. interspecific sympatric populations, as the iridescence measurements and the mate choice experiments in allopatric populations can serve as a baseline, when compared to sympatric populations, for studying how species interactions can shape the evolution of traits and mate recognition. Following your major comment #1, we rewrote the introduction to include a justification of the need to study allopatric vs. sympatric populations (lines 74-80), and also further highlighted in the discussion the need to study iridescence in sympatric species to fully understand trait evolution in sympatric species (lines 339-343).

      (3) While the authors demonstrate that iridescence is indistinguishable to predators in sympatry, they overstate the role of predation in driving convergence. The present study does not experimentally demonstrate that iridescence in this species has a confusion effect or contributes to evasive mimicry. Alternatively, convergence could result from other selective forces, such as signal efficacy due to environmental conditions, rather than being solely driven by predation.

      We acknowledge that our study does not directly demonstrate that iridescence contributes to evasive mimicry. We did tone down the interpretation of the results in the discussion and state that predation is not the only selective pressure that could have promoted a convergent evolution of iridescence in sympatric species, as iridescence is a trait that could be involved in thermoregulation (lines 346-353) and camouflage (lines 363-369) for example. We made sure to mention that convergence in iridescent signals in sympatry is only an indirect support to the evasive mimicry hypothesis, and that further research is still needed, including direct predation experiments, to show that this convergence is indeed triggered by predation (lines 391-396).  

      Reviewer #2 (Public review):

      This study presents an investigation of the visual and chemical properties and mating behaviour in Morpho butterflies, aimed at addressing the nature of divergence between closely related species in sympatry. The study system consists of three subspecies of Morpho helenor (bristowi, theodorus, and helenor) and the congeneric Morpho achilles achilles. The authors postulate that whereas the iridescent blue signals of all (sub)species should function as a predator reduction signal (similar to aposematism) and therefore exhibit convergence, the same signals should show divergence if used as a mating signal, particularly in sympatric populations. They also assess chemical profiles among the species to evaluate the potential utility of scent in mediating species/sex discrimination.

      The authors first used reflectance spectrometry to calculate hue, brightness, and chroma, plus two measures of "iridescence" (perhaps better phrased as angular dependence) in each (sub)species. This indicated the ubiquitous presence of sexual dimorphism in brightness (males brighter), which also appears to be the case for iridescence (Figure 3A-B). Analysis of these data also indicated that whereas there is evidence for divergence among subspecies in allopatry, the same evidence is lacking for species in sympatry (P = 0.084). This was supported further by visual modelling, which showed that both conspecifics and birds should be (theoretically) capable of perceiving the colour difference among allopatric populations of M. helenor, whereas the same is not true for the sympatric species.

      The authors then conducted mate choice trials, first using live individuals and second using female dummies. The live experiments indicated the presence of assortative mating among the two subspecies of M. helenor (bristowi and theodorus). The dummy presentations indicated (a) bristowi males prefer conspecific wings, whereas theodorus have no preference, (b) bristowi males prefer the con(sub)specific colour pattern, (c) theodorus prefer the con(sub)specific iridescence when the pattern is manipulated to be similar among female dummies. A fourth experiment, using sympatric M. achilles and M. helenor, indicated no preference for conspecific female dummies. Finally, chemical analysis indicated substantial differences between these two species in putative pheromone compounds, and especially so in the males.

      The authors conclude that the similarity of iridescence among species in sympatry suggests convergence upon a common anti-predation signal. Despite some behavioural evidence in favour of colour (iridescence)-based mate discrimination, they propose that chemical differences between M. achilles and M. helenor are more likely than visual differences to function in species isolation.

      Overall, I enjoyed reading this manuscript, which presents a valiant attempt at studying visual, chemical and behavioural divergence in this iconic group of butterflies.

      Major comments

      My only major comment concerns the authors' favoured explanation of convergence among species, namely aposematism (or evasive mimicry), which is based on the you-can't-catch-me hypothesis first presented by Young 1971. Some work shows that iridescent-like stimuli are more difficult for a range of viewers to localize precisely. However, most of the evidence as applied to the Morpho system is circumstantial, and I'm not certain that this hypothesis is widely accepted. Because the present study deals with closely related (sub)species, one alternative explanation, a "null" hypothesis of sorts, is a lack of divergence from a common starting point rather than evolutionary convergence per se. In other words, two subspecies are likely to retain ancestral character states unless selection causes them to diverge. I feel that the manuscript would benefit from a discussion of this alternative, if not others. Signalling to predators could very well be involved in constraining the extent of convergence, but it seems a little premature to state this as an up-front conclusion of this work. There is also the result of a *dorsal* wing manipulation by Vieira-Silva et al. 2024, which seems difficult to reconcile with this explanation. Although the authors cite this paper, a more nuanced discussion of its experimental results would seem appropriate here.

      We thank the reviewer for their constructive comments on our manuscript. We appreciate the reviewer's concern regarding the way iridescence convergence between sympatric species is discussed in our manuscript, which aligns with similar concerns raised by Reviewer 1. Indeed, the you-can't-catch-me hypothesis has not yet been empirically tested in Morpho; it is currently a working hypothesis supported only by indirect lines of evidence.

      Among the 30 known Morpho species, iridescence is most likely the ancestral character, notably because it is shared by a majority of Morpho species (we now mention this in the introduction, lines 108-110). In this paper, we therefore did not aim to identify the evolutionary forces involved in the appearance of iridescence in this group. Rather, we wanted to understand to what extent ecological interactions can affect the diversification (or not) of this trait. As such, the dorsal manipulations performed by Vieira-Silva et al. 2024, showing that iridescence in Morpho may have an effect similar to crypsis, do not affect our working hypothesis. Instead, we use Vieira-Silva et al. 2024 to discuss the potential anti-predator effect of iridescence, which could promote the convergent evolution of iridescent patterns.

      In the main text, we now clearly state our null hypothesis: under neutral evolution of iridescence, we would expect the divergence in wing coloration between two M. helenor subspecies to be lower than the divergence between two different Morpho species (M. helenor and M. achilles). We show that our results sharply differ from this null expectation.

      We then improved the discussion by adding alternative hypotheses potentially explaining the convergent iridescent signal detected in sympatric species: we discussed the expected effect under neutral evolution (lines 339-343), but also added alternative hypotheses regarding the diversification of iridescence due to camouflage (lines 363-369), predator evasion (lines 373-377) and thermoregulation (lines 346-353).

      Reviewer #3 (Public review):

      The authors investigated differences in iridescence wing colouration of allopatric (geographically separated) and sympatric (coexisting) Morpho butterfly (sub)species. Their aim was to assess if iridescence wing colouration of Morpho (sub)species converged or diverged depending on coexistence and if iridescence wing colouration was involved in mating behaviour and reproductive isolation. The authors hypothesize that iridescence wing colouration of different (sub)species should converge in sympatry and diverge in allopatry. In sympatry, iridescence wing colouration can act as an effective antipredator defence with shared benefits if multiple (sub)species share the same colouration. However, shared wing colouration can have potential costs in terms of reproductive interference since wing colouration is often involved in mate recognition. If the benefits of a shared antipredator defence outweigh the costs of reproductive interference, iridescence wing colouration will show convergence and alternative mate recognition strategies might evolve, such as chemical mate recognition. In allopatry, iridescence wing colouration is expected to diverge due to adaptation to different local conditions and no alternative mate recognition is expected.

      Strengths:

      (1) Using allopatric and sympatric (sub)species that are closely related is a powerful way to test evolutionary hypotheses

      (2) By clearly defining iridescence and measuring colour spectra from a variety of angles, applying different methods, a very comprehensive dataset of iridescence wing colouration is achieved.

      (3) By experimentally manipulating wing coloration patterns, the authors show visual mate recognition for M. h. bristowi and could, in theory, separate different visual aspects of colouration (pattern vs. iridescence strength).

      (4) Measurements of chemical profiles to investigate alternative mate recognition strategies in case of convergence of visual signals.

      Weaknesses:

      In my opinion, studies should be judged on the methods and data included, and not on additional measurements that could have been taken or additional treatments/species that should be included, since in most ecological and evolutionary studies, more measurements or treatments/species can always be included. However, studies do need to ensure appropriate replication and appropriate measurements to test their hypothesis AND support their conclusions. The current study failed to ensure appropriate replication, and in various cases, the results do not support the conclusions.

      First, when using allopatric and sympatric (sub)species pairs to test evolutionary hypotheses, replication is important. Ideally, multiple allopatric and sympatric (sub)species pairs are compared to avoid outlier (sub)species or pairs that lead to biased conclusions. Unfortunately, the current study compares 1 allopatric and 1 sympatric (sub)species pair, and therefore has poor (no) replication at the level of allopatric and sympatric (sub)species pairs.

      We would like to thank the reviewer for their constructive feedback. We agree that replication is important to test evolutionary hypotheses and that our study lacks replication for allopatric and sympatric Morpho populations. Ideally, several allopatric and sympatric replicates would be required to draw conclusions about the effect of species interactions on trait evolution. Our study is a preliminary attempt at answering this question: it covers only a few Morpho populations but provides a broad assessment of iridescence and mate preference for those populations. We now clearly state in the discussion that multiple populations need to be investigated to test whether the trend observed in this paper can be generalized (lines 388-392).

      Second, chemical profiles were only measured for the sympatric species and not for the allopatric (sub)species, which limits the interpretation of these data. The allopatric (sub)species could have been measured as a non-coexistence "control". If coexistence and convergence in wing colouration drive the evolution of alternative mate recognition signals, such alternative signals should not evolve/diverge in allopatric (sub)species, where wing colouration is still a reliable mate recognition cue. More importantly, no details are provided on the quantification of butterfly chemical profiles, which is essential to understand such data. It is unclear how the chemical profiles were quantified and which data (concentrations, ratios, proportions) were used to perform the NMDS and to generate Figure 5 and the associated statistical tests.

      We recognize that having the chemical profiles of the genitalia of the Morpho from the allopatric populations would have made a stronger case in favor of reinforcement acting on the divergence of the chemical compounds found on the genitalia of the sympatric Morpho species. Due to limited access to the biological material needed at the time of the chromatography, we could not test for lower divergence in the chemical profiles of allopatric Morpho butterflies. We made sure to mention this limitation in the discussion (lines 457-461). 

      We already stated in the methods that we compiled the area under the peak of each component found in the chromatograms of our samples and that we performed all statistical analyses on this dataset. To make this clearer, the new version of the manuscript states that the area under the peak of each component provides a measure of that component's concentration (methods, lines 720, 723, 733). We also added details to the legend of Figure 5.

      Third, throughout the discussion, the authors state that their results support natural selection by predators on iridescent wing colouration, without measuring natural selection by predators or any other variable related to predation. At this point, it is unclear which predators prey on any of the butterfly species.

      We now mention in the introduction (lines 132-136) and in the discussion (lines 373-377) that previous predation experiments on Morpho and other butterflies showed that birds are likely predators of these species. These observations led us to test for the putative effect of predation on the evolution of their color pattern, without directly measuring predation rates. We made this information transparent in the revised manuscript and now specify that assessing wing convergence is only an indirect way of testing the escape mimicry hypothesis (lines 393-396).

      To continue on the interpretation of the data related to selection on specific traits by specific selective agents: this study did not measure any form of selection or any selective agent. Hence, it is not known whether iridescent wing colouration is actually under selection by predators and/or mates, whether other selective agents are involved, or whether these traits converge because of genetic correlations with other traits under selection. For example, iridescent colouration in ground beetles functions as an antipredator defence but also in thermo- and water regulation. None of these issues are recognized or discussed.

      All reviewers pointed out the lack of discussion of alternative selective pressures involved in the evolution of iridescence. We modified the text accordingly and no longer limit our discussion to the putative effects of predation. We now specifically discuss alternative hypotheses, including crypsis (lines 362-369) and thermoregulation (lines 346-353).

      Finally, some of the results are weakly supported by statistics or questionable methodology.

      Most notably, the perception of the iridescence coloration of allopatric subspecies by bird visual systems. Although for females, means and errors (not indicated what exactly, SD, SE or CI) are clearly above the 1 JND line, for males, means are only slightly above this line and errors or CIs clearly overlap with the 1 JND line. Since there is no additional statistical support, higher means but overlap of SD, SE or CI with the baseline provides weak statistical support for differences.

      We thank the reviewer for raising issues with the interpretation of the chromatic distances between allopatric Morpho subspecies measured with a bird vision model. We now describe this graph in a more nuanced way in the results section (lines 208-212). This change does not affect our main conclusion: both Morpho and predator visual models discriminate iridescence differences better between allopatric subspecies than between sympatric species.

      We now also clearly mention in the figure’s legend that the error bars represent the confidence intervals obtained after performing a bootstrap analysis, in addition to the mention of the nature of the error bars already mentioned in the methods (line 580).
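      To make the reviewer's criterion concrete, the Python sketch below computes a percentile-bootstrap confidence interval for a mean chromatic distance. It counts a difference as discriminable only when the lower bound of that interval exceeds the 1 JND threshold. This is an illustration under stated assumptions, not the authors' analysis pipeline. The function names, the choice to resample individual distances, the 2000 resamples, and the example values are all hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Return the mean of `values` and its percentile-bootstrap (1 - alpha) CI.

    Hypothetical helper: resamples the distances with replacement and
    takes percentiles of the resampled means.
    """
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(values)
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = boot_means[int(alpha / 2 * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(values), lower, upper


def discriminable(values, jnd_threshold=1.0, **kwargs):
    """True only if the whole CI lies above the JND threshold."""
    _, lower, _ = bootstrap_mean_ci(values, **kwargs)
    return lower > jnd_threshold


# Hypothetical chromatic distances in JND units, NOT the study's data.
well_separated = [1.9, 2.0, 2.1, 2.0, 1.8, 2.2]
borderline = [0.2, 0.6, 1.0, 1.4, 1.8, 2.2]
print(discriminable(well_separated))  # CI lies entirely above 1 JND
print(discriminable(borderline))      # mean is above 1 JND, but the CI overlaps it
```

      Under this criterion, a mean above 1 JND whose interval still overlaps the threshold is treated as inconclusive rather than as evidence of discrimination. This is the situation the reviewer describes for males under the bird visual model.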

      Regarding the assortative mating experiment, the results are clearly driven by M. bristowi. For M. theodorus, females mate equally often with conspecifics (6 times) as with M. bristowi (5 times). For males, the ratio is slightly better (6 vs 3), but with such low numbers, I doubt this is statistically testable. Overall low mating for M. bristowi could indicate suboptimal experimental conditions, and hence results should be interpreted with care.

      We recognize that the results of the tetrad experiment are mainly driven by the behavior of M. bristowi. This was already stated in the results (lines 231-232), and we now also mention it in the discussion (lines 401-402). This experiment would have benefited from more replicates, but limited access to live males and virgin females of both subspecies was a constraint. The Fisher's exact test used to assess assortative mating is specifically suited to small sample sizes. We recognize that the sample size is not ideal, but it can still be tested statistically.
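      The exchange above turns on whether counts this small can be tested at all. As an illustration only, here is a minimal standard-library Python sketch of a two-sided Fisher's exact test on a 2x2 table (female subspecies by subspecies of the male partner). The p-value is the summed hypergeometric probability of every table with the same margins that is no more probable than the observed table. The function names and the example table are hypothetical and are not the study's data. In practice, a library routine such as `scipy.stats.fisher_exact` would normally be used.

```python
from math import comb


def hypergeom_pmf(x, row1, col1, n):
    """Probability that the top-left cell equals x, with all margins fixed."""
    return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_obs = hypergeom_pmf(a, row1, col1, n)
    # Range of values the top-left cell can take given the margins.
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # The small relative tolerance keeps tables tied with the observed one
    # from being dropped because of floating-point rounding.
    p = sum(
        p_x
        for p_x in (hypergeom_pmf(x, row1, col1, n) for x in range(lo, hi + 1))
        if p_x <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p)


# Hypothetical counts, NOT the study's data:
# rows = female subspecies (A, B), columns = male subspecies (A, B).
example = [[3, 1], [1, 3]]
print(round(fisher_exact_two_sided(example), 3))  # 0.486
```

      The example shows both sides of the argument. An exact p-value can always be computed for a small table. However, with counts this low, even a 3:1 pattern in each row is far from significant, so the power to detect assortative mating is limited.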

      Regarding the wing manipulation experiment, M. theodorus does not show a preference when dummies with non-modified wings are presented and prefers non-modified dummies over modified dummies. This is acknowledged by the authors but not further discussed. Certainly, some control treatment for wing modification could have been added.

      The controls for the effects of wing modification and of the odor of the permanent marker were already described in the methods (lines 636-639). Following your recommendation and comments from the other reviewers, we now also mention these controls in the results (lines 278-283). We also address an issue that could explain why live males rejected these modified dummies: we cannot be sure whether butterflies perceive these modifications as equivalent to natural coloration (lines 281-282). An additional control, in which black ink is added to the black dorsal parts of the pattern, could have been used to assess the ink's visual effect. Unfortunately, sampling constraints did not allow us to add another treatment.

      Overall, the fact that certain measurements only provide evidence for 1 of the 2 (sub)species (assortative mating, wing manipulation) or one sex of one of the species (bird visual systems) means overall interpretation and overgeneralization of the results to both allopatric or sympatric species should be done with care, and such nuances should ideally be discussed.

      The aim of the authors, "to investigate the antagonistic effects of selective pressures generated by mate recognition and shared predation" has not been achieved, and the conclusions regarding this aim are not supported by the results. Nevertheless, the iridescence colour measurements are solid, and some of the behavioural experiments and chemical profile measurements seem to yield interesting results. The study would benefit from less overinterpretation of the results in the framework of predation and more careful consideration of methodological difficulties, statistical insecurities, and nuances in the results.

      Overall, we would like to thank all reviewers for their thorough assessment of our work. We understand that the imbalance between mate choice data, visual model data and chemical data gives only a partial assessment of species recognition in Morpho butterflies, which calls for more care in interpreting and discussing our results. We added balanced interpretations to our discussion: we mention the lack of replicates for allopatric and sympatric populations (lines 391-392) and the lack of chemical characterization of allopatric species (lines 458-461, see previous comments). We are also more transparent about methodological limitations that we failed to convey in the first version of our manuscript. Finally, we added nuance to our discussion and considered alternatives to predation that could explain the convergence of iridescence found in sympatry.

      Reviewing Editor Comments:

      While all reviewers acknowledge the value of your data, they converge in recommending that you tone down the evolutionary interpretations. Ideally, testing your main hypothesis would require several species pairs or, with only one pair as in your case, replicated sympatric and allopatric sites for both species. Furthermore, your more specific hypotheses about convergence (vs. non-divergence), response to predators (vs. other environmental variables), and avoidance of interspecific mating in sympatry (vs. no avoidance in allopatry) would require appropriate alternative treatments/controls. We therefore recommend that you focus on the statements you can support with your experiments and data, and introduce these statements in the introduction with reference to the appropriate literature.

      Reviewer #1 (Recommendations for the authors):

      (1) Line 25: This stated aim seems a bit off. The authors did not sensu stricto quantify 'how shared adaptive traits may shape genetic divergence' in this study. I suggest rewriting or deleting this whole sentence altogether. The study's aim is already clear in lines 29-34.

      We deleted the mention of the characterization of genetic divergence, since this study did not focus on any genetic analysis.

      (2) Line 34: Here the authors state that they compared allopatric vs. sympatric populations. Strictly speaking, this is not true for M. achilles. Further, the results after this sentence focus solely on divergence/convergence in sympatry. Nothing is said about the intraspecific level or the implications of those findings.

      We now mention that we tested allopatric vs. sympatric species of M. helenor only (lines 28-29). We also mention that the behavioral experiments were based on intraspecific comparisons, and discuss the implications of this result in the discussion.

      (3) Line 35: 'convergence driven by predation': this is a strong statement and cannot be directly inferred from the present set of experiments. Consider toning it down.

      We added nuance to this statement by rephrasing it as “suggesting that predation may favor local resemblance” (lines 32-33).

      (4) Line 36: Replace 'behavioral results' with 'behavioral experiments' or something similar.

      Corrected

      (5) Line 45-49: These opening statements need some citations.

      We provided references for the first few lines, by citing terHorst et al 2018 (line 44) underlining the importance of species interactions in trait evolution, and Blomberg et al 2003 (line 45) showing that closely-related species tend to resemble each other by quantifying the phylogenetic signal of various traits.

      (6) Line 83, 165: 'visual effect', not sure what the authors are referring to. Please rewrite.

      We defined “visual effect” as the way wing color patterns could be perceived by predators or mates. We removed mentions of “visual effect” and directly used its definition instead.

      (7) Line 105 onwards: This section of the introduction could benefit from more concise writing. The authors might consider reducing the number of specific examples and instead offering broader general statements, supported by citations from multiple studies.

      We reduced the number of examples given in this paragraph and used general statements supported by multiple citations as examples. (lines 102-119).

      (8) Line 108-110: This sentence seems to be redundant with the previous one.

      We merged this sentence with the previous one to improve clarity. (lines 103-105)

      (9) Line 140: 'with chemical defenses': include citations here.

      We added citations of Joron et al. 1999 and Merrill et al. 2014, which document the evolution of convergent wing patterns (mimicry) in butterfly species with chemical defenses.

      (10) Line 149: This is a bit of a stretch. Note that genetic divergence could be influenced by many other things, not only the processes that the authors examined.

      We agree with the reviewer that the study of the convergent vs. divergent evolution of visual cues is not enough to fully understand the mechanisms allowing genetic divergence between species. Because this paper does not focus on characterizing genetic divergence, we removed it from the manuscript to avoid oversimplification.

      (11) Line 151: Again, the authors' primary focus here seems to be the interspecific level. One is left to wonder why comparisons at the intraspecific level in M. helenor are needed and what they imply. Please clarify.

      At the end of the introduction (lines 146-157), we specifically highlighted the importance of intraspecific comparisons. When studying the effect of sympatry on the evolution of the iridescent color pattern, we use the intraspecific comparison as a baseline to assess convergence or divergence of iridescence in a sympatric interspecific pair of Morpho, because under neutral evolution two subspecies are expected to be more similar than two different species (this assumption is now clarified in lines 147-148). We also used intraspecific mate choice to test whether visual cues are used in mate recognition (experiment 1) and which type of signal Morpho butterflies can perceive, the iridescent pattern (experiment 2) or the iridescent coloration (experiment 3). These results help contextualize the interspecific mate choice experiment, which aimed to determine whether visual cues could also be used in species recognition. Because we show that iridescent coloration is important for mate recognition at the intraspecific scale, wing color convergence between M. helenor and M. achilles helps explain why species recognition is low at the interspecific scale.

      (12) Line 154: 'signals on mate preferences'.

      Corrected.

      (13) Line 189: 'At the intraspecific level', maybe in the brackets include 'allopatric populations' just so the results are in a similar format as in the color contrast section below.

      We added details to make clearer that the intraspecific level is studied between allopatric Morpho populations (line 189).

      (14) Line 189-192: Please rearrange the figure (current B as A and vice versa) or present the results in order as in the figure (interspecific first and then intraspecific level).

      We rearranged Figure 3 so that the intraspecific comparison (allopatric population) appears as A and the interspecific level (sympatric population) appears as B, to follow the order of presentation in the main text.

      (15) Line 232: The motivation behind experiments 1, 2, and 3 is unclear. The authors have not made a strong point in the introduction about the need for these comparisons at an intraspecific level. Given that the authors are focused on divergence/convergence at an interspecific level, this set of experiments seems to be irrelevant to the present study. The implications of these findings are also not discussed.

      We added motivation to the use of experiment 1, 2, and 3 in the introduction (lines 151-154) by stating that those experiments were used to assess whether blue color could indeed be used as a mating cue in Morpho helenor (experiment 1) and to try to understand what part of the visual signal is important in mate choice in Morpho helenor: the wing pattern (experiment 2) or the iridescent coloration (experiment 3). Although motivation for these experiments was not detailed in our manuscript, we already discussed the implications of the results of experiments 1, 2 and 3 in the discussion by stating that visual cues can take many forms and that considering both color AND pattern is important in understanding visual cues (lines 408-416). We carefully reworked this new version to make it more straightforward.

      (16) Line 260: Insert 'wild-type' before model to ensure similar wording as in the previous section.

      Corrected.

      (17) Line 286: Insert 'sympatric' after mimetic.

      Corrected.

      (18) Line 307: Include a reference to the figures or table where these results are presented.

      We now mention in the main text that the different proportions of beta-ocimene found in M. helenor and M. achilles males are shown in Table S2.

      (19) Line 343: These inferences are speculative. Add a line here, something like 'although this warrants further research in this species'.

      We detailed what additional experiments are needed lines 388-396.

      (20) Line 357: The authors have not discussed their results on iridescence divergence in allopatric populations (line 190) and its implications.

      We now made clear in the beginning of the discussion that the divergence of iridescence in allopatric populations is used as a baseline to test for convergent iridescence between species (lines 339-343).

      (21) Line 361 onwards: This first paragraph is a bit confusing, as the results mainly focus on allopatry, while the title refers to sympatry.

      To avoid confusion between the title and the content of the discussion, we divided the last part of the discussion into two sections. Because the first paragraph mainly focuses on allopatry, we isolated it under the title “Iridescent color patterns can be used as mate recognition cues in M. helenor” (line 498). The next paragraph of the discussion, which focuses on the sympatric Morpho populations, is now titled “Evolution of visual and olfactory cues in mimetic sister-species living in sympatry” (line 418).

      (22) Line 383: visual cues 'as' poor species.

      Corrected.

      (23) Line 405: Why females here and not males? This is again confusing since the authors tested for male mate choice in the main experiments. Some background information on sex-specific mate choice in the methods might help.

      In this specific sentence, we propose mate choice experiments testing whether females (and not males) discriminate olfactory cues, because we found a high divergence in the chemical compounds on male genitalia. Female chemical compounds could also serve as cues for males in mate recognition, but olfactory mate choice in butterflies is often driven by females. We recognize that this perspective does not align with the mate choice experiments in our results section, which focused on male mate choice based on visual cues. We made that choice for ecological reasons (Morpho males tend to be attracted to bright blue colorations, whereas females do not) and for technical reasons (in cages, females tend to hide from males or male dummies, a behavior incompatible with experiments involving dummy males flown around the cage). In the discussion, we now specify that this perspective concerns testing the implications of divergence in male olfactory cues (line 454). We also explain why we investigated male (and not female) mate choice based on visual cues in the methods (lines 613-618) and in the results (lines 219-223).

      (24) Line 417: This inference is speculative. Consider toning it down.

      We rewrote the sentence: “We find evidence of converging iridescent patterns in sympatry, suggesting that predation could play a major role in the evolution of iridescence. Further work is nevertheless needed to directly test this hypothesis and establish the importance of evasive mimicry in Morpho” (lines 465-468).

      (25) Line 429: 'Convergent trait evolution leads to mutualistic interactions enhancing coexistence'. Careful here. It is not very evident how convergent trait evolution (iridescence) is mutualistic in this case, as there is no experimental evidence for evasive mimicry yet. Consider rewording or toning this sentence down.

      We agree with the reviewer and removed this statement, keeping only the end of the sentence: “Altogether, this study addresses how convergence in one trait as a result of biotic interactions may alter selection on traits in other sensory modalities, resulting in a complex mosaic of biodiversity.” (lines 479-481).

      (26) Line 442: Since the samples come from a breeding farm, I have a few questions. How are the authors sure about the location where the specimens were collected? How long have they been kept in captivity? Have they been subjected to any artificial selection? More details are needed here.

      M. helenor bristowi and M. helenor theodorus are found in the wild only in West and East Ecuador, respectively, so these M. helenor subspecies can only be collected from those two allopatric populations. Because their phenotype is directly linked to their geographic distribution, it confirms their collection locality. The M. h. theodorus individuals used in this study came from Tena in East Ecuador, and the M. h. bristowi individuals from Pedro Vicente Maldonado in West Ecuador. We received pupae from the breeding farm, so all Morpho used in the experiments were kept in captivity from emergence onward. Upon emergence, they were transferred into cages for 4 to 5 days until sexual maturity before the tetrad and mate choice experiments were performed. This information was added to the methods (lines 490-496).

      (27) Line 476: Include some citations supporting this statement.

      We now cite Bennett and Théry (2007), a review of avian color vision, and Briscoe (2008), which characterizes the sensitivity of photoreceptors in butterfly eyes. Both show that avian and butterfly visual systems perceive the 300-700 nm range.

      (28) Line 480 onwards: Please clarify if the analysis used only one value (mean?) per species, sex, angle of measurement, and locality or included data from multiple individuals.

      The analyses of both the colorimetric variables and global iridescence used iridescence data from multiple individuals (10 males and 10 females each of M. h. bristowi, M. h. theodorus, M. h. helenor and M. a. achilles), for which we measured iridescence at 21 angles of illumination. Sample sizes are given in lines 507, 515 and 540-542.

      (29) Line 510: Is there a specific reason that authors did not investigate achromatic contrasts? Provide some justification here. Or include the results of achromatic contrasts in the supplement.

      We added the achromatic results in the supplement and in the results (lines 200-204). For both the avian visual model and the Morpho visual model, the confidence intervals always overlapped with the JND threshold, showing that neither birds nor butterflies could theoretically discriminate the wing reflectance brightness in allopatric and sympatric populations.

      (30) Line 552 onwards: I may have missed it. It is not entirely clear why the authors focused on male mate choice rather than female preference for visual cues. The authors should explicitly justify this choice and cite previous studies demonstrating that male mate choice, rather than female preference, is important in this species. This should be stated in the results section as well.

      We added a paragraph to the Methods (lines 613-618) describing the ecological and technical reasons for testing only male mate choice based on visual cues (see also our response to recommendation #23).

      (31) Line 537 onwards: What was the criterion used to score that mating had occurred? Why first mating and not how long they were mating? Please add these details.

      We stopped the experiment as soon as a male/female pair formed by joining their genitalia (we added this information to the Methods, lines 599-600). Since the tetrad experiment involves the interaction of two males and two females from different subspecies, we considered that mate choice occurs before any pair forms and, based on our observations of mating behavior, does not necessarily depend on mating duration. For instance, we witnessed avoidance behaviors from females that systematically hid their genitalia and refused to join their abdomen with some males, while being very 'open' to others (but we did not quantify this).

      (32) Line 571: The authors used a black permanent marker to modify wing patterns but did not validate whether butterflies perceive these modifications as equivalent to natural coloration. It is possible that the alterations introduced unintended visual cues and may explain why most males rejected the dummies (line 267). The authors should acknowledge this limitation here.

      We now acknowledge this limitation in the method (lines 638-639) and in the results section (lines 278-283).

      (33) Line 591: Insert 'above' after protocol.

      Corrected.

      (34) Line 605: If the authors included random effects in their model, then it should be generalized linear mixed model (GLMM) and not GLM as they wrote.

      We indeed included a random effect in our model accounting for male ID and trial number, we thus replaced “GLM” by “GLMM” in the manuscript.

      (35) Line 615: This set of analyses does not seem to account for pseudo-replication, as the data were recorded from the same male more than once (Line 583). Please clarify and redo the analysis with the GLMM framework

      We ran new analyses using the GLMM framework: we used a binomial GLMM to test whether individuals preferentially interacted with dummy 1 vs. dummy 2 while accounting for pseudoreplication. The previously detected tendencies hold with these new analyses, except for the visual mate discrimination of M. achilles: we now find statistical evidence that M. achilles tends to approach conspecifics more often during the mate choice experiment, although the signal is weak (lines 297-307). Thus, while we previously concluded that neither species in sympatry (M. helenor and M. achilles) could discriminate conspecific mates, we now emphasize that M. achilles is somewhat sensitive to some visual signals. However, its estimated probability of approaching a conspecific is only 0.54, which is low compared to the estimated probability of approaching (0.61) or touching (0.84) a con-subspecific for M. h. bristowi. We therefore conclude that even though some visual cues could be relevant for mate recognition, they are less reliable for male choice in sympatric populations, where color patterns are more convergent, than in allopatric populations. We updated Figure 4 and Figures S8 and S9, which now show the probability of approaching or touching a conspecific or con-subspecific, with the updated p-values from the GLMM analyses. We also updated the Results (lines 297-307) and the Discussion (lines 430-438) to nuance our previous conclusions.
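      To make the quoted probabilities easier to read against chance, recall that a binomial GLMM works on the logit scale, so a probability such as 0.54 corresponds to a linear predictor (here, a hypothetical intercept-only model) that is back-transformed with the inverse logit. The minimal sketch below assumes such an intercept-only model; "no preference" (p = 0.5) then corresponds to an intercept of 0. With a random effect for male ID, the back-transformed fixed intercept gives the probability for a male whose random effect is zero, not the population-averaged probability. The functions are illustrative helpers, not the authors' code.

```python
from math import exp, log


def logit(p: float) -> float:
    """Log-odds of a probability p in (0, 1)."""
    return log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Probability corresponding to a linear predictor eta on the logit scale."""
    return 1.0 / (1.0 + exp(-eta))


# Hypothetical intercept: the logit of the 0.54 approach probability quoted above.
intercept = logit(0.54)

# An intercept of 0 on the logit scale is exactly chance (p = 0.5).
chance = inv_logit(0.0)

print(f"intercept on logit scale: {intercept:.3f}")
print(f"back-transformed probability: {inv_logit(intercept):.2f}")
print(f"chance level: {chance}")
```

      A modest shift of the intercept away from 0 on the logit scale is what separates 0.54 from chance, which is consistent with describing the M. achilles signal as weak.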

      (36) Line 963: Figure 3D. Is there a particular reason for comparing allopatric populations only within Ecuador rather than between Ecuador and French Guiana for M. helenor? Please clarify.

      We aimed to compare the putative discrimination of blue coloration predicted by visual models with what the butterflies actually discriminate in mate choice experiments. Since we only performed mate choice experiments involving M. h. bristowi x M. h. theodorus (allopatric populations within Ecuador) and M. h. helenor x M. a. achilles (sympatric population from Ecuador), we only examined those comparisons using visual models. We added this clarification at lines 559-560.

      (37) Line 980: Are these predicted probabilities or just mean proportions as written in line 614? Then the label should be changed to 'Proportion of approaches' or something similar.

      Following our answer to recommendation #35, each point in the graph now represents the probability of touching a conspecific for each trial of each male tested. We corrected the figure legend accordingly.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 25: "...therefore facilitating co-existence in sympathy".

      Corrected.

      (2) Line 28: "contrasting" instead of contrasted.

      Corrected.

      (3) Line 33: begin a new sentence at the colon.

      Corrected.

      (4) Line 49: the phrase "habitat filtering" is unclear and should perhaps be defined or qualified.

      We replaced “habitat filtering” with its definition and cited Keddy (1992), who describes community assembly rules and defines habitat filtering (line 46).

      (5) Line 52: remove "even".

      Corrected.

      (6) Line 53: divergent suites may also result because traits are often constrained by genetic architecture (multivariate genetic covariances). This is discussed at length and specifically in relation to ornamental coloration by Kemp et al. 2023

      We rewrote the introduction and focused on only reviewing the ecological interactions promoting trait divergence in sympatric species, and did not mention genetics in this paper.

      (7) Line 87: (and throughout) refer to "colouration" or "colour pattern" rather than "colourations".

      Corrected.

      (8) Line 151: Remove "To do so,".

      Corrected.

      (9) Line 191: I would like to see the degrees of freedom for this test.

      We added the F-statistic (F = 2.09) and the degrees of freedom (df = 1) for this test and for all subsequent tests.

      (10) Line 201: (and throughout) replace "on" with "of".

      Corrected.

      (11) Line 205: modelling the visual properties of the wings allows one to infer what is theoretically visible/distinguishable. The modelling is useful but not necessarily definitive of vision/behaviour per se under different conditions in the wild. I therefore think it is appropriate to phrase the wording around the modelling approach more carefully. Perhaps refer to "theoretical" or "inferred" discriminability, or state (e.g.) that species should/should not be capable of perceiving differences based on the modelling data. You do this well in your wording of lines 207-209. This need not apply in the discussion because you're then dealing with the combination of modelling results and behaviour (mating trials).

      We agree with the reviewer that visual modelling only allows one to infer what the butterflies can theoretically discriminate, and that the wording of our sentence was confusing. We therefore modified the sentence accordingly: “Morpho butterflies and predators can theoretically visually perceive the difference in the blue coloration between different subspecies of M. helenor…… using both bird and Morpho visual models” (lines 206-209).

      (12) Line 222: Either the chi-square test or Fisher's exact test should be sufficient (why report both?)

      The chi-square test relies on large-sample assumptions (expected counts > 5), whereas Fisher's exact test does not and remains valid with small or unbalanced sample sizes. Since the M. h. bristowi female/M. h. theodorus male pairing occurred only 3 times, we do not meet the primary assumptions for a chi-square test, even though it is significant, so we used Fisher's exact test to confirm the result. Finding that both tests are significant shows that the result is robust, although reporting both may appear redundant. To simplify, we removed the chi-square results and now report only Fisher's exact test in the Methods and Results.
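      The reasoning above can be illustrated with a small sketch: compute the expected counts of a 2x2 table under independence, check the "expected count ≥ 5" rule of thumb for the chi-square test, and compute the two-sided Fisher's exact p-value directly from the hypergeometric distribution. The example uses the textbook "lady tasting tea" table [[3, 1], [1, 3]], not the mating data from this study, and the function names are illustrative assumptions rather than the authors' analysis code.

```python
from math import comb


def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[rows[i] * cols[j] / n for j in range(2)] for i in range(2)]


def chi_square_assumption_met(table, minimum=5.0):
    """Rule of thumb: every expected count should be at least `minimum`."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table with fixed margins.

    Sums the hypergeometric probabilities of every table sharing the observed
    margins whose probability is no larger than that of the observed table.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # P(top-left cell == x) given the fixed margins (hypergeometric).
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A tiny relative tolerance so that tables tied with the observed one count.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, total)


small = [[3, 1], [1, 3]]
print(expected_counts(small))            # every expected count is 2, below 5
print(chi_square_assumption_met(small))  # the chi-square approximation is doubtful
print(fisher_exact_two_sided(small))     # exact p-value, no large-sample assumption
```

      Because the exact test conditions on the observed margins, it needs no minimum expected count, which is why it is the appropriate choice when one of the pairings is as rare as 3 occurrences.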

      (13) Line 224 (and throughout): Degrees of freedom should be provided for statistical tests.

      We reported the test statistic and the degrees of freedom for every statistical test mentioned in the main text, except for Fisher's exact test, which, being an exact test, does not rely on an asymptotic distribution such as the chi-squared distribution.

      (14) Lines 266-267: This sentence has interest, but it is rather vague at present. Wouldn't your controls account for the effect of manipulation? This could be explained further.

      During our mate choice experiments, all Morpho female dummies were painted with black markers, either on the dorsal blue band to modify their blue iridescent phenotype or on the ventral side, thus controlling for the effect of manipulation. However, we cannot rule out that modifying the dorsal blue iridescence had a “repulsive” effect on males, for several reasons. For example, depending on how Morphos discriminate darker colors, the painted black band could differ slightly in color from the dark “brown” that usually surrounds their blue iridescent patterns. We now explain this in the Results (lines 278-283) and in the Methods (lines 638-639).

      (15) Line 316: I'm not certain that the similarity is best described as "striking", given a P-value of 0.084 for this contrast

      We agree with the reviewer and removed this adjective for this line.

      (16) Lines 387-390: This sentence is puzzling because, theoretically speaking, we should expect selection on visual preference to be heightened (not relaxed) in sympatry if colouration is included among the traits used in mate selection. I'm not certain I have understood the meaning here.

      We thank the reviewer for pointing out this typo. If shared predation pressures favor convergent evolution of color patterns, visual signals become less reliable for species recognition. As a result, sexual selection on visual preference is heightened and becomes stronger, favoring the evolution of alternative cues used to discriminate conspecific mates. We changed the sentence, which now reads “the convergent evolution of iridescent wing patterns… may have negatively impact visual discrimination and favored the evolution of divergent olfactory cues” (lines 457-458).

      (17) Line 529: Mating experiments. Given that these are quite large butterflies, I wondered whether a 3x3x2m cage would be sufficient in size to allow the expression of male courtship. A brief description of the courtship behaviour in these species or Morphos generally would be a useful addition to the paper.

      A cage of this size was large enough for males to express flight behavior similar to that seen in nature while still being able to see the females (live females or dummies). We tried to perform mate choice experiments in a larger cage (7 m x 5 m x 3 m), but the trials were not conclusive because males often failed to find the dummies, depending on where they were flying in the cage. A 3 m x 3 m x 2 m cage is a good compromise, maximizing interactions while still allowing enough space to fly. We now describe Morpho male and female behavior in the Methods (lines 613-618).

      (18) Line 546: Why are both tests needed (chi-square AND Fisher's exact)?

      As in our answer to recommendation #12, we used both tests to show that the statistical results are robust. To simplify, we now report only the Fisher's exact test results.

    1. eLife Assessment

      This study presents important information about the role of mu opioid receptors in neurotransmission between the medial habenula and the interpeduncular nucleus. The authors provide convincing evidence that mu opioid receptor activation has differential effects on transmission from substance P neurons and cholinergic neurons, and that blockade of potassium channels can unmask a nicotinic cholinergic synaptic response. This work will be of high interest to those studying this brain region, and potentially to the larger neuroscience community studying motivated behavior.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, the authors demonstrate for the first time that opioid signaling has opposing effects on the same target neuron depending on the source of the input. Further, the authors provide evidence to support the role of potassium channels in regulating a brake on glutamatergic and cholinergic signaling, with the latter finding being developmentally regulated and responsive to opioid treatment. This evidence solves a conundrum regarding cholinergic signaling in the interpeduncular nucleus that evaded elucidation for many years.

      Strengths:

      This manuscript provides 3 novel and important findings that significantly advance our understanding of the medial habenula-interpeduncular circuitry:

      (1) Mu opioid receptor activation (mOR) reduces postsynaptic glutamatergic currents elicited from substance P neurons while simultaneously enhancing postsynaptic glutamatergic currents from cholinergic neurons, with the latter being developmentally regulated.

      (2) Substance P neurons from the Mhb provide functional input to the rostral nucleus of the IPN, in addition to the previously characterized lateral nuclei.

      (3) Potassium channels (Kv1.2) provide a brake on neurotransmission in the IPN.

      The findings here suggest that the authors have identified a novel mechanism for the normal function of neurotransmission in the IPN, so it would be expected to be observable in almost any animal. In the revised manuscript, the authors put forth significant effort to increase the n, thus increasing the confidence in the observations.

      There are also significant sex differences in nAChR expression in the IPN that might not be functionally apparent using the low n presented here. In the revised manuscript, the authors increased the n, and provided data to the reviewers that no significant sex differences were apparent, although there was a trend. Future studies should examine sex differences in detail.

      There are also some particularly novel observations that are presented but not followed up on, and this creates a somewhat disjointed story. For example, in Figure 2, the authors identify neurons in which no response is elicited by light stimulation of ChAT-neurons, but application of DAMGO (mOR agonist) un-silences these neurons. Are there baseline differences in the electrophysiological or morphological properties of these "silent" neurons compared to the responsive neurons? In the revised manuscript, the authors directly tested this with new experiments in SST+ neurons in the IPN, demonstrating convincingly that mOR activation unsilences these neurons.

      With the revisions, the authors have addressed the reviewers' concerns and significantly improved the manuscript. I find no further weaknesses.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, Chittajallu and colleagues present compelling evidence that mu opioid receptor (MOR) activation can potentiate synaptic neurotransmission in a medial habenula to interpeduncular nucleus (mHb-IPN) subcircuit. While projections from mHb tachykinin 1 (Tac1) neurons onto lateral IPN neurons show a canonical opioid-induced depression of glutamate release, excitatory neurotransmission in mHb choline acetyltransferase (ChAT) projections to the rostral IPN is potentiated by opioids. This function emerges around age P27 in mice, when MOR expression in the IPN peaks.

      Strengths:

      Carefully executed electrophysiological experiments with appropriate controls. Interesting description of a neurodevelopmental change in the effects of opioids on mHb-IPN signaling.

      Weaknesses:

      A minor concern is that the genetic strategy used to target the mHb-IPN pathway (constitutive ChR2 expression in all ChAT+ and Tac1+ neurons) might not specifically target this projection. Future studies are needed to examine the precise mechanism whereby MOR signaling can potentiate glutamatergic neurotransmission in ChAT+ MHb-IPN projections.

    4. Reviewer #3 (Public review):

      Summary:

      Here the authors describe the role of mORs in synaptic glutamate release from substance P and cholinergic neurons in the medial habenula to interpeduncular nucleus (IPN) circuit in adult mice. They show that mOR activation reduces evoked glutamate release from substance P neurons yet increases evoked glutamate release and ACh release from cholinergic neurons. Unlike glutamate release, ACh release is only detected when potassium channels are blocked with 4-AP or dendrotoxin. The authors also report a previously unidentified glutamatergic input to the IPR from SP neurons and describe the developmental timing of mOR facilitation in adolescent mice.

      Strengths:

      - The experiments provide new insight into the role of mORs in controlling evoked glutamate release in a circuit with high levels of mORs and established roles in relevant behaviors.

      - The experiments are rigorous, and the results are clear cut. The conclusions are supported by the data.

      - The findings will be of interest to those working in the field of synaptic transmission and those interested in the function of the medial habenula or interpeduncular nucleus, as well as those seeking to understand the role of opioids on normal and pathological behaviors.

      Weaknesses:

      - The mechanistic underpinnings of these interesting and novel results are not pursued.

    1. eLife Assessment

      This important study elucidates the role of the exocyst component EXOC6A at distinct stages of ciliogenesis, which advances our understanding of ciliary membrane remodeling and cilium formation. The authors provide solid evidence that EXOC6A interacts with myosin-Va and is dynamically recruited via dynein-, microtubule-, and actin-dependent mechanisms, to support proper formation of the ciliary membrane. The study will be of interest to cell biologists and other researchers interested in vesicular trafficking, organellar membrane dynamics, and ciliogenesis.

    2. Reviewer #1 (Public review):

      Summary:

      The study by Lin et al. studies the role of EXOC6A in ciliogenesis and its relationship with the interactor myosin-Va using a range of approaches based on the RPE1 cell line model. They establish its spatio-temporal organization at centrioles, the forming ciliary vesicle and ciliary sheath using ExM, various super-resolution techniques, and EM, including correlative light and electron microscopy. They also perform live imaging analyses and functional studies using RNAi and knockout. They establish a role of EXOC6A together with myosin-Va in Golgi-derived, microtubule- and actin-based vesicle trafficking to and from the ciliary vesicle and sheath membranes. Defects in these functions impair robust ciliary shaft and axoneme formation due to defective transition zone assembly.

      Strengths:

      The study provides very high-quality data that support the conclusions. In particular, the imaging data is compelling. It also integrates all findings in a model that shows how EXOC6A participates in multiple stages of ciliogenesis and how it cooperates with other factors.

      Weaknesses:

      The precise role of EXOC6A remains somewhat unclear. While it is described as a component of the exocyst, the authors do not address its molecular functions and whether it indeed works as part of the exocyst complex during ciliogenesis.

    3. Reviewer #2 (Public review):

      Summary:

      The molecular mechanisms underlying ciliogenesis are not well understood. Previously, work from the same group (Wu et al., 2018) identified myosin-Va as an important protein in transporting preciliary vesicles to the mother centriole, allowing for initiation of ciliogenesis. The exocyst complex has previously been implicated in ciliogenesis and protein trafficking to cilia. Here, Lin et al. investigate the role of exocyst complex protein EXOC6A in cilia formation. The authors find that EXOC6A localizes to preciliary vesicles, ciliary vesicles, and the ciliary sheath. EXOC6A colocalizes with Myo-Va in the ciliary vesicle and the ciliary sheath, and both proteins are removed from fully assembled cilia. EXOC6A is not required for Myo-Va localization, but Myo-Va and EHD1 are required for EXOC6A to localize in ciliary vesicles. The authors propose that EXOC6A vesicles continually remodel the cilium: FRAP analysis demonstrates that EXOC6A is a dynamic protein, and live imaging shows that EXOC6A vesicles fuse with and bud off from the ciliary membrane. Loss of EXOC6A reduces, but does not eliminate, the number of cilia formed in cells. Any cilia that are still present are structurally abnormal, with either bent morphologies or the absence of some transition zone proteins. Overall, the analyses and imaging are well done, and the conclusions are well supported by the data. The work will be of interest to cell biologists, especially those interested in centrosomes and cilia.

      Strengths:

      The TEM micrographs are of excellent quality. The quality of the imaging overall is very good, especially considering that these are dynamic processes occurring in a small region of the cell. The data analysis is well done and the quantifications are very helpful. The manuscript is well-written and the final figure is especially helpful in understanding the model.

      Weaknesses:

      Additional information about the functional and mechanistic roles of EXOC6A would improve the manuscript greatly.

    4. Reviewer #3 (Public review):

      Summary:

      Lin et al. report on the dynamic localization of EXOC6A and Myo-Va at pre-ciliary vesicles, ciliary vesicles, and the ciliary sheath membrane during ciliogenesis using three-dimensional structured illumination microscopy and ultrastructure expansion microscopy. The authors further confirm the interaction of EXOC6A and Myo-Va by co-immunoprecipitation experiments and demonstrate that EHD1 is required for the formation of EXOC6A-labeled ciliary vesicles. Additional experiments using siRNA-mediated gene silencing and pharmacological tools identified dynein-, microtubule-, and actin-dependent mechanisms in the transport of EXOC6A-labeled vesicles to the centriole, as the authors previously reported for Myo-Va. Notably, loss of EXOC6A severely disrupts ciliogenesis, with the majority of cells arrested at the ciliary vesicle (CV) stage, highlighting the involvement of EXOC6A at later stages of ciliogenesis. The observed dynamic release and fusion of EXOC6A-positive vesicles with the ciliary sheath suggests a role in delivering membrane, and potentially membrane proteins, to the growing cilium past the ciliary vesicle stage. While CEP290 localization at the forming cilium appears normal, the recruitment of other transition zone components, exemplified by several MKS and NPHP module components, was impaired in EXOC6A-deficient cells.

      Strengths:

      (1) By applying different microscopy approaches, the study provides deeper insight into the spatial and temporal localization of EXOC6A and Myo-Va during ciliogenesis.

      (2) The combination of complementary siRNA and pharmacological tools targeting different components strengthens the conclusions.

      (3) This study reveals a new function of EXOC6A in delivering membrane and membrane proteins during ciliogenesis, both to the ciliary vesicle as well as to the ciliary sheath.

      (4) The overall data quality is high. The investigation of EXOC6A at different time points during ciliogenesis is well schematized and explained.

      Weaknesses:

      (1) Since many conclusions are based on EXOC6A immunostaining, it would strengthen the study to validate antibody specificity by demonstrating the absence of staining in EXOC6A-deficient cells.

      (2) While the authors generated an EXOC6A-deficient cell line, off-target effects can be clone-specific. Validating key experiments in a second independent knockout clone or rescuing the phenotype of the existing clone by re-expressing EXOC6A would ensure that the observed phenotypes are due to EXOC6A loss rather than unintended off-target effects.

      (3) Some experimental details are lacking from the materials and methods section. No information on how the co-immunoprecipitation experiments have been performed can be found. The concentrations of pharmacological agents should be provided to allow proper interpretation of the results, as higher or lower doses can produce nonspecific effects. For example, the concentrations of ciliobrevin and nocodazole used to treat RPE1 cells are not specified and should be included. More precise settings for the FRAP experiments would help others reproduce the presented data. Some details for the siRNA-based knockdowns, such as incubation times, can only be found in the figure legends.

      Taken together, the authors achieved their goal of elucidating the role of EXOC6A in ciliogenesis, demonstrating its involvement in vesicle trafficking and membrane remodeling in both early and late stages of ciliogenesis. Their findings are supported by experimental evidence. This work is likely to have an impact on the field by expanding our understanding of the molecular machinery underlying cilia biogenesis, particularly the coordination between the exocyst complex and cytoskeletal transport systems. The methods and data presented offer valuable tools for dissecting vesicle dynamics and cilium formation, providing a foundation for future research into ciliary dysfunction and related diseases. By connecting vesicle trafficking to structural maturation of an organelle, the study adds important context to the broader description of cellular architecture and organelle biogenesis.

    1. eLife Assessment

      This valuable study investigates the role of HIF1a signaling in epicardial activation and neonatal heart regeneration in mice. Using a combination of genetic and pharmacological approaches, the authors demonstrate that stabilization of HIF1a enhances epicardial activation and extends the regenerative capacity of the heart beyond the typical neonatal window following myocardial infarction. The main conclusion is well supported by solid data, although some minor concerns regarding experimental interpretation require further clarification to ensure accuracy.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts.

      Strengths:

      The study presents convincing genetic and pharmacological evidence that hypoxia signaling enhances the regenerative potential of the epicardium.

      Weaknesses:

      The major weakness remains the lack of convincing evidence demonstrating the role of hypoxia signaling in EMT modulation in epicardial cells. The authors claimed that the EMT assays adopted in this study are based on similar previous studies. Surprisingly, two of the references provided come from their own research group (PMID: 17108969, PMID: 19235142), limiting the credit for such claims, while the other two (PMID: 27023710, PMID: 12297106) report assessment of cell migration but not EMT. Thus, EMT remains to be convincingly demonstrated.

    3. Reviewer #2 (Public review):

      Summary:

      In this study, Gamen et al. investigated the roles of hypoxia and HIF1a signaling in regulating epicardial function during cardiac development and neonatal heart regeneration. The authors identified hypoxic regions in the epicardium during development and demonstrated that genetic and pharmacological stabilization of HIF1a during neonatal heart injury prolonged epicardial activation, preserved myocardium, enhanced infarct resolution, and maintained cardiac function beyond the normal postnatal regenerative window.

      Strengths:

      HIF1a signaling was manipulated in an epicardium-specific manner using appropriate genetic tools.

      Weaknesses:

      Some conclusions still need clarification.

      Comments on revisions:

      (1) The authors' comment on the partial overlap of HP1 and HIF1a IF signals (HIF1a is highly unstable ... broader regions of hypoxia) is reasonable and would help readers interpret the data if included in the text describing Fig. 1.

      (2) The conclusion regarding WT1+ cells in Fig. 2a and b remains unclear. Both panels display larger and smaller magenta cells, and when all are taken into account, the overall number does not appear substantially different. Additional clarification is needed on how the quantification was performed.

      (3) Regarding Figure 6-figure supplement 1c, it seems difficult to conclude the endothelial identity of WT1+ cells based on EMCN staining, as the markers do not overlap. The authors note that WT1 is upregulated in endothelial cells, but this has been reported in the context of injury, which differs from the context of the present study involving Molidustat.

    4. Reviewer #3 (Public review):

      Summary:

      The authors' aim here was to understand the role of hypoxia and the hypoxia-induced transcription factor Hif-1a in the epicardium. The authors noted that hypoxia was prevalent in the embryonic heart and persisted into neonatal stages until postnatal day 7 (P7). Hypoxic regions were found in the outer layer of the heart, and expression of Hif-1a coincided with the epicardial gene WT1. It has been documented that from P7 the mouse heart cannot regenerate after myocardial infarction, and the authors speculated that the change in epicardial hypoxic conditions could play a role in regeneration. The authors then used genetic and pharmacological tools to increase the activity of Hif genes in the heart and noted a significant improvement in cardiac function when Hif-1a was active in the epicardium. The authors speculated that the presence of Hif-1a improved cell survival.

      Strengths:

      A focus on hypoxia and its effects on the epicardium in development and after myocardial infarction. This study outlines a potential to extend the regenerative time window in neonatal mammalian hearts.

      Weaknesses:

      While the observations of improved cardiac function are clear, the exact mechanism by which increased Hif-1a activity causes these effects is not fully revealed. The authors mention improved myocardial survival but do not include studies to demonstrate it.

      There is an indication that fibrosis is decreased in hearts where Hif activity is prolonged, but there are no studies to link hypoxia and fibrosis.

      Comments on revisions:

      In the manuscript revision, the authors address my comments. They explain that differences between genetic disruption of Phd2 and chemical inactivation could be due to the dosing and half-life of Molidustat. The other comments are addressed by explaining that they analyzed enough heart sections and hearts to reach their conclusions. The authors also state that they cannot generate additional samples for this study; therefore, I accept their conclusions as stated.

    5. Author response:

      The following is the authors’ response to the original reviews

      eLife Assessment

      This valuable study investigates the role of HIF1a signalling in epicardial activation and neonatal heart regeneration in mice. Through a combination of genetic and pharmacological approaches, the authors show that stabilization of HIF1a enhances epicardial activation and extends the regenerative capacity of the heart beyond the typical neonatal window following myocardial infarction (MI). However, several aspects of the study remain incomplete and would benefit from further clarification and additional experimental support to solidify the conclusions.

      We reveal herein prolonged epicardial activation following myocardial infarction (MI) beyond post-natal days 1-7 (P1-P7) by genetic or pharmacological stabilisation of HIF-signalling. This extends the so-called “regenerative window” into the stage at which the heart normally mounts an adult-like response to injury, leading to increased survival of myocardium and functional improvement of the heart, even against a backdrop of persistent, albeit reduced, fibrosis. The epicardium is known to enhance cardiomyocyte proliferation and myocardial growth during heart development via trophic growth factor (for example, IGF-1, FGF, VEGF, TGFβ and BMP) signalling (reviewed in PMID: 29592950), and epicardium-derived cell-conditioned medium reduces infarct size and improves heart function (PMID: 21505261). Further experiments, beyond the scope of the current study, are required to determine whether activated neonatal epicardium elicits similar paracrine support to sustain the myocardium and heart function after injury beyond P7 into adulthood.

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts.

      Strengths:

      The study presents convincing genetic and pharmacological approaches to the role of hypoxia signaling in enhancing the regenerative potential of the epicardium.

      Weaknesses:

      The major weakness is the lack of convincing evidence demonstrating the role of hypoxia signaling in EMT modulation in epicardial cells. Additionally, novel experimental approaches should be performed to allow for the translation of these findings to the clinical arena.

      We respectfully disagree that we have not convincingly demonstrated a role for HIF-signalling in promoting epicardial EMT. We adopt epicardial explant assays utilising a well-characterised ex vivo protocol previously described for studying EMT in embryonic, neonatal and adult epicardium (PMID: 27023710; PMID: 12297106; PMID: 17108969; PMID: 19235142). These assays demonstrate that WT1<sup>CreERT2</sup>;Phd2<sup>fl/fl</sup> explants exhibit an enhanced cobblestone-to-spindle-like change in cell morphology, increased cell migration, the appearance of stress fibres and up-regulation of the mesenchymal marker alpha-smooth muscle actin (αSMA); all parameters associated with EMT. In addition, our in vivo analyses of Wt1<sup>CreERT2</sup>;Phd2<sup>fl/fl</sup> hearts in response to neonatal injury reveal elevated numbers of WT1+ epicardial cells within the sub-epicardial region and underlying myocardium, consistent with active EMT and subsequent migration from the epicardium.

      Reviewer #2 (Public review):

      Summary:

      In this study, Gamen et al. investigated the roles of hypoxia and HIF1a signaling in regulating epicardial function during cardiac development and neonatal heart regeneration. They found that WT1<sup>+</sup> epicardial cells become hypoxic and begin expressing HIF1a from mid-gestation onward. During development, epicardial HIF1a signaling regulates WT1 expression and promotes coronary vasculature formation. In the postnatal heart, genetic and pharmacological upregulation of HIF1a sustained epicardial activation and improved regenerative outcomes.

      Strengths:

      HIF1a signaling was manipulated in an epicardium-specific manner using appropriate genetic tools.

      Weaknesses:

      There appears to be a discrepancy between some of the conclusions and the provided histological data. Additionally, the study does not offer mechanistic insight into the functional recovery observed.

      We respectfully disagree with the comment that our histological data do not support our conclusions and expand on this in our responses to the specific reviewer comments. We agree that further mechanistic experiments, beyond the scope of the current study, are required to identify precisely how activated neonatal epicardium results in increased healthy myocardium after injury beyond post-natal day 7 (P7).

      Reviewer #3 (Public review):

      Summary:

      The authors' research here was to understand the role of hypoxia and hypoxia-induced transcription factor Hif-1a in the epicardium. The authors noted that hypoxia was prevalent in the embryonic heart, and this persisted into neonatal stages until postnatal day 7 (P7). Hypoxic regions in the heart were noted in the outer layer of the heart, and expression of Hif-1a coincided with the epicardial gene WT1. It has been documented that at P7, the mouse heart cannot regenerate after myocardial infarction, and the authors speculated that the change in epicardial hypoxic conditions could play a role in regeneration. The authors then used genetic and pharmacological tools to increase the activity of Hif genes in the heart and noted that there was a significant improvement in cardiac function when Hif-1a was active in the epicardium. The authors speculated that the presence of Hif-1a improved cell survival.

      Strengths:

      A focus on hypoxia and its effects on the epicardium in development and after myocardial infarction. This study outlines the potential to extend the regenerative time window in neonatal mammalian hearts.

      We thank the reviewer for this positive endorsement and recognition of the importance of mechanistic insight into how to extend the window of neonatal heart regeneration.

      Weaknesses:

      While the observations of improved cardiac function are clear, the exact mechanism of how increased Hif-1a activity causes these effects is not completely revealed. The authors mention improved myocardium survival, but do not include studies to demonstrate this.

      We report an increase in healthy myocardium arising from prolonged activation of the epicardium during the neonatal window and following injury at post-natal day 7 (P7). We speculate that this recapitulates the role of the epicardium during heart development, when it is known to be a source of trophic growth factors that can enhance myocardial growth. Further experiments, beyond the scope of this study, are required to define a mechanistic link between HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      There is an indication that fibrosis is decreased in hearts where Hif activity is prolonged, but there are no studies to link hypoxia and fibrosis.

      We believe the decreased fibrosis is a natural consequence of the increase in surviving myocardium arising from the activated epicardium. There is strong precedent for this following injury at post-natal day 1 (P1), in which fibrosis is evident early on but resolves over time with growth of the myocardium in the regenerating heart (PMID: 23248315).

      Recommendations for the authors:

      Reviewing Editor Comments:

      (1) Address issues related to image quality, colocalization, sample labeling, appropriate controls, and quantification - particularly in Figures 1, 2, 6, and Supplementary Figure 9. Increase sample size as noted by reviewers.

      The issues of co-localisation and sample labelling have been addressed under response to reviewers. We are unable to increase sample numbers but have clarified the number of regions per section and numbers of sections per heart analysed where appropriate.

      (2) Clarify the effects of epicardial HIF1a activation on neovascularization.

      We have removed the reference in the abstract to an effect on neovascularisation.

      (3) Extend assessments of epicardial hypoxia and HIF1a expression to earlier embryonic stages, when epicardial EMT is more active.

      Our earliest timepoint, E12.5, marks the onset of epicardial EMT, and E13.5 is the stage with the most significant mobilisation of epicardium-derived cells (EPDCs) into the sub-epicardial region and underlying myocardium (PMID: 32359445). In the same study, epicardial cells lineage-traced at E11.5 are restricted to the outer layer of the heart; thus, our timepoints are representative in capturing both the onset and progression of in vivo EMT.

      (4) Strengthen EMT assays and mechanistic modeling. Provide evidence from physiologically relevant models, as current 2D culture assays do not adequately support conclusions about EMT. Include additional EMT markers and quantification where appropriate.

      We respectfully disagree that epicardial explants are not a valid assay for assessing EMT. As noted in our responses to reviewers, such primary explants have been widely described elsewhere (PMID: 27023710; PMID: 12297106; PMID: 17108969; PMID: 19235142) and enable documentation of multiple parameters associated with active EMT, including the extent of cell migration, cobblestone (epithelial) to spindle-like (mesenchymal) cell morphologies, stress fibre formation and expression of alpha-smooth muscle actin as a mesenchymal marker. We support our explant findings in vivo in two ways. First, WT1<sup>CreERT2/+</sup>;Hif1a<sup>fl/fl</sup> embryonic hearts show reduced WT1+ epicardium-derived cells (EPDCs) in the sub-epicardial region and underlying myocardium (data in Figure 2), indicative of impaired epicardial EMT and EPDC migration. Second, following neonatal MI with pharmacological inhibition of PHD2, we observe the reciprocal phenotype of increased numbers of epicardium-derived cells emerging from the outer epicardial layer (data in Figure 6).

      (5) Strengthen mechanistic insights into the role of epicardial cells in the functional recovery observed in MI hearts.

      We agree that further experiments, beyond the scope of this study, are required to define a mechanistic link between HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      Reviewer #1 (Recommendations for the authors):

      The manuscript by Gamen et al. analyzed the functional role of HIF signaling in the epicardium, providing evidence that stabilization of the hypoxia signaling pathway might contribute to neonatal heart regeneration. By generating different conditional mouse mutants and performing pharmacological interventions, the authors demonstrate that stabilizing HIF signaling enhances cardiac regeneration after MI in P7 neonatal hearts. The study is potentially interesting, but it presents several major caveats.

      (1) One of the critical points reported in the early stages of this study is the early co-localization of Wt1, the hypoxia reporter (HP1), and the master regulators of the HIF signaling pathway (i.e., HIF1a and HIF1b) during embryonic development. Figure 1 is meant to report these findings. Unfortunately, I hardly see any co-localization at all of HP1 in the Wt1+ epicardial cells, and only some co-localization for HIF1α and HIF2α, and none of these data are quantified. Thus, such co-localization is hard to accept.

      We respectfully disagree with this comment. We highlight cells in Figure 1 that are co-stained for WT1 and HP1. In addition, we identify HIF-1α- and HIF-2α-positive cells, which reside within the epicardium (the outer cell layer) or within the underlying sub-epicardial region, respectively.

      (2) The authors claimed that they analyzed the expression of the hypoxia reporter, as well as Wt1 and the master regulators of the HIF signaling pathway (i.e., HIF1a and HIF1b), in the AV groove compared with the apex in embryonic hearts ranging from E12.5 to E18.5 (Figure 1). Unfortunately, all images labeled as AV groove are rather misleading. They do not represent the AV groove but part of the right ventricular free wall. If the authors want to refer to the AV groove, the AV cushions should be visible underneath.

      We have removed specific reference to the AV groove and refer to the highlighted regions as the “Base” of the heart.

      (3) The authors analyzed the hypoxic condition of the developing heart from E12.5 to E18.5. However, it remains unclear why the authors only explored hypoxic conditions from E12.5 onwards, since epicardial EMT mainly occurs earlier, i.e., from E10.5 onwards. Therefore, hypoxia should also be explored at this earlier time point.

      We respectfully disagree with the reviewer and refer to the comment above regarding the fact that E12.5 marks the onset of epicardial EMT and E13.5 is the stage with the most significant mobilisation of epicardium-derived cells (EPDCs) into the sub-epicardial region and underlying myocardium (PMID: 32359445).

      (4) The authors reported a conditional mouse model of HIF1alpha deletion using the Wt1CreERT2 driver. Curiously, Wt1 is dependent on hypoxia signaling (i.e., HIF1a). Therefore, it is unclear whether a negative feedback loop between the deletion of Hif1alpha and the activation of the Cre driver might have functional consequences. Convincing evidence should be provided that such crosstalk does not interfere with Hif1alpha inactivation, and appropriate controls should therefore be run in parallel.

      We discount a negative feedback loop in this instance because we have utilised heterozygous mice for the WT1<sup>CreERT2/+</sup> line and observe a consistent and reproducible phenotype both in developing hearts on a Wt1<sup>CreERT2/+</sup>;Hif1a<sup>fl/fl</sup> background and following injury in Wt1<sup>CreERT2/+</sup>;Phd2<sup>fl/fl</sup> mice. Collectively, this indicates that the WT1-CreERT2 driver is active in the context of diminishing HIF-1α and Phd2, respectively. In addition, we have carried out parallel experiments using epicardial explants derived from R26R-CreERT2;Phd2<sup>fl/fl</sup> mice (Figure 3) to circumvent any potential confounding issues; the results are consistent with increased epicardial EMT, in support of our overall hypothesis.

      (5) In Figure 2a-f, the authors report that epicardial cells are diminished in Wt1CreERT2Hif1alpha mice compared with controls. I am very sorry, but I do not see any difference. Furthermore, it is unclear to me how the authors quantified these differences, i.e., which marker signal did they use and how was the quantification performed (Figure 2c and d)?

      We respectfully disagree with the reviewer and draw attention to the single-channel panels of WT1+ staining in Figure 2, which show clear differences in the numbers of epicardial cells in the mutant mice compared with controls (compare the magenta cells in panel a versus panel b). Quantification was carried out for the numbers of WT1+ cells residing within the PDPN-positive epicardium (and underlying PDPN-negative myocardium) across multiple images from multiple sections and multiple hearts.

      (6) In Figure 2g, the authors report differences in total vessel length. Are they referring to impaired microvasculature development, or does this analysis also include the major coronary vessels? Are the major coronary vessels and trees affected?

      This analysis refers to the microvasculature and not the major coronary arteries or coronary trees.

      (7) The authors reported that there might be some differences in EMT markers, but unfortunately, all of them were analyzed in 2D cultures, where no substrate for EMT is present, i.e., an underlying ECM bed. Thus, the authors cannot claim that EMT is altered. Additional experiments using a collagen substrate and/or Matrigel are required to fully demonstrate that EMT is impaired. Furthermore, quantitative analyses of these differences should be provided.

      The 2D cultures are epicardial explants from mutant versus wild-type hearts and represent a widely adopted, previously published ex vivo assay for investigating epicardial EMT across embryonic to adult stages (PMID: 27023710; PMID: 12297106; PMID: 17108969; PMID: 19235142), including assessment of the extent of migration, cobblestone (epithelial) to spindle-like (mesenchymal) cell morphologies, stress fibre formation and expression of alpha-smooth muscle actin as a mesenchymal marker. We do not understand the comment regarding an “underlying ECM bed”, as the cells routinely undergo EMT on tissue culture plastic and deposit their own ECM during the culture time course and in response to EMT/cell migration. Quantification was carried out for scratch assay experiments, as a proxy for EMT and emergent mesenchymal cell migration, and is presented in Figure 3i, j, showing significantly enhanced scratch closure and cell migration following Molidustat treatment.

      (8) The description of the data provided in Supplementary Figure 5 is spurious and should be removed. A note in the discussion might be sufficient.

      We respectfully disagree. The ChIP-seq data, in what is now Figure 2-figure supplement 3, highlight a HIF-1α binding site within the Wt1 locus, suggesting putative upstream regulation of WT1 by HIF-1α. This provides a potential explanation of how HIF-1α may activate the epicardium through up-regulation of Wt1/WT1.

      (9) In Figure 3, the authors further illustrate the change in EMT markers using ex vivo cardiac explants. They reported increased expression of Snai2 that, although statistically significant, is most likely of no biological relevance (an increase of only 20% at the transcript level). What about Snai1, Prrx1, and other EMT promoters? Are they also induced? As previously stated, these 2D cultures do not provide supporting evidence that EMT is occurring; thus, 3D gel assays should be performed, in which Z-axis analyses would provide evidence of the different migratory behaviour of these cells.

      We respectfully suggest that a 20% change in Snai2 expression is biologically meaningful with respect to EMT. This in turn is supported by associated cell migration, reduced ZO-1 expression, increased stress fibres and increased alpha-SMA as a mesenchymal marker; all properties associated with active EMT. Other suggested markers, for example Snai1, have not been validated as formally required for EMT (PMID: 23097346). The migratory capacity of targeted versus epicardial cells was assessed by combined explant and scratch assay experiments.

      (10) The description of the single-cell analyses is very incomplete. Which mice were used for these analyses: wild-type controls or hypoxic mice? Please provide a clearer description of the samples used. Additionally, the entire rationale of these analyses is dubious. Performing single-cell analyses to examine two or three markers in a very small cell population is rather ridiculous. qPCR, or bulk RNA-seq of isolated epicardial cells, might be far more appropriate and convincing.

      The single-cell analyses represent an unbiased assessment of different pathways in epicardial cells (identified bioinformatically) between intact P1 and P7 stages in wild-type (control) hearts, with a focus on hypoxia-related gene expression and HIF-dependent pathways. They were not designed to analyse a small number of genes, but rather global differences in the hypoxic states of P1 and P7 hearts. Selected genes (Vegfa, Pdk3, Egln1 (Phd2)) were analysed to highlight the key differences in hypoxic signalling across the regenerative window. The fact that the hearts were uninjured/intact is clarified in the text and in the legends for Figure 4 and what is now Figure 4-figure supplement 1.

      (11) The analyses provided in Figure 5 are very interesting, and their findings are very relevant. However, I think the complementary experimental approach should also be performed, i.e., MI followed by activation with tamoxifen, since that situation would more closely resemble the clinical setting.

      Tamoxifen causes respiratory failure in neonates with MI, so the two cannot be combined at the same time or soon after surgery. Moreover, tamoxifen takes significant time to take effect on targeted gene down-regulation which may negate sufficient activation of the epicardium following injury.

      The experiments in Figure 5 were designed to demonstrate that prolonged heart regeneration could be elicited in a cell-specific (epicardial-specific) manner via a genetic approach. The pharmacological experiments in Figure 6 are complementary in this regard by demonstrating equivalent effects with drug (Molidustat) delivery to reduce PHD2 and stabilise HIF post-MI.

      (12) In Figure 6, expression of Wt1 is highly prominent in P7 controls, mainly restricted to the epicardial lining, whereas in the experimental setting Wt1 expression is broadly distributed in the subepicardial space, nicely demonstrating epicardial activation. However, it is very surprising to see such Wt1 expression in controls, which is not expected based on the data reported in Figure 4g. Could the authors please reconcile these findings?

      Figure 6 represents the injury setting and Figure 4g the intact setting (as clarified above, in the text and in the revised figure legends). Hence, in the latter, WT1 expression is significantly reduced in the P7 heart, as anticipated. With injury at P7, we anticipate activation of WT1 in control hearts, albeit restricted to the epicardial layer (as occurs in adult hearts, PMID: 21505261). In contrast, following Molidustat treatment of P7 hearts post-MI, we observe extensive epicardial expansion into the sub-epicardial region and EPDC migration into the underlying myocardium (Figure 6b).

      Reviewer #2 (Recommendations for the authors):

      The role of hypoxia and HIF1a signaling in epicardial activation is an important topic, and the genetic approaches employed in this study are appropriate. However, several aspects of the study remain unclear and would benefit from further clarification or explanation by the authors:

      (1) The authors detected hypoxic regions using an anti-pimonidazole fluorescence-conjugated monoclonal antibody (HP1). The data would become more compelling if negative and positive controls were provided.

      We believe the HP1 staining is compelling in the images shown and is consistent with hypoxic regions of the developing heart. We reveal HP1 staining at cellular resolution with neighbouring cells positive and negative for the HP1 signal in the apex of the heart and within the epicardium and sub-epicardial regions at E12.5 (Figure 1a) and diminished/altered hypoxic/HP1 regional signal through subsequent developmental stages at E14.5-18.5 (Figure 1a-d).

      (2) Many HIF1a-positive cells in the AV groove region do not appear to overlap with HP1 staining (Figure 1a). Providing a low-magnification image of HIF1α expression would help to better assess the extent of overlap with HP1 staining.

      HIF-1 is highly unstable and hence detection of HIF-1+ cells will likely only sample of cells compared to HP1 which is a surrogate for broader regions of hypoxia.

      (3) Although the authors conclude that epicardial HIF1a deletion results in a significant reduction of WT1⁺ cells in both the epicardium and myocardium (Figure 2a-d), the provided images are not sufficiently clear to fully support this interpretation. Providing additional evidence to support this conclusion would be helpful.

      We respectfully disagree with the reviewer and draw attention to the single channel panels of WT1+ staining which show clear differences between numbers of epicardial cells in the mutant mice compared to controls (Figure 2a versus 2b; magenta WT1+ staining).

      (4) Similar to the point raised above, the authors' conclusion regarding the increased expression of WT1 following Molidustat treatment does not appear to be fully supported by the provided images (Figure 6b-f). Immunofluorescence staining for WT1 does not clearly demonstrate epicardial expression in the remote zone of either the control or Molidustat-treated hearts. In addition, while an increase of WT1<sup>+</sup> cells is observed in the infarct zone of the Molidustat-treated heart, it is somewhat unexpected that such expansion is not evident in the corresponding region of the control heart, given that epicardial cells typically expand near the infarct area. Clarification on these points would be helpful.

      Figure 6b reveals WT1 expression in controls (upper panel set) that is reactivated proximal to the infarct region (WT1 is not expressed in uninjured adult epicardium) but remains restricted to the epicardial layer, as occurs in injured adult mouse hearts (PMID: 21505261). This contrasts with the Molidustat-treated P7 hearts post-MI, in which we observe epicardial expansion and migration of WT1+ cells into the underlying myocardium (Figure 6b, lower panel set, infarct zone).

      (5) The authors conclude that WT1<sup>+</sup> cells in the myocardial tissue exhibit endothelial identity based on the colocalization of WT1 and EMCN signals (Supplementary Figure 9c). However, this interpretation is difficult to assess, as WT1 is a nuclear marker and EMCN is a membrane protein, which makes precise colocalization challenging to confirm with confidence. Additional supporting evidence may be necessary to substantiate this conclusion.

      WT1 is known to be upregulated in endothelial cells in response to injury, as shown previously in several studies (for example, PMID: 25681586). Here we show clear co-localisation of nuclear WT1 and cytoplasmic Endomucin (EMCN) in what is now Figure 6-figure supplement 1c, and we encourage the reviewer and readers to magnify the image by zooming in on the relevant co-stained panel.

      (6) The authors conclude that activation of epicardial HIF1a signaling has no effect on neovascularization in postnatal MI hearts (Figure 5c). However, the abstract states: "Finally, a combination of genetic and pharmacological stabilisation of HIF ... increased vascularisation, augmented infarct resolution and preserved function beyond the 7-day regenerative window" (Lines 38-41). Clarification regarding this apparent discrepancy would be appreciated.

      The abstract has been altered to remove the statement of increased vascularisation.

      (7) The study appears somewhat incomplete, as it lacks mechanistic insight into the functional recovery observed following epicardial Phd2 deletion and Molidustat treatment in postnatal MI hearts. Although the authors suggest a potential paracrine role of the epicardium in protecting cardiomyocytes from apoptosis, this hypothesis has not been experimentally addressed. Incorporating such analysis would help to reinforce the study's conclusions.

      Further experiments, beyond the scope of this study, are required to define a mechanistic link between the genetic or pharmacological stabilisation of HIF-signalling, epicardial activation and myocardial survival in the setting of prolonged neonatal heart regeneration.

      Other points:

      (1) Providing single-channel images for Figures 1a-d and 6g would be helpful for clarity and interpretation.

      We believe the combined-channel views of co-staining for two markers, on a background of DAPI staining to pinpoint cell nuclei, are informative and support our conclusions.

      (2) Have the authors considered using AngioTool to quantify the number of vessels in Figure 5b-c?

      AngioTool™ was used to quantify the vessels, as we have done previously (PMID: 33462113), and this is now stated in the methods and in the legend of Figure 2.

      Reviewer #3 (Recommendations for the authors):

      There are several areas where the manuscript can be improved, such that its conclusions can be solidified.

      (1) The authors suggest that blocking Phd2 can enhance survival of cardiac tissue, but did not report on survival markers. They surmised that apoptosis could be decreased in Phd2 mutant or Molidustat-treated hearts but did not show this. The authors should determine whether apoptosis is decreased in the myocardium and epicardium.

      We show evidence of increased levels of healthy myocardium in the genetic and pharmacological models of stabilised HIF-signalling. We exclude increased cardiac hypertrophy or increased cardiomyocyte proliferation as causative, so suggest as a reasonable alternative enhanced survival, albeit this need not necessarily be via an apoptotic pathway given the incidence of necrotic cell death during MI. We are unable to generate new surgeries and mutant/treated heart samples to analyse for apoptotic markers at this stage.

      (2) There appears to be no difference in cardiomyocyte proliferation in Molidustat-treated animals, but the experiment was only performed on 2 to 3 animals. This sample size is too small to support a conclusion. The authors should increase the sample size to make this assertion.

      We respectfully disagree that our data are insufficient to conclude that there is no effect on cardiomyocyte proliferation. We analysed EdU+/cTnT+ co-localised signals in multiple regions per section across several sections per heart, and these results are set against a consistent effect on other parameters in hearts treated with Molidustat. We are unable to generate more P7 heart surgeries +/- Molidustat and +/- EdU at this stage.

      (3) It is curious that, after myocardial infarction, fibrotic scar tissue is decreased with Phd2 deletion but less profoundly in Molidustat-treated mice at d21. Can the authors speculate why this difference exists and how the decrease arises? For example, are there decreased pro-inflammatory signals in Phd2-deleted mice? Is there decreased collagen deposition and ECM gene expression? Does macrophage recruitment into the infarct zone differ between mutant/treated and WT mice?

      The representative images in Figure 6k reveal a trend towards reduced fibrosis with Molidustat treatment (Figure 6l), but across all hearts analysed this was not as significant as in the injured hearts with epicardial-specific deletion (Figure 5g, h). This may be due to the relatively short half-life of Molidustat (approximately 4-10 hours, PMID: 32248614), the dosing regimen for the drug and/or the fact that it was not specifically delivered/targeted to the epicardium.

      (4) The magnified images in Figure 1 do not match the boxes in the whole heart images. It is unclear what the white boxes signify.

      The white boxes have been removed from Figure 1. The magnified image panels are from serial heart sections and this is now clarified in the Figure 1 legend.

    1. eLife Assessment

      This fundamental work substantially advances our understanding of how the glycocalyx of cells provides a non-specific barrier to the interaction of viruses with cell-surface receptors. Using both in vitro experiments and in vivo manipulations, the authors provide compelling evidence that the glycocalyx acts as an energy barrier and that this is the main attribute of its mode of action. The work will be of broad interest to virologists and to the cell biology community that studies host-pathogen interactions.

    2. Joint Public Review:

      This manuscript tests the notion that bulky membrane glycoproteins suppress viral infection through non-specific interactions. Using a suite of biochemical, biophysical, and computational methods in multiple contexts (ex vivo, in vitro, and in silico), the authors collect compelling evidence supporting the notion that (1) a wide range of surface glycoproteins erect an energy barrier to the formation of the stable adhesive interface the virus needs for fusion and uptake and (2) the total amount of glycan, independent of its molecular identity, additively enhances this suppression.

      As a functional assay, the authors focus on viral infection, starting from the assumption that a physical boundary, modulated by overexpressing a protein of interest, could prevent viral entry and subsequent infection. They find that the glycan content of the overexpressed protein (measured using the PNA lectin) and its total molecular weight, which includes both amino acid and glycan weight, are negatively correlated with viral infection. Using a variety of lectin labels, they go on to demonstrate that it is in effect the total glycan content that is responsible for reduced infection in cells. Because the authors find no loss in virus binding, they hypothesize that the glycan content presents a barrier to stable membrane-membrane contact between virus and cell. They subsequently determine the effective radius of the proteins at the membrane and demonstrate that, on a supported lipid bilayer, the glycosylated proteins do not transition from the mushroom to the brush regime at the densities used. Finally, using super-resolution microscopy, they find that proteins with an effective radius above 5 nm are excluded from the virus-cell interface.

      The experimental design does not present major concerns, and the results provide insight into a biophysical mechanism according to which repulsive forces between the branched glycan chains of highly glycosylated proteins create a kinetic energy barrier that limits the formation of the membrane/viral interface required for infection.

      In their revised manuscript and rebuttal, the authors address several general and specific concerns that were raised about their first submission. With these revisions, the evidence supporting their claims is now compelling.

    3. Author response:

      The following is the authors’ response to the original reviews.

      Public Review

      GENERAL QUESTIONS:

      (1) For many enveloped viruses, the attachment factors, paradoxically, are also surface glycoproteins, often complexed with a distinct fusion protein. The authors note here that the glycoproteins do not inhibit the initial binding, but only limit the stability of the adhesive interface needed for subsequent membrane fusion and viral uptake. How these antagonistic tendencies might play out should be discussed.

      When the surface density of glycan-bearing virus receptor molecules increases, the density of free glycans not bound to the virus increases along with the amount of adsorbed virus. However, if the total amount of glycans is itself a function of the receptor density, the relationship may become more complicated. This complication may also be affected by prolonged infection. If the receptor density on the cell surface is high, the infection-inhibitory effect of glycans may not be observed in a system in which a high concentration of virus is supplied from outside for a long time. This is because viruses that have entered the cell accumulate inside it, and the extent of infection reflects the total accumulated amount, that is, the time integral of the number of viruses that have entered. This distinction indicates that the virus entry reaction and the total amount of infection in the cell must be considered separately. This is an important point, but it was not clearly mentioned in the original manuscript.

      Our experiments were conducted under conditions that clearly allowed us to detect the virus-inhibiting function of glycans without being affected by the above points. To clarify these points, we will revise this article as follows, referring to an experiment that is somewhat related to this discussion (the Adenovirus infection experiment in HEK293T cells shown in Figure S1F).

      (Page-3, Introduction)

      While there are known examples of glycans that function as viral receptors (Thompson et al., 2019), these results demonstrate that a variety of glycoproteins negatively regulate viral infection in a wide range of systems. All of these results suggest that bulky membrane glycoproteins nonspecifically inhibit viral infection.

      (Page 20, Discussion)

      When the virus receptor is a glycoprotein or glycan itself, the inhibition of virus infection by glycans becomes more complex because the total amount of glycans is also a function of the receptor density. It is also important to note that the total amount of infection in a cell is the time integral of virus entry. Even if the probability of virus entry is significantly reduced by glycans, the cumulative number of virus entries may increase if high concentrations of virus continue to be supplied from outside the cell for a long period of time. In the case of Adenovirus, which continues to amplify in HEK293T cells after infection, we showed that MUC1 on the cell surface has an inhibitory effect on long-term cumulative infection (Supplementary Figure 1F). However, such an accumulation effect may be case-by-case depending on the virus-cell system, and may be more pronounced when the cell-surface density of virus receptor molecules is high. As a result, if the virus receptor molecule is a glycan or glycoprotein and infection continues for a long period of time, the infection-inhibitory effect may not be observed despite an apparent increase in the total amount of glycans in the cell. In any case, because we chose appropriate experimental conditions, our results clarified the contribution of the total amount of glycans to the inhibition of virus entry.
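      The point that total infection is the time integral of entry can be made concrete with a short numerical sketch. It assumes a hypothetical per-contact entry probability `p_entry` (lower when glycans are abundant) and a hypothetical contact flux in virions per hour; neither value comes from the manuscript, and the left Riemann sum is just one simple way to approximate the integral.

```python
def cumulative_entries(p_entry, flux, hours, dt=0.01):
    """Approximate N(T) = integral from 0 to T of p_entry * flux(t) dt (left Riemann sum).

    p_entry: hypothetical probability that a contacting virion enters the cell.
    flux:    hypothetical function returning virions contacting the cell per hour at time t.
    hours:   exposure time T in hours.
    """
    steps = int(round(hours / dt))
    return sum(p_entry * flux(i * dt) * dt for i in range(steps))


# Illustrative, made-up parameters: constant supply of virus from outside the cell.
constant_supply = lambda t: 100.0
short_low_glycan = cumulative_entries(0.10, constant_supply, hours=1)    # about 10 entries
long_high_glycan = cumulative_entries(0.02, constant_supply, hours=10)   # about 20 entries
```

      Under these assumed numbers, a glycan-rich cell with a five-fold lower entry probability still accumulates more entries over a long exposure than a glycan-poor cell over a short one, which is why entry probability and accumulated infection need to be analysed separately.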

      (2) Unlike polymers tethered to a solid surface, which undergo a mushroom-to-brush transition in a density-dependent manner, the glycoproteins at the cell surface are of course mobile (presumably in a density-dependent manner). They can thus redistribute in spatial patterns that minimize the free energy. I suggest the authors explicitly address how these considerations influence the in vitro reconstitution assays seeking to assess the glycosylation-dependent protein packing.

      We performed additional experiments using lipid bilayers that had lost fluidity, and found no significant difference in protein binding between fluid and nonfluid bilayers. The redistribution of molecules due to membrane fluidity may play some role in general, but not a significant one in our experimental system. This suggests that glycoproteins can generate intermolecular repulsion even in fluid conditions such as cell membranes, just as they do on a solid phase. This experiment was also very useful because it allowed us to compare our results in the fluid bilayer with solid-state measurements of saturation molecular density and the brush transition. This comparison gave us confidence that in the reconstituted membrane system, even at saturation density, the membrane proteins are not as stretched as they are in the condensed brush state. We have therefore added a new paragraph and a new figure (Supplementary Fig. 5B) to discuss this issue, as follows:

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for both b-MUC1 and g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these results, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as in the condensed brush state, in which the contribution of resistance to molecular extension f<sub>el</sub> is expected to be small relative to the overall free energy of the system.

      (3) The discussion of the role of excluded volume in steric repulsion between glycoprotein needs clarification. As presented, it's unclear what the role of "excluded volume" effects is in driving steric repulsion? Do the authors imply depletion forces? Or the volume unavailable due to stochastic configurations of gaussian chains? How does the formalism apply to branched membrane glycoproteins is not immediately obvious.

      Regarding the excluded volume due to steric repulsion between glycoproteins, we considered the volume that cannot be used by glycans as Gaussian chains branching from the main chain. We would like to expand on this point by citing several papers that make similar arguments. We are glad you brought this up, because we had not considered depletion forces: the excluded volume between glycoproteins should generate a depletion force, but in this case we believe this force will not have a significant effect on viruses, which are larger than the glycoproteins. We also attempted to clarify the discussion in this section by focusing on intermolecular repulsion, and restructured paragraphs, which are also related to General Question 2 and Specific Question 2. The relevant part has been revised as follows (pages 15-16):

      To compare the packing of proteins with different molecular weights and R<sub>F</sub>, …… These were smaller than the coverage of molecules at hexagonal close packing, which is ~90.7%. In contrast, the coverage of b-CD43 and b-MUC1 at saturated binding was estimated to be greater than 100% under this normalization standard, indicating that the mean projected sizes of these molecules in the surface direction were smaller than those expected from their R<sub>F</sub>. Thus, it is clear that glycosylation reduces the saturation density of membrane proteins, regardless of molecular size.
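      The coverage normalization described above can be sketched as follows. This assumes, as a simplification, that each molecule occupies a circular footprint of diameter R<sub>F</sub> on the surface, so coverage is density times that footprint area; the authors' exact normalization is not spelled out here, and the function names are illustrative only.

```python
import math

# Densest packing of equal disks in a plane (hexagonal close packing), ~0.9069.
HCP_COVERAGE = math.pi / (2 * math.sqrt(3))


def coverage(density_per_um2, flory_radius_nm):
    """Fraction of surface covered if each molecule occupies a disk of diameter R_F.

    Assumption: footprint = pi * (R_F / 2)**2. Values above 1 mean the molecules
    must occupy less projected area than this footprint implies.
    """
    footprint_nm2 = math.pi * (flory_radius_nm / 2) ** 2
    density_per_nm2 = density_per_um2 / 1e6  # 1 um^2 = 1e6 nm^2
    return density_per_nm2 * footprint_nm2


def exceeds_hcp(density_per_um2, flory_radius_nm):
    """True if the measured density packs more tightly than disks of diameter R_F can."""
    return coverage(density_per_um2, flory_radius_nm) > HCP_COVERAGE
```

      Under this normalization, a coverage above 100% (as reported for b-CD43 and b-MUC1) can only be reached if the projected size of each molecule is smaller than its R<sub>F</sub>, which is the inference drawn in the text.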

      Highly glycosylated proteins resisted densification, indicating that some intermolecular repulsion is occurring. In the framework of polymer brush theory, the intermolecular repulsion of densely packed highly glycosylated proteins is due to an increase in f<sub>el</sub>, f<sub>int</sub> (d<R<sub>F</sub>), or both (Hansen et al., 2003; Wu et al., 2002). The intermolecular interaction term, f<sub>int</sub>, is governed by intermolecular steric repulsion, which occurs when neighboring molecules cannot approach the excluded volume created by the stochastic configuration of the polymer chain (Attili et al., 2012; Faivre et al., 2018; Kreussling and Ullman, 1954; Kuo et al., 2018; Paturej et al., 2016). The magnitude of this steric repulsion depends largely on R<sub>F</sub> in dilute solutions, but the molecular structure may also affect it when molecules are densified on a surface. In other words, the glycans protruding between molecules can cause steric inhibition between neighboring proteins (Figure 5D). Such intermolecular repulsion due to branched side chains occurs only when the molecules are in close proximity and sterically interact on a two-dimensional surface, but not in dilute solution, and does not occur in unbranched polymers such as underglycosylated proteins (Figure 5D). Based on the above, we propose the following model for membrane proteins: only when the membrane proteins are glycosylated does strong steric repulsion occur between neighboring molecules during the densification process, suppressing densification.
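      The condition d<R<sub>F</sub> under which f<sub>int</sub> becomes relevant can be checked from a surface density with a short sketch. It assumes the common scaling estimate of mean anchor spacing d = σ<sup>-1/2</sup> (a square-lattice approximation); overlap of neighboring coils is where steric interaction starts, not by itself the brush transition, which the text places at higher density.

```python
import math


def mean_spacing_nm(density_per_um2):
    """Mean distance between neighboring anchor points, d = sigma**-0.5.

    Square-lattice estimate; density is converted from um^-2 to nm^-2.
    """
    return 1.0 / math.sqrt(density_per_um2 / 1e6)


def chains_overlap(density_per_um2, flory_radius_nm):
    """True when d < R_F, i.e. neighboring coils overlap and f_int can contribute."""
    return mean_spacing_nm(density_per_um2) < flory_radius_nm


# Example with the polyacrylamide brush density quoted from Wu et al. (2002):
d_brush = mean_spacing_nm(65000)   # roughly 3.9 nm, well below R_F ~ 15 nm
```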

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for both b-MUC1 and g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these results, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as in the condensed brush state, in which the contribution of resistance to molecular extension f<sub>el</sub> is expected to be small relative to the overall free energy of the system.

      Note that this does not mean that glycoproteins cannot form condensed brush structures: in fact, highly glycosylated molecules (e.g., MUC1) can form brush structures in cells when such proteins are expressed at very high densities (Shurer et al., 2019). In these cells, ………. Such membrane deformation increases the total surface area and thereby reduces the density of glycoproteins, indicating that there is strong intermolecular repulsion between glycoproteins. In any case, the free energy of the system is determined by the balance between protein binding and insertion into the membrane, protein deformation, and repulsive forces between proteins, which together determine the density of proteins depending on the configuration of the system. Thus, although strong intermolecular repulsion was prominently observed in our simplified system, this may not be the case in other systems. ……

      (4) The authors showed that glycoprotein expression inversely correlated with viral infection and link viral entry inhibition to steric hindrance caused by the glycoprotein. Alternative explanations would be that the glycoprotein expression (a) reroutes endocytosed viral particles or (b) lowers cellular endocytic rates and via either mechanism reduce viral infection. The authors should provide evidence that these alternatives are not occurring in their system. They could for example experimentally test whether non-specific endocytosis is still operational at similar levels, measured with fluid-phase markers such as 10kDa dextrans.

      The results of the experiment suggested by the reviewer are shown in the new Supplementary Figure 3B. (This results in a new Supplementary Figure 3, and previous Supplementary Figures 4-5 are now renumbered as Supplementary Figures 5-6.) Endocytosis of 10 kDa dextran was attenuated by the expression of several large molecules, but was not affected by the expression of many other glycoproteins that can inhibit infection. These results clearly differ from the infection results, in which virus infection was inhibited more by the amount of glycan than by molecular weight. Therefore, many glycoproteins inhibit virus infection through processes other than endocytosis. Based on the above, we have added the following to the manuscript (p9, new paragraph):

      We also investigated the effect of membrane glycoproteins on membrane trafficking, another process involved in viral infection. Expression of MUC1 with a higher number of tandem repeats reduced fluid-phase dextran uptake, while expression of multiple membrane glycoproteins that have infection-inhibitory effects, including truncated MUC1 molecules, showed no effect on fluid-phase endocytosis, indicating a molecular weight-dependent effect (Supplementary Figure 3B). The molecular weight-dependent inhibition of endocytosis may be due to factors such as steric inhibition of the approach of dextran molecules or a reduction in the transportable volume within the endosome. In any case, it is clear that many low molecular weight glycoproteins inhibit infection by disturbing processes other than endocytosis. Based on the above, we focus on the effect of glycoproteins on the formation of the interface between the virus and the cell membrane.

      (5) The authors approach their system with the goal of generalizing the cell membrane (the cumulative effect of all cell membrane molecules on viral entry), but what about the inverse? How does the nature of the molecule seeking entry affect the interface? For example, a lipid nanoparticle vs a virus with a short virus-cell distance vs a virus with a large virus-cell distance?

      Thank you for your interesting comment. If the molecular size of the ligand is large, it should affect virus adsorption and molecular exclusion from the interface. In lipid nanoparticle applications, controlling this parameter may contribute to efficiency. In addition, a related discussion is the influence of virus shell molecules that are not bound to the receptor. I will revise the text based on the above.

      Discussion (as a new paragraph after the paragraph added in Q1):

      In this study, we attempted to generalize the surface structure on the cell side, but the surface structure on the virus side may also have an effect. The efficiency of virus adsorption and the efficiency of cell membrane protein exclusion from the interface will change depending on the molecular length of the receptor-ligand pair, although receptor priming also has an effect. In addition, free ligands of the viral envelope or other coexisting glycoproteins may also have an effect, as they too must be excluded from the virus-cell interface. In fact, there are reports that expression of CD43 and PSGL-1 on the virus surface reduces virus infection efficiency (Murakami et al., 2020). Such interface structure may be one of the factors that determine the infection efficiency that differs depending on the virus strain. More generally, modification of the surface structure may be effective for designing materials, such as lipid nanoparticles, that form an interface with cells.

      SPECIFIC QUESTIONS:

      (1) The proposed mechanism indicates that glycosylation status does not produce an effect in the "trapping" of virus, but in later stages of the formation of the virus/membrane interface due to the high energetic costs of displacing highly glycosylated molecules at the vicinity of the virus/membrane interface. It is suggested to present a correlation between the levels of glycans in the Calu-3 cell monolayers and the number of viral particles bound to cell surface at different pulse times. Results may be quantified following the same method as shown in Figure 2 for the correlation between glycosylation levels and viral infection (in this case the resulting output could be number of viral particles bound as a function of glycan content).

      The results of this experiment are now shown as Supplementary Figure 2F and 2G. We compared the amount of virus bound after incubation for 10 minutes or for 3 hours, as in the infection experiment, but no negative correlation was found between the total amount of glycans on the surface of the Calu-3 monolayer and the amount of virus bound. Interestingly, a slight positive correlation was detected, which may be due to concentrated virus receptor expression in glycan-enriched cells. This result shows that glycoproteins do not strongly inhibit virus binding. We will amend the text as follows (see also Q6).

      (Page 10)

      Glycans could be one of the biochemical substances ……We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). Similarly, on the two-dimensional culture surface of Calu-3 cells, no negative correlation was observed between the number of viruses bound and the total amount of glycans on the cell surface (Supplementary Figure 2F-G). The slight positive correlation between bound virus and glycans may be due to higher expression levels of viral receptors in glycan-rich cells. ….

      (2) The use of the purified glycosylated and non-glycosylated ectodomains of MUC1 and CD-43 to establish a relationship between glycosylation and protein density into lipid bilayers on silica beads is an elegant approach. An assessment of the impact of glycosylation in the structural conformation of both proteins, for instance determining the Flory radius of the glycosylated and non-glycosylated ectodomains by the FRET-FLIM approach used in Figure 4 would serve to further support the hypothesis of the article.

      Unfortunately, the proposed experiment did not provide a strong enough FRET signal for analysis. This was due in part to the difficulty in constructing a bead-coated bilayer incorporating PlasMem Bright Red, which established a good FRET pair in cell experiments. We also tried other fluorescent molecules, but were unable to obtain a strong and stable FRET signal. Another reason may be that the curvature of the beads is larger than that of the cells, making it difficult to obtain a sufficient cumulative FRET effect from multiple membrane dyes. We plan to improve the experimental system in the future.

      On the other hand, we also found that in this system, the signal changes were very subtle, making it difficult to detect molecular conformational changes using FRET. After reconsidering general questions (2) and (3), we speculated that the molecular density in the experiment, even at saturation binding, was below or at most equivalent to the brush transition point. In other words, proteins on the bead-coated bilayer may not be significantly extended in the vertical direction. Therefore, the conformational changes of these proteins may not be large enough to be detected by the FRET assay. We updated Figure 3C and Figure 5D (model description) to better reflect the above discussion and introduced the following discussion in the manuscript.
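      Why small conformational changes can be hard to detect by FRET follows from the steep distance dependence of Förster transfer. The sketch below uses the textbook single donor-acceptor expression and the FLIM relation E = 1 - τ<sub>DA</sub>/τ<sub>D</sub>; it is a simplification, since the assay described uses a membrane dye with many acceptors, and the Förster radius value in the example is hypothetical.

```python
def fret_efficiency(r_nm, r0_nm):
    """Förster efficiency for a single donor-acceptor pair separated by r (same units as R0)."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def efficiency_from_lifetimes(tau_da_ns, tau_d_ns):
    """FLIM estimate of FRET efficiency from donor lifetimes with and without acceptor."""
    return 1.0 - tau_da_ns / tau_d_ns


# Hypothetical Förster radius of 5 nm: a 1 nm change far from R0 barely moves E.
r0 = 5.0
e_10nm = fret_efficiency(10.0, r0)   # about 0.015
e_11nm = fret_efficiency(11.0, r0)   # about 0.009
```

      With these assumed values, a 1 nm change in donor-acceptor distance well beyond R0 shifts the efficiency by less than one percentage point, which is consistent with the authors' expectation that modest changes in vertical extension would produce only subtle FRET signals.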

      (page11)

      We introduced the framework of conventional polymer brush theory to study the structure of virus-cell interfaces containing proteins……. Numerous experimental measurements of the formation of polymer brushes have also been reported (Overney et al., 1996; Wu et al., 2002; Zhao and Brittain, 2000). In these measurements, the transition to a brush typically occurs at a density higher than that required to pack a surface with hemispherical polymers of diameter R<sub>F</sub>. This is the point at which the energy loss due to repulsive forces between adjacent molecules (f<sub>int</sub>) exceeds the energy required to stretch the polymer perpendicularly into a brush (f<sub>el</sub>).
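      As a rough check of this statement, one can compare the density at which disks of diameter R<sub>F</sub> reach hexagonal close packing with the polyacrylamide brush transition density quoted from Wu et al. (2002). Treating each chain's footprint as a disk of diameter R<sub>F</sub> = 15 nm is a simplifying assumption made here for illustration, not a calculation reported by the authors; 1 µm<sup>2</sup> = 10<sup>6</sup> nm<sup>2</sup> is used for the unit conversion.

```latex
\begin{aligned}
A_{\mathrm{disk}} &= \pi\left(\tfrac{R_F}{2}\right)^2 = \pi\,(7.5\ \mathrm{nm})^2 \approx 176.7\ \mathrm{nm}^2 \\
\phi_{\mathrm{HCP}} &= \frac{\pi}{2\sqrt{3}} \approx 0.907 \\
\sigma_{\mathrm{HCP}} &= \frac{\phi_{\mathrm{HCP}}}{A_{\mathrm{disk}}} \approx \frac{0.907}{176.7\ \mathrm{nm}^2}
  \approx 5.1\times 10^{-3}\ \mathrm{nm}^{-2} \approx 5.1\times 10^{3}\ \mu\mathrm{m}^{-2} \\
\frac{\sigma_{\mathrm{brush}}}{\sigma_{\mathrm{HCP}}} &\approx \frac{6.5\times 10^{4}\ \mu\mathrm{m}^{-2}}{5.1\times 10^{3}\ \mu\mathrm{m}^{-2}} \approx 13
\end{aligned}
```

      Under this approximation, the reported brush transition lies roughly an order of magnitude above close packing of R<sub>F</sub>-sized disks, in line with the statement that the brush forms at a density higher than that required to pack the surface with such hemispheres.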

      (page16)

      The molecular structural state of these proteins needs to be further discussed to estimate the contribution of f<sub>el</sub>, which represents resistance to molecular elongation. Our results suggest that these densely packed nonglycosylated molecules are no longer in a free mushroom state. However, their saturation density was several times lower than previously reported brush transition densities, such as 65000 µm<sup>-2</sup> for 17 kDa polyacrylamide (R<sub>F</sub> ~ 15 nm) on a solid surface (Wu et al., 2002). To compare our data on fluid bilayers with previously reported data on solid surfaces, we performed additional experiments with lipid bilayers that lost fluidity. No significant changes in protein binding between fluid and nonfluid bilayers were observed for both b-MUC1 and g-MUC1 molecules (Supplementary Figure 5B). This result suggests that membrane fluidity does not affect the average intermolecular distance or other relevant parameters that control molecular binding in the reconstituted system. Based on these results, we speculate that the saturated protein density observed in our experiments is lower than or at most comparable to the actual brush transition density. Thus, although these crowded proteins may be restricted from free random motion, they are not significantly extended as in the condensed brush state, in which the contribution of resistance to molecular extension f<sub>el</sub> is expected to be small relative to the overall free energy of the system.

      Note that this does not mean that glycoproteins cannot form condensed brush structures: in fact, highly glycosylated molecules (e.g., MUC1) can form brush structures in cells when such proteins are expressed at very high densities (Shurer et al., 2019). In these cells, ………. Such membrane deformation increases the total surface area and thereby reduces the density of glycoproteins, indicating that there is strong intermolecular repulsion between glycoproteins. In any case, the free energy of the system is determined by the balance between protein binding and insertion into the membrane, protein deformation, and repulsive forces between proteins, which together determine the density of proteins depending on the configuration of the system. Thus, although strong intermolecular repulsion was prominently observed in our simplified system, this may not be the case in other systems. ……

      (3) The MUC1 glycoprotein is shown in Fig 1F to have a dramatic effect in reducing viral infection. By contrast, in a different experiment shown in Fig 2D and Fig 2H, MUC1 has almost no effect in reducing viral infection. It is not clear how these two findings can be compatible.

      The immunostaining results show that the density of MUC1 molecules is very low in the experimental system in Figure 2 (Figure 2C), which is supported by the scRNA-seq data (as shown in Supplementary Figure 2A, MUC1 is not listed as a top molecule). In other words, the MUC1 expression level in this experiment is too low to inhibit virus infection. Moreover, the Pearson correlation coefficient represents the strength of the linear relationship between two variables, so it is not the most appropriate indicator of correlation with the MUC1 expression level, which varies little (Figure 2D, 2F). In fact, even TOS analysis, which can detect correlation by focusing on the cells with the highest expression levels, cannot detect a correlation (Figure 2H). Therefore, the MUC1 data in Figures 2D, 2F, and 2H will be annotated and explained in the figure legend.

      Figure2 Legend:

      MUC1 has a small mean expression level and variance, and is more affected by measurement noise than other molecules when calculating the Pearson correlation coefficient (Figure 2C-2F). In addition, the number of cells in which expression can be detected is small, so no significant correlation was detected by TOS analysis (Figure 2H).
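      The legend's argument, that a molecule with low expression variance is dominated by measurement noise in a Pearson analysis, can be illustrated with a small simulation. All parameters are hypothetical: each simulated molecule has the same per-unit effect on infection (`slope`), and only the spread of its true expression (`true_sd`) differs, while measurement noise (`noise_sd`) is fixed.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulated_r(true_sd, noise_sd, n=5000, slope=-1.0, seed=0):
    """Correlate measured expression (true + noise) with infection driven by true expression."""
    rng = random.Random(seed)
    true_expr = [rng.gauss(0.0, true_sd) for _ in range(n)]
    measured = [x + rng.gauss(0.0, noise_sd) for x in true_expr]
    infection = [slope * x + rng.gauss(0.0, 1.0) for x in true_expr]
    return pearson(measured, infection)


r_variable = simulated_r(true_sd=1.0, noise_sd=0.3)   # strongly negative
r_flat = simulated_r(true_sd=0.1, noise_sd=0.3)       # close to zero
```

      In this toy setting, the same per-unit inhibitory effect yields a clearly negative coefficient for a widely varying molecule but a coefficient near zero for a barely varying, noise-dominated one, so a small Pearson value alone does not show that a molecule lacks an inhibitory effect.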

      (4) Why is there a shift in the use of the glycan marker? How does this affect the conclusions? For the infection correlation relating protein expression with glycan content the PNA-lectin was used together with flow cytometry. For imaging the infection and correlating with glycan content the SSA-lectin is used.

      For each cell line, we selected the lectin that could be measured over the widest dynamic range. This lectin is thought to recognize the predominant glycan species in the cell line (Fig. S1C, Fig. 2D). In our model, we believe that viral infection inhibition is not specific to the type of sugar, but is highly dependent on the total amount of glycans. If this hypothesis is correct, the reason we used different lectins in each experiment is simply to select the lectin that recognizes the most predominant glycan species that is most convenient for predicting the total amount of glycans in cells. This hypothesis is consistent with our observations, where the total amount of glycans estimated by different lectins could explain the infection inhibition in a similar way in the experiments in Figures 1 and 2, and the TOS analysis in Figure 2 showed that minor glycans also have an infection inhibitory effect. On the other hand, it is of course possible to predict the total amount of glycans more accurately by obtaining as much information on glycans as possible (related to Q5). Based on the above discussion, the manuscript will be revised as follows.

      Page5

      Using HEK293T cell lines exogenously expressing genes of these proteins tagged with fluorescent markers, their glycosylation was measured by binding of a lectin from Arachis hypogaea (PNA), and the number of these proteins in the cells was measured simultaneously. PNA was used for the measurement because it has a wider dynamic range than other lectins (Supplementary Figure 1C). This suggests that GalNAc recognized by PNA is predominantly present on glycans of HEK293T cells, especially on the termini of glycans that are amenable to lectin binding, compared to other saccharides. …
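      The selection criterion of using the lectin with the widest dynamic range could be automated along the lines of the sketch below. The metric (ratio of the 99th to the 1st percentile of per-cell lectin signal), the function names, and the toy signal lists are all hypothetical choices made for illustration; the manuscript does not specify how dynamic range was quantified.

```python
import statistics


def dynamic_range(intensities):
    """Hypothetical metric: 99th-percentile / 1st-percentile of per-cell lectin signal."""
    cuts = statistics.quantiles(intensities, n=100)  # 99 cut points: 1st..99th percentile
    low, high = cuts[0], cuts[98]
    if low <= 0:
        raise ValueError("background-subtracted signals must stay positive for a ratio")
    return high / low


def widest_range_lectin(signals_by_lectin):
    """Return the lectin name whose per-cell signal spans the widest dynamic range."""
    return max(signals_by_lectin, key=lambda name: dynamic_range(signals_by_lectin[name]))


# Toy per-cell signals (arbitrary units) for two unnamed lectins.
toy = {
    "lectin_A": [float(v) for v in range(1, 101)],
    "lectin_B": [float(v) for v in range(50, 150)],
}
chosen = widest_range_lectin(toy)  # "lectin_A"
```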

      page9  

      Our findings suggest that membrane glycoproteins nonspecifically inhibit viral infection, and we hypothesize that their inhibitory function is also nonspecific with respect to the type of glycan. Our hypothesis is consistent with the observations in the TOS analysis. Although minor saccharide species in the system (such as GlcNAc and GalNAc recognized by DSA, WGA, or PNA) showed anti-colocalization with infection, their scores were much lower than those of major saccharide species. This suggests that all major and minor saccharide species have an infection-inhibitory effect, but cells enriched in minor glycan types make up only part of the system, so their contribution to virus inhibition is also partial. It is also consistent with the observation that the amount of GalNAc recognized by PNA determines the virus infection inhibition in HEK293T cells (Figure 1). Therefore, we believe that our assay using a single lectin against the predominantly expressed glycan is still useful for estimating the total glycan content. Nevertheless, the virus infection rate may show a better correlation with a more accurately estimated total glycan content in each cell. For example, the use of multiple lectins with appropriate calibration to integrate multiple signals to simultaneously detect a wider range of saccharide species would allow for more accurate estimation. It should be noted that the amount of bound lectin does not necessarily measure the overall glycan composition but likely reflects the sugar population at the free end of the glycan chain to which the lectin binds most.

      (5) The authors in several instances comment on the relevance and importance of the total glycan content. Nevertheless, these conclusions are often drawn when using only one glycan-binding lectin. In fact, the anti-correlation with viral infection is distinct for the various lectins (Fig 2D and Fig 2H). Would it make more sense to use a combination of lectins to get a full glycan spectrum?

      As stated in our answer to Q4, we believe that we were able to detect the infection-suppressing effect of the total glycan amount by using the measured value of the major glycan component as an approximation. However, as you pointed out, accurately measuring the minor glycan components and adding up their values would give a more accurate measure of the total glycan amount. Measuring multiple glycans simultaneously and with high accuracy may require some form of biochemical calibration, because lectin-glycan pairs have different binding constants and their signals cannot be compared directly. We consider these techniques very useful and would like to pursue them in future work. The corrections listed in Q4 are shown below.

      (Page 9)

      Nevertheless, the virus infection rate may correlate better with a more accurate estimate of the total glycan in each cell. For example, combining multiple lectins, with appropriate calibration to integrate their signals, would allow simultaneous detection of a wider range of glycans and thus a more accurate estimate. …….

      (6) Fig 3A shows virus binding to HEK cells upon MUC1 expression. Please provide the surface expression of the MUC1 so that the data can be compared to Fig 1F. Nevertheless, it is not clear why the authors used MUC expression as a parameter to assess virus binding. Alternatively, more conclusive data supporting the hypothesis would be the absence of a correlation between total glycan content and virus binding capacity.

      The relationship between the expression level of MUC1 in each cell and the amount of virus binding is shown in Supplementary Figure 3A; there is no correlation between the two. In HEK293T cells, MUC1 carries many of the glycans, so MUC1 was used as the indicator for this analysis (Supplementary Figure 1C). As you pointed out, it is better to use the amount of glycan as the indicator, so we also analyzed the relationship between the amount of bound virus and the amount of surface glycan on the Calu-3 monolayer (Supplementary Figure 2F, 2G, introduced in our answer to Specific (Q1)). In both cases, no correlation was found between virus binding and surface glycans. We corrected the manuscript as follows.

      (page 9)

      Glycans could be one of the biochemical substances that link the intracellular molecular composition and macroscopic steric forces at the cell surface. To clarify this connection, we further investigated the mechanism by which membrane glycoproteins inhibit viral infection. First, we measured viral binding to cells to determine which step of infection is inhibited. We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). Similarly, on the two-dimensional culture surface of Calu-3 cells, no correlation was observed between the number of viruses bound and the total amount of glycans on the cell surface (Supplementary Figure 2F-G). These results indicate that glycoproteins do not inhibit virus binding to cells, but rather inhibit the steps required for subsequent virus internalization.

      (7) While the use of the Flory model could provide a simplification for a (disordered) flexible structure such as MUC1, where the number of amino acids equals N in the Flory model, this generalisation will not hold for all proteins. Because folding dramatically changes the effective polypeptide chain length and reduces the available positioning of the amino acids, something the authors clearly measured (Fig 4G), this generalisation is not correct. In fact, the generalisation does not seem to be required, because the authors provide an estimate of the effective Flory radius using their FRET approach.

      Current theories that generalize the Flory model to proteins are incomplete, and it is certainly not possible to accurately estimate the size of individual molecules with different folds. However, we found such a generalized model useful for understanding the overall properties of membrane proteins. In our experiments, we were indeed able to obtain the R<sub>F</sub>s of some individual molecules by FRET measurements. However, the model also made it possible to estimate the distribution range of the R<sub>F</sub>s, including for larger proteins that cannot be measured by FRET. For example, from our results we can estimate that the upper limit of the R<sub>F</sub>s of the longest membrane proteins is about 10.5 nm, assuming that the proteins follow the Flory model in all respects except for the shortening of the effective chain length due to folding. Such analyses are useful for physical modeling of nonspecific phenomena, as in our case.

      To address the balance between theoretical validity and practical convenience, we revised the manuscript as follows.

      (page 13) 

      This shift in ν indicates that glycosylation increases the equilibrium size of the protein, but the change in R<sub>F</sub> is slight, e.g., a 1.3-fold increase for one of the longest ectodomains (N = 4000) when these values of ν are applied. This calculation also gives a rough estimate of the upper limit of the R<sub>F</sub> of the extracellular domains of all membrane proteins in the human genome (approximately 10.5 nm). Physically, this change in ν upon glycosylation may be caused by increased steric intramolecular exclusion between glycan chains. These estimated values of ν are much smaller than the value of 0.6 for polymers in good solvents, possibly because of protein folding or anchoring to the membrane. In fact, the ν of intrinsically disordered proteins in solution has been reported to be close to 0.6 (Riback et al., 2019; Tesei et al., 2024). Overall, these analyses using the Flory model provide information on the size distribution of membrane proteins and the influence of glycans, although the model cannot predict the exact size of each protein because of its specific folding.
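      The 1.3-fold figure can be related to the shift in ν with the standard Flory scaling R<sub>F</sub> = b N<sup>ν</sup>, where b is an effective segment length. Assuming (our assumption, not stated in the text) that glycosylation changes only ν and leaves b and the effective N unchanged, the ratio of the two radii depends only on N and the shift Δν. The fitted ν values themselves are not given in this excerpt, so the worked example below only back-calculates the Δν that a 1.3-fold increase at N = 4000 implies:

```latex
R_F = b\,N^{\nu}, \qquad
\frac{R_F^{\mathrm{glyco}}}{R_F^{\mathrm{non}}}
  = \frac{b\,N^{\nu_g}}{b\,N^{\nu_0}} = N^{\nu_g-\nu_0} = N^{\Delta\nu}

N^{\Delta\nu} = 1.3,\; N = 4000
\;\Rightarrow\;
\Delta\nu = \frac{\ln 1.3}{\ln 4000} \approx \frac{0.262}{8.294} \approx 0.032
```

      Because N enters only through its logarithm, even the longest ectodomains amplify a shift of a few hundredths in ν only modestly, which matches the description of the change in R<sub>F</sub> as slight.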

      MINOR COMMENTS/EDITS:

      (1) In Figures 2A and 2C, as well as Supplemental Figure 2C, the fluorescent images indicate that GFP expression differs among the various groups. Ideally, these should be at the same GFP expression level, as the glycan and antibody staining occurred post-viral infection. For instance, ACE2 is a well-known positive control and should enhance SARS-CoV-2 infection. Yet, based on the findings presented in Supplemental Figure 2C, ACE2 appears to correlate with the lowest infection rate. The relationship between the infection rate and key glycoproteins needs clearer quantification.

      We measured the molecule-specific virus inhibition effect using a cell line expressing low levels of viral receptors and glycoproteins (Fig. 1). In contrast, the system in Fig. 2 contains diverse viral receptors and glycoproteins and has not been genetically manipulated. (We apologize for a typo in our description of this experiment, which has been corrected as shown below.) The variation in infection rate between samples was caused by multiple factors but was not related to the molecule whose correlation was being measured. The receptor-based normalization used in the Fig. 1 experiment cannot be applied to the Fig. 2 system because of the complexity of its gene expression profile. Therefore, instead of such parameter-based normalization, we applied Pearson correlation and TOS analysis. In the Pearson correlation calculation, intensities are normalized. TOS analysis allows colocalization to be analyzed between the groups with the highest fluorescence intensities. Therefore, samples with large variation in overall infection rate can be reasonably compared by Pearson correlation, and samples with large variation in the distribution of infected populations can be compared by TOS analysis. We have extended the discussion of the statistics and revised the manuscript as follows.

      (page 8-9)

      To test this hypothesis, we infected a monolayer of epithelial cells endogenously expressing highly heterogeneous populations of glycoproteins with SARS-CoV-2-PP, and measured viral infection from cell to cell visually by microscope imaging. …

      Pearson correlation is effective for comparing samples whose data vary in scale, because it normalizes the data values by their mean and variance. However, as observed in our experiments, this may not hold when the distribution of data within a sample differs between samples. In addition, as previously reported, the distribution of infected cells often deviates significantly from the normal distribution assumed by Pearson correlation (Russell et al., 2018) (Figure 2B). To analyze data in such nonlinear situations, we applied threshold overlap score (TOS) analysis (Figure 2G-H, Supplementary Figure 2E). TOS is a statistical method for analyzing nonlinear correlations that is specialized for colocalization analysis in dual-color images (Sheng et al., 2016). Like other nonlinear statistics, TOS analysis segments the data based on signal intensity (Reshef et al., 2011). The computed TOS matrix indicates whether the number of objects classified in each region is higher or lower than expected for uniformly distributed data, which reflects colocalization or anti-localization in dual-color imaging data. For example, the calculated TOS matrices show strong anti-localization of infection and glycosylation when both signals are high (Figure 2G-H). This confirms that high infection is very unlikely to occur in cells that express high levels of glycans. TOS analysis also yielded stronger anti-localization scores for some individual membrane proteins, especially those that are heterogeneously distributed across cells (Figure 2H). This suggests that TOS analysis can highlight the inhibitory function of molecules that are sparsely expressed among cells, reaffirming that high expression of a single type of glycoprotein can create an infection-protective surface on a single cell and that such infection inhibition is not protein-specific.
      In contrast, for more uniformly distributed proteins such as the viral receptor ACE2, TOS analysis and Pearson correlation showed similar trends, although the two are mathematically different (Figure 2D, 2H). Because glycoprotein expression levels and virus-derived GFP levels were treated symmetrically in these statistical calculations, the same logic applies when considering the heterogeneity of infection levels among cells. Therefore, TOS analysis is expected to reasonably compare samples with different distributions of virus infection levels by focusing on the cells with high infection levels in every sample.
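      To illustrate the normalization argument above, the following minimal sketch (Python with NumPy; the per-cell data are synthetic and hypothetical, not taken from the manuscript, and `pearson_r` is an illustrative name) computes Pearson's r by standardizing each channel with its own mean and standard deviation. Rescaling one channel, as in a sample with uniformly brighter infection, leaves r unchanged.

```python
import numpy as np


def pearson_r(a, b):
    """Pearson correlation between two per-pixel (or per-cell) intensity arrays.

    Each channel is centred on its mean and divided by its standard deviation,
    so a uniform gain or offset applied to one sample does not change r.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    za = (a - a.mean()) / a.std()
    zb = (b - b.mean()) / b.std()
    return float(np.mean(za * zb))


# Synthetic, hypothetical data: glycan signal and an anti-correlated GFP signal per cell
rng = np.random.default_rng(0)
glycan = rng.gamma(shape=2.0, scale=1.0, size=1000)
gfp = np.exp(-glycan) + 0.05 * rng.random(1000)

r_dim = pearson_r(glycan, gfp)
r_bright = pearson_r(glycan, 10.0 * gfp + 3.0)  # same cells, uniformly brighter infection
print(r_dim, r_bright, np.isclose(r_dim, r_bright))
```

      Because r depends only on the standardized values, it cannot by itself distinguish a sample in which a few cells carry most of the infection from one in which infection is spread evenly; that is the situation the threshold-based analysis is meant to address.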

      (2) For clarity, the authors should consider separating introductory and interpretive remarks from the presentation of results. These seem to get mixed up. The introduction section could be expanded to include more details about glycoproteins, their relevance to viral infection, and explanations of N- and O-glycosylation.

      Following the suggestion, (1) we added an explanation of the relationship between glycoproteins and viral infection, and N-glycosylation and O-glycosylation to the Introduction section, and (2) moved the introductory parts in the Results section to the Introduction section, as follows.

      (1; page 3)

      While there are known examples of glycans that function as viral receptors (Thompson et al., 2019), these results demonstrate that a variety of glycoproteins negatively regulate viral infection in a wide range of systems. These glycoproteins share no common amino acid sequences or domains. Their glycans include both N-linked glycans, attached to asparagine, and O-linked glycans, attached to serine or threonine. Furthermore, no infection-suppressing effect that depends on a specific monosaccharide type in the glycan has been reported. All of these results suggest that bulky membrane glycoproteins inhibit viral infection nonspecifically.

      (2; page 4-5)

      To confirm that glycans are a general chemical factor in steric repulsion, an extensive list of glycoproteins on the cell membrane surface would be useful; the wider the range of proteins measured, the better. Therefore, we collect information on glycoproteins encoded in the genome and compile it into a list that is easy to use for various purposes. By analyzing sample molecules selected from this list, it may then be possible to infer the effect of the entire glycoprotein population on the steric inhibition of virus infection, despite the complexity and diversity of the glycome (Dworkin et al., 2022; Huang et al., 2021; Moremen et al., 2012; Rademacher et al., 1988). Elucidating how glycans regulate steric repulsion will also help us to quantitatively discuss the relationship between steric repulsion and intracellular molecular composition. For this purpose, we apply theories from polymer physics and interface chemistry.

      Results

      List of membrane glycoproteins in human genome and their inhibitory effect on virus infection

      To test the hypothesis that glycans contribute to steric repulsion at the cell surface, we first generate a list of glycoproteins in the human genome and then measure the glycan content and inhibitory effect on viral infection of test proteins selected from the list (Figure 1A). To compile the list of glycoproteins, we ….

      (3) In the sentence, "glycoproteins expressed lower than CD44 or other membrane proteins including ERBB2 did not exhibit any such correlation, although ERBB2 expressed ~4 folds higher amount than CD44 and shared ~7% among all membrane proteins," it is unclear which protein has a higher expression level: CD44 or ERBB2? Furthermore, the use of the word "although" needs clarification.

      Corrected as follows:

      (page 8)

      ……showed a weak inverse correlation with viral infection; no such correlation, even a weak one, was observed with other proteins, including ERBB2, which is expressed at approximately four-fold higher levels than CD44.

      (4) In Supplementary Figure 5, please provide an explanation of the data in the figure legend, particularly what the green and red signals represent.

      Corrected as follows:

      STORM images of all analyzed cells expressing the designated proteins. The detected spots of SNAP-Surface Alexa 647 bound to each membrane protein are shown in red, and the spots of the CF568-conjugated anti-mouse IgG secondary antibody that recognizes Spike on SARS-CoV2-PP are shown in green. For each cell, a pair of two-color composite images and a CF568-only image are shown. Numbers on the axes are coordinates in nanometers.

      (5) It would be good to see a comprehensive demonstration of the exact method for estimation of membrane protein density (in the SI), since this is an integral part of many of the analyses in this paper. The method is detailed in the Methods section in text and is generally acceptable, but this methodology can vary quite widely and would be more convincing with calibration data provided.

      We added flow cytometry and fluorometer data for calibration (Supplementary Figure 1L,M) and introduced a sentence explaining the procedure for obtaining the values used for calibration as follows:

      (page 54)

      …….Liposome standards containing fluorescent molecules (0.01–0.75 mol% perylene (Sigma), 0.1–1.25 mol% Bodipy FL (Thermo), and 0.005–0.1 mol% DiD) together with DOPC (Avanti Polar Lipids) were measured by flow cytometry (Supplementary Figure 1L). In parallel, the fluorescence signals of these liposomes and of known concentrations of recombinant mTagBFP2, AcGFP and TagRFP-657 proteins and SNAP-Surface 488 and Alexa 647 dyes (New England Biolabs) were measured with a fluorimeter over the same excitation and emission ranges as in the flow cytometry assays (Supplementary Figure 1M). The ratio of the integrated fluorescence intensities of two dyes of interest over this range is used to convert the signals measured by flow cytometry. The remaining information needed for calibration is the size difference between liposomes and cells: the average liposome diameter was measured to be 130 nm, and the diameter of HEK 293T cells is estimated to be 13 µm (Furlan et al., 2014; Kaizuka et al., 2021b; Ushiyama et al., 2015). From these data, the signal from cells acquired by flow cytometry can be calibrated to molecular surface density. For example, the Alexa 647 signal acquired by flow cytometry can be converted, using the fluorimeter data, to the signal of the same concentration of DiD dye, although the density of the dye is still unknown at this point. This converted DiD signal can then be calibrated to the dye density on liposomes using the liposome flow cytometry data. Finally, after adjusting for the size difference between liposomes and cells, the molecular surface density on cells is obtained. One cycle of this procedure yields a calibration unit; for example, in the designated illumination and detection settings, 1 flow cytometry signal unit for a cell corresponds to 0.0272 mTagBFP2 molecules µm<sup>-2</sup> on the cell surface.
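      The conversion chain described above can be sketched as two small functions. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the liposome calibration is taken to be the linear fit mentioned in the legend (surface density per unit signal), liposomes and cells are treated as spheres, and the numeric inputs are invented for illustration only.

```python
def to_reference_dye_signal(signal, ref_over_sample_ratio):
    """Convert a flow-cytometry signal of the sample fluorophore (e.g. Alexa 647)
    into the signal an equal number of reference dye molecules (e.g. DiD) would give.

    ref_over_sample_ratio: integrated fluorescence of the reference dye divided by
    that of the sample fluorophore at equal concentration, over the same
    excitation/emission range (from fluorimeter spectra).
    """
    return signal * ref_over_sample_ratio


def cell_surface_density(ref_dye_signal, liposome_density_per_signal,
                         liposome_diameter_um=0.130, cell_diameter_um=13.0):
    """Convert a reference-dye-equivalent signal measured on a cell into a
    surface density (molecules per square micrometre).

    liposome_density_per_signal: slope of the (assumed linear) fit of dye surface
    density on liposomes versus their flow-cytometry signal.

    Flow cytometry reports the total fluorescence of a particle, so one unit of
    signal corresponds to (density_per_signal * liposome area) molecules. Spread
    over the larger cell surface, the density is lower by the area ratio, which
    for spheres is (d_liposome / d_cell) ** 2.
    """
    area_ratio = (liposome_diameter_um / cell_diameter_um) ** 2
    return ref_dye_signal * liposome_density_per_signal * area_ratio


# Hypothetical inputs, for illustration only (not values from the manuscript)
alexa647_signal = 5000.0   # flow-cytometry signal of one cell
did_over_alexa647 = 0.8    # fluorimeter brightness ratio
lipo_slope = 2.0e4         # molecules per square micrometre per unit liposome signal

did_equiv = to_reference_dye_signal(alexa647_signal, did_over_alexa647)
density = cell_surface_density(did_equiv, lipo_slope)
print(f"{density:.1f} molecules per square micrometre")
```

      With a 130 nm liposome and a 13 µm cell, the area factor is (0.01)<sup>2</sup> = 10<sup>-4</sup>, which is why the size correction dominates the final number.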

      (Figure legend, Supporting Figure 1: )

      … L. Flow cytometry measurements of liposomes containing serially diluted dye-conjugated lipids and fluorescent membrane-incorporating molecules (Bodipy FL, perylene, and DiD) at the indicated mol%. The linear fits shown were used for calibration. M. Fluorescence emission spectra of equimolar molecules (50 µM for the green and far-red channels, and 100 µM for the blue channel), excited at 405 nm, 488 nm, and 638 nm, respectively. Membrane dyes were measured as incorporated in liposomes. Purified recombinant mTagBFP2 was used.

      (6) Fig 2A: The figure legend should describe the microscopy method for a quick and easy reference.

      Corrected as follows:

      (Figure legend, Figure 2)

      A. Maximum projection of Z-stack images taken at 1 µm intervals with a confocal microscope. SARS-CoV2-PP-infected, air-liquid interface (ALI)-cultured Calu-3 cell monolayers were chemically fixed, stained with Alexa Fluor 647-labeled Neu5Ac-specific lectin from Sambucus sieboldiana (SSA), and imaged for SSA binding and for GFP expressed from the infecting virus.

      (7) Fig 2B: what is the color bar supposed to represent? Is it the pixel density per a particular value? Units and additional description are required. In addition, these are "arbitrary units" of fluorescence, but you should tell us if they've been normalized and, if so, how. They must have been normalized, since the values are between 0 and 1, but then why does the scale bar for SSA only go to 0.5?

      The color bar shows the number of pixels at each point, i.e., it provides the density scale of the scatter plot. The scale on the x-axis was incorrect. All of these issues have been fixed in this revision, in both the figure and the legend, as follows.

      (Figure legend, Figure 2)

      B. Density scatter plot of normalized fluorescence intensities in all pixels in Figure 2A in both GFP and SSA channels. Color indicates the pixel density.  
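      A density scatter plot of this kind can be built by binning every pixel's pair of intensities into a 2-D histogram, whose counts are what the color bar encodes. The sketch below (Python with NumPy; `density_scatter_counts` is a hypothetical name and the images are synthetic) assumes min-max normalization of each channel to [0, 1], since the legend states only that intensities were normalized.

```python
import numpy as np


def density_scatter_counts(gfp_img, ssa_img, bins=100):
    """Count pixels per (GFP, SSA) intensity bin for a density scatter plot.

    Each channel is min-max normalized to [0, 1]; this is an assumption, since
    the legend states only that intensities were normalized. The returned
    counts are the per-bin pixel numbers that the color bar encodes.
    """
    def minmax(img):
        v = np.asarray(img, dtype=float).ravel()
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)

    x, y = minmax(gfp_img), minmax(ssa_img)
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins, range=[[0, 1], [0, 1]])
    return counts, x_edges, y_edges


# Synthetic two-channel image (hypothetical data, for illustration only)
rng = np.random.default_rng(1)
ssa = rng.random((64, 64))
gfp = np.clip(1.0 - ssa + 0.1 * rng.standard_normal((64, 64)), 0, None)
counts, xe, ye = density_scatter_counts(gfp, ssa, bins=50)
```

      The counts can then be drawn with, e.g., Matplotlib's `pcolormesh(xe, ye, counts.T)`; the transpose is needed because `histogram2d` indexes its first axis by the x variable.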

      (8) Fig 3D has a typo: this should most likely be "grafted polymer."

      (9) Fig 3E has a suspected typo: in the text, the author uses the word "exclusion" instead of "extrusion." The former makes more sense in this context.

      (10) Fig 5A has a typo: "Suppoorted" instead of Supported Lipid Bilayer.

      (11) Fig 7E-F has a suspected typo: Again, this should most likely be the word "exclusion" instead of "extrusion."

      Thank you very much for pointing out these mistakes; we have corrected them all as suggested.

      (12) Which other molecules are referred to, on page 6 (middle), that do not have an inhibitory effect? Please specify.

      We now specify both the molecules that have inhibitory effects and those that do not, and revised the text as follows:

      These proteins include those previously reported (MUC1, CD43) as well as those not yet reported (CD44, SDC1, CD164, F174B, CD24, PODXL) (Delaveris et al., 2020; Murakami et al., 2020). In contrast, other molecules (VCAM-1, EPHB1, TMEM123, etc.) showed little inhibitory effect on infection within the density range we used.

      (13) Fig 2 B: the color LUT is not labelled nor explained.

      Corrected as described in (7)

      (14) Please provide the scale bars for figures Fig 2A, C, E and Suppl Fig 2C, D.

      Corrected. 

      (15) Please provide the name for the example of a 200 aa protein that is meant to inhibit viral infection but is not bigger than ACE2. Also providing the densities in Fig 3A would help to correlate the data to Fig 1F.

      Corrected as follows: 

      (page 10)

      We found that a large number of SARS-CoV2-PP can still bind to cells even when cells expressed sufficient amounts of the glycoprotein (mean density ~50 µm<sup>-2</sup>) that could account for the majority of glycans within these cells and inhibit viral infection (Figure 3A). …..

      In our measurements, a protein with an extracellular domain of ~200 amino acids (e.g., CD164 (138 aa)) at a density of ~100 µm<sup>-2</sup> showed significant inhibition of viral infection. This molecule is shorter than the receptor ACE2 (722 aa),

      (16) In the experiments conducted in HEK cells expressing the different glycoproteins studied, it is mentioned that infection results were normalised by the amount of ACE2 expression. Is receptor expression homogeneous in the experiments conducted in Figure 2? Clarify in the Methods whether receptor expression has been quantified and somehow used to correct the GFP intensity values used to determine infection.

      As also explained for Q1, the system in Fig. 2 contains diverse viral receptors and glycoproteins, so the receptor-based normalization used in the Fig. 1 experiment cannot be applied. Instead, we applied Pearson correlation and TOS analysis. In the Pearson correlation calculation, intensities are normalized. TOS analysis allows colocalization to be analyzed between the groups with the highest fluorescence intensities. Therefore, samples with large variation in overall infection rate can be reasonably compared by Pearson correlation, and samples with large variation in the distribution of infected populations can be compared by TOS analysis. We have extended the discussion of the statistics and revised the manuscript as follows.

      (page 8-9)

      Pearson correlation is effective for comparing samples whose data vary in scale, because it normalizes the data values by their mean and variance. However, as observed in our experiments, this may not hold when the distribution of data within a sample differs between samples. In addition, as previously reported, the distribution of infected cells often deviates significantly from the normal distribution assumed by Pearson correlation (Russell et al., 2018) (Figure 2B). To analyze data in such nonlinear situations, we applied threshold overlap score (TOS) analysis (Figure 2G-H, Supplementary Figure 2E). TOS is a statistical method for analyzing nonlinear correlations that is specialized for colocalization analysis in dual-color images (Sheng et al., 2016). Like other nonlinear statistics, TOS analysis segments the data based on signal intensity (Reshef et al., 2011). The computed TOS matrix indicates whether the number of objects classified in each region is higher or lower than expected for uniformly distributed data, which reflects colocalization or anti-localization in dual-color imaging data. For example, the calculated TOS matrices show strong anti-localization of infection and glycosylation when both signals are high (Figure 2G-H). This confirms that high infection is very unlikely to occur in cells that express high levels of glycans. TOS analysis also yielded stronger anti-localization scores for some individual membrane proteins, especially those that are heterogeneously distributed across cells (Figure 2H). This suggests that TOS analysis can highlight the inhibitory function of molecules that are sparsely expressed among cells, reaffirming that high expression of a single type of glycoprotein can create an infection-protective surface on a single cell and that such infection inhibition is not protein-specific.
      In contrast, for more uniformly distributed proteins such as the viral receptor ACE2, TOS analysis and Pearson correlation showed similar trends, although the two are mathematically different (Figure 2D, 2H). Because glycoprotein expression levels and virus-derived GFP levels were treated symmetrically in these statistical calculations, the same logic applies when considering the heterogeneity of infection levels among cells. Therefore, TOS analysis is expected to reasonably compare samples with different distributions of virus infection levels by focusing on the cells with high infection levels in every sample.
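      The thresholding principle behind TOS can be sketched as follows. This is a simplified, hypothetical implementation for illustration (Python with NumPy): it compares the observed overlap of high-intensity pixels in two channels with the overlap expected if the channels were independent, as described above, but the linear rescaling to [-1, 1] is our assumption and is not necessarily the exact normalization of Sheng et al. (2016). The names `overlap_score` and `overlap_matrix` are illustrative.

```python
import numpy as np


def overlap_score(a, b, frac_a, frac_b):
    """Simplified threshold overlap score for two channels (illustrative only).

    Selects the pixels in the top `frac_a` of channel a and the top `frac_b` of
    channel b, then compares their observed overlap O with the overlap expected
    if the two selections were independent, E = n_a * n_b / n. The result is
    rescaled linearly: +1 = largest possible overlap, 0 = as expected by chance,
    -1 = smallest possible overlap (anti-localization).
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    n = a.size
    sel_a = a >= np.quantile(a, 1.0 - frac_a)
    sel_b = b >= np.quantile(b, 1.0 - frac_b)
    n_a, n_b = int(sel_a.sum()), int(sel_b.sum())
    observed = int(np.sum(sel_a & sel_b))
    expected = n_a * n_b / n
    o_max = min(n_a, n_b)          # one selection entirely inside the other
    o_min = max(0, n_a + n_b - n)  # selections as disjoint as possible
    if observed >= expected:
        return (observed - expected) / (o_max - expected) if o_max > expected else 0.0
    return (observed - expected) / (expected - o_min)


def overlap_matrix(a, b, fracs=(0.1, 0.25, 0.5, 0.75)):
    """Score every pair of thresholds; row i uses fracs[i] for a, column j uses fracs[j] for b."""
    return np.array([[overlap_score(a, b, fa, fb) for fb in fracs] for fa in fracs])


# Hypothetical check: perfectly anti-ranked channels versus identical channels
x = np.arange(1000.0)
print(overlap_matrix(x, -x)[0, 0], overlap_matrix(x, x)[0, 0])
```

      A strongly negative entry in the corner where both selected fractions are small (the brightest pixels in both channels) corresponds to the pattern described for infection and glycosylation in Figure 2G-H; unlike Pearson's r, the score is insensitive to how intensities are distributed below the thresholds.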

      (17) Can you provide additional details about the method of thresholding to eliminate "background" localisations in STORM?

      Method section was corrected as follows: 

      (page 59)

      …Viral protein spots not close to cell membranes were eliminated by thresholding on the local density of membrane protein spots. Specifically, the entire image was pixelated into 0.5 µm square boxes, and all viral protein spots within boxes that contained no membrane protein signal were removed. In addition, sparsely located viral protein spots were eliminated by thresholding on the local density of viral protein spots: any detected viral protein spot that did not have more than 100 other viral protein spots within 1 µm was removed.
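      The two thresholding steps can be sketched as below (Python with NumPy). This is an illustrative sketch, not the authors' pipeline: the function name and the synthetic coordinates are hypothetical, coordinates are assumed to be in µm, and the text does not state whether the neighbour count in the second step uses all viral spots or only those surviving the first step; here it uses the survivors.

```python
import numpy as np


def filter_viral_spots(viral_xy, membrane_xy, box_um=0.5, radius_um=1.0, min_neighbors=100):
    """Two-step filter for viral localizations (coordinates in micrometres), a sketch.

    Step 1: bin the field into box_um x box_um squares and drop viral spots lying
    in squares that contain no membrane-protein spot.
    Step 2 (applied here to the survivors of step 1, an assumption): keep only
    viral spots that have more than `min_neighbors` other viral spots within
    `radius_um`.
    """
    viral_xy = np.asarray(viral_xy, dtype=float).reshape(-1, 2)
    membrane_xy = np.asarray(membrane_xy, dtype=float).reshape(-1, 2)

    # Step 1: squares occupied by at least one membrane-protein spot
    occupied = {tuple(ij) for ij in np.floor(membrane_xy / box_um).astype(int)}
    viral_boxes = np.floor(viral_xy / box_um).astype(int)
    in_membrane_box = np.array([tuple(ij) in occupied for ij in viral_boxes], dtype=bool)
    kept = viral_xy[in_membrane_box]

    # Step 2: count viral neighbours within radius_um (brute force; a KD-tree
    # would be preferable for large STORM datasets)
    diff = kept[:, None, :] - kept[None, :, :]
    within = np.sum(diff ** 2, axis=-1) <= radius_um ** 2
    n_neighbors = within.sum(axis=1) - 1  # do not count the spot itself
    return kept[n_neighbors > min_neighbors]


# Hypothetical example: a dense cluster on a membrane patch, one isolated spot,
# and one spot far from any membrane signal
rng = np.random.default_rng(2)
cluster = rng.uniform(0.05, 0.45, size=(150, 2))
viral = np.vstack([cluster, [[5.2, 5.2]], [[10.2, 10.2]]])
membrane = np.array([[0.2, 0.2], [5.1, 5.1]])
print(filter_viral_spots(viral, membrane).shape)
```

      In this toy example the isolated spot survives the first step (its box contains a membrane spot) but fails the second, while the spot at (10.2, 10.2) fails the first step; only the 150-spot cluster remains.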

      (18) The article says "It was shown that the number of bound lectins correlated with the amount of glycans, not with number of proteins (Figure 1E)". Figure 1E correlates experimental PNA/mol with predicted glycosylation sites, not with the number of expressed proteins. Correct sentence with the right Figure reference.

      As you pointed out, the meaning of this sentence was not clear. We have amended it as follows to clarify our intention:

      (page 8)

      Since a wide range of glycoproteins inhibit viral infection, it is possible that all types of glycoproteins contribute additively to this function. ……. In this cell line, this inverse correlation was most pronounced when quantifying N-acetylneuraminic acid (Neu5Ac, recognized by the lectins SSA and MAL) among the various glycan types, while some other glycans also showed weak correlations (Supplementary Figure 2C). These results showed that the amount of virus infection in each cell was anticorrelated with the amount of total glycans on the cell surface. Because the amount of glycans is determined by the total glycocalyx population, the infection-inhibitory effect can be additive across glycoprotein populations, as we hypothesized.

      If the inhibitory effect is nonspecific and additive, the contribution of each protein is likely to be less significant. To confirm this, we also measured the correlation between the density of each glycoprotein and viral infection. CD44, which was shown to…….. Our results demonstrate that total glycan content is a better indicator than individual glycoprotein expression for assessing the infection-inhibitory effect of the cell membrane glycocalyx. These results are consistent with our hypothesis regarding the additive nature of the nonspecific inhibitory effects of each glycoprotein.

    1. eLife Assessment

      Endothelial cell-specific loss of TGF-beta signaling in mice leads to CNS vascular defects, specifically impairing retinal development and promoting immune cell infiltration. The data are solid, showing that loss of TGF-beta signaling triggers vascular inflammation and attracts immune cells specific to CNS vasculature. These findings are important, highlighting TGF-beta's role in maintaining vascular-immune homeostasis and its therapeutic potential in neurovascular inflammatory diseases.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript analyses the effects of deleting the TgfbR1 and TgfbR2 receptors from endothelial cells at postnatal stages on vascular development and blood-retina barrier maturation in the retina. The authors find that deletion of these receptors affects vascular development in the retina but, importantly, it also affects the infiltration of immune cells across the vessels in the retina. The findings demonstrate that Tgf-beta signaling through TgfbR1/R2 heterodimers primarily regulates the immune phenotypes of endothelial cells in addition to regulating vascular development, but has minor effects on BRB maturation. The data provided by the authors provide solid support for their conclusions.

      Strengths:

      (1) The manuscript uses a variety of elegant genetic studies in mice to analyze the role of TgfbR1 and TgfbR2 receptors in endothelial cells at postnatal stages of vascular development and blood-retina barrier maturation in the retina.

      (2) The authors provide a nice comparison of the vascular phenotypes in endothelial-specific knockout of TgfbR1 and TgfbR2 in the retina (and to a lesser degree in the brain) with those from Ndp KO mice (loss of Ndp/Fzd4 signaling) or loss of VEGF-A signaling to dissect the specific roles of Tgf-beta signaling for vascular development in the retina.

      (3) The snRNAseq data of vessel segments from the brains of WT versus TgfbR1-iECKO mice provide a nice analysis of pathways and transcripts that are regulated by Tgf-beta signaling in endothelial cells.

      Weaknesses (Original Submission):

      (1) The authors claim that choroidal neovascular tuft phenotypes are similar in TgfbrR1 KO and TgfbrR2 KO mice. However, the phenotypes look more severe in the TgfbrR1 KO rather than TgfbrR2 KO mice. Can the authors show a quantitative comparison of the number of choroidal neovascular tufts per whole eye cross-section in both genotypes?

      (2) In the analysis of Sulfo-NHS-Biotin leakage in the retina to assess blood-retina barrier maturation, the authors claim that there is increased vascular leakage in the TgfbR1 KO mice. However, it does not appear that Sulfo-NHS-biotin is leaking outside the vessels, so this cannot be interpreted as increased vascular permeability. Can the authors provide a detailed quantification of the leakage phenotype?

      (3) The immune cell phenotyping by snRNAseq seems premature as the number of cells is very small. The authors should sort for CD45+ cells and perform single cell RNA sequencing.

      (4) The analysis of BBB leakage phenotype in TgfbR1 KO mice needs to be more detailed and include some tracers in addition to serum IgG leakage.

      (5) A previous study (Zarkada et al., 2021, Developmental Cell) showed that EC-deletion of Alk5 affects the D tip cells. The phenotypes of those mice look very similar to those shown for TgfbrR1 KO mice. Are D tip cells lost in these mutants by snRNAseq?

      Comments on revisions:

      The authors have addressed the major weaknesses that I raised with the original submission adequately in the revised manuscript.

    3. Reviewer #2 (Public review):

      Summary:

      The authors meticulously characterized EC-specific Tgfbr1, Tgfbr2, or double knockout in the retina, demonstrating through convincing immunostaining data that loss of TGF-β signaling disrupts retinal angiogenesis and choroidal neovascularization. Compared to other genetic models (Fzd4 KO, Ndp KO, VEGF KO), the Tgfbr1/2 KO retina exhibits the most severe immune cell infiltration. The authors proposed that TGF-β signaling loss triggers vascular inflammation, attracting immune cells - a phenotype specific to CNS vasculature, as non-CNS organs remain unaffected.

      Strengths:

      The immunostaining results presented are clear and robust. The authors performed well-controlled analyses against relevant mouse models. snRNA-seq corroborates immune cell leakage in the retina and vascular inflammation in the brain.

      Comments on revisions:

      The authors have revised the manuscript and addressed all my questions.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Weaknesses: 

      (1) The authors claim that choroidal neovascular tuft phenotypes are similar in Tgfbr1 KO and Tgfbr2 KO mice. However, the phenotypes look more severe in the Tgfbr1 KO than in the Tgfbr2 KO mice. Can the authors show a quantitative comparison of the number of choroidal neovascular tufts per whole-eye cross-section in both genotypes?

      Thank you for asking about this. Each VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retina exhibits multiple zones of choroidal neovascularization. The examples in Figure 1 and Figure 1-Figure supplements 1 and 2 are mostly from retinas with loss of TGFBR1, but we could have chosen similar examples from retinas with loss of TGFBR2. The quantification in the original version of Figure 1-Figure supplement 1 panel C had a labeling error: it actually showed the quantification of choroidal neovascularization (CNV) in the combined set of VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, not only in VE-cad-CreER;TGFBR1 CKO/- retinas as originally labeled. The point it made is that CNV is seen with loss of TGF-beta signaling but not in control retinas or retinas with loss of Norrin signaling. We have now updated that plot by separating the data points for VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, so that they can be compared to each other. The result shows ~2.5-fold more CNV in VE-cad-CreER;TGFBR2 CKO/- retinas than in VE-cad-CreER;TGFBR1 CKO/- retinas. We think it likely that more extensive sampling would show little or no difference between these two genotypes, but the data are what they are. This is now described in the Results section.

      We have also added a panel D to Figure 1-Figure supplement 1, which shows a retina flatmount analysis of CNV. This is done by mounting the retina with the photoreceptor side up so that the outer retina can be optimally imaged.

      (2) In the analysis of Sulfo-NHS-biotin leakage in the retina to assess blood-retina barrier maturation, the authors claim that there is increased vascular leakage in the Tgfbr1 KO mice. However, it does not seem that Sulfo-NHS-biotin is leaking outside the vessels. Therefore, the phenotype cannot reflect increased vascular permeability. Can the authors provide a detailed quantification of the leakage phenotype?

      Thank you for raising this point.  Your comment prompted us to look at this question in greater depth with more experiments.  We have expanded Figure 2 to show and quantify a comparison between control (i.e. phenotypically WT), NdpKO, and TGFBR1 endothelial KO and we have expanded the associated part of the Results section (Figure 2C and D).  In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in or around the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VEcadCreER;Tgfbr1CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma.   The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is minimally or not at all leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      (3) The immune cell phenotyping by snRNAseq is premature, as the number of cells is very small. The authors should sort for CD45+ cells and perform single-cell RNA sequencing. 

      Thank you for raising this point. For the revised manuscript, we have performed additional snRNAseq analyses using the same tissue processing protocol as for our original snRNAseq data. We have opted to homogenize the tissue and prepare nuclei (our original method) rather than dissociating the tissue and FACS-sorting for CD45+ cells, because the nuclear isolation approach is unbiased: we assume that nuclei from all cell types are present after tissue homogenization. By contrast, we cannot be certain that CD45 FACS will capture the full range of immune cells, since some cells may not express CD45, may express CD45 at low levels, or may be tightly adherent to other cells, such as vascular endothelial cells. Additionally, by following the original protocol, we can combine the original snRNAseq dataset and the new snRNAseq dataset. In the revised manuscript we present the snRNAseq data from the combination of the original and the more recent snRNAseq datasets (revised Figure 4; N=628 immune cell nuclei). The new analysis comes to the same conclusions as the original analysis: the immune cell infiltrate in the mutant retinas is composed of a wide variety of immune cells.

      (4) The analysis of the BBB leakage phenotype in Tgfbr1 KO mice needs to be more detailed and include tracers as well as serum IgG leakage.

      As described in our response to query 2, we have conducted additional experiments to look at vascular leakage in control, VE-cad-CreER;TGFBR1 CKO/-, and NdpKO retinas. We have also looked at Sulfo-NHS-biotin leakage in the VE-cad-CreER;TGFBR1 CKO/- brain, and it is indistinguishable from WT controls. Since Sulfo-NHS-biotin is a low MW tracer (<1,000 Da), this implies that loss of TGF-beta signaling does not increase non-specific diffusion of either low or high MW molecules. Therefore, the elevated levels of IgG in the brain parenchyma in young VE-cad-CreER;TGFBR1 CKO/- mice (Figure 8A) likely represent specific transport of IgG across the BBB. Such transport is known to occur via Fc receptors expressed on vascular endothelial cells, although it is normally greater in the brain-to-blood direction than in the blood-to-brain direction. For example, see Lafrance-Vanasse et al (2025) Leveraging neonatal Fc receptor (FcRn) to enhance antibody transport across the blood brain barrier. Nat Commun. 16:4143. This is now described in greater detail in the Results section.

      (5) A previous study (Zarkada et al., 2021, Developmental Cell) showed that EC deletion of Alk5 affects the D tip cells. The phenotypes of those mice look very similar to those shown for Tgfbr1 KO mice. Does snRNAseq show a loss of D tip cells in these mutants?

      Please note: Alk5 is another name for TGFBR1. This is noted in the second sentence of paragraph 4 of the Introduction. The reviewer is correct: there are a lot of similarities because these are exactly the same KO mice. Also, Zarkada et al. and we used the same VE-cad-CreER to recombine the CKO allele. The proposed snRNAseq analysis would serve as an independent check on the diving (D) tip vs. stalk cell analyses published in Zarkada et al (2021) Specialized endothelial tip cells guide neuroretina vascularization and blood-retina-barrier formation. Dev Cell 56:2237-2251. We have not gone in this direction because the question of tip vs. stalk cells and of subtypes of tip cells in WT vs. mutant retinas is beyond our focus on choroidal neovascularization and the role of immune cells and vascular inflammation. The proposed snRNAseq analysis would also require a major effort, since tip cells are rare and must be harvested from large numbers of early postnatal retinas followed by FACS enrichment for vascular endothelial cells. Finally, we have no reason to doubt the results of Zarkada et al.

      Reviewer #2 (Public review): 

      Summary:

      The authors meticulously characterized EC-specific Tgfbr1, Tgfbr2, or double knockout in the retina, demonstrating through convincing immunostaining data that loss of TGF-β signaling disrupts retinal angiogenesis and choroidal neovascularization. Compared to other genetic models (Fzd4 KO, Ndp KO, VEGF KO), the Tgfbr1/2 KO retina exhibits the most severe immune cell infiltration. The authors proposed that TGF-β signaling loss triggers vascular inflammation, attracting immune cells - a phenotype specific to CNS vasculature, as non-CNS organs remain unaffected. 

      Strengths: 

      The immunostaining results presented are clear and robust. The authors performed well-controlled analyses against relevant mouse models. snRNA-seq corroborates immune cell leakage in the retina and vascular inflammation in the brain. 

      Weaknesses: 

      The causal link between TGF-β loss, vascular inflammation, and immune infiltration remains unresolved. The authors' model posits that EC-specific TGF-β loss directly causes inflammation, which recruits immune cells. However, an alternative explanation is plausible: Tgfbr1/2 KO-induced developmental defects (e.g., leaky vessels) permit immune extravasation, which subsequently triggers inflammation. The vein-specific upregulation of ICAM1 staining and the lack of immune infiltration in non-CNS tissues support the alternative model. Late-stage induction of Tgfbr1/2 KO (avoiding developmental confounders) could clarify TGF-β's role in retinal angiogenesis versus anti-inflammation.

      Thank you for raising this point.  Your comment prompted us to look at this question in greater depth with more experiments.  We have expanded Figure 2 to show and quantify a comparison between control (i.e. phenotypically WT), NdpKO, and TGFBR1 endothelial KO and we have expanded the associated part of the Results section (Figure 2C and D).  In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in or around the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VEcadCreER;Tgfbr1CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma.   The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is minimally or not at all leaky – very different from the situation with loss of Norrin/Frizzled4 signaling.

      In the revised manuscript, we have expanded the Discussion section to address the two alternative hypotheses raised by the reviewer. Here are the relevant data in a nutshell: (1) vascular leakage into the parenchyma, as measured with sulfo-NHS-biotin, in TGFBR1 endothelial CKO retinas is far less than in NdpKO retinas, where nearly all ECs convert to a fenestration+ (PLVAP+) phenotype and there is leakage of sulfo-NHS-biotin; (2) ICAM1 in ECs in TGFBR1 endothelial CKO retinas increases several-fold more than in NdpKO or Frizzled4KO retinas; (3) TGFBR1 endothelial CKO retinas have more infiltrating immune cells than NdpKO or Frizzled4KO retinas; and (4) in TGFBR1 endothelial CKO retinas, large numbers of immune cells are observed within and adjacent to blood vessels. We think that the simplest explanation for these data is that loss of TGF-beta signaling in ECs causes an endothelial inflammatory state with enhanced immune cell extravasation. That said, the case for this model is not watertight, and less direct mechanisms could be at play. In particular, this model does not explain why the inflammatory phenotype is limited to CNS (and especially retinal) vasculature.

      Regarding the last sentence of the reviewer’s comment (“Late stage induction…”), we have tried activating CreER recombination at different ages and we observe a large reduction in the inflammatory phenotype when recombination is initiated after vascular development is complete.   This observation suggests that the vascular developmental/anatomic defect – and perhaps the resulting retinal hypoxia response – is required for the inflammatory phenotype.  In the revised manuscript we have expanded the Results and Discussion sections to describe this observation.

      Reviewer #1 (Recommendations for the authors): 

      Suggestions for experiments: 

      (1) The authors need to show a quantitative comparison of the number of choroidal neovascular tufts per whole-eye cross-section in both genotypes (Tgfbr1 and Tgfbr2 KO mice).

      Thank you for raising this point. The quantification in the original version of Figure 1-Figure supplement 1 panel C was mislabeled. It quantifies choroidal neovascularization (CNV) in both VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, not only in VE-cad-CreER;TGFBR1 CKO/- retinas as originally labeled. The point it makes is that CNV is seen with loss of TGF-beta signaling but not in control retinas or retinas with loss of Norrin signaling. We have now corrected that plot by separating the data points for VE-cad-CreER;TGFBR1 CKO/- and VE-cad-CreER;TGFBR2 CKO/- retinas, so that they can be compared to each other. The result shows ~2.5-fold more CNV in VE-cad-CreER;TGFBR2 CKO/- retinas than in VE-cad-CreER;TGFBR1 CKO/- retinas. This is now described in the Results section.

      (2) In the analysis of Sulfo-NHS-biotin leakage in the retina to assess blood-retina barrier maturation, the authors should provide a detailed quantification of the leakage phenotype outside the vessels into the CNS parenchyma, both in the retina and brain, in Tgfbr1 KO mice.

      Thank you for raising this point. There is no detectable Sulfo-NHS-biotin leakage into the brain parenchyma in VE-cad-CreER;TGFBR1 CKO/- mice. We have expanded Figure 2 to show and quantify the data for retinal vascular leakage (Figure 2C and D). The data show that in VE-cad-CreER;TGFBR1 CKO/- mice there is accumulation of Sulfo-NHS-biotin in the vascular tufts but minimal accumulation elsewhere in the retinal vasculature and minimal leakage of Sulfo-NHS-biotin into the retinal parenchyma.

      (3) The immune cell phenotyping by snRNAseq is premature, as the number of cells is very small. The authors should sort for CD45+ cells and perform single-cell RNA sequencing to ascertain these preliminary data. 

      Thank you for raising this point. We have performed additional snRNAseq analyses using the same tissue processing protocol as for our original snRNAseq data to increase the numbers of cells. We have opted to homogenize the tissue and prepare nuclei (our original method) rather than dissociating the cells and FACS-sorting for CD45+ cells, because the nuclear isolation approach is unbiased: we assume that nuclei from all cell types are present. By contrast, we cannot be certain that CD45 FACS will capture the full range of immune cells, since some cells may not express CD45, may express CD45 at low levels, or may be tightly adherent to other cells, such as vascular endothelial cells. Additionally, by following the original protocol, we can combine the original snRNAseq dataset and the new snRNAseq dataset. In the revised manuscript we present the snRNAseq data from the combination of the original and the more recent snRNAseq datasets (revised Figure 4; N=628 immune cell nuclei). The new analysis comes to the same conclusion as in the original submission, namely that the immune cell infiltrate in the mutant retinas is composed of a wide variety of immune cells. The Results section has been expanded to describe these new data and analyses.

      (4) The analysis of the BBB leakage phenotype in Tgfbr1 KO mice needs to be more detailed and include tracers as well as serum IgG leakage.

      Sulfo-NHS-biotin leakage in the VE-cad-CreER;TGFBR1 CKO/- brain is minimal, and it is indistinguishable from WT controls. Since Sulfo-NHS-biotin is a low MW tracer (<1,000 Da), this implies that loss of TGF-beta signaling does not increase non-specific diffusion of either low or high MW molecules. Therefore, the elevated levels of IgG in the brain parenchyma in young VE-cad-CreER;TGFBR1 CKO/- mice (Figure 8A) likely represent specific transport of IgG across the BBB. Such transport is known to occur via Fc receptors expressed on vascular endothelial cells, although it is normally greater in the brain-to-blood direction than in the blood-to-brain direction. For example, see Lafrance-Vanasse et al (2025) Leveraging neonatal Fc receptor (FcRn) to enhance antibody transport across the blood brain barrier. Nat Commun. 16:4143. This is now described in greater detail in the Results section.

      (5) The authors should perform a more detailed snRNAseq analysis of tip and stalk cells in Tgfbr1 KO mice to determine whether D tip cells are lost in these mutants.

      The proposed snRNAseq analysis would serve as an independent check on the diving (D) tip vs stalk cell analyses published by Zarkada et al, who analyzed the same VE-cad-CreER;TGFBR1 CKO/- mutant mice, although they refer to the TGFBR1 gene by its alternate name ALK5 [Zarkada et al (2021) Specialized endothelial tip cells guide neuroretina vascularization and blood-retina-barrier formation. Dev Cell 56:2237-2251].  We have not gone in this direction because the question of tip vs. stalk cells and of subtypes of tip cells in WT vs. mutant retinas is beyond our focus on choroidal neovascularization and the role of immune cells and vascular inflammation.  The proposed snRNAseq analysis would also require a major effort since tip cells are rare and must be harvested from large numbers of early postnatal retinas followed by FACS enrichment for vascular endothelial cells.

      Suggestions for improving the manuscript:  

      (6) The statement that ECs acquire properties of immune cells (Page 2, Line 90) is incorrect. Endothelial cells may acquire characteristics of antigen-presenting cells.

      Thank you for that correction.  Based on the review from Amersfoort et al (2022) (Amersfoort J, Eelen G, Carmeliet P. (2022) Immunomodulation by endothelial cells - partnering up with the immune system? Nat Rev Immunol 22:576-588) and the articles cited in it, we have changed the sentence to “Although vascular endothelial cells (ECs) are not generally considered to be part of the immune system, in some locations and under some conditions they acquire properties characteristic of immune cells, including secretion of cytokines, surface display of co-stimulatory or co-inhibitory receptors, and antigen presentation in association with MHC class II proteins (Pober and Sessa, 2014; Amersfoort et al., 2022).”  

      (7) The statement in Page 3, Line 100-101 [In CNS ECs, quiescence is maintained in part by the actions of astrocyte-derived Sonic Hedgehog, with the result that few immune cells other than resident microglia are found within the CNS (Alvarez et al., 2011).] is incomplete. Wnt signaling also suppresses the expression of leukocyte adhesion molecules from endothelial cells and therefore helps with immune cell quiescence. 

      Thank you for raising that point.  We have expanded that sentence to include Wnt signaling in CNS endothelial cells, as described in the following reference: Lengfeld JE, Lutz SE, Smith JR, Diaconu C, Scott C, Kofman SB, Choi C, Walsh CM, Raine CS, Agalliu I, Agalliu D. (2017) Endothelial Wnt/beta-catenin signaling reduces immune cell infiltration in multiple sclerosis. Proc Natl Acad Sci USA 114:E1168-E1177.

      (8) It may be beneficial for the reader to separate the results on vascular phenotypes related to choroidal neovascularization from those on retinal vascular development.

      Thank you for this suggestion. The two topics partly overlap: choroidal neovascularization is described in Figure 1, and retinal development is described in Figures 1 and 2. The challenge is that some of the same images illustrate both phenotypes, as in Figure 1, so the topics cannot be easily separated.

      (9) In addition to comparing the phenotypes in Tgfb signaling mutant mice with Wnt signaling and VEGF-A signaling mutants, the authors should compare and contrast their data with those found in Alk5 KO mice, as there are a lot of similarities. 

      The reviewer has alerted us to a nomenclature challenge which we will try to resolve in the introduction: Alk5 is just another name for TGFBR1.  The reviewer is correct: there are a lot of similarities between the present study and that of Zarkada et al (2021) because both use the same TGFBR1(=Alk5) CKO mice.

      Reviewer #2 (Recommendations for the authors): 

      Figure 2 

      For 2B, the authors should clarify whether the two regions shown in the Tgfbr1 KO retina (P14) represent central vs. peripheral areas, as phenotype severity varies. 

      For 2C, does the uneven biotin accumulation reflect developmental gradients (e.g., central-peripheral maturation timing)? 

      Thank you for raising these points.  Regarding Figure 2B, these images are all from the mid-peripheral retina, where the phenotype is moderately severe.  This is now noted in the figure legend.

      Regarding Figure 2C, the reviewer is correct that the pattern of Sulfo-NHS-biotin is uneven in VEcadCreER;Tgfbr1CKO/- retinas: it accumulates only in the tufts. We have expanded Figure 2C to show a comparison between control (i.e., phenotypically WT), NdpKO, and TGFBR1 endothelial KO retinas, and we have expanded the associated part of the Results section. In a nutshell, control retinas show little Sulfo-NHS-biotin accumulation in the vasculature or in the parenchyma; NdpKO retinas show Sulfo-NHS-biotin accumulation in the vasculature and in the parenchyma (i.e., the area between the vessels); and VEcadCreER;Tgfbr1CKO/- retinas show Sulfo-NHS-biotin accumulation in the vascular tufts with minimal accumulation in the non-tuft vasculature and minimal leakage into the parenchyma. The conclusion is that the bulk of the retinal vasculature in TGFBR1 endothelial KO mice is not leaky, very different from the situation with loss of Norrin/Frizzled4 signaling.

      Figure 6 

      The claim that PECAM1+ rings on veins reflect EC-immune cell binding is uncertain, as PECAM1 is also known to be expressed by immune cells. The complete correlation of PECAM1 and CD45 staining signals suggests that a subset of immune cells upregulates PECAM1. The VEcadCreER;Tgfbr1 flox/-; SUN1:GFP reporter would be helpful to delineate EC-immune cell proximity. Super-resolution imaging with Z-stacks could also resolve spatial relationships (luminal vs. abluminal immune cell adhesion).

      Thank you for this comment.  The reviewer is correct that, at the resolution of these images, we cannot determine whether the PECAM1 immunostaining signal is derived from ECs, from leukocytes, or from both.  This is now stated in the Results section.  The PECAM1-rich endothelial ring structure associated with leukocyte extravasation has been characterized in various publications, for example in (1) Carman CV, Springer TA. (2004) A transmigratory cup in leukocyte diapedesis both through individual vascular endothelial cells and between them. J Cell Biol 167:377-388 and (2) Mamdouh Z, Mikhailov A, Muller WA. (2009) Transcellular migration of leukocytes is mediated by the endothelial lateral border recycling compartment. J Exp Med 206:2795-2808.  The ring structures visualized in Figure 6D by PECAM1 immunostaining conform to the ring structures described in these and other papers.  In showing these structures, our point is simply that they likely represent sites of leukocyte extravasation.  This is now clarified in the text.  We have also added some additional references on leukocyte extravasation and the ring structures.

      Figure 7 

      A time-course analysis of ICAM1 would strengthen the mechanistic model. Does ICAM1 upregulation precede immune infiltration (supporting inflammation as the primary defect)? Given that immune cells appear by P14 (per snRNA-seq), is ICAM1 elevated earlier? 

      This is an interesting idea, but based on what is known about leukocyte adhesion and extravasation we predict that there will not be a clean temporal separation between ICAM1 induction and leukocyte adhesion/infiltration.  That is, if the proinflammatory state causes an increase in the number of leukocytes, then as ICAM1 levels increase, leukocyte adhesion would also increase.  Similarly, if the presence of leukocytes increases the pro-inflammatory state, then as the number of leukocytes increases, the levels of ICAM1 would be predicted to increase.  Thus, we think that a time course analysis is unlikely to provide a definitive conclusion.

      Figure 8-SF1 

      In brain slices, a transient pan-IgG accumulation suggests a self-resolving defect in the BBB. However, this BBB impairment appears to be spatiotemporally distinct from ICAM1 upregulation. ICAM1 staining is restricted to the lesion site, aligning with immune cell-driven inflammation. 

      Thank you for raising these points.  The reviewer is correct that these observations don’t fit together in a clear way.  There does not appear to be a general increase in brain vascular permeability in VE-cad-CreER;TGFBR1 CKO/- mice, as shown by sulfo-NHS-biotin.  However, there is a large and transient increase in IgG in the brain parenchyma, suggestive of a general vascular alteration, and – as the reviewer correctly notes – it is not accompanied by a generalized increase in ICAM1 vascular immunostaining.  At this point, we don’t have any real insight into the mechanistic basis of the transient IgG increase.

      Thank you for handling this manuscript.

    1. eLife Assessment

      This cleverly designed and potentially important work adds to our understanding of how and whether social behaviours promoting egalitarianism can be learned, even when implementing these norms entails a cost for oneself. However, the evidence supporting the major claims is currently incomplete, the major limitation being whether participants truly learn egalitarianism from a teacher or instead show guilt that diminishes over time and is further reduced when observing others behave more selfishly. With stronger supporting evidence, this work will be of interest to a wide range of fields, including cognitive psychology/neuroscience, neuroeconomics, and social psychology, as well as policy making.

    2. Reviewer #1 (Public review):

      Summary:

      Zhang et al. addressed the question of whether advantageous and disadvantageous inequality aversion can be vicariously learned and generalized. Using an adapted version of the ultimatum game (UG) with three phases, participants first gave their own preferences (baseline phase), then interacted with a "teacher" to learn the teacher's preferences (learning phase), and finally were tested again on their own (transfer phase). The key measure is whether participants' choice preferences (i.e., rejection rates and fairness ratings) became similar to the teacher's after the learning phase, assessed by contrasting the transfer and baseline phases. Through a series of statistical and computational modeling analyses, the authors reported that both advantageous and disadvantageous inequality aversion can indeed be learned (Study 1) and even generalized (Study 2).

      Strengths:

      This very interesting study directly adapts the lab's previous work on the observational learning effect on disadvantageous inequality aversion to test both advantageous and disadvantageous inequality aversion. Social transmission of action, emotion, and attitude has only recently begun to be studied, hence this research is timely. The use of computational modeling is mostly appropriate and well motivated. Study 2, which examined vicarious inequality aversion in conditions where feedback was never provided, is interesting and important for strengthening the reported effects. Both studies provide proper justifications for the sample size.

      Weaknesses:

      Despite the strengths, a few conceptual aspects and analytical decisions have to be explained, justified, or clarified.

      INTRODUCTION/CONCEPTUALIZATION

      (1) Two terms that should not be interchangeable seem to be used interchangeably in this work: vicarious/observational learning vs. preference learning. In vicarious learning, individuals observe others' actions (and optionally the consequences that result directly from those actions), whereas in preference learning, individuals predict, or act on behalf of, the others' actions, and then receive feedback on whether that prediction is correct. The current experiment seems to be more about preference learning and prediction, and less about vicarious learning. Yet the introduction and setup are heavily framed around vicarious learning, and later the terms vicarious learning and preference learning are used rather interchangeably in the text. I think the authors should either tone down the focus on vicarious learning or discuss how the two differ. Some of the references here may be helpful: Charpentier et al., Neuron, 2020; Olsson et al., Nature Reviews Neuroscience, 2020; Zhang & Glascher, Science Advances, 2020.

      EXPERIMENTAL DESIGN

      (2) For each offer type, the experiment "added a uniformly distributed noise in the range of (-10, 10)". What does this look like? Are offers only integers, such as 25:75, or do they include decimal points? More importantly, is it possible for either a 70:30 or a 90:10 offer, after adding the noise, to generate an 80:20 split shown to the participants? If so, in the later analyses, which condition did a trial with an 80:20 split belong to: 70:30 or 90:10? And is such noise added only in the learning phase, or also in the baseline/transfer phases? This requires some clarification.

      (3) Are the offer conditions (90:10, 70:30, 50:50, 30:70, 10:90) randomized? If so, how? Is the order randomized within each participant, and/or also across participants (such that each participant experienced a different trial sequence)? This is important, as the order, especially in the learning phase, can strongly affect participants' preference learning.
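      To make the ambiguity raised in comment (2) concrete, the following minimal Python sketch enumerates the shares a nominal split can take once noise is added. It assumes integer-valued noise applied to the first number of each split and contrasts a closed interval [-10, 10] with an open interval (-10, 10); the function names are illustrative only, and which assumption matches the study is exactly what the comment asks the authors to state.

```python
def reachable_shares(nominal, low=-10, high=10, inclusive=True):
    """Return the set of integer shares that nominal + noise can take.

    Assumption (not from the paper): noise is an integer drawn from
    [low, high] when inclusive=True, or from (low, high) otherwise.
    """
    if inclusive:
        noise = range(low, high + 1)
    else:
        noise = range(low + 1, high)
    return {nominal + n for n in noise}


def overlap(a, b, inclusive=True):
    """Sorted shares that could have come from either nominal split a or b."""
    return sorted(reachable_shares(a, inclusive=inclusive)
                  & reachable_shares(b, inclusive=inclusive))


# First number of the 70:30 and 90:10 splits.
print(overlap(70, 90, inclusive=True))   # closed interval: [80]
print(overlap(70, 90, inclusive=False))  # open interval: []
```

      Under the closed-interval assumption, an 80:20 offer could originate from either the 70:30 or the 90:10 condition; under the open-interval assumption, the two conditions never produce the same integer offer.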

      STATISTICAL ANALYSIS & COMPUTATIONAL MODELING

      (4) For the Study 1 DI offer types (90:10, 70:30), the rejection rate for DI-AI averse looks consistently higher than that for DI averse (i.e., the blue line is above the yellow line). Is this significant? If so, why? Since this is a between-subject design, I would not anticipate such a result (especially at baseline). Also, for the LME results (e.g., Table S3), only interactions were reported, not the main effects.

      (5) I do not find this analysis particularly appealing: "we examined whether participants' changes in rejection rates between Transfer and Baseline, could be explained by the degree to which they vicariously learned, defined as the change in punishment rates between the first and last 5 trials of the Learning phase." Naturally, participants' behavior in the first 5 trials of the learning phase will be similar to that in the baseline, and their behavior in the last 5 trials of the learning phase will echo that in the transfer phase. I think it would be stronger to link the preference learning results to the change between the baseline and transfer phases, e.g., by looking at the difference between alpha (beta) at the end of the learning phase and the initial alpha (beta).

      (6) Could the data from the baseline and transfer phases also be modeled using a simple Fehr-Schmidt model? This way, the change in alpha/beta between the baseline and transfer phases could also be examined.

      (7) I quite liked Study 2, which tests the generalization effect, and I expected to see an adapted computational model that directly reflects this idea. Indeed, the authors wrote "[...] given that this model [...] assumes the sort of generalization of preferences between offer types [...]". But where exactly does the preference learning model assume generalization? In the methods, the modeling seems to concern only Study 1; did the authors adapt their model to accommodate Study 2? The authors also ran simulations for the learning phase in Study 2 (Figure 6); how were preferences updated (if at all) for offers (90:10 and 10:90) where feedback was not given? Extending/unpacking the computational modeling results for Study 2 would be very helpful for the paper.
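      Comments (5) to (7) refer to the Fehr-Schmidt (F-S) model. For reference, the following sketches the standard two-player form as it could be fit separately to the baseline and transfer phases. Here x_i is the participant's share of an accepted offer, x_j is the proposer's share, α weights disadvantageous and β advantageous inequality. The logistic choice rule and its inverse temperature τ are assumptions for illustration, not necessarily the authors' specification.

```latex
U_i(x_i, x_j) = x_i - \alpha \max(x_j - x_i, 0) - \beta \max(x_i - x_j, 0)

% Rejecting yields (0, 0), i.e. U_i = 0; one common stochastic choice rule:
P(\text{reject}) = \frac{1}{1 + \exp\left(\tau \, U_i(x_i, x_j)\right)}

% Phase-wise change in inequality aversion:
\Delta\alpha = \alpha_{\text{transfer}} - \alpha_{\text{baseline}}, \qquad
\Delta\beta = \beta_{\text{transfer}} - \beta_{\text{baseline}}
```

      A positive Δβ would indicate that advantageous inequality aversion increased after learning, giving a model-based counterpart to the model-free change in rejection rates.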

      Comments on revisions:

      I kept my original public review, so that future readers can see the progress and development of the manuscript.

      The authors have largely addressed my original questions/concerns, and I have two outstanding comments.

      (a) Related to my original comment #6, where I suggested applying the F-S model to the baseline and transfer phases as well: the authors were inclined not to do so, yet in response to comment #7 and in the manuscript they opted to apply a more complex F-S-based model to the learning phase. I agree that the rejection rate is a clear indication, but for completeness, it would be more consistent and compelling if the paper followed both a model-free (model-agnostic) and a model-based approach in all phases of the experiment.

      (b) Related to my original comment #4, I appreciate that the authors have provided more details of their LMMs, but I still do not think they are specified correctly. First, the offer levels (50:50, 30:70, 10:90) should not be coded as purely categorical levels. They have an ordinal meaning, so a single ordinal predictor with three levels should be used. This would also avoid the excessive number of interactions the authors have pointed out.
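
      To illustrate the parameter count, here is a toy sketch with hypothetical codings: three offer levels crossed with two teacher conditions and two phases, with all main effects and interactions included. Entering offer level as a single numeric (linear ordinal) predictor assumes equally spaced levels, which is an assumption rather than something stated in the manuscript.

```python
import itertools

import numpy as np

# Hypothetical cells: offer index 0, 1, 2 (for 50:50, 30:70, 10:90) x teacher condition x phase.
cells = list(itertools.product(range(3), [0.0, 1.0], [0.0, 1.0]))


def design_matrix(cells, ordinal):
    """Fixed-effects design with all main effects and all interactions."""
    rows = []
    for offer, cond, phase in cells:
        # Ordinal: one numeric column; categorical: two treatment-coded dummies.
        o = [float(offer)] if ordinal else [float(offer == 1), float(offer == 2)]
        row = [1.0] + o + [cond, phase]
        row += [x * cond for x in o] + [x * phase for x in o] + [cond * phase]
        row += [x * cond * phase for x in o]
        rows.append(row)
    return np.array(rows)


X_ord = design_matrix(cells, ordinal=True)
X_cat = design_matrix(cells, ordinal=False)
print(X_ord.shape[1], X_cat.shape[1])  # 8 vs 12 fixed-effect columns
```

      With the ordinal coding, each offer-related interaction costs one coefficient instead of two, which is where the reduction in interaction terms comes from.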

      Second, running a model with only interactions and no main effects is flawed. Standard statistics textbooks emphasize that when the main effects are omitted, the interaction terms alone cannot be interpreted without bias.
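
      A toy least-squares example with made-up cell means shows why an interaction-only model is hard to interpret: the outcome below depends only on `x`, yet once the main effects are dropped, the fitted `x:g` coefficient absorbs part of the main effect of `x`.

```python
import numpy as np

# Made-up 2x2 cell means: y depends only on x (main effect 2); there is no true interaction.
x = np.array([0.0, 1.0, 0.0, 1.0])
g = np.array([0.0, 0.0, 1.0, 1.0])
y = 2.0 * x

ones = np.ones_like(x)
full = np.column_stack([ones, x, g, x * g])  # main effects + interaction
inter_only = np.column_stack([ones, x * g])  # interaction without main effects

b_full, *_ = np.linalg.lstsq(full, y, rcond=None)
b_inter, *_ = np.linalg.lstsq(inter_only, y, rcond=None)
print("x:g, full model:", b_full[3])         # approximately 0
print("x:g, interaction only:", b_inter[1])  # approximately 4/3
```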

      So these LMMs need to be revised before the manuscript reaches a version of record.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates whether individuals can learn to adopt egalitarian norms that incur a personal monetary cost, such as rejecting offers that benefit them more than the giver (advantageous inequitable offers). While these behaviors are uncommon, two experiments aim to demonstrate that individuals can learn to reject such offers by observing a "teacher" who follows these norms. The authors use computational modelling to argue that learners adopt these norms through a sophisticated process, inferring the latent structure of the teacher's preferences, akin to theory of mind.

      Strengths:

      This paper is well-written and tackles an important topic relevant to social norms, morality, and justice. The findings are promising (though further control conditions are necessary to support the conclusions). The study is well-situated in the literature, with a clever experimental design and a computational approach that may offer insights into latent cognitive processes. In the revision, the authors clarified some questions related to the initial submission.

      Weaknesses:

      Despite these strengths, I remain unconvinced that the current evidence supports the paper's central claims. Below, I outline several issues that, in my view, limit the strength of the conclusions.

      (1) Experimental Design and Missing Control Condition:

      The authors set out to test whether observing a "teacher" who is averse to advantageous inequity (Adv-I) will affect observers' own rejection of Adv-I offers. However, I think the design of the task lacks an important control condition needed to address this question. At present, participants are assigned to one of two teachers: DIS or DIS+ADV. Behavioral differences between these groups can only reveal relative differences in influence; they cannot establish whether (and how) either teacher independently affects participants' own behavior. For example, a significant difference between conditions can emerge even if participants are only affected by the DIS teacher and are not affected at all by the DIS+ADV teacher. What is crucially missing here is a no-teacher control condition, which can then be compared with each teacher condition separately. This control condition would also control for pure temporal effects unrelated to teacher influence (e.g., increasing Adv-I rejections due to guilt build-up).

      While this criticism applies to both experiments, it is especially apparent in Experiment 2. As shown in Figure 4, the interaction for 10:90 offers reflects a decrease in rejection rates following the DIS teacher, with no significant change following the DIS+ADV teacher. Ignoring temporal effects, this pattern suggests that participants may be learning NOT to reject from the DIS teacher, rather than learning to reject from the DIS+ADV teacher. On this basis, I do not see convincing evidence that participants' own choices were shaped by observing Adv-I rejections.

      In the Discussion, the authors write that "We found that participants' own Adv-I-averse preferences shifted towards the preferences of the Teacher they just observed, and the strength of these contagion effects related to the degree of behavior change participants exhibited on behalf of the Teachers, suggesting that they internalized, at least somewhat, these inequity preferences." However, there is no evidence that directly links the degree of behaviour change (on the teacher's behalf) to contagion effects (own behavioural change). I think there was a relevant analysis in the original version, but it was removed from the current version.

      (2) Modelling Efforts: The modelling approach is underdeveloped. The identification of the "best model" lacks transparency, as no model-recovery results are provided. Additionally, behavioural fits for the losing models are not shown, leaving readers in the dark about where these models fail. Readers would benefit from seeing qualitative/behavioural patterns that favour the winning model. Moreover, the reinforcement learning (RL) models used are overly simplistic, treating actions as independent when they are likely inversely related. For example, the feedback that the teacher would have rejected an offer provides evidence that rejection is "correct" but also that acceptance is "an error," and the latter is not incorporated into the modelling. In other words, offers are modelled as two-armed bandits (where separate values are learned for reject and accept actions), but the situation is effectively a one-armed bandit (if one action is correct, the other is mistaken). It is unclear to what extent this limitation affects the current RL formulations. Can the authors justify/explain their reasoning for including these specific variants? The manuscript only states Q-values for reject actions, but what are the Q-values for accept actions? This is unclear.

      In Experiment 2, only the preferred model is capable of generalization, so it is perhaps unsurprising that this model "wins." However, this does not strongly support the proposed learning mechanism, lacking a comparison with simpler generalizing mechanisms (see following comments).
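
      The two-armed versus one-armed distinction can be illustrated with a toy delta-rule update. This is a minimal sketch with hypothetical function names, learning rate `lr`, and reward coding, not the manuscript's models: in the two-armed version, feedback that the teacher would have rejected an offer only updates the value of the action the participant chose, whereas in the coupled version a single value encodes the belief that the teacher rejects, so the same feedback also lowers the value of accepting.

```python
def update_two_armed(q, action, teacher_rejects, lr):
    """Independent values: only the chosen action's Q is updated."""
    q = dict(q)
    reward = 1.0 if (action == "reject") == teacher_rejects else 0.0
    q[action] += lr * (reward - q[action])
    return q


def update_coupled(v_reject, teacher_rejects, lr):
    """One value V = belief that the teacher rejects; Q_reject = V, Q_accept = 1 - V."""
    return v_reject + lr * (float(teacher_rejects) - v_reject)


# The participant accepts, but the feedback says the teacher would have rejected.
q = update_two_armed({"accept": 0.5, "reject": 0.5}, "accept", True, lr=0.2)
v = update_coupled(0.5, True, lr=0.2)
print(q, {"accept": 1 - v, "reject": v})
```

      In the two-armed version Q_reject stays at 0.5 after this trial; in the coupled version it rises to 0.6 while Q_accept falls to 0.4.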

      (3) Conceptual Leap in Modelling Interpretation: The distinction between simple RL models and preference-inference models seems to hinge on the ability to generalize learning from one offer to another. Whereas in the RL models, learning occurs independently for each offer (hence no cross-offer generalization), preference inference allows for generalization between different offers. However, the paper does not explore "model-free" RL models that allow generalization based on the similarity of features of the offers (e.g., payment for the receiver, payment for the offer-giver, who benefits more). Such models are more parsimonious and could explain the results without invoking a theory of mind or any modelling of the teacher. In such model versions, a learner acquires a functional form that allows prediction of the teacher's feedback based on offer features (e.g., linear or quadratic weighting). Because feedback for an offer modulates the parameters of this function (feature weights), generalization occurs without necessarily evoking any sophisticated model of the other person. This leaves open the possibility that RL models could perform just as well or even outperform the preference learning model, casting doubt on the authors' conclusions.

      Of note: even the behaviourists knew that when Little Albert was taught to fear rats, this fear generalized to rabbits. This could occur simply because rabbits are somewhat similar to rats. But this doesn't mean Little Albert had a sophisticated model of animals that he used to infer how they behave.

      In their rebuttal letter, the authors acknowledge these possibilities, but the manuscript still does not explore or address alternative mechanisms.
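
      A minimal sketch of the kind of feature-based "model-free" learner described above, assuming a hypothetical feature set (a bias, advantageous inequity, and disadvantageous inequity for the receiver) and a logistic delta rule on shared weights. Nothing here models the teacher's mind, yet feedback given only on the 30:70 offer changes predictions for the never-trained 10:90 offer, because the weights are shared across offers.

```python
import math


def features(giver, receiver):
    """Hypothetical offer features: bias, advantageous and disadvantageous inequity."""
    return [1.0, max(receiver - giver, 0) / 100.0, max(giver - receiver, 0) / 100.0]


def predict(w, offer):
    """Predicted probability that the teacher rejects `offer` (logistic in the features)."""
    z = sum(wi * xi for wi, xi in zip(w, features(*offer)))
    return 1.0 / (1.0 + math.exp(-z))


def learn(w, offer, teacher_rejects, lr=0.5):
    """Delta-rule step on the shared feature weights (a logistic-regression gradient step)."""
    err = float(teacher_rejects) - predict(w, offer)
    return [wi + lr * err * xi for wi, xi in zip(w, features(*offer))]


w = [0.0, 0.0, 0.0]
for _ in range(20):  # feedback is only ever given for the 30:70 offer
    w = learn(w, (30, 70), teacher_rejects=True)

for offer in [(90, 10), (50, 50), (30, 70), (10, 90)]:
    print(offer, round(predict(w, offer), 3))
```

      The predicted rejection rate for 10:90 ends up above that for the trained 30:70 offer, purely through the weight on the advantageous-inequity feature.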

      (4) Limitations of the Preference-Inference Model: The preference-inference model struggles to capture key aspects of the data, such as the increase in rejection rates for 70:30 DI offers during the learning phase (e.g., Fig. 3A, AI+DI blue group). This is puzzling. Thinking about this, I realized the model makes quite strong, unintuitive predictions which are not examined. For example, if a subject begins the learning phase rejecting the 70:30 offer more than 50% of the time (meaning the starting guilt parameter is higher than 1.5), then, over learning, the tendency to reject will decrease to below 50% (the guilt parameter will be pulled down below 1.5). This is despite the fact that the teacher rejects 75% of the offers. In other words, as learning continues, learners will diverge from the teacher. On the other hand, if a participant begins learning by tending to accept this offer (guilt < 1.5), then during learning, they can increase their rejection rate but never above 50%. Thus, one can never fully converge on the teacher. I think this relates to the model's failure in accounting for the pattern mentioned above. I wonder if individuals actually abide by these strict predictions. In any case, these issues raise questions about the validity of the model as a representation of how individuals learn to align with a teacher's preferences (given that the model doesn't really allow for such an alignment).

      In their rebuttal letter, the authors acknowledged these anomalies and stated that they were able to build a better model (where anomalies are mitigated, though not fully eliminated). But they still report the current model and do not develop/discuss alternatives. A more principled model may be a Bayesian model where participants learn a belief distribution (rather than point estimates) regarding the teacher's parameters.
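
      One way to make this suggestion concrete is a grid-approximate Bayesian learner that keeps a full belief distribution over the teacher's guilt parameter. This is a minimal sketch, assuming a Fehr-Schmidt teacher with a fixed inverse temperature, a flat prior on a bounded grid, and a single 30:70 (giver:receiver) offer; all names and values are hypothetical, not the authors' model.

```python
import numpy as np

# Hypothetical setup: uncertainty over the teacher's guilt parameter beta.
BETA_GRID = np.linspace(0.0, 3.0, 301)
TAU = 0.2  # assumed fixed inverse temperature


def p_teacher_rejects(beta, giver=30.0, receiver=70.0, tau=TAU):
    u_accept = receiver - beta * (receiver - giver)  # Fehr-Schmidt guilt term only
    return 1.0 / (1.0 + np.exp(tau * u_accept))


def update(posterior, teacher_rejected):
    """Bayes rule on the grid after observing one teacher choice."""
    like = p_teacher_rejects(BETA_GRID)
    like = like if teacher_rejected else 1.0 - like
    post = posterior * like
    return post / post.sum()


def predictive(posterior):
    """Posterior-predictive probability that the teacher rejects the offer."""
    return float(np.sum(posterior * p_teacher_rejects(BETA_GRID)))


posterior = np.full(BETA_GRID.size, 1.0 / BETA_GRID.size)  # flat prior over beta
prior_pred = predictive(posterior)
for t in range(40):  # the teacher rejects on 3 of every 4 trials
    posterior = update(posterior, teacher_rejected=(t % 4 != 3))
print(round(prior_pred, 3), round(predictive(posterior), 3))
```

      In this toy setting the posterior-predictive rejection rate moves from roughly 0.42 under the prior to roughly 0.75, matching the teacher's observed rejection rate. Because the logit of rejection is linear in beta here, the flat prior is roughly flat in logit space, which is why the predictive lands near the empirical proportion.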

      (5) Statistical Analysis: The authors state in their rebuttal letter that they used the most flexible random-effects structure in their mixed-effects models. However, this does not seem to be the case for the model reported in Table SI3 (the same model was used for other analyses too), in which only the intercepts appear to be random effects. This left me confused about which models were used.

    1. eLife Assessment

      This important study provides solid evidence for new insights into the role of Type-1 nNOS interneurons in driving neuronal network activity and controlling vascular network dynamics in awake, head-fixed mice. The authors use an original strategy based on the ablation of Type-1 nNOS interneurons with local injection of saporin conjugated to a substance P analogue into the somatosensory cortex. They show that ablation of Type-1 nNOS interneurons has surprisingly little effect on neurovascular coupling, although it alters neural activity and vascular dynamics.

    2. Reviewer #1 (Public review):

      Turner et al. present an original approach to investigate the role of Type-1 nNOS interneurons in driving neuronal network activity and in controlling vascular network dynamics in awake head-fixed mice. Selective activation or suppression of Type-1 nNOS interneurons has previously been achieved using chemogenetic, optogenetic, or local pharmacological approaches. Here, the authors took advantage of the fact that Type-1 nNOS interneurons are the only cortical cells that express the tachykinin receptor 1 to ablate them with a local injection of saporin conjugated to substance P (SP-SAP). SP-SAP causes cell death in 90% of Type-1 nNOS interneurons without affecting microglia, astrocytes, or other neurons. The authors report that the ablation has no major effects on sleep or behavior. Refining the analysis by scoring neural and hemodynamic signals with electrode recordings, calcium signal imaging, and wide-field optical imaging, they observe that Type-1 nNOS interneuron ablation does not change the various phases of the sleep/wake cycle. However, it does reduce low-frequency neural activity, irrespective of arousal state. Analyzing neurovascular coupling using multiple approaches, they report small changes in resting-state neural-hemodynamic correlations across arousal states, primarily mediated by changes in neural activity. Finally, they show that Type-1 nNOS interneurons play a role in controlling interhemispheric coherence and vasomotion.

      In conclusion, these results are interesting, use state-of-the-art methods, and are well supported by the data and their analysis. I have only a few comments on the stimulus-evoked haemodynamic responses, which can be easily addressed.

      Comments on revisions:

      As I mentioned in my initial review, this study is important. In my opinion, it could be published as is. Nonetheless, I am still somewhat dissatisfied with the authors' responses to my earlier comments. I understand that the same animals were not used for both stimulation paradigms, which is unfortunate. Nonetheless, I would have appreciated it if the authors had provided a couple of experiments illustrating GCaMP7 signals during brief stimulation in their reply to the reviewers. I am still unconvinced by the authors' suggestion that the GCaMP7 signal would remain stable during removal of the vascular undershoot. Since the absence of the undershoot is notable, I anticipate that a significant part of the initial response to prolonged stimulation is influenced by processes that occur during the 0.1-second stimulation, processes that may involve a change in the bulk neuronal response.

      In short, the data could support or refute the following statement: "Loss of type-I nNOS neurons drove minimal changes in the vasodilation elicited by brief stimulation..."

    3. Reviewer #2 (Public review):

      Summary:

      This important study by Turner et al. examines the functional role of a sparse but unique population of cortical neurons that express nitric oxide synthase (Nos1). To do this, they pharmacologically ablate these neurons in a focal region of the whisker-related primary somatosensory (S1) cortex using a saporin-substance P conjugate. Using widefield and 2-photon microscopy, as well as field recordings, they examine the impact of this cell-specific lesion on blood flow dynamics and neuronal population activity. Within primary somatosensory cortex, after Nos1 ablation, they find changes in neural activity patterns, decreased delta-band power, reduced sensory-evoked changes in blood flow (specifically, elimination of the sustained blood flow change after stimulation), and decreased vasomotion.

      Strengths:

      This was a technically challenging study and the experiments were executed in an expert manner. The manuscript was well written and I appreciated the cartoon summary diagrams included in each figure. The analysis was rigorous and appropriate. Their discovery that Nos1 neurons can have significant effects on blood flow dynamics and neural activity is quite novel and should seed many follow-up, mechanistic experiments to explain this phenomenon. The conclusions were justified by the convincing data presented.

      Weaknesses:

      I did not find any major flaws with the study. I originally noted some potential issues with the authors' characterization of the lesion and its extent, but that has been resolved in the revised manuscript.

      Comments on revisions:

      The authors have thoughtfully addressed the relatively minor concerns I had originally raised. Congratulations to the authors for producing this important paper.

    1. eLife Assessment

      This paper addresses a significant question regarding the low overlap between genetic discoveries for human complex diseases and those for gene expression by emphasizing the contribution of cell-type-specific chromatin accessibility QTLs. The analyses supporting the main claims are convincing, and the key conclusions are valuable and of interest to readers in the fields of human genetics and functional genomics.

    2. Reviewer #1 (Public review):

      Most human traits and common diseases are polygenic, influenced by numerous genetic variants across the genome. These variants are typically non-coding and likely function through gene regulatory mechanisms. To identify their target genes, one strategy is to examine if these variants are also found among genetic variants with detectable effects on gene expression levels, known as eQTLs. Surprisingly, this strategy has had limited success, and most disease variants are not identified as eQTLs, a puzzling observation recently referred to as "missing regulation".

      In this work, Jeong and Bulyk aimed to better understand the reasons behind the gap between disease-associated variants and eQTLs. They focused on immune-related diseases and used lymphoblastoid cell lines (LCLs) as a surrogate for the cell types mediating the genetic effects. Their main hypothesis is that some variants without eQTL evidence might be identifiable by studying other molecular intermediates along the path from genotype to phenotype. They specifically focused on variants that affect chromatin accessibility, known as caQTLs, as a potential marker of regulatory activity.

      The authors present data analyses supporting this hypothesis: several disease-associated variants are explained by caQTLs but not eQTLs. They further show that although caQTLs and eQTLs likely have largely overlapping underlying genetic variants, some variants are discovered only through one of these mapping strategies. Notably, they demonstrate that eQTL mapping is underpowered for gene-distal variants with small effects on gene expression, whereas caQTL mapping is not dependent on the distance to genes. Additionally, for some disease variants with caQTLs but no corresponding eQTLs in LCLs, they identify eQTLs in other cell types.

      Altogether, Jeong and Bulyk convincingly demonstrate that for immune-related diseases, discovering the missing disease-eQTLs requires both larger eQTL studies and a broader range of cell types in expression assays. It remains to be seen what fractions of the missing disease-eQTLs will be discovered with either strategy and whether these results can be extended to other diseases or traits.

      It should be noted that the problem of "missing regulation" has been investigated and discussed in several recent papers, notably Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; Mostafavi et al., Nat. Genet. 2023. The results reported by Jeong and Bulyk are not unexpected in light of this previous work (all of which they cite), but they add valuable empirical evidence that mostly aligns with the model and discussions presented in Mostafavi et al.

    3. Reviewer #2 (Public review):

      eQTLs have emerged as a method for interpreting GWAS signals. However, some GWAS signals are difficult to explain with eQTLs. In this paper, the authors demonstrated that caQTLs can explain these signals. This suggests that for GWAS signals to actually lead to disease phenotypes, they must be accessible in the chromatin.

      However, fundamentally, caQTLs, like GWAS, have the limitation of not being able to determine which genes mediate the influence on disease phenotypes. This limitation is consistent with the constraints observed in this study.

      (1) Reproducibility / Methods. The concrete numbers provided in the authors' response (e.g., 20 YRI LCL ATAC‑seq samples used only for peak discovery; caQTL mapping restricted to 100 GBR LCLs; 99,320 ATAC peaks tested vs 14,872 genes for eQTL; 373 European RNA‑seq samples, with clarification of overlap) do not appear to be reflected in the Methods. These specifics should be incorporated directly into the Methods sections.

      (2) Experimental evidence demonstrating transcription factor binding at representative caQTL peaks would strengthen the causal interpretation of these loci.

      (3) Tissue/cell‑type specificity of caQTLs: Prior work supports that chromatin‑level effects are broadly shared across cellular states, whereas expression effects are more context‑specific; thus, caQTLs are generally less "state‑specific" than eQTLs. However, this does not imply equivalence across distinct cell types: caQTLs derived from different cell types may yield different results, particularly where accessibility is cell‑type restricted.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Most human traits and common diseases are polygenic, influenced by numerous genetic variants across the genome. These variants are typically non-coding and likely function through gene regulatory mechanisms. To identify their target genes, one strategy is to examine if these variants are also found among genetic variants with detectable effects on gene expression levels, known as eQTLs. Surprisingly, this strategy has had limited success, and most disease variants are not identified as eQTLs, a puzzling observation recently referred to as "missing regulation". 

      In this work, Jeong and Bulyk aimed to better understand the reasons behind the gap between disease-associated variants and eQTLs. They focused on immune-related diseases and used lymphoblastoid cell lines (LCLs) as a surrogate for the cell types mediating the genetic effects. Their main hypothesis is that some variants without eQTL evidence might be identifiable by studying other molecular intermediates along the path from genotype to phenotype. They specifically focused on variants that affect chromatin accessibility, known as caQTLs, as a potential marker of regulatory activity. 

      The authors present data analyses supporting this hypothesis: several disease-associated variants are explained by caQTLs but not eQTLs. They further show that although caQTLs and eQTLs likely have largely overlapping underlying genetic variants, some variants are discovered only through one of these mapping strategies. Notably, they demonstrate that eQTL mapping is underpowered for gene-distal variants with small effects on gene expression, whereas caQTL mapping is not dependent on the distance to genes. Additionally, for some disease variants with caQTLs but no corresponding eQTLs in LCLs, they identify eQTLs in other cell types. 

      Altogether, Jeong and Bulyk convincingly demonstrate that for immune-related diseases, discovering the missing disease-eQTLs requires both larger eQTL studies and a broader range of cell types in expression assays. It remains to be seen what fractions of the missing disease-eQTLs will be discovered with either strategy and whether these results can be extended to other diseases or traits. 

      We thank the reviewer for their accurate summary of our study and positive review of our findings for immune-related diseases.

      It should be noted that the problem of "missing regulation" has been investigated and discussed in several recent papers, notably Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; Mostafavi et al., Nat. Genet. 2023. The results reported by Jeong and Bulyk are not unexpected in light of this previous work (all of which they cite), but they add valuable empirical evidence that mostly aligns with the model and discussions presented in Mostafavi et al. 

      We thank the reviewer for their positive review of our results and manuscript. As Reviewer #1 noted, whether our and others' observations extend to other diseases or traits is an open question. For instance, Figure 2b in Mostafavi et al., Nat. Genet. (2023) demonstrated that there was a spectrum of depletion of eQTLs and enrichment of GWAS signals in constrained genes across various tissues and traits, respectively. Therefore, gene expression constraint may play a larger or smaller role in different diseases or traits. That immune cell types and cell states are extremely diverse (Schmiedel et al., Cell (2018) and Calderon et al., Nat. Genet. (2019), just to name a few) likely adds to the complexity of gene regulation that contributes to immune-mediated disease.

      Reviewer #2 (Public Review): 

      Summary: 

      eQTLs have emerged as a method for interpreting GWAS signals. However, some GWAS signals are difficult to explain with eQTLs. In this paper, the authors demonstrated that caQTLs can explain these signals. This suggests that for GWAS signals to actually lead to disease phenotypes, they must be accessible in the chromatin. 

      However, fundamentally, caQTLs, like GWAS, have the limitation of not being able to determine which genes mediate the influence on disease phenotypes. This limitation is consistent with the constraints observed in this study. 

      We thank the reviewer for their accurate summary of our results.

      (1) For reproducibility, details are necessary in the method section.

      Details about the added YRI ATAC-seq samples: how many samples are there, and which public data were used? The public data appear to come from LCL-derived iPSCs and differentiated iPSCs (cardiomyocytes), not from LCLs themselves. How do these data differ from LCL data, and what is the rationale for including them despite the differences?

      Banovich et al., Genome Research (2018) (PMID: 29208628), who generated data using LCL-derived iPSCs and differentiated iPSCs (cardiomyocytes), also generated ATAC-seq data from 20 YRI LCL samples. We analyzed those data to identify open chromatin regions (i.e., ATAC-seq peaks) in LCLs and merged the regions with open chromatin regions identified with 100 GBR LCL samples from two studies by Kumasaka et al. (Nature Genetics (2016), PMID: 26656845; Nature Genetics (2019), PMID: 30478436). However, we restricted the caQTL analysis to only the 100 GBR samples because of possible ancestry effects and batch effects. We attempted caQTL analysis with the 20 YRI samples as well, but the result was noisy, likely due to smaller sample size and lower read depth of the ATAC-seq data.

      caQTL mapping is described as having better power than eQTL mapping despite using fewer samples. How does the number of ATAC peaks tested for caQTLs compare to the number of genes tested for eQTLs?

      The number of ATAC peaks used in caQTL (99,320) is ~6.7 times greater than the number of genes (14,872) used in the eQTL analysis. Therefore, there is a higher chance of detecting a significant caQTL signal and a significant colocalization signal than there is for eQTLs. However, we reasoned that since distal eQTLs are more easily detected as caQTLs and since increasing the sample size of eQTLs through meta-analysis uncovered additional eQTL colocalization at loci with caQTL colocalization only, colocalized caQTLs are likely capturing disease-relevant regulatory effects.

      Details about RNA expression data: In the method section, it states that raw data (ERP001942) was accessed, and in data availability, processed data (E-GEUV-1) was used. These need to be consistent.

      Thank you for pointing this out. We used the processed data from Expression Atlas (https://www.ebi.ac.uk/gxa/experiments/E-GEUV-1/Results), and that's what we meant by "We downloaded RNA expression level data of the LCL samples from the Expression Atlas." We have revised the “RNA expression data preparation” section in our manuscript to make the text clearer.

      How many samples were used? The text states 373, but how was this number reduced from the original 465? The genotype data are said to include 493 samples in total, while the ATAC-seq data have n=100; what are the 20 others? The text also mentions European samples; does this exclude YRI?

      We thank the reviewer for pointing out these points of confusion. Our reported count of 493 samples included YRI samples with RNA-seq data or ATAC-seq data that we ultimately did not use for QTL analyses. There were 373 European samples with RNA-seq data that we used for eQTL analysis, and 100 GBR samples (including some that overlap with the 373 European samples) that we used for caQTL analysis. We have revised the text to clarify these points.

      (2) Experimental results determining which TFs might bind to the representative signals of caQTL are required.

      We agree that caQTL colocalization is just the start of elucidating the regulatory mechanism of a GWAS locus. Determining which TFs are bound and which TFs' binding is altered would be necessary to describe the causal regulatory mechanism. For this, we utilized the Cistrome database to search for TFs whose binding overlaps the colocalized caQTL peaks. We present the results of this analysis in Supplementary Table 3 and Supplementary Figure 4, both of which we have added in our revised manuscript. Overall, protein factors associated with active transcription, such as POLR2A, and several immune cell TFs, including RUNX3, SPI1, and RELA, were frequently detected in those peaks. Detecting these factors in most peaks supports the likelihood that the colocalized caQTL peaks are active cis-regulatory elements. These results are consistent with our observation of enriched caQTL-mediated heritability in regions with active histone marks (Figure 1).
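
      The overlap step described here can be sketched as a simple interval intersection between colocalized caQTL peaks and TF binding regions. This is a minimal illustration, assuming BED-style half-open coordinates and in-memory lists; it does not reproduce the Cistrome database's own tools or the authors' pipeline, and the function name and coordinates in the usage example are made up.

```python
from collections import defaultdict


def overlapping_factors(peaks, tf_sites):
    """Map each peak to the sorted set of factors whose sites overlap it.

    Both inputs are lists of (chrom, start, end, name) with half-open [start, end)
    coordinates, as in BED files.
    """
    sites_by_chrom = defaultdict(list)
    for chrom, start, end, factor in tf_sites:
        sites_by_chrom[chrom].append((start, end, factor))

    hits = {}
    for chrom, p_start, p_end, peak_id in peaks:
        hits[peak_id] = sorted(
            {factor for s, e, factor in sites_by_chrom[chrom] if s < p_end and p_start < e}
        )
    return hits


# Made-up coordinates for illustration only.
peaks = [("chr1", 100, 200, "peak1"), ("chr2", 50, 80, "peak2")]
tf_sites = [
    ("chr1", 150, 160, "RUNX3"),
    ("chr1", 200, 210, "SPI1"),  # starts where peak1 ends: no overlap
    ("chr2", 10, 60, "RELA"),
]
print(overlapping_factors(peaks, tf_sites))
```

      The linear scan per chromosome keeps the sketch short; genome-wide data would normally use sorted sweeps or interval trees.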

      (3) It is stated that caQTLs are less tissue-specific than eQTLs; would caQTL mapping performed with ATAC-seq data from different cell types yield similar results?

      We thank the reviewer for the question. Calderon et al. (PMID: 31570894) observed that "most effects on allelic imbalance (of ATAC-seq) were shared regardless of lineage or condition". Yet, there were regions where a different cell type or state would show inaccessibility (Figure 4d in Calderon et al.). Thus, we expect that ATAC-seq results from different cell types (e.g., T cells, B cells, monocytes, etc.) would lead to additional caQTLs showing colocalization at cell-type-specific open chromatin. However, if a region is accessible in both cell types, caQTL may be detected in both. Moreover, Alasoo et al., Nature Genetics (2018) (PMID: 29379200) observed that “many disease-risk variants affect chromatin structure in a broad range of cellular states, but their effects on expression are highly context specific.” In both studies, the authors investigated immune cell types, and there could be different observations in non-immune cell types and other diseases and traits.

      Reviewer #1 (Recommendations For The Authors): 

      I think it would strengthen the paper to explore gene-level differences in the discovery of caQTLs and eQTLs. For example, complex disease-relevant genes, on average, have more/longer regulatory domains (as shown by Wang and Goldstein, AJHG 2020; Mostafavi et al., Nat. Genet. 2023). Therefore, it is plausible that for such genes, caQTLs are much more easily discoverable than eQTLs due to (i) a larger mutational target size for caQTLs, and (ii) dispersion of expression heritability across multiple domains, which hampers the discovery of eQTLs but not caQTLs, which are studied independently of other domains in the region. In other words, discovered caQTLs and eQTLs likely vary in terms of their distance to genes (as the authors report), as well as their target genes.

      We thank the reviewer for the suggestion to explore gene-level differences. We expect that the tendency of complex disease-relevant genes to have more/longer regulatory domains, on average, helps explain our observations. We agree with both of your points: many more regulatory elements are captured as accessible regions than there are expressed genes, and genes often have multiple independent eQTLs, leading to dispersion of heritability. The gene-level trend that we described was the distance of the regulatory element from the genes. Additional analyses would be a relevant future direction.

      Also considering gene-level analysis, Mostafavi et al. show that the types of biases they report for eQTLs also apply to other molecular QTLs. It would be valuable to compare GWAS hits with versus without caQTL colocalization. Similarly, it would be insightful to compare GWAS hits with both colocalized caQTLs and eQTLs to GWAS hits with colocalized caQTLs but no eQTLs in any of the cell types. 

      We thank the reviewer for the comment. Investigating potential biases in the colocalized caQTLs would be useful, but we considered it beyond the scope of this work. In terms of biological factors, we demonstrated through mediated heritability analyses that more accessible chromatin (based on ATAC-seq read coverage) and regions with active histone marks were enriched for autoimmune disease associations (Figure 1). Furthermore, as greater distance of the regulatory variant from the transcription start site significantly reduced the cis-heritability, we would expect distance to play a major role, similar to Mostafavi et al.’s conclusions.

      I don't think the argument for the role of natural selection contributing to the "missing regulation" is presented accurately. Specifically, large eQTLs acting on top trait-relevant genes are under stronger selection and thus, on average, segregate at lower frequencies. This makes them difficult to discover in eQTL assays. However, if not lost, they contribute as much, if not more, to trait heritability than weaker eQTLs at the same gene because their larger effects compensate for their lower frequency. At the most extreme, selection should have a "flattening" effect (e.g., see Simons et al., PLOS Biol 2018; O'Connor et al., AJHG 2019): weak and strong eQTLs at the same gene are expected to contribute equally to heritability. Therefore, the statement "Consequently, only weak eQTL variants, often in regions distal to the gene's promoter, may remain and affect traits" is not correct. If this turns out to be empirically true, other models, such as pleiotropic selection, need to explain it. 
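      The compensation argument can be made concrete with the standard additive-model expression for the variance contributed by one biallelic variant in Hardy-Weinberg equilibrium, 2p(1 - p)β², where p is the allele frequency and β the per-allele effect. The sketch below uses hypothetical effect sizes and frequencies (not values from the study or the cited papers) to show a rare eQTL with a five-fold larger effect contributing about as much variance as a common, weak one.

```python
def variance_contribution(p: float, beta: float) -> float:
    """Additive variance from one biallelic variant under Hardy-Weinberg equilibrium: 2p(1-p)beta^2."""
    return 2.0 * p * (1.0 - p) * beta ** 2

# Hypothetical eQTLs acting on the same gene (illustrative values only).
weak_common = variance_contribution(p=0.5, beta=0.1)   # common allele, small effect
strong_rare = variance_contribution(p=0.01, beta=0.5)  # rare allele, 5x larger effect

print(f"weak, common variant: {weak_common:.5f}")
print(f"strong, rare variant: {strong_rare:.5f}")
```

      Both values come out at roughly 0.005. Under strong selection, the models cited by the reviewer predict that this product becomes roughly independent of effect size, which is the "flattening" described above.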

      We thank the reviewer for the correction. We agree with the comment and have revised the sentences in the introduction accordingly.

      It is worth speculating why caQTLs may be more consistent across cell types than cis-eQTLs. Additionally, readers may infer from the paper that the focus should shift from eQTLs to caQTLs, which may not be the authors' intention. Perhaps these approaches are complementary: caQTLs can help with TSS-distal disease variants, while finding the target gene and regulatory context is more straightforward with eQTL colocalization. Addressing these points in the discussion will be helpful.

      We appreciate the reviewer's suggestion to clarify the advantages of incorporating cis-eQTLs and caQTLs. Our argument is exactly as you put it, and we added a paragraph on this in the Discussion.

      I believe the authors could do more to contextualize their findings within the existing literature on the subject, particularly Umans et al., Trends in Genetics 2021; Connally et al., eLife 2022; and Mostafavi et al., Nat. Genet. 2023. For instance, Umans et al. suggest that "if most standard eQTLs are generally benign, increasing sample size and adding more tissue types in an effort to identify even more standard eQTLs may not help us to explain many more disease risk mutations". Conversely, Mostafavi et al. argue for a multipronged approach, which appears more aligned with the authors' conclusions.

      We followed the reviewer’s suggestion to place our work in the context of existing literature on this topic. Moreover, we clarified what our recommendations for future data generation are.

      I thought Figures 1C-D were unclear. 

      We added a sentence in the figure legend describing that stronger and more significant enrichment indicate that mediated heritability is concentrated in that subset.
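      One common way to express this kind of enrichment, as in partitioned-heritability methods, is the subset's share of heritability divided by its share of elements (SNPs or accessible regions). Values above 1 then mean that heritability is concentrated in the subset. The sketch assumes this definition; the estimator behind Figures 1C-D may differ, and the numbers are hypothetical.

```python
def heritability_enrichment(h2_subset: float, h2_total: float, n_subset: int, n_total: int) -> float:
    """Share of (mediated) heritability in a subset divided by the subset's share of elements."""
    return (h2_subset / h2_total) / (n_subset / n_total)

# Hypothetical: 10% of accessible regions carry 30% of the mediated heritability.
e = heritability_enrichment(h2_subset=0.03, h2_total=0.10, n_subset=1_000, n_total=10_000)
print(f"enrichment = {e:.2f}")  # prints enrichment = 3.00
```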

      Reviewer #2 (Recommendations For The Authors): 

      Complete workflow figures for caQTL calling and eQTL calling are required. 

      To improve clarity of the caQTL and eQTL calling workflow, we added Supplementary Figure 1.

    1. eLife Assessment

      This paper reports a valuable finding that gastric fluid DNA content can be used as a potential biomarker for human gastric cancer. The evidence supporting the claims of the authors is solid, although an inclusion of explanations for the methodological limitations, moderate diagnostic performance, and the unexpected survival correlation would have strengthened the study. The work will be of interest to medical biologists working in the field of gastric cancer.

    2. Reviewer #1 (Public review):

      The study analyzes gastric fluid DNA content as a potential biomarker for human gastric cancer. However, the study lacks overall logical coherence, and several key issues require improvement and clarification. In the opinion of this reviewer, major revisions are needed:

      (1) This manuscript lacks a comparison between gastric cancer patients stratified by stage (especially T0-T2 patients) and the PN and N+PD groups.

      (2) The comparison between gastric cancer stages seems only to reveal a difference between T3 patients and early-stage gastric cancer patients. This raises doubts about whether the reported differences between gastric cancer patients and non-cancer patients are genuine or simply driven by the large number of T3 patients.

      (3) The prognosis evaluation is too simplistic, only considering staging factors, without taking into account other factors such as tumor pathology and the time from onset to tumor detection.

      (4) The manuscript should compare gfDNA with conventional pathological examination methods and discuss its advantages, such as accuracy and patient comfort.

      (5) There are many inconsistencies in the figures and tables. Please make sure that titles, figure legends, footnotes, alphabetical panel ordering, etc. match.

      (6) The overall logic of the manuscript is not rigorous enough; the discussion considers few factors and does not adequately support the conclusions drawn.

    3. Reviewer #2 (Public review):

      Summary:

      The authors investigated whether the total DNA concentration in gastric fluid (gfDNA), collected via routine esophagogastroduodenoscopy (EGD), could serve as a diagnostic and prognostic biomarker for gastric cancer. In a large patient cohort (initial n=1,056; analyzed n=941), they found that gfDNA levels were significantly higher in gastric cancer patients compared to non-cancer, gastritis, and precancerous lesion groups. Unexpectedly, higher gfDNA concentrations were also significantly associated with better survival prognosis and positively correlated with immune cell infiltration. The authors proposed that gfDNA may reflect both tumor burden and immune activity, potentially serving as a cost-effective and convenient liquid biopsy tool to assist in gastric cancer diagnosis, staging, and follow-up.

      Strengths:

      This study is supported by a robust sample size (n=941) with clear patient classification, enabling reliable statistical analysis. It employs a simple, low-threshold method for measuring total gfDNA, making it suitable for large-scale clinical use. Clinical confounders, including age, sex, BMI, gastric fluid pH, and PPI use, were systematically controlled. The findings demonstrate both diagnostic and prognostic value of gfDNA, as its concentration can help distinguish gastric cancer patients and correlates with tumor progression and survival. Additionally, preliminary mechanistic data reveal a significant association between elevated gfDNA levels and increased immune cell infiltration in tumors (p=0.001).

      Weaknesses:

      The study has several notable weaknesses. The association between high gfDNA levels and better survival contradicts conventional expectations and raises concerns about the biological interpretation of the findings. The diagnostic performance of gfDNA alone was only moderate, and the study did not explore potential improvements through combination with established biomarkers. Methodological limitations include a lack of control for pre-analytical variables, the absence of longitudinal data, and imbalanced group sizes, which may affect the robustness and generalizability of the results. Additionally, key methodological details were insufficiently reported, and the ROC analysis lacked comprehensive performance metrics, limiting the study's clinical applicability.
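      A fuller ROC report of the kind requested here typically includes the AUC together with sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) at a stated cut-off. The minimal sketch below computes the AUC via its Mann-Whitney formulation and picks the cut-off that maximizes Youden's J. The helper names are illustrative and the concentrations are hypothetical, not data from the study. PPV and NPV depend on the case/control mix, so with imbalanced groups they would not transfer directly to a screening population.

```python
from typing import Dict, Sequence

def auc_mann_whitney(cases: Sequence[float], controls: Sequence[float]) -> float:
    """AUC = P(random case scores higher than random control); ties count as 0.5."""
    wins = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))

def metrics_at(threshold: float, cases: Sequence[float], controls: Sequence[float]) -> Dict[str, float]:
    """Confusion-matrix metrics when values >= threshold are called positive."""
    tp = sum(x >= threshold for x in cases)
    fn = len(cases) - tp
    fp = sum(x >= threshold for x in controls)
    tn = len(controls) - fp
    return {
        "threshold": threshold,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        # Predictive values depend on the case/control mix of the sample.
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

def youden_optimal(cases: Sequence[float], controls: Sequence[float]) -> Dict[str, float]:
    """Observed cut-off maximising Youden's J = sensitivity + specificity - 1."""
    candidates = sorted(set(cases) | set(controls))
    return max((metrics_at(t, cases, controls) for t in candidates),
               key=lambda m: m["sensitivity"] + m["specificity"] - 1.0)

# Hypothetical gfDNA concentrations (arbitrary units); NOT data from the study.
cancer = [12.0, 15.5, 9.8, 20.1, 7.4, 18.3]
non_cancer = [6.1, 8.9, 5.2, 10.4, 4.8, 7.0]

print(f"AUC = {auc_mann_whitney(cancer, non_cancer):.3f}")  # prints AUC = 0.917
for name, value in youden_optimal(cancer, non_cancer).items():
    print(f"{name}: {value:.3f}")
```

      In practice the cut-off would be chosen on one dataset and its metrics reported, with confidence intervals, on an independent one. An optimum selected and evaluated on the same data overstates performance.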

    1. eLife Assessment

      This study reports important findings about the nature of feedback to primary visual cortex (V1) during object recognition. The state-of-the-art functional MRI evidence for the main claims is solid, and the combination of high-resolution fMRI with MEG yields significant insight into neural mechanisms. The findings presented here are relevant to a number of scientific fields such as object recognition, categorisation and predictive coding.

    2. Reviewer #1 (Public review):

      This study examines the spatiotemporal properties of feedback signals in the human brain during an object discrimination task. Using 7T fMRI and MEG, the authors show that task-relevant object category information can be decoded from both deep and superficial layers of V1, originating from occipito-temporal and posterior parietal cortices. In contrast, task-irrelevant category feedback does not appear in V1, even when the same objects are foveally presented. Low-level orientation information, however, is decodable from V1 regardless of task relevance and is supported by recurrence with occipito-temporal regions. These findings suggest that category decoding in V1 depends on task-driven feedback rather than feedforward visual features.

      Strengths

      This study leverages two advanced neuroimaging modalities in an attempt to connect object recognition across cortical-layer and whole-brain levels. The revised manuscript strengthens the connection between the fMRI and MEG components.

      It also demonstrates that a peripheral object discrimination task is effective for isolating feedforward and feedback signals using 7T fMRI.

      It is particularly notable that no low-level features were fed back to V1's superficial layers in the peripheral object discrimination task. The authors further show that high- and low-level feedback to foveal V1 are comparable in strength, supporting the idea that the superficial layer in V1 selectively represents task-relevant content.

      Weaknesses

      One alternative explanation for the absence of task-irrelevant category decoding in the foveal task could be that feedback enhancement may be required to decode complex features from V1 (compared to a coarse orientation feature). It would be informative to test whether the findings hold if the categorical boundary were defined through a low-level feature other than orientation (e.g., frequency; see Ester, Sprague and Serences, 2020).

      I would like to echo the concerns raised by the other reviewer regarding multiple comparisons correction. It is important to apply correction procedures, especially given the number of statistical tests performed across brain regions where strict a priori hypotheses are unlikely. In the case of cluster-based statistics, the manuscript should clearly specify both the cluster-forming threshold and the significance threshold used for comparing true cluster masses to the shuffled distribution.

      Conclusion

      Overall, the results support the study's conclusions. This work addresses a timely question in object categorization and predictive coding: specifically, how feedback signals vary in content and timing across cortical layers.

    3. Reviewer #2 (Public review):

      Summary:

      This manuscript reports high-resolution functional MRI data and MEG data revealing additional mechanistic information about an established paradigm studying how foveal regions of primary visual cortex (V1) are involved in processing peripheral visual stimuli. Because of the retinotopic organization of V1, peripheral stimuli should not evoke responses in the regions of V1 that represent stimuli in the center of the visual field (the fovea). However, functional MRI responses in foveal regions do reflect the characteristics of peripheral visual stimuli, a surprising finding first reported in 2008. The present study uses fMRI data with sub-millimeter resolution to study how responses at different depths in the foveal gray matter do or don't reflect peripheral object characteristics during 2 different tasks: one in which observers needed to make detailed judgments about object identity, and one in which observers needed to make coarser judgments about object orientation. The fMRI results reveal interesting and informative patterns in these two conditions. A follow-on MEG study yields information about the timing of these responses. Put together, the findings settle some questions in the field and add new information about the nature of visual feedback to V1.

      Strengths:

      (1) Rigorous and appropriate use of "laminar fMRI" techniques.

      (2) The introduction does an excellent job of contextualizing the work.

      (3) Control experiments and analyses are designed and implemented well.

      Weaknesses:

      (1) The use of the term "low order" to describe object orientation is potentially confusing. During review, the authors considered this issue and responded that they would continue with the use of the term low-order to describe object orientation because a low-pass spatial frequency filter would provide object orientation information. This is certainly a reasonable perspective; nonetheless, this reviewer thinks spatial frequencies that low are not readily represented by neurons in early visual cortex and it is common to use "low-order" to refer to features extracted in early visual areas, so I think this causes confusion.

      (2) The methods contain a nice description of the methods for "correcting the vascular-related signals". I'm guessing this is the method that removed, e.g., 22% of foveal voxels (previous paragraph), but it's not entirely clear whether the voxel selection methods described in the "correcting the vascular-related signals" are describing the same processing step referred to in the previous paragraph as "a portion of voxels was removed based on large vein distribution".

      (3) It is quite difficult to perform laminar analyses across multiple visual areas because distortion compensation is not perfect and registration of functional to anatomical data will always be a bit better in some places and a bit worse in others. An ideal manuscript would include some images showing registration quality in V1, LOC, and IPS regions for a few different participants, or include some kind of quality metric indicating the confidence in depth assignments in different regions.

      (4) For the decoding analysis, it would be helpful to have more information about how samples were defined for each condition -- were the beta values for entire blocks used as samples for each condition, or were separate timepoints during a block used in the SVM as repeated samples for each condition?

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      (1.1) The authors argue that low-level features in a feedback format could be decoded only from deep layers of V1 (and not superficial layers) during a perceptual categorization task. However, previous studies (Bergmann et al., 2024; Iamshchinina et al., 2021) demonstrated that low-level features in the form of feedback can be decoded from both superficial and deep layers. While this result could be due to the perceptual task or the highly predictable orientation feature (orientation was kept the same throughout the experimental block), an alternative explanation is a weaker representation of orientation in the feedback (even before splitting by layers there is only a trend towards significance; also, Granger causality for orientation information in the MEG part is lower than that for category in the peripheral categorization task), because it is orthogonal to the task demand. It would be helpful if the authors added a statistical comparison of the strength of category and orientation representations in each layer and across the layers.

      We agree that the strength of feedback information is related to task demand. Specifically, we would like to highlight the relationship between task demand and feedback information in the superficial layer. Previous studies have shown that foveal feedback information is observed only when the task requires the identity information of the peripheral objects (Williams et al., 2008; Fan et al., 2016; Yu and Shim, 2016). In this study, we found that the deep layer represented both orientation and categorical feedback information, while the superficial layer only represented categorical information. This suggests that feedback information in the superficial layer may be related to (or enhanced by) the task demands. In other words, if the experimental design required participants to discriminate orientation rather than object identity, we would expect stronger orientation information in foveal V1 and significant decoding performance of orientation feedback information in the superficial layer of foveal V1. This assumption is consistent with the anatomical connections of the superficial layer, which not only receives feedback connections but also sends outputs to higher-level regions for further processing. This is also consistent with Iamshchinina et al.’s observation that, when orientation information had to be mentally rotated and reported (i.e., task-relevant), it was observed in both the superficial and deep layers of V1. Bergmann et al. observed illusory color information in the superficial layer of V1, which may reflect a combination of lateral propagation and feedback mechanisms in the superficial layer that support visual filling-in phenomena. 
      We have revised the discussion in the manuscript:

      In other words, if the experimental design required participants to discriminate orientation rather than object identity, we would expect stronger orientation information in foveal V1 and significant decoding performance of orientation feedback information in the superficial layer of foveal V1. Recent studies (Iamshchinina et al., 2021; Bergmann et al., 2024) have also highlighted the relationship between feedback information and neural representations in the V1 superficial layer.

      To further demonstrate the laminar profiles of low- and high-order information, we have re-analyzed the data and added more fine-scale laminar profiles with statistical comparisons in the revised manuscript. The results again showed significant neural decoding performances in the deep layer of both category and orientation information, and only significant decoding performances of category information in the superficial layer.

      (1.2) The authors argue that category feedback is not driven by low-level confounding features embedded in the stimuli. They demonstrate the ability to decode orientations, particularly well represented by V1, in the absence of category discrimination. However, the orientation is not a category-discriminating feature in this task. It could be that the category-discriminating features cannot be as well decoded from V1 activity patterns as orientations. Also, there are a number of these category discriminating features and it is unclear if it is a variation in their representational strength or merely the absence of the task-driven enhancement that preempts category decoding in V1 during the foveal task. In other words, I am not sure whether, if orientation was a category-specific feature (sharpies are always horizontal and smoothies are vertical), there would still be no category decoding.

      The low-order features mentioned in the manuscript refer to visual information encoded intrinsically in V1, independent of task demands. In the foveal experiment, the task was to discriminate the fixation color, which is unrelated to the category or orientation of the object stimuli. The results showed that only orientation information could be decoded from foveal V1. This indicates that low-order information, such as orientation, is strongly and automatically encoded in V1, even when it is irrelevant to the task. Meanwhile, category information could not be decoded, indicating that category information relies on feedback signals driven by attention to the objects or by task demands, both of which are absent in the fixation task. Other evidence also indicates that category feedback is not driven by low-level features intrinsically encoded in V1. First, the laminar profiles of these two types of feedback information differ considerably (see response to 1.1). Second, only category feedback information was correlated with behavioral performance (MEG experiment). These findings demonstrate that category feedback information is task-driven and differs from the automatically encoded low-order information in foveal V1. The reviewer expressed some uncertainty about whether, “if orientation was a category-specific feature (sharpies are always horizontal and smoothies are vertical), there would still be no category decoding”. Our data showed that orientation could be automatically decoded in V1, regardless of task demand. Thus, if orientation were a category-specific feature in the foveal task (i.e., sharpies always horizontal and smoothies always vertical), category decoding would succeed in V1. However, in this scenario, orientation and other shape features would not be independent, preventing us from determining whether non-orientation shape features could be decoded in V1.

      Reviewer #2 (Public review):

      (2.1) While not necessarily a weakness, I do not fully agree with the description of the 2 kinds of feedback information as "low-order" and "high-order". I understand the motivation to do this: orientation is typically considered a low-level visual feature. But when it's the orientation of an entire object, not a single edge, orientation can only be defined after the elements of the object are grouped. Also, the discrimination between spikies and smoothies requires detecting the orientations of particular edges that form the identifying features. To my mind, it would make more sense to refer to discrimination of object orientation as "coarse" feature discrimination, and discrimination of object identity as "fine" feature discrimination. Thus, the sentence on line 83, for example, would read "Interestingly, feedback with fine and coarse feature information exhibits different laminar profiles.".

      We agree that object orientation (invariant to object category or identity) is defined on a larger spatial scale than local orientation features such as local edges; in this sense, object orientation is a coarse feature. In contrast, the category-defining information is mainly carried by local shape information (i.e., little cubes vs. bumps), which is finer-scale information. One way to see this difference is that object orientation information is mainly carried by low spatial frequencies and would survive low-pass filtering, hence “coarse”, whereas object category information would largely be lost if the objects were low-pass filtered.

      We believe the labels “low-order” and “high-order” are consistent with the typical use of these terms in the literature, referring to features intrinsically encoded in early visual cortex vs. in high-level object-sensitive cortical regions. The more important aspect of our results is the differential engagement of these features in feedforward vs. feedback processing: low-order features are automatically represented in early visual cortex during feedforward processing, while high-order features are represented as a result of feedback processing. Results from the foveal fMRI experiment (Exp. 2) strongly support this view: when objects were presented at the fovea and the task was a fixation color task irrelevant to object information, foveal V1 represented only orientation information, not category information. Notably, there was a dramatic difference in decoding performance in foveal V1 between Exp. 1 and Exp. 2, which rules out the argument that both orientation and category information were driven by local edge information represented in V1.

      (2.2) Figure 2 and text on lines 185 and 186: it is difficult to interpret/understand the findings in foveal ROIs for the foveal control task without knowing how big the ROI was. Foveal regions of V1 are grossly expanded by cortical magnification, such that the central half-degree can occupy several centimeters across the cortical surface. Without information on the spatial extent of the foveal ROI compared to the object size, we can't know whether the ROI included voxels whose population receptive fields were expected to include the edges of the objects.

      The ROI of foveal V1 was defined using data from independent localizer runs. In each localizer run, flashing checkerboards of the same size as the objects in the task runs were presented at the fovea or in the periphery. The ROI of foveal V1 was identified as the voxels responsive to the foveal checkerboards. In other words, the ROI of foveal V1 included the voxels whose population receptive fields covered the entire object in the foveal visual field.

      We included a figure in the revised manuscript comparing the activation maps induced by the foveal object stimulus in the task runs with the ROI coverage defined by the localizer runs. 

      (2.3) Line 143 and ROI section of the methods: in order for the reader to understand how robust the responses and analyses are, voxel counts should be provided for the ROIs that were defined, as well as for the number (fraction) of voxels excluded due to either high beta weights or low signal intensity (lines 505-511).

      In the revised manuscript, we have included the number of voxels in each ROI and the criteria for voxel selection:

      For each ROI, the number of voxels depended on the size of the activated region, as estimated from the localizer data. The numbers are as follows: foveal V1, 2185 ± 389; peripheral V1, 1294 ± 215; LOC, 3451 ± 863; and pIPS, 5154 ± 1517. To avoid signals from large vessels, a portion of voxels was removed based on the distribution of large vessels: foveal V1, 22.5% ± 6.6%; peripheral V1, 6.8% ± 3.9%; LOC, 16.1% ± 8.1%; and pIPS, 5.1% ± 3.2%. For the decoding analysis, the top 500 responsive voxels in each ROI were selected to balance the voxel numbers across different ROIs for training and testing the decoder.
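      The voxel-selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions. The response does not say which statistic defines "responsive", so the sketch assumes a per-voxel localizer statistic (e.g., a t-value) and a boolean large-vessel mask. The function name `select_voxels` is hypothetical.

```python
from typing import List, Sequence

def select_voxels(responsiveness: Sequence[float], is_vessel: Sequence[bool], k: int = 500) -> List[int]:
    """Indices of the k most responsive voxels after dropping voxels flagged as large vessels."""
    if len(responsiveness) != len(is_vessel):
        raise ValueError("responsiveness and is_vessel must have the same length")
    kept = [i for i, vessel in enumerate(is_vessel) if not vessel]
    if len(kept) < k:
        raise ValueError(f"only {len(kept)} voxels survive the vessel mask; {k} requested")
    # Rank surviving voxels by their (assumed) localizer statistic, highest first.
    kept.sort(key=lambda i: responsiveness[i], reverse=True)
    return sorted(kept[:k])  # sorted indices, convenient for slicing voxel patterns

# Toy example with k=3 instead of 500; voxel 5 is the most responsive but is vessel-flagged.
stats = [2.0, 5.5, 1.2, 4.8, 3.3, 6.1]
vessel = [False, False, False, False, False, True]
print(select_voxels(stats, vessel, k=3))  # prints [1, 3, 4]
```

      Fixing k across ROIs gives every decoder the same input dimensionality. This is the balancing described in the response, and it keeps decoding accuracy from simply tracking ROI size.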

      (2.4) I wasn't able to find mention of how multiple-comparisons corrections were performed for either the MEG or fMRI data (except for one Holm-Bonferroni correction in Figure S1), so it's unclear whether the reported p-values are corrected.

      For the fMRI results, there is strong evidence showing that feedback information is sent to the foveal V1 during a peripheral object task (Williams et al., 2008; Fan et al., 2016; Yu and Shim, 2016). In addition, anatomical and functional evidence shows that the superficial and deep layers of V1 receive feedback information during visual processing. Therefore, in the current study, we specifically examined two types of feedback information in the superficial and deep layers of foveal V1, and did not apply multiple-comparison correction to the decoding results.

      Regarding the MEG results, since we did not have a strong prior about when feedback information would arrive in foveal V1, a cluster-based permutation method was used to correct for multiple comparisons in each time course. Specifically, for each time point, the sign of the effect for each participant was randomly flipped 50,000 times to obtain the null distribution for each time point. Clusters were defined as continuous significant time points in the real and flipped time series, and the effects in each cluster were summed to create a cluster-based effect. The most significant cluster-based effect in each flipped time series was then used to generate the corrected null distribution.
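      The procedure described above corresponds to a max-cluster-mass sign-flip permutation test. The sketch below is one possible implementation under explicit assumptions. The cluster-forming threshold (a one-sample t of 2.0) is a hypothetical placeholder, because the response does not state it. The test is one-sided (effect above chance). Each participant's entire time course is sign-flipped together, a common choice that preserves temporal autocorrelation; the response's wording, "for each time point", leaves this open. The data are simulated, and the example uses fewer permutations than the 50,000 reported, so that it runs quickly.

```python
import math
import random
from typing import List, Sequence, Tuple

def t_values(data: Sequence[Sequence[float]]) -> List[float]:
    """One-sample t statistic against zero at each time point (rows = participants)."""
    n = len(data)
    out = []
    for col in zip(*data):
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / (n - 1)
        # Guard against zero variance rather than dividing by zero.
        out.append(mean / math.sqrt(var / n) if var > 0 else 0.0)
    return out

def clusters(tvals: Sequence[float], threshold: float) -> List[Tuple[int, int, float]]:
    """Runs of consecutive samples with t > threshold, as (start, end_inclusive, cluster mass)."""
    found: List[Tuple[int, int, float]] = []
    start = None
    for i, t in enumerate(list(tvals) + [-math.inf]):  # sentinel closes a run at the end
        if t > threshold and start is None:
            start = i
        elif t <= threshold and start is not None:
            found.append((start, i - 1, sum(tvals[start:i])))
            start = None
    return found

def cluster_permutation_test(data: Sequence[Sequence[float]], threshold: float = 2.0,
                             n_perm: int = 1000, seed: int = 0) -> List[Tuple[int, int, float, float]]:
    """One-sided max-cluster-mass test using random sign flips per participant."""
    rng = random.Random(seed)
    observed = clusters(t_values(data), threshold)
    null_max = []
    for _ in range(n_perm):
        # Flip each participant's whole time course, preserving temporal autocorrelation.
        signs = [rng.choice((-1.0, 1.0)) for _ in data]
        flipped = [[s * x for x in row] for s, row in zip(signs, data)]
        masses = [mass for _, _, mass in clusters(t_values(flipped), threshold)]
        null_max.append(max(masses, default=0.0))
    # Corrected p: share of permutations whose largest cluster is at least as large (+1 correction).
    return [(start, end, mass, (sum(m >= mass for m in null_max) + 1) / (n_perm + 1))
            for start, end, mass in observed]

def simulate(n_subj: int = 12, n_time: int = 40, seed: int = 1) -> List[List[float]]:
    """Hypothetical data: decoding accuracy minus chance, true effect of 1.0 at samples 15-25."""
    gen = random.Random(seed)
    return [[(1.0 if 15 <= t <= 25 else 0.0) + gen.gauss(0.0, 0.5) for t in range(n_time)]
            for _ in range(n_subj)]

for start, end, mass, p in cluster_permutation_test(simulate()):
    print(f"samples {start}-{end}: cluster mass = {mass:.1f}, corrected p = {p:.4f}")
```

      Only the largest cluster from each permutation enters the null distribution, so the resulting p-values are corrected across the whole time course. Reporting the cluster-forming threshold and the corrected alpha alongside the results would address the reviewer's request directly.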

      We included these clarifications in the Significance testing section of the revised manuscript.

      Reviewer #1 (Recommendations for the authors):

      It would be helpful if the authors could elaborate more on the fMRI decoding results in higher-order visual areas in the Discussion (there are recent studies also investigating higher-order visual areas (Carricarte et al., 2024) and associative areas (Degutis et al., 2024)) and relate it to the MEG information transmission results between the areas overlapping with the regions recorded in the fMRI part of the study.

      We have discussed the fMRI decoding results in the LOC and IPS in the revised manuscript: 

      In the current study, fMRI signals from early visual cortex and two high-level brain regions (LOC and pIPS) were recorded. Neural dynamics of these regions were extracted from MEG signals. Decoding analyses based on fMRI and MEG signals consistently showed that object category information could be decoded from both regions. These findings raise an important question: which of these regions is the main source of the category feedback to foveal V1? Further Granger causality analysis indicates that the feedback information in foveal V1 was mainly driven by signals from the LOC. Layer-specific analysis showed that category information could be decoded in the middle and superficial layers of the LOC. A reasonable interpretation of this result is that feedforward information from the early visual cortex was received by the LOC’s middle layer, then the category information was generated and fed back to foveal V1 through the LOC’s superficial layer. A recent study (Carricarte et al., 2024) found that, in object-selective regions in temporal cortex, the deep layer showed the strongest fMRI responses during an imagery task. Together, the results suggest that the deep and superficial layers correspond to different feedback mechanisms. It is worth noting that other cortical regions may also generate feedback signals to the early visual cortex. The current study did not have simultaneously recorded fMRI signals from the prefrontal cortex, but it has been shown that feedback signals can be traced back to the prefrontal cortex during complex cognitive tasks, such as working memory (Finn et al., 2019; Degutis et al., 2024). Further fMRI studies with submillimeter resolution and whole-brain coverage are needed to test other potential feedback pathways during object processing.

      The behavioral performance seems quite low (67%). Could the authors explain the reasons for it?

      We designed the object stimuli to be difficult to distinguish on purpose. Some of our pilot data showed that the more involved the participants were in the peripheral object task, the easier the foveal feedback information was to decode. It is reasonable to assume that if the peripheral objects were easily distinguishable, the feedback mechanism might not be fully recruited during object processing. Furthermore, since we were decoding category and orientation information rather than identity information, the difficulty of distinguishing two objects from the same category and with the same orientation would not affect the decoding of category and orientation information in the neural signals.

      Reviewer #2 (Recommendations for the authors):

      (1) Line 52: the meaning of the sentence starting with "However, ..." is not entirely clear. Maybe the word "while" is missing after the first comma?

      (2) Line 224. If I'm understanding the rationale for the MEG analysis correctly, it was not possible to localize foveal regions, but the cross-location decoding analysis was used to approximate the strength and timing of feedback information. If this is the case, "neural representations in the foveal region" were not extracted.

      (3) Figure 4. The key information is too small to see. The lines indicating where decoding performance was significant are quite thin but very important, and the text next to them indicating onset times of significant decoding is in such a small font size I needed to zoom in to 300% to read it (yes, my eyes are getting old and tired). Increasing the font size used to represent key information would be nice.

      (4) Figure 4 caption. Line 270 describes the line color in the plots as yellow, but that color is decidedly orange to my eye.

      (5) Line 340/341: Papers that define and describe feedback-receptive fields seem important to cite here:

      Keller, A. J., Roth, M. M., & Scanziani, M. (2020). Feedback generates a second receptive field in neurons of the visual cortex. Nature, 582(7813), 545-549.

      Kirchberger, L., Mukherjee, S., Self, M. W., & Roelfsema, P. R. (2023). Contextual drive of neuronal responses in mouse V1 in the absence of feedforward input. Science Advances, 9(3), eadd2498.

      (6) Lines 346-350: this sentence seems to have some missing or misused words, because the syntax isn't intact.

      (7) Line 367: supports should be support.

      We thank the reviewers for the comments and have corrected them in the manuscript.

    1. eLife Assessment

      This important study identifies a plant-derived metabolite, betulin, as an effective natural insecticide against aphids and uncovers its specific molecular target. The evidence is compelling, combining greenhouse and field efficacy trials with rigorous molecular, genetic, and electrophysiological approaches that converge on a conserved binding site in the aphid GABA receptor. While additional work is needed to fully assess potential off-target effects and ecological safety, the study provides a strong mechanistic foundation. These findings will be of interest to researchers in plant biology, chemical ecology, and sustainable pest management.

    2. Reviewer #1 (Public review):

      Wang, Junxiu et al. investigated the underlying molecular mechanisms of the insecticidal activity of betulin against the peach aphid, Myzus persicae. There are two important findings described in this manuscript: (a) betulin inhibits the gene expression of GABA receptor in the aphid, and (b) betulin binds to the GABA receptor protein, acting as an inhibitor. The first finding is supported by RNA-Seq and RNAi, and the second by MST and electrophysiological assays. Further investigations on the betulin binding site on the receptor protein provided a fundamental discovery that T228 is the key amino acid residue for its affinity, thereby acting as an inhibitor, backed up by site-directed mutagenesis of the heterologously expressed receptor in E. coli and by CRISPR genome editing in Drosophila.

      Comments on revisions:

      All of my review comments have been addressed, and the manuscript has been revised accordingly.

    3. Reviewer #2 (Public review):

      Summary:

      This important study shows that betulin from wild peach trees disrupts neural signaling in aphids by targeting a conserved site in the insect GABA receptor. The authors present a nicely integrated set of molecular, physiological, and genetic experiments to establish the compound's species-specific mode of action. While the mechanistic evidence is solid, the manuscript would benefit from a broader discussion of evolutionary conservation and potential off-target ecological effects.

      Strengths:

      The main strengths of the study lie in its mechanistic clarity and experimental rigor. The identification of a single betulin-binding threonine residue was supported by (1) site-directed mutagenesis and (2) functional assays. These experiments strongly support the specificity of action. Furthermore, the use of comparative analyses between aphids and fruit flies demonstrates an important effort to explore species specificity, and the integration of quantitative data further enhances the robustness of the conclusions.

      Comments on revisions:

      The revision satisfactorily addresses my concerns on evolutionary context, methodological clarity, and ecological risk.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Wang, Junxiu et al. investigated the underlying molecular mechanisms of the insecticidal activity of betulin against the peach aphid, Myzus persicae. There are two important findings described in this manuscript: (a) betulin inhibits the gene expression of GABA receptor in the aphid, and (b) betulin binds to the GABA receptor protein, acting as an inhibitor. The first finding is supported by RNA-Seq and RNAi, and the second by MST and electrophysiological assays. Further investigations on the betulin binding site on the receptor protein provided a fundamental discovery that T228 is the key amino acid residue for its affinity, thereby acting as an inhibitor, backed up by site-directed mutagenesis of the heterologously expressed receptor in E. coli and by CRISPR genome editing in Drosophila.

      Although the manuscript has strengths in principle, weaknesses remain: the manuscript would benefit from more comprehensive analyses to fully support its key claims. In particular:

      (1) The Western blotting results in Figure 5A & B appear to support the claim that betulin inhibits GABR gene expression (L26), as a decrease in target protein levels is often indicative of suppressed gene expression. The result description for Figure 5A & B is found in L312-L316, within Section 3.6 ("Responses of MpGABR to betulin"), where MST and voltage-clamp assays are also presented. It seems the observed decrease in MpGABR protein content is due to gene downregulation, rather than a direct receptor protein-betulin interaction. However, this interpretation lacks discussion or analysis in either the corresponding results section or the Discussion. In contrast, Figures 5C-F are specifically designed to illustrate protein-betulin interactions. Presenting Figure 5A & B alongside these panels might lead to confusion, as they support distinct claims (gene expression vs. protein binding/inhibition). Therefore, I recommend moving Figure 5A & B either to the end of Figure 3 or to a separate figure altogether to improve clarity and logical flow. A minor point in the Western blotting experiment is that although GAPDH was used as a reference protein, there is no explanation in the corresponding M&M section.

      We thank the reviewer for the concise and accurate summary and appreciate the constructive feedback on the article’s strengths and weaknesses.

      (a) According to your suggestion, the original Figure 5A and B have been inserted into Figure 3, following Figure 3D. The original Figure 3E-I has been moved to a new figure illustrating the RNAi assay.

      (b) The statement “GAPDH was used as a reference protein” has been added to the M&M section (Line 209).

      (2) The description of the electrophysiological recording experiment is unclear regarding the use of GABA. I didn't realize that GABA, the true ligand of the GABA receptor, was used in this inhibition experiment until I reached the Results section (L321), which states, "In the presence of only GABA, a fast inward current was generated." Crucially, no details are provided on the experiment itself, including how GABA was applied (e.g., concentration, duration, whether GABA was treated, followed by betulin, or vice versa). This information is essential for reproducibility. Please ensure these details are thoroughly described in the corresponding M&M section.

      We thank the reviewer for the valuable comments.

      (a) Detailed information on how GABA was applied has been added to the corresponding M&M section (Lines 260-263): After 3 days of incubation, the oocytes were used for electrophysiological recording. GABA was dissolved in 1 × Ringer's solution to prepare a 100 µM GABA solution. Subsequently, 100 µM GABA solutions containing different concentrations of betulin (0, 5, 10, 20, 40, 80, 160, 320 µM) were used to perfuse the oocytes.

      (b) Additionally, we checked the rest of the M&M section to ensure that sufficient detail has been provided.

      (3) The phylogenetic analysis, particularly concerning Figures 4 and 6B, needs significant attention for clarity and representativeness. First, your claim that MpGABR is only closely related to CAI6365831.1 (L305-L310) is inconsistent with the provided phylogenetic tree, which shows MpGABR as equally close to Metopolophium dirhodum (XP_060864885.1) and Acyrthosiphon pisum (XP_008183008.2). Therefore, singling out only Macrosiphum euphorbiae (CAI6365831.1) is not supported by the data. Second, the representation of various insect orders is insufficient. All 11 sequences in the Hemiptera category (in both Figure 4 and Figure 6B) are exclusively from the Aphididae family. This small subset cannot represent the highly diverse Order Hemiptera. Consequently, statements like "only THR228 was conserved in Hemiptera" (L338), "The results of the sequence alignment revealed that only THR228 was conserved in Hemiptera" (L430), or "THR228... is highly conserved in Hemiptera" (L486) are not adequately supported. Third, similar concerns apply to the Diptera order, which includes 10 Drosophila and 2 mosquito samples (not diverse or representative enough), and likely to other orders as well. Therefore, the Figure 6B alignment should be revised to reflect a more accurate representation or to clarify the scope of the analysis. Fourth, there is a discrepancy in the phylogenetic method used: the M&M section (L156) states that MEGA7, ClustalW, and the neighbor-joining method were used, while the Figure 4 caption mentions that MEGA X, MUSCLE, and the maximum likelihood method were employed. This inconsistency needs to be clarified and made consistent throughout the manuscript. Fifth, I have significant concerns about the phylogenetic tree itself (Figure 4). A small glitch was observed at the Danaus plexippus node, which raises suspicion of potential manipulation after tree construction. More critically, the tree, especially within Coleoptera, does not appear to be clearly resolved. I am highly concerned about whether all included sequences are true GABR orthologs or whether the dataset includes partial or related sequences that could distort the phylogeny. Finally, for Figure 6B, protein (XP_) and nucleotide (XM_) sequences were mixed. I recommend using protein sequences instead of nucleotide sequences in this panel, as protein sequences are more directly informative.

      We thank the reviewer for the careful reading and valuable comments.

      (a) First, in line with your comments, the phylogenetic analysis has been repeated with more representative species from each Order (Fig. 5 and Fig. 7B). The results revealed that only THR228 was conserved across 11 species in the Aphididae family of Hemiptera. Therefore, expressions such as "only THR228 was conserved in Hemiptera" have been revised to “among the four residues, only THR228 was conserved across 11 species in the Aphididae family of Hemiptera” (Line 106, Line 369, Line 477, and Lines 563-564).

      (b) We have modified the description of Fig. 5 (the original Fig. 4): MpGABR (XP_022173711.1) was found to be genetically closely related to CAI6365831.1 from Macrosiphum euphorbiae, XP_008183008.2 from Acyrthosiphon pisum, and XP_060864885.1 from Metopolophium dirhodum (Fig. 5 and Table S6). See Lines 342-346.

      (c) Phylogenetic analysis was performed using MEGA7 with multiple amino acid sequence alignment (ClustalW) and the neighbor-joining method. We have revised the caption of Fig. 5 (the original Fig. 4) to make it accurate and consistent with the rest of the manuscript.

      (d) We apologize for the small glitch at the Danaus plexippus node. After the phylogenetic tree was constructed, it was imported into Adobe Illustrator for coloring and classification annotation. An operational error while resizing the image may have produced the glitch. In addition, the unclear clustering of Coleoptera may be due to improper adjustment of the branch spacing (in pixels) between nodes. Thank you again for your careful reading. We have rebuilt the phylogenetic tree.

      (e) Based on your suggestion, the sequence IDs have been unified as protein sequence IDs (Fig. 5, Fig. 7B and Table S6).

      (4) The Discussion section requires significant revision to provide a more insightful and interpretative analysis of the results. Currently, much of the section primarily restates findings rather than offering deeper discussion. For instance, L409-L419 restate the results, followed by the short sentence "Collectively, these results suggest that betulin may have insecticidal effects on aphids by inhibiting MpGABR expression". It would be beneficial to expand this by elaborating on proposed mechanisms by which gene expression might be suppressed, including any potential transcription factors involved. In contrast, while L422-L442 also initially summarize results, the subsequent paragraph (L445-L472) effectively discusses the potential mechanisms of inhibitory action and how mortality is triggered, which is a good model for other parts of the section. However, the discussion ends with a short statement, "implying that betulin acts as a CA of MpGABR" (L472), which appears to be a leap. The inference that betulin acts as a competitive antagonist (CA) is based solely on the location of its extracellular binding site, which does not exactly overlap with the GABA binding site. This inference needs stronger justification or further experimental validation. The authors should consider rephrasing this statement to acknowledge the need for additional studies to definitively confirm this mechanism of action.

      We appreciate the reviewer's careful reading and valuable feedback, which will certainly enhance the quality of our manuscript.

      (a) Possible reasons for the effect of betulin on MpGABR expression have been discussed in our manuscript (Lines 455-466): The regulation of gene expression is sophisticated and delicate (Pope and Medzhitov 2018). The regulatory network controlling GABR expression remains unclear. In adult rats, epileptic seizures have been reported to increase the levels of brain-derived neurotrophic factor (BDNF), which in turn prompts the transcription factors CREB and ICER to reduce the gene expression of the GABR α1 subunit (Lund et al. 2008). In Drosophila, it has been demonstrated that WIDE AWAKE, which regulates the onset of sleep, interacts with the GABR and upregulates its expression level (Liu et al. 2014). In the Drosophila brain, the circular RNA circ_sxc was found to inhibit miR-87-3p by acting as a sponge, thereby regulating the expression of neurotransmitter receptor ligand proteins, including GABR, and ensuring normal synaptic signal transmission in brain neurons (Li et al. 2024). However, it remains unclear how betulin reduces the expression of MpGABR, and further research is needed.

      (b) In the Discussion section, we acknowledged the need for further research to ultimately confirm the mechanism by which betulin competes with GABA for binding to MpGABR (Lines 532-535): Although the mechanism by which betulin competes with GABA for binding to MpGABR requires further experimental validation, our work may have provided a novel target for developing insecticides.

      (c) In addition, we have added a discussion of the sensitivity of the GABA receptor to betulin to the Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function have primarily focused on transmembrane regions. For instance, based on mutational studies and Drosophila GABR modeling, multiple key amino acids in the transmembrane domain were identified as insecticide targets (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study indicates that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, we speculate that betulin is a specific insecticidal substance evolved by plants in response to aphid infestation. Further verification is needed to determine whether betulin is toxic to other insect species.

      (d) Furthermore, the discussion of potential ecological risks of deploying betulin as a bioinsecticide has been elaborated in our manuscript (Lines 538-553): The development of bioinsecticides should focus not only on the toxic effects of the active substance on target organisms but also on its influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin has specific toxicity to aphids, previous studies have reported that betulin and its derivatives affect Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether betulin has other insecticidal mechanisms or off-target effects. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before betulin is applied in the field, its potential impact on farmers' health must be fully evaluated. Furthermore, will betulin leave residues or spread during field application? Will long-term application promote the evolution of resistance in aphids or other insects? These issues also require experimental verification. In summary, before any field application, further research is needed on the environmental behavior, degradation, and safety of betulin.

      Reviewer #2 (Public review):

      Summary:

      This important study shows that betulin from wild peach trees disrupts neural signaling in aphids by targeting a conserved site in the insect GABA receptor. The authors present a nicely integrated set of molecular, physiological, and genetic experiments to establish the compound's species-specific mode of action. While the mechanistic evidence is solid, the manuscript would benefit from a broader discussion of evolutionary conservation and potential off-target ecological effects.

      Strengths:

      The main strengths of the study lie in its mechanistic clarity and experimental rigor. The identification of a single betulin-binding threonine residue was supported by (1) site-directed mutagenesis and (2) functional assays. These experiments strongly support the specificity of action. Furthermore, the use of comparative analyses between aphids and fruit flies demonstrates an important effort to explore species specificity, and the integration of quantitative data further enhances the robustness of the conclusions.

      Weaknesses:

      There are several important limitations that need to be addressed. The manuscript does not explore whether the observed sensitivity to betulin reflects a broadly conserved feature of GABA receptors across animal lineages or a more lineage-specific adaptation. This evolutionary context is crucial for understanding the broader significance of the findings.

      In addition, while the compound's aphicidal effect is well established, the potential for off-target effects in non-target organisms - especially vertebrates - remains unaddressed, despite prior evidence that betulin interacts with mammalian GABAa receptors. There is little discussion on the ecological or environmental safety of exogenous betulin application, such as persistence, degradation, or exposure risks.

      We sincerely thank the reviewer for the time and effort dedicated to our manuscript's detailed review and assessment. The revision suggestions were constructive, and we have provided a point-by-point response to address them.

      (a) A brief introduction to the evolutionary conservation of GABA receptors has been added to the Introduction (Lines 90-98): A previous study proposed that vertebrate and human GABR genes maintain a broad and conserved gene clustering pattern, whereas this pattern is absent in invertebrates, indicating that these gene clusters formed early in vertebrate evolution and were established after the divergence from invertebrates. Notably, invertebrates each possess a unique GABR gene pair homologous to the human GABR α and β subunits, suggesting that the existing GABR gene clusters evolved from an ancestral α-β subunit gene pair (Tsang et al. 2006). During plant-insect coevolution, duplications and amino acid substitutions in GABR may have been beneficial for adaptation to insecticides and terpenoid compounds (Guo et al. 2023).

      (b) We have added a discussion of the sensitivity of the GABA receptor to betulin to the Discussion section (Lines 491-501): Studies on key amino acids that are crucial for GABR function have primarily focused on transmembrane regions. For instance, based on mutational studies and Drosophila GABR modeling, multiple key amino acids in the transmembrane domain were identified as insecticide targets (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study indicates that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the Aphididae family. Therefore, we speculate that betulin is a specific insecticidal substance evolved by plants in response to aphid infestation. Further verification is needed to determine whether betulin is toxic to other insect species.

      (c) The discussion of potential ecological risks of deploying betulin as a bioinsecticide has been elaborated in our manuscript (Lines 538-553): The development of bioinsecticides should focus not only on the toxic effects of the active substance on target organisms but also on its influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin has specific toxicity to aphids, previous studies have reported that betulin and its derivatives affect Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether betulin has other insecticidal mechanisms or off-target effects. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used to treat various diseases, such as cancer (Lv 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023) and hyperlipidemia (Tang et al. 2011). Before betulin is applied in the field, its potential impact on farmers' health must be fully evaluated. Furthermore, will betulin leave residues or spread during field application? Will long-term application promote the evolution of resistance in aphids or other insects? These issues also require experimental verification. In summary, before any field application, further research is needed on the environmental behavior, degradation, and safety of betulin.

      Reviewer #1 (Recommendations for the authors):

      (1) L28 Provide the full name of MST.

      Thanks for your suggestion. The full name of MST, microscale thermophoresis, has been supplied.

      (2) L87 in the Order Hemiptera.

      Thanks for your suggestion. Corrected.

      (3) L99 "Leaf bioassay" would be better to differentiate the greenhouse and field bioassays.

      Thanks for your suggestion. Corrected.

      (4) L104 It should be 7 doses, including the "0 mg/mL" control.

      Thanks for your suggestion. Corrected.

      (5) L104 Since the LC50 of pymetrozine is 1.0612 mg/mL, a wider range of doses should have been tested compared to the dose range of betulin.

      Thanks for your comment.

      (a) First, seven doses (0, 0.0625, 0.125, 0.25, 0.5, 1, and 2 mg/mL) were set to calculate the LC50 values of betulin and pymetrozine. The LC50 values of betulin and pymetrozine (0.1641 and 1.0612 mg/mL, respectively) both fall within this range, indicating that the dose range is reasonable and the LC50 values of betulin and pymetrozine are reliable.

      (b) To compare the control effects of betulin and pymetrozine against M. persicae, LC50 of betulin (0.1641 mg/mL) and pymetrozine (1.0612 mg/mL) were used to treat M. persicae.

      (6) L109 Greenhouse and field bioassays.

      Thanks for your suggestion. Corrected.

      (7) L112 Tween-80 and acetone in L103. Keep the order consistent throughout the manuscript.

      Thanks for your suggestion. Corrected.

      (8) L122 Mortality was recorded at 1, 5, 9, and 14 days after treatment. Revise the other similar mistakes throughout the manuscript (e.g. L250, L254, L255, L256, L259, etc.).

      Thanks for your suggestion. Corrected.

      (9) L126 apterous instead of wingless (keep a consistent expression).

      Thanks for your suggestion. Corrected.

      (10) L138 Primer Premier?

      Thanks for your comment. Corrected.

      (11) L141 Add RPS18 primers in Table S2.

      Thanks for your comment. Corrected.

      (12) L155 MEGA7 vs. MEGAX (as described in the Figure 4 caption).

      Thanks for your comment. Corrected.

      (13) L156 NJ method vs. ML method (as described in the Figure 4 caption).

      Thanks for your comment. Corrected.

      (14) L157 2.7. RNAi assay (Remove "In vitro" and re-number the following M&M sections accordingly).

      Thanks for your comment. Corrected.

      (15) L163 Add dsGFP primers in Table S2.

      Thanks for your comment. Corrected.

      (16) L166 apterous instead of wingless (keep a consistent expression).

      Thanks for your comment. Corrected.

      (17) L172 Add the source of pET-B2M vector.

      pET-B2M vector was obtained from BGI (Shenzhen, China), which has been added in our manuscript (Line 194).

      (18) L195 coding sequence instead of cDNA.

      Thanks for your comment. Corrected.

      (19) L198 the mutations of R224A ...

      Thanks for your comment. Corrected.

      (20) L199 TYR), or T228R ...

      Thanks for your comment. Corrected.

      (21) L211 and 90 ng.

      Thanks for your comment. Corrected.

      (22) L213 genomic DNA instead of gDNA, because gDNA may be confused in the context of sgRNA.

      Thanks for your suggestion. Corrected.

      (23) L253 (Fig. 1A-B).

      Thanks for your comment. Corrected.

      (24) L268 Explain why these 15 DEGs were selected for qRT-PCR.

      Thanks for your comment. These 15 DEGs were randomly selected as representative DEGs with different expression levels. The reason for selecting them has been added to the manuscript (Lines 295-296).

      (25) L287 What about GABRB? It has a TM domain.

      GABRB refers to “gamma-aminobutyric acid receptor subunit beta-like” as annotated in NCBI. A complete subunit would be expected to contain four transmembrane domains, but this sequence has only one, indicating that it is incomplete.

      (26) L297 Add dsGFP as another control group.

      Thanks for your comment. Corrected.

      (27) L299 increased by 30.44% (Remove a comma).

      Thanks for your comment. Corrected.

      (28) L308 XM_022318019.1 (or protein accession number with XP_).

      Thanks for your comment. Corrected.

      (29) L338 that THR228 was conserved only in Hemiptera.

      Thanks for your comment. Since our original intention was to emphasize that THR228 is the only conserved residue among the four key amino acid residues, after careful consideration we retained the expression "only THR228".

      (30) L342 or T228R.

      Thanks for your comment. Corrected.

      (31) L382 Is pyrhidone a general name for pymetrozine?

      Thanks for your comment. Corrected.

      (32) L450 Remove "and so on".

      Thanks for your comment. Corrected.

      (33) Figure 1D: Remove "Environment friendly". Replace the plant pot image on the right side with the one sprayed with pymetrozine, like the one in Figure 1F.

      Thanks for your comment. 

      (a) "Environment friendly" in Figure 1D has been removed.

      (b) We have attempted to modify the Figure 1D according to your suggestion. However, the modified Figure 1D is similar to Figure 1F and appears monotonous. Therefore, we have retained the original framework of Figure 1D.

      (34) Figure 2E 111036117 and 111041856 are in different IDs (XM_). I suggest keeping GeneID in Figure 2E and Table S2, as shown in Table S4.

      Thanks for your comment. Corrected.

      (35) Figure 2H: Add unit of the heatmap values. Or just add the title (e.g., expression level) on top of the bar.

      Thanks for your comment. Corrected.

      (36) Figure 3A: Add "aa" next to 700.

      Thanks for your comment. Corrected.

      (37) Figure 3E-G: Revise the tick marks on Y-axis: 0.0, 0.5, 1.0, and 1.5.

      Thanks for your comment. Corrected.

      (38) Figure 5C: Remove "1" and move "WT" up to the position where "1" was.

      Thanks for your comment. Corrected.

      (39) Figure 5D: Revise the tick marks on the Y-axis: 0.0, 0.5, 1.0, and 1.5.

      Thanks for your comment. Corrected.

      (40) Figure 5E: Remove the decimal. (e.g. 5 uM, 10 uM, 20 uM, etc.).

      Thanks for your comment. Corrected.

      (41) Figure 6B: What are the numbers next to the amino acid sequences? Provide the information in the figure caption.

      Thanks for your comment. The numbers next to the amino acid sequences indicate the position of the last residue of the key amino acids; this information has been added to the figure caption.

      (42) Figure 6D: Revise the tick marks on the Y-axis: 0.0, 0.5, 1.0, and 1.5. The X-axis title should be betulin (see Figure 5D). In the figure caption at the 5th row from the top, R244A should be R224A.

      Thanks for your comment. Corrected.

      (43) Figure 7E: R122T (not R1272T).

      Thanks for your comment. Corrected.

      (44) Supplementary Figure 1: It should be Figure S1. Add dsGFP in the figure caption.

      Thanks for your comment. Corrected.

      (45) Figure S2: What are the two pink bars and the other bars in brown or blue? Add an appropriate explanation in the figure caption.

      Thanks for your comment. Corrected.

      (46) Table S1: r square?

      Thanks for your comment. It is “r square”; this has been corrected.

      (47) Table S2: (a) Add horizontal lines to separate qPCR, RNAi, cloning, and heterologous expression from each other (b) Replace XM_022318017.1 and XM_022318019.1 with their corresponding GeneIDs, as shown in Table S4. (c) AK340444.1 is a sequence from another aphid (Acyrthosiphon pisum)-Revise it. (d) In the cloning primers, place MpGABR first, followed by MpGABRAP and MpGABRB, as shown in the manuscript and Table S5. (e) Also, in the cloning primers, MpGABRB and MpGABRAP use reverse primers without stop codon, while MpGABR uses stop codon (TCA = TGA in reverse)-Revise it accordingly. Otherwise, provide the reason.

      Thanks for your comment. Corrected.

      (48) Table S3: (a) Add "Drosophila melanogaster" and the target sequence ID in the table caption. Is it KF881792.1, as shown in Table S6? (b) Align the sequences to the left side. 

      Thanks for your comment. 

      (a) The GenBank accession number of the target sequence is KF881792.1 (Drosophila melanogaster). We have added this information to the Table S3 note.

      (b) It has been adjusted according to your suggestion.

      (49) Table S5: (a) Replace the accession numbers with GeneIDs, as shown in Table S4. AK340444.1 is a sequence from another aphid (Acyrthosiphon pisum). (b) The coding sequences with a stop codon are 2082, 357, and 753 bp, respectively, while the sequences without a stop codon are 2079, 354, and 750 bp, respectively. The lengths of the deduced amino acid sequences are 693, 118, and 250. Revise accordingly.

      Thanks for your comment. Corrected.
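      The lengths in point (49b) are linked by simple codon arithmetic: removing the 3-nt stop codon gives the CDS length without the stop, and dividing that by 3 gives the deduced protein length. The helper below is a hypothetical illustration of that check, applied to the lengths in the order the reviewer lists them.

```python
# Minimal sketch of the codon arithmetic behind point (49b).
def cds_summary(length_with_stop: int) -> tuple[int, int]:
    """Return (CDS length without the stop codon, deduced protein length in aa)."""
    if length_with_stop % 3 != 0:
        raise ValueError("a complete CDS length must be a multiple of 3")
    without_stop = length_with_stop - 3   # drop the 3-nt stop codon
    return without_stop, without_stop // 3  # one amino acid per remaining codon


# Lengths with stop codon, in the order given in point (49b).
for n in (2082, 357, 753):
    print(n, cds_summary(n))
# 2082 (2079, 693)
# 357 (354, 118)
# 753 (750, 250)
```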

      (50) Table S6: (a) Use GenBank accession numbers for the protein sequences; there is no Gene ID in this table. (b) Use Order (instead of Class). (c) See my comment on the phylogenetic analysis above.

      Thanks for your comment. Corrected.

      (51) Table S7 (a) Add unit under "Binding Energy". (b) There are two ALA226 [Alkyl] with two different distances. (c) PHE227 at the bottom should be THR228?

      Thanks for your comment.

      (a) The unit of "Binding Energy" is kcal mol⁻¹, and it has been added to the table caption.

      (b) As shown in Figure 6A, there are two alkyl interactions between ALA226 and betulin. Therefore, ALA226 [Alkyl] appears twice, with two different distances.

      (c) Similarly, there are two Pi-Alkyl interactions between PHE227 and betulin; thus, PHE227 appears in two rows of the table.

      (52) Table S9 (a) R117T should be R122T. (b) r square?

      Thanks for your comment. Both (a) and (b) have been corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) Introduction

      (a) It lacks a deeper biological and evolutionary framing of the GABA receptor system. As GABA receptors are highly conserved across animal taxa, the observed interaction between betulin and the aphid GABA receptor could have broader implications. This possibility is not addressed in the current version, which limits the reader's appreciation of the relevance of this mode of action.

      (b) Previous reports of betulin activity in mammalian systems are not mentioned in the introduction, even though they are directly relevant to concerns about off-target toxicity. Therefore, the introduction should be revised to (i) briefly introduce the evolutionary conservation of GABA receptors, and (ii) acknowledge that betulin may affect a broader range of organisms, which sets up the need for caution in its application.

      Thanks for your important suggestions.

      (a) A brief introduction to the evolutionary conservation of GABA receptors has been added to the Introduction (Lines 90-98): A previous study proposed that vertebrate and human GABR genes maintain a broadly conserved gene-clustering pattern, whereas this pattern is absent in invertebrates, indicating that these gene clusters formed early in vertebrate evolution, after the divergence from invertebrates. Notably, invertebrates each possess a unique GABR gene pair homologous to the human GABR α and β subunits, suggesting that the existing GABR gene clusters evolved from an ancestral α-β subunit gene pair (Tsang et al. 2006). During plant-insect coevolution, duplications and amino acid substitutions in GABR may have been beneficial for adaptation to insecticides and terpenoid compounds (Guo et al. 2023).

      (b) The possible effects of betulin on a broader range of organisms have been acknowledged in the Introduction (Lines 68-77): An immune stimulant, Ir-Bet, prepared from an iridium complex and betulin, evoked ferritinophagy-enhanced ferroptosis, thereby activating antitumor immunity (Lv et al. 2023). The anti-inflammatory effect of betulin has been reported in macrophages at the lymphoma site in mice (Szlasa et al. 2023). Betulin has been found to improve hyperlipidemia and insulin resistance and to reduce atherosclerotic plaques by inhibiting the maturation of sterol regulatory element-binding protein (Tang et al. 2011). In addition, betulin and its derivatives have been found to exhibit insecticidal activity against Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024).

      (c) At the end of the Introduction, we now note that betulin should be used with caution (Lines 111-112): However, given that betulin may affect a wider range of organisms, it should be used with caution.

      (2) Method

      The number of biological replicates in each assay and the justification of the significance thresholds used in the RNAi and survival experiments are not reported in the manuscript.

      Thanks for your careful reading. We have checked the Materials and Methods section and added the number of biological replicates for all assays. In addition, the p-values for the significance analyses of the RNAi and survival experiments have been added to the manuscript.

      (3) Discussion

      (a) Consistent with the comments on the Introduction, the absence of discussion on (i) the evolutionary conservation of GABA receptor sensitivity to betulin, (ii) potential off-target effects in non-target insects and vertebrates (if such effects exist, betulin cannot be described as an "eco-friendly pesticide" as the authors state in the manuscript), and (iii) ecological risks associated with the exogenous application of betulin limits both the interpretive depth and the applied relevance of the study.

      (b) To strengthen the Discussion, the authors should consider addressing: (i) whether the observed sensitivity reflects a conserved pharmacological vulnerability across animal taxa or a lineage-specific adaptation; (ii) the potential ecological risks of deploying betulin as a bioinsecticide, and (iii) the need for future research into the environmental fate, degradation, and safety profile of betulin prior to any field-level application.

      Thank you for your valuable comments.

      (a) We have added a discussion of the sensitivity of GABA receptors to betulin to the Discussion section (Lines 491-501): Studies on the key amino acids crucial for GABR function have primarily focused on the transmembrane regions. For instance, mutational studies and modeling of the Drosophila GABR identified multiple key amino acids in the transmembrane domain as insecticide targets (Nakao and Banba 2021). Guo et al. proposed that amino acid substitutions in transmembrane domain 2 contribute to terpenoid insensitivity during plant-insect coevolution (Guo et al. 2023). However, these studies have neglected the extracellular domain. Our study indicates that betulin targets the THR228 site in the extracellular domain of MpGABR, which is conserved only in the family Aphididae. Therefore, we speculate that betulin is a specific insecticidal substance that evolved in plants in response to aphid infestation. Moreover, further verification is needed to determine whether betulin is toxic to other insect species.

      (b) The discussion of the potential ecological risks of deploying betulin as a bioinsecticide has been expanded in our manuscript (Lines 538-551): The development of bioinsecticides should focus not only on the toxic effects of the active substance on target organisms but also on its influence on the ecosystem (Haddi et al. 2020). Although our results indicate that betulin is specifically toxic to aphids, previous studies have reported that betulin and its derivatives affect Plutella xylostella L. (Huang et al. 2025), Aedes aegypti (de Almeida Teles et al. 2024), and Drosophila melanogaster (Lee and Min 2024). Therefore, further research is needed to determine whether betulin has other insecticidal mechanisms or off-target effects. Additionally, betulin exhibits a wide range of pharmacological activities (Amiri et al. 2020), which have been used in the treatment of various diseases, such as cancer (Lv et al. 2023), glioblastoma (Li et al. 2022), inflammation (Szlasa et al. 2023), and hyperlipidemia (Tang et al. 2011). Before betulin is applied in the field, its potential impact on farmers' health must be fully evaluated. Furthermore, will betulin leave residues or spread in the environment during field application? Will long-term application promote the evolution of resistance in aphids or other insects? These issues also require further experimental verification.

      (c) Additionally, at the end of the Discussion, we note that more research is needed before any field application of betulin (Lines 551-553): In summary, further research on the environmental behavior, degradation, and safety of betulin is needed before any field application.

      References

      Amiri S, Dastghaib S, Ahmadi M, Mehrbod P, Khadem F, Behrouj H, Aghanoori M, Machaj F, Ghamsari M, Rosik J, Hudecki A, Afkhami A, Hashemi M, Los M, Mokarram P, Madrakian T, Ghavami S. 2020. Betulin and its derivatives as novel compounds with different pharmacological effects. Biotechnology Advances 38: 107409.

      de Almeida Teles AC, dos Santos BO, Santana EC, Durço AO, Conceição LSR, Roman Campos D, de Holanda Cavalcanti SC, de Souza Araujo AA, dos Santos MRV. 2024. Larvicidal activity of terpenes and their derivatives against Aedes aegypti: a systematic review and meta-analysis. Environmental Science and Pollution Research 31: 64703-64718.

      Guo L, Qiao X, Haji D, Zhou T, Liu Z, Whiteman NK, Huang J. 2023. Convergent resistance to GABA receptor neurotoxins through plant–insect coevolution. Nature Ecology & Evolution 7: 1444-1456.

      Haddi K, Turchen LM, Viteri Jumbo LO, Guedes RN, Pereira EJ, Aguiar RW, Oliveira EE. 2020. Rethinking biorational insecticides for pest management: unintended effects and consequences. Pest Management Science 76: 2286-2293.

      Huang X, Hao N, Shu L, Wei Z, Shi J, Tian Y, Chen G, Yang X, Che Z. 2025. Preparation and insecticidal activities of betulin-cinnamic acid-related hybrid compounds and insights into the stress response of Plutella xylostella L. Pest Management Science 81: 4243-4255.

      Lee HY, Min KJ. 2024. Betulinic acid increases the lifespan of Drosophila melanogaster via Sir2 and FoxO activation. Nutrients 16: 441.

      Li Q, Wang L, Tang C, Wang X, Yu Z, Ping X, Ding M, Zheng L. 2024. Adipose tissue exosome circ_sxc mediates the modulatory of adiposomes on brain aging by inhibiting brain dme-miR-87-3p. Molecular Neurobiology 61: 224-238.

      Li Y, Wang Y, Gao L, Tan Y, Cai J, Ye Z, Chen A, Xu Y, Zhao L, Tong S, Sun Q, Liu B, Zhang S, Tian D, Deng G, Zhou J, Chen Q. 2022. Betulinic acid self-assembled nanoparticles for effective treatment of glioblastoma. Journal of Nanobiotechnology 20: 39.

      Liu S, Lamaze A, Liu Q, Tabuchi M, Yang Y, Fowler M, Bharadwaj R, Zhang J, Bedont J, Blackshaw S, Lloyd Thomas E, Montell C, Sehgal A, Koh K, Wu MN. 2014. WIDE AWAKE mediates the circadian timing of sleep onset. Neuron 82: 151-166.

      Lund IV, Hu Y, Raol YH, Benham RS, Faris R, Russek SJ, Brooks Kayal AR. 2008. BDNF selectively regulates GABAA receptor transcription by activation of the JAK/STAT pathway. Science Signaling 1: ra9.

      Lv M, Zheng Y, Wu J, Shen Z, Guo B, Hu G, Huang Y, Zhao J, Qian Y, Su Z, Wu C, Xue X, Liu H, Mao Z. 2023. Evoking ferroptosis by synergistic enhancement of a cyclopentadienyl iridium-betulin immune agonist. Angewandte Chemie International Edition 62: e202312897.

      Nakao T, Banba S. 2021. Important amino acids for function of the insect Rdl GABA receptor. Pest Management Science 77: 3753-3762.

      Pope SD, Medzhitov R. 2018. Emerging principles of gene expression programs and their regulation. Molecular Cell 71: 389-397.

      Szlasa W, Ślusarczyk S, Nawrot Hadzik I, Abel R, Zalesińska A, Szewczyk A, Sauer N, Preissner R, Saczko J, Drąg M, Poręba M, Daczewska M, Kulbacka J, Drąg Zalesińska M. 2023. Betulin and its derivatives reduce inflammation and COX-2 activity in macrophages. Inflammation 46: 573-583.

      Tang JJ, Li JG, Qi W, Qiu WW, Li PS, Li BL, Song BL. 2011. Inhibition of SREBP by a small molecule, betulin, improves hyperlipidemia and insulin resistance and reduces atherosclerotic plaques. Cell Metabolism 13: 44-56.

      Tsang SY, Ng SK, Xu Z, Xue H. 2006. The evolution of GABAA receptor–like genes. Molecular Biology and Evolution 24: 599-610.

    1. eLife Assessment

      This study presents a valuable finding about how receptor-ligand binding pathways with multi-site phosphorylation can show non-monotonic responses to increasing ligand affinity and to kinase activity. The authors provide compelling evidence through a simple ordinary differential equation model of such signaling networks with the key new ingredient of ligand-induced receptor degradation. The work will be of interest to physicists and biologists working on signal transduction and biological information processing.

    2. Reviewer #1 (Public review):

      Summary:

      The authors study the steady-state solutions of ODE models for molecular signaling involving ligand binding coupled to multi-site phosphorylation at saturating ligand concentrations. Although the results are in principle general, the work highlights the receptor tyrosine kinases (RTK) as model systems. After presenting previous ODE model solutions, the authors present their own "kinetic sorting" model, which is distinguished by ligand-induced phosphorylation-dependent receptor degradation and the property that every phosphorylation state is signaling competent. The authors show that this model recovers the two types of non-monotonicity experimentally reported for RTKs: maximum activity for intermediate ligand affinity and maximum activity for intermediate kinase activity.

      The main contribution of the work is in demonstrating that their model can capture both types of non-monotonicity, whereas previous models could at most capture non-monotonicity of ligand binding.

      Strengths:

      The question of how energy-dissipating, and thus non-equilibrium, molecular systems can achieve steady-state solutions not accessible to equilibrium systems is of fundamental importance in biomolecular information processing and self-organization. Although the authors do not address the energy requirements of their non-equilibrium model, their comparative analysis of different alternative non-equilibrium models provides insight into the design choices necessary to achieve non-monotonic control, a property that is inaccessible at equilibrium.

      The paper is succinctly written and easy to follow, and the authors achieve their aims by providing convincing numerical solutions demonstrating non-monotonicity over the range of parameter values encompassing the biologically relevant regime.

      Weaknesses:

      (1) A key motivating framework for this work is the argument that the ability to tune to recognize intermediate ligand affinities provides a control knob for signal selection that is available to non-equilibrium systems. As such, this seems like a compelling type of ligand selectivity, which is a question of broad interest. However, as the authors note in the results, the previously published "limited signaling model" already achieves such non-monotonicity to ligand binding affinity. The introduction and abstract do not clearly delineate the new contributions of the model.

      The novel benefit of the model introduced by the authors is that it also achieves non-monotonic response to kinase activity. Because such non-monotonicity is observed for RTK, this would make the authors' model a better fit for capturing RTK behavior. However, the broad significance of achieving non-monotonicity to kinase activity is not motivated or supported by empirical evidence in the paper. As such, the conceptual significance of the modified model presented by the authors is not clear.

      UPDATE: The authors have now clarified the significance of the model in elucidating how known motifs (multisite phosphorylation and active receptor degradation) could explain the behavior, including non-monotonicity. The authors have also provided compelling arguments for the biological significance of achieving non-monotonic kinase activity response.

      (2) Whereas previous models used in the literature are schematized in Figure 1, the model proposed by the authors is missing (see line 97 of page 3). Without the schematic, the text description of the model is incomplete.

      UPDATE: this issue has been resolved.

      (3) The authors use the activity of the first phosphorylation site as the default measure of activity. This choice needs to be justified. Why not use the sum of the activities at all sites?

      UPDATE: This was a non-issue. The potential misunderstanding has been mitigated by clarifications in the text.

      Comments on revisions:

      All issues previously identified were convincingly addressed. I have no additional suggestions.

    3. Reviewer #2 (Public review):

      Summary:

      In classical models of signaling networks, the signaling activity increases monotonically with ligand affinity. However, certain receptors prefer ligands of intermediate affinity. In the paper, the authors present a new minimal model to derive generic conditions for ligand specificity. In brief, this requires multi-site phosphorylation and that high-affinity complexes be more prone to degradation. This particular type of kinetic discrimination allows equilibrium constraints to be overcome.

      Strengths:

      The model is simple, and it adds only a few parameters to classical generic models. Moreover, the authors vary these additional parameters in ranges based on experimental observations. They explain how the introduction of these new parameters is essential to ligand specificity. Their model quantitatively reproduces the ligand specificity of a certain receptor. Finally, they provide a testable prediction.

      Weaknesses:

      The naming of multiple variables as activity without precise definitions may be confusing to readers.

      Comments on revisions:

      I thank the authors for addressing my comments. One point remains regarding the naming of multiple variables as activity. Besides using other words, the authors may consider giving precise definitions of terms, e.g., by writing "We define kinase activity as the phosphorylation rate $\omega=k_p\tau$." This connection currently appears only at line 204 of the manuscript.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):

      Summary:

      The authors study the steady-state solutions of ODE models for molecular signaling involving ligand binding coupled to multi-site phosphorylation at saturating ligand concentrations. Although the results are in principle general, the work highlights the receptor tyrosine kinases (RTK) as model systems. After presenting previous ODE model solutions, the authors present their own "kinetic sorting" model, which is distinguished by ligand-induced phosphorylation-dependent receptor degradation and the property that every phosphorylation state is signaling competent. The authors show that this model recovers the two types of non-monotonicity experimentally reported for RTKs: maximum activity for intermediate ligand affinity and maximum activity for intermediate kinase activity.

      The main contribution of the work is in demonstrating that their model can capture both types of non-monotonicity, whereas previous models could at most capture non-monotonicity of ligand binding.

      Strengths:

      The question of how energy-dissipating, and thus non-equilibrium, molecular systems can achieve steady-state solutions not accessible to equilibrium systems is of fundamental importance in biomolecular information processing and self-organization. Although the authors do not address the energy requirements of their non-equilibrium model, their comparative analysis of different alternative non-equilibrium models provides insight into the design choices necessary to achieve non-monotonic control, a property that is inaccessible at equilibrium.

      The paper is succinctly written and easy to follow, and the authors achieve their aims by providing convincing numerical solutions demonstrating non-monotonicity over the range of parameter values encompassing the biologically relevant regime.

      Weaknesses:

      (1) A key motivating framework for this work is the argument that the ability to tune to recognize intermediate ligand affinities provides a control knob for signal selection that is available to nonequilibrium systems. As such, this seems like a compelling type of ligand selectivity, which is a question of broad interest. However, as the authors note in the results, the previously published "limited signaling model" already achieves such non-monotonicity in ligand binding affinity. The introduction and abstract do not clearly delineate the new contributions of the model.

      We thank the reviewer for this comment. We apologize for any unclear language on our part. The purpose of our work was not to identify the unique reaction scheme to obtain nonmonotonic dependence of network activity on ligand affinity and kinase activity. Rather, we were interested in exploring how such a dependence could arise from the interplay between two ubiquitous network motifs (multisite phosphorylation and active receptor degradation). Notably, as the reviewer later points out, previous models that incorporate only multisite phosphorylation capture only the non-monotonic dependence of network activity on ligand affinity and not on kinase/phosphatase activity. We have now clarified this in the abstract (lines 14-16) and the introduction (lines 55-59).

      The novel benefit of the model introduced by the authors is that it also achieves a nonmonotonic response to kinase activity. Because such non-monotonicity is observed for RTK, this would make the authors' model a better fit for capturing RTK behavior. However, the broad significance of achieving non-monotonicity to kinase activity is not motivated or supported by empirical evidence in the paper. As such, the conceptual significance of the modified model presented by the authors is not clear.

      We thank the reviewer for this comment. We agree that the ability of our model to reproduce non-monotonic dependence on kinase/phosphatase activity was not sufficiently motivated in the original submission. We have now added a brief discussion of the biological motivation for non-monotonic kinase activity dependence (lines 229-247). In particular, non-monotonic kinase/phosphatase dependence may act as a safeguard, filtering out signaling from cells with abnormally elevated kinase activity or suppressed phosphatase activity. If network activity depends non-monotonically on kinase activity, downstream signaling would remain contingent on extracellular cues, and cells with extreme kinase/phosphatase imbalances would fail to signal. This could prevent persistent, cue-independent activation, an especially important protective mechanism in pathways regulating metabolically taxing functions such as growth, proliferation, or mounting immune responses. Although direct experimental evidence for the widespread use of this mechanism is currently scarce, our motivation is supported both by similar regulatory behaviors of phosphatases that arise through distinct mechanisms but highlight the potential biological use of this strategy (such as CD45 in T-cell receptor signaling; Weiss, 2019), and by theoretical work on phosphorylation-dephosphorylation cycles, which demonstrates a similar effect in more general settings (Swain, 2013).

      (2) Whereas previous models used in the literature are schematized in Figure 1, the model proposed by the authors is missing (see line 97 of page 3). Without the schematic, the text description of the model is incomplete.

      We thank the reviewer for identifying this oversight; it has been corrected. See Figure 3 in the new text.

      (3) The authors use the activity of the first phosphorylation site as the default measure of activity. This choice needs to be justified. Why not use the sum of the activities at all sites?

      We thank the reviewer for this comment. We in fact study all sites (Figure 5A in the resubmitted manuscript). Notably, as suggested by the reviewer, the concentration of the first site is indeed represented by the sum of the concentrations of all phosphorylated species. The concentration of the second site is represented by the sum of the concentrations of all phosphorylated species except the first one, and so on (lines 153-155).
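      The bookkeeping described in this reply amounts to a reverse cumulative sum over phosphorylation states. The sketch below assumes strictly sequential phosphorylation (the species with j phosphates carries sites 1 through j), which is our reading of the reply rather than a detail confirmed here; `site_occupancy` is a hypothetical helper.

```python
from itertools import accumulate


def site_occupancy(species: list[float]) -> list[float]:
    """species[j] is the concentration of receptor with j sites phosphorylated
    (j = 0..N), assuming sequential phosphorylation. Returns a list whose entry
    k-1 is the total concentration with site k phosphorylated (k = 1..N)."""
    phosphorylated = species[1:]  # exclude the unphosphorylated species
    # Site k is phosphorylated in species k, k+1, ..., N, so sum from the end.
    return list(accumulate(reversed(phosphorylated)))[::-1]


# Hypothetical concentrations for N = 4 sites.
print(site_occupancy([5, 4, 3, 2, 1]))  # [10, 6, 3, 1]
```

      The first entry equals the sum over all phosphorylated species, matching the definition of first-site activity given above.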

      Reviewer #2 (Public review):

      Summary:

      In classical models of signaling networks, the signaling activity increases monotonically with the ligand affinity. However, certain receptors prefer ligands of intermediate affinity. In the paper, the authors present a new minimal model to derive generic conditions for ligand specificity. In brief, this requires multi-site phosphorylation and that high-affinity complexes be more prone to degrade. This particular type of kinetic discrimination allows for overcoming equilibrium constraints.

      Strengths:

      The model is simple, and it adds only a few parameters to classical generic models. Moreover, the authors vary these additional parameters in ranges based on experimental observations. They explain how the introduction of these new parameters is essential to ligand specificity. Their model quantitatively reproduces the ligand specificity of a certain receptor. Finally, they provide a testable prediction.

      Weaknesses:

      The naming of certain variables may be confusing to readers.

      We apologize for the confusion due to unclear presentation. We have clarified our definitions throughout the manuscript. 

      Reviewer #1 (Recommendations for the authors):

      (1) The abstract and introduction present the problem as if this model is solving the fundamental problem of non-monotonic dependence on ligand affinity. However, as the authors noted in their results, this problem has already been solved by a previous phosphorylation model with N-state degradation. What the authors' new model achieves is the additional experimentally observed non-monotonicity of kinase activity dependence. The abstract and introduction should be changed to reflect the actual novel contributions and also to motivate the biological significance of non-monotonic kinase activity dependence.

      We thank the reviewer for this comment. We apologize for any unclear language on our part. The purpose of our work was not to identify the unique reaction scheme to obtain nonmonotonic dependence of network activity on ligand affinity and kinase activity. Rather, we were interested in exploring how such a dependence could arise from two ubiquitous network motifs (multisite phosphorylation and active receptor degradation). Notably, as the reviewer later points out, previous models that incorporate only multisite phosphorylation capture only the nonmonotonic dependence of network activity on ligand affinity and not on kinase/phosphatase activity. We have now clarified this in the abstract (lines 14-16) and the introduction (lines 55-59). We have also provided the biological motivation behind nonmonotonic kinase activity dependence (lines 229-247).

      (2) It is important to show (in the supplemental materials if needed) that the closest equilibrium analog to the model (for example, reversible rate constants from each of the activated states to an inactive state) does not achieve non-monotonicity with ligand affinity.

      We have added a model in the supplementary materials that represents a detailed-balance Markov chain. In the model, we imagine that ligand-bound receptors undergo a series of equilibrium transitions, all characterized by the same activation and inactivation rates. We show that at saturating ligand levels, the signaling output depends only on the ratio of the activation to the inactivation rate (i.e., the thermodynamic stability of the active site) (lines 466-488).
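      The ratio-only dependence can be illustrated with a toy version of such a chain. This is a minimal sketch under our own assumptions, not the supplementary model itself: a single ligand-bound receptor (saturating ligand, so unbinding is ignored) hops along states 0 ↔ 1 ↔ … ↔ N−1 with one forward (activation) rate `a` and one backward (inactivation) rate `b`. Solving for the stationary distribution numerically shows that scaling `a` and `b` together leaves it unchanged, and that it matches the detailed-balance weights (a/b)^k.

```python
import numpy as np


def stationary_distribution(a: float, b: float, n_states: int) -> np.ndarray:
    """Stationary distribution of a linear chain 0 <-> 1 <-> ... <-> n_states-1
    with forward (activation) rate a and backward (inactivation) rate b."""
    Q = np.zeros((n_states, n_states))  # generator matrix, rows sum to zero
    for k in range(n_states - 1):
        Q[k, k + 1] = a  # activation step k -> k+1
        Q[k + 1, k] = b  # inactivation step k+1 -> k
    np.fill_diagonal(Q, -Q.sum(axis=1))
    # Solve pi Q = 0 subject to sum(pi) = 1: replace one balance equation
    # with the normalization condition.
    A = Q.T.copy()
    A[-1, :] = 1.0
    rhs = np.zeros(n_states)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)


slow = stationary_distribution(2.0, 1.0, 4)
fast = stationary_distribution(20.0, 10.0, 4)  # same ratio a/b = 2, rates 10x faster
print(np.allclose(slow, fast))                  # True: only a/b matters
print(np.allclose(slow, np.array([1, 2, 4, 8]) / 15))  # True: weights (a/b)^k
```

      Any output read from this distribution (for example, the occupancy of the last state) therefore depends on the rates only through a/b.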

      (3) Schematics for earlier models are described in Figure 1. However, no schematic for the actual model proposed by the authors is shown. This should be added as a subpanel to Figure 1.

      We thank the reviewer for identifying the omission of our model schematic. We have included the schematic as its own figure (Figure 3).

      (4) Minor: Figure 1 is referred to as "Figure ??" in line 97 of page 3.

      We thank the reviewer for identifying this error; it has been corrected.

      Reviewer #2 (Recommendations for the authors):

      (1) There is an inconsistency between Figure 2(a) and Equation (1): the figure suggests that $p_N = \omega^N/(\omega+\delta)^N$, which makes more sense given the model defined in the supplementary material.

      We thank the reviewer for identifying this error. Equation (1) has been updated to reflect the correct relationship.
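      One way to read $p_N = \omega^N/(\omega+\delta)^N = (\omega/(\omega+\delta))^N$ is as the probability of completing N sequential phosphorylation steps when, at every step, phosphorylation (rate ω) races a competing first-order process (rate δ) that aborts the attempt. That interpretation of δ is our assumption for illustration; the authors' supplementary model defines it precisely. The sketch compares the closed form with a Monte Carlo race of exponential clocks.

```python
import random


def p_reach_n_analytic(omega: float, delta: float, n: int) -> float:
    """Closed form: each of n steps is won by phosphorylation with probability omega/(omega+delta)."""
    return (omega / (omega + delta)) ** n


def p_reach_n_simulated(omega: float, delta: float, n: int,
                        trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate under the same assumed competing-rates picture."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        for _ in range(n):
            # Competing exponential clocks: the shorter waiting time fires first.
            if rng.expovariate(omega) > rng.expovariate(delta):
                break  # competing process (rate delta) won; attempt aborted
        else:
            hits += 1  # all n phosphorylation steps completed
    return hits / trials


print(p_reach_n_analytic(1.0, 1.0, 3))   # 0.125
print(p_reach_n_simulated(1.0, 1.0, 3))  # close to 0.125
```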

      (2) The figure presenting the model of the authors appears to be missing.

      We thank the reviewer for identifying this error; it has been corrected (Figure 3 in the new manuscript).

      (3) The authors describe phosphorylation as irreversible in the intro, but then consider reversible phosphorylation in their model, which may be confusing to readers.

      We thank the reviewer for identifying this source of possible confusion. We have clarified that dephosphorylation is taken to be a distinct irreversible reaction, see lines 105 - 112.

      (4) The authors reuse similar names, e.g., network activity, kinase activity, signaling activity, activity. This is confusing.

      We apologize for the confusion. We note that, within the context of our model, there are important distinctions between signaling activity (the amount of signaling competent receptors) and kinase activity (value corresponding to the phosphorylation rate). We have attempted to use these different terms correctly and are happy to make clarifying corrections if there are any places where a term is misused.  

      (5) Several parameters, such as $\beta$ and $\rho$, are defined only in the figure captions.

      We thank the reviewer for identifying this omission; we have added the definitions of beta and rho to the main text (see line 129).

      (6) The sentence at line 137 lacks some words: "Below, we kinetic...".

      We thank the reviewer for identifying this error; we have added the missing words (“Below, we show how kinetic…”).

      (7) The sentence at line 183 lacks some words: "When kinase activity...".

      We thank the reviewer for identifying this error. We have now corrected it. 

      (8) Figure 5 is very small.

      We will work with the production team to increase the size of this figure.

    1. eLife Assessment

      This important study characterizes and validates a new activity marker, fast labelling of engram neurons (FLEN), which is transiently active and driven by cFos, allowing the monitoring of intrinsic and synaptic properties of engram neurons shortly after the learning experience. The results convincingly demonstrate the utility of this novel viral tool for studying early changes in the properties of engram cells. FLEN will provide a beneficial tool for the neuroscience community once it is made available at a plasmid repository.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Cupollilo et al. describes the development, characterization and application of a novel activity-labeling system, fast labelling of engram neurons (FLEN). Several such systems already exist, but this study adds capability by leveraging an activity marker that is destabilized (and thus transiently active) and driven by the full-length cFos promoter. The authors demonstrate the activity-dependent induction and time course of expression, first in cultured neurons and then in vivo in hippocampal CA3 neurons after one-trial contextual fear conditioning (CFC). In a series of ex vivo experiments, the authors perform patch-clamp analysis of labeled neurons to determine whether these putative engram neurons differ from non-labeled neurons, using both the FLEN system and the previously characterized RAM system. Interestingly, the early-labeled neurons at 3 h post-CFC (FLEN+) showed no differences in excitability, whereas the RAM-labeled neurons at 24 h after CFC had increased excitability. Examination of synaptic properties demonstrated an increase in sEPSC and mEPSC frequencies, as well as in sIPSC and mIPSC frequencies, which was not due to a change in the mossy fiber input to these neurons.

      Strengths:

      Overall, the data are of high quality and the study introduces a new tool while also reassessing some principles of circuit plasticity in the CA3 that have been the focus of prior studies.

      Weaknesses:

      No major weaknesses were noted.

    3. Reviewer #2 (Public review):

      Summary:

      Cupollilo et al. investigate the properties of hippocampal CA3 neurons that express the immediate early gene cFos in response to a single foot shock. They compare, ex vivo, the electrophysiological properties of these "engram neurons" labeled with two different cFos promoter-driven green markers: their new virally delivered tool FLEN labels neurons 2-6 h after activity, while RAM contains additional enhancers and peaks considerably later (>24 h). Since the fraction of labeled CA3 cells is comparable with both constructs, it is assumed (but not tested) that they label the same population of activated neurons at different time points. Both FLEN+ and RAM+ neurons in CA3 receive more synaptic inputs compared to non-expressing control neurons, which could be a causal factor for cFos activation, or a very early consequence thereof. Frequency facilitation and E/I ratio of mossy fiber inputs were also tested, but did not differ in either cFos+ group of neurons. One day after foot shock, RAM+ neurons are more excitable than RAM- neurons, suggesting a slow increase in excitability as a major consequence of cFos activation.

      Strengths:

      The study is conducted to high standards and contributes significantly to our understanding of memory formation and consolidation in the hippocampus. Modifications of intrinsic neuronal properties seem to be more salient than overall changes in the total number of (excitatory and inhibitory) inputs, although a switch in the source of the synaptic inputs would not have been detected by the methods employed in this study.

      Weaknesses:

      The new tool FLEN is not quantitatively compared to e.g. the TetTag reporter mouse. Nevertheless, the fluorescent images of FLEN+ neurons are quite convincing.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review):  

      Summary:

      The manuscript by Cupollilo et al. describes the development, characterization, and application of a novel activity labeling system: fast labelling of engram neurons (FLEN). Several such systems already exist, but this study adds additional capability by leveraging an activity marker that is destabilized (and thus transiently active) as well as being driven by the full-length promoter of cFos. The authors demonstrate the activity-dependent induction and time course of expression, first in cultured neurons and then in vivo in hippocampal CA3 neurons after one trial of contextual fear conditioning. In a series of ex vivo experiments, the authors perform patch-clamp analysis of labeled neurons to determine whether these putative engram neurons differ from non-labelled neurons, using both the FLEN system and the previously characterized RAM system. Interestingly, the early-labelled neurons at 3 h post-CFC (FLEN+) demonstrated no differences in excitability, whereas the RAM-labelled neurons at 24 h after CFC had increased excitability. Examination of synaptic properties demonstrated an increase in sEPSC and mEPSC frequencies, as well as in sIPSC and mIPSC frequencies, which was not due to a change in the mossy fiber input to these neurons.

      Strengths:

      Overall, the data are of high quality and the study introduces a new tool while also reassessing some principles of circuit plasticity in the CA3 that have been the focus of prior studies.

      Weaknesses:

      No major weaknesses were noted.

      Reviewer #2 (Public review): 

      Summary: 

      Cupollilo et al. investigate the properties of hippocampal CA3 neurons that express the immediate early gene cFos in response to a single foot shock. They compare, ex vivo, the electrophysiological properties of these "engram neurons" labeled with two different cFos promoter-driven green markers: their new tool FLEN labels neurons 2-6 h after activity, while RAM contains additional enhancers and peaks considerably later (>24 h). Since the fraction of labeled CA3 cells is comparable with both constructs, it is assumed (but not tested) that they label the same population of activated neurons at different time points. Both FLEN+ and RAM+ neurons in CA3 receive more synaptic inputs compared to non-expressing control neurons, which could be a causal factor for cFos activation, or a very early consequence thereof. Frequency facilitation and E/I ratio of mossy fiber inputs were also tested, but did not differ in either cFos+ group of neurons. One day after foot shock, RAM+ neurons are more excitable than RAM- neurons, suggesting a slow increase in excitability as a major consequence of cFos activation.

      Strengths: 

      The study is conducted to high standards and contributes significantly to our understanding of memory formation and consolidation in the hippocampus. Modifications of intrinsic neuronal properties seem to be more salient than overall changes in the total number of (excitatory and inhibitory) inputs, although a switch in the source of the synaptic inputs would not have been detected by the methods employed in this study.

      Weaknesses: 

      With regard to the new viral tool, a direct comparison between the new tool FLEN and existing cFos reporters is missing. 

      Reviewer #1 (Recommendations for the authors):

      I have only minor suggestions for the authors to consider. 

      (1) In the in vitro characterization, the percentage of labelled neurons seems very low after a powerful and prolonged activation. It was somewhat surprising and raised the question of how accurately the FLEN construct reflects endogenous cFOS activity. Could the authors speak to this?

      The reviewer is correct that the proportion of FLEN-positive neurons among mCherry-positive neurons is low compared with studies using viral infection with RAM vectors in neuronal cultures (Sorensen et al, 2016, Sun et al, 2020), where it is around 70-80% following chemical stimulation. However, those studies do not provide a comparison with endogenous c-Fos activity in cell cultures. The reason for this discrepancy in the effect of chemical stimulation of cultured neurons is not clear, but it may depend on culture conditions, which can vary between labs.

      FLEN was constructed using a mouse c-Fos promoter (-355 to +109) (Cen et al, 2003). To answer the reviewer’s question, we performed an additional experiment in cultured neurons, in which we found (using immunocytochemistry) that 77.1% of FLEN-positive neurons were also c-Fos-positive.

      (2) The authors compare the two labelling strategies and interpret their data with the presumption that both label a similar set of active neurons. This is particularly relevant when they suggest there might be a progressive increase in the excitability of active neurons with time. This is certainly a possibility, but the authors should also consider other possibilities that the two markers might label different populations of neurons. For example, if they require different thresholds for activation, it is possible that one is more sensitive to activity than the other. As these are unknown variables the authors should temper the interpretation accordingly.

      Indeed, the reviewer is correct that this limitation should be discussed. We have added this as a point of discussion in the text (line 355-358). In the article describing the RAM strategy (Sorensen et al, 2016), the authors use RAM to label DG neurons activated during an experience in a context A (Figure 4). Exploiting the fact that engram cells are re-activated when the animal is re-exposed to the training environment (memory recall), they performed c-Fos staining 90 minutes following either context A or context B re-exposure. The RAM/c-Fos overlap percentage was higher in A-A than in A-B (A-A was a bit more than 20%). This means that RAM captured a group of cells during training that, at least in part, were re-activated during recall. This could in part support the assumption that RAM and c-Fos labelling overlap to a certain extent. Of course, this was done in DG, while we worked in CA3. In addition, the great majority of neurons labelled by either strategy are c-Fos+ (see above answer to point #1). This cannot completely rule out the possibility that FLEN and RAM label partly distinct populations of activated cells.

      (3) An increase in the frequency of synaptic events is observed in neurons labelled with both markers. The authors propose that this may be due to an increase in synaptic contacts based on prior studies. However, as this is the first functional assessment why not consider changes in release probability as a mechanism for this finding? 

      We have added this as a possibility in the text (line 362-363).

      (4) It would be useful to include plots of the average frequency of m/sEPSCs and m/sIPSCs in Figures 4 and 5. These figures could also be combined into a single figure.

      We agree with the reviewer that Figures 4 and 5 could be merged into a single figure. In the revised version, Figure 5A becomes panel C in Figure 4. The text and figure legends were adjusted accordingly.

      Reviewer #2 (Recommendations for the authors): 

      (1) Abstract, line 24: "In contrast, FLEN+ CA3 neurons show an increased number of excitatory inputs." RAM+ neurons also show an increased number of excitatory inputs, so this is not "in contrast". Also, not just excitatory, but also inhibitory synaptic inputs are more numerous in cFos+ neurons. Please improve the summary of your findings.

      “In contrast” referred to the fact that FLEN+ neurons do not show differences in excitability as compared to FLEN- neurons, as mentioned in the previous sentence. We now provide a more explicit sentence to explain this point: “On the other hand, like RAM+ neurons, FLEN+ CA3 neurons show an increased number of excitatory inputs.”

      (2) Novel tool: Destabilized cFos reporters were introduced 23 years ago and are also part of the TetTag mouse. I am not sure that changing the green fluorescent protein to a different version merits a new acronym (FLEN). To convince the readers that this is more than a branding exercise, the authors should compare the properties (brightness, folding time, stability) of FLEN to e.g. the d2EGFP reporter introduced by Bi et al. 2002 (J Biotechnol. 93(3):231) and show significant improvements.

      We thank the reviewer for this comment, which compelled us to evaluate the features of other tools used to label neurons activated following contextual fear conditioning. The key properties of FLEN compared with other tools used to label engrams are that: (i) it is a viral tool, as opposed to transgenic mice, (ii) a c-fos promoter drives the expression of a brightly fluorescent protein, allowing the identification of labelled neurons ex vivo for functional analysis, (iii) the fluorescent protein is rapidly destabilized, making it possible to label neurons only a few hours after their activation by a behavioural task.

      We did not find any viral tools that allow c-fos-activated neurons to be labelled for functional assessment. We have not been able to find references for the use of the d2EGFP reporter introduced by Bi et al. 2002 in a behavioural context. One of the major differences and improvements is certainly the brightness of ZsGreen: in cell cultures, ZsGreen1 showed an 8.6-fold increase in fluorescence intensity compared with EGFP (Bell et al, 2007).

      Amongst tools with comparable properties, eSARE was developed based on a synthetic Arc promoter driving the expression of a destabilized GFP (dEGFP) (Kawashima et al 2013). We initially used ESARE-dGFP but, unfortunately, in our experimental conditions we found that the signal-to-noise ratio was not satisfactory (number of cells labelled in the home cage vs. following contextual fear conditioning).

      We developed a viral tool to avoid the use of transgenic reporter lines, which require laborious breeding and are experimentally less flexible. Nevertheless, many transgenic mice based on the expression of fluorescent proteins under the control of IEG promoters have been developed and used. Some of these mice show a time course of transgene expression comparable to FLEN. For instance, in organotypic slices from Tet-Tag mice, EGFP expression follows endogenous cFOS expression with a small delay and starts decaying after 4 hours (Lamothe-Molina et al, 2022). However, the fluorescence was too weak to visualize neurons in the slice (Christine Gee, personal communication), and imaging is performed after immunocytochemistry against GFP.

      Therefore, we feel that the name given to the FLEN strategy is legitimate. The features of the FLEN strategy were summarized in the discussion (Lines 318-322).

      (3) Line 214: "...FLEN+ CA3 PNs do not show differences in [...] patterns of bursting activity as compared to control neurons." It looks quite different to me (Figure 3E). Just because low n precludes meaningful statistical analysis, I would not conclude there is no difference.

      We agree with the reviewer that the data in Figure 3E are not conclusive due to the small sample size, which limits the reliability of statistical comparison. Additionally, the classification of bursting neurons is highly dependent on the specific criteria used, which vary considerably across the literature. To avoid overinterpretation or misleading conclusions, we decided to remove panel E of Figure 3, which showed the fraction of bursting neurons. Nevertheless, we draw attention to the more robust and interpretable results: RAM⁺ neurons exhibit an increase in firing frequency and a distinct action potential discharge pattern, data which we believe are informative of altered excitability.

      (4) Line 304: Remove the time stamp.

      This was done.

      (5) Line 334: "...results may be explained by an overall increased activity of CA1 neurons..." I don't understand - isn't CA1 downstream of CA3? 

      The reviewer is correct that the sentence was misleading. We removed the reference to CA1, as it was more of a general principle about neuronal activity.

      (6) Line 381: "resolutive", better use "sensitive". 

      This was changed.

      (7) Figure S3: Fear-conditioned animals were 3 days off Dox, controls only 2 days. As RAM expression accumulates over time off Dox, this is not a fair comparison.

      We thank the reviewer for pointing out the incorrect reporting of the experimental design in Figure S3 panel A (bottom), which could lead to misinterpretation of results. In fact, the two groups of mice (CFC vs. HC) underwent all experimental steps in parallel. Specifically, both groups were maintained on and off Doxycycline for the same duration and received viral injection on the same day. 48 hours after Dox withdrawal, the CFC group was trained for contextual conditioning, while the HC group remained in the home cage in the holding room. All animals were thus sacrificed 72 hours after Dox removal. We have corrected the figure to accurately reflect this timeline.

      (8) Please provide sequence information for c-cFos-ZsGreen1-DR. Which regulatory elements of the cFos promoter are included, is the 5' NTR included? This information is very important.

      The information is now provided in the Methods section.

      (9) Please provide the temperature during pharmacological treatments (TTX etc.) before fixation.

      The pharmacological treatment was performed in the incubator at 37°C, this is now indicated in the methods.

    1. eLife Assessment

      This work derives a valuable general theory unifying theories of efficient information transmission in the brain with population homeostasis. The general theory provides an explanation for firing rate homeostasis at the level of neural clusters with firing rate heterogeneity within clusters. Applying this theory to the primary visual cortex, the authors present solid evidence that accounts for stimulus-specific and neuron-specific adaptation. Reviewers have provided additional suggestions for improving the readability of the manuscript and for discussing previous results on adaptive coding, as well as those aspects of experimental data that are not fully explained by the present theory.

    2. Reviewer #1 (Public review):

      This work derives a general theory of optimal gain modulation in neural populations. It demonstrates that population homeostasis is a consequence of optimal modulation for information maximization with noisy neurons. The developed theory is then applied to the distributed distributional code (DDC) model of the primary visual cortex to demonstrate that homeostatic DDCs can account for stimulus-specific adaptation.

      Strengths:

      The theory of gain modulation proposed in the paper is rigorous and the analysis is thorough. It addresses the issue in an interesting, general setting. The proposed approach separates the question of which bits of sensory information are transmitted (as defined by a specific computation and tuning curve shapes) from how well they are transmitted (as defined by the tuning curve gain optimized to combat noise). This separation permits the application of the developed theory to different neural systems.

      Weaknesses:

      The manuscript effectively consists of two parts: a general theory of optimal gain modulation and a DDC model of the visual cortex. From my perspective, it is not entirely clear which components of the developed theory, and of the model it is applied to, are essential to explain the experimental phenomena in the visual cortex (Fig. 12). This "separation" into two parts makes this work, in my view, somewhat diffuse.

      Overall, I think this is an interesting contribution and I assess it positively. It has the potential of deepening our understanding of efficient neural representations beyond sensory periphery.

    3. Reviewer #2 (Public review):

      Summary:

      Using the theory of efficient coding, the authors study how neural gains may be adjusted to optimize information transmission by noisy neural populations while minimizing metabolic cost, under the assumption that other aspects of neural activity (i.e. tuning) are determined by the computation performed by the network.

      The manuscript first presents mathematical results for the general case where the computational goals of the neural population are not specified (the computation is implicit in the assumed tuning curves). It then develops the theory for a specific probabilistic coding scheme. The general theory provides an explanation for firing rate homeostasis at the level of neural clusters with firing rate heterogeneity within clusters. The specific application further explains stimulus-specific adaptation in visual cortex.

      The mathematical derivations, simulations and application to visual cortex data are solid as far as I can tell.

      This remains a highly technical manuscript although the authors have improved the clarity of presentation of the general theory (which is the bulk of the work presented) and better motivated/explained modeling assumptions and choices. In the second part, the manuscript focuses on a specific code (homeostatic DDC) showing that this can be implemented by divisive normalization and can explain stimulus-specific adaptation.

      Strengths:

      The problem of efficient coding is a long-standing and important one. This manuscript contributes to that field by proposing a theory of efficient coding through gain adjustments, independent of the computational goals of the system. The main assumption, and insight, is that computational goals and efficiency can be in some sense factorized: tuning curve shapes are determined by the computational goal, whereas gains can be adjusted to optimize transmission of information.

      One key result is a normative explanation for firing rate homeostasis at the level of neural clusters (groups of neurons that perform a similar computation) with firing rate heterogeneity within each cluster. Both phenomena are widely observed, and reconciling them under one theory is important.

      The mathematical derivations are thorough. Although the model of neural activity is artificial, the authors make sure to include many aspects of cortical physiology, while also keeping the models quite general.

      Section 2.5 derives the conditions in which homeostasis would be near-optimal in cortex, which appear to be consistent with many empirical observations in V1. This indicates that homeostasis in V1 might be indeed a close to optimal solution to code efficiently in the face of noise.

      The application to the data of Benucci et al 2013 is the first to offer a normative explanation of stimulus-specific adaptation in V1.

      The novelty and significance of the work are presented clearly in the newly extended Introduction and Discussion.

      Weaknesses:

      The manuscript remains hard to read. The general theory occupies most of the manuscript, as needed to convey it fully. But as a result the second part on homeostatic DDC and adaptation is somewhat underdeveloped and risks having less visibility than it might deserve.

      The paper Benucci et al 2013 shows that homeostasis holds for some stimulus distributions, but not others i.e. when the 'adapter' is present too often. This manuscript, like the Benucci paper, discards those datasets. But from a theoretical standpoint, it seems important to consider why that would be the case, and if it can be predicted by the theory proposed here. The authors now acknowledge this limitation in the Discussion.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major comments:

      (1) Interpretation of key results and relationship between different parts of the manuscript. The manuscript begins with an information-transmission ansatz which is described as “independent of the computational goal” (e.g. p. 17). While information theory indeed is not concerned with what quantity is being encoded (e.g. whether it is sensory periphery or hippocampus), the goal of the studied system is to *transmit* the largest amount of bits about the input in the presence of noise. In my view, this does not make the proposed framework “independent of the computational goal”. Furthermore, the derived theory is then applied to a DDC model which proposes a very specific solution to inference problems. The relationship between information transmission and inference is deep and nuanced. Because the writing is very dense, it is quite hard to understand how the information transmission framework developed in the first part applies to the inference problem. How does the neural coding diagram in Figure 3 map onto the inference diagram in Figure 10? How does the problem of information transmission under constraints from the first part of the manuscript become an inference problem with DDCs? I am certain that the authors have good answers to these questions, but they should be explained much better.

      We are very thankful to the reviewer for highlighting the potential confusion surrounding these issues, in particular the relationship between the two halves of the paper – which was previously exacerbated by the length of the paper. We have now added further explanations at different points within the manuscript to better disentangle these issues and clarify our key assumptions. We have also significantly cut the length of the paper by moving more technical discussions to the Methods or Appendices. We will summarise these changes here and also clarify the rationale for our approach and point out potential disagreements with the reviewer.

      Key to our approach is that we indeed do not assume the entire goal of the studied neural system (whether part of the sensory system or not) is to transmit the largest amount of information about the stimulus input (in the presence of noise). In fact, general computations, including the inference of latent causes of inputs, often require filtering out or ignoring some information in the sensory input. It is thus not plausible that tuning curves in general (i.e. in an arbitrary part of the nervous system) are optimised solely with regard to the criterion of information transmission. Accordingly, we do not assume they are entirely optimised for that purpose. However, we do make a key assumption or hypothesis (which, like any hypothesis, might turn out to be partly or entirely wrong): that (1) a minimal feature of the tuning curve (its scale or gain) is entirely free to be optimised for the aim of information transmission (or more precisely the goal of combating the detrimental effect of neural noise on coding fidelity), and (2) other aspects of the population tuning curve structure (i.e. the shape of individual tuning curves and their arrangement across the population) are determined by (other) computational goals beyond efficient coding. (Conceptually, this is akin to the modularization between indispensable error correction and general computations in a digital computer, and the need for the former to be performed in a manner that is agnostic as to the computations performed.) We have added two paragraphs to the manuscript which present the above rationale and our key hypothesis or assumption. The first of these was added to the (second paragraph of the) Introduction section, and the second is a new paragraph following Eq. 1 of Results (which is about the gain-shape decomposition of the tuning curves, and the optimisation of the former based on efficient coding).

      Our paper can be divided into two parts. In the first part, we develop a general, computationally agnostic (in the above sense, just as in the digital computer example), efficient coding theory. In the second part, we apply that theory to a specific form of computation, namely the DDC framework for Bayesian inference. The latter theory now determines the tuning curve shapes. When combined with the results of the first part (which dictate the tuning curve scale or gain according to efficient coding theory), this “homeostatic DDC” model makes full predictions for the tuning curves (i.e., both scale and shape) and how they should adapt to stimulus statistics.
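      As a minimal illustration of the gain-shape split described above, the Python sketch below holds hypothetical tuning-curve shapes f_n(s) fixed and adapts only the gains g_n, choosing them so that each neuron's mean rate under the stimulus distribution p(s) equals a common target, which is the simplest (per-neuron) form of firing rate homeostasis. It is a sketch under stated assumptions, not the authors' method: it imposes homeostasis directly rather than deriving gains from the information-maximization objective, it ignores the cluster-level structure of the theory, and the function name `homeostatic_gains`, the von Mises-like orientation tuning and the adapter-biased stimulus distribution are all hypothetical choices made for illustration.

```python
import numpy as np


def homeostatic_gains(shapes, stim_probs, target_rate):
    """Hypothetical helper: gains g_n such that E_p[g_n * f_n(s)] == target_rate.

    shapes: array (n_neurons, n_stimuli) of nonnegative tuning-curve shapes f_n(s)
    stim_probs: array (n_stimuli,) giving p(s), summing to 1
    """
    mean_shape = shapes @ stim_probs  # E_p[f_n(s)] for each neuron
    return target_rate / mean_shape


# Orientation grid (period pi) and six preferred orientations (assumed values).
thetas = np.linspace(0.0, np.pi, 180, endpoint=False)
prefs = np.linspace(0.0, np.pi, 6, endpoint=False)

# Shapes are fixed by "the computation"; here, illustrative von Mises-like bumps.
shapes = np.exp(2.0 * (np.cos(2.0 * (thetas[None, :] - prefs[:, None])) - 1.0))

# Uniform ensemble vs. an ensemble where an adapter near 0 rad is over-represented.
uniform = np.full(thetas.size, 1.0 / thetas.size)
biased = uniform + 0.5 * np.exp(5.0 * (np.cos(2.0 * thetas) - 1.0))
biased /= biased.sum()

gains_uniform = homeostatic_gains(shapes, uniform, target_rate=5.0)
gains_biased = homeostatic_gains(shapes, biased, target_rate=5.0)
print(np.round(gains_uniform, 2))
print(np.round(gains_biased, 2))
```

      Under the adapter-biased ensemble, the neuron preferring the adapter orientation (index 0) receives a smaller gain and the orthogonally tuned neuron (index 3) a larger one, so mean rates stay equal while responses near the adapter are suppressed; this is only a qualitative analogue of the stimulus-specific adaptation discussed in the reviews.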

      So to summarise, it is not the case that the problem of information transmission (or rather mitigating the effect of noise on coding fidelity under metabolic constraints), dealt with in the first part, has become a problem of Bayesian inference. Rather, the dictates of efficient coding for optimal gains for coding fidelity (under constraints) have been applied to and combined with a computational theory of inference.

      We have added new expository text before and after Eq. 17 in Sec. 2.7 (at the beginning of the second part of the paper on homeostatic DDCs) to again make the connection with the first part and the rationale for its combination with the original DDC framework more clear.

      With the changes outlined above, we believe and hope the connection between the two parts (which we agree with the reviewer, was indeed rather obscure previously) has been adequately clarified.

      (2) Clarity of writing for an interdisciplinary audience. I do not believe that in its current form, the manuscript is accessible to a broader, interdisciplinary audience such as eLife readers. The writing is very dense and technical, which I believe unnecessarily obscures the key results of this study.

      We thank the reviewer for this comment. We have taken several steps to improve the accessibility of this work for an interdisciplinary audience. Firstly, several sections containing dense, mathematical writing have now been moved into appendices or the Methods section, out of the main text; in their place we have made efforts to convey the core of the results and to provide intuitions, without going into unnecessary technical detail. Secondly, we have added additional figures to help illustrate key concepts or assumptions (see Fig. 1B clarifying the conceptual approach to efficient coding and homeostatic adaptation, and Fig. 8A describing the clustered population). Lastly, we have made sure to refer back to the names of symbols more often, so as to make the analysis easier to follow for a reader with an experimental background.

      (3) Positioning within the context of the field and relationship to prior work. While the proposed theory is interesting and timely, the manuscript omits multiple closely related results which in my view should be discussed in relationship to the current work. In particular, a number of recent studies propose normative criteria for gain modulation in populations:

      • Duong, L., Simoncelli, E., Chklovskii, D. and Lipshutz, D., 2024. Adaptive whitening with fast gain modulation and slow synaptic plasticity. Advances in Neural Information Processing Systems.

      • Tring, E., Dipoppa, M. and Ringach, D.L., 2023. A power law describes the magnitude of adaptation in neural populations of primary visual cortex. Nature Communications, 14(1), p.8366.

      • Młynarski, W. and Tkačik, G., 2022. Efficient coding theory of dynamic attentional modulation. PLoS Biology.

      • Haimerl, C., Ruff, D.A., Cohen, M.R., Savin, C. and Simoncelli, E.P., 2023. Targeted V1 co-modulation supports task-adaptive sensory decisions. Nature Communications.

      The Ganguli and Simoncelli framework has been extended to a multivariate case and analyzed for a generalized class of error measures:

      • Yerxa, T.E., Kee, E., DeWeese, M.R. and Cooper, E.A., 2020. Efficient sensory coding of multidimensional stimuli. PLoS Computational Biology.

      • Wang, Z., Stocker, A.A. and Lee, D.D., 2016. Efficient neural codes that minimize Lp reconstruction error. Neural Computation, 28(12).

      We thank the reviewer again for bringing these works to our attention. For each, we explain whether we chose to include them in our Discussion section, and why.

      (1) Duong et al. (2024): We decided not to discuss this manuscript, as our assessment is that it is not very relevant to our work. That study starts with the assumption that the goal of the sensory system under study is to whiten the signal covariance matrix, which is not the assumption we start with. A mechanistic ingredient (but not the only one) in their approach is gain modulation. However, in their case it is the gains of computationally auxiliary inhibitory neurons that are modulated, and not (as in our case) the gains of the (excitatory) coding neurons (i.e. those which encode information about the stimulus and whose response covariance is whitened). These key distinctions make the connection with our work quite loose, and we therefore did not discuss this work.

      (2) Tring et al. (2023): We have added a discussion of the results of this paper and its relationship to the results of our work and that of Benucci et al. This appears in the 7th paragraph of the Discussion. This study is indeed highly relevant to our paper, as it essentially replicates the Benucci et al. experiment, this time in awake mice (rather than anesthetised cats). However, in contrast to the results of Benucci et al., Tring et al. do not find firing rate homeostasis in mouse V1. A second, remarkable finding of Tring et al. is that adaptation mainly changes the scale of the population response vector, and only minimally affects its direction. While Tring et al. do not portray it as such, this behaviour amounts to pure stimulus-specific adaptation without the neuron-specific factor found in the Benucci et al. results (see Eq. 24 of our manuscript). As we discuss in our manuscript, when our homeostatic DDC model is based on an ideal-observer generative model, it also displays pure stimulus-specific adaptation with no neuronal factor. Our final model for Benucci’s data did contain a neural factor, because we used a non-ideal-observer DDC (in particular, we assumed a smoother prior distribution over orientations than the distribution used in the experiment, which has a very sharp peak, as a smoother prior is more natural given the inductive biases we expect in the brain). The resultant neural factor suppresses the tuning curves tuned to the adaptor stimulus. Interestingly, when gain adaptation is incomplete, and happens to a weaker degree than is necessary for firing rate homeostasis, an additional neural factor emerges that is greater than one for neurons tuned to the adaptor stimulus. These two multiplicative neural factors can approximately cancel each other; such a theory would thus predict both deviation from homeostasis and approximately pure stimulus-specific adaptation. We plan to explore this possibility in future work.

      (3) Młynarski and Tkačik (2022): We now cite and discuss this work in the Discussion (penultimate paragraph), in the context of a possible future direction, namely extending our framework to cover the dynamics of adaptation (via dynamic efficient gain modulation and dynamic inference). We note there that Młynarski and Tkačik used such a framework (which, while similar, has key technical differences from our approach), based on a task-dependent efficient coding objective, to model top-down attentional modulation. By contrast, we have studied bottom-up and task-independent adaptation, and it would be interesting to extend our framework to make predictions for the temporal dynamics of such adaptation.

      (4) Haimerl et al. (2023): We have elected not to include this work in our discussion either, as we do not believe it is sufficiently relevant to our work to warrant inclusion. Although this paper also considers gain modulation of neural activity, its setting and aims, and the empirical phenomena it is applied to, differ from ours in several ways. Most importantly, this paper does not offer a normative account of gain modulation; rather, gain modulation is used as a mechanism for enabling fast adaptive readouts of task-relevant information.

      (5) Yerxa et al. (2020): We have now included a discussion of this paper in our Discussion section. Note that, even though this study generalises the Ganguli and Simoncelli framework to higher dimensions, just like that paper it still places strict requirements (which are arguably even more stringent in higher dimensions) on the form of the tuning curves in the population, viz. that there exists a differentiable transform of the stimulus space which renders these unimodal curves completely homogeneous (i.e., of the same shape, and placed regularly and with uniform density).

      (6) Wang et al. (2016): We have included this paper in our discussion as well. As above, this paper does not consider general tuning curves, and places the same constraints on their shape and arrangement as the Ganguli and Simoncelli paper.

      More detailed comments and feedback:

      (1) I believe that this work offers the possibility to address an important question about novelty responses in the cortex (e.g. Homann et al., 2021, PNAS). Are they encoding novelty per se, or are they inefficient responses of a not-yet-adapted population? Perhaps it's worth speculating about.

      We are not sure why the relatively large responses to "novel" or odd-ball stimuli should be considered inefficient or unadapted: in a context in which those stimuli are infrequent odd-balls (and thus novel or surprising when they occur), efficient coding theory would indeed typically predict a large response compared to the (relatively suppressed) responses to frequently occurring stimuli. Of course, if the statistics change and the odd-ball stimulus becomes frequent, adaptation should occur and would be expected to suppress responses to this stimulus. As to whether (large) responses to infrequent stimuli can or should be characterised as novelty responses: this is partly an interpretational or semantic issue, unless it is grounded in knowledge of how downstream populations use this type of coding in V1, which could then provide a basis for solidly linking these responses to the detection of novelty per se. In short, our theory could be applied to Homann et al.'s data, but we consider that beyond the scope of the current paper.

      (2) Clustering in populations - typically in efficient coding studies, tuning curve distributions are a consequence of input statistics, constraints, and optimality criteria. Here the authors introduce randomly perturbed curves for each cluster - how to interpret that in light of the efficient coding theory? This links to a more general aspect of this work - it does not specify how to find optimal tuning curves, just how to modulate them (already addressed in the discussion).

      We begin by addressing the reviewer's more general concern, that our theory does not address the problem of finding optimal tuning curves, only that of modulating them optimally. As we expound in the updated version of the paper (see the newly expanded 3rd paragraph in Sec. 2.1 and the expanded 2nd paragraph of the Introduction), it is not plausible that the sole function of sensory systems, and of neural circuits more generally, is the transmission of information. There are many other computational tasks which the system must perform, such as inferring the latent causes of sensory inputs. For many such tasks, it is not even desirable to transmit complete information about the external stimulus, since a substantial portion of that information is not important for the task at hand and must be discarded. For example, such discarding of information is the basis of the invariant representations that occur in higher visual areas. So we recognise that tuning curve shapes are in general dictated by computational goals beyond information transmission or error correction. As such, we have remained agnostic as to the computational goals of neural systems, and therefore as to the shapes of the tuning curves. We adopt the postulate that those computational goals determine the shapes of the tuning curves, leaving the gains free to be adjusted for the purpose of mitigating the effect of noise on coding fidelity (this is similar to how error correction is done in computers, independently of the computations performed). Assuming that those computational goals are adequately captured by the tuning curve shapes leaves us free to optimise the gains of those curves for purely information-theoretic objectives. Finally, we note that the case where the tuning curve shapes are additionally optimised for information transmission is a special case of our more general approach.
For further discussion, see the updated version of our introduction.

      We now turn to our choice to model clusters using random perturbations. This is, of course, a toy model for clusters of tuning curves within a population. With this toy model we attempt to capture the important aspects of tuning curve clusters within the population without over-complicating the simulations. Within any neural population, there will be tuning curves that are similar; however, such curves will inevitably be heterogeneous, as opposed to completely identical. Thus, when we cluster together similar curves there will be an "average" cluster tuning curve (found by, e.g., normalising all individual curves and taking the average), from which all other tuning curves in the cluster deviate. The random perturbations we apply are our attempt to capture these deviations. Note, however, that the perturbations are not fully random, but instead have an "effective dimensionality" which we vary. By giving the perturbations an effective dimensionality, we aim to capture the fact that deviations from the average cluster tuning curve may not be fully random, and may display some structure.
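      To illustrate what a perturbation with limited "effective dimensionality" could look like, here is a minimal sketch. It assumes NumPy, a von Mises-like average orientation tuning curve, and multiplicative perturbations whose log-deviations lie in a random subspace of dimension `eff_dim`; the function `make_cluster` and all parameter values are hypothetical, and this is not necessarily the construction used in the manuscript.

```python
import numpy as np

def make_cluster(mean_curve, k, eff_dim, scale, rng):
    """Return k positive tuning curves that deviate from mean_curve.

    The log-deviations of all k curves are linear combinations of the same
    eff_dim random basis functions, so they span at most eff_dim dimensions.
    """
    n_stim = mean_curve.size
    basis = rng.standard_normal((eff_dim, n_stim))            # shared random deviation patterns
    coeffs = rng.standard_normal((k, eff_dim)) / np.sqrt(eff_dim)
    log_dev = scale * (coeffs @ basis)                         # shape (k, n_stim), rank <= eff_dim
    return mean_curve * np.exp(log_dev)                        # multiplicative: curves stay positive

theta = np.linspace(0, np.pi, 64, endpoint=False)              # orientations
mean_curve = np.exp(2.0 * (np.cos(2 * (theta - np.pi / 2)) - 1))  # average cluster curve, peak at 90 deg
rng = np.random.default_rng(1)
cluster = make_cluster(mean_curve, k=20, eff_dim=3, scale=0.2, rng=rng)
```

      Increasing `eff_dim` towards the number of stimulus values makes the deviations effectively unstructured; small values give deviations that share a few common patterns.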

      (3) Figure 8 - where do Hz come from as physical units? As I understand there are no physical units in simulations.

      We have clarified this in the figure caption. The within-cluster optimisation problem requires solving a quadratic program, subject to a constraint on the total mean spike count of the cluster. The objective of this quadratic program is, however, mathematically homogeneous, so we can consistently rescale the variables and parameters so that they are in units of Hz, i.e., turn them into mean firing rates instead of mean spike counts, given an assumption on the length of the coding time interval. We fix this cluster firing rate to be k × 5 Hz, so that the average single-neuron firing rate is 5 Hz (based on empirical estimates; see our Sec. 2.5). This agrees with our choice of µ in our simulations (i.e., µ = 10) if we assume a coding interval of 0.1 seconds.
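      The unit conversion itself amounts to dividing spike counts by the coding interval. Writing $T$ for the coding interval, $\bar n_i$ for the mean spike count of neuron $i$, and $\bar r_i$ for its mean firing rate (notation assumed here for illustration, not taken from the manuscript), the constraint for a cluster $a$ of $k$ neurons reads:

```latex
\bar r_i = \frac{\bar n_i}{T}, \qquad
\sum_{i \in a} \bar r_i = \frac{1}{T} \sum_{i \in a} \bar n_i = k \times 5\,\mathrm{Hz}
\quad\Longrightarrow\quad
\frac{1}{k} \sum_{i \in a} \bar r_i = 5\,\mathrm{Hz}.
```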

      (4) Inference with DDCs in changing environments. To perform efficient inference in a dynamically changing environment (as considered here), an ideal observer needs some form of posterior-prior updating. Where does that enter here?

      A shortcoming of our theory, in its current form, is that it applies only to the system in “steady-state”, without specifying how adaptation evolves over time (we assume the environment has periods of relative stability that are long compared to the dynamical timescales of adaptation, and consider the properties of the well-adapted steady-state population). Thus our efficient coding theory (which predicts homeostatic adaptation under the outlined conditions) is silent on the time course over which homeostasis occurs. Likewise, the DDC theory (in its original formulation by Vertes & Sahani) is silent on the dynamic updating of posteriors and considers only static inference with a fixed internal model. We now discuss a new future direction in the Discussion (where we cite the work of Młynarski and Tkačik), pointing out that our theory can in principle be extended (based on dynamic inference and efficient coding) to account for the dynamics of adaptation, but this is beyond the scope of the current work.

      (5) Page 6 - “We did this in such a way that, for all , the correlation matrices, (), were derived from covariance matrices with a 1/n power-law eigenspectrum (i.e., the ranked eigenvalues of the covariance matrix fall off inversely with their rank), in line with the findings of Stringer et al. (2019) in the primary visual cortex.” This is a very specific assumption, taken from a study of a specific brain region - how does it relate to the generality of the approach?

      Our efficient coding framework has been formulated without relying on any specific assumptions about the form of the (signal or noise) correlation matrices in cortex. The homeostatic solution to this efficient coding problem, however, emerges only under certain conditions. As we demonstrate in our discussion of the analytic solutions to our efficient coding objective, and of the conditions necessary for the validity of the homeostatic solution, we expect homeostasis to arise whenever the signal geometry is sufficiently high-dimensional (among other conditions). By this we mean that the eigenvalues of the signal correlation matrix must fall off sufficiently slowly. Thus, an eigenvalue spectrum falling off more slowly than 1/n would favor homeostasis even more than in our results. If the fall-off were faster, then whether (and to what degree) firing rate homeostasis becomes suboptimal depends on factors such as the steepness of the fall-off and the size of the population. Thus (1) rate homeostasis does not require the specific 1/n spectrum, but that spectrum is consistent with the conditions for the optimality of rate homeostasis, and (2) in our simulations we had to make a specific choice, and relying on empirical observations in V1 was a well-justified choice (moreover, as far as we are aware, no other studies have characterised the spectrum of the signal covariance matrix in response to natural stimuli based on large population recordings).
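      For concreteness, one way to construct a correlation matrix from a covariance matrix with a power-law eigenspectrum is sketched below. It assumes a random orthonormal eigenbasis, which is a simplification (in a real population the eigenvectors would be tied to the tuning curves), and the function `power_law_correlation` and its exponent `alpha` are hypothetical; `alpha = 1` gives the 1/n spectrum, while larger or smaller values give the faster or slower fall-offs discussed above.

```python
import numpy as np

def power_law_correlation(n, alpha=1.0, rng=None):
    """Covariance with ranked eigenvalues 1/rank**alpha, and the correlation matrix derived from it."""
    rng = np.random.default_rng() if rng is None else rng
    eigvals = 1.0 / np.arange(1, n + 1) ** alpha             # lambda_k = 1/k when alpha = 1
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))         # random orthonormal eigenbasis
    cov = (q * eigvals) @ q.T                                 # Q diag(lambda) Q^T
    d = np.sqrt(np.diag(cov))
    corr = cov / np.outer(d, d)                               # rescale to unit diagonal
    return cov, corr

cov, corr = power_law_correlation(200, rng=np.random.default_rng(2))
ranked = np.sort(np.linalg.eigvalsh(cov))[::-1]               # follows 1/rank exactly for cov
```

      Note that the normalisation to unit diagonal perturbs the spectrum somewhat, so the exact 1/n law holds for the covariance matrix rather than for the derived correlation matrix.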

      Reviewer #2 (Public Review):

      Strengths:

      The problem of efficient coding is a long-standing and important one. This manuscript contributes to that field by proposing a theory of efficient coding through gain adjustments, independent of the computational goals of the system. The main result is a normative explanation for firing rate homeostasis at the level of neural clusters (groups of neurons that perform a similar computation) with firing rate heterogeneity within each cluster. Both phenomena are widely observed, and reconciling them under one theory is important.

      The mathematical derivations are thorough as far as I can tell. Although the model of neural activity is artificial, the authors make sure to include many aspects of cortical physiology, while also keeping the models quite general.

      Section 2.5 derives the conditions in which homeostasis would be near-optimal in the cortex, which appear to be consistent with many empirical observations in V1. This indicates that homeostasis in V1 might be indeed close to the optimal solution to code efficiently in the face of noise.

      The application to the data of Benucci et al 2013 is the first to offer a normative explanation of stimulus-specific and neuron-specific adaptation in V1.

      We thank the reviewer for these assessments.

      Weaknesses:

      The novelty and significance of the work are not presented clearly. The relation to other theoretical work, particularly Ganguli and Simoncelli and other efficient coding theories, is explained in the Discussion but perhaps would be better placed in the Introduction, to motivate some of the many choices of the mathematical models used here.

      We thank the reviewer for this comment; we have updated our introduction to make clearer the relationship between this work and previous works within efficient coding theory. Please see the expanded 2nd paragraph of Introduction which gives a short account of previous efficient coding theories and now situates our work and differentiates it more clearly from past work.

      The manuscript is very hard to read as is, it almost feels like this could be two different papers. The first half seems like a standalone document, detailing the general theory with interesting results on homeostasis and optimal coding. The second half, from Section 2.7 on, presents a series of specific applications that appear somewhat disconnected, are not very clearly motivated nor pursued in-depth, and require ad-hoc assumptions.

      We thank the reviewer for this suggestion. The reviewer is right to note that our paper contains both the exposition of a general efficient coding framework and applications of that framework. Following this advice, we have implemented the following changes. (1) We have significantly shortened some of the less central results in the second half of the Results, or moved them entirely to the Methods or appendices (this includes the entire former Section 2.7 and a significant shortening of the section on the implementation of Bayes ratio coding by divisive normalisation). (2) We have added a new figure panel (Fig. 1B) and two long pieces of text, to the 2nd paragraph of the Introduction, after Eq. (1), and to Sec. 2.7 (introducing homeostatic DDCs), to explain more clearly the assumptions underlying our efficient coding theory and its connection with the second half of the Results (i.e. the application to the DDC theory of Bayesian inference), and to better motivate why we consider the homeostatic DDC.

      For instance, it is unclear if the main significant finding is the role of homeostasis in the general theory or the demonstration that homeostatic DDC with Bayes Ratio coding captures V1 adaptation phenomena. It would be helpful to clarify if this is being proposed as a new/better computational model of V1 compared to other existing models.

      We see the central contribution of our work as not just that homeostasis arises as a result of an efficient coding objective, but also that this homeostasis is sufficient to explain V1 adaptation phenomena - in particular, stimulus specific adaptation (SSA) - when paired with an existing theory of neural representation, the DDC (itself applied to orientation coding in V1). Homeostatic adaptation alone does not explain SSA; nor do DDCs. However, when the two are combined they provide an explanation for SSA. This finding is significant, as it unifies two forms of adaptation (SSA and homeostatic adaptation) whose relationship was not previously appreciated. Our field does not currently have a standard model of V1, and we do not claim to have provided one either; rather, different models have captured different phenomena in V1, and we have done so for homeostatic SSA in V1.
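      The homeostatic ingredient on its own can be sketched in a few lines. Here we assume that firing rate homeostasis means each neuron's gain is set so that its stimulus-averaged rate equals a fixed target; the von Mises-like tuning curves, the adaptor-heavy stimulus distribution, the 5 Hz target and the function `homeostatic_gains` are all hypothetical choices for illustration. As stated above, this gain change alone does not produce SSA; the DDC component is not modelled here.

```python
import numpy as np

theta = np.linspace(0, np.pi, 180, endpoint=False)           # stimulus orientations
prefs = np.linspace(0, np.pi, 36, endpoint=False)            # preferred orientations
tuning = np.exp(2.0 * (np.cos(2 * (theta[None, :] - prefs[:, None])) - 1))  # (neurons, stimuli)

def homeostatic_gains(tuning, p_stim, target_rate):
    """Gains g_i such that each neuron's mean rate under p_stim equals target_rate."""
    mean_rates = tuning @ p_stim                              # E_s[f_i(s)] for each neuron
    return target_rate / mean_rates

uniform = np.full(theta.size, 1.0 / theta.size)
adaptor = np.exp(4.0 * (np.cos(2 * (theta - np.pi / 2)) - 1))   # frequent stimuli near 90 deg
p_adapt = 0.5 * uniform + 0.5 * adaptor / adaptor.sum()          # still sums to one

g_uni = homeostatic_gains(tuning, uniform, target_rate=5.0)
g_ada = homeostatic_gains(tuning, p_adapt, target_rate=5.0)

ratio = g_ada / g_uni   # below 1 for neurons preferring orientations near the adaptor
```

      The gains of neurons tuned to the over-represented orientation fall, and those of neurons tuned far from it rise, so every neuron's average rate returns to the same target.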

      Early on in the manuscript (Section 2.1), the theory is presented as general in terms of the stimulus dimensionality and brain area, but then it is only demonstrated for orientation coding in V1.

      The efficient coding theory developed in Section 2 is indeed general throughout; we make no assumptions regarding the shape of the tuning curves or the dimensionality of the stimulus. Further, our demonstrations of the efficient coding theory through numerical simulations make assumptions only about the form of the signal and noise covariance matrices. When we later turn our attention away from the general case, our choice to focus on orientation coding in V1 is motivated by empirical results demonstrating a co-occurrence of neural homeostasis and stimulus-specific adaptation in V1.

      The manuscript relies on a specific response noise model, with arbitrary tuning curves. Using a population model with arbitrary tuning curves and noise covariance matrix, as the basis for a study of coding optimality, is problematic because not all combinations of tuning curves and covariances are achievable by neural circuits (e.g. https://pubmed.ncbi.nlm.nih.gov/27145916/ )

      First, to clarify, our theory allows for complete generality of neural tuning curve shapes, and assumes a broad family of noise models (which, while not completely arbitrary, includes cases of biological relevance and/or models commonly used in the theoretical literature). Within this class of noise covariance models, we have shown numerical results for different values of the parameters of the noise covariance model but, more importantly, we have analytically outlined the general properties and requirements on noise strength and structure (and its relationship to tuning curves and signal structure) under which homeostatic adaptation would be optimal. Regarding the point that not all combinations of tuning curves and noise covariances occur in biology or are achievable by neural circuits: (1) If we have correctly understood the point of the reviewer's reference to the review by Kohn et al. (2016), we have in fact prominently discussed the case of information-limiting noise, which corresponds to a specific relationship between signal structure (as determined by tuning curves) and noise structure (as specified by the noise covariance matrix). Our family of noise models includes that biologically relevant case, and we have paid it particular attention in our simulations and discussions (see the discussion of Fig. 7 in Sec. 2.3, and that of aligned noise in Sec. 2.5). (2) As for the more general point that not all combinations of noise covariance and tuning curve structure are achievable by neural circuits, we make the following comments. First, in lieu of a full theoretical or empirical understanding of the achievable combinations (which does not yet exist), we have outlined conditions for homeostatic adaptation under a broad class of noise models and arbitrary tuning curves.
If some combinations within this class are not realised in biology, that does not invalidate the theoretical results, as the latter have been derived under more general conditions, which nevertheless include combinations that do occur in biology and are achievable by neural circuits (which, as pointed out, include the important case of aligned noise and signal structure – as reviewed in Kohn et al.– to which we have paid particular attention).
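      Why information-limiting noise ties noise structure to signal structure can be seen with the standard differential-correlation construction, in which the noise covariance gains a component $\epsilon f' f'^T$ along the tuning-curve derivative. The sketch below uses that textbook form with a random baseline covariance and random derivatives (hypothetical values, and not the specific noise family of our manuscript); it checks numerically that linear Fisher information becomes $I_0 / (1 + \epsilon I_0)$, which can never exceed $1/\epsilon$, however large the population.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
f_prime = rng.standard_normal(n)                   # hypothetical tuning-curve derivatives f'(s)
a = rng.standard_normal((n, n))
sigma0 = a @ a.T / n + np.eye(n)                   # hypothetical positive-definite baseline noise

def fisher_info(f_prime, sigma):
    """Linear Fisher information f'^T Sigma^{-1} f' for noise covariance Sigma."""
    return float(f_prime @ np.linalg.solve(sigma, f_prime))

i0 = fisher_info(f_prime, sigma0)
for eps in [0.0, 0.01, 0.1]:
    sigma = sigma0 + eps * np.outer(f_prime, f_prime)   # information-limiting (differential) correlations
    # By the Sherman-Morrison formula the result equals i0 / (1 + eps * i0), bounded by 1 / eps.
    print(eps, fisher_info(f_prime, sigma), i0 / (1 + eps * i0))
```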

      The paper Benucci et al 2013 shows that homeostasis holds for some stimulus distributions, but not others i.e. when the ‘adapter’ is present too often. This manuscript, like the Benucci paper, discards those datasets. But from a theoretical standpoint, it seems important to consider why that would be the case, and if it can be predicted by the theory proposed here.

      The theory we provide predicts that, under certain (specified) conditions, we ought to see deviations from exact homeostasis; indeed, we provide a first-order approximation to the optimal gains in this case, which quantifies such deviations when they are small. Unfortunately, however, the form of this deviation depends on a precise choice of stimulus statistics (e.g. the signal correlation matrix, the noise correlation matrix averaged over the whole stimulus space, and other stimulus statistics), in contrast to the universality of the homeostatic solution when it is a valid approximation. In our model of Benucci et al.'s experiment, we restrict ourselves to a simple one-dimensional stimulus space (corresponding to oriented gratings), without specifying neural responses to all stimuli; as such, we are not immediately able to predict whether homeostasis fails using the specific form of the deviation from homeostasis. We acknowledge that this is a weakness of our analysis, and that a more complete investigation would address this question. For reasons of space, we elected not to pursue this further. We have added a paragraph to our Discussion (8th paragraph) explaining this.

      Reviewer #1 (Recommendations for the authors):

      (1) To make the article more accessible I would suggest the following:

      (a) Include a few more illustrations or diagrams that demonstrate key concepts: adaptation of an entire population, clustering within a population, different sources of noise, inference with homeostatic DDCs, etc.

      We thank the reviewer for this suggestion. We have added a new panel (Figure 8A) to explain the concept of clustering within a population. We have also added a new panel to Figure 1 (Figure 1B), which we hope clarifies the conceptual postulate underlying our efficient coding framework and its link to the second half of the paper.

      (b) Within the text refer to names of quantities much more often, rather than relying only on mathematical symbols (e.g. w,r,Ω, etc).

      We thank the reviewer for the suggestion; we have updated the text accordingly and believe this has improved the clarity of the exposition.

      (2) It is hard to distill which components of the considered theory are crucial to reproducing the experimental observations in Figure 12. Is it the homeostatic modulation, efficient coding, DDCs, or any combination of those or all of them necessary to reproduce the experiment? I believe this could be explained much better, also with an audience of experimentalists in mind.

      We have updated the text to provide additional clarity on this matter (see the pointers to these changes and additions in the revised manuscript, given above in response to your first comment). In particular, reproducing the experimental results requires combining DDCs with homeostatic modulation – with the latter a consequence of our efficient coding theory, and not an independent ingredient or assumption.

      (3) It would be good to comment on how sensitive the results are to the assumptions made, parameter values, etc. For example: do conclusions depend on statistics of neural responses in simulated environments? Do they generalize for different values of the constraint µ? This could be addressed in the discussion / supplementary material.

      This issue is already discussed extensively in the text: see Sec. 2.4, Analytical insight on the optimality of homeostasis, and Sec. 2.5, Conditions for the validity of the homeostatic solution to hold in cortex. In these sections, we show that, provided a certain parameter combination is small, we expect the homeostatic result to hold. Accordingly, we anticipate that our numerical results will generalise to any settings in which that parameter combination remains small.

      (4) How many neurons/units were used for simulations?

      We apologise for omitting this detail; we used 10,000 units for our simulations. We have edited both the main text and the Methods section to reflect this.

      (5) Typos etc: a) Figure 5 caption - the order of panels B and C is switched. b) Figure 6A - I suggest adding a colorbar.

      Thank you. We have relabelled the panels B and C in the appropriate figures so that the ordering in the figure caption is correct. We feel that a colourbar in figure 6A would be unnecessary, since we are only trying to convey the concept of uniform correlations, rather than any particular value for the correlations; as such we have elected not to add a colourbar. We have, however, added a more explicit explanation of this cartoon matrix in the figure caption, by referring to the colors of diagonal vs off-diagonal elements.

      Reviewer #2 (Recommendations for the authors):

      The text on page 10, with the perturbation analysis, could be moved to a supplement, leaving here only the intuition.

      We thank the reviewer for this suggestion; we have moved much of the argument into the appendix so as to not distract the reader with unnecessary technical details.

      Text before eq. 12 “...in cluster a maximize the objective...” should be ‘minimize’?

      The cluster objective as written is indeed maximised, as stated in the text. Note that, in the revised manuscript, this argument has been moved to an appendix to reduce the density of mathematics in the main text.

      Top of page 25 “S<sub>0</sub> and S<sub>0</sub>” should be “S<sub>0</sub> and S<sub>1</sub>”?

      Thank you, we have corrected the manuscript accordingly.

    1. eLife Assessment

      This important study investigates nerve-injury-induced allodynia by studying the role of a subpopulation of excitatory dorsal horn CCK+ neurons that express the estrogen receptor GPR30 and potentially modulate nociceptive sensitivity via direct inputs from primary somatosensory cortex. In this revised version, the authors addressed many of the critiques raised through added analyses that convincingly support the notion that spinal GPR30 neurons are indeed an excitatory subpopulation of CCK+ neurons that contribute to neuropathic pain. While evidence of a direct functional corticospinal projection to CCK+/GPR30+ neurons is not fully demonstrated, this work will be of broad interest to researchers interested in the neural circuitry of pain.

    2. Reviewer #1 (Public review):

      In this manuscript, Chen et al. investigate the role of the membrane estrogen receptor GPR30 in spinal mechanisms of neuropathic pain. Using a wide variety of techniques, they first provide convincing evidence that GPR30 expression is restricted to neurons within the spinal cord, and that GPR30 neurons are well-positioned to receive descending input from the primary sensory cortex (S1). In addition, the authors put their findings in the context of previous knowledge in the field, presenting evidence demonstrating that GPR30 is expressed in the majority of CCK-expressing spinal neurons. Overall, this manuscript furthers our understanding of the neural circuitry that underlies neuropathic pain and will be of broad interest to neuroscientists, especially those interested in somatosensation. Nevertheless, the manuscript would be strengthened by additional analyses and clarification of the data that are currently presented.

      Strengths:

      The authors present convincing evidence for expression of GPR30 in the spinal cord that is specific to spinal neurons. Similarly, complementary approaches including pharmacological inhibition and knockdown of GPR30 are used to demonstrate a role for the receptor in driving nerve injury-induced pain in rodent models.

      Weaknesses:

      Although steps were taken to put their data into the broader context of what is already known about the spinal circuitry of pain, more considerations and analyses would help the authors better achieve their goal. For instance, to determine whether GPR30 is expressed in excitatory or inhibitory neurons, more selective markers for these subtypes should be used over CamK2. Moreover, quantitative analysis of the extent of overlap between GPR30+ and CCK+ spinal neurons is needed to understand the potential heterogeneity of the GPR30 spinal neuron population, and to interpret experiments characterizing descending SI inputs onto GPR30 and CCK spinal neurons. Filling these gaps in knowledge would make their findings more solid.

      Revised Manuscript Update:

      In their revised manuscript, Chen et al. have added data that establish GPR30 spinal neurons as a population of excitatory neurons, half of which express CCK. These data help to position GPR30 neurons in the existing framework of spinal neuron populations that contribute to neuropathic pain, strengthening the authors' findings.

    3. Reviewer #3 (Public review):

      Summary:

      The authors convincingly demonstrate that a population of CCK+ spinal neurons in the deep dorsal horn express the G protein-coupled estrogen receptor GPR30 to modulate pain sensitivity in the chronic constriction injury (CCI) model of neuropathic pain in mice. Using complementary pharmacological and genetic knockdown experiments, they convincingly show that GPR30 inhibition or knockdown reverses mechanical, tactile and thermal hypersensitivity, conditioned place aversion, and c-fos staining in the spinal dorsal horn after CCI. Using slice electrophysiology, they propose that GPR30 mediates an increase in postsynaptic AMPA receptors after CCI, which may underlie the increased behavioral sensitivity. They then use anterograde tracing approaches to show that CCK- and GPR30-positive neurons in the deep dorsal horn may receive direct connections from primary somatosensory cortex. Chemogenetic activation of these dorsal horn neurons proposed to be connected to S1 increased nociceptive sensitivity in a GPR30-dependent manner. Overall, the data are very convincing and the experiments are well conducted and adequately controlled. However, the proposed model of descending corticospinal facilitation of nociceptive sensitivity through GPR30 in a population of CCK+ neurons in the dorsal horn is not fully supported.

      Strengths:

      The experiments are very well executed and adequately controlled throughout the manuscript. The data are nicely presented and supportive of a role for GPR30 signaling in the spinal dorsal horn influencing nociceptive sensitivity following CCI. The authors also did an excellent job of using complementary approaches to rigorously test their hypothesis.

      Weaknesses:

      The primary weakness in this manuscript involves overextending the interpretations of the data to still propose a role for corticospinal descending facilitation. While the viral tracing demonstrates a potential connection between S1 and CCK+ or GPR30+ spinal neurons, no direct evidence is provided for S1 in facilitating any activity of these neurons in the dorsal horn.

      Comments on the latest version:

      The authors did an excellent job addressing many of the critiques raised. Although the authors acknowledge that a direct functional corticospinal projection to CCK+/GPR30+ neurons is not supported by the data, and have revised the title, these claims still persist throughout the manuscript. Manipulating gene expression or the activity of postsynaptic neurons through a trans-synaptic labeling strategy does not directly support any claim that those upstream neurons are directly modulating spinal neurons through the proposed pathway. Indeed they might, but that is not demonstrated here.

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      In this manuscript, Chen et al. investigate the role of the membrane estrogen receptor GPR30 in spinal mechanisms of neuropathic pain. Using a wide variety of techniques, they first provide convincing evidence that GPR30 expression is restricted to neurons within the spinal cord, and that GPR30 neurons are well-positioned to receive descending input from the primary sensory cortex (S1). In addition, the authors put their findings in the context of the previous knowledge in the field, presenting evidence demonstrating that GPR30 is expressed in the majority of CCK-expressing spinal neurons. Overall, this manuscript furthers our understanding of the neural circuitry that underlies neuropathic pain and will be of broad interest to neuroscientists, especially those interested in somatosensation. Nevertheless, the manuscript would be strengthened by additional analyses and clarification of the data that are currently presented. 

      Strengths: 

      The authors present convincing evidence for the expression of GPR30 in the spinal cord that is specific to spinal neurons. Similarly, complementary approaches including pharmacological inhibition and knockdown of GPR30 are used to demonstrate the role of the receptor in driving nerve injury-induced pain in rodent models. 

      Weaknesses: 

      Although steps were taken to put their data into the broader context of what is already known about the spinal circuitry of pain, more considerations and analyses would help the authors better achieve their goal. For instance, to determine whether GPR30 is expressed in excitatory or inhibitory neurons, more selective markers for these subtypes should be used instead of CamK2. Moreover, quantitative analysis of the extent of overlap between GPR30+ and CCK+ spinal neurons is needed to understand the potential heterogeneity of the GPR30 spinal neuron population, and to interpret experiments characterizing descending S1 inputs onto GPR30 and CCK spinal neurons. Filling these gaps in knowledge would make their findings more solid. 

      Thank you very much for your constructive feedback.

      In response to your suggestion, we have used more specific markers to distinguish excitatory (VGLUT2) and inhibitory (VGAT) neurons via in situ hybridization. These analyses revealed that GPR30 is predominantly expressed in excitatory neurons of the superficial dorsal horn (SDH), as presented in the Results section (lines 117-120) and in Figure 2A-B.

      Additionally, we performed a quantitative analysis to determine the extent of co-localization between GPR30+ and CCK+ neurons. The data were included in the Results (lines 131–132) and Figure 2G.

      Reviewer #2 (Public review):

      Using a variety of experimental manipulations, the authors show that the membrane estrogen receptor G protein-coupled estrogen receptor (GPER/GPR30) expressed in CCK+ excitatory spinal interneurons plays a major role in the pain symptoms observed in the chronic constriction injury (CCI) model of neuropathic pain. Intrathecal application of the selective GPR30 agonist G-1 induced mechanical allodynia and thermal hyperalgesia in male and female mice. Downregulation of GPR30 in CCK+ interneurons prevented the development of mechanical and thermal hypersensitivity during CCI. They also show the upregulation of AMPA receptor expression by GPR30. 

      Generally, the conclusions are supported by the experimental results. I also would like to see significant improvements in the writing and the description of results. 

      Methodological details for some of the techniques are rather sparse. For example, when examining the co-localization of various markers, the authors do not indicate the number of animals/sections examined. Similarly, when examining the effect of shGper1, it is unclear how many cells/sections/animals were counted and analyzed. 

      In other sections, there is no description of the concentration of drugs used (for example, Figure 4H). In Figures 4C-E, there is no indication of the duration of the recordings, the ionic conditions, the effect of glutamate receptor blockers, etc.

      Some results appear anecdotal in the way they are described. For example, in Figure 5, it is unclear how many times this experiment was repeated. 

      We sincerely appreciate your valuable feedback and thoughtful recommendations.

      To address your concerns regarding methodological transparency, we have added the following details to the revised manuscript:

      The number of animals and sections analyzed in co-localization studies.

      The number of cells/sections/animals used in each quantification following shGper1 treatment.

      The concentrations of drugs administered (e.g., in Figure 4H).

      Detailed recording conditions, including duration, ionic composition, and pharmacological conditions (Figures 4C-E).

      In addition, we have thoroughly revised the writing throughout the manuscript to enhance clarity and precision in the description of our findings.

      Reviewer #3 (Public review): 

      Summary: 

      The authors convincingly demonstrate that a population of CCK+ spinal neurons in the deep dorsal horn express the G protein-coupled estrogen receptor GPR30 to modulate pain sensitivity in the chronic constriction injury (CCI) model of neuropathic pain in mice. Using complementary pharmacological and genetic knockdown experiments, they convincingly show that GPR30 inhibition or knockdown reverses mechanical, tactile, and thermal hypersensitivity, conditioned place aversion, and c-fos staining in the spinal dorsal horn after CCI. Based on slice electrophysiology, they propose that GPR30 mediates an increase in postsynaptic AMPA receptors after CCI, which may underlie the increased behavioral sensitivity. They then use anterograde tracing approaches to show that CCK and GPR30 positive neurons in the deep dorsal horn may receive direct connections from the primary somatosensory cortex. Chemogenetic activation of these dorsal horn neurons proposed to be connected to S1 increased nociceptive sensitivity in a GPR30-dependent manner. Overall, the data are very convincing and the experiments are well conducted and adequately controlled. However, the proposed model of descending corticospinal facilitation of nociceptive sensitivity through GPR30 in a population of CCK+ neurons in the dorsal horn is not fully supported. 

      Strengths: 

      The experiments are very well executed and adequately controlled throughout the manuscript. The data are nicely presented and supportive of a role for GPR30 signaling in the spinal dorsal horn influencing nociceptive sensitivity following CCI. The authors also did an excellent job of using complementary approaches to rigorously test their hypothesis. 

      Weaknesses: 

      The primary weakness in this manuscript involves overextending the interpretations of the data to propose a direct link between corticospinal projections signaling through GPR30 on this CCK+ population of spinal dorsal horn neurons. For example, even in the cropped images presented, GPR30 is present in many other CCK-negative neurons. Only about a quarter of the cells labeled by the anterograde viral tracing experiment from S1 are CCK+. Since no direct evidence is provided for S1 signaling through GPR30, this conclusion should be revised. 

      Thank you for your encouraging comments and critical insights.

      We fully acknowledge the concern regarding the proposed direct involvement of corticospinal projections in modulating nociceptive behavior via GPR30 in CCK+ neurons. While our anterograde tracing experiments suggest anatomical overlap, we agree that definitive evidence of functional connectivity is lacking.

      Accordingly, we have revised the Abstract, Discussion, and Graphical Abstract to present our findings more cautiously. We now describe our observations as indicating that S1 projections potentially interact with GPR30⁺ spinal neurons, rather than asserting a definitive functional link.

      To support this revised interpretation, we performed additional quantitative analyses examining the co-localization among S1 projections, CCK+, and GPR30+ neurons. Furthermore, we clarified that the chemogenetic activation studies targeted a mixed neuronal population and did not exclusively manipulate CCK+ neurons.

      These changes aim to better align our conclusions with the presented data and provide a more nuanced framework for future investigations.

      Reviewer #1 (Recommendations for the authors): 

      Major corrections 

      (1) Figure 2: The authors conclude that GPR30 is mainly expressed in excitatory spinal neurons because they are labeled by a virus with a Camk2 promoter. While there is evidence that Camk2 is specific to excitatory neurons in the brain, based on RNAseq datasets (e.g. Linnarsson Lab, http://mousebrain.org/adolescent/genesearch.html ) this is less clear cut within the spinal cord. A more direct way to assess the relative expression of GPR30 in excitatory versus inhibitory neurons would be to perform immunohistochemistry or FISH with GPR30/Vglut2/Vgat. 

      Alternatively, if this observation is not crucial for the overall arch of the story, I recommend the authors eliminate these data, as they do not support the idea that GPR30 is mainly in excitatory neurons. 

      We thank the reviewer for highlighting this important limitation. To strengthen our conclusion regarding the neuronal identity of GPR30-expressing cells, we performed fluorescent in situ hybridization (FISH) using vGluT2 (marker for excitatory neurons) and VGAT (marker for inhibitory neurons). The results confirmed that GPR30 is predominantly expressed in vGluT2-positive excitatory neurons within the spinal cord. These new data are presented in the revised manuscript (lines 117-120) and shown in Figure 2A-B.

      (2) (2a) Figure 2: The authors also report that GPR30 is expressed in most CCK+ spinal neurons. A more rigorous way to present the data would be to perform quantification and report the % of CCK neurons that are GPR30+. 

      (2b) More importantly, it is unclear what % of GPR30 neurons are CCK+. These types of quantifications would provide useful insights into the heterogeneity of CCK and GPR30 neuron populations, and help align the findings of the behavioral pharmacology experiments using GPR30 antagonists with those of the Gper1 knockdown in CCK spinal neurons. For instance, does a population of GPR30+/CCK- neurons exist? If so, it would be worth discussing what role (if any) that population might play in nerve injury-induced mechanical allodynia. 

      Understanding the breakdown of GPR30 populations becomes even more relevant when the authors characterize which cell types are targeted by descending projections from S1. It is clear that the vast majority of CCK+ neurons that receive descending input from S1 neurons are GPR30+, but there are many other GPR30+ neurons that do not receive input from S1 neurons, as presented in Figure 5M. Is this simply because only a small fraction of CCK+/GPR30+ neurons are targeted by descending S1 projections, or could they represent a distinct population of GPR30 neurons? 

      (2a) We appreciate the suggestion. Quantification showed that approximately 90% of CCK⁺ neurons express GPR30, and about 50% of GPR30⁺ neurons co-express CCK. These data are now provided in the revised Results (lines 131-132) and in Figure 2F-G.

      (2b) Indeed, our data reveal that a substantial portion of GPR30⁺ neurons do not co-express CCK. While this study focuses on GPR30 function in CCK⁺ neurons, we recognize the potential relevance of GPR30⁺/CCK⁻ populations. We have addressed this point in the Discussion (lines 303-306):

      “However, it should be noted that half of GPR30⁺ neurons are not co-localized with CCK⁺ neurons, and further studies are needed to explore the function of these GPR30⁺/CCK⁻ neurons in neuropathic pain.”

      Regarding descending input, our data in Figure 5 show that S1 projections selectively innervate a subset (~30%) of CCK⁺ neurons, most of which co-express GPR30. This suggests that S1-targeted CCK⁺/GPR30⁺ neurons may represent a functionally distinct population. We have added clarification to the revised manuscript, while acknowledging that further studies are needed to elucidate the roles of non-targeted GPR30⁺ neurons.

      (3) Throughout the manuscript both male and female mice were used in experiments. Rather than referring to male and female mice as different genders, it would be more appropriate to describe them as different sexes. 

      As suggested, we have replaced all instances of “gender” with “sex” throughout the revised manuscript.

      (4) Figure 5: To increase the ease of interpreting the figure, in panels 5J and 5N, it would be helpful to indicate directly on the figure panel which other marker was assessed in double-labeling analyses.

      We have revised Figures 5J and 5N to include clear labels identifying the markers used in double-labeling analyses, to improve interpretability.

      Minor corrections: 

      (1) Line 36, I believe the authors mean to say "GPER/GPR30 in spinal neurons", rather than just "spinal". 

      Corrected as suggested. The sentence now reads (line 34):

      “Here we showed that the membrane estrogen receptor G-protein coupled estrogen receptor (GPER/GPR30) in spinal neurons was significantly upregulated in chronic constriction injury (CCI) mice…”

      (2) There are minor grammatical errors throughout the manuscript that interfere with comprehension. Proofreading/editing of the English language use may be beneficial. 

      We have thoroughly revised the manuscript for clarity and corrected grammatical and syntactic errors to improve readability.

      (3) Lines 169-170 read "Known that EPSCs are mediated by glutamatergic receptors like AMPA receptors and several studies have been reported the relationship between GPR30 and AMPA receptor25,29". Rewriting the sentence such that it better describes what the known relationship is between GPR30 and AMPA receptors would be helpful in setting up the rationale of the experiment in Figure 4. 

      We have rewritten this section to better clarify the rationale behind the electrophysiological experiments (lines 161-164):

      “Given that EPSCs are primarily mediated through glutamatergic receptors such as AMPA receptors, and emerging evidence suggesting that GPR30 enhances excitatory transmission by promoting clustering of glutamatergic receptor subunits, we examined whether GPR30 modulates EPSCs via AMPA receptor-dependent mechanisms.”

      (4) Line 198-199 "Then we explored the possible connections among GPR30, S1-SDH projections and CCK+ neuron." In the context of spinal circuitry, "connections" may raise the expectation that synaptic connectivity will be evaluated. What I think best describes what the authors investigated in Figure 5 is the "relationship" between GPR30, S1-SDH projections, and CCK+ neurons. 

      We have revised the sentence accordingly (lines 184-186):

      “Building on previous findings suggesting a functional interaction between S1-SDH projections and spinal CCK⁺ neurons, our current study aimed to further elucidate the structural relationship among GPR30, S1-SDH projections, and CCK⁺ neurons.”

      (5) Figure 5: To increase the ease of interpreting the figure, in panels 5J and 5N, it would be helpful to indicate directly on the figure panel which other marker was assessed in double-labeling analyses. 

      We have added direct labels to figure panels to clarify double-labeled analyses in the revised Figure 5J and 5N.

      Reviewer #2 (Recommendations for the authors): 

      (1) Can the authors provide more detail about the distribution of CCK+ cells in the spinal cord and, in particular, the localization of double-stained (CCK/cfos) neurons? 

      We thank the reviewer for this suggestion. To better characterize the distribution of CCK⁺ neurons within the spinal dorsal horn (SDH), we performed immunostaining in CCK-tdTomato mice using lamina-specific markers: CGRP (lamina I), IB4 (lamina II), and NF200 (lamina III–V). Our results demonstrate that CCK⁺ neurons are primarily localized in the deeper laminae of the SDH. These findings are now described in the revised Results (lines 126–129) and shown in Figure 2E.

      In addition, we conducted c-Fos immunostaining in CCK-Ai14 mice and found increased activation of CCK⁺ neurons following CCI. This supports the involvement of CCK⁺ neurons in neuropathic pain. These data are included in the Results (lines 129–131) and Supplementary Figure S4.

      (2) Figure 2A. There is no formal quantification of the percentage of TdTomato+ neurons that are also CCK+. The description of these results is insufficient. 

      We appreciate this point and have revised the description of Figure 2A accordingly. To strengthen our analysis, we conducted additional FISH experiments with vGluT2 and VGAT probes. Quantification revealed that GPR30 is predominantly expressed in excitatory neurons (approximately 60%). These data are shown in the revised Results (lines 117-119) and Figures 2A-B and S3. This supports our conclusion that GPR30 is largely localized to excitatory spinal interneurons.

      (3) Figure 4H. What is the evidence that these are AMPA-mediated currents? This is not explained in the text. 

      Thank you for raising this point. We now provide detailed experimental procedures to clarify that the recorded EPSCs are AMPA receptor–mediated. Specifically, spinal slices from CCK-Cre mice were used, and excitatory postsynaptic currents were recorded in the presence of APV (100 μM, NMDA receptor blocker), bicuculline (20 μM, GABA_A receptor blocker), and strychnine (0.5 μM, glycine receptor blocker), ensuring that the observed currents were AMPA-dependent. These methodological details are now clearly described in the revised Results (lines 165–173) and supported by prior literature (Zhang et al., J Biol Chem 2012; Hughes et al., J Neurosci 2010).

      (1) Yan Zhang, Xiao Xiao, Xiao-Meng Zhang, Zhi-Qi Zhao, Yu-Qiu Zhang (2012). Estrogen facilitates spinal cord synaptic transmission via membrane-bound estrogen receptors: implications for pain hypersensitivity. J Biol Chem. Sep 28;287(40):33268-81.

      (2) Ethan G Hughes, Xiaoyu Peng, Amy J Gleichman, Meizan Lai, Lei Zhou, Ryan Tsou, Thomas D Parsons, David R Lynch, Josep Dalmau, Rita J Balice-Gordon (2010). Cellular and synaptic mechanisms of anti-NMDA receptor encephalitis. J Neurosci. 2010 Apr 28;30(17):5866-75.

      (4) What is the signaling mechanism leading to a larger amplitude of currents after G-1 infusion? 

      We thank the reviewer for this important question. G-1 is a selective agonist for GPR30. Based on previous studies by Luo et al. (2016), we speculate that activation of GPR30 may increase the clustering of glutamatergic receptor subunits at postsynaptic sites, thereby enhancing AMPA receptor-mediated currents. While our current study did not directly address the intracellular signaling cascade, we have incorporated this mechanistic speculation in the Discussion.

      Jie Luo, X.H., Yali Li, Yang Li, Xueqin Xu, Yan Gao, Ruoshi Shi, Wanjun Yao, Juying Liu, Changbin Ke (2016). GPR30 disrupts the balance of GABAergic and glutamatergic transmission in the spinal cord driving to the development of bone cancer pain. Oncotarget 7, 73462-73472. 10.18632/oncotarget.11867.

      (5) Figure 4I. Please include error bars. 

      We have revised Figure 4I to include error bars, as requested.

      (6) Line 198. What is the evidence that AAV2/1 EF1α FLP is an anterograde trans-monosynaptic marker? 

      We thank you for this request. AAV2/1 has been widely used for anterograde monosynaptic tracing based on its properties (Wang et al., Nat Neurosci 2024; Wu et al., Neurosci Bull 2021): (1) it infects neurons at the injection site and undergoes active anterograde transport; (2) newly assembled viral particles are released at synapses and infect postsynaptic partners; (3) in the absence of helper viruses, the spread halts at the first synapse, ensuring monosynaptic restriction. We have elaborated on this in the revised manuscript (line 198), citing Wang et al. (Nat Neurosci 2024) and Wu et al. (Neurosci Bull 2021).

      (1) Hao Wang, Qin Wang, Liuzhe Cui, Xiaoyang Feng, Ping Dong, Liheng Tan, Lin Lin, Hong Lian, Shuxia Cao, Huiqian Huang, Peng Cao, Xiao-Ming Li (2024). A molecularly defined amygdala-independent tetra-synaptic forebrain-to-hindbrain pathway for odor-driven innate fear and anxiety. Nat Neurosci. 2024 Mar;27(3):514-526.

      (2) Zi-Han Wu, Han-Yu Shao, Yuan-Yuan Fu, Xiao-Bo Wu, De-Li Cao, Sheng-Xiang Yan, Wei-Lin Sha, Yong-Jing Gao, Zhi-Jun Zhang (2021). Descending Modulation of Spinal Itch Transmission by Primary Somatosensory Cortex. Neurosci Bull. 2021 Sep;37(9):1345-1350.

      (7) Figure 5G. I do not understand the logic of this experiment. A Cre AAV is injected in the S1 cortex. Why should this lead to the expression of tdTomato on a downstream (postsynaptic?) neuron? The authors should quote the literature that supports this anterograde transsynaptic transport.

      We appreciate this question. As described in previous studies (e.g., Wu et al., Neurosci Bull 2021), AAV2/1-Cre injected into the S1 cortex leads to Cre expression in projection targets due to transsynaptic anterograde transport. Subsequent injection of a Cre-dependent AAV (AAV2/9-DIO-mCherry) into the spinal cord enables specific labeling of postsynaptic neurons that receive input from S1. We have clarified this mechanism in line 206 and provided the appropriate citation.

      Zi-Han Wu, Han-Yu Shao, Yuan-Yuan Fu, Xiao-Bo Wu, De-Li Cao, Sheng-Xiang Yan, Wei-Lin Sha, Yong-Jing Gao, Zhi-Jun Zhang (2021). Descending Modulation of Spinal Itch Transmission by Primary Somatosensory Cortex. Neurosci Bull. 2021 Sep;37(9):1345-1350.

      (8) The same question arises when interpreting the results obtained in Figure 6.

      We thank the reviewer for the question, and we have addressed it in point (7).

      (9) Line 257. How do the authors envision that estrogen would change its modulation of GPR30 under basal and neuropathic conditions? Is there any evidence for this speculation? 

      We thank the reviewer for raising this thoughtful question. In the current study, we focused on pharmacologically manipulating GPR30 activity via its selective agonist and antagonist. We did not directly investigate how endogenous estrogen regulates GPR30 under physiological and neuropathic states. We have recognized this limitation and highlighted the need for future research to investigate this regulatory mechanism.

      (10-20) In my opinion, the entire manuscript needs a careful revision of the English language. While one can follow the text, it contains numerous grammatical and syntactic errors that make the reading far from enjoyable. I am highlighting just a few of the many errors. 

      We appreciate the reviewer’s honest assessment. The manuscript has undergone thorough language editing by a native English speaker to correct grammatical errors, improve clarity, and enhance overall readability. We also restructured several sections, particularly the Discussion, to improve logical flow.

      (21) The discussion of results is a bit disorganized, with disconnected sentences and statements, and somewhat repetitive. For example, lines 303 to 306 lack adequate flow. It is also quite long and includes general statements that add little to the discussion of the new findings (lines 326-333). 

      We agree and have revised the Discussion extensively. Disconnected or repetitive sentences (e.g., lines 303-306, 326-333) have been removed or rewritten. For instance, we added a new transitional paragraph (lines 307-311) to improve flow:

      “Abnormal activation of neurons in the SDH is a key contributor to hyperalgesia, and enhanced excitatory synaptic transmission is a major mechanism driving increased neuronal excitability. Therefore, we evaluated excitatory postsynaptic currents (EPSCs) and observed increased amplitudes in CCK⁺ neurons following CCI, suggesting elevated excitability in these neurons.”

      We also removed redundant generalizations to maintain a focused discussion of our novel findings.

      Reviewer #3 (Recommendations for the authors): 

      (1) What is the distribution of GPR30 throughout the spinal cord and DRG? The authors demonstrate that this can overlap with a CCK+ population, but there are many GPR30+ and CCK negative neurons, even in the cropped images presented. It would be helpful to quantify the colocalization with CCK. 

      We thank the reviewer for this important point. As shown in the revised manuscript, GPR30 is expressed in both the spinal cord and dorsal root ganglia (DRG). However, our updated data (Figure 1B) demonstrate that Gper1 mRNA levels in the DRG are not significantly altered after CCI, suggesting a limited involvement of DRG GPR30 in neuropathic pain. These results are described in the revised Results (line 94).

      Regarding spinal co-expression, we performed a detailed quantification. Approximately 90% of CCK⁺ neurons express GPR30, while about 50% of GPR30⁺ neurons are CCK⁺. These co-localization results are now included in the revised Results and presented in Figure 2G.

      (2) It is clear that CCI and GPR30 influence excitatory synaptic transmission in CCK+ neurons. However, these experiments do not fully support the authors' claims of a postsynaptic upregulation of AMPARs. Comparing amplitudes and frequencies of spontaneous EPSCs cannot necessarily distinguish a pre- vs postsynaptic change since some of these EPSCs can arise from spontaneous action potential firing. I suggest revising this conclusion. 

      We appreciate these insightful comments. We fully agree that our data from spontaneous EPSC recordings (sEPSCs) in CCK⁺ neurons are not sufficient to distinguish between pre- and postsynaptic mechanisms, as sEPSCs may include spontaneous presynaptic activity. Therefore, we have revised the text throughout the manuscript to avoid overstating conclusions related to postsynaptic AMPA receptor upregulation.

      (3) What is the rationale for the evoked EPSC experiments from electrical stimulation in "the deep laminae of SDH?" I do not think that this experiment can rule out a presynaptic contribution of GPR30 to the evoked responses, particularly if these are Gs-coupled at presynaptic terminals. Paired-pulse stimulations could help answer this question, otherwise, alternative interpretations, also related to the point above, should be provided. 

      We thank the reviewer for this thoughtful critique. Indeed, electrical stimulation of the deep SDH laminae does not exclude presynaptic involvement, especially considering that GPR30 is a G protein–coupled receptor (GPCR) and could act presynaptically. We agree that paired-pulse ratio (PPR) analysis would be more informative in distinguishing pre- from postsynaptic effects, but this was not performed due to technical limitations in our current experimental setup.

      Accordingly, we have revised our interpretations in both the Results and Discussion to acknowledge that our data do not rule out presynaptic contributions. We now state that GPR30 activation enhances EPSCs in CCK⁺ neurons, while further studies are needed to dissect the precise site of action.

      (4) I appreciate the challenging nature of the trans-synaptic viral labeling approaches, but the chemogenetic and Gper knockdown experiments do not selectively target this CCK+ population of deep dorsal horn neurons. The data are clear that each of these components (descending corticospinal projections, CCK neurons, and GPR30) can modulate nociceptive hypersensitivity, but I do not agree with the overall conclusion that they are directly linked as the authors propose. I recommend revising the overall conclusion and title to reflect the convincing data presented. 

      We thank the reviewer for this critical observation. We agree that while our data show functional roles for descending cortical input, CCK⁺ neurons, and GPR30 in modulating pain hypersensitivity, the evidence does not establish a definitive direct circuit integrating all three components.

      In response, we have revised our conclusions to reflect this limitation. Specifically, we avoided claiming a direct functional link among S1 projections, CCK⁺ neurons, and GPR30. Instead, we now propose that GPR30 modulates neuropathic pain primarily through its action in CCK⁺ spinal neurons, with potential involvement of descending facilitation from the somatosensory cortex.

      Additionally, we have revised the manuscript title to better reflect our mechanistic focus:

      “GPR30 in spinal CCK-positive neurons modulates neuropathic pain.”

      Minor Corrections

      (1) The authors should refer to mice by sex, not gender. 

      Corrected throughout the manuscript.

      (2) Page 9, line 195: "significantly" is used to refer to co-localization of 28.1%. What is this significant to? 

      We have revised the sentence to accurately describe the observed percentage, without implying statistical significance:

      “Our co-staining results revealed that a high proportion of CCK⁺ S1-SDH postsynaptic neurons expressed GPR30” (line 198-199).

      (3) I recommend modifying some of the transition phrases like "by the way," "what's more," and "besides". 

      All informal expressions have been replaced with academic alternatives including “Furthermore,” “Additionally,” and “Moreover.”

      (4) Additional guides to mark specific laminae in the dorsal horn would be useful. 

      We added immunostaining with laminar markers (CGRP for lamina I and NF200 for lamina III–V), and these data are now shown in Figure 2E and described in the Results (lines 126-129).

      (5) Page 5, line 115: immunochemistry should be immunohistochemistry. 

      Corrected as suggested.

      (6) Page 6, line 136: "Confirming the structural connnections" was not demonstrated here. Perhaps co-localization between GPR30 and CCK+. 

      The text was revised to “To functionally interrogate GPR30 and CCK⁺ neurons in neuropathic pain...” (line 133).

      (7) Page 8, line 166: unsure what "took and important role" means. 

      This phrasing was corrected for clarity and replaced with an accurate scientific description.

      (8) Page 8, line 168: "IPSCs of spinal CCK+ neurons" implies that they are sending inhibitory inputs. 

      We revised the term to “EPSCs” to correctly reflect excitatory synaptic currents in CCK⁺ neurons.

      (9) Page 8, line 169: "Known that EPSCs" is missing an introductory phrase. 

      The sentence was rewritten to include an appropriate introductory clause (lines 161–164):

      “Given that EPSCs are primarily mediated through glutamatergic receptors such as AMPA receptors...”

      (10) Page 10, line 227 and 228: "adequately" and "sufficiently" should be adequate and sufficient. 

      We corrected these terms to the proper adjective forms: “adequate” and “sufficient” (lines 224-225).

    1. eLife Assessment

      This study presents a valuable finding regarding the role of oxytocin neurons in thermogenesis and behavioral thermoregulation. The use of numerous converging methods, including behavior, fiber photometry, optogenetics, thermal recordings, metabolic analyses, and more, produces a multi-dimensional dataset delivering findings that provide solid support for the conclusions. Conclusions would be strengthened with validation of the approaches, inclusion of a loss-of-function experiment, and further investigation of the social nature of the behavior. The maternal findings are, at present, somewhat disconnected from the conclusions. The findings are novel and open new doors for understanding the role of the PVT and oxytocin in thermoregulation; the work will be of strong interest to the thermoregulation, social behavior, and oxytocin signaling communities.

    2. Reviewer #1 (Public review):

      Summary:

      The authors identify and investigate a specific population of PVNOT neurons (oxytocin neurons of the paraventricular hypothalamus) that seem to be involved in both behavioral and autonomic thermoregulation. These cells are activated by social thermoregulatory behaviors, but can influence thermoregulation in both social and nonsocial contexts, specifically during transitions and when mice are at low core body temperature (Tb).

      Strengths:

      The manuscript has many strengths.

      This is a novel study, with a clear question that is addressed using an array of well-designed experiments employing integrative methods. Most of the figures are well-developed, and the analysis is generally rigorous and well-detailed. The authors are clearly very experienced in this field, and indeed, their scholarly introduction and discussion sections are to their credit.

      The link between thermoregulation and the oxytocin system is well established, as is the link between social behavior and the same broad system. However, the link between these three things is novel, if it can be well substantiated. I am not persuaded that was achieved here, but I do think this manuscript has many novel and useful offerings.

      The authors use a cooling floor, and only go down to 10 degrees Celsius. This is fine, but I would like to see the effects using ambient temperature also. This is not a crucial issue, as it is not necessary for the authors' interpretations, but it could improve measurement sensitivity.

      Through an elegant behavioral experiment in Figure 1, the authors identify c-Fos patterns in the PVN that are activated by active social huddling, and they show that at the RNA level these cells overlap with oxytocin, indicating that they are oxytocin-producing cells. But this is not well discussed or indeed quantified.

      The authors engage in a deep analysis of fiber photometry experiments, first by observing overall PVNOT neuron activity during a variety of different behaviors in the context of three different temperatures. Activity was associated with nesting, quiescence, and both types of huddling (when social opportunities exist). Social situations did not strongly affect this, nor did temperature conditions. These analyses indicate that the PVNOT neurons are involved in mediating specific behavioral outputs.

      With more detailed analysis, the authors investigated how PVNOT neuronal activity relates to behavioral state transition. They found that the probability of peak PVNOT neural activity strongly predicts the offset of quiescence or quiescent huddling, and therefore can be argued to signal an increase in physical activity, and as such, increased metabolism. However, the opposite pattern was observed for huddling and nesting (onset being associated with PVNOT activity), again arguing for increased thermogenesis as a function.

      What is particularly compelling is that these peaks of activity tend to occur during low Tb, again arguing for the function in increasing body warmth.

      The authors then employ an impressive setup where they image brown adipose tissue (BAT) in tandem with DeepLabCut (DLC) based animal tracking. Crucially, BAT activity and surface temperature correlated with the calcium peak of PVNOT neurons.

      Lastly, optogenetic activation of PVNOT neurons increased Tb when it was in the lower range, but not when in the higher range. It also affected BAT and rump temperature, again at low Tb. However, there is no real effect on behavior, except a trend in activity.

      The authors do some interesting tracing work at the end, though this is not functionally explored. That is not a criticism, as it does seem like this would be a whole follow-up study.

      Weaknesses:

      While novel and valuable, the manuscript feels incomplete in its current form.

      The main evidence lacking is a loss-of-function experiment. Ideally, the authors would chronically and/or acutely inhibit PVNOT neurons to establish their necessity. I know this seems obvious, but I think it is important.

      The relative lack of behavioral analysis following optogenetic activation of PVNOT neurons is puzzling. The authors must surely want to study what this intervention does to behavioral state transitions. I feel that the current level of analysis limits the overall conclusions of this study to a large extent.

      A broader criticism is that the social dimension of this manuscript seems overplayed. Naturally, oxytocin signalling can be implicated in social behavior based on a large literature. However, the focus on social thermogenesis seems like a crude integration of social behavior and thermogenesis. Given that the authors see their effects in both social and nonsocial cases of thermoregulation, I am not sure the attempts at integrating social functions and thermogenic functions of PVNOT neurons are warranted. That is, unless the authors have further experiments or analysis that can convincingly justify this link.

      In addition, the analysis of virgin females and lactating mothers seems out of place in Figure 4.

      The c-Fos/oxytocin overlap needs to be quantified.

      The methods section could be improved by explaining how the authors excluded animals that exhibited both types of huddling within the same 90-minute time window. This seems like it could cause significant confounds.

      The computer vision model is not well-explained. The authors need to be far more explicit here about how it was validated.

      The authors should cite and consider this preprint: https://www.biorxiv.org/content/10.1101/2024.09.17.613378v1

    3. Reviewer #2 (Public review):

      Summary:

      This is a very interesting study from Vandendoren and colleagues examining the role of PVN oxytocin neurons during thermoregulatory behaviors, in particular during thermoregulatory huddling. The findings are important and compelling, and have implications for the thermoregulation field as well as the social/naturalistic behavior field.

      Strengths:

      The study is very creative and tackles a challenging task to examine how natural and social behavior influences neural circuits for a homeostatic system such as thermoregulation. The authors use a combination of state-of-the-art tools (photometry, optogenetics, automated behavior tracking, thermal imaging, and core body temperature measurement), often in combination with each other, to produce a rigorous and high-dimensional dataset. Carrying out tightly temperature-controlled experiments and examining natural behavior, neural activity, and body physiology simultaneously is quite a feat. I applaud the authors for taking this on in a rigorous and detailed manner. This paper will be valuable for both the thermoregulation field as well as for researchers interested in naturalistic social behaviors. The conclusions are supported by the data.

      Weaknesses:

      I have a number of questions and suggestions for clarification that would help improve the interpretation of the findings.

      (1) Figure 1D-F: It would be helpful to include representative images of cFos expression in the PVN, LS, and DMH during both quiescent and solo huddling conditions, to better illustrate the reported differences.

      (2) Figure 1C: The data suggest a general suppression of neural activity during sleep-associated quiescent huddling, which somewhat complicates the interpretation of what specifically the active huddling cells are responding to. A more informative control might have been a comparison between huddling and a more generic form of social engagement (e.g., dyadic sniffing) to assess whether huddling-responsive neurons are broadly tuned to social stimuli. While it may not be feasible to add this experimentally at this time, a brief discussion of this limitation in the main text would be valuable.

      (3) Figure 2H-J vs. Figure 1: The fiber photometry data suggest increased PVN activity during quiescent huddling vs. active huddling, which appears to contrast with the cFos results from Figure 1. It would be helpful for the authors to comment on possible reasons for this discrepancy (e.g., methodological differences, temporal resolution, or cell-type specificity).

      (4) Figure 2O: A comparable linear regression for active huddling would be informative to assess whether the observed relationships extend across behavioral states.

      (5) Temperature manipulation: The use of floor temperature changes presents a distinct physiological and sensory experience from, for example, manipulation of ambient temperature. A discussion of how this choice may affect neural circuit engagement or interpretation of thermoregulatory responses would be beneficial.

      (6) Correlations with behavior: Across the manuscript, it would be informative to see correlations between huddle duration and neural activity (e.g., cFos expression, calcium signal magnitude). Similarly, do longer huddles produce greater thermogenic effects?

      (7) Lactating mothers vs. virgin females: The inclusion of maternal data is intriguing but feels somewhat disconnected from the central huddling-thermoregulation narrative. If these experiments are to remain, additional explanation of their rationale and how they fit into the broader story would help clarify their relevance.

      (8) Optogenetic manipulation: Have the authors tested the effect of PVN OT neuron stimulation or inhibition during huddling? Even a negative result would be of interest to the field. If these data exist (main or supplementary), I apologize for missing them. If not, the authors might consider including them or commenting briefly on any attempts or challenges in carrying out these experiments.

    4. Reviewer #3 (Public review):

      Summary:

      The authors aimed to elucidate the relationship between physiological state (i.e., behavioral status and thermogenic sympathetic activity) and the activity of hypothalamic paraventricular oxytocin (PVNOT) neurons in female mice. They studied this by combining automated classification of mouse behavior via video-based analysis with calcium imaging of PVNOT neuron activity. Sympathetic thermogenesis was inferred from surface temperature changes captured by infrared thermography, and the authors provided their custom analysis scripts in the manuscript. Notably, they found that a strong, pulsatile activation of PVNOT neurons was "occasionally" observed immediately before the animals transitioned from a resting to an active state. This pulsatile activity was observed in both pair-housed and individually housed animals. While PVNOT neurons are often associated with social behaviors, this finding suggests that the oxytocinergic system is also engaged during naturalistic behaviors, even in the absence of social interactions. If experiments were more convincingly performed and presented, the results would point to a broader physiological role of central oxytocin, including in the regulation of fundamental brain states and homeostatic processes, and offer a new perspective on the functional significance of central oxytocin signaling.

      Strengths:

      The oxytocinergic neural system is believed to subserve a wide range of physiological functions, and elucidating these roles requires monitoring PVNOT neuronal activity under various behavioral contexts, as well as manipulating this activity to establish causal links. In the present study, the authors show a technically sound experimental framework that integrates behavioral tracking in both individually and group-housed mice with the observation and manipulation of PVNOT neuron activity. This experimental setup represents a valuable methodological resource for researchers investigating the physiological functions of oxytocin.

      Weaknesses:

      While this study successfully established a new experimental setup for simultaneous analyses of behavior and PVNOT neuronal activity, there are several concerns regarding the interpretation of the results and the robustness of the conclusions, which should be more thoroughly addressed.

      (1) The study relies on the assumption that calcium imaging and optogenetic manipulation were restricted only to PVNOT neurons. However, the specificity of AAV-mediated gene expression was not verified quantitatively. A fair number of cell bodies in the PVN expressed GCaMP8s, but not OT, indicating potential off-target expression (see Figure S2A, B). The lack of quantitative validation weakens confidence in the causal interpretation of the results.

      (2) The study focuses on the transition from rest to active states following pulsatile activity of PVNOT neurons. However, the physiological significance of this pulsatile activity remains unclear. According to the authors, pulsatile activity occurred with an approximately 20% probability within 100 seconds prior to the end of the resting state. This implies that, in the remaining 80% of rest-to-active transitions, pulsatile PVNOT activity did not occur, suggesting that it is not essential for initiating the transition. A comparative analysis of behavioral and thermogenic changes between transitions with and without pulsatile PVNOT activity would help to further clarify the functional relevance of this phenomenon and strengthen the authors' interpretation of the findings.

      (3) The study identifies a correlation between pulsatile activity of PVNOT neurons and rest-to-active transitions, and tests for a causal relationship using optogenetic stimulation. However, since PVNOT neurons are known to co-release other neurotransmitters such as glutamate, it remains unclear whether the observed effects are mediated specifically through oxytocin receptor signaling. To address this question, functional intervention experiments using oxytocin receptor antagonists or receptor knockout mice are necessary.

      (4) The authors attempted to detect BAT thermogenesis and skin vasomotion using infrared thermography. This technique measures only the surface temperature of the fur (since the skin was not shaved), but does not measure "BAT temperature" or "vasomotor tone". As seen in Figure 5E, the temperatures of the body surface areas ("BAT", "Rump", and "Dorsal surface") mostly changed in parallel, indicating that these temperatures are strongly affected by body core temperature. Therefore, the thermographic measurements in this study did not provide convincing information on BAT thermogenesis or skin vasomotion. To avoid misleading reports, the authors need to use other techniques to directly measure temperatures, such as telemetry.

      (5) Photostimulation of PVNOT neurons increased Tb after 400 sec (6.6 min) (Figure 5). This latency is too long to conclude that the neuronal stimulation elicited BAT thermogenesis. A more reasonable explanation is that the increase in Tb was caused by the induction of physical activity (Figure S4C), which slowly generates heat and contributes to the elevation of Tb. However, this view contradicts the authors' claim. To address this concern, the authors should directly measure BAT thermogenesis and compare it with the rate of Tb elevation. If BAT thermogenesis occurs, the rate at which the BAT temperature increases must exceed the rate at which Tb rises.

    5. Author response:

      (1) Maternal lactation assay and PVN oxytocin neuron identity

      Reviewers and editors noted that the maternal lactation assay felt out of place (Editors, R1, R2) and asked for clearer validation of AAV specificity in the PVN (R3). These issues are linked: the primary purpose of the lactation assay was to physiologically validate that the recorded neurons are oxytocinergic, as PVNOT neurons exhibit well-established pulsatile activity during lactation.

      In response, we will (i) explicitly frame the lactation assay as a validation experiment, (ii) streamline its presentation to sit naturally with our identity-validation rationale, and (iii) clarify our AAV targeting and expression controls; we will also address our oxytocin immunohistochemistry quantification and its limitations (we observed notable intra-individual and technical variability in oxytocin immunoreactivity), which motivated the complementary physiological approach.

      (2) Clarifications and analyses.

      The reviewers pointed to several analyses, inferences, and conclusions that should be clarified. We will clarify: (i) the oxytocin histology in Figure 1 (marker definitions and quantification), (ii) the roles of floor versus ambient temperature, (iii) the quantitative links among behavioral state, neural activity, and body temperature (e.g., behavior bout duration vs. neural responses and Tb), and (iv) the computer vision methodology. These additions will address the reviewers’ requests for clearer inferences and presentation.

      (3) Optogenetic inhibition. 

      We appreciate the suggestion to include an inhibition experiment (Editors, R1, R2). While interesting, this is beyond the scope of the current revision. Our stimulation experiments were designed to functionally test a specific observation from calcium imaging, namely, that PVNOT neurons show bursts of heightened activity at transitions from quiescence to arousal/thermogenesis, and to assess causal sufficiency for thermogenic/arousal-related readouts. We will make this rationale explicit, discuss the scope limits of the current dataset, and note inhibition as an important direction for future work.

    1. eLife Assessment

      This valuable study identifies a brown adipose tissue-specific heat shock factor 1-alcohol dehydrogenase 5 (ADH5) molecular cascade as a regulator of systemic aging, showing that ADH5 deficiency contributes to BAT dysfunction and health decline in aged mice. While there is evidence to support this mechanism, the conclusions remain incomplete, particularly regarding statistical rigor and clarity in data presentation.

    2. Reviewer #1 (Public review):

      Sebag et al. addressed the role of ADH5 in BAT in the development of aging and the metabolic derangements associated with it. This is a follow-up to the authors' earlier demonstration of the role of BAT ADH5 in glucose homeostasis, obesity, and cold tolerance. By ablating ADH5 specifically in brown adipocytes or pharmacologically modulating ADH5 through activation of its transcription factor, the authors conclude that preservation of BAT function is crucial for healthy aging and ADH5 is causally involved in this process. The topic is appealing given the rise in the aging population and the unclear role of BAT function in this process. Overall, the study uses several techniques, is easy to follow, and addresses several physiological and molecular manifestations of aging. However, the study lacks an appropriate statistical analysis, which severely affects the conclusions of the work. Therefore, interpretation of the findings is limited and must be done with caution.

    3. Reviewer #2 (Public review):

      Summary:

      This study investigates the role of the enzyme Alcohol Dehydrogenase 5 (ADH5) in brown adipose tissue (BAT) during aging. BAT is crucial for thermogenesis and energy balance, but its function and mass diminish with age, contributing to metabolic dysfunction and age-related diseases. ADH5, also known as S-nitrosoglutathione reductase, regulates nitric oxide (NO) signaling by removing S-nitrosylation modifications from proteins. The authors show that aging in mice leads to increased protein S-nitrosylation but reduced ADH5 expression in BAT, resulting in impaired metabolic and cognitive functions. Deletion of ADH5 in BAT accelerates tissue senescence and systemic metabolic decline.

      Mechanistically, aging suppresses ADH5 via downregulation of heat shock factor 1 (HSF1), a master regulator of protein homeostasis. Importantly, pharmacologically boosting HSF1 improves BAT function and mitigates both metabolic and cognitive declines in aged mice. The findings highlight a critical HSF1-ADH5 pathway in BAT that protects against aging-related dysfunction, suggesting that targeting this pathway may offer new therapeutic strategies for improving metabolic health and cognition during aging.

      Strengths:

      This research provides insight into the interplay between redox biology, proteostasis, and metabolic decline in aging. By identifying a specific enzyme that controls SNO status in BAT and further developing a therapy to target ADH5 in BAT to prevent age-related decline, the authors have identified a putative mechanism to combat age-related decline in BAT function.

      Weaknesses:

      (1) Sex needs to be considered as a biological variable, at a minimum in the reporting of the phenotypes observed in this manuscript, but also potentially by further experimentation. The only mention of sex I could find is that the authors reported the general protein SNO status in BAT is increased with age in male C57Bl/6J mice. Is this also true in female mice? For all of the ADH5 knockout mouse data, are these also male mice? Do female ADH5 knockout mice have a consistent phenotype, or are there sex differences?

      (2) It would be helpful to know the extent of ADH5 loss in the adipose tissue of knockout mice, either by mRNA or by immunoblotting for ADH5. It could also be helpful to know if ADH5 is deleted from the inguinal adipose tissue of these mice, especially since they seem to accumulate fat mass as they age (Figure 2B).

      (3) For Figure 4D, the ChIP, it would be better to show the IgG control pulldowns. Also, there's an unexpected thing where all the values for the Adh5 flox mice are exactly the same - how is this possible? Finally, it's not clear how these BAT samples were treated with HSF1A - was this done in vivo or ex vivo?

      (4) I didn't understand what was on the y-axis in Figure 5A, nor how it was measured. I assume it's HSF1A, and maybe it's the part in the methods with the Metabolomic Analysis, but this wasn't clear. It would also help if release from the NC-Vehicle formulation could be included as a negative control.

      (5) What happens to BAT protein S-nitrosylation in HSF1A-treated mice?

      (6) Figure 1B: What is the age of the positive (ADH5BKO) and negative (Adh5 fl) mice?

      (7) Figure 1F: Can you clarify what I'm looking at in the P16ink4a panels? The red staining? Is the blue staining DAPI? This is also a problem in Figures 3C, 3D, 5G, and 5I. Figure 4B looks great - maybe this could be used as an example?

      (8) Figure 3B looks a bit odd since 7 of the 12 total mice seem to have an IL-1β level of exactly 5. I was a bit unclear about why arbitrary units were used for IL-1β levels since it says an ELISA was used to quantify IL-1β; however, in the methods the authors describe a Bio-Rad Laboratories Bio-plex Pro Mouse Cytokine 23-Plex approach, which I don't think is an ELISA. Can the approach to measuring IL-1β be clarified, and could the authors explain why they can't show units of mass for IL-1β levels?

      (9) Figure 2C and 2D: I don't really understand why the Heat or VO2 need to be expressed as fold changes. Can't these just be expressed with absolute units? It's also confusing why the heat fold change is 1.0 in the light and the dark for the floxed animal. I bet this is because the knockout is normalized to the floxed animal for light and then normalized again for the dark period, but since both are on the same graph, readers could be confused into thinking there is no difference in the heat production or VO2 between light and dark, which would be surprising. This could all just be solved if absolute units were used.

    4. Author response:

      Reviewer #1 (Public review):

      The topic is appealing given the rise in the aging population and the unclear role of BAT function in this process. Overall, the study uses several techniques, is easy to follow, and addresses several physiological and molecular manifestations of aging.  However, the study lacks an appropriate statistical analysis, which severely affects the conclusions of the work. Therefore, interpretation of the findings is limited and must be done with caution. 

      We greatly appreciate the reviewer’s encouragement. Our team is fully committed to maintaining clarity and rigor in the design, execution, and reporting of this study. We are grateful to the reviewers for bringing these issues to our attention. We also acknowledge that several statistical analyses should be re-performed to better emphasize our focus on the genetic effect of ADH5 deletion in mice of the same age, and we are working on this.

      Reviewer #2 (Public review):

      Strengths: 

      This research provides insight into the interplay between redox biology, proteostasis, and metabolic decline in aging. By identifying a specific enzyme that controls SNO status in BAT and further developing a therapy to target ADH5 in BAT to prevent age-related decline, the authors have identified a putative mechanism to combat age-related decline in BAT function. 

      We greatly appreciate the reviewer’s encouragement. 

      Weaknesses: 

      (1) Sex needs to be considered as a biological variable, at a minimum in the reporting of the phenotypes observed in this manuscript, but also potentially by further experimentation. 

      We thank the reviewer for the insightful remark, and we agree with the reviewer that sex needs to be considered as a biological variable. We will assess ADH5 expression in aged female mice.

      (2)  It would be helpful to know the extent of ADH5 loss in the adipose tissue of knockout mice, either by mRNA or by immunoblotting for ADH5. It could also be helpful to know if ADH5 is deleted from the inguinal adipose tissue of these mice, especially since they seem to accumulate fat mass as they age (Figure 2B). 

      We thank the reviewer for the comment/suggestion. Indeed, we have measured the ADH5 expression in both brown adipose tissue (BAT) and inguinal adipose tissue (iWAT). We regret that we did not include our results in the first submission and will provide these results in the revised manuscript.

      (3) For Figure 4D, the ChIP, it would be better to show the IgG control pulldowns. Finally, it's not clear how these BAT samples were treated with HSF1A - was this done in vivo or ex vivo? 

      We thank the reviewer for their thoughtful comment and will provide detailed information in the revised manuscript.

      (4) I didn't understand what was on the y-axis in Figure 5A, nor how it was measured.

      We apologize for not making these critical points clearer in the first submission. In the revised manuscript we will include, in detail, the logistics of the experiments in the materials and methods section, figure annotation and figure legends.  

      (5) What happens to BAT protein S-nitrosylation in HSF1A-treated mice? 

      We thank the reviewer for the insightful remark, and we will measure general protein S-nitrosylation status in the BAT of HSF1A-treated mice. 

      (6) Figure 1B: What is the age of the positive (ADH5BKO) and negative (Adh5 fl) mice? 

      We regret that we did not describe our results clearly in the first submission and will provide detailed information in the revised manuscript.

      (7) Figure 1F: Can you clarify what I'm looking at in the P16ink4a panels? The red staining? Is the blue staining DAPI? This is also a problem in Figures 3C, 3D, 5G, and 5I. Figure 4B looks great - maybe this could be used as an example?  

      We regret that we did not present results clearly in the first submission and will provide detailed information in the revised manuscript.

      (8) Figure 3B looks a bit odd. Can the approach to measuring IL-1β be clarified, and could the authors explain why they can't show units of mass for IL-1β levels? 

      We will provide detailed information in the revised manuscript.

      (9) Figure 2C and 2D: I don't really understand why the Heat or VO2 need to be expressed as fold changes. Can't these just be expressed with absolute units? 

      We thank the reviewer for the insightful comment. We will present these results as suggested in the revised manuscript.

    1. eLife Assessment

      This modelling study tests several hypotheses describing how seasonality and migration drive the epidemiology of Rift Valley Fever Virus among transhumant cattle in The Gambia. The work is methodologically solid, and findings offer valuable insights into how the movement of cattle in and out of the Gambia River and Sahel ecoregions could lead to source-sink transmission dynamics among cattle subpopulations, sustaining endemic transmission.

    2. Joint Public Review:

      Summary:

      This study uses data from a recent RVFV serosurvey among transhumant cattle in The Gambia to inform the development of an RVFV transmission model. The model incorporates several hypotheses that capture the seasonal nature of both vector-borne RVFV transmission and cattle migration. These natural phenomena are driven by contrasting wet and dry seasons in The Gambia's two main ecoregions and are purported to drive cyclical source-sink transmission dynamics. Although the Sahel is hypothesized to be unsuitable for year-long RVFV transmission, findings suggest that cattle returning from the Gambia River to the Sahel at the beginning of the wet season could drive repeated RVFV introductions and ensuing seasonal outbreaks. The model is also used to evaluate the potential impacts of cattle movement bans on transmission dynamics, although these latter findings are less certain given various simplifying assumptions.

      Strengths:

      Like most infectious diseases in animal systems in low- and middle-income countries, the transmission dynamics of RVFV in cattle in The Gambia are poorly understood. This study harnesses important data on RVFV seroepidemiology to develop and parameterize a novel transmission model, providing plausible estimates of several epidemiological parameters and transmission dynamic patterns.

      This study is well written and easy to follow.

      The authors consider both deterministic and stochastic formulations of their model, demonstrating potential impacts of random events (e.g., extinctions) and providing confidence regarding model robustness.

      The authors use well-established Bayesian estimation techniques for model fitting and confront their transmission model with a seroepidemiological model to assess model fit.

      Elasticity analyses help to understand the relative importance of competing demographic and epidemiological drivers of transmission in this system.

      Weaknesses:

      The model predicts relatively stable annual dynamics reminiscent of a seasonal endemic pathogen, but RVF in sub-Saharan Africa is often characterized as causing periodic epizootics with sustained lulls in between outbreaks. Do the authors believe this conventional wisdom regarding RVF epidemiology is wrong, and that their results better support that transmission patterns are seasonal but truly relatively stable year-over-year, at least in The Gambia? The authors should discuss whether these predicted dynamics could be an artefact of the model's structure, and what ramifications this could have for their conclusions.

      It is unclear how the network analysis is used to inform the model. The network (Figure S2) suggests a highly fragmented population, which could better support, for example, a herd metapopulation approach. The first results section highlights that transhumant movements cover large distances (perhaps to justify the assumption of homogeneous mixing within each ecoregion?), but the median (13.5 km) is quite short.

      The model does not include an impact of infection on cattle birth rates, but the authors highlight the well-known impacts of RVF epizootics on cattle abortion and neonatal death.

      ODEs for M herds in the dry season are missing from the appendix. Even in the absence of transmission among this subpopulation in this season, demographic turnover should influence its SIR population dynamics. Were these not included in the model or simply omitted from the text?

      The importance of the RVFV positivity decay rate is highlighted, but the loss of immunity is not considered in the SIR model. The authors do discuss uncertainty regarding model structure, but could better justify their choice. Is there evidence of reduced infection risk among previously infected seronegatives, and why was an SIRS model not considered? How might findings be expected to differ under an SIRS model?
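      To make the SIR-versus-SIRS question concrete, the following is a minimal sketch (not the authors' model) of how adding waning immunity changes long-run dynamics. It assumes a single closed, well-mixed population with no births, deaths, migration, or seasonality, and uses hypothetical parameter values (`beta`, `gamma`, `omega`) that are not taken from the study. With `omega = 0` (SIR) infection burns out after one epidemic; with any `omega > 0` (SIRS) recovered animals return to the susceptible pool and an endemic equilibrium is possible.

```python
# Minimal SIR vs SIRS comparison (illustrative only; not the study's model).
# Assumptions: one closed, well-mixed population (no births, deaths,
# migration, or seasonality); proportions S + I + R = 1; forward-Euler steps.


def simulate(beta, gamma, omega, days=7300.0, dt=0.1, i0=1e-3):
    """Return (S, I, R) after `days`; omega = 0 gives SIR, omega > 0 gives SIRS."""
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(int(round(days / dt))):
        new_inf = beta * s * i  # infection flux S -> I
        recov = gamma * i       # recovery flux I -> R
        waning = omega * r      # loss of immunity R -> S (zero under SIR)
        s += dt * (waning - new_inf)
        i += dt * (new_inf - recov)
        r += dt * (recov - waning)
    return s, i, r


if __name__ == "__main__":
    # Hypothetical rates per day: R0 = beta / gamma = 2.5, immunity lasting ~180 days.
    beta, gamma, omega = 0.5, 0.2, 1 / 180
    _, i_sir, _ = simulate(beta, gamma, 0.0)
    _, i_sirs, _ = simulate(beta, gamma, omega)
    # Analytic SIRS endemic prevalence: I* = omega * (1 - gamma/beta) / (gamma + omega)
    i_star = omega * (1 - gamma / beta) / (gamma + omega)
    print(f"SIR  final prevalence: {i_sir:.2e}")
    print(f"SIRS final prevalence: {i_sirs:.4f} (analytic {i_star:.4f})")
```

      In the study's setting, births and seasonal cattle movements also replenish susceptibles, so this closed-population toy exaggerates the contrast; it only shows why the choice between SIR and SIRS structure can shift predicted persistence.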

      Shouldn't disease-induced host death be included in the serocatalytic model? A high RVF mortality rate has been estimated, and FOI is relatively high, suggesting a non-negligible impact of RVF death on seroprevalence dynamics, and indeed possibly a greater impact than seroreversion.
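      As a hedged illustration of this point, the standard reversible catalytic model with constant force of infection \(\lambda\) and seroreversion rate \(\rho\) can be extended with an infection-fatality probability \(\phi\). The sketch below assumes, for illustration only, that a fraction \(\phi\) of infections die immediately and that background mortality \(\mu\) is the same for seronegative and seropositive animals, so it cancels from the seroprevalence. Tracking surviving seronegative (\(N_-\)) and seropositive (\(N_+\)) animals by age \(a\), with \(P = N_+/(N_- + N_+)\):

```latex
\begin{aligned}
\frac{dN_-}{da} &= -\lambda N_- + \rho N_+ - \mu N_-, \\
\frac{dN_+}{da} &= (1-\phi)\,\lambda N_- - \rho N_+ - \mu N_+, \\
\frac{dP}{da}   &= \lambda\,(1-P)\bigl[1 - \phi\,(1-P)\bigr] - \rho P .
\end{aligned}
```

      Setting \(\phi = 0\) recovers \(dP/da = \lambda(1-P) - \rho P\). For \(\phi > 0\) the effective seroconversion term is smaller, so a model fitted without disease-induced death could attribute lower-than-expected seroprevalence to seroreversion (a larger \(\rho\)) rather than to mortality.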

      It is helpful that the authors have described findings from the previously conducted household survey, which is a key foundation for the model, but it needs to be made clearer which work was already conducted as part of the previous study, in particular in the Methods sections "RVFV seroprevalence & household survey data" and "Epidemiological setting & cattle population structure". The same applies to the appendix sections "Study Area" and "Data Collection".

      The study limitations paragraph is vague. What modelling assumptions have introduced the greatest uncertainty, and what implications could this have for study conclusions?

      There are two main issues with the simulations of a ban on transhumant movement:

      The introduction rightly highlights the importance of pastoral lifestyles for subsistence farmers in the Gambia. It therefore seems likely that bans on transhumant movement would pose major socioeconomic and ethical challenges in addition to obvious practical ones. Is such an intervention even a remote possibility?

      The model's structure, including homogeneous mixing within each ecoregion and step-change seasonality, allows for estimation of generalized transmission rates at a macro scale. However, it greatly simplifies the movement process itself and assumes that transhumant cattle movement is the only mechanism for RVF reintroduction into the Sahel region. The model is therefore likely to misrepresent the potential impacts of movement bans on transmission. As studies in healthcare settings, for example, have shown, where fine-scale contact data are available, incorporating the specific and complex nature of inter-individual contact can change not only the magnitude but also the direction of intervention impacts relative to predictions from a model that assumes homogeneous mixing. The conclusions of this work regarding the impacts of movement bans therefore seem poorly supported.

      This model seems better suited to exploring, for example, cattle vaccination, including potential differences in efficiency when targeting T herds rather than M or L herds.

    3. Author response:

      (1) Stable annual dynamics vs. episodic outbreaks

      We agree that RVF is classically described as producing periodic epidemics interspersed with long inter-epidemic periods, often linked to extreme rainfall events. Our model predicts more regular seasonal dynamics, which reflects the endemic transmission patterns we have observed in The Gambia through serological surveys. In the revision, we will:

      Clarify that while epidemics occur in other parts of sub-Saharan Africa, our results may indicate a different epidemiological narrative in The Gambia, with sustained but low-level circulation (hyperendemicity).

      Discuss how model assumptions (e.g. seasonality, homogeneous mixing) may bias results toward stable dynamics.

      Highlight the implications of this for interpretation and for public health decision-making.

      (2) Use of network analysis

      We acknowledge the reviewer’s concern. The network analysis was conducted descriptively to characterize cattle movement patterns and the structure of herd connections, but it was not formally incorporated into the model. In revisions we will:

      Clarify this distinction in the manuscript to avoid overinterpretation.

      Emphasize the need for future modelling work using finer-scale movement data, which could support more realistic herd metapopulation dynamics and better capture heterogeneity in transmission.

      (3) RVFV reproductive impacts

      While RVF outbreaks are known to cause abortions and neonatal deaths, these occur during relatively rare epidemics. In the Gambian context, where we observe low-level circulation rather than large episodic outbreaks, the annual impact of RVF infection on births is likely modest compared to baseline herd turnover. Moreover, cattle demography is partly managed, with replacement and movement buffering birth rates against short-term losses.

      Our model treats births as a constant demographic process; because we do not explicitly model outbreak-scale reproductive losses, it is reasonable to assume a stable population. This is consistent with other RVF transmission models that adopt a similar simplifying assumption. However, we will acknowledge this simplification as a limitation in the revised manuscript.

      (4) Missing ODEs for M herds in the dry season

      We thank the reviewer for identifying this omission. The ODEs for M herds in the dry season were not included in the appendix due to an oversight, though demographic turnover was incorporated in the model code. We will add the missing equations to the appendix.

      (5) Role of immunity loss and model structure (SIR vs. SIRS)

      We acknowledge that the decline of detectable antibodies over time (seropositivity decay/seroreversion) is an important consideration in RVFV serology, but whether this reflects true loss of protective immunity after natural infection remains unknown. Biologically, it is plausible that infected cattle develop long-lasting protection, as suggested by studies in humans, but longitudinal field data are lacking. From a modelling perspective, our aim was to predict age-seroprevalence curves from FOI estimates and to assess their ability to reproduce observed cross-sectional seroprevalence patterns. We therefore adopted a parsimonious SIR framework, treating loss of seropositivity as a potential explanation for the observed age disparity rather than modelling it as loss of immunity. In the revision, we will:

      Clarify this rationale, emphasising that there is no direct evidence for waning immunity following natural RVFV infection in cattle, although seropositivity decay has been suggested in humans.

      Further discuss the seropositivity decay rates predicted in our survey and their possible relation to test sensitivity.

      Highlight that while a SIRS structure could generate different long-term dynamics, evaluating this requires stronger evidence for true immunity loss; we consider this an important future modelling direction.

      (6) RVFV-induced mortality in the serocatalytic model

      We thank the reviewer for this comment. Disease-induced mortality was included in the serocatalytic model through the mortality parameter (γ), but we recognise that this might not have been sufficiently clear in the text. In the revision, we will clarify this in the Methods and Appendix.

      (7) Clarifying previous vs. current study components

      We will revise the Methods and Appendix to make clearer distinctions between our previous work (e.g. household survey data collection, seroprevalence estimates) and the analyses undertaken for this manuscript (e.g. model development and fitting).

      (8) Limitations paragraph

      We will expand the limitations section to specifically identify the assumptions contributing most to uncertainty. We will then outline how these may bias transmission dynamics and intervention estimates.

      (9) Movement ban simulations & suitability of model for vaccination interventions

      We appreciate the reviewer’s concerns regarding the movement ban simulation. On reassessment, we agree that our model structure might not be ideally suited to exploring such interventions. In the revised manuscript, we will remove this analysis and emphasize that our modelling framework is better suited to exploring cattle vaccination scenarios, including targeting of specific herd types (e.g. T vs. M vs. L). We are currently developing separate work focused on vaccination strategies in cattle, where this model structure might be more directly applicable, and will reserve a deeper investigation of vaccination interventions for that forthcoming publication.

    1. eLife Assessment

      This important study identifies a putative iron and zinc transporter in the plasma membrane of the obligate intracellular pathogen, Toxoplasma gondii. Using an array of different approaches, the authors convincingly demonstrate that this transporter regulates diverse cellular processes, including parasite metabolism and differentiation. This work will be of broad interest to cell biologists and biochemists studying metal ion transport mechanisms.

    2. Reviewer #1 (Public review):

      In this manuscript, Aghabi et al. present a comprehensive characterization of ZFT, a metal transporter located at the plasma membrane of the eukaryotic parasite Toxoplasma gondii. The authors provide convincing evidence that ZFT plays a crucial role in parasite fitness, as demonstrated by the generation of a conditional knockdown mutant cell line, which exhibits a marked impact on mitochondrial respiration, a process dependent on several iron-containing proteins. Consistent with previous reports, the authors also show that disruption of mitochondrial metabolism leads to conversion into the persistent bradyzoite stage. The study then employed advanced techniques, such as inductively coupled plasma-mass spectrometry (ICP-MS) and X-ray fluorescence microscopy (XFM), to demonstrate that ZFT depletion results in reduced parasite-associated metals, particularly iron and zinc. Additionally, the authors show that ZFT expression is modulated by the availability of these metals, although defects in the transporter could not be compensated for by exogenous addition of iron or zinc.

      While the manuscript does not directly investigate the transport function of ZFT through biochemical assays, the authors indirectly support the notion that ZFT can transport zinc by demonstrating its ability to compensate for a lack of zinc transport in a yeast heterologous system. Furthermore, phenotypic analyses suggest defects in iron availability, particularly with regard to Fe-S mitochondrial proteins and mitochondrial function. Overall, the manuscript provides a solid, well-rounded argument for ZFT's role in metal transport, using a combination of complementary approaches. Although direct biochemical evidence for the transporter's substrate specificity and transport activity is lacking, the converging evidence, including changes in metal concentrations upon ZFT depletion, yeast complementation data, and phenotypic changes linked to iron deficiency, presents a convincing case. Some aspects of the results may appear somewhat unbalanced, particularly since iron transport could not be confirmed through heterologous complementation, while zinc-related phenotypes in the parasites have not been thoroughly explored (which is challenging given the limited number of zinc-dependent proteins characterized in Toxoplasma). Nevertheless, given that metal acquisition remains largely uncharacterized in Toxoplasma, this manuscript provides an important first step in identifying a metal transporter in these parasites, and the data presented are generally convincing and insightful.

    3. Reviewer #2 (Public review):

      Summary:

      The intracellular pathogen Toxoplasma gondii scavenges metal ions such as iron and zinc to support its replication; however, mechanistic studies of iron and zinc uptake are limited. This study investigates the function of a putative iron and zinc transporter, ZFT. In this paper, the authors provide evidence that ZFT mediates iron and zinc uptake by examining the regulation of ZFT expression by iron and zinc levels, the impact of altered ZFT expression on iron sensitivity, and the effects of ZFT depletion on intracellular iron and zinc levels in the parasite. The effects of ZFT depletion on parasite growth are also investigated, showing the importance of ZFT function for the parasite.

      Strengths:

      A key strength of the study is the use of multiple complementary approaches to demonstrate that ZFT is involved in iron and zinc uptake. Additionally, the authors build on their finding that loss of ZFT impairs parasite growth by showing that ZFT depletion induces stage conversion and leads to defects in both the apicoplast and mitochondrion.

      Weaknesses:

      (1) Excess zinc was shown not to alter ZFT expression, but a cation chelator (TPEN) did lead to decreased expression. While TPEN is often used to reduce zinc levels, does it have any effect on iron levels? Could the reduction in ZFT after TPEN treatment be due to a reduction in the level of iron or another cation?

      (2) ZFT expression was found to be dynamic depending on the size of the vacuole, based on mean fluorescence intensity measurements. Looking at protein levels by Western blot at different times during infection would strengthen this finding.

      (3) ZFT localization remained at the parasite periphery under low iron conditions. However, in the images shown in Figure S1c, larger vacuoles (containing 4-8 parasites) are shown for the untreated conditions, and single parasite-containing vacuoles are shown for the low iron condition. As ZFT localization is predominantly at the basal end of the parasite in larger PV and at the parasite periphery for smaller vacuoles, it would be better to compare vacuoles of similar size between the untreated and low-iron conditions.

    4. Reviewer #3 (Public review):

      Summary:

      Aghabi et al. set out to characterize a T. gondii transmembrane protein with a ZIP domain, termed ZFT. The authors investigate the consequences of ZFT downregulation and overexpression for parasite fitness. Downregulation of ZFT causes defects in the parasite's endosymbiotic organelles, the apicoplast and the mitochondrion. Specifically, lack of ZFT causes a decrease in mitochondrial respiration, consistent with its role as an iron transporter. This impact on the mitochondria appears to trigger partial differentiation to bradyzoites. The authors furthermore demonstrate that expression of TgZFT can rescue a yeast mutant lacking its zinc transporter and perform an array of direct metal ion measurements, including X-ray fluorescence microscopy and inductively coupled plasma mass spectrometry (ICP-MS). These reveal reduced metal ion levels in parasites depleted of ZFT. Overall, the data by Aghabi et al. reveal that ZFT is a major metal ion transporter in T. gondii, importing iron and zinc for diverse essential processes.

      Strengths:

      This study's strength lies in the thorough characterization of the transporter. The authors combine a number of techniques to measure the impact of ZFT depletion, ranging from the direct measurement of metal ions to determining the consequences for the parasite's metabolism (mitochondrial respiration), as well as performing a yeast mutant complementation. This work is very thorough and clearly presented, leaving little doubt about this protein's function.

      Weaknesses:

      This study offers no major novel insights into the biology of T. gondii. The transporter was already annotated as a zinc transporter (ToxoDB), was deemed essential (PMID: 27594426), and localized to the plasma membrane (PMID: 33053376). This study mostly confirms and validates these previous datasets. The authors identify three other proteins with a ZIP domain. In particular, the role of TGME49_225530 is intriguing, as it is likely fitness-conferring (score: -2.8, PMID: 27594426) and has no subcellular localization assigned. Characterizing this protein as well, revealing its localization, and identifying whether and how these transporters coordinate metal ion transport would have been worthwhile.

      Another weakness is the data related to the impact of ZFT downregulation on the apicoplast in Figure 4. The authors show that downregulation of ZFT causes an increase in elongated apicoplasts (Figure 4d). The subsequent panels seem to show that the parasites present a dramatic growth defect at that time point. This growth arrest can directly explain the elongated apicoplast, but does not allow any conclusion about an impact on the organelle. In any case, an assessment of 'delayed death' as presented in Figure 4c seems futile, since the many other processes affected by zinc and iron depletion likely cause a rapid death, masking any potential delayed death.

    1. eLife Assessment

      In this manuscript, the authors report the fundamental finding that a secreted ubiquitin ligase of Shigella, called IpaH1.4, mediates the degradation of a host defense factor, RNF213. The data are convincing and represent a major contribution to our understanding of cell-autonomous immunity and bacterial pathogenesis as they provide new mechanistic insight into how the cytosolic bacterial pathogen Shigella flexneri evades IFN-induced host immunity.

    2. Reviewer #1 (Public review):

      Shigella flexneri is a bacterial pathogen and a globally significant cause of diarrhea. Shigella pathogenesis remains poorly understood. In their manuscript, Saavedra-Sanchez et al report their discovery that a secreted E3 ligase effector of Shigella, called IpaH1.4, mediates the degradation of a host E3 ligase called RNF213. RNF213 was previously described to mediate ubiquitylation of intracellular bacteria, an initial step in their targeting to xenophagosomes. Thus, Shigella IpaH1.4 appears to be an important factor to permit evasion of RNF213-mediated host defense.

      Strengths:

      The work is focused, convincing, well-performed and important, and the manuscript is well-written. The revised version addressed all the concerns raised during the initial review.

    3. Reviewer #2 (Public review):

      Summary:

      The authors find that the bacterial pathogen Shigella flexneri uses the T3SS effector IpaH1.4 to induce degradation of the IFNg-induced protein RNF213. They show that in the absence of IpaH1.4, cytosolic Shigella is bound by RNF213. Furthermore, RNF213 conjugates linear and lysine-linked ubiquitin to Shigella independently of LUBAC. Intriguingly, they find that Shigella lacking ipaH1.4 or mxiE, which regulates the expression of some T3SS effectors, are not killed even when ubiquitylated by RNF213 and that these mutants are still able to replicate within the cytosol, suggesting that Shigella encodes additional effectors to escape from host defenses mediated by RNF213-driven ubiquitylation.

      Strengths:

      The authors take a variety of approaches, including host and bacterial genetics, gain-of-function and loss-of-function assays, cell biology, and biochemistry. Overall, the experiments are elegantly designed, rigorous, and convincing.

    4. Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to investigate whether and how Shigella avoids cell-autonomous immunity initiated through M1-linked ubiquitin and the immune sensor and E3 ligase RNF213. The key findings are that the Shigella flexneri T3SS effector IpaH1.4 induces degradation of RNF213. Without IpaH1.4, the bacteria are marked with RNF213 and ubiquitin following stimulation with IFNg. Interestingly, this is not sufficient to initiate the destruction of the bacteria, leading the authors to conclude that Shigella deploys additional virulence factors to avoid this host immune response. The second key finding of this study is that M1 chains decorate the mxiE/ipaH Shigella mutant independent of LUBAC, which is, by and large, considered the only enzyme capable of generating M1-linked ubiquitin chains. These findings are fundamental in nature and of general interest.

      Strengths and weaknesses:

      The data is well-controlled and clearly presented with appropriate methodology. The authors provide compelling evidence that demonstrates that IpaH1.4 is the effector responsible for the degradation of RNF213 via the proteasome and their conclusions are well supported. They have clearly demonstrated how Shigella disarms RNF213-mediated immunity.

      This work builds on prior work from the same laboratory that suggests that M1 ubiquitin chains can be formed independently of LUBAC (in the prior publication this related to Chlamydia inclusions). Two key pieces of evidence support this statement: fluorescence microscopy-based images and accompanying quantification of M1-ub association in Hoip and Hoil knockout cells, using an M1-specific antibody, and the use of an internally tagged Ub-K7R mutant. Whilst it remains possible that the M1 antibody is non-specific, as acknowledged by the authors, the data in supplementary figure 1, comparing K7R-ub and the N-terminally tagged K7R ub variant, provide evidence that during Shigella infection, LUBAC-independent M1-ubiquitin chains are indeed formed. This represents an important new angle in ubiquitin biology.

      The importance of IFNgamma priming for RNF213 association with the mxiE or ipaH1.4 mutants remains an interesting question that awaits future studies comparing different intracellular bacteria and the role of RNF213.

      Overall, the findings are important for the host-pathogen field, cell autonomous/innate immune signaling fields and microbial pathogenesis fields and the work is a very valuable addition to the recent advances in understanding the role of RNF213 in host immune responses to bacteria.

    5. Author response:

      The following is the authors’ response to the original reviews

      Reviewer #1 (Public review):

      Shigella flexneri is a bacterial pathogen and a globally significant cause of diarrhea. Shigella pathogenesis remains poorly understood. In their manuscript, Saavedra-Sanchez et al report their discovery that a secreted E3 ligase effector of Shigella, called IpaH1.4, mediates the degradation of a host E3 ligase called RNF213. RNF213 was previously described to mediate ubiquitylation of intracellular bacteria, an initial step in their targeting to xenophagosomes. Thus, Shigella IpaH1.4 appears to be an important factor in permitting evasion of RNF213-mediated host defense.

      Strengths:

      The work is focused, convincing, well-performed, and important. The manuscript is well-written.

      We would like to thank the reviewer for their time evaluating our manuscript and the positive assessment of the novelty and importance of our study. We provide a comprehensive response to each of the reviewer’s specific recommendations below and highlight any changes made to the manuscript in response to those recommendations.

      Reviewer #1 (Recommendations for the authors):

      (1) In the abstract (and similarly on p.10), the authors claim to have shown "IpaH1.4 protein as a direct inhibitor of mammalian RNF213". However, they do not show the interaction is direct. This, in my opinion, would require demonstrating an interaction between purified recombinant proteins. I presume that the authors are relying on their UBAIT data to support the direct interaction, but this is a fairly artificial scenario that might be prone to indirect substrates. I would therefore prefer that the 'direct' statement be modified (or better supported with additional data). Similarly, on p.7, the section heading states "S. flexneri virulence factors IpaH1.4 and IpaH2.5 are sufficient to induce RNF213 degradation". The corresponding experiment is to show sufficiency in a 293T cell, but this leaves open the participation of additional 293T-expressed factors. So I would remove "are sufficient to", or alternatively add "...in 293T cells".

      We agree with the reviewer and made the recommended changes to the text in the abstract, in the results section on page 7, and in the Discussion on page 11. During the revision of our manuscript, two additional studies were published that provide convincing biochemical evidence for the direct interaction between IpaH1.4 and RNF213 (PMID: 40205224; PMID: 40164614). These studies address the reviewer’s concern extensively and are now briefly discussed and cited in our revised manuscript.

      (2) In the abstract the authors state "Linear (M1-) and lysine-linked ubiquitin is conjugated to bacteria by RNF213 independent of the linear ubiquitin chain assembly complex (LUBAC)." However, it is not shown that RNF213 is able to directly perform M1-ubiquitylation. It is shown that RNF213 is required for M1-linked ubiquitylation in IpaH1.4 or MxiE mutants, this is different than showing conjugation is done by RNF213 itself. This should be reworded.

      We agree and have edited the text accordingly.

      (3) Introduction: one of the main points of the paper is that RNF213 conjugates linear ubiquitin to the surface of bacteria in a manner independent of the previously characterized linear ubiquitin conjugation (LUBAC) complex. This is indeed an interesting result, but the introduction does not put this discovery in much context. I would suggest adding some discussion of what was known, if anything, about the type of Ub chain formed by RNF213, and specifically whether linear Ub had previously been observed or not.

      We now provide context in the Introduction on page 3 and briefly discuss previous work that had implicated LUBAC in the ubiquitylation of cytosolic bacteria. We emphasize that LUBAC specifically generates linear (M1-linked) ubiquitin chains, while the types of ubiquitin linkages deposited on bacteria through RNF213-dependent pathways had remained unidentified.

      (4) Figure 3C: Is the difference in 7KR-Ub between WT and HOIP KO cells significant? If so, the authors may wish to acknowledge the possibility that HOIP partially contributes to M1-Ub of MxiE mutant Shigella.

      The frequencies at which bacteria are decorated with 7KR-Ub are not statistically different between WT and HOIP KO cells. We have included this information in the panel description of Figure 3.

      (5) On page 11, the authors state that "...we observed that LUBAC is dispensable for M1-linked ubiquitylation of cytosolic S. flexneri ∆ipaH1.4. We found that lysine-less internally tagged ubiquitin or an M1-specific antibody bound to S. flexneri ∆ipaH1.4 in cells lacking LUBAC (HOIL-1KO or HOIPKO) but failed to bind bacteria in RNF213-deficient cells". In fact, what is shown is that M1-ubiquitylation in ∆ipaH1.4 infection is RNF213-dependent (5E), but the work with lysine mutants, HOIP or HOIL-1 KOs are all with ∆mxiE, not ∆ipaH1.4 (3B) in this version of the manuscript. Ideally, the data with ∆ipaH1.4 could be added, but alternatively, the conclusion could be re-worded.

      We now include data demonstrating that staining of ∆ipaH1.4 with an M1-specific antibody is unchanged in HOIL-1 KO and HOIP KO cells relative to WT cells. These data are shown in the supplementary data (Fig. S3E) and referred to on page 9 of the revised manuscript.

      (6) The UBAIT experiment should be explained in a bit more detail in the text. The approach is not necessarily familiar to all readers, and the rationale for using Salmonella-infected ceca/colons is not well explained (and seems odd). Some appropriate caution about interpreting these data might also be welcome. Did HOIP or HOIL show up in the UBAIT? This perhaps also deserves some discussion.

      As expected, HOIP (listed under its official gene name Rnf31 in the table of Fig. S2B) was identified as a candidate IpaH1.4 interaction partner, the third most abundant hit from the UBAIT screen. Remarkably, Rnf213 was the hit with the highest abundance in the IpaH1.4 UBAIT screen. To address the reviewer’s comments, we now explain the UBAIT approach in more detail and provide the rationale for using intestinal protein lysates from Salmonella-infected mice. The text on page 8 reads as follows: “To investigate potential physical interactions between IpaH1.4 and IpaH2.5, we reanalyzed a previously generated dataset that employed a method known as ubiquitin-activated interaction traps (UBAITs) (32). As shown in Fig. S2A, the human ubiquitin gene was fused to the 3′ end of IpaH2.5, producing a C-terminal IpaH2.5-ubiquitin fusion protein. When incubated with ATP, ubiquitin-activating enzyme E1, and ubiquitin-conjugating enzyme E2, the IpaH2.5-ubiquitin "bait" protein is capable of binding to and ubiquitylating target substrates. This ubiquitylation creates an isopeptide bond between the IpaH2.5 bait and its substrate, thereby enabling purification via a Strep affinity tag incorporated into the fusion construct (32). IpaH2.5-ubiquitin bait and IpaH3-ubiquitin control proteins were incubated with lysates from murine intestinal tissue. To detect interaction partners in a physiologically relevant setting, we used intestinal lysates derived from mice infected with Salmonella, which, in contrast to Shigella, causes pronounced inflammation in WT mice and therefore better simulates human shigellosis in an animal model. Using UBAIT, we identified HOIP (Rnf31) as a likely IpaH2.5 binding partner (Fig. S2B), thus confirming previous observations (28) and validating the effectiveness of our approach. Strikingly, we identified mouse Rnf213 as the most abundant interaction partner of the IpaH2.5-ubiquitin bait protein (Fig. S2B).

      Collectively, our data and concurrent reports showing direct interactions between IpaH1.4 and human RNF213 (36, 37) indicate that the virulence factors IpaH1.4 and IpaH2.5 directly bind and degrade mouse as well as human RNF213.”

      (7) It would be helpful if the authors discussed their results in the context of the prior work showing that IpaH1.4/2.5 mediate the degradation of HOIP. Do the authors see HOIP degradation? If indeed HOIP and RNF213 are both degraded by IpaH1.4 and IpaH2.5, are there conserved domains between RNF213 and HOIP being targeted? Or is only one the direct target? A HOIP-RNF213 interaction has previously been shown (https://doi.org/10.1038/s41467-024-47289-2). Since they interact, is it possible that one is degraded indirectly? To help clarify this, a simple experiment would be to test whether RNF213 is degraded in HOIP KO cells (or vice versa).

      We appreciate the reviewer’s suggestions. We conducted the proposed experiments and found that WT S. flexneri infections result in RNF213 degradation in both WT and HOIP KO cells. Similarly, we found that HOIP degradation was independent of RNF213. We have included these data in Figs. 5A and S3B of our revised submission. A study published during revisions of our paper demonstrates that the LRR of IpaH1.4 binds to the RING domains of both RNF213 and LUBAC (PMID: 40205224). We refer to this work in our revised manuscript.

      Reviewer #2 (Public review):

      Summary:

      The authors find that the bacterial pathogen Shigella flexneri uses the T3SS effector IpaH1.4 to induce degradation of the IFNg-induced protein RNF213. They show that in the absence of IpaH1.4, cytosolic Shigella is bound by RNF213. Furthermore, RNF213 conjugates linear and lysine-linked ubiquitin to Shigella independently of LUBAC. Intriguingly, they find that Shigella lacking ipaH1.4 or mxiE, which regulates the expression of some T3SS effectors, are not killed even when ubiquitylated by RNF213 and that these mutants are still able to replicate within the cytosol, suggesting that Shigella encodes additional effectors to escape from host defenses mediated by RNF213-driven ubiquitylation.

      Strengths:

      The authors take a variety of approaches, including host and bacterial genetics, gain-of-function and loss-of-function assays, cell biology, and biochemistry. Overall, the experiments are elegantly designed, rigorous, and convincing.

      Weaknesses:

      The authors find that ipaH1.4 mutant S. flexneri no longer degrades RNF213 and recruits RNF213 to the bacterial surface. The authors should perform genetic complementation of this mutant with WT ipaH1.4 and the catalytically inactive ipaH1.4 to confirm that ipaH1.4 catalytic activity is indeed responsible for the observed phenotype.

      We would like to thank the reviewer for their time evaluating our manuscript and the positive assessment of our work, especially its scientific rigor. We conducted the experiment suggested by the reviewer and included the new data in the revised manuscript. As expected, complementation of the ∆ipaH1.4 with WT IpaH1.4 but not with the catalytically dead C338S mutant restored the ability of Shigella to efficiently escape from recognition by RNF213 (Figs. 5C-D).

      Reviewer #2 (Recommendations for the authors):

      The authors should perform genetic complementation of the ipaH1.4 mutant with WT ipaH1.4 and the catalytically inactive ipaH1.4 to confirm that ipaH1.4 catalytic activity is indeed responsible for the observed phenotype.

      We performed the suggested experiment and show in Figs. 5C-D that complementation of the ∆ipaH1.4 mutant with WT IpaH1.4 but not with the catalytically dead C338S mutant restored the ability of Shigella to efficiently escape from recognition by RNF213. These data demonstrate that the catalytic activity of IpaH1.4 is required for evasion of RNF213 binding to the bacteria.

      Reviewer #3 (Public review):

      Summary:

      In this study, the authors set out to investigate whether and how Shigella avoids cell-autonomous immunity initiated through M1-linked ubiquitin and the immune sensor and E3 ligase RNF213. The key finding is that the Shigella flexneri T3SS effector IpaH1.4 induces degradation of RNF213. Without IpaH1.4, the bacteria are marked with RNF213 and ubiquitin following stimulation with IFNg. Interestingly, this is not sufficient to initiate the destruction of the bacteria, leading the authors to conclude that Shigella deploys additional virulence factors to avoid this host immune response. The second key finding of this paper is the suggestion that M1 chains decorate the mxiE/ipaH Shigella mutant independently of LUBAC, which is, by and large, considered the only enzyme capable of generating M1-linked ubiquitin chains.

      Strengths:

      The data are, for the most part, well controlled and clearly presented with appropriate methodology. The authors convincingly demonstrate that IpaH1.4 is the effector responsible for the degradation of RNF213 via the proteasome, although the site of modification is not identified.

      Weaknesses:

      (1) The work builds on prior work from the same laboratory suggesting that M1 ubiquitin chains can be formed independently of LUBAC (in the prior publication, this related to Chlamydia inclusions). In this study, two pieces of evidence support this statement: fluorescence microscopy images, with accompanying quantification, of M1-ub association (detected with an antibody) with Shigella mutants in Hoip and Hoil knockout cells; and the use of an internally tagged Ub-K7R mutant, which cannot be incorporated into ubiquitin chains via its lysine residues. Given that clones of the M1-specific antibody are not always specific for M1 chains, and because it remains formally possible that Int-K7R Ub is added to the end of a chain as a chain terminator or as mono-Ub, the authors should strengthen the findings supporting the claim that another E3 ligase can generate M1 chains de novo.

      (2) The main weakness relating to the infection work is that no bacterial protein loading control is assayed in the western blots of infected cells, leaving the reader unable to determine if changes in RNF213 protein levels are the result of the absent bacterial protein (e.g. IpaH1.4) or altered infection levels.

      (3) The importance of IFNgamma priming for RNF213 association with the mxiE or ipaH1.4 strain could have been investigated further, as it is unclear whether RNF213 coating is enhanced because of increased RNF213 protein expression or because of another factor. This is of interest because IFNgamma priming does not seem to be needed for RNF213 to detect and coat cytosolic Salmonella.

      Overall, the findings are important for the host-pathogen, cell-autonomous/innate immune signaling, and microbial pathogenesis fields. If further evidence for LUBAC-independent M1 ubiquitylation is obtained, this would represent a significant finding.

      We would like to thank the reviewer for their time evaluating our manuscript and the positive assessment of our work and its significance. We provide a comprehensive response to the main three critiques listed under ‘weaknesses’ and also have responded to each of the reviewer’s specific recommendations below. We highlight any changes made to the manuscript in response to those recommendations.

      (1) As the reviewer correctly pointed out, 7KR ubiquitin can be used not only for linear ubiquitylation but also as a donor ubiquitin, attached as mono-ubiquitin to a substrate or to an existing ubiquitin chain as a chain terminator. To distinguish between 7KR INT-Ub signals originating from linear versus mono-ubiquitylation, we followed the reviewer’s advice and generated an N-terminally tagged 7KR INT-Ub variant. The N-terminal tag prevents linear ubiquitylation but still allows 7KR INT-Ub to be attached as a mono-ubiquitin. We found that the addition of this N-terminal tag significantly reduced, but did not completely abolish, the number of ΔmxiE bacteria decorated with 7KR INT-Ub. These data are shown in a new Fig. S1 and indicate that 7KR INT-Ub lacking the N-terminal tag is attached to bacteria both as linear (M1-linked) ubiquitin and as donor ubiquitin, possibly as a chain terminator. While we cannot rule out that the anti-M1 antibodies used here cross-react with other ubiquitin linkages, we reason that the 7KR data strongly argue that linear ubiquitin is part of the ubiquitin coat encasing IpaH1.4-deficient cytosolic Shigella. Collectively, our data show that both linear and lysine-linked (especially K27 and K63) ubiquitin chains are part of the RNF213-dependent ubiquitin coat on the surface of IpaH1.4 mutants. Furthermore, our data strongly indicate that this ubiquitylation of IpaH1.4 mutants is independent of LUBAC.

      (2) We used GFP-expressing strains of S. flexneri for our infection studies and were therefore able to use GFP expression as a loading control. We have incorporated these data into our revised figures. These new data (Figs. 4A, 5A, and S3B) show that bacterial infection levels were comparable between WT and mutant infections and that therefore the degradation of RNF213 (or HOIP – see new data in Fig. S3B) is not due to differences in infection efficiency.

      (3) We agree with the reviewer that the mechanism by which RNF213 binds to bacteria is an important unanswered question. Similarly, whether other ISGs have auxiliary functions in this process or whether binding efficiencies vary between different bacterial species are important questions in the field. However, these questions go far beyond the scope of this study and were therefore not addressed in our revisions.

      Reviewer #3 (Recommendations for the authors):

      (1) An N-terminally tagged K7R-Ub should be used as a control to test whether the signal found around the mutant Shigella is being incorporated into chains via the N-terminal Met. As certain batches of M1-specific antibodies are known not to be specific and to detect other chain types, the authors should test the specificity of the antibody used in this study (e.g., against different di-Ub linkage types) and include these data in the manuscript.

      We agree with the reviewer in principle. The anti-linear ubiquitin (anti-M1) monoclonal antibody clone 1E3, used prominently in this study, was tested by the manufacturer (Sigma) by Western blotting; according to the manufacturer, “this antibody detected ubiquitin in linear Ub, but not Ub K11, Ub K48, Ub K63.” However, this analysis did not include all possible Ub linkage types, and the reviewer is therefore correct that the anti-M1 antibody could theoretically also detect some other linkage types. To address this concern, we added new data during revisions demonstrating that 7KR INT-Ub targeting to S. flexneri is largely dependent on the N-terminus (M1) of ubiquitin. Our combined observations therefore overwhelmingly support the conclusion that linear (M1-linked) as well as K-linked ubiquitin is attached to the surface of IpaH1.4-deficient S. flexneri in an RNF213-dependent and LUBAC-independent manner.

      (2) The M1 signal detected on bacteria with the antibody is still present in Hoip or Hoil KOs, but owing to the potential non-specificity of the antibody, the authors should test whether K7R Ub is detected on bacteria in Hoil KO cells (in addition to Hoip KO cells). This would strengthen the authors’ data on LUBAC-independent M1 ubiquitylation and is important because Hoil can catalyse non-canonical ubiquitylation.

      The specific linear ubiquitin-ligating activity of LUBAC is enacted by HOIP. We show that linear ubiquitylation of susceptible S. flexneri mutants, as assessed by anti-M1 ubiquitin staining or 7KR INT-Ub recruitment, occurs in HOIP KO cells at WT levels (Figs. 3B, 3C, S3E [new data]). In our view, these data unequivocally show that the observed linear ubiquitylation of cytosolic S. flexneri ipaH1.4 and mxiE mutants is independent of LUBAC.

      (3) For Figure 4A, do mxiE bacteria show similar invasion? The authors should include a bacterial protein control to show bacterial levels in WT- and mxiE-infected conditions. A similar control should be included in Figure 5A.

      We used GFP-expressing strains of S. flexneri for our infection studies and were therefore able to use GFP expression as a loading control. We have incorporated these data into our revised figures. These new data (Figs. 4A, 5A, and S3B) show that bacterial infection levels were comparable between WT and mutant infections and that therefore the degradation of RNF213 (or HOIP – see new data in Fig. S3B) is not due to differences in infection efficiency.

      (4) Can the authors speculate why IFNg priming is needed for the coating of Shigella mxiE mutant but not in the case of Salmonella or Burkholderia? Is this just amounts of RNF213 or something else?

      In our studies, we did not directly compare ubiquitylation rates of cytosolic Shigella, Burkholderia, and Salmonella with each other under the same experimental conditions. However, such a direct comparison is needed to determine whether IFNgamma priming is required for RNF213-dependent bacterial ubiquitylation of some pathogens but not others. Two papers published during the revision of our manuscript (PMID: 40164614, PMID: 40205224) report robust RNF213 targeting to IpaH1.4 Shigella mutants in unprimed HeLa cells (whereas we used A549 and HT29 cells). Therefore, differences in reagents, cell lines, and/or other experimental conditions may determine whether IFNgamma priming is necessary to observe substantial RNF213 translocation to cytosolic bacteria.

      (5) Typos: there are several, but they are hard to annotate with line numbers, so the authors should proofread again carefully.

      We proofread the manuscript and corrected the small number of typos we identified.

    1. eLife Assessment

      This study presents important methodologies for repeated brain ultrasound localization microscopy (ULM) in awake mice and a set of results indicating that wakefulness reduces vascularity and blood flow velocity. The data supporting these findings are solid. This study is relevant for scientists investigating vascular physiology in the brain.

    2. Reviewer #1 (Public review):

      Summary:

      Wang and Colleagues present a study aimed at demonstrating the feasibility of repeated ultrasound localization microscopy (ULM) recording sessions on mice chronically implanted with a cranial window transparent to US. They provided quantitative information on their protocol, such as the required number of contrast-enhancing microbubbles (MBs) to get a clear image of the vasculature of a brain coronal section. Also, they quantified the co-registration quality over time-distant sessions and the vasodilator effect of isoflurane.

      Strengths:

      The study showed a remarkable performance in recording precisely the same brain coronal section over repeated imaging sessions. In addition, it sheds light on the vasodilator effect of isoflurane (an anesthetic whose effects are not fully understood) on the different brain vasculature compartments, although, as the Authors stated, some insights in this aspect have already been published with other imaging techniques. The experimental setting and protocol are very well described.

      In this newly revised version, the Authors made evident efforts to strengthen the messages of their study. All the limitations of their research have been clearly acknowledged.

      A central issue remains. In response to my concerns about the need for multivariate analyses, the Authors stated that: "Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies." Although this sentence does not convince me, if the purpose of this study was to showcase the potential of ULM for future longitudinal awake studies, why not avoid statistics entirely? The trend toward decreased vein size and increased arterial blood flow during wakefulness is evident from the plot and physiologically plausible. Why impose inappropriate statistics instead of dropping them altogether? Based on the feedback received from the Authors, I do not see the lack of statistics as detrimental to this study.

    3. Reviewer #2 (Public review):

      Summary:

      The authors present a very interesting collection of methods and results using brain ultrasound localization microscopy (ULM) in awake mice. They emphasize the effect of the level of anesthesia on the quantifiable elements assessable with this technique (i.e., vessel diameter and flow speed in veins and arteries, and perfused area in capillaries) and demonstrate the possibility of achieving longitudinal cerebrovascular assessment in one animal over several weeks with their protocol.

      The authors did a good job of rewriting the article based on the reviewers' comments. One of the messages of the first version of the manuscript was that variability in measurements (vessel diameter, flow velocity, vascularity) was much more pronounced under changes of anesthesia than across longitudinal imaging over several weeks. This message is now more nuanced, as longitudinal imaging seems to show a variability close to the order of magnitude observed under anesthesia. In that sense, the review process was useful in avoiding hasty conclusions, and it calls for further caution in awake longitudinal ULM imaging, in particular regarding positioning precision and cancellation of tissue motion.

      Strengths:

      Even if the methodological elements, considered separately, are not new (brain ULM in rodents, a setup for longitudinal awake imaging similar to those used in fUS imaging, quantification of vessel diameters/bubble flow/vessel area), when masterfully combined as in this paper, they answer two long-standing questions in the community: what is the impact of anesthesia on the parameters measured by ULM (and indirectly in fUS and other techniques)? Is it possible to achieve ULM in awake rodents for longitudinal imaging? The manuscript is well constructed and well written, and the graphics are appealing.

      The manuscript has been much strengthened by the round of review, with more animals for the longitudinal imaging study.

      Weaknesses:

      The manuscript has been only marginally modified since our last round of review, so there is probably not much more that we reviewers can suggest to improve it. Therefore, my remaining concerns about the reliability of longitudinal quantifications and about certain discrepancies still apply to this paper. As a general piece of advice, every claim ("is higher", "is lower", "is stable") should be supported by evidence and statistical testing if this is not already the case.

      Response 06: the authors' response is not satisfactory. Although the authors have underlined the difference in ROI boundaries between Fig. 4e and Fig. 4j, they provide only a verbal comment and no additional quantitative analysis that could explain the discrepancy I pointed out. By doing so, they risk misinterpretation. The reader is left with a discrepancy that could be explained by two mechanisms: either the pial vessel population behaves differently from penetrating arterioles and venules, or ULM imaging of pial vessels is not good enough to enable proper quantification because the vessels are not clearly visible (out-of-plane extent). In either case, Figure 4j does not "provide a more comprehensive representation of cortical vasculature" as stated. If changes in pial vessels cannot be reliably measured, these vessels should be excluded from the ROI.

      Line 161: be careful with the use of "vessel density", as pointed out by Reviewer 1.

      Line 196: "the decrease in venous vessel area (averaging 55% across mice) was greater than that of arterial (averaging 35%)": no statistical test has been performed.

    4. Author response:

      The following is the authors’ response to the previous reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Wang and Colleagues present a study aimed at demonstrating the feasibility of repeated ultrasound localization microscopy (ULM) recording sessions on mice chronically implanted with a cranial window transparent to US. They provided quantitative information on their protocol, such as the required number of contrast-enhancing microbubbles (MBs) to get a clear image of the vasculature of a brain coronal section. Also, they quantified the co-registration quality over time-distant sessions and the vasodilator effect of isoflurane.

      Strengths:

      The study showed a remarkable performance in recording precisely the same brain coronal section over repeated imaging sessions. In addition, it sheds light on the vasodilator effect of isoflurane (an anesthetic whose effects are not fully understood) on the different brain vasculature compartments, although, as the Authors stated, some insights in this aspect have already been published with other imaging techniques. The experimental setting and protocol are very well described.

      Wang and co-authors submitted a revised version of their study, which shows improvements in the clarity of the data description.

      However, the flaws and limitations of this study are substantially unchanged.

      The main issues are:

      Statistics are still inadequate. The TOST test proposed in this revised version is not equivalent to an ANOVA. Indeed, multivariate analyses would be the most appropriate, given that some quantifications were probably made on multiple vessels from different mice. All three reviewers identified the flaws in statistics as the primary concern.

      Response 01: We thank the reviewer for raising this important point. We fully acknowledge the limitations of our current statistical analysis. We would like to clarify that the TOST procedure was applied exclusively to the measurements taken from the same vessel segment in the same animal across different time points, with the purpose of evaluating the consistency of vessel diameter measurements. We recognize that the statistical analysis in this study remains limited, which we have acknowledged as a key limitation in the manuscript. This constraint arises primarily from the limited number of animals, and our analysis should be interpreted as a representative case study rather than a generalized statistical conclusion. We have revised the manuscript to clarify these points and to more explicitly acknowledge the statistical limitations.

      (Line 329) “Our current study primarily focused on demonstrating the feasibility of longitudinal ULM imaging in awake animals, instead of conducting a systematic investigation of how isoflurane anesthesia alters cerebral blood flow. Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies. While the trends observed across animals were consistent, the small sample size restricts the scope of statistical inference. For future work, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      No new data has been added, such as testing other anesthetics.

      Response 02: We acknowledge that the current study does not include data involving other anesthetics, and we have also discussed this point in our initial response. In fact, we did attempt to use other anesthetics such as ketamine. However, we found it difficult to draw reliable conclusions due to experimental limitations such as variable anesthesia recovery profiles and injection timing, as elaborated in the following paragraphs. Therefore, we decided not to include these data in the current study to avoid potential misinterpretation.

      One major limitation of our experimental setup is that imaging in the awake state is necessarily conducted after a brief period of isoflurane anesthesia. This brief anesthesia allows for the intravenous injection of microbubbles via the tail vein. Isoflurane is particularly suited for this purpose due to its rapid onset and offset. Mice can recover quickly once the gas is withdrawn, which enables relatively consistent post-anesthesia imaging in the awake state.

      In contrast, other anesthetic agents present challenges. Their recovery profiles are slower, more variable, and less controllable. Reversal drugs can be administered to awaken the animals, but they add another source of variability. These factors may lead to greater fluctuations in cerebral hemodynamics and introduce uncertainty in the timing of the bolus microbubble injection. As such, our current setup is not ideal for systematically comparing different anesthetics and could yield misleading results.

      A more appropriate strategy for comparing awake ULM imaging with different anesthetics would be to perform awake imaging first, followed by imaging under anesthesia. This would ensure that the awake condition is free from residual anesthetic effects. However, this approach places greater demands on bubble delivery, as no anesthesia can be used for the intravenous injection.

      To address this, we are actively exploring another solution using indwelling jugular vein catheterization. By surgically implanting a catheter into the jugular vein prior to imaging, we can establish a stable and reproducible route for microbubble delivery in fully awake animals without any anesthesia induction. This method has the potential to enable direct and reliable comparisons across different physiological states. However, the implementation of this technique and the associated experimental findings go beyond the scope of the current study and will be presented in a future manuscript.

      In the present work, we have emphasized the methodological limitations of our approach and clarified that our primary goal is to highlight the necessity and feasibility of awake-state ULM imaging. The focus is not to comprehensively characterize the effects of different anesthetic agents on microvascular brain flow. We appreciate your understanding and interest in this important future direction. 

      Based on these responses and the previous revision, we have further refined the discussion of the relevant limitations:

      (Line 324) “Although isoflurane is widely used in ultrasound imaging because it provides long-lasting and stable anesthetic effects, it is important to note that the vasodilation observed with isoflurane is not representative of all anesthetics. Some anesthesia protocols, such as ketamine combined with medetomidine, do not produce significant vasodilation and are therefore preferred in experiments where vascular stability is essential, such as functional ultrasound imaging. Our current study primarily focused on demonstrating the feasibility of longitudinal ULM imaging in awake animals, instead of conducting a systematic investigation of how isoflurane anesthesia alters cerebral blood flow. Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies. While the trends observed across animals were consistent, the small sample size restricts the scope of statistical inference. For future work, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      (Line 347) “Another limitation of this study is the potential residual vasodilatory effect of isoflurane anesthesia on awake imaging sessions and the short imaging window available after bolus injection. The awake imaging sessions were conducted shortly after the mice had emerged from isoflurane anesthesia, which was required for the MB bolus injections. The lasting vasodilatory effects of isoflurane may have influenced vascular responses, potentially contributing to an underestimation of differences in vascular dynamics between anesthetized and awake states. In addition, since microbubbles are rapidly cleared from circulation, the duration of effective imaging is limited to only a few minutes, which also overlaps with the anesthesia recovery period, constraining the usable awake-state imaging window. Future improvements in microbubble infusion using an indwelling jugular vein catheter offer a promising way to address these limitations. This method allows for stable microbubble infusion without the need for anesthesia induction, ensuring that the awake imaging condition is free from residual anesthetic effects. Moreover, it has the potential to extend the duration of imaging sessions, offering a longer and more stable time window for data acquisition. Furthermore, by performing ULM imaging in the awake state first, instead of starting with anesthetized imaging, researchers can achieve a more rigorous comparison of how various anesthetics influence cerebral microvascular dynamics relative to the awake baseline.”

      The Authors still insist on using the term Vascularity, which they define as: 'proportion of the pixel count occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of the pixels with MB signal.' Why not use apparent cerebral blood volume or just CBV? Introducing an unnecessary and redundant term is not scientifically acceptable. In this revised version, vascularity is also used to indicate a higher vascular density (Line 275), which does not make sense: blood vessels are not generated in the few minutes between the isoflurane and awake conditions. Rev2 also raised this point.

      Response 03: Thank you for revisiting this important point. We acknowledge that the term vascularity is difficult to interpret for readers, and we also recognize that we did not sufficiently justify its use in the earlier version.

      Based on your suggestion, we have now replaced all instances of “vascularity” with “fractional vessel area”. While the underlying definition remains the same, fractional vessel area offers a more intuitive description. The term “fractional” denotes that the vessel area is normalized to the total area of the selected ROI. This normalization is essential for fair comparisons across ROIs of different sizes, such as Figures 4i–k to evaluate various brain regions. We would also like to clarify that this was not introduced as an unnecessary or redundant term, but rather as a more suitable metric for longitudinal ULM analysis. We did consider using apparent cerebral blood volume (CBV), estimated from microbubble counts. However, we found that it was less robust and meaningful in the context of longitudinal ULM comparisons. Below we provide further justification for using the vessel area instead:

      (1) Using the vessel area is more robust:

      In longitudinal ULM comparisons, normalization across time points is essential to enable fair and meaningful comparisons. In our study, we normalized the data based on a cumulative 5 million microbubbles (e.g., Fig. 2). Other normalization strategies could also be adopted, as long as the resulting vascular maps reach a sufficiently saturated state. However, even with normalization, it remains important to use a quantitative metric that is minimally biased and invariant to experimental fluctuations across time points. Vessel area, derived from binarized vessel maps, is less sensitive to variations in acquisition time and microbubble concentration. This is because repeated microbubble trajectories through the same location are not counted multiple times. In contrast, apparent CBV, calculated from the microbubble counts, is more susceptible to different concentration conditions. Since repeated detections in the same location accumulate, the metric can be dependent on injection efficiency and imaging duration. While CBV may still be valid under well-controlled, steady-state conditions, we found the vessel area to be a more robust and reliable metric for longitudinal analysis under our current bolus-injection protocol.
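
      The robustness argument above can be illustrated with a toy example. The sketch below is an illustration under stated assumptions, not the authors' MATLAB tool: detections are hypothetical (row, col) pixel indices, the count-based proxy for apparent CBV is taken as detections per ROI pixel, and the vessel map is binarized as "at least one detection". Doubling the number of passes through the same vessels (as with a longer acquisition or a higher MB concentration) doubles the count-based proxy but leaves the binarized fractional vessel area unchanged.

```python
from collections import Counter


def count_map(detections):
    """MB count map: each (row, col) detection adds 1, so repeated passes accumulate."""
    return Counter(detections)


def apparent_cbv(counts, roi):
    """Count-based proxy (assumption): mean number of MB detections per ROI pixel."""
    return sum(counts[p] for p in roi) / len(roi)


def fractional_vessel_area(counts, roi):
    """Binarized proxy: fraction of ROI pixels with at least one MB detection."""
    return sum(1 for p in roi if counts[p] > 0) / len(roi)


# Toy 2 x 4 ROI in which the top row (4 of 8 pixels) is a perfused vessel.
roi = [(r, c) for r in range(2) for c in range(4)]
vessel = [(0, c) for c in range(4)]

short_session = count_map(vessel * 3)  # every vessel pixel crossed 3 times
long_session = count_map(vessel * 6)   # same vessel, twice as many MB passes

print(apparent_cbv(short_session, roi), apparent_cbv(long_session, roi))
# prints: 1.5 3.0
print(fractional_vessel_area(short_session, roi), fractional_vessel_area(long_session, roi))
# prints: 0.5 0.5
```

      Because `Counter` returns 0 for pixels that were never detected, the binarized metric saturates once every perfused pixel has been hit at least once, whereas the count-based metric keeps growing with acquisition length or MB concentration.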

      (2) Using the vessel area is more meaningful:

      Compared to CBV, the vessel area provides a more direct representation of structural characteristics such as vessel diameter. Anesthesia-induced vasodilation leads to an increase in vessel diameter. Although local diameter changes can be assessed by manually selecting vessel segments, this approach is labor-intensive and prone to selection bias. To enable a more comprehensive and objective assessment of such morphological changes, fractional vessel area provides a more informative alternative to CBV, as it captures diameter-related variations at a global or regional scale, and avoids potential biases associated with manually selecting specific vessels or regions.

      In response to: vascularity is also used to indicate a higher vascular density (Line 275), which does not make sense: blood vessels do not generate from the isoflurane to the awake condition in a few minutes.

      We agree that blood vessels cannot be generated in a few minutes. Vascularity (now fractional vessel area) should be interpreted as apparent vessel density, which reflects a probabilistic estimate of vessel density based on the detectable microbubbles.

      Both apparent vessel density and apparent CBV are indirect, sampling-based approximations of vascular features, and both are fundamentally limited by microbubble detection sensitivity. Low microbubble concentrations lead to underestimation of both CBV and vessel area. A change from zero to non-zero in these metrics does not imply the physical appearance or disappearance of vessels, but rather reflects a change in the likelihood of detecting flow in each region.
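
      The sampling argument can be made concrete with a simple probabilistic model. The sketch below assumes, purely for illustration, that each microbubble independently yields a detection in a given perfused pixel with a fixed probability p, so that a pixel appears non-zero with probability 1 - (1 - p)^n after n microbubbles. The values of p and n and the helper names are hypothetical and are not taken from the study.

```python
def detection_probability(p_per_bubble: float, n_bubbles: int) -> float:
    """P(pixel shows >= 1 detection) after n independent MB passes (toy model)."""
    if not 0.0 <= p_per_bubble <= 1.0:
        raise ValueError("p_per_bubble must lie in [0, 1]")
    if n_bubbles < 0:
        raise ValueError("n_bubbles must be non-negative")
    return 1.0 - (1.0 - p_per_bubble) ** n_bubbles


def expected_fractional_area(true_fraction: float, p_per_bubble: float, n_bubbles: int) -> float:
    """Expected apparent fractional vessel area when `true_fraction` of ROI pixels is perfused."""
    return true_fraction * detection_probability(p_per_bubble, n_bubbles)


# Hypothetical numbers: 30% of the ROI perfused, p = 0.005 per bubble.
for n in (10, 100, 1000):
    print(n, round(expected_fractional_area(0.30, 0.005, n), 3))
```

      Under this model, the apparent fractional area rises toward the true perfused fraction as more microbubbles are accumulated, so a low MB count underestimates vessel area without any vessel physically disappearing.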

      In summary, while neither fractional vessel area (vascularity in previous versions) nor apparent CBV is a perfect metric due to the inherent limitations of ULM, we believe the vessel area provides a more robust and meaningful parameter for our longitudinal comparisons. We have revised the main text to include this explanation and acknowledge the limitations and interpretation of fractional vessel area more explicitly.

      Revision in Results:

      (Line 181) “To validate the broader applicability of our findings, we conducted ROI-based analyses using fractional vessel area and mean velocity as primary metrics. These metrics extended the analysis of vessel diameter and flow velocity to entire brain regions or selected ROIs, which provides a more objective assessment of cerebral blood flow changes at a global scale and reduces the bias associated with manually selecting vessel segments. For vessel area measurements, the term fractional denotes that the vessel area is normalized to the total area of the selected ROI. This normalization is essential for fair comparisons across ROIs of different sizes.”

      Revision in Methods: definition of vascularity

      (Line 571) “In ROI-based analysis, we focused on two primary parameters: fractional vessel area and mean velocity. Fractional vessel area was defined as the proportion of pixels occupied by blood vessels within each ROI, obtained by binarizing the ULM vessel density maps and calculating the percentage of pixels with MB signal. Mean velocity was calculated by averaging all non-zero pixel velocity estimates within the ROI. The velocity distribution within each ROI was also visualized using violin plots, as shown in Fig. 2, 4 and 6, to illustrate the range and density of flow velocity estimates across different acquisitions. In this study, we focused on these two metrics because they represent the most straightforward extension of single-vessel analysis to brain-wide vascular changes.”
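      The two ROI metrics defined above can be sketched in a few lines. This is a minimal Python illustration using plain nested lists, not the authors' MATLAB implementation. Two details are assumptions: the binarization rule (any density value above a threshold, 0 by default, counts as microbubble signal) and the function names.

```python
def fractional_vessel_area(density_map, roi_mask, threshold=0.0):
    """Percentage of ROI pixels whose ULM density exceeds `threshold`.

    Assumed binarization rule: a pixel counts as vessel if its density value is
    greater than `threshold`, i.e. any microbubble signal when threshold=0.
    """
    roi_pixels = 0
    vessel_pixels = 0
    for density_row, mask_row in zip(density_map, roi_mask):
        for density, inside in zip(density_row, mask_row):
            if inside:
                roi_pixels += 1
                if density > threshold:
                    vessel_pixels += 1
    if roi_pixels == 0:
        raise ValueError("ROI mask selects no pixels")
    return 100.0 * vessel_pixels / roi_pixels


def mean_velocity(velocity_map, roi_mask):
    """Mean of the non-zero velocity estimates inside the ROI (NaN if there are none)."""
    values = [
        v
        for velocity_row, mask_row in zip(velocity_map, roi_mask)
        for v, inside in zip(velocity_row, mask_row)
        if inside and v != 0
    ]
    return sum(values) / len(values) if values else float("nan")


if __name__ == "__main__":
    density = [[0, 2, 0], [1, 0, 3], [0, 0, 0]]
    velocity = [[0, 10.0, 0], [5.0, 0, 15.0], [0, 0, 0]]
    whole_roi = [[True] * 3 for _ in range(3)]
    print(fractional_vessel_area(density, whole_roi), mean_velocity(velocity, whole_roi))
```

      Dividing by the ROI pixel count is what makes the area "fractional". This allows ROIs of different sizes to be compared, but it also means the value depends on how the ROI is drawn.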

      We have made our ROI analysis code available on GitHub and added a “Code availability” section. We hope it can serve as a foundation, and as an example, for users exploring different quantitative metrics in their own longitudinal ULM studies.

      (Line 578) “Code availability

      To support quantitative longitudinal analysis of ULM data, we developed an open-source MATLAB application (https://github.com/ekerwang/ULMQuantitativeAnalysis). This tool is designed to facilitate ROI-based analysis of ULM images for longitudinal comparisons. It supports multiple quantification metrics, including but not limited to vessel area and mean velocity used in this study. Users can select and adapt different metrics based on their specific applications, as a wide range of ULM-based quantification metrics have been developed for different pathological and pharmacological studies.”

      The long-term recordings mentioned by the Authors refer to the 3-week time frame analyzed in this study. However, within each acquisition, the time available for imaging is only a few minutes (< 10 min, as in most of the plots showing time courses) between the animals' arousal from isoflurane and the disappearance of the bubbles. This limitation should be acknowledged.

      Response 04: Thank you for this comment. We agree that the current imaging sessions are constrained by the short time window available after the animal’s arousal from isoflurane and before bubbles disappear. This limitation indeed restricts the duration of usable awake-state imaging in our current bolus injection protocol. As discussed earlier, we are actively exploring the use of a jugular vein catheterization approach to address this limitation. This approach has the potential to extend the imaging session duration and provide a longer, more stable time window. We have now acknowledged this limitation more explicitly in the revised Discussion section.

      (Line 347) “Another limitation of this study is the potential residual vasodilatory effect of isoflurane anesthesia on awake imaging sessions and the short imaging window available after bolus injection. The awake imaging sessions were conducted shortly after the mice had emerged from the isoflurane anesthesia required for the MB bolus injections. The lasting vasodilatory effects of isoflurane may have influenced vascular responses, potentially contributing to an underestimation of differences in vascular dynamics between anesthetized and awake states. In addition, since microbubbles are rapidly cleared from circulation, the duration of effective imaging is limited to only a few minutes, which also overlaps with the anesthesia recovery period, constraining the usable awake-state imaging window. Future improvements in microbubble delivery, such as infusion through an indwelling jugular vein catheter, present a promising alternative to address these limitations. This method allows for stable microbubble infusion without the need for anesthesia induction, ensuring that the awake imaging condition is free from residual anesthetic effects. Moreover, it has the potential to extend the duration of imaging sessions, offering a longer and more stable time window for data acquisition. Furthermore, by performing ULM imaging in the awake state first, instead of starting with anesthetized imaging, researchers can achieve a more rigorous comparison of how various anesthetics influence cerebral microvascular dynamics relative to the awake baseline.”

      The more precise description of the number of mice and blood vessels analyzed in Figure 6 makes apparent the limited number of independent samples used to support the findings of this work, a limitation that should be acknowledged. The newly provided information added as Supplementary Figure 1 should be moved to the main text, possibly in the figure legends. The limited data in support of the findings was also highlighted by Rev2 and, indirectly, by Rev3.

      Response 05: We acknowledge the limited number of independent samples used in this study. In the revised manuscript, we have explicitly emphasized this limitation in the Discussion section. Specifically, we added the following statement:

      (Line 329) “Our current study primarily focused on demonstrating the feasibility of longitudinal ULM imaging in awake animals, instead of conducting a systematic investigation of how isoflurane anesthesia alters cerebral blood flow. Due to the limited number of animals used, the analyses presented in this work should be interpreted as example case studies. While the trends observed across animals were consistent, the small sample size restricts the scope of statistical inference. For future work, it would be valuable to design more rigorous control experiments with larger sample sizes to systematically compare the effects of isoflurane anesthesia, awake states, and other anesthetics that do not induce vasodilation on cerebral blood flow.”

      Following your suggestion, we have also moved the newly provided information (the table in Supplementary Figure 1) into the figure captions. In addition, we have revised the Methods section to make this information clear.

      (Line 406) “Eight healthy female C57 mice (8-12 weeks) were used for this study, numbered as Mouse 1 to Mouse 8. Three mice (Mouse 1–3) were used to compare imaging results between awake and anesthetized states (Fig. 3 and 4). Three additional mice (Mouse 4–6) underwent longitudinal imaging over a three-week period (Fig. 5 and 6). Among them, Mouse 4 was also used as an example to demonstrate the overall system schematic and saturation conditions (Fig. 1 and 2). Several mice (Mouse 2, 6, 7, and 8) exhibited suboptimal cranial window quality or image artifacts and were included to illustrate common surgical or imaging issues (Supplementary Fig. 1). The specific usage of each animal is also annotated in the corresponding figure captions.”

      Reviewer #2 (Public Review):

      The authors present a very interesting collection of methods and results using brain ultrasound localization microscopy (ULM) in awake mice. They emphasize the effect of the level of anesthesia on the quantities this technique can assess (i.e., vessel diameter and flow speed in veins and arteries, and perfused area in capillaries) and demonstrate the possibility of achieving longitudinal cerebrovascular assessment in one animal over several weeks with their protocol.

      The authors did a good job of rewriting the article based on the reviewers' comments. One of the messages of the first version of the manuscript was that variability in measurements (vessel diameter, flow velocity, vascularity) was much more pronounced under changes of anesthesia than in longitudinal imaging across several weeks. This message is now somewhat tempered, as longitudinal imaging seems to show a variability close to the order of magnitude observed under anesthesia. In that sense, the review process was useful in avoiding hasty conclusions, and it calls for further caution in awake longitudinal ULM imaging, in particular regarding the precision of positioning and the cancellation of tissue motion.

      Strengths:

      Even if the methodological elements considered separately are not new (brain ULM in rodents, a setup for longitudinal awake imaging similar to those used in fUS imaging, quantification of vessel diameters/bubble flow/vessel area), when masterfully combined as in this paper, they answer two long-running questions in the community: what is the impact of anesthesia on the parameters measured by ULM (and indirectly in fUS and other techniques)? Is it possible to achieve ULM in awake rodents for longitudinal imaging? The manuscript is well constructed and well written, and the graphics are appealing.

      The manuscript has been much strengthened by the round of review, with more animals for the longitudinal imaging study.

      Weaknesses:

      Some weaknesses remain that do not diminish the quality of the work but that the authors might want to address or explain.

      When considering fig 4e and fig 4j together: it seems that in fig 4e the vascularity reduction in the cortical ROI is around 30% for downward flow, and around 55% for upward flow; but when grouping both cortical flows in fig 4j, the reduction is much smaller (~5%), even at the individual level (only mouse 1 is used in fig 4e). Can you comment on that?

      Response 06: Thank you for carefully pointing this out. This discrepancy arises primarily from differences in ROI selections.

      The vascularity metric (now renamed fractional vessel area, following Reviewer 1’s comments) is calculated as the proportion of vessel-occupied pixels relative to the total ROI area. As such, it is best suited for longitudinal comparisons within the same ROI rather than for comparisons across ROIs, particularly when the size and vessel composition of the ROIs differ.

      In Fig. 4e, the cortical ROI mostly includes penetrating vessels, which were selected because their upward (venous) and downward (arterial) flow directions are clearly distinguishable. Pial vessels were intentionally excluded because flow direction alone does not reliably distinguish arteries from veins in these surface vessels. Thus, the goal of this analysis was to illustrate arteriovenous differences, rather than to represent the full cortical vascular changes.

      In contrast, the ROIs used in Fig. 4j aim to provide a more comprehensive view of cortical vascular responses without distinguishing flow direction, so both penetrating and pial vessels are included. Since pial vessels showed relatively smaller vascularity changes within the coronal cross-sections analyzed in our study, their inclusion in the cortical ROI likely contributed to the smaller overall reduction in vascularity observed in Fig. 4j.
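      The arithmetic behind this explanation can be illustrated with a short Python sketch. Within a fixed ROI, the ROI area cancels, so the relative change in fractional vessel area equals the relative change in vessel pixel count. Pooling sub-regions therefore yields an average of their individual changes, weighted by baseline pixel count. The pixel counts below are hypothetical and are not data from the study. The function names are illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (e.g. -0.4 means a 40% reduction)."""
    return (after - before) / before


def pooled_relative_change(sub_rois):
    """Relative change in vessel pixel count for the union of several sub-ROIs.

    sub_rois: iterable of (vessel_pixels_before, vessel_pixels_after) pairs.
    With a fixed pooled ROI area, this also equals the relative change in the
    pooled fractional vessel area.
    """
    pairs = list(sub_rois)
    before = sum(b for b, _ in pairs)
    after = sum(a for _, a in pairs)
    return relative_change(before, after)


if __name__ == "__main__":
    # Hypothetical counts: penetrating vessels shrink by 40%, pial vessels are unchanged.
    penetrating = (100, 60)
    pial = (900, 900)
    print(relative_change(*penetrating))                 # penetrating-only ROI
    print(pooled_relative_change([penetrating, pial]))   # pooled cortical ROI
```

      In this toy example, the pial pixels dominate the baseline count. The pooled reduction is therefore diluted to 4%, even though the penetrating-vessel reduction alone is 40%.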

      To address this potential confusion, we have added further clarification in the Results section of the revised manuscript.

      (Line 209) “It is worth noting that prior analyses (Fig. 4d–h) aimed to illustrate arteriovenous differences. Since pial vessels are difficult to distinguish as arteries or veins based on flow direction in coronal plane imaging, they were excluded from the ROI selection in those analyses. In the current whole-brain comparisons (Fig. 4i–k), the cortical ROIs no longer exclude pial vessels, since distinguishing between arteries and veins is not required. This aims to provide a more comprehensive representation of cortical vasculature.”

      When considering fig 4e, fig 4j, fig 6e and fig 6i altogether, it seems that vascularity can be highly variable, whether under anesthesia or in longitudinal imaging, with changes between 5 and 40%. Is this vascularity quantification worth it (i.e., is it reliable enough, for example, to quantify changes in a pathological model requiring longitudinal imaging)?

      Response 07: Thank you for raising this important point. We found that imaging in the awake state is inherently more variable than under anesthesia. In contrast, anesthetized imaging offers a more controlled and stable physiological condition, as anesthesia suppresses many sources of variation. For pathological studies, if the vascular or hemodynamic changes induced by anesthesia do not interfere with the scientific question being addressed, imaging under anesthesia can still be a practical and effective approach, due to its experimental simplicity and better physiological consistency.

      The higher variability observed in awake imaging arises from both physiological fluctuations in the animals and unavoidable experimental inconsistencies, such as small misalignments of the imaging plane across sessions. If the research question requires avoiding the confounding effects of anesthesia, then instead of suppressing variation through anesthesia, it is important to acknowledge the natural baseline variation in the awake state, while still making every effort to minimize technical sources of variation. We have added a brief discussion of this issue at the end of the manuscript to reflect this consideration.

      (Line 396) “However, it is also important to note that although longitudinal awake imaging promises to avoid the confounding effects of anesthetics, imaging under anesthesia remains more convenient and controllable in many cases. For applications where the physiological question of interest is not sensitive to anesthesia-induced vascular effects, anesthetized imaging still offers a simpler and more stable approach. Awake imaging inherently exhibits greater physiological variability. However, care must be taken at the experimental level to minimize confounding sources of variation, such as the stress level of the animal or handling inconsistencies, to ensure that the measurements are physiologically meaningful.”

      Regarding whether fractional vessel area (formerly referred to as vascularity) is a worthwhile metric for longitudinal quantification: based on our experience and comparisons, we found vessel area to be relatively robust and informative (see also Response 02 to Reviewer 1 for details). However, we acknowledge that other quantitative metrics—such as microbubble count, tortuosity, or flow directionality—may be more suitable depending on the specific pathological model or research question. How these metrics perform in awake imaging and longitudinal disease models is indeed an open and important question. We hope our work can serve as a foundation to inspire further investigation in this direction. To facilitate such exploration, we have developed and open-sourced a MATLAB-based analysis tool that supports multiple quantitative ULM metrics for longitudinal comparison. We encourage users to adapt and extend this framework to evaluate different quantitative metrics.

      (Line 578) “Code availability

      To support quantitative longitudinal analysis of ULM data, we developed an open-source MATLAB application (https://github.com/ekerwang/ULMQuantitativeAnalysis). This tool is designed to facilitate ROI-based analysis of ULM images for longitudinal comparisons. It supports multiple quantification metrics, including but not limited to vessel area and mean velocity used in this study. Users can select and adapt different metrics based on their specific applications, as a wide range of ULM-based quantification metrics have been developed for different pathological and pharmacological studies.”

      Reviewer #2 (Recommendations For The Authors):

      Images in figure 4 lack color bars.

      Response 08: Thank you for pointing this out. The color bars for the images in Figure 4 are the same as those used in the corresponding images in Figure 3. We have now stated this in the revised Figure 4 caption.

      Fig 4d: upward and downward are probably swapped.

      Response 09: Thank you for pointing this out, and we apologize for the oversight. They were mistakenly swapped. We have corrected this error in the revised figure.

      No quantitative conclusions are drawn regarding the changes in vessel diameter under anesthesia? Are they not significant? If not, why bring changes in diameter to our attention in fig 3 (white arrows) and figure 4b?

      Response 10: Our intention in highlighting diameter changes in Figure 3 (white arrows) and Figure 4b was to provide an illustrative example of isoflurane-induced diameter changes at the single-vessel level. These examples are meant to serve as case studies, not as the basis for broad statistical conclusions.

      In the initial version of the manuscript, we attempted to draw quantitative conclusions by measuring vessel diameters from ten manually selected vessel segments at each location. However, based on feedback from other reviewers, we decided to remove this analysis in the revised version. Manual selection of vessel segments is highly subjective and prone to bias, limiting its reliability for quantitative interpretation.

      Instead, we focused on ROI-based analysis using fractional vessel area (formerly referred to as vascularity), which reflects widespread changes in vessel diameter across regions. It is a more generalizable and less biased metric for quantifying vascular diameter changes.

      We further explained this in the Results section:

      (Line 181) “To validate the broader applicability of our findings, we conducted ROI-based analyses using fractional vessel area and mean velocity as primary metrics. These metrics extend the analysis of vessel diameter and flow velocity to entire brain regions or selected ROIs, which provides a more objective assessment of cerebral blood flow changes at a global scale and reduces the bias associated with manually selecting vessel segments. For vessel area measurements, the term ‘fractional’ denotes that the vessel area is normalized to the total area of the selected ROI. This normalization is essential for fair comparisons across ROIs of different sizes.”

      Line 210 "In summary, statistical analysis revealed a decrease in individual vessel diameter" this does not seem to be supported by this version of the manuscript as no analysis is done on a representative group of vessels for the diameter.

      Response 11: Thank you for pointing out this important issue. In line with our previous response (Response 10), we would like to clarify that the analysis of individual vessel diameter was intended to serve as an example study, rather than a statistically supported conclusion based on a group of vessels. To avoid confusion, we have removed the phrase “statistical analysis revealed a decrease in individual vessel diameter” from the manuscript. 

      The meaning of the *** in fig 6b and 6c should be clarified, as (1) it is not explicitly stated, and (2) the interpretation of an equivalence test is less familiar than that of other tests.

      Response 12: We thank the reviewer for pointing out this important issue. We agree that the use of asterisks (***) in Fig. 6b and 6c may have led to confusion, as such markers are typically associated with statistical significance in difference testing. In our case, the analysis was based on the two one-sided test (TOST) procedure to assess statistical equivalence, which is indeed less commonly used and could be misinterpreted.

      To address this, we have replaced the asterisks *** in the figure with the label “equiv.”, which more clearly reflects the intended interpretation. Additionally, we have revised the figure caption and the main text to explicitly state that these markers denote statistical equivalence (not difference) as determined by TOST, with the equivalence margin defined as three times the standard deviation of one week.

      (Figure 6 Caption) “Statistical analysis was performed using the two one-sided test (TOST) to evaluate consistency of measurement. The label “equiv.” indicates statistically equivalent measurements (p < 0.001), defined as interweek differences smaller than three times the standard deviation of one week.”

      (Line 240) “Statistical testing of equivalence was conducted using the two one-sided test (TOST) procedure, which evaluates whether the difference between two time points falls within a predefined equivalence margin. Specifically, equivalence is defined as the inter-week difference being smaller than three times the standard deviation of one week. A statistically significant result in TOST (p < 0.001) supports the interpretation that the measurements are statistically equivalent, which is denoted as “equiv.” in the figures.”
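      A TOST check with the margin described above can be sketched in Python as follows. Two simplifications are assumptions, not the authors' procedure. First, the reference week's sample standard deviation sets the margin. Second, a large-sample normal (z) approximation with a Welch-style standard error replaces Student's t. With small samples, a t-based TOST would be more appropriate. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def tost_equivalence(reference_week, other_week, margin_factor=3.0):
    """Two one-sided tests (TOST) for equivalence of two session means.

    Assumptions (not taken from the manuscript): the equivalence margin is
    margin_factor * SD of the reference week, and a large-sample normal (z)
    approximation replaces Student's t distribution.
    Returns (p_value, margin); equivalence is claimed when p_value < alpha.
    """
    margin = margin_factor * stdev(reference_week)
    diff = mean(other_week) - mean(reference_week)
    se = sqrt(stdev(reference_week) ** 2 / len(reference_week)
              + stdev(other_week) ** 2 / len(other_week))
    z = NormalDist()  # standard normal distribution
    # H0_low: diff <= -margin, rejected when (diff + margin) / se is large.
    p_low = 1.0 - z.cdf((diff + margin) / se)
    # H0_high: diff >= +margin, rejected when (margin - diff) / se is large.
    p_high = 1.0 - z.cdf((margin - diff) / se)
    # Both one-sided nulls must be rejected, so the TOST p-value is the larger one.
    return max(p_low, p_high), margin


if __name__ == "__main__":
    week1 = [1.0, 1.2, 0.9, 1.1, 1.0, 1.05, 0.95, 1.15]  # illustrative values only
    week2 = [x + 0.02 for x in week1]
    p_value, margin = tost_equivalence(week1, week2)
    print(p_value < 0.001, round(margin, 3))
```

      Unlike a difference test, a small TOST p-value supports "no meaningful difference". This is why a label such as "equiv." is less ambiguous than asterisks.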

      Line 237 and following: please consider rephrasing as "To further generalize these findings and examine longitudinal variation in ROI-based analysis, we used Mouse 4 as an example to show the consistency of blood flow density across different flow directions in the cortex (Fig. 6d) and extended the quantitative analysis to all three mice (Fig. 6e) (individual ULM upward and downward flow images for all three mice over the three-week longitudinal study period can be found in Supplementary Fig. 4)." The paragraph will make much more sense.

      Response 13: We appreciate your helpful rephrasing. We have fully adopted your proposed revision to enhance the clarity and coherence of the text. The sentence now reads exactly as you recommended:

      (Line 250): “To further generalize these findings and examine longitudinal variation in ROI-based analysis, we used Mouse 4 as an example to show the consistency of blood flow density across different flow directions in the cortex (Fig. 6d) and extended the quantitative analysis to all three mice (Fig. 6e) (individual ULM upward and downward flow images for all three mice over the three-week longitudinal study period can be found in Supplementary Fig. 4).”

      Line 248: "While arterial and venous flow velocity distributions exhibit clear distinctions, their variations over the three weeks remained acceptable" the meaning of acceptable remains elusive.

      Response 14: Thank you for pointing out the ambiguity in the phrase “remained acceptable”. To improve clarity and precision, we have revised the sentence to provide a more informative description. The updated sentence now reads:

      (Line 261) “While arterial and venous flow velocity distributions exhibit clear distinctions, the distribution shapes remained relatively consistent across the three weeks. Specifically, variation in median velocity was within 1 mm/s. In contrast, anesthesia-induced changes can lead to velocity shifts exceeding 1 mm/s.”
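      The consistency criterion in this sentence can be expressed as a simple check on per-session medians. The sketch below is illustrative. It assumes each weekly session provides velocity magnitudes in mm/s and uses the 1 mm/s spread quoted above as the tolerance. The function names are hypothetical.

```python
from statistics import median


def median_velocity_spread(sessions):
    """Range (max - min) of per-session median velocities, in the input units."""
    medians = [median(values) for values in sessions]
    return max(medians) - min(medians)


def medians_consistent(sessions, tolerance_mm_s=1.0):
    """True if all per-session medians lie within `tolerance_mm_s` of each other."""
    return median_velocity_spread(sessions) <= tolerance_mm_s


if __name__ == "__main__":
    weeks = [[10.0, 12.0, 14.0], [10.5, 12.5, 14.5], [9.8, 11.8, 13.8]]  # illustrative
    print(round(median_velocity_spread(weeks), 2), medians_consistent(weeks))
```

      Comparing medians rather than means keeps the check robust to the long tails typical of ULM velocity distributions.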

      Line 253: consider rephrasing as "Despite subcortical regions showing the largest vascularity variability following anesthesia-induced changes, vascularity in those regions remained relatively stable in the longitudinal study", as otherwise the link between the two parts of the sentence feels odd.

      Response 15: Thank you for your constructive suggestion regarding the logical flow of the sentence. We fully agree with your point and have revised the sentence exactly as you proposed.

      (Line 268) “Despite subcortical regions showing the largest vascularity variability following anesthesia-induced changes, vascularity in those regions remained relatively stable in the longitudinal study.”

    1. eLife Assessment

      This important study investigates why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. The authors perform deep transcriptomic and epigenetic comparisons between the mouse and the 13LGS and provide convincing evidence for mechanisms that drive rod- versus cone-rich retina development. Overall, this key question is investigated using an impressive collection of new data, cross-species analysis, and subsequent in vivo experiments. However, the functional analysis of the sufficiency and necessity of Zic3 and Mef2C remains incomplete, and further analyses are needed to support the claim that the identified enhancers are newly evolved in 13LGS.

    2. Reviewer #1 (Public review):

      Summary:

      In this manuscript, Weir et al. investigate why the 13-lined ground squirrel (13LGS) retina is unusually rich in cone photoreceptors, the cells responsible for color and daylight vision. Most mammals, including humans, have rod-dominant retinas, making the 13LGS retina both an intriguing evolutionary divergence and a valuable model for uncovering novel mechanisms of cone generation. The developmental programs underlying this adaptation were previously unknown.

      Using an integrated approach that combines single-cell RNA sequencing (scRNAseq), scATACseq, and histology, the authors generate a comprehensive atlas of retinal neurogenesis in 13LGS. Notably, comparative analyses with mouse datasets reveal that in 13LGS, cones can arise from late-stage neurogenic progenitors, a striking contrast to mouse and primate retinas, where late progenitors typically generate rods and other late-born cell types but not cones. They further identify a shift in the timing (heterochrony) of expression of several transcription factors. The authors also show that these factors act through species-specific regulatory elements. Overall, functional experiments support a role for several of these candidates in cone production.

      Strengths:

      This study stands out for its rigorous and multi-layered methodology. The combination of transcriptomic, epigenomic, and histological data yields a detailed and coherent view of cone development in 13LGS. Cross-species comparisons are thoughtfully executed, lending strong evolutionary context to the findings. The conclusions are, in general, well supported by the evidence, and the datasets generated represent a substantial resource for the field. The work will be of high value to both evolutionary neurobiology and regenerative medicine, particularly in the design of strategies to replace lost cone photoreceptors in human disease.

      Weaknesses:

      (1) Overall, the conclusions are strongly supported by the data, but the paper would benefit from additional clarifications. In particular, some of the conclusions could be toned down slightly to reflect that the observed changes in candidate gene function, such as those for Zic3 by itself, are modest and may represent part of a more complex regulatory network.

      (2) Additional explanations about the cell composition of the 13LGS retina are needed. The ratios between cones and rods are clearly detailed, but are these accompanied by changes in the proportions of other cell types?

      (3) Could the lack of a clear trajectory for rod differentiation be just an effect of low cell numbers for this population?

      (4) The immunohistochemistry and RNA hybridization experiments shown in Figure S2 would benefit from supporting controls to strengthen their interpretability. While it has to be recognized that performing immunostainings on non-conventional species is not a simple task, negative controls are necessary to establish the baseline background levels, especially in cases where there seems to be labeling around the cells. The text indicates that these experiments are both immunostainings and ISH, but the figure legend only says "immunohistochemistry". Clarifying these points would improve readers' confidence in the data.

      (5) Figure S3: The text claims that overexpression of Zic3 alone is sufficient to induce the cone-like photoreceptor precursor cells as well as horizontal cell-like precursors, but this is not clear in Figure S3A nor in any other figure. Similarly, the effects of Pou2f1 overexpression are different in Figure S3A and Figure S3B. In Figure S3B, the effects described (increased presence of cone-like and horizontal-like precursors) are very clear, whereas it is not in Figure S3A. How are these experiments different?

      (6) The analyses of Zic3 conditional mutants (Figure S4) reveal an increase in many cone, rod, and pan-photoreceptor genes with only a reduction in some cone genes. Thus, the overall conclusion that Zic3 is essential for cones while repressing rod genes doesn't seem to match this particular dataset.

      (7) Throughout the text, the authors use the term "evolved". To substantiate this claim, it would be important to include sequence analyses, or else to use a more neutral term that does not imply evolutionary inference.

    3. Reviewer #2 (Public review):

      Summary:

      This paper aims to elucidate the gene regulatory network governing the development of cone photoreceptors, the light-sensing neurons responsible for high acuity and color vision in humans. The authors provide a comprehensive analysis through stage-matched comparisons of gene expression and chromatin accessibility using scRNA-seq and scATAC-seq from the cone-dominant 13-lined ground squirrel (13LGS) retina and the rod-dominant mouse retina. The abundance of cones in the 13LGS retina arises from a dominant trajectory from late retinal progenitor cells (RPCs) to photoreceptor precursors and then to cones, whereas only a small proportion of rods are generated from these precursors.

      Strengths:

      The paper presents intriguing insights into the gene regulatory network involved in 13LGS cone development. In particular, the authors highlight the expression of cone-promoting transcription factors such as Onecut2, Pou2f1, and Zic3 in late-stage neurogenic progenitors, which may be driven by 13LGS-specific cis-regulatory elements. The authors also characterize candidate cone-promoting genes Zic3 and Mef2C, which have been previously understudied. Overall, I found that the across-species analysis presented by this study is a useful resource for the field.

      Weaknesses:

      The functional analysis on Zic3 and Mef2C in mice does not convincingly establish that these factors are sufficient or necessary to promote cone photoreceptor specification. Several analyses lack clarity or consistency, and figure labeling and interpretation need improvement.

    4. Reviewer #3 (Public review):

      Summary:

      The authors perform deep transcriptomic and epigenetic comparisons between mouse and 13-lined ground squirrel (13LGS) to identify mechanisms that drive rod vs cone-rich retina development. Through cross-species analysis, the authors find extended cone generation in 13LGS, gene expression within progenitor/photoreceptor precursor cells consistent with a lengthened cone window, and differential regulatory element usage. Two of the transcription factors, Mef2c and Zic3, were subsequently validated using overexpression (OE) and knockout (KO) mouse lines to verify their role in regulating the competence to generate cone photoreceptors.

      Strengths:

      Overall, this is an impactful manuscript with broad implications for our understanding of retinal development, cell fate specification, and TF network dynamics across evolution, and with the potential to influence our future ability to treat vision loss in human patients. The generation of this rich new dataset profiling the transcriptome and epigenome of the 13LGS is a tremendous addition to the field that assuredly will be useful for numerous other investigations and questions of a variety of interests. In this manuscript, the authors use this dataset and compare it to data they previously generated for mouse retinal development to identify two new regulators of cone generation and shed light on their regulation and their integration into the network of regulatory elements within the 13LGS compared to mouse.

      Weaknesses:

      (1) The authors chose to omit several cell classes from analyses and visualizations that would have added to their interpretations. In particular, I worry about the omission of 13LGS rods, early RPCs, and early NG from Figures 2C, D, and F; including them would have added to the understanding of gene expression dynamics. In other words, (a) are these genes of interest unique to late RPCs or maintained from early RPCs, and (b) are rod networks suppressed compared to the mouse?

      (2) The authors claim that the majority of cones are generated by late RPCs and that this is driven primarily by the enriched enhancer network around cone-promoting genes. With the temporal scRNA/ATAC-seq data at their disposal, the authors should compare early- vs late-born cones and RPCs to determine whether the same enhancers and genes are hyperactivated in early RPCs as well as in late RPCs of the 13LGS. This analysis will answer the important question of whether the enhancers were activated/evolved to promote all cones, or are activated specifically within late RPCs to drive cone genesis at the expense of rods.

      (3) The authors repeatedly use the term 'evolved' to describe the increased number of local enhancer elements of genes that increase in expression in 13LGS late RPCs and cones. Evolution can act at multiple levels on the genome and its regulation. The authors should consider analysis of sequence level changes between mouse, 13LGS, and other species to test whether the enhancer sequences claimed to be novel in the 13LGS are, in fact, newly evolved sequence/binding sites or if the binding sites are present in mouse but only used in late RPCs of the 13LGS.

      (4) The authors state that 'Enhancer elements in 13LGS are predicted to be directly targeted by a considerably greater number of transcription factors than in mice'. This statement can easily be misread to suggest that all enhancers display this, when in fact it applies only to the cone-promoting enhancers of late 13LGS RPCs. In a way, this is not surprising, since these genes are largely less expressed in mouse vs 13LGS late RPCs, as shown in Figure 2. The manuscript is written to suggest that this enhancer-number mechanism is specific to cone production in the 13LGS; it would help prove this point if the authors asked the opposite question and showed that mouse late RPCs do not have similarly increased predicted TF binding near rod-promoting genes in C7-8.

    1. eLife Assessment

      This important study shows that calcium stores in the endoplasmic reticulum of the parasitic protozoan Toxoplasma gondii play a major role in buffering calcium levels in the cytosol as well as in other organelles such as the mitochondrion. Advanced imaging techniques, including the use of genetically encoded calcium indicators, provide compelling evidence for the role of the SERCA-Ca2+ ATPase pump in regulating organellar calcium levels. However, it remains unclear whether inter-organellar calcium transport occurs via ER-mitochondria membrane contact sites or other mechanisms. This work will be of interest to cell and molecular biologists interested in calcium signalling in divergent eukaryotes.

    2. Reviewer #1 (Public review):

      Li et al. investigate Ca2+ signaling in T. gondii and argue that Ca2+ tunnels through the ER to other organelles to fuel multiple aspects of T. gondii biology. They focus in particular on TgSERCA as the presumed primary mechanism for ER Ca2+ filling, although when TgSERCA was knocked out, a Ca2+ release in response to TG was still present. Overall, the data support a model in which the Ca2+ filling state of the ER modulates Ca2+ dynamics in other organelles.

      Comments on revisions:

      I thank the authors for their careful revisions and response to my comments, which have been addressed.

      Regarding the most critical point of the paper, that is, Ca2+ transfer from the ER to other organelles, the authors argue in their rebuttal and in the revised manuscript that ER Ca2+ is critical to redistribute and replenish Ca2+ in other organelles in the cell. I agree with this conclusion and think it is best stated in the authors' response to point #7: "We propose that this leaked calcium is subsequently taken up by other intracellular compartments. This effect is observed immediately upon TG addition. However, pre-incubation with TG or knockdown of SERCA reduces calcium storage in the ER, thereby diminishing the transfer of calcium to other stores."

      In their rebuttal, the authors particularly highlight experiments in Figures 1H-K, 4G-H, and 5H-K in support of this conclusion. The data in Fig. 1H-K show that with TG there is increased Ca2+ release from acidic stores. In all cases, TG results in a rise in cytoplasmic Ca2+ that could load the acidic stores, so under those conditions the increased acidic-organelle Ca2+ is likely due to a preceding high cytosolic Ca2+ transient caused by TG. The experiments in Figs. 4G-H and 5H-K are more convincing and support an important role for ER Ca2+ in maintaining Ca2+ levels in other organelles. Overall, and to avoid a detailed, lengthy discussion of every point, the data support a model in which, in the absence of SERCA activity, both ER Ca2+ and Ca2+ in other organelles are reduced. I think it would be helpful to present and discuss this finding throughout the manuscript, since under physiological conditions ER Ca2+ is regularly mobilized for signaling and homeostasis, and this maintains Ca2+ levels in other organelles. This is supported by the new experiment in Supp Fig. 2A.

    1. eLife Assessment

      Whole-brain imaging of neuronal activity in freely behaving animals holds great promise for neuroscience, but numerous technical challenges limit its use. In this important study, the authors describe a new set of deep learning-based tools to track and identify the activity of head neurons in freely moving nematodes (C. elegans) and jellyfish (Clytia hemisphaerica). While the tools convincingly enable high tracking speed and accuracy in the settings in which the authors have evaluated them, the claim that these tools should be easily generalizable to a wide variety of datasets is incompletely supported.

    2. Reviewer #1 (Public review):

      In this important study, the authors develop a suite of machine vision tools to identify and align fluorescent neuronal recording images in space and time according to neuron identity and position. The authors provide compelling evidence for the speed and utility of these tools. While such tools have been developed in the past (including by the authors), the key advancement here is the speed and broad utility of these new tools. While prior approaches based on steepest descent worked, they required hundreds of hours of computational time, while the new approaches outlined here are >600-fold faster. The machine vision tools here should be immediately useful to readers specifically interested in whole-brain C. elegans data, but also for more general readers who may be interested in using BrainAlignNet for tracking fluorescent neuronal recordings from other systems.

      I really enjoyed reading this paper. The authors had several ground truth examples to quantify the accuracy of their algorithms and identified several small caveats users should consider when using these tools. These tools were primarily developed for C. elegans, an animal with stereotyped development, but whose neurons can be variably located due to internal motion of the body. The authors provide several examples of how BrainAlignNet reliably tracked these neurons over space and time. Neuron identity is also important to track, and the authors showed how AutoCellLabeler can reliably identify neurons based on their fluorescence in the NeuroPAL background. A challenge with NeuroPAL, though, is the high expression of several fluorophores, which compromises behavioral fidelity. The authors provide some possible avenues where this problem can be addressed by expressing fewer fluorophores. While using all four channels provided the best performance, using only the tagRFP and CyOFP channels was sufficient for performance close to that obtained with all 4 NeuroPAL channels. This result indicates that future lines with lower fluorophore expression could be sufficient for reliable neuronal identification, which would decrease the genetic load on the animal and also free other fluorescent channels for other fluorescent tools/markers. Even though these tools were developed for C. elegans specifically, the authors showed that BrainAlignNet can be applied to other organisms as well (in their case, the cnidarian C. hemisphaerica), which broadens the utility of their tools.

      Strengths:

      (1) The authors have a wealth of ground-truth training data to compare their algorithms against, and provide a variety of metrics to assess how well their new tools perform against hand annotation and/or prior algorithms.

      (2) For BrainAlignNet, the authors show how this tool can be applied to other organisms besides C. elegans.

      (3) The tools are publicly available on GitHub, which includes useful README files and installation guidance.

      Weaknesses:

      (1) Most of the utility of these algorithms is for C. elegans specifically. Testing their algorithms (specifically BrainAlignNet) on more challenging problems, such as whole-brain zebrafish, would have been interesting. This is a very, very minor weakness, though.

      (2) The tools are benchmarked against their own prior pipeline, but not against other algorithms written for the same purpose.

      (3) Considerable pre-processing was done before implementation. Expanding upon this would improve accessibility of these tools to a wider audience.

    3. Reviewer #2 (Public review):

      Summary:

      The paper introduces a pipeline for analyzing brain imaging of freely moving animals: registering deforming tissues and maintaining consistent cell identities over time. The pipeline consists of three neural networks that are built upon existing models: BrainAlignNet for non-rigid registration, AutoCellLabeler for supervised annotation of over 100 neuronal types, and CellDiscoveryNet for unsupervised discovery of cell identities. The ambition of the work is to enable high-throughput and largely automated pipelines for neuron tracking and labeling in deforming nervous systems.

      Strengths:

      (1) The paper tackles a timely and difficult problem, offering an end-to-end system rather than isolated modules.

      (2) The authors report high performance within their dataset, including single-pixel registration accuracy, nearly complete neuron linking over time, and annotation accuracy that exceeds individual human labelers.

      (3) Demonstrations across two organisms suggest the methods could be transferable, and the integration of supervised and unsupervised modules is of practical utility.

      Weaknesses:

      (1) Lack of solid evaluation. Despite strong results on their own data, the work is not benchmarked against existing methods on community datasets, making it hard to evaluate relative performance or generality.

      (2) Lack of novelty. None of the three models incorporates state-of-the-art advances from the respective fields. BrainAlignNet does not learn from the latest optical flow literature, relying instead on relatively conventional architectures. AutoCellLabeler does not utilize the advanced MedNeXt 3D architectures for supervised semantic segmentation. CellDiscoveryNet is presented as unsupervised discovery but relies on standard clustering approaches, with limited evaluation on only a small test set.

      (3) Lack of robustness. BrainAlignNet requires dataset-specific training and pre-alignment strategies, limiting its plug-and-play use. AutoCellLabeler depends heavily on raw intensity patterns of neurons, making it brittle to pose changes. By contrast, current state-of-the-art methods incorporate spatial deformation atlases or relative spatial relationships, which provide robustness across poses and imaging conditions. More broadly, the ANTSUN 2.0 system depends on numerous manually tuned weights and thresholds, which reduces reproducibility and generalizability beyond curated conditions.

      Evaluation:

      To make the evaluation more solid, it would be great for the authors to (1) apply the new method on existing datasets and (2) apply baseline methods on their own datasets. Otherwise, without comparison, it is unclear if the proposed method is better or not. The following papers have public challenging tracking data: https://elifesciences.org/articles/66410, https://elifesciences.org/articles/59187, https://www.nature.com/articles/s41592-023-02096-3.

      Methodology:

      (1) The model innovations appear incrementally novel relative to existing work. The authors should articulate what is fundamentally different (architectural choices, training objectives, inductive biases) and why those differences matter empirically. Ablations isolating each design choice would help.

      (2) The pipeline currently depends on numerous manually set hyperparameters and dataset-specific preprocessing. Please provide principled guidelines (e.g., ranges, default settings, heuristics) and a robustness analysis (sweeps, sensitivity curves) to show how performance varies with these choices across datasets; wherever possible, learn weights from data or replace fixed thresholds with data-driven criteria.

      Appraisal:

      The authors partially achieve their aims. Within the scope of their dataset, the pipeline demonstrates impressive performance and clear practical value. However, the absence of comparisons with state-of-the-art algorithms such as ZephIR, fDNC, or WormID, combined with small-scale evaluation (e.g., ten test volumes), makes the strength of evidence incomplete. The results support the conclusion that the approach is useful for their lab's workflow, but they do not establish broader robustness or superiority over existing methods.

      Impact:

      Even though the authors have released code, the pipeline requires heavy pre- and post-processing with numerous manually tuned hyperparameters, which limits its practical applicability to new datasets. Indeed, even within the paper, BrainAlignNet had to be adapted with additional preprocessing to handle the jellyfish data. The broader impact of the work will depend on systematic benchmarking against community datasets and comparison with established methods. As such, readers should view the results as a promising proof of concept rather than a definitive standard for imaging in deformable nervous systems.

    4. Reviewer #3 (Public review):

      Context:

      Tracking cell trajectories in deformable organs, such as the head neurons of freely moving C. elegans, is a challenging task due to rapid, non-rigid cellular motion. Similarly, identifying neuron types in the worm brain is difficult because of high inter-individual variability in cell positions.

      Summary:

      In this study, the authors developed a deep learning-based approach for cell tracking and identification in deformable neuronal images. Several different CNN models were trained to: (1) register image pairs without severe deformation, and then track cells across continuous image sequences using multiple registration results combined with clustering strategies; (2) predict neuron IDs from multicolor-labeled images; and (3) perform clustering across multiple multicolor images to automatically generate neuron IDs.

      Strengths:

      Directly using raw images for registration and identification simplifies the analysis pipeline, but it is also a challenging task since CNN architectures often struggle to capture spatial relationships between distant cells. Surprisingly, the authors report very high accuracy across all tasks. For example, the tracking of head neurons in freely moving worms reportedly reached 99.6% accuracy, neuron identification achieved 98%, and automatic classification achieved 93% compared to human annotations.

      Weaknesses:

      (1) The deep networks proposed in this study for registration and neuron identification require dataset-specific training, due to variations in imaging conditions across different laboratories. This, in turn, demands a large amount of manually or semi-manually annotated training data, including cell centroid correspondences and cell identity labels, which reduces the overall practicality and scalability of the method.

      (2) The cell tracking accuracy was not rigorously validated, but rather estimated using a biased and coarse approach. Specifically, the accuracy was assessed based on the stability of GFP signals in the eat-4-labeled channel. A tracking error was assumed to occur when the GFP signal switched between eat-4-negative and eat-4-positive at a given time point. However, this estimation is imprecise and only captures a small subset of all potential errors. Although the authors introduced a correction factor to approximate the true error rate, the validity of this correction relies on the assumption that eat-4 neurons are uniformly distributed across the brain - a condition that is unlikely to hold.

      (3) Figure S1F demonstrates that the registration network, BrainAlignNet, alone is insufficient to accurately align arbitrary pairs of C. elegans head images. The high tracking accuracy reported is largely due to the use of a carefully designed registration sequence, matching only images with similar postures, and an effective clustering algorithm. Although the authors address this point in the Discussion section, the abstract may give the misleading impression that the network itself is solely responsible for the observed accuracy.

      (4) The reported accuracy for neuron identification and automatic classification may be misleading, as it was assessed only on a subset of neurons labeled as "high-confidence" by human annotators. Although the authors did not disclose the exact proportion, various descriptions (such as Figure 4f) imply that this subset comprises approximately 60% of all neurons. While excluding uncertain labels is justifiable, the authors highlight the high accuracy achieved on this subset without clearly stating that the reported performance pertains only to neurons that are relatively easy to identify. Furthermore, they do not report what fraction of the total neuron population can be accurately identified using their methods, an omission of critical importance for prospective users.

    5. Author response:

      Reviewer #1 (Public review):

      In this important study, the authors develop a suite of machine vision tools to identify and align fluorescent neuronal recording images in space and time according to neuron identity and position. The authors provide compelling evidence for the speed and utility of these tools. While such tools have been developed in the past (including by the authors), the key advancement here is the speed and broad utility of these new tools. While prior approaches based on steepest descent worked, they required hundreds of hours of computational time, while the new approaches outlined here are >600-fold faster. The machine vision tools here should be immediately useful to readers specifically interested in whole-brain C. elegans data, but also for more general readers who may be interested in using BrainAlignNet for tracking fluorescent neuronal recordings from other systems.

      I really enjoyed reading this paper. The authors had several ground truth examples to quantify the accuracy of their algorithms and identified several small caveats users should consider when using these tools. These tools were primarily developed for C. elegans, an animal with stereotyped development, but whose neurons can be variably located due to internal motion of the body. The authors provide several examples of how BrainAlignNet reliably tracked these neurons over space and time. Neuron identity is also important to track, and the authors showed how AutoCellLabeler can reliably identify neurons based on their fluorescence in the NeuroPAL background. A challenge with NeuroPAL, though, is the high expression of several fluorophores, which compromises behavioral fidelity. The authors provide some possible avenues where this problem can be addressed by expressing fewer fluorophores. While using all four channels provided the best performance, using only the tagRFP and CyOFP channels was sufficient for performance close to that obtained with all 4 NeuroPAL channels. This result indicates that future lines with lower fluorophore expression could be sufficient for reliable neuronal identification, which would decrease the genetic load on the animal and also free other fluorescent channels for other fluorescent tools/markers. Even though these tools were developed for C. elegans specifically, the authors showed that BrainAlignNet can be applied to other organisms as well (in their case, the cnidarian C. hemisphaerica), which broadens the utility of their tools.

      Strengths:

      (1) The authors have a wealth of ground-truth training data to compare their algorithms against, and provide a variety of metrics to assess how well their new tools perform against hand annotation and/or prior algorithms.

      (2) For BrainAlignNet, the authors show how this tool can be applied to other organisms besides C. elegans.

      (3) The tools are publicly available on GitHub, which includes useful README files and installation guidance.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) Most of the utility of these algorithms is for C. elegans specifically. Testing their algorithms (specifically BrainAlignNet) on more challenging problems, such as whole-brain zebrafish, would have been interesting. This is a very, very minor weakness, though.

      We appreciate the reviewer’s point that expanding to additional animal models would be valuable. In this study, we have so far tested our approaches on C. elegans and jellyfish. Given that this is considered a ‘very, very minor weakness’ and that it does not directly affect the results or analyses in the paper, we think this is better addressed in future work.

      (2) The tools are benchmarked against their own prior pipeline, but not against other algorithms written for the same purpose.

      We agree that it would be valuable to benchmark other labs’ software pipelines on our datasets. We note that most papers in this area, which describe those pipelines, provide the same performance metrics that we do (accuracy of neuron identification, tracking accuracy, etc), so a crude, first-order comparison can be obtained by comparing the numbers in the papers. But, we agree that a rigorous head-to-head comparison would require applying these different pipelines to a common dataset. We considered performing these analyses, but we were concerned that using other labs’ software ‘off the shelf’ on our data might not represent those pipelines in their best light when compared to our pipeline that was developed with our data in mind. Data from different microscopy platforms can be surprisingly different and we wouldn’t want to perform an analysis that had this bias. Therefore, we feel that this comparison would be best pursued by all of these labs collaboratively (so that they can each provide input on how to run their software optimally). Indeed, this is an important area for future study. In this spirit, we have been sharing our eat-4::GFP datasets (that permit quantification of tracking accuracy) with other labs looking for additional ways to benchmark their tracking software.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (3) Considerable pre-processing was done before implementation. Expanding upon this would improve accessibility of these tools to a wider audience.

      Indeed, some pre-processing was performed on images before registration and neuron identification, and understanding these nuances can be important. The pre-processing steps are described in the Results section and detailed in the Methods. They are also all available in our open-source software. For BrainAlignNet, the key steps were: (1) selecting image registration problems, (2) cropping, and (3) Euler alignment. Steps (1) and (3) were critically important and are extensively discussed in the Results and Discussion sections of our study (lines 142-144, 218-234, 318-323, 704-712). Step (2) is standard in image processing. For AutoCellLabeler and CellDiscoveryNet, the pre-processing was primarily to align the 4 NeuroPAL color channels to each other (i.e. to make sure the blue/red/orange/etc. channels for an animal are perfectly aligned). This is also a standard image processing step to ensure channel alignment. Thus, the more “custom” pre-processing steps were extensively discussed in the study, and the more “common” steps are still described in the Methods. The implementation of all steps is available in our open-source software.

      Reviewer #2 (Public review):

      Summary:

      The paper introduces a pipeline for analyzing brain imaging of freely moving animals: registering deforming tissues and maintaining consistent cell identities over time. The pipeline consists of three neural networks that are built upon existing models: BrainAlignNet for non-rigid registration, AutoCellLabeler for supervised annotation of over 100 neuronal types, and CellDiscoveryNet for unsupervised discovery of cell identities. The ambition of the work is to enable high-throughput and largely automated pipelines for neuron tracking and labeling in deforming nervous systems.

      Strengths:

      (1) The paper tackles a timely and difficult problem, offering an end-to-end system rather than isolated modules.

      (2) The authors report high performance within their dataset, including single-pixel registration accuracy, nearly complete neuron linking over time, and annotation accuracy that exceeds individual human labelers.

      (3) Demonstrations across two organisms suggest the methods could be transferable, and the integration of supervised and unsupervised modules is of practical utility.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) Lack of solid evaluation. Despite strong results on their own data, the work is not benchmarked against existing methods on community datasets, making it hard to evaluate relative performance or generality.

      We agree that it would be valuable to benchmark many labs’ software pipelines on some common datasets, ideally from several different research labs. We note that most papers in this area, which describe the other pipelines that have been developed, provide the same performance metrics that we do (accuracy of neuron identification, tracking accuracy, etc), so a crude, first-order comparison can be obtained by comparing the numbers in the papers. But, we agree that a rigorous head-to-head comparison would require applying these different pipelines to a common dataset. We considered performing these analyses, but we were concerned that using other labs’ software ‘off the shelf’ and comparing the results to our pipeline (where we have extensive expertise) might bias the performance metrics in favor of our software. Therefore, we feel that this comparison would be best pursued by all of these labs collaboratively (so that they can each provide input on how to run their software optimally). Indeed, this is an important area for future study. In this spirit, we have been sharing our eat-4::GFP datasets (that permit quantification of tracking accuracy) with other labs looking for additional ways to benchmark their tracking software.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (2) Lack of novelty. None of the three models incorporates state-of-the-art advances from the respective fields. BrainAlignNet does not learn from the latest optical flow literature, relying instead on relatively conventional architectures. AutoCellLabeler does not utilize the advanced MedNeXt 3D architectures for supervised semantic segmentation. CellDiscoveryNet is presented as unsupervised discovery but relies on standard clustering approaches, with limited evaluation on only a small test set.

      We appreciate that the machine learning field moves fast. Our goal was not to invent entirely novel machine learning tools, but rather to apply and optimize tools for a set of challenging, unsolved biological problems. We began with the somewhat simpler architectures described in our study and were largely satisfied with their performance. It is conceivable that newer approaches would perhaps lead to even greater accuracy, flexibility, and/or speed. But, oftentimes, simple or classical solutions can adequately resolve specific challenges in biological image processing.

      Regarding CellDiscoveryNet, our claim of unsupervised training is precise: CellDiscoveryNet is trained end-to-end only on raw images, with no human annotations, pseudo-labels, external classifiers, or metadata used for training, model selection, or early stopping. The loss is defined entirely from the input data (no label signal). By standard usage in machine learning, this constitutes unsupervised (often termed “self-supervised”) representation learning. Downstream clustering is likewise unsupervised, consuming only image pairs registered by CellDiscoveryNet and neuron segmentations produced by our previously-trained SegmentationNet (which provides no label information).

      (3) Lack of robustness. BrainAlignNet requires dataset-specific training and pre-alignment strategies, limiting its plug-and-play use. AutoCellLabeler depends heavily on raw intensity patterns of neurons, making it brittle to pose changes. By contrast, current state-of-the-art methods incorporate spatial deformation atlases or relative spatial relationships, which provide robustness across poses and imaging conditions. More broadly, the ANTSUN 2.0 system depends on numerous manually tuned weights and thresholds, which reduces reproducibility and generalizability beyond curated conditions.

      Regarding BrainAlignNet: we agree that we trained on each species’ own data (worm, jellyfish) and we would suggest other labs working on new organisms to do the same based on our current state of knowledge. It would be fantastic if there was an alignment approach that generalized to all possible cases of non-rigid-registration in all animals – an important area for future study. We also agree that pre-alignment was critical in worms and jellyfish, which we discuss extensively in our study (lines 142-144, 318-321, 704-712).

      Regarding AutoCellLabeler: the animals were not recorded in any standardized pose and were not aligned to each other beforehand – they were basically in a haphazard mix of poses and we used image augmentation to allow the network to generalize to other poses, as described in our study. It is still possible that AutoCellLabeler is somehow brittle to pose changes (e.g. perhaps extremely curved worms) – while we did not detect this in our analyses, we did not systematically evaluate performance across all possible poses. However, we do note that this network was able to label images taken from freely-moving worms, which by definition exhibit many poses (Figure 5D, lines 500-525); aggregating the network’s performance across freely-moving data points allowed it to nearly match its performance on high-SNR immobilized data. This suggests a degree of robustness of the AutoCellLabeler network to pose changes.

      Regarding ANTSUN 2.0: we agree that there are some hyperparameters (described in our study) that affect ANTSUN performance. We agree that it would be worthwhile to fully automate setting these in future iterations of the software.

      Evaluation:

      To make the evaluation more solid, it would be great for the authors to (1) apply the new method on existing datasets and (2) apply baseline methods on their own datasets. Otherwise, without comparison, it is unclear if the proposed method is better or not. The following papers have public challenging tracking data: https://elifesciences.org/articles/66410, https://elifesciences.org/articles/59187, https://www.nature.com/articles/s41592-023-02096-3.

      Please see our response to your point (1) under Weaknesses above.

      Methodology:

      (1) The model innovations appear incrementally novel relative to existing work. The authors should articulate what is fundamentally different (architectural choices, training objectives, inductive biases) and why those differences matter empirically. Ablations isolating each design choice would help.

      There are other efforts in the literature to solve the neuron tracking and neuron identification problems in C. elegans (please see paragraphs 4 and 5 of our Introduction, which are devoted to describing these). However, they are quite different in the approaches that they use, compared to our study. For example, for neuron tracking they use t->t+1 methods, or model neurons as point clouds, etc. (a variety of approaches have been tried). For neuron identification, they work on extracted features from images, or use statistical approaches rather than deep neural networks, etc. (a variety of approaches have been tried). Our assessment is that each of these diverse approaches has strengths and drawbacks; we agree that a meta-analysis of the design choices used across studies could be valuable.

      We also note that there are not really any pipelines to directly compare against CellDiscoveryNet, as we are not aware of any other fully unsupervised approach for neuron identification in C. elegans.

      (2) The pipeline currently depends on numerous manually set hyperparameters and dataset-specific preprocessing. Please provide principled guidelines (e.g., ranges, default settings, heuristics) and a robustness analysis (sweeps, sensitivity curves) to show how performance varies with these choices across datasets; wherever possible, learn weights from data or replace fixed thresholds with data-driven criteria.

      We agree that there are some ANTSUN 2.0 hyperparameters (described in our Methods section) that could affect the quality of neuron tracking. It would be worthwhile to fully automate setting these in future iterations of the software, ensuring that the hyperparameter settings are robust to variation in data/experiments.

      Appraisal:

      The authors partially achieve their aims. Within the scope of their dataset, the pipeline demonstrates impressive performance and clear practical value. However, the absence of comparisons with state-of-the-art algorithms such as ZephIR, fDNC, or WormID, combined with small-scale evaluation (e.g., ten test volumes), makes the strength of evidence incomplete. The results support the conclusion that the approach is useful for their lab's workflow, but they do not establish broader robustness or superiority over existing methods.

      We wish to remind the reviewer that we developed BrainAlignNet for use in worms and jellyfish. These two animals have different distributions of neurons and radically different anatomy and movement patterns. Data from the two organisms was collected in different labs (Flavell lab, Weissbourd lab) on different types of microscopes (spinning disk, epifluorescence). We believe that this is a good initial demonstration that the approach has robustness across different settings.

      Regarding comparisons to other labs’ C. elegans data processing pipelines, we agree that it will be extremely valuable to compare performance on common datasets, ideally collected in multiple different research labs. But we believe this should be performed collaboratively so that all software can be utilized in their best light with input from each lab, as described above. We agree that such a comparison would be very valuable.

      Impact:

      Even though the authors have released code, the pipeline requires heavy pre- and post-processing with numerous manually tuned hyperparameters, which limits its practical applicability to new datasets. Indeed, even within the paper, BrainAlignNet had to be adapted with additional preprocessing to handle the jellyfish data. The broader impact of the work will depend on systematic benchmarking against community datasets and comparison with established methods. As such, readers should view the results as a promising proof of concept rather than a definitive standard for imaging in deformable nervous systems.

      Regarding worms vs jellyfish pre-processing: we actually had the exact opposite reaction to that of the reviewer. We were surprised at how similar the pre-processing was for these two very different organisms. In both cases, it was essential to (1) select appropriate registration problems to be solved; and (2) perform initialization with Euler alignment. Provided that these two challenges were solved, BrainAlignNet mostly took care of the rest. This suggests a clear path for researchers who wish to use this approach in another animal. Nevertheless, we also agree with the reviewer’s caution that a totally different use case could require some re-thinking or re-strategizing. For example, the strategy of how to select good registration problems could depend on the form of the animal’s movement.

      Reviewer #3 (Public review):

      Context:

      Tracking cell trajectories in deformable organs, such as the head neurons of freely moving C. elegans, is a challenging task due to rapid, non-rigid cellular motion. Similarly, identifying neuron types in the worm brain is difficult because of high inter-individual variability in cell positions.

      Summary:

      In this study, the authors developed a deep learning-based approach for cell tracking and identification in deformable neuronal images. Several different CNN models were trained to: (1) register image pairs without severe deformation, and then track cells across continuous image sequences using multiple registration results combined with clustering strategies; (2) predict neuron IDs from multicolor-labeled images; and (3) perform clustering across multiple multicolor images to automatically generate neuron IDs.

      Strengths:

      Directly using raw images for registration and identification simplifies the analysis pipeline, but it is also a challenging task since CNN architectures often struggle to capture spatial relationships between distant cells. Surprisingly, the authors report very high accuracy across all tasks. For example, the tracking of head neurons in freely moving worms reportedly reached 99.6% accuracy, neuron identification achieved 98%, and automatic classification achieved 93% compared to human annotations.

      We thank the reviewer for noting these strengths of our study.

      Weaknesses:

      (1) The deep networks proposed in this study for registration and neuron identification require dataset-specific training, due to variations in imaging conditions across different laboratories. This, in turn, demands a large amount of manually or semi-manually annotated training data, including cell centroid correspondences and cell identity labels, which reduces the overall practicality and scalability of the method.

      We performed dataset-specific training for image registration and neuron identification, and we would encourage new users to do the same based on our current state of knowledge. This highlights how standardization of whole-brain imaging data across labs is an important issue for our field to address and that, without it, variations in imaging conditions could impact software utility. We refer the reviewer to an excellent study by Sprague et al. (2025) on this topic, which is cited in our study.

      However, at the same time, we wish to note that it was actually reasonably straightforward to take the BrainAlignNet approach that we initially developed in C. elegans and apply it to jellyfish. Some of the key lessons that we learned in C. elegans generalized: in both cases, it was critical to select the right registration problems to solve and to preprocess with Euler registration for good initialization. Provided that those problems were solved, BrainAlignNet could be applied to obtain high-quality registration and trace extraction. Thus, our study provides clear suggestions on how to use these tools across multiple contexts.

      (2) The cell tracking accuracy was not rigorously validated, but rather estimated using a biased and coarse approach. Specifically, the accuracy was assessed based on the stability of GFP signals in the eat-4-labeled channel. A tracking error was assumed to occur when the GFP signal switched between eat-4-negative and eat-4-positive at a given time point. However, this estimation is imprecise and only captures a small subset of all potential errors. Although the authors introduced a correction factor to approximate the true error rate, the validity of this correction relies on the assumption that eat-4 neurons are uniformly distributed across the brain - a condition that is unlikely to hold.

      We respectfully disagree with this critique. We considered the alternative suggested by the reviewer (in their private comments to the authors) of comparing against a manually annotated dataset. But this annotation would require manually linking ~150 neurons across ~1600 timepoints, which would require humans to manually link neurons across timepoints >200,000 times for a single dataset. These datasets consist of densely packed neurons rapidly deforming over time in all 3 dimensions. Moreover, a single error in linking would propagate across timepoints, so the error tolerance of such annotation would be extremely low. Any such manually labeled dataset would be fraught with errors and should not be trusted. Instead, our approach relies on a simple, accurate assumption: GFP expression in a neuron should be roughly constant over a 16-min recording (after bleach correction), and levels will differ between neurons when the reporter is sparsely expressed. Because all image alignment is done in the red channel, the pipeline never “peeks” at the GFP until it is finished with neuron alignment and tracking. The eat-4 promoter was chosen for GFP expression because (a) the nuclei labeled by it are scattered across the neuropil in a roughly salt-and-pepper fashion – a mixture of eat-4-positive and eat-4-negative neurons are found throughout the head; and (b) it is expressed in roughly 40% of the neurons, giving very good overall coverage. Our view is that this approach of labeling subsets of neurons with GFP should become the standard in the field for assessing tracking accuracy – it has a simple, accurate premise; is not susceptible to human labeling error; is straightforward to implement; and, since it does not require manual labeling, is easy to scale to multiple datasets. We do note that it could be further strengthened by using multiple strains, each with a different ‘salt-and-pepper’ GFP expression pattern.
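      For readers who want the logic of this GFP-based check in concrete form, a minimal Python sketch follows. The input format, the fixed intensity threshold, and the 2p(1 - p) correction are illustrative assumptions, not the ANTSUN 2.0 code. In particular, the correction assumes that a mislinked timepoint picks up a uniformly random neuron, which is the same uniformity assumption Reviewer #3 questions.

```python
from statistics import median


def estimate_tracking_error(traces, threshold):
    """Estimate tracking error from GFP label switches.

    traces: one list of bleach-corrected GFP intensities per tracked neuron
            (hypothetical input format).
    threshold: GFP level separating GFP-negative from GFP-positive neurons
               (assumed to be known in advance).
    Returns (observed_switch_rate, corrected_error_rate).
    """
    # Each neuron's label comes from its median GFP level, which is robust
    # to a minority of mistracked timepoints.
    labels = [median(trace) > threshold for trace in traces]

    # Count timepoints whose GFP state disagrees with the neuron's own label.
    mismatches = sum(
        (value > threshold) != label
        for label, trace in zip(labels, traces)
        for value in trace
    )
    total = sum(len(trace) for trace in traces)
    observed = mismatches / total

    # Fraction of GFP-positive neurons (roughly 40% for eat-4).
    p = sum(labels) / len(labels)
    # Assumption: a mislinked timepoint picks up a uniformly random neuron,
    # so it is visible as a GFP switch only with probability 2p(1 - p).
    detectable = 2 * p * (1 - p)
    if detectable == 0:
        raise ValueError("need both GFP-positive and GFP-negative neurons")
    return observed, observed / detectable


# Toy example: one mislinked timepoint in the first (GFP-positive) neuron.
observed, corrected = estimate_tracking_error([[10, 10, 1, 10], [1, 1, 1, 1]], threshold=5)
print(observed, corrected)  # 0.125 0.25
```

      With p ≈ 0.4, as for eat-4, 2p(1 - p) = 0.48, so under this assumption only about half of all mislinks would show up as GFP switches; a non-uniform spatial distribution of GFP-positive neurons would change this factor.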

      (3) Figure S1F demonstrates that the registration network, BrainAlignNet, alone is insufficient to accurately align arbitrary pairs of C. elegans head images. The high tracking accuracy reported is largely due to the use of a carefully designed registration sequence, matching only images with similar postures, and an effective clustering algorithm. Although the authors address this point in the Discussion section, the abstract may give the misleading impression that the network itself is solely responsible for the observed accuracy.

      Our tracking accuracy requires (a) careful selection of registration problems, (b) highly accurate registration of the selected registration problems, and (c) effective clustering. We extensively discussed the importance of choosing the registration problems in the Results section (lines 218-234 and 318-321), Discussion section (lines 704-708), and Methods section (lines 955-970 and 1246-1250) of our paper. We also discussed the clustering aspect in the Results section (lines 247-259), Discussion section (lines 708-712), and Methods section (lines 1162-1206). In addition, our abstract states that BrainAlignNet needs to be “incorporated into an image analysis pipeline,” to inform readers that other aspects of image analysis need to occur (beyond BrainAlignNet) to perform tracking.

      (4) The reported accuracy for neuron identification and automatic classification may be misleading, as it was assessed only on a subset of neurons labeled as "high-confidence" by human annotators. Although the authors did not disclose the exact proportion, various descriptions (such as Figure 4f) imply that this subset comprises approximately 60% of all neurons. While excluding uncertain labels is justifiable, the authors highlight the high accuracy achieved on this subset without clarifying that the reported performance pertains only to neurons that are relatively easy to identify. Furthermore, they do not report what fraction of the total neuron population can be accurately identified using their methods, an omission of critical importance for prospective users.

      The reviewer raises two points here: (1) whether AutoCellLabeler accuracy is impacted by ease of human labeling; and (2) what fraction of total neurons are identified. We address them one at a time.

      Regarding (1), we believe that the reviewer overlooked an important analysis in our study. Indeed, to assess its performance, one can only compare AutoCellLabeler’s output against accurate human labels – there is simply no way around it. However, we noted that AutoCellLabeler was identifying some neurons with high confidence even when humans had low confidence or had not even tried to label the neurons (Fig. 4F). To test whether these were in fact accurate labels, we asked additional human labelers to spend extra time trying to label a random subset of these neurons (they were of course blinded to the AutoCellLabeler label). We then assessed the accuracy of AutoCellLabeler against these new human labels and found that they were highly accurate (Fig. 4H). This suggests that AutoCellLabeler has strong performance even when some human labelers find it challenging to label a neuron. However, we agree that we have not yet been able to quantify AutoCellLabeler performance on the small set of neuron classes that humans are unable to identify across datasets.

      Regarding (2), we agree that knowing how many neurons are labeled by AutoCellLabeler is critical. For example, labeling only 3 neurons per animal with 100% accuracy isn’t very helpful. We wish to emphasize that we did not omit this information: we reported the number of neurons labeled for every network that we characterized in the study, alongside the accuracy of those labels (please see Figures 4I, 5A, and 6G; Figure 4I also shows the number of human labels per dataset, which the reviewer requested). We also showed curves depicting the tradeoff between accuracy and number of neurons labeled, which fully captures how we balanced accuracy and number of neurons labeled (Figures 5D and S4A). It sounds like the reviewer also wanted to know the total number of recorded neurons. The typical number of recorded neurons per dataset can also be found in the paper in Fig. 2E.
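      The accuracy versus number-of-labels tradeoff can be illustrated with a minimal Python sketch that sweeps a confidence threshold over a network's labels and reports, for each threshold, how many labels survive and what fraction of those agree with trusted human labels. The (confidence, is_correct) input format and the function name are hypothetical; this is not the AutoCellLabeler code.

```python
import math


def accuracy_coverage_curve(predictions, thresholds):
    """Sweep a confidence threshold over network labels.

    predictions: (confidence, is_correct) pairs, where is_correct compares the
                 network label with a trusted human label (hypothetical format).
    Returns (threshold, n_labeled, accuracy) for each threshold.
    """
    curve = []
    for t in thresholds:
        kept = [correct for confidence, correct in predictions if confidence >= t]
        accuracy = sum(kept) / len(kept) if kept else math.nan
        curve.append((t, len(kept), accuracy))
    return curve


preds = [(0.9, True), (0.8, True), (0.6, False), (0.3, True)]
for t, n, acc in accuracy_coverage_curve(preds, [0.0, 0.5, 0.85]):
    print(t, n, round(acc, 2))
# 0.0 4 0.75
# 0.5 3 0.67
# 0.85 1 1.0
```

      Raising the threshold trades the number of labeled neurons for accuracy, which is why both quantities need to be reported together.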

    1. eLife Assessment

      This study makes a novel and valuable contribution by adapting step selection functions, traditionally used in animal ecology, to explore human movement and environmental risk exposure in urban slums, offering a promising framework for spatial epidemiology, particularly regarding leptospirosis. The integration of GPS telemetry with environmental data and the stratification by gender and serostatus are notable strengths that enhance the study's relevance for public health applications. The strength of evidence is compelling.

    2. Reviewer #1 (Public review):

      Summary:

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, particularly focusing on domestic rubbish piles, open sewers, and a central stream. The study makes use of the step selection functions using telemetry data, which is a method to estimate how likely individuals move towards these environmental features, differentiating among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis, hence ensuring more targeted interventions.

      Strengths:

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis).

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings.

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes.

    3. Reviewer #2 (Public review):

      Summary:

      Pablo Ruiz Cuenca et al. conducted a GPS logger study with 124 adult participants across four different slum areas in Salvador, Brazil, recording GPS locations every 35 seconds for 48 hours. The aim of their study was to investigate step-selection models, a technique widely used in movement ecology to quantify contact with environmental risk factors for exposure to leptospires (open sewers, community streams, and rubbish piles). The authors built two different types of models based on distance and based on buffer areas to model human environmental exposure to risk factors. They show differences in movement/contact with these risk factors based on gender and seropositivity status. This study shows the existence of modest differences in contact with environmental risk factors for leptospirosis at small spatial scales based on socio-demographics and infection status.

      Strengths:

      The authors assembled a rich dataset by collecting human GPS logger data, combined with field-recorded locations of open sewers, community streams, and rubbish piles, and testing individuals for leptospirosis via serology. This study was able to capture fine-scale exposure dynamics within an urban environment and shows differences by gender and seropositive status, using a method novel to epidemiology (step selection).

      [Editors' note: I have reviewed the authors' revised submission and confirm that they have adequately addressed the reviewers' comments for this manuscript.]

    4. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public review): 

      Summary: 

      The study investigated how individuals living in urban slums in Salvador, Brazil, interact with environmental risk factors, particularly focusing on domestic rubbish piles, open sewers, and a central stream. The study makes use of the step selection functions using telemetry data, which is a method to estimate how likely individuals move towards these environmental features, differentiating among groups by gender, age, and leptospirosis serostatus. The results indicated that women tended to stay closer to the central stream while avoiding open sewers more than men. Furthermore, individuals who tested positive for leptospirosis tended to avoid open sewers, suggesting that behavioral patterns might influence exposure to risk factors for leptospirosis, hence ensuring more targeted interventions. 

      Strengths: 

      (1) The use of step selection functions to analyze human movement represents an innovative adaptation of a method typically used in animal ecology. This provides a robust quantitative framework for evaluating how people interact with environmental risk factors linked to infectious diseases (in this case, leptospirosis). 

      (2) Detailed differentiation by gender and serological status allows for nuanced insights, which can help tailor targeted interventions and potentially improve public health measures in urban slum settings. 

      (3) The integration of real-world telemetry data with epidemiological risk factors supports the development of predictive models that can be applied in future infectious disease research, helping to bridge the gap between environmental exposure and health outcomes. 

      Weaknesses: 

      (1) The sample size for the study was not calculated, although it was a nested cohort study. 

      We thank Reviewer #1 for highlighting this weakness. We will make sure that this is explained in the next version of the manuscript. At the time of recruiting participants, we found no literature on how to perform a sample size calculation for movement studies involving GPS loggers and associated methods of analysis. Therefore, we aimed to recruit as many individuals as possible within the resource constraints of the study.  

      “Participants who were already enrolled in the cohort study were recruited to take part in the movement analysis study. At the time of recruitment, we found no published scientific studies detailing how to perform sample size calculations for research using GPS data in humans. Therefore, we opted to use convenience sampling instead. A target of 30 people per study area, balanced by gender and blind to their serological status, was chosen for this study.” [Lines 163 - 169]

      (2) The step‐selection functions, though a novel method, may face challenges in fully capturing the complexity of human decision-making influenced by socio-cultural and economic factors that were not captured in the study. 

      We agree with Reviewer #1 that this model may fail to capture the full breadth of human decision-making when it comes to moving through local environments. We included a section discussing the aspect of violence and how this influences residents’ choices, along with some possibilities for how to record and account for this. Although it is outside of the scope of this study, we believe that coupling these quantitative methods with qualitative studies would provide a comprehensive understanding of movement in these areas.

      (3) The study's context is limited to a specific urban slum in Salvador, Brazil, which may reduce the generalizability of its findings to other geographical areas or populations that experience different environmental or socio-economic conditions. 

      We thank the reviewer for highlighting this limitation. We have made this more clear in the discussion section: 

      “As a result, the findings are biased towards the more represented individuals, limiting their generalisability. Additionally, all participants are from specific areas in Salvador, which may further limit the generalisability to similar contexts.” [Lines 561 - 564]

      (4) The reliance on self-reported or telemetry-based movement data might include some inaccuracies or biases that could affect the precision of the selection coefficients obtained, potentially limiting the study's predictive power. 

      We agree that telemetry data has inherent inaccuracies, which we have tried to account for by using only those data points within the study areas. We would like to clarify that there is no self-reported movement data used in this study. All movement data was collected using GPS loggers.  

      (5) Some participants with less than 50 relocations within the study area were excluded without clear justification, see line 149. 

      We found that the SSF models would not run properly if there weren’t enough relocations. Therefore, we decided to remove these individuals from the analysis. They are also removed from any descriptive statistics presented. We have now clarified this in the manuscript.  

      “Individuals with less than 50 relocations within the study area were excluded from the analysis to ensure good model convergence. Details of these excluded individuals can be found in Supplementary Material I.” [Lines 183 – 186]

      (6) Some figures are not clear (see Figure 4 A & B). 

      We have improved the resolution of the image and believe it is more clear now. Please let us know if the resolution still is not clear enough.  

      (7) No statement on conflict of interest was included, considering sponsorship of the study. 

      The conflict of interest forms for each author were sent to eLife separately. I believe these should be made available upon publication, but please reach out if these need to be re-sent.  

      Reviewer #2 (Public review): 

      Summary: 

      Pablo Ruiz Cuenca et al. conducted a GPS logger study with 124 adult participants across four different slum areas in Salvador, Brazil, recording GPS locations every 35 seconds for 48 hours. The aim of their study was to investigate step-selection models, a technique widely used in movement ecology to quantify contact with environmental risk factors for exposure to leptospires (open sewers, community streams, and rubbish piles). The authors built two different types of models based on distance and based on buffer areas to model human environmental exposure to risk factors. They show differences in movement/contact with these risk factors based on gender and seropositivity status. This study shows the existence of modest differences in contact with environmental risk factors for leptospirosis at small spatial scales based on socio-demographics and infection status. 

      Strengths: 

      The authors assembled a rich dataset by collecting human GPS logger data, combined with field-recorded locations of open sewers, community streams, and rubbish piles, and testing individuals for leptospirosis via serology. This study was able to capture fine-scale exposure dynamics within an urban environment and shows differences by gender and seropositive status, using a method novel to epidemiology (step selection).

      Weaknesses: 

      Due to environmental data being limited to the study area, exposure elsewhere could not be captured, despite previous research by Owers et al. showing that the extent of movement was associated with infection risk. Limitations of step selection for use in studying human participants in an urban environment would need to be explicitly discussed. 

      The environmental factors used in the study required research teams to visit the sites and map the locations. Given that individuals travelled throughout the city of Salvador, performing this task at a large scale would be unachievable. Therefore, we limited the data to only those points within the study area boundaries to avoid any biases from interactions with unrecorded environmental factors.  

      Reviewing Editor Comments: 

      The manuscript would benefit from clearer articulation of SSF assumptions, data exclusions, and buffer choices, as well as improvements in figure clarity, to strengthen its generalizability and impact. 

      Please see replies to Reviewer #2 below regarding the assumptions (2.3), data exclusions (2.1) and buffer choices (2.2). We have improved Figure 4 clarity, please let us know if this is not sufficient.  

      Reviewer #1 (Recommendations for the authors): 

      (1) Provide comprehensive details on telemetry data collection for improved data quality and reproducibility. 

      Details for this are included under the “Methods/GPS Data” section. We have included a sentence to explain that we used the GPS device manufacturer’s software to programme them. We believe this provides enough information on how to collect the data for reproducibility, but please let us know if there is further information that we could provide.

      “Individuals who consented to take part in this study were asked to wear GPS loggers for continuous periods of up to 48 hours, which could be repeated. The GPS loggers used were i-got U GT-600, set to record their location every 35 seconds. We used the manufacturer’s software to programme the devices. Data were collected between March and November 2022.” [Lines 172 - 176]

      (2) Check all figures and improve on clarity (see Figure 4). 

      We have updated Figure 4 and believe the resolution is better now. Please let us know if this is not the case from the reader’s perspective.

      (3) Revisit sentence structures to improve readability and reduce overly complex phrasing. 

      We have reviewed the manuscript and made some changes to improve readability. 

      Reviewer #2 (Recommendations for the authors): 

      I thank Ruiz Cuenca et al. for putting together this interesting manuscript on the use of step selection functions for understanding exposure to leptospires in urban Brazil. I thoroughly enjoyed reading it and have a few suggestions that may improve the manuscript. 

      I also apologise, but I was not able to find some of the supplementary materials, for instance, Supplementary Material I. That may have been my oversight. 

      To eLife: These should have been included with the submitted manuscript file. Please let me know if it has to be resubmitted to eLife.

      (1) Descriptive statistics 

      Some more descriptive statistics would be helpful. For instance, what was the leptospirosis infection status of the six individuals who were removed due to having <50 points inside the area? As part of the analysis relies on exposure, defined as GPS locations within a 20m buffer of open sewers, community streams, and rubbish piles, it would be good to have some descriptive statistics around this. How many visits to these different sites did people make, and how did these statistics vary by study area, age, gender, and leptospirosis infection status? 

      We thank Reviewer #2 for highlighting this. Thanks to their comment, we noticed a mistake in the code which excluded more individuals from the summary statistics table than were actually excluded from the full analysis. There were only 2 individuals with fewer than 50 relocations across the whole day (5 am to 9 pm), and they were excluded from further analysis. The mistake has been rectified and the summary statistics updated (see Table 1).

      We have included the demographic details of excluded participants as a table in the supplementary material, which we reference in the manuscript. We have also explained that the exclusion is to aid model convergence, as we found that too few relocations would result in SSF models not working properly.

      “Individuals with less than 50 relocations within the study area were excluded from the analysis to ensure good model convergence. Details of these excluded individuals can be found in Supplementary Material I.” [Lines 183 – 186]

      We have also now included a table (Table 2) to show more descriptive statistics of how much time individuals spent within each of the environmental buffers.
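      The exclusion rule described above can be sketched in a few lines of Python. This assumes both restrictions mentioned in this response apply together (relocations inside the study area, between 5 am and 9 pm) and that an in-study-area flag has already been computed for each fix; the data layout and function names are hypothetical.

```python
from datetime import datetime, timedelta

MIN_RELOCATIONS = 50
DAY_START_HOUR, DAY_END_HOUR = 5, 21  # 5 am to 9 pm


def split_by_relocation_count(relocations):
    """relocations: participant id -> list of (timestamp, in_study_area) pairs,
    where in_study_area is precomputed from the study-area boundary
    (hypothetical input format). Returns (kept_ids, excluded_ids)."""
    kept, excluded = [], []
    for pid, fixes in relocations.items():
        n = sum(
            1
            for ts, inside in fixes
            if inside and DAY_START_HOUR <= ts.hour < DAY_END_HOUR
        )
        (kept if n >= MIN_RELOCATIONS else excluded).append(pid)
    return kept, excluded


def make_fixes(n, inside, hour=10):
    """Toy fixes logged every 35 seconds starting at the given hour."""
    start = datetime(2022, 3, 1, hour, 0)
    return [(start + timedelta(seconds=35 * i), inside) for i in range(n)]


data = {
    "a": make_fixes(50, True),                            # enough daytime fixes
    "b": make_fixes(49, True) + make_fixes(10, False),    # 49 inside the area
    "c": make_fixes(60, True, hour=22),                   # all after 9 pm
}
print(split_by_relocation_count(data))  # (['a'], ['b', 'c'])
```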

      (2) Definitions of buffers 

      I was surprised that the authors chose a 20m buffer for each factor but 10m around the household. Could this be more clearly justified, especially given that there will be location errors in both the GPS location point and the GPS logger points? These buffers do appear quite small, particularly in an urban environment where obstruction from buildings can be expected to yield substantial GPS errors.

      The 20 meter buffer represents an intense interaction with the point of interest. This distance was decided after visiting the sites and seeing the points of interest in person. The 10 meter buffer accounts for the size of dwellings in these areas. We have included these explanations in the new manuscript:  

      “The buffer rasters, one for each factor, were created using a 20 meter buffer around each reference point. The size of this buffer was decided after visiting the study areas and represented an area within which it could be considered a strong interaction with the point of interest.” [Lines 198 – 202]

      “Buffer rasters were also created for each individual’s household location, with a 10 meter buffer around each location. This represented space within and immediately outside each house. This buffer size accounted for the size of dwellings in these study areas.” [Lines 205 - 208]
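      To make the buffer definitions concrete, here is a minimal sketch that computes the share of an individual's GPS relocations falling within a fixed radius of reference points (20 m for environmental factors, 10 m for the household). It assumes projected (x, y) coordinates in meters and uses a direct Euclidean distance check rather than the buffer rasters described above; the function names and coordinates are hypothetical, and this is not the authors' pipeline.

```python
import math


def within_buffer(point, references, radius_m):
    """True if point lies within radius_m of any reference point.

    Assumes projected (x, y) coordinates in meters.
    """
    px, py = point
    return any(math.hypot(px - rx, py - ry) <= radius_m for rx, ry in references)


def proportion_in_buffer(track, references, radius_m):
    """Fraction of the GPS relocations in track that fall inside the buffer."""
    if not track:
        return 0.0
    hits = sum(within_buffer(p, references, radius_m) for p in track)
    return hits / len(track)


# Hypothetical data: three relocations, one open-sewer point and one household.
track = [(0.0, 0.0), (15.0, 0.0), (50.0, 50.0)]
sewer_points = [(0.0, 5.0)]
household = [(48.0, 52.0)]

print(proportion_in_buffer(track, sewer_points, 20.0))  # 20 m environmental buffer
print(proportion_in_buffer(track, household, 10.0))     # 10 m household buffer
```

      Widening the radius, for example to allow for GPS error near buildings, only requires changing `radius_m`; whether that is appropriate depends on the measured error of the loggers.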

      (3) Assumptions of the step selection function 

      Step selection functions (SSFs) rely on a number of assumptions. Whether these assumptions are met needs to be critically discussed within the article. (For a discussion of the assumptions, I am relying on points raised in this article: Integrated step selection analysis: bridging the gap between resource selection and animal movement (2015): Tal Avgar, Jonathan R. Potts, Mark A. Lewis, Mark S. Boyce, DOI: https://doi.org/10.1111/2041-210X.12528). 

      First, SSFs typically assume each step is independent, conditional only on the previous step (Markovian process). This is violated in circular movements, for instance. Circular movements are highly likely in human movement as people will leave and return to their homes during the day. While this is partially addressed by conducting separate analyses by time of day, circular journeys can still exist within these segments. 

      Second, SSFs do not account for goal-oriented behaviour like intentional destination-seeking. So, for instance, when someone executes a plan to visit a specific stream to fetch drinking water, such behaviour is poorly approximated using SSFs because SSFs compare observed steps to random alternatives drawn from a movement kernel, assuming movement is opportunistic rather than intentional. 

      This is true of SSFs that do not include movement attributes. However, our SSF includes both step lengths and turning angles, which, according to Avgar et al., should be enough to account for this goal-oriented behaviour. It may be clearer to call the model an integrated step selection function (iSSF), as in Avgar et al., and we can make this change in the next version of the manuscript.

      Third, turning angles in human movement are often sharp due to regular street layouts, which can violate the assumption of smooth, correlated movement that SSFs usually make.

      As this paper proposes SSFs as a novel method to measure exposure to environmentally transmitted pathogens, a discussion on the extent to which assumptions of SSFs are valid for this purpose should be included in the paper. 

      We thank Reviewer #2 for highlighting these points. We have included a section discussing these assumptions in detail: 

      “Additionally, these models have some underlying assumptions that may be violated in this study. Step-selection functions assume each step is independent, conditioned on the previous step. This can be violated by circular journeys. Although we attempted to account for these by analysing specific periods of the day, a higher temporal resolution of analysis may be needed if circular journeys are still present within each period. Another assumption is that movement is smooth through the environment. In urban environments this may not hold true, as street layouts may force sharp corners in movements. The effect of violating this assumption is not immediately clear and requires further methodological research to understand its significance. Finally, we assumed that by including movement characteristics (step lengths and turning angles) into our models, we were accounting for goal-oriented behaviour. These assumptions need to be considered in future studies that attempt to use step-selection functions to analyse human mobility.” [Lines 593 - 607]
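      As an illustration of the movement attributes referred to above, the following minimal sketch derives step lengths and turning angles from consecutive GPS relocations. It assumes time-ordered, projected (x, y) coordinates in meters and is not the authors' code; in an integrated SSF these quantities would then enter the model as covariates alongside the habitat rasters.

```python
import math


def steps_and_turns(track):
    """Step lengths (m) and turning angles (radians) from time-ordered (x, y) fixes.

    Assumes projected coordinates in meters. Turning angles are wrapped to [-pi, pi].
    Note: a zero-length step has no defined heading; math.atan2(0, 0) returns 0.0,
    so such steps would need separate handling in a real analysis.
    """
    step_lengths, headings = [], []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        step_lengths.append(math.hypot(dx, dy))
        headings.append(math.atan2(dy, dx))

    turning_angles = []
    for h0, h1 in zip(headings, headings[1:]):
        diff = h1 - h0
        # Wrap the heading change so that, e.g., -3*pi/2 becomes +pi/2.
        turning_angles.append(math.atan2(math.sin(diff), math.cos(diff)))
    return step_lengths, turning_angles


# Hypothetical track: 10 m east, then 10 m north around a street corner.
track = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
steps, turns = steps_and_turns(track)
print(steps, [round(math.degrees(a), 1) for a in turns])
```

      With this hypothetical track, a right-angle street corner yields a turning angle of about π/2, the kind of sharp turn the reviewer notes may sit poorly with the assumption of smooth, correlated movement.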

      (4) Abstract 

      While it is highlighted in the abstract that this "study introduces a novel method for analysing human telemetry data in infectious disease research, providing critical insights for targeted interventions", I did not see any discussion about how the findings can inform interventions. 

      We thank Reviewer #2 for highlighting this. We have now removed this wording from the abstract to avoid misunderstanding.  

      (5) Effect sizes 

      It would have helped me if there had been some discussion around the size of these effects. Especially for the distance-based models, the effects seem very small. Maybe this is a misinterpretation on my part, but it would help to contextualise whether the observed effects are small or large.

      We agree with Reviewer #2 on this point and have now included a paragraph explaining that these effect sizes are indeed very small. We believe that this may be linked to the spatial scale of the rasters used (1 meter), as the selection coefficients represent the change associated with each 1-meter increase in distance, and a 1-meter change may not be meaningful for human mobility. However, given the focus on analysing fine-scale movement, we decided to keep the spatial scale of the rasters as small as possible.

      “It is important to highlight that the effect sizes of the selection coefficients for the distance based rasters are very small and could be considered negligible. This may be linked to the spatial scale used, as these values represent increases of 1 meter. A coarser scale may have produced larger effect sizes that may have been easier to conceptualise. However, given the focus on fine-scale movement, we decided to keep this spatial scale for the analysis.” [Lines 421 - 427]
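      To help contextualise these magnitudes, the sketch below converts a distance coefficient into a relative selection strength, exp(β·Δd), for several distance changes. It assumes the usual log-linear form of a step selection function and uses a hypothetical coefficient, not a value estimated in this study.

```python
import math


def relative_selection_strength(beta_per_m, delta_m):
    """exp(beta * delta): multiplicative change in the relative probability of
    selecting a step when a distance covariate increases by delta_m meters,
    assuming a log-linear step selection function."""
    return math.exp(beta_per_m * delta_m)


beta = -0.002  # hypothetical coefficient per 1 m increase in distance
for delta in (1, 10, 100):
    print(f"{delta:>4} m: {relative_selection_strength(beta, delta):.4f}")
```

      Because the covariate enters linearly, expressing distance in units of 10 m would multiply the coefficient by 10 while leaving the implied selection strength over any given physical distance unchanged, so a small per-meter coefficient can still correspond to a noticeable effect over tens of meters.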

    1. eLife Assessment

      This valuable study presents a theoretical model of how punctuated mutations influence multistep adaptation, supported by empirical evidence from some TCGA cancer cohorts. This solid model is noteworthy for cancer researchers as it points to the case for possible punctuated evolution rather than gradual genomic change. However, the parametrization and systematic evaluation of the theoretical framework in the context of tumor evolution remain incomplete, and alternative explanations for the empirical observations are still plausible.

    2. Reviewer #1 (Public review):

      Summary:

      Grasper et al. present a combined analysis of the role of temporal mutagenesis in cancer, which includes both theoretical investigation and empirical analysis of point mutations in TCGA cancer patient cohorts. They find that temporally elevated mutation rates contribute to cancer fitness by allowing fast adaptation when fitness drops (due to previous deleterious mutations). This may be relevant in the case of tumor suppressor genes (TSGs), which follow the two-hit hypothesis (i.e., two mutations, one in each allele, are needed to inactivate a TSG), and in cases where temporal mutagenesis occurs (e.g., high APOBEC activity, ROS). They provide evidence that this scenario is likely to occur in patients with some cancer types. This is an interesting and potentially important result that merits the attention of the target audience. Nonetheless, I have some questions (detailed below) regarding the design of the study, the tools and parametrization of the theoretical analysis, and the empirical analysis, which I think, if addressed, would make the paper more solid and the conclusion more substantiated.

      Strengths:

      Combined theoretical investigation with empirical analysis of cancer patients.

      Weaknesses:

      Parametrization and systematic investigation of theoretical tools and their relevance to tumor evolution.

    3. Reviewer #2 (Public review):

      This work presents theoretical results concerning the effect of punctuated mutation on multistep adaptation and empirical evidence for that effect in cancer. The empirical results seem to agree with the theoretical predictions. However, it is not clear how strong the effect should be on theoretical grounds, and there are other plausible explanations for the empirical observations.

      For various reasons, the effect of punctuated mutation may be weaker than suggested by the theoretical and empirical analyses:

      (1) The effect of punctuated mutation is much stronger when the first mutation of a two-step adaptation is deleterious (Figure 2). For double inactivation of a TSG, the first mutation--inactivation of one copy--would be expected to be neutral or slightly advantageous. The simulations depicted in Figure 4, which are supposed to demonstrate the expected effect for TSGs, assume that the first mutation is quite deleterious. This assumption seems inappropriate for TSGs, and perhaps the other synergistic pairs considered, and exaggerates the expected effects.

      (2) More generally, parameter values affect the magnitude of the effect. The authors note, for example, that the relative effect decreases with mutation rate. They suggest that the absolute effect, which increases, is more important, but the relative effect seems more relevant and is what is assessed empirically.

      (3) Routes to inactivation of both copies of a TSG that are not accelerated by punctuation will dilute any effects of punctuation. An example is a single somatic mutation followed by loss of heterozygosity. Such mechanisms are not included in the theoretical analysis nor assessed empirically. If, for example, 90% of double inactivations were the result of such mechanisms with a constant mutation rate, a factor of two effect of punctuated mutagenesis would increase the overall rate by only 10%. Consideration of the rate of apparent inactivation of just one TSG copy and of deletion of both copies would shed some light on the importance of this consideration.
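      The dilution argument in point (3) is a weighted average. A minimal sketch, assuming each route to double inactivation contributes independently to the overall rate and only the fraction not arising through constant-rate routes is accelerated by punctuation (both simplifications for illustration):

```python
def overall_fold_change(frac_unaccelerated, punctuation_fold):
    """Fold change in the total double-inactivation rate when a fraction
    frac_unaccelerated of events arises through routes with a constant
    mutation rate (e.g., one mutation followed by LOH) and the remaining
    fraction is accelerated punctuation_fold-times by punctuated mutagenesis."""
    if not 0.0 <= frac_unaccelerated <= 1.0:
        raise ValueError("frac_unaccelerated must lie in [0, 1]")
    return frac_unaccelerated + (1.0 - frac_unaccelerated) * punctuation_fold


# The reviewer's example: 90% of events unaffected, a twofold punctuation effect.
print(round(overall_fold_change(0.9, 2.0), 3))
```

      With these inputs the overall rate rises by a factor of 1.1, i.e., the 10% quoted above, whereas if no events came through constant-rate routes the full twofold effect would be seen.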

      Several factors besides the effects of punctuated mutation might explain or contribute to the empirical observations:

      (1) High APOBEC3 activity can select for inactivation of TSGs (references in Butler and Banday 2023, PMID 36978147). This selective force is another plausible explanation for the empirical observations.

      (2) Without punctuation, the rate of multistep adaptation is expected to rise more than linearly with mutation rate. Thus, if APOBEC signatures are correlated with a high mutation rate due to the action of APOBEC, this alone could explain the correlation with TSG inactivation.

      (3) The nature of mutations caused by APOBEC might explain the results. Notably, one of the two APOBEC mutation signatures, SBS13, is particularly likely to produce nonsense mutations. The authors count both nonsense and missense mutations, but nonsense mutations are more likely to inactivate the gene, and hence to be selected.
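      Point (2) can be illustrated with a toy two-hit model: if each of two alleles is hit independently by Poisson mutation at a constant rate μ, the probability that both are hit by time t is (1 − e^(−μt))², which is approximately (μt)² when μt is small. The sketch below ignores selection, LOH and any punctuation, and the rate and time values are hypothetical; it shows that doubling μ then roughly quadruples the two-hit probability.

```python
import math


def prob_two_hits(mu, t):
    """Probability that two independent sites (e.g., both alleles of a TSG)
    have each been mutated at least once by time t, assuming a constant
    Poisson mutation rate mu per site and ignoring selection and LOH."""
    p_single = -math.expm1(-mu * t)  # 1 - exp(-mu * t), accurate for small mu * t
    return p_single ** 2


mu, t = 1e-6, 1000.0  # hypothetical per-site rate and elapsed time
ratio = prob_two_hits(2 * mu, t) / prob_two_hits(mu, t)
print(f"doubling mu multiplies the two-hit probability by {ratio:.3f}")
```

      Under a constant, unpunctuated rate, samples with higher overall mutation rates would therefore be expected to show disproportionately more double inactivations, which is the alternative explanation the reviewer raises.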

    1. eLife Assessment

      This important work fills a gap in the characterization of motor architecture and chemical coupling of the male reproductive system, crucial to understanding male reproduction and fertility. The convincing analysis reveals two distinct types of glutamatergic neurons that co-release either serotonin or octopamine. While serotonergic neurons are required for male fertility, octopaminergic neurons are dispensable, indicating a division of labour. This work lays the foundations for future investigations into the conserved key principles by which multi-transmitter systems control coordinated motor outputs.

    2. Reviewer #1 (Public review):

      Summary:

      This very thorough anatomical study addresses the innervation of the Drosophila male reproductive tract. Two distinct glutamatergic neuron types were classified: serotonergic (SGNs) and octopaminergic (OGNs). By expansion microscopy, it was established that glutamate and serotonin/octopamine are co-released. The expression of different receptors for 5-HT and OA in muscles and epithelial cells of the innervation target organs was characterized. The pattern of neurotransmitter receptor expression in the target organs suggests that seminal fluid and sperm transport and emission are subjected to complex regulation. While silencing of abdominal SGNs leads to male infertility and prevents sperm from entering the ejaculatory duct, silencing of OGNs does not render males infertile.

      Strengths:

      The studied neurons were analysed with different transgenes and methods, as well as antibodies against neurotransmitter synthesis enzymes, building a consistent picture of their neurotransmitter identity. The careful anatomical description of innervation patterns together with receptor expression patterns of the target organs provides a solid basis for advancing the understanding of how seminal fluid and sperm transport and emission are subjected to complex regulation. The functional data showing that SGNs are required for male fertility and for the release of sperm from the seminal vesicle into the ejaculatory duct is convincing.

      Weaknesses:

      The functional analysis of the characterized neurons is not as comprehensive as the anatomical description, and phenotypic characterization was limited to simple fertility assays. It is understandable that a full functional dissection is beyond the scope of the present work. The paper contains experiments showing neuron-independent peristaltic waves in the reproductive tract muscles, which are thematically not very well integrated into the paper. Although very interesting, one wonders if these experiments would not fit better into a future work that also explores these peristaltic waves and their interrelation with neuromodulation mechanistically.

    3. Reviewer #2 (Public review):

      Summary:

      Cheverra et al. present a comprehensive anatomical and functional analysis of the motor neurons innervating the male reproductive tract in Drosophila melanogaster, addressing a gap in our understanding of the peripheral circuits underlying ejaculation and male fertility. They identify two classes of multi-transmitter motor neurons, OGNs (octopamine/glutamate) and SGNs (serotonin/glutamate), with distinct innervation patterns across reproductive organs. The authors further characterize the differential expression of glutamate, octopamine, and serotonin receptors in both epithelial and muscular tissues of these organs. Behavioral assays reveal that SGNs are essential for male fertility, whereas OGNs and glutamatergic transmission are dispensable. This work provides a high-resolution map linking neuromodulatory identity to organ-specific motor control, offering a valuable framework to explore the neural basis of male reproductive function.

      Strengths:

      Through the use of an extensive set of GAL4 drivers and antibodies, this work successfully and precisely defines the neurons that innervate the male reproductive tract, identifying the specific organs they target and the nature of the neurotransmitters they release. It also characterizes the expression patterns and localization of the corresponding neurotransmitter receptors across different tissues. The authors describe two distinct groups of dual-identity neurons innervating the male reproductive tract: OGNs, which co-express octopamine and glutamate, and SGNs, which co-express serotonin and glutamate. They further demonstrate that the various organs within the male reproductive system differentially express receptors for these neurotransmitters. Based on these findings, the authors propose that a single neuron capable of co-releasing a fast-acting neurotransmitter alongside a slower-acting one may more effectively synchronize and stagger events that require precise timing. This, together with the differential expression of ionotropic glutamate receptors and metabotropic aminergic receptors in postsynaptic muscle tissue, adds an additional layer of complexity to the coordinated regulation of fluid secretion, organ contractility, and directional sperm movement, all contributing to the optimization of male fertility.

      Weaknesses:

      The main weakness of the manuscript is the lack of detail in the presentation of the results. Specifically, all microscopy image figures are missing information about the number of samples (N), and in the case of colocalization experiments, quantitative analyses are not provided. Additionally, in the first behavioral section, it would be beneficial to complement the data table with figures similar to those presented later in the manuscript for consistency and clarity.

      Wider context:

      This study delivers the first detailed anatomical map connecting multi-transmitter motor neurons with specific male reproductive structures. It highlights a previously unrecognized functional specialization between serotonergic and octopaminergic pathways and lays the groundwork for exploring fundamental neural mechanisms that regulate ejaculation and fertility in males. The principles uncovered here may help explain how males of Drosophila and other organisms adjust reproductive behaviors in response to environmental changes. Furthermore, by shedding light on how multi-transmitter systems operate in reproductive control, this model could provide insights into therapeutic targets for conditions such as male infertility and prostate cancer, where similar neuronal populations are involved in humans. Ultimately, this genetically accessible system serves as a powerful tool for uncovering how multi-transmitter neurons orchestrate coordinated physiological actions necessary for the functioning of complex organs.

    4. Reviewer #3 (Public review):

      Summary:

      This work provides an overview of the motor neuron landscape in the male reproductive system. Some work had been done to elucidate the circuits of ejaculation in the spine, as well as the cord, but this work fills a gap in knowledge at the level of the reproductive organs. Using complementary approaches, the authors show that there are two types of motor neurons that are mutually exclusive: neurons that co-express octopamine and glutamate and neurons that co-express serotonin and glutamate. They also show evidence that both types of neurons express large dense core vesicles, indicating that neuropeptides play a role in male fertility. This paper provides a thorough characterization of the expression of the different glutamate, octopamine, and serotonin receptors in the different organs and tissues of the male reproductive system. The differential expression in different tissues and organs allows building initial theories on the control of emission and expulsion. Additionally, the authors characterize the expression of synaptic proteins and the neuromuscular junction sites. On a mechanistic level, the authors show that neither octopamine/glutamate neuron transmission nor glutamate transmission in serotonin/glutamate neurons is required for male fertility. This final result is quite surprising and opens up many questions on how ejaculation is coordinated.

      Strengths:

      This work fills an important gap in the characterization of innervation of the male reproductive system by providing an extensive characterization of the motor neurons and the potential receptors of motor neuron release. The authors show convincing evidence of glutamate/monoamine co-release and of mutual exclusivity of serotonin/glutamate and octopamine/glutamate neurons.

      Weaknesses:

      (1) Often, it is mentioned that the expression is higher or lower or regional without quantification or an indication of the number of samples analysed.

      (2) The experiment aimed at tracking sperm in the male reproductive system is difficult to interpret when it is not assessed whether ejaculation has occurred.

      (3) The experiment looking at peristaltic waves in the male organs is missing labeling of the different regions and quantification of the observed waves.

    1. eLife Assessment

      This useful study uses creative scalp EEG decoding methods to attempt to demonstrate that two forms of learned associations in a Stroop task are dissociable, despite sharing similar temporal dynamics. However, the evidence supporting the conclusions is incomplete due to concerns with the experimental design and methodology. This paper would be of interest to researchers studying cognitive control and adaptive behavior, if the concerns raised in the reviews can be addressed satisfactorily.

    2. Reviewer #1 (Public review):

      Summary:

      This study focuses on characterizing the EEG correlates of item-specific proportion congruency effects. In particular, two types of learned associations are characterized. One being associations between stimulus features and control states (SC), and the other being stimulus features and responses (SR). Decoding methods are used to identify SC and SR correlates and to determine whether they have similar topographies and dynamics.

      The results suggest SC and SR associations are simultaneously coactivated and have shared topographies, with the inference being that these associations may share a common generator.

      Strengths:

      Fearless, creative use of EEG decoding to test tricky hypotheses regarding latent associations.

      Nice idea to orthogonalize the ISPC condition (MC/MI) from stimulus features.

      Weaknesses:

      (1) I'm relatively concerned that these results may be spurious. I hope to be proven wrong, but I would suggest taking another look at a few things.

      While a nice idea in principle, the ISPC manipulation seems to be quite confounded with trial number. E.g., color-red is MI only during Phase 2, and is MC primarily only during Phase 3 (since Phase 1 is so sparsely represented). In my experience, EEG noise is highly structured across a session and easily exploited by decoders. Plus, behavior seems quite different between Phase 2 and Phase 3. So, it seems likely that the classes you are asking the decoder to separate are highly confounded with temporally structured noise.

      I suggest thinking of how to handle this concern in a rigorous way. A compelling way to address this would be to perform "cross-phase" decoding; however, I am not sure if that is possible given the design.

      The time courses also seem concerning. What are we to make of the SR and SC timecourses, which have aggregate decoding dynamics that look to be <1Hz?

      Some sanity checks would be one place to start. Time courses were baselined, but this is often not necessary with decoding; it can cause bias (10.1016/j.jneumeth.2021.109080), and can mask deeper issues. What do things look like when not baselined? Can variables be decoded when they should not be decoded? What does cross-temporal decoding look like - everything stable across all times, etc.?

      (2) The nature of the shared features between SR and SC subspaces is unclear.

      The simulation is framed in terms of the amount of overlap, revealing the number of shared dimensions between subspaces. In reality, it seems like it's closer to 'proportion of volume shared', i.e., a small number of dominant dimensions could drive a large degree of alignment between subspaces.

      What features drive the similarity? What features drive the distinctions between SR and SC? Aside from the temporal confounds I mentioned above, is it possible that some low-dimensional feature, like the EEG congruency effect (e.g., low-D ERPs associated with conflict), or RT dynamics, drives discriminability among these classes? It seems plausible to me - all one would need is non-homogeneity in the size of the congruency effect across different items (subject-level idiosyncrasies could contribute: 10.1016/j.neuroimage.2013.03.039).

      (3) The time-resolved within-trial correlation of RSA betas is a cool idea, but I am concerned it is biased. Estimating correlations among different coefficients from the same GLM design matrix is, in general, biased, i.e., when the regressors are non-orthogonal. This bias comes from the expected covariance of the betas and is discussed in detail here (10.1371/journal.pcbi.1006299). In short, correlations could be inflated due to a combination of the design matrix and the structure of the noise. The most established solution, to cross-validate across different GLM estimations, is unfortunately not available here. I would suggest that the authors think of ways to handle this issue.

      (4) Are results robust to running response-locked analyses? Especially the EEG-behavior correlation. Could this be driven by different RTs across trials & trial-types? I.e., at 400 ms post-stim onset, some trials would be near or at RT/action execution, while others may not be nearly as close, and so EEG features would differ & "predict" RT.

      (5) I suggest providing more explanation about the logic of the subspace decoding method: what trial types exactly constitute the different classes, why we would expect this method to capture something useful regarding ISPC, and what this something might be. I felt that the first paragraph of the Results breezes past a lot of important logic.

      In general, this paper does not seem to be written for readers who are unfamiliar with this particular topic area. If authors think this is undesirable, I would suggest altering the text.
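      The bias raised in point (3) follows from the sampling covariance of ordinary least-squares estimates, Cov(β̂) = σ²(XᵀX)⁻¹ under independent, identically distributed noise, which has non-zero off-diagonal terms whenever regressors are non-orthogonal. The simulation below is a minimal sketch using a hypothetical two-regressor design and pure-noise data (true betas of zero); it is not the authors' RSA model, and in real EEG data the noise structure would also matter.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_obs, n_sims, rho = 200, 2000, 0.7

# Hypothetical design: two regressors with correlation close to rho.
x1 = rng.standard_normal(n_obs)
x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(n_obs)
X = np.column_stack([x1, x2])

# Repeatedly fit pure noise: the true coefficients are zero for both regressors.
betas = np.empty((n_sims, 2))
for i in range(n_sims):
    y = rng.standard_normal(n_obs)
    betas[i] = np.linalg.lstsq(X, y, rcond=None)[0]

empirical_r = np.corrcoef(betas[:, 0], betas[:, 1])[0, 1]

# Analytical correlation implied by Cov(beta_hat) = sigma^2 (X^T X)^{-1}.
cov = np.linalg.inv(X.T @ X)
analytical_r = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])

print(f"empirical r = {empirical_r:.2f}, analytical r = {analytical_r:.2f}")
```

      Here positively correlated regressors produce negatively correlated beta estimates even though there is no signal; with other designs or structured noise the induced correlation can differ in sign and size, which is why a correlation between betas from the same GLM is hard to interpret without cross-validated or otherwise corrected estimates.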

    3. Reviewer #2 (Public review):

      Summary:

      In this EEG study, Huang et al. investigated the relative contribution of two accounts to the process of conflict control, namely the stimulus-control association (SC), which refers to the phenomenon that the ratio of congruent vs. incongruent trials affects the overall control demands, and the stimulus-response association (SR), stating that the frequency of stimulus-response pairings can also impact the level of control. The authors extended the Stroop task with novel manipulation of item congruencies across blocks in order to test whether both types of information are encoded and related to behaviour. Using decoding and RSA, they showed that the SC and SR representations were concurrently present in voltage signals, and they also positively co-varied. In addition, the variability in both of their strengths was predictive of reaction time. In general, the experiment has a solid design, but there are some confounding factors in the analyses that should be addressed to provide strong support for the conclusions.

      Strengths:

      (1) The authors used an interesting task design that extended the classic Stroop paradigm and is potentially effective in teasing apart the relative contribution of the two different accounts regarding item-specific proportion congruency effect, provided that some confounds are addressed.

      (2) Linking the strength of RSA scores with behavioural measures is critical to demonstrating the functional significance of the task representations in question.

      Weakness:

      (1) While the use of RSA to model the decoding strength vector is a fitting choice, looking at the RDMs in Figure 7, it seems that SC, SR, ISPC, and Identity matrices are all somewhat correlated. I wouldn't be surprised if some correlations would be quite high if they were reported. Total orthogonality is, of course, impossible depending on the hypothesis, but from experience, having highly covaried predictors in a regression can lead to unexpected results, such as artificially boosting the significance of one predictor in one direction, and the other one in the opposite direction. Perhaps some efforts to address how stable the time-resolved RSA correlations for SC and SR are with and without the other highly correlated predictors will be valuable to raising confidence in the findings.

      (2) In "task overview", SR is defined as the word-response pair; however, in the Methods, lines 495-496, the definition changed to "the pairing between word and ISPC" which is in accordance with the values in the RDMs (e.g., mccbb and mcirb have similarity of 1, but they are linked to different responses, so should they not be considered different in terms of SR?). This needs clarification as they have very different implications for the task design and interpretation of results, e.g., how correlated the SC and SR manipulations were.
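      One way to gauge the concern in point (1) is to correlate the vectorised model RDMs and compute variance inflation factors (VIFs) before running the time-resolved regression. The sketch below assumes model RDMs are square, symmetric matrices over the same condition order; the three toy RDMs are built from hypothetical condition labels and are not the study's SC, SR, ISPC or Identity matrices.

```python
import numpy as np


def lower_triangle(rdm):
    """Vectorise the strictly lower triangle of a square RDM."""
    rdm = np.asarray(rdm, dtype=float)
    rows, cols = np.tril_indices(rdm.shape[0], k=-1)
    return rdm[rows, cols]


def label_rdm(labels):
    """Toy model RDM: 0 for condition pairs sharing a label, 1 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] != labels[None, :]).astype(float)


def predictor_diagnostics(model_rdms):
    """Pairwise correlations and variance inflation factors of model RDMs."""
    names = list(model_rdms)
    X = np.column_stack([lower_triangle(model_rdms[name]) for name in names])
    corr = np.corrcoef(X, rowvar=False)
    # VIF_j = 1 / (1 - R_j^2) equals the j-th diagonal entry of inv(corr).
    vif = np.diag(np.linalg.inv(corr))
    return names, corr, vif


# Hypothetical labels for six conditions under three toy models.
models = {
    "model_A": label_rdm([0, 0, 0, 1, 1, 1]),
    "model_B": label_rdm([0, 1, 0, 1, 0, 1]),
    "model_C": label_rdm([0, 0, 1, 1, 1, 0]),
}
names, corr, vif = predictor_diagnostics(models)
print(names)
print(np.round(corr, 2))
print(np.round(vif, 2))
```

      VIFs well above 1 indicate that much of a predictor's variance is shared with the others, so its fitted coefficient may shift when correlated predictors are added or dropped.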

    1. eLife Assessment

      This important study used five metrics to compare the cost-effectiveness of intramural and extramural research funded by the National Institutes of Health in the United States between 2009 and 2019. They found that each type of research had its own set of strengths: extramural research was more cost-effective in terms of publications, whereas intramural research was more cost-effective in terms of influencing clinical work. The evidence supporting these findings is mostly solid, but there are a number of questions about the methods and data - notably about indirect cost recovery and other non-NIH sources of funding - that need to be answered.

    2. Reviewer #1 (Public review):

      Summary:

      This article carefully compares intramural vs. extramural National Institutes of Health funded research during 2009-2019, according to a variety of bibliometric indices. They find that extramural awards more cost-effectively fund outputs commonly used for academic review such as number of publications and citations per dollar, while intramural awards are more cost-effective at generating work that influences future clinical work, more closely in line with agency health goals.

      Strengths:

      Great care was taken in selecting and cleaning the data, and in making sure that intramural vs. extramural projects were compared appropriately. The data has statistical validation. The trends are clear and convincing.

      Weaknesses:

      The Discussion is too short and descriptive, and needs more perspective: why are the findings important and what do they mean? Even without recommending policy, the authors should at least discuss possible implications for policy.

      The biggest problem I have with this submission is Figure 3, which shows a big decrease in clinical-related parameters between 2014 and 2019 in both intramural and extramural research (panels C, D and E). There is no obvious explanation for this and I did not see any discussion of this trend, but it cries out for investigation. This might, for example, reflect global changes in funding policies which might also influence the observed closing gaps between intramural and extramural research.

    3. Reviewer #2 (Public review):

      Summary:

      This article reports a cost-effectiveness comparison of intramural and extramural research funded by the NIH between 2009 and 2019. Using data obtained from NIH RePORTER, they linked total project costs to publication output, using robust validated metrics including Relative Citation Ratio (RCR), Approximate Potential to Translate (APT), and clinical citations. They find that after adjusting for confounders in regression and propensity-score analyses, extramural projects were generally more cost-effective, though intramural projects were more cost-effective for generating clinical citations. They also describe differences in the topics of intramural- and extramural-funded publications, with intramural projects more likely to generate papers on viral infections and immunity or cancer metastases and survival, but less likely to generate papers on pregnancy and maternal health, brain connectivity and tasks, and adolescent experiences and depression. The authors aptly describe the different natures of the intramural and extramural funding models, including that extramural researchers spend much time writing grant applications and that the work described in extramural publications often receives funding from sources other than NIH grants.

      Strengths:

      The authors leveraged publicly available data (including RePORTER and the iCite repository) and used robust validated metrics (RCR, APT, clinical citations). They carefully considered a large number of confounders, including those related to the PI, and performed several well-described regression analyses.

      Weaknesses:

      Figure 3A shows intramural projects producing about 2.75 papers per year in 2009, whereas extramural projects produce just over 1 paper per year. Extramural projects appear to catch up over the next five years. While the authors attempt to explain the difference in their figure legend, another explanation is that many intramural projects started well before 2009, but, as the authors state, intramural data only became available in 2009.

      As the authors note, funding information is often complex and difficult to characterize for an analysis like this. How did the authors handle (i) publications linked to multiple extramural grants, (ii) publications linked to both intramural and extramural grants, and (iii) publications linked to both NIH and non-NIH grants?

      I would think it necessary to apportion credit in some way; otherwise, extramural projects would appear more productive than they truly are.

      Also, it is not clear whether the authors accounted for the indirect costs paid by the NIH to universities that received extramural grants.

    4. Reviewer #3 (Public review):

      Summary:

      The manuscript "Comparing the outputs of intramural and extramural grants funded by National Institutes of Health" presents a comparative study of two funding mechanisms adopted by the National Institutes of Health (NIH). The authors adopted a quantitative approach and introduced five metrics to compare the output of intramural and extramural grants. The findings reveal the impacts of intramural and extramural grants on the scientific community and provide funders with insights to inform future decisions about funding mechanisms.

      Strengths:

      The authors clearly presented their methods for processing the NIH project data and classifying projects into either intramural or extramural categories. The limitations of the study are also well addressed.

      Weaknesses:

      The article would benefit from a more thorough discussion of the literature, a clearer presentation of the results (especially in the figure captions), and the inclusion of evidence to support some of the claims.

    1. eLife Assessment

      This paper presents important new findings about the impact of the TAK-003 vaccine against dengue based on a convincing reanalysis of trial data. The results corroborate those of the original trial analyses, but with reduced uncertainty about the estimates of the impact of the vaccine. The findings will be of interest to clinicians, infectious disease epidemiologists, trial statisticians and policymakers seeking to understand the vaccine's efficacy profile and associated uncertainties.

    2. Reviewer #1 (Public review):

      Summary:

      The authors reduce uncertainties in TAK-003 vaccine efficacy estimates by applying a multi-level model to all published Phase III clinical trial case data and sharing parameters across strata consistent with the data generation process. In line with our current understanding of the vaccine, they show that its efficacy depends on the serostatus and infecting serotype.

      Strengths:

      The methodology is well-described and technically sound, with clear explanations of how the authors reduce uncertainty through the model structure. The comparison of model estimates with and without independence parameter assumptions is particularly valuable. The data come from the Phase III RCT conducted over 4.5 years in 8 countries, and the study is the first to model efficacy using available country-specific data. The analysis is timely and addresses important public health questions regarding TAK-003 efficacy.

      Weaknesses:

      It is unclear whether the simulation study used to validate the model sampled from the priors (as stated in the methods) or the posterior distributions. Supplementary figures 19-28 show that sampled parameters often derive from narrower distributions than the priors, with sampled areas varying by subgroup. Sampling from posterior distributions makes the validation somewhat circular. As many parameters are estimated stratified by multiple subgroups, identifiability issues may arise. Model variations with fewer parameter dependencies could impact the resulting estimates.

      Assessment of aims and conclusions:

      The authors achieve their aims of reducing uncertainty in efficacy estimates and show that efficacy varies by serostatus and serotype. The conclusions are well-justified, although they could be strengthened by clarifying the model validation, as discussed above.

      Impact and utility:

      This work contributes valuable evidence demonstrating TAK-003's serostatus and serotype-specific efficacy and highlights remaining uncertainties in the protection or risk against DENV3/4 in seronegative individuals. The methods are well-described and would be useful to other modellers, and could be applied to additional dengue vaccines like the Butantan-DV vaccine currently under development.

      Additional context:

      Several factors may influence the estimates but cannot be addressed using public data, including the role of subclinical infections, flavivirus cross-immunity, and the imperfect use of hospitalisation as a proxy for severe disease.

    3. Reviewer #2 (Public review):

      Summary:

      In this paper, the authors used a multi-level modelling approach to reanalyse trial data from Takeda's Phase III randomised controlled trial investigating the efficacy of the TAK-003 vaccine against dengue. The aim of the paper is to refine uncertainty by incorporating all the available data into the model and pooling across stratifications that are correlated. A major challenge is constructing a likelihood that allows for data available at differing levels of aggregation by group and outcome, and at different time intervals. This is done by first supposing that the data are available without aggregation for all groups, outcomes and time points, and then marginalising over the aggregated levels. The model is validated using simulations and then applied to trial data from Takeda. Results appear to corroborate those of Takeda, with reduced uncertainty in the estimates.

      Strengths:

      The main strength of the paper is the multi-level modelling approach, which is particularly natural for this setting. One reason for this, as discussed in the paper, is that correlations across stratifications can arise when there are similarities in their underlying causal structure, so it is more realistic to model this nested data structure hierarchically. Another reason, also well discussed in the paper, is the reduction in uncertainty obtained by pooling estimates across related groups. Multi-level modelling is also beneficial when group sizes differ. For example, there were too few hospitalised DENV-4 cases among seronegative participants for the original analysis to produce estimates, whereas the multi-level model used in this paper can produce them. The modelling framework developed in this paper will be simple to extend to further trial data collected in the future.

      Another strength is that the paper reanalyses existing trial data, which is both cost-effective and beneficial for scientific reproducibility. This approach also helps to assess the robustness of conclusions about the efficacy of the TAK-003 vaccine to the use of different analytical methods.

      The paper is well-written. The tables and figures presented in this paper are particularly informative. Protection conferred by the vaccine varies depending upon which variant a person is exposed to, their serostatus, and time since vaccination. The analysis presented supports the discussed conclusions. Comparisons between the results of this paper and the results of the original trial analysis are also shown and demonstrate a reduction in the uncertainty of parameter estimates, as desired.

      Weaknesses:

      The main weakness of the paper is that it reports per-exposure protection instead of vaccine efficacy. This is methodologically sound, but it does limit the comparability of this study with the original trial analyses, which reported vaccine efficacy. It is therefore unclear whether the reduction in uncertainty observed is due solely to the multi-level modelling approach or whether it may be due in part to the parameters of interest being slightly different.

    4. Reviewer #3 (Public review):

      Summary:

      The authors provide estimates of the efficacy of the dengue vaccine, which are notoriously difficult to obtain given the multiple serotypes and complex immunity. Through their method using publicly available data, the estimates have less uncertainty and are of use to the field in understanding the possible future impact of the vaccine.

      Strengths:

      This is an elegant analysis addressing an important question. The pooling of common factors for estimation is nice and adds strength to the analysis. It is an important analysis for the field and our understanding of the vaccine, and for the analysis of future multi-site trials for the dengue vaccine.

      Weaknesses:

      It would be useful to explain more clearly how vaccine efficacy as defined here relates to the previous estimates, and how the estimated impact changes over time.

    1. eLife Assessment

      This study makes a valuable contribution by separating two timescales of adaptation: rapid, within-block reductions in learning rate, and slower, location-specific, meta-learned adjustments. Behavioural data and computational modeling converge to support both processes. The evidence is solid, with neuroimaging results suggesting that meta-learned learning rates are encoded in the orbitofrontal cortex, while prediction errors are represented in a distributed network including the ventral striatum and are modulated by expected error magnitude, though the specificity of these effects requires further contextualization. The manuscript is timely and clearly written; its main limitation is the weak linkage between neural signals and behavior, leaving uncertainty over whether the reported signals play a mechanistic role in learning.

    2. Reviewer #1 (Public review):

      Summary:

      Simoens and colleagues use a continuous estimation task to disentangle learning rate adjustments on shorter and longer timescales. They show that participants rapidly decrease learning rates within a block of trials in a given "location", but that they also adjust learning rates for the very first trial based on information accrued gradually about the statistics of each location, which can be viewed as a form of metalearning. The authors show that the metalearned learning rates are represented in patterns of neural activity in the orbitofrontal cortex, and that prediction errors are represented in a constellation of brain regions, including the ventral striatum, where they are modulated by expectations about error magnitude to some degree. Overall, the work is interesting, timely, and well communicated. My primary concern with the work was that the link between the brain signals and their role in the behavior of interest was not well explored, raising some questions about the degree to which these signals are really involved in the learning process, versus playing some downstream role.

      Strengths:

      The authors build on an interesting task design, allowing them to distinguish moment-to-moment adjustments in learning rate from slower adjustments in learning rate corresponding to slowly gained knowledge about the statistics of specific "locations". Behavior and computational modeling clearly demonstrate that individuals adjust to environmental statistics in a sort of metalearning. fMRI data reveal representations of interest, including those related to adjusted learning rates and their impact on the degree of prediction error encoding in the striatum.

      Weaknesses:

      It was nice to see that the authors could distinguish differences between the OFC signals that they observed and those in the visual regions based on changes through the session. However, the linkage between these brain activations and a functional role in generating behavior was left unexplored. Without further exploration, it is hard to tell exactly what role the signals might be playing, if any, in the behavior of interest.

    3. Reviewer #2 (Public review):

      Summary:

      Across two experiments, this work presents a novel spatial predictive inference paradigm that facilitates the investigation of meta-learning across multiple environments with distinct statistics, as well as more local learning from sequences of observations within an environment. The authors present behavioral data indicating that people can indeed learn to distinguish between noise levels and calibrate their learning rates accordingly across environments, even on initial trials when revisiting an environment. They complement their behavioral results with computational modeling, further bolstering claims of both local and global adaptation. Additional fMRI results support the role of OFC in this meta-learning process, with central OFC activity reflecting similarity between environments. This similarity emerges over time with task experience. Holistically, this paradigm and these data add to our understanding of how humans dynamically adapt their behavior on different timescales.

      Strengths:

      The novel paradigm represents a clever and creative expansion of spatial predictive inference tasks. The cover story was well chosen to facilitate an intuitive understanding of both the differences between environments and the estimation of the mean within environments.

      Additionally, the authors present complementary results from two experiments, which strengthen the behavioral findings. This is especially effective as the initial experiment's results were a bit noisy, and the modifications within the second experiment increased both power and the specificity/accuracy of participant predictions. Taken together, the behavioral results provide convincing evidence that participants did distinguish environments based on their underlying statistics and adapted their initial behavior accordingly.

      Beyond this, the combination of behavioral results, computational modeling, and neuroimaging enhances the impact of the work. It paints a fuller picture of whether and how humans meta-learn the global statistics of environments, and this is an important direction for the field of adaptive learning.

      Weaknesses:

      (1) The authors make the distinction between meta-learned "global" learning rates and within-environment learning rate adaptation in response to "local" fluctuations/observations. Though the experimental paradigm is novel, there are certainly links to prior work - for instance, though change point structures don't entail revisiting unique environments, they do require meta-learning from environmental statistics that is distinct from transient local adaptation to prediction errors. This tendency to increase one's learning rate after large prediction errors is appropriate in change point environments, though, as is true in this study, the amount of increase should be dependent on. This represents a similar kind of slower-timescale learning or reuse of more "global" parameters, and can be seen to different extents in prior work. It might benefit readers if the authors were to link the current work to previous research more explicitly to draw clearer connections between the approaches and findings.

      (2) Throughout much of the paper, the authors refer to the distinctions between environments primarily as differences in "initial learning rates" or "environment-specific learning rates." This is particularly prominent when discussing fMRI results. Though the optimal initial learning rate did differ across environments, this was the result of differences in underlying task statistics. It will be important to clarify this throughout the text: because task statistics are confounded with initial learning rate (and, to some extent, with position on the screen), it is not possible to separate the impact of these specific variables. This is also relevant to understanding the justification for using methods like RSA to test whether brain regions represent task states similarly. If the main hypothesis is that neural activity reflects the (initial) learning rate itself, then a univariate analysis approach would seem more natural.

      (3) For the neuroimaging results in particular, the specificity of some of the results (e.g. ventral striatum showing an effect of prediction error only in the low noise condition in the second half of task experience, only on the first trial) is a bit surprising. Additional justification of or context for these results would be useful to help readers gauge how expected or surprising these findings are.

      (4) There are some methodological details that are unclear (e.g., how were the positions of the crabs selected relative to the location they emerged from? Looking at Figure 1C, it looks like the crabs spread out unevenly, and that the single position they emerge from is not necessarily at the center of the crab locations.) Additional detail and clarity would help address some unanswered questions (more details below).

    1. eLife Assessment

      The authors make an important advance in enzyme annotation by fusing biochemical knowledge with language‑model-based learning to predict catalytic residues from sequence alone. Squidly, a new ML method, outperforms existing tools on standard benchmarks and on the CataloDB dataset. The work has solid support, yet clarifications on dataset biases, ablation analyses, and uncertainty filtering would strengthen its efficiency claims.

    2. Reviewer #1 (Public review):

      In this well-written and timely manuscript, Rieger et al. introduce Squidly, a new deep learning framework for catalytic residue prediction. The novelty of the work lies in integrating per-residue embeddings from large protein language models (ESM2) with a biology-informed contrastive learning scheme that leverages enzyme class information to rationally mine hard positive/negative pairs. Importantly, the method does not rely on predicted 3D structures, enabling scalability, speed, and broad applicability. The authors show that Squidly outperforms existing ML-based tools and even BLAST in certain settings, while an ensemble with BLAST achieves state-of-the-art performance across multiple benchmarks. Additionally, the introduction of the CataloDB benchmark, designed to test generalization at low sequence and structural identity, represents another important contribution of this work.

      I have only some minor comments:

      (1) The manuscript acknowledges biases in EC class representation, particularly the enrichment for hydrolases. While CataloDB addresses some of these issues, the strong imbalance across enzyme classes may still limit conclusions about generalization. Could the authors provide per-class performance metrics, especially for underrepresented EC classes?

      (2) An ablation analysis would be valuable to demonstrate how specific design choices in the algorithm contribute to capturing catalytic residue patterns in enzymes.

      (3) The statement that users can optionally use uncertainty to filter predictions is promising but underdeveloped. How should predictive entropy values be interpreted in practice? Is there an empirical threshold that separates high- from low-confidence predictions? A demonstration of how uncertainty filtering shifts the trade-off between false positives and false negatives would clarify the practical utility of this feature.

      (4) The manuscript highlights computational efficiency, reporting substantial runtime improvements (e.g., 108 s vs. 5757 s). However, the comparison lacks details on dataset size, hardware/software environment, and reproducibility conditions. Without these details, the speedup claim is difficult to evaluate. Furthermore, it remains unclear whether the reported efficiency gains come at the expense of predictive performance.

      (5) Given the well-known biases in public enzyme databases, the dataset is likely enriched for model organisms (e.g., E. coli, yeast, human enzymes) and underrepresents enzymes from archaea, extremophiles, and diverse microbial taxa. Would this limit conclusions about Squidly's generalisability to less-studied lineages?

    3. Reviewer #2 (Public review):

      Summary:

      The authors aim to develop Squidly, a sequence-only catalytic residue prediction method. By combining protein language model (ESM2) embedding with a biologically inspired contrastive learning pairing strategy, they achieve efficient and scalable predictions without relying on three-dimensional structure. Overall, the authors largely achieved their stated objectives, and the results generally support their conclusions. This research has the potential to advance the fields of enzyme functional annotation and protein design, particularly in the context of screening large-scale sequence databases and unstructured data. However, the data and methods are still limited by the biases of current public databases, so the interpretation of predictions requires specific biological context and experimental validation.

      Strengths:

      The strengths of this work include the innovative methodological incorporation of EC classification information for "reaction-informed" sample pairing, thereby enhancing the discriminative power of contrastive learning. Results demonstrate that Squidly outperforms existing machine learning methods on multiple benchmarks and is significantly faster than structure prediction tools, demonstrating its practicality.

      Weaknesses:

      Disadvantages include the lack of a systematic evaluation of the impact of each strategy on model performance. Furthermore, some analyses, such as PCA visualization, exhibit low explained variance, which undermines the strength of the conclusions.

    1. Author response:

      We thank the reviewers for their constructive feedback on the article’s strengths and weaknesses. In response, we plan to strengthen our work in a revised version by (i) providing an additional example of our method’s implementation and (ii) framing our contribution more clearly as a continuation of the line of research that characterises neuronal models in terms of their bifurcation structure.

      Experimental validation, however, is beyond the scope of this study. Constructing experimental bifurcation diagrams remains a major challenge, particularly for unstable branches. Although some techniques exist to approximate branches of unstable steady states, unstable limit cycles are far more difficult to capture. Additionally, in practice, many factors vary during recordings, and generating reliable diagrams would require a large number of tightly controlled experimental repetitions whose stability often cannot be ensured. Two-dimensional bifurcation diagrams, as needed for the analysis in our manuscript, are even more challenging to obtain because the extensive and stable recordings would have to be available from the same cell at different values of the second parameter (such as different extracellular potassium concentrations). At this stage, our method can be applied to the reduction of detailed conductance-based models, which themselves are constrained by experimental data (for example, gating functions fitted to voltage-clamp recordings). This way, simple yet dynamically faithful phenomenological models for efficient use in network analysis and simulation can be derived from more complex, biophysical models. In contrast to the traditional voltage fitting approach, these models can also capture changes in additional parameters (such as extracellular potassium concentration).

    2. Reviewer #2 (Public review):

      Summary:

      The authors derive an integrate-and-fire model to describe the dynamics of a more complex Wang-Buzsaki model and compare the two models. A detailed discussion of bifurcation schemes in both models is convincing and allows us to evaluate the simpler model.

      Strengths:

      The idea is interesting, and the mathematical approach appears to be convincing. In addition, differences between the simple and original models are also discussed.

      Weaknesses:

      A comparison to experimental data is necessary to support the theoretical work.

    3. Reviewer #1 (Public review):

      Summary:

      From a big-picture viewpoint, this work aims to provide a method to fit parameters of reduced models for neural dynamics so that the resulting tuned model has a bifurcation diagram that matches that of a more complex, computationally expensive model. The matching of bifurcation diagrams ensures that the model dynamics agree on a region of parameter space, rather than just at specially tuned values, and that the models share properties such as qualitative features of their phase response curves, as the authors demonstrate. A notable point is the inclusion of extracellular potassium concentration dynamics in the reduced model (here, the quadratic integrate-and-fire model); this is straightforward but nonetheless useful for studying certain phenomena.

      Strengths:

      The paper demonstrates the method specifically on the fitting of the quadratic integrate-and-fire model, with potassium concentration dynamics included, to the Wang-Buzsaki model extended to include the potassium component. The method works very well overall in this instance. The resulting model is thoroughly compared with the original, in terms of bifurcation diagrams, production of various activity patterns, phase response curves, and associated phase-locking and synchronization properties.

      Weaknesses:

      It is important to note that the proposed method requires that a target bifurcation diagram be known. In practical terms, this means that the method may be well suited to fitting a reduced model to another, more complicated model, but is not likely to be useful for fitting the model to data. Certainly, the authors did not illustrate any such application. Secondly, the authors do not provide any sort of general algorithm but rather give a demonstration of a single example of fitting one specific reduced model to one specific conductance-based model. Finally, the main idea of the paper seems to me to be a natural descendant of the chain of reasoning (starting from Rinzel and continuing through Bertram; Golubitsky/Kaper/Josic; Izhikevich; and others) that a fundamental way to think about neuronal models, especially those involving bursting dynamics, is in terms of their bifurcation structure. According to this line of reasoning, two models are "the same" if they have the same bifurcation structure. Thus, it becomes natural to fit a reduced model to a more complicated model based on the bifurcation structure. The authors deserve credit for recognizing and implementing this step, and their work may be a useful example to the community. But the manuscript should have described and cited this chain of works to put the current study in the correct context.

    4. eLife Assessment

      This work demonstrates an objective way to select parameter values for a quadratic integrate-and-fire model so that its bifurcation diagram matches a specific target diagram, generated from the Wang-Buzsaki model. The method is useful for the field and is presented with convincing evidence. The method is currently limited in its ability to be applied to data, but improves our mathematical tools to treat a rarely studied type of bifurcation.

    1. eLife Assessment

      In this important study, the authors conducted extensive atomistic and coarse-grained simulations as well as a lattice Monte Carlo analysis to probe the driving force and functional impact of supercomplex formation in the inner mitochondrial membrane. The study highlighted the major contribution from membrane mechanics to the supercomplex formation and revealed interesting differences in structural and dynamical features of the protein components upon complex formation. Upon revision, the analysis is considered solid, although the magnitudes of the estimated membrane deformation energies seem surprisingly large. Overall, the study is thorough and creative, and its impact on the field of bioenergetics is expected to be significant.

    2. Reviewer #1 (Public review):

      This paper by Poverlein et al. reports substantial membrane deformation around the oxidative phosphorylation supercomplex, proposing that this deformation is a key part of supercomplex formation. I found the paper interesting and well written.

      * Analysis of the bilayer curvature is challenging on the fine lengthscales they have used and produces unexpectedly large energies (Table 1). Additionally, the authors use the mean curvature (Eq. S5) as input to the Helfrich Hamiltonian (Eq. S7; uncited, but it seems clear that this is Helfrich). If an errant factor of one half has been included in the curvature, this would quarter the computed curvature energy relative to the true energy, because the curvature enters squared. The bending modulus used (ca. 5 kcal/mol) is small compared with typically observed biological bending moduli. This suggests that the true curvature energies may be even higher than the already high values reported. Some of this may be due to the spontaneous curvature of the lipids and perhaps to the protein modifying the properties of nearby lipids.
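      A minimal worked form of the factor-of-one-half point, assuming the standard Helfrich bending energy with principal curvatures $c_1, c_2$, mean curvature $H = (c_1 + c_2)/2$, bending modulus $\kappa$, and zero spontaneous curvature (Eqs. S5 and S7 are not reproduced here, so this is an illustration rather than the authors' exact expression):

```latex
E_{\mathrm{bend}} = \frac{\kappa}{2}\int (2H)^2 \, dA
                  = \frac{\kappa}{2}\int (c_1 + c_2)^2 \, dA,
\qquad
\frac{\kappa}{2}\int H^2 \, dA = \frac{1}{4}\,E_{\mathrm{bend}}
```

      If $H$ is substituted where $2H$ (equivalently $c_1 + c_2$) is expected, the integrand shrinks by a factor of four, so the computed energy would be one quarter of the correct value.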

      * It is unclear whether CDL supports SC formation mainly by stabilizing the membrane deformation or by acting as an electrostatic glue. While this is a weakness for a definite quantification of the effect of CDL on SC formation, the study presents an interesting observation of CDL redistribution that could be an interesting topic for future work.

      In summary, the qualitative data presented are interesting (especially the combination of molecular modeling with simpler Monte Carlo modeling aiding broader interpretation of the results). The energies of the membrane deformations are quite large. This might reflect the roles of specific lipids stabilizing those deformations, or the inherent difficulty in characterizing nanometer-scale curvature.

    3. Reviewer #3 (Public review):

      Summary:

      In this contribution, the authors report atomistic, coarse-grained and lattice simulations to analyze the mechanism of supercomplex (SC) formation in mitochondria. The results highlight the importance of membrane deformation as one of the major driving forces for the SC formation, which is not entirely surprising given prior work on membrane protein assembly, but certainly of major mechanistic significance for the specific systems of interest.

      Strengths:

      The combination of complementary approaches, including an interesting (re)analysis of cryo-EM data, is particularly powerful, and might be applicable to the analysis of related systems. The calculations also revealed that SC formation has interesting impacts on the structural and dynamical (motional correlation) properties of the individual protein components, suggesting further functional relevance of SC formation. In the revision, the authors further clarified and quantified their analysis of membrane responses, leading to further insights into membrane contributions. They have also toned down the decomposition of membrane contributions into enthalpic and entropic contributions, which is difficult to do. Overall, the study is rather thorough, highly creative and the impact on the field is expected to be significant.

      Weaknesses:

      Upon revision, I believe the weaknesses identified in the previous round of review have been largely alleviated.

    1. eLife Assessment

      This is an important study describing the morphological changes during boundary formation between sensory and non-sensory tissues of the inner ear. The authors provided solid evidence that a transcription factor, Lmx1a, and ROCK-dependent actomyosin are key for border formation in the inner ear. However, future studies will be needed to investigate the direct relationships among boundary formation, Lmx1a and ROCK. This work will be of interest to developmental biologists interested in boundary formation.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript investigated the mechanism underlying boundary formation necessary for proper separation of vestibular sensory end organs. In both chick and mouse embryos, it was shown that a population of cells abutting the sensory (marked by high Sox2 expression) and nonsensory (marked by Lmx1a expression) cell populations undergoes apical expansion, elongation, alignment and basal constriction to separate the lateral crista (LC) from the utricle. Using Lmx1a mutant mice, organ cultures, and pharmacological and viral-mediated Rock inhibition, it was demonstrated that the Lmx1a transcription factor and Rock-mediated actomyosin contractility are required for boundary formation and LC-utricle separation.

      Strengths:

      Overall, the morphometric analyses were done rigorously and revealed novel boundary cell behaviors. The requirement of Lmx1a and Rock activity in boundary formation was convincingly demonstrated.

      Weaknesses:

      However, the precise roles of Lmx1a and Rock in regulating cell behaviors during boundary formation were not clearly fleshed out. For example, the phenotypic analysis of Lmx1a was rather cursory; it is unclear how Lmx1a, expressed in half of the boundary domain, controls boundary cell behaviors and prevents cell mixing between Lmx1a+ and Lmx1a- compartments. Well-established mechanisms and molecules for boundary formation were not investigated (e.g. differential adhesion via cadherins, cell repulsion via ephrin-Eph signaling). Moreover, within the boundary domain, it is unclear whether apical multicellular rosettes and basal constrictions are drivers of boundary formation, as a boundary can still form when these cell behaviors are inhibited. The involvement of other cell behaviors, such as directional cell intercalation and oriented cell division, also warrants consideration. With these lingering questions, the mechanistic advance of the present study is modest.

      Revision: The clarity of the text was improved. The open questions regarding the mechanisms were discussed but not experimentally addressed.

    3. Reviewer #3 (Public review):

      Summary:

      Lmx1a is an orthologue of apterous in flies, which is important for dorsal-ventral border formation in the wing disc. Previously, this research group has described the importance of the chicken Lmx1b in establishing the boundary between sensory and non-sensory domains in the chicken inner ear. Here, the authors described a series of cellular changes during border formation in the chicken inner ear, including alignment of cells at the apical border and concomitant constriction basally. The authors extended these observations to the mouse inner ear and showed that these morphological changes occurred at the border of Lmx1a positive and negative regions, and these changes failed to develop in Lmx1a mutants. Furthermore, the authors demonstrated that the ROCK-dependent actomyosin contractility is important for this border formation and blocking ROCK function affected epithelial basal constriction and border formation in both in vitro and in vivo systems.

      Strengths:

      The morphological changes described during border formation in the developing inner ear are interesting. Linking these changes to Lmx1a function and ROCK-dependent actomyosin contractility is provocative.

      Weaknesses:

      There are several outstanding issues that need to be clarified before one can conclude that the observed morphological changes cause border formation and that Lmx1a and ROCK are involved.

      Comments on the latest version:

      The revised manuscript has clarified some of the results, but unfortunately, the basal constriction during border formation remains unclear, and the study did not advance the understanding of the role of Lmx1a in boundary formation. My overall comments are below:

      (1) The authors state in the rebuttal that "we do not think that ROCK activity is required for the formation or maintenance of the basal constriction at the interface of Lmx1a-expressing and non-expressing cells". If this is the authors' position, the manuscript does not convey it clearly, starting with this misleading sentence in the Abstract: "The boundary domain is absent in Lmx1a-deficient mice, which exhibit defects in sensory organ segregation, and is disrupted by the inhibition of ROCK-dependent actomyosin contractility."

      (2) As acknowledged by the authors, the data as they currently stand could be explained by Lmx1a specifying the non-sensory fate rather than functioning directly in boundary formation. With this caveat in mind, the role of Lmx1a in boundary formation remains unclear.

      (3) I feel like the word "orchestrate" in the title is an overstatement.

    1. eLife Assessment

      This valuable study expands the inventory of polyadenylated RNAs cleaved by the double-stranded RNA endonuclease Rnt1 in budding yeast, using solid methodology based on high-throughput sequencing. Previous studies had anecdotally discovered mRNA substrates, and this global characterization is comprehensive with multiple complementary controls. This study sets the stage for deeper investigations into the biological function of Rnt1 and substrate cleavage.

    2. Reviewer #1 (Public review):

      Sarpaning et al. provide a thorough characterization of putative Rnt1 cleavage of mRNA in S. cerevisiae. Previous studies have discovered Rnt1 mRNA substrates anecdotally, and this global characterization expands the known collection of putative Rnt1 cleavage sites. The study is comprehensive, with several types of controls to show that Rnt1 is required for several of these cleavages.

      Comments on revisions:

      The authors have responded appropriately to the review.

    3. Reviewer #2 (Public review):

      This study presents a useful inventory of polyadenylated RNAs cleaved by the double-stranded RNA endonuclease Rnt1 in yeast. The data were obtained with solid methodology based on high-throughput sequencing, and the evidence that Rnt1 contributes to cellular homeostasis through controlling the turnover of selected mRNAs is convincing.

      Comments on revisions:

      I appreciate the authors' thorough and thoughtful response, and I find that the manuscript has been substantially strengthened by the additional data, analyses, and textual clarifications.

    1. eLife Assessment

      This study combines mathematical models and experimental data to analyse the emergence of heterogeneity within clonal NK cell responses during antigen-specific cell expansion. It draws on several types of experimental data and extensively explores various mathematical models to study NK cell turnover during acute immune responses and homeostatic turnover in the context of murine cytomegalovirus (MCMV) infection. The solid study presents valuable findings and provides insights into heterogeneous NK cell development.

    2. Reviewer #1 (Public review):

      Summary:

      The objective of this study was to infer the population dynamics (rates of differentiation, division and loss) and lineage relationships of NK cell subsets during an acute immune response and under homeostatic conditions.

      Strengths:

      A rich dataset and a detailed analysis of a particular class of stochastic models.

      Weaknesses (relating to initial submission):

      The stochastic models used are quite simple; each population is considered homogeneous with first-order rates of division, death, and differentiation. In Markov process models such as these there is no dependence of cellular behavior on its history of divisions. In recent years models of clonal expansion and diversification, in the settings of T and B cells, have progressed beyond this picture. So I was a little surprised that there was no mention of the literature exploring the role of replicative history in differentiation (e.g. Bresser Nat Imm 2022), nor of the notion of family 'division destinies' (either in division number, or the time spent proliferating, as described by the Cyton and Cyton2 models developed by Hodgkin and collaborators; e.g. Heinzel Nat Imm 2017). The emerging view is that variability in clone (family) size may arise predominantly from the signals delivered at activation, which dictate each precursor's subsequent degree of expansion, rather than from the fluctuations deriving from division and death modeled as Poisson processes.

      As you pointed out, the Gerlach and Buchholz Science papers showed evidence for highly skewed distributions of family sizes, and correlations between family size and phenotypic composition. Is it possible that your observed correlations could arise if the propensity for immature CD27+ cells to differentiate into mature CD27- cells increases with division number? The relative frequency of the two populations would then also be impacted by differences in the division rates of each subset - one would need to explore this. But depending on the dependence of the differentiation rate on division number, there may be parameter regimes (and timepoints) at which the more differentiated cells can predominate within large clones even if they divide more slowly than their immature precursors. One might not then be able to rule out the two-state model. I would like to see a discussion or rebuttal of these issues.

      Comments on revisions:

      The authors have put in a lot of effort to address the reviews and have explored alternative models carefully.

      In the sections relating to homeostasis and the endogenous response, as far as I can tell you are estimating net growth rates (the k parameters) throughout - this is to be expected if you're working with just cell numbers and no information relating to proliferation. In these sections there are many places where you refer to proliferation rates and death rates when I think you just mean net positive or net negative growth rates. It's important to be precise about this even if the language can get a bit repetitive. (These net rates of growth or loss relate to clonal rather than cellular dynamics, which may be worth explaining). Later, you do use data relating to dead cells, which in principle can be used to get independent measures of death rates, but these data were not used in the fitting.

      There is so much evidence that T and B cell differentiation are often contingent on division that it would be very reasonable to consider it as a possibility for NK cells too. (Differentiation could be asymmetric, as you explored, or simply symmetric with some probability per division.) These processes can be cast into simple ODE models but no longer allow you to aggregate division and death rates, so for parameter estimation you need to add measures of proliferation (Ki67 or similar) or death. This may be worth some discussion.
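      A minimal worked example of this point, assuming (hypothetically) that each division of an immature cell I yields two mature daughters M with probability q and two immature daughters otherwise, with per-capita division rates p_I, p_M and death rates d_I, d_M:

```latex
\frac{dI}{dt} = \big[\,p_I\,(1-2q) - d_I\,\big]\, I,
\qquad
\frac{dM}{dt} = 2\,q\,p_I\, I + (p_M - d_M)\, M .
```

      Cell counts of I alone constrain only the net rate k_I = p_I(1-2q) - d_I, but the influx into M scales with p_I itself, so p_I and d_I no longer enter the dynamics only through a single aggregate rate. By contrast, with division-independent differentiation at rate \delta, the equations reduce to dI/dt = (p_I - d_I - \delta) I and dM/dt = \delta I + (p_M - d_M) M, in which division and death appear only as net growth rates.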

    3. Reviewer #2 (Public review):

      Summary:

      Wethington et al. investigated the mechanistic principles underlying antigen-specific proliferation and memory formation in mouse natural killer (NK) cells following exposure to mouse cytomegalovirus (MCMV), a phenomenon predominantly associated with CD8+ T cells. Using a stochastic modeling approach, the authors aimed to develop a quantitative model of NK cell clonal dynamics during MCMV infection. Starting from a single immature Ly49+CD27+ NK cell, a two-state linear model (with a death variant) explained the negative correlation between clone size at 8 dpi and the CD27+ fraction, but failed to reproduce the first and second moments of CD27+ and CD27− NK cell populations at 8 dpi. To address this limitation, the authors added an intermediate maturation state, yielding a three-stage model (CD27+Ly6C− → CD27−Ly6C− → CD27−Ly6C+) that fits the first and second moments under two constraints: CD27+ NK cells proliferate faster than CD27− NK cells, and clone size is negatively correlated with the CD27+ fraction (upper bound of −0.2). The model predicts high proliferation in the intermediate state and high death in mature CD27−Ly6C+ cells, and it was validated using Adams et al. (2021) NK reporter mice tracking CD27+/− populations after tamoxifen, allowing discrimination between bone marrow-derived and pre-existing peripheral NK cells. To test the prediction that mature CD27− NK cells have a higher death rate, the authors measured Ly49H+ NK cell viability in the mouse spleen at different time points post-MCMV infection. Data confirmed lower viability of mature (CD27−) than immature (CD27+) cells during days 4-8 post-infection, and a model variant supported that higher CD27− death increases their proportion in the dead cell compartment.

      Altogether, the authors propose a three-stage quantitative model of antigen-specific expansion and maturation of naïve Ly49H+ NK cells with the trajectory CD27+Ly6C− (immature) → CD27−Ly6C− (mature I) → CD27−Ly6C+ (mature II), highlighting high proliferation in the mature I state and increased death in the mature II state.

      Strengths:

      Models explaining correlations and first and second moments, supported by analytical investigations, stochastic simulations, and model selection, identify key processes in antigen-specific NK expansion and maturation. The work distinguishes expansion, contraction, and memory in NK cells from CD8+ T cells and informs NK therapy development.

      Weaknesses (relating to initial submission):

      The conclusions of this paper are largely supported by the available data. However, a comparative analysis with more recent works in the field would be desirable. Clarifications:

      (1) Initial Conditions and Grassmann Data: The Grassmann data are used solely as a constraint, whereas the simulated numbers of CD27+/CD27− cells could have been fitted directly to the Grassmann data, which assume a 1:1 ratio of CD27+/CD27− at t = 0. This would allow an alternative initial condition rather than starting from a single CD27+ cell.

      (2) Correlation Coefficients in the Three-State Model: Although the parameter scan of the three-stage model (Figure 2) demonstrates the potential for negative correlations between colony size and the fraction of CD27+ cells, the calculated correlation coefficients using the fitted parameter values are not shown. Including these would validate that the fitted parameters lie in the negative-correlation regime.

      (3) Viability Dynamics and Adaptive Response: The authors measured the time evolution of CD27+/− dynamics and viability over 30 days post-infection (Figure 4). It would be valuable to test whether the three-state model can reproduce the adaptive response of CD27− cells to MCMV infection, particularly the observed drop in CD27− viability at 5 dpi and its rebound at 8 dpi. Demonstrating this would test whether the model can simultaneously explain viability dynamics and moment dynamics, and would enable sensitivity analysis of CD27− viability with respect to model parameters.

    1. eLife Assessment

      This study combines genetic, cell biological, and interaction data to propose a model of meiotic double-strand break regulation in C. elegans. Solid evidence supports the main conclusions, although, as is typical of a screening-type study, further work will be needed to test the more speculative aspects. Nevertheless, the comprehensive catalog of the physical and genetic interactions of factors required for meiotic double-strand break formation is useful information for the field.

    2. Reviewer #1 (Public review):

      Summary:

      The manuscript by Raices et al., provides some novel insights into the role and interactions between SPO-11 accessory proteins in C. elegans. The authors propose a model of meiotic DSBs regulation, critical to our understanding of DSB formation and ultimately crossover regulation and accurate chromosome segregation. The work also emphasizes the commonalities and species-specific aspects of DSB regulation.

      Strengths:

      This study capitalizes on the strengths of the C. elegans system to uncover genetic interactions between SPO-11 accessory proteins. In combination with physical interactions, the authors synthesize their findings into a model, which will serve as the basis for future work, to determine mechanisms of DSB regulation.

      Weaknesses:

      The methodology, although standard, still lacks some rigor, especially with the IPs.

    3. Reviewer #2 (Public review):

      Summary:

      Meiotic recombination initiates with the formation of DNA double-strand breaks (DSBs), catalyzed by the conserved topoisomerase-like enzyme Spo11. Spo11 requires accessory factors that are poorly conserved across eukaryotes. Previous genetic studies have identified several proteins required for DSB formation in C. elegans to varying degrees; however, how these proteins interact with each other to recruit the DSB-forming machinery to chromosome axes remains unclear.

      In this study, Raices et al. characterized the biochemical and genetic interactions among proteins that are known to promote DSB formation during C. elegans meiosis. The authors examined pairwise interactions using yeast two-hybrid (Y2H) and co-immunoprecipitation and revealed an interaction between a chromatin-associated protein HIM-17 and a transcription factor XND-1. They further confirmed the previously known interaction between DSB-1 and SPO-11 and showed that DSB-1 also interacts with a nematode-specific HIM-5, which is essential for DSB formation on the X chromosome. They also assessed genetic interactions among these proteins, categorizing them into four epistasis groups by comparing phenotypes in double vs. single mutants. Combining these results, the authors proposed a model of how these proteins interact with chromatin loops and are recruited to chromosome axes, offering insights into the process in C. elegans compared to other organisms.

      Weaknesses:

      This work relies heavily on Y2H, which is notorious for having high rates of false positives and false negatives. Although the interactions between HIM-17 and XND-1 and between DSB-1 and HIM-5 were validated by co-IP, the significance of these interactions was not tested in vivo. Cataloging Y2H and genetic interactions does not yield much more insight. The model proposed in Figure 4 is also highly speculative.

    4. Reviewer #3 (Public review):

      The goal of this work is to understand the regulation of double-strand break formation during meiosis in C. elegans. The authors have analyzed physical and genetic interactions among a subset of factors that have been previously implicated in DSB formation or the number or timing of DSBs: CEP-1, DSB-1, DSB-2, DSB-3, HIM-5, HIM-17, MRE-11, REC-1, PARG-1, and XND-1.

      The 10 proteins that are analyzed here include a diverse set of factors with different functions, based on prior analyses in many published studies. The term "Spo11 accessory factors" has been used in the meiosis literature to describe proteins that directly promote Spo11 cleavage activity, rather than factors that are important for the expression of meiotic proteins or that influence the genome-wide distribution or timing of DSBs. Based on this definition, the known SPO-11 accessory factors in C. elegans include DSB-1, DSB-2, DSB-3, and the MRN complex (at least MRE-11 and RAD-50). These are all homologs of proteins that have been studied biochemically and structurally in other organisms. DSB-1 & DSB-2 are homologs of Rec114, while DSB-3 is a homolog of Mei4. Biochemical and structural studies have shown that Rec114 and Mei4 directly modulate Spo11 activity by recruiting Spo11 to chromatin and promoting its dimerization, which is essential for cleavage. The other factors analyzed in this study affect the timing, distribution, or number of RAD-51 foci, but they likely do so indirectly. As elaborated below, XND-1 and HIM-17 are transcription factors that modulate the expression of other meiotic genes, and their role in DSB formation is parsimoniously explained by this regulatory activity. The roles of HIM-5 and REC-1 remain unclear; the reported localization of HIM-5 to autosomes is consistent with a role in transcription (the autosomes are transcriptionally active in the germline, while the X chromosome is largely silent), but its loss-of-function phenotypes are much more limited than those of HIM-17 and XND-1, so it may play a more direct role in DSB formation. The roles of CEP-1 (a p53 homolog) and PARG-1 are also ambiguous, but their homologs in other organisms contribute to DNA repair rather than DSB formation.

      An additional significant limitation of the study, as stated in my initial review, is that much of the analysis here relies on cytological visualization of RAD-51 foci as a proxy for DSBs. RAD-51 associates transiently with DSB sites as they undergo repair and is thus limited in its ability to reveal details about the timing or abundance of DSBs since its loading and removal involve additional steps that may be influenced by the factors being analyzed.

      The paper focuses extensively on HIM-5, which was previously shown through genetic and cytological analysis to be important for breaks on the X chromosome. The revised manuscript still claims that "HIM-5 mediates interactions with the different accessory factors sub-groups, providing insights into how components on the DNA loops may interact with the chromosome axis." The weak interactions between HIM-5 and DSB-1/2 detected in the Y2H assay do not convincingly support such a role. The idea that HIM-5 directly promotes break formation is also inconsistent with genetic data showing that him-5 mutants lack breaks on the X chromosomes, while HIM-5 has been shown to be enriched on autosomes. Additionally, as noted in my comment to the authors, the localization data for HIM-5 shown in this paper are discordant with prior studies; this discrepancy should be addressed experimentally.

      This paper describes REC-1 and HIM-5 as paralogs, based on prior analysis in a paper that included some of the same authors (Chung et al., 2015; DOI 10.1101/gad.266056.115). In my initial review I mentioned that this earlier conclusion was likely incorrect and should not be propagated uncritically here. Since the authors have rebutted this comment rather than amending it, I feel it is important to explain my concerns about the conclusions of the previous study. Chung et al. found a small region of potential homology between the C. elegans rec-1 and him-5 genes and also reported that him-5; rec-1 double mutants have more severe defects than either single mutant, indicative of a stronger reduction in DSBs. Based on these observations and an additional argument based on microsynteny, they concluded that these two genes arose through recent duplication and divergence. However, as they noted, genes resembling rec-1 are absent from all other Caenorhabditis species, even those most closely related to C. elegans. The hypothesis that two genes are paralogs that arose through duplication and divergence is thus based on their presence in a single species, in the absence of extensive homology or evidence for conserved molecular function. Further, the hypothesis that gene duplication and divergence has given rise to two paralogs that share no evident structural similarity or common interaction partners in the few million years since C. elegans diverged from its closest known relatives is implausible. In contrast, DSB-1 and DSB-2 are both homologs of Rec114 that clearly arose through duplication and divergence within the Caenorhabditis lineage, but much earlier than the proposed split between REC-1 and HIM-5. Two genes that can be unambiguously identified as dsb-1 and dsb-2 are present in genomes throughout the Elegans supergroup and absent in the Angaria supergroup, placing the duplication event at around 18-30 MYA, yet DSB-1 and DSB-2 share much greater similarity in their amino acid sequence, predicted structure, and function than HIM-5 and REC-1. Further, Raices et al. place HIM-5 and REC-1 in different functional complexes (Figure 3B).

      The authors acknowledge that HIM-17 is a transcription factor that regulates many meiotic genes. Like HIM-17, XND-1 is cytologically enriched along the autosomes in germline nuclei, suggestive of a role in transcription. The Reinke lab performed ChIP-seq in a strain expressing an XND-1::GFP fusion protein and showed that it binds to promoter regions, many of which overlap with the HIM-17-regulated promoters characterized by the Ahringer lab (doi: 10.1126/sciadv.abo4082). Work from the Yanowitz lab has shown that XND-1 influences the transcription of many other genes involved in meiosis (doi: 10.1534/g3.116.035725) and work from the Colaiacovo lab has shown that XND-1 regulates the expression of CRA-1 (doi: 10.1371/journal.pgen.1005029). Additionally, loss of HIM-17 or XND-1 causes pleiotropic phenotypes, consistent with a broad role in gene regulation. Collectively, these data indicate that XND-1 and HIM-17 are transcription factors that are important for the proper expression of many germline-expressed genes. Thus, as stated above, the roles of HIM-17 and XND-1 in DSB formation, as well as their effects on histone modification, are parsimoniously explained by their regulation of the expression of factors that contribute more directly to DSB formation and chromatin modification. I feel strongly that transcription factors should not be described as "SPO-11 accessory factors."

    5. Author response:

      The following is the authors’ response to the original reviews

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Raices et al., provides novel insights into the role and interactions between SPO-11 accessory proteins in C. elegans. The authors propose a model of meiotic DSBs regulation, critical to our understanding of DSB formation and ultimately crossover regulation and accurate chromosome segregation. The work also emphasizes the commonalities and species-specific aspects of DSB regulation.

      Strengths:

      This study capitalizes on the strengths of the C. elegans system to uncover genetic interactions between a large number of SPO-11 accessory proteins. In combination with physical interactions, the authors synthesize their findings into a model, which will serve as the basis for future work, to determine mechanisms of DSB regulation.

      Weaknesses:

      The methodology, although standard, lacks quantification. This includes the mass spectrometry data, as well as the cytology. The work would also benefit from clarifying the role of the DSB machinery on the X chromosome versus the autosomes.

      • We have uploaded the MS data and added a summary table with the number of peptides and coverage.

      • We have added statistics to the comparisons of DAPI body counts.

      • We have provided additional images of the change in HIM-5 localization.

      • We have quantified the overlap (or lack thereof) between XND-1 and HIM-17 and the DNA axis.

      Reviewer #2 (Public Review):

      Summary:

      Meiotic recombination initiates with the formation of DNA double-strand breaks (DSBs), catalyzed by the conserved topoisomerase-like enzyme Spo11. Spo11 requires accessory factors that are poorly conserved across eukaryotes. Previous genetic studies have identified several proteins required for DSB formation in C. elegans to varying degrees; however, how these proteins interact with each other to recruit the DSB-forming machinery to chromosome axes remains unclear.

      In this study, Raices et al. characterized the biochemical and genetic interactions among proteins that are known to promote DSB formation during C. elegans meiosis. The authors examined pairwise interactions using yeast two-hybrid (Y2H) and co-immunoprecipitation and revealed an interaction between a chromatin-associated protein HIM-17 and a transcription factor XND-1. They further confirmed the previously known interaction between DSB-1 and SPO-11 and showed that DSB-1 also interacts with a nematode-specific HIM-5, which is essential for DSB formation on the X chromosome. They also assessed genetic interactions among these proteins, categorizing them into four epistasis groups by comparing phenotypes in double vs. single mutants. Combining these results, the authors proposed a model of how these proteins interact with chromatin loops and are recruited to chromosome axes, offering insights into the process in C. elegans compared to other organisms.

      Weaknesses:

      This work relies heavily on Y2H, which is notorious for having high rates of false positives and false negatives. Although the interactions between HIM-17 and XND-1 and between DSB-1 and HIM-5 were validated by co-IP, the significance of these interactions was not tested, and cataloging Y2H interactions does not yield much more insight.

      We appreciate that the reviewer recognized the value of our IP data, but we respectfully disagree that we rely too heavily on Y2H. We also provide genetic analysis of bivalent formation to support the physical interaction data. We do, however, acknowledge caveats of Y2H, including that a subset of the interactions could only be examined with proteins in one orientation because of auto-activation. While we acknowledge that it would be nice to have IP data for all of the proteins using CRISPR-tagged, functional alleles, these strains are not all feasible (e.g. no functional rec-1 tag has been made) and are beyond the scope of the current work.

      Moreover, most experiments lack rigor, which raises serious concerns about whether the data convincingly supports the conclusions of this paper. For instance, the XND-1 antibody appears to detect a band in the control IP; however, there was no mention of the specificity of this antibody.

      We previously showed the specificity of this antibody in its original publication showing lack of staining in the xnd-1 mutant by IF (Wagner et al., 2010). To further address this, however, we have now included a new supplementary figure (Figure S1) demonstrating the specificity of the XND-1 antibody by Western blot. The antibody detects a distinct band in extracts from wild-type (N2) worms, but this band is absent in two independent xnd-1 mutant strains. This confirms that the antibody specifically recognizes XND-1, supporting the validity of the IP results shown in the main figures.

      Additionally, epistasis analysis of various genetic mutants is based on the quantification of DAPI bodies in diakinesis oocytes, but the comparisons were made without statistical analyses.

      We have added statistical analysis to all datasets where quantification was possible, strengthening the rigor and interpretation of our findings.

      For cytological data, a single representative nucleus was shown without quantification and rigorous analysis. The rationale for some experiments is also questionable (e.g. the rescue of dsb-2 mutants by him-5 transgenes in Figure 2), making the interpretation of the data unclear. Overall, while this paper claims to present "the first comprehensive model of DSB regulation in a metazoan", cataloging Y2H and genetic interactions did not yield any new insights into DSB formation without rigorous testing of their significance in vivo. The model proposed in Figure 4 is also highly speculative.

      Regarding the cytology, we provide new images and quantification of HIM-17 and XND-1 overlap with the DNA axes. We also added full germ line images showing HIM-5 localization in wild type and dsb-1 mutants, to provide a more complete and representative view of the observed phenotype. To further support our findings, we’ve also included images demonstrating that this phenotype is consistently observed both in live worms with the him-5::GFP transgene and in fixed worms with an endogenously tagged version of HIM-5.

      Reviewer #3 (Public Review):

      During meiosis in sexually reproducing organisms, double-strand breaks are induced by a topoisomerase-related enzyme, Spo11, which is essential for homologous recombination, which in turn is required for accurate chromosome segregation. Additional factors control the number and genome-wide distribution of breaks, but the mechanisms that determine both the frequency and preferred location of meiotic DSBs remain only partially understood in any organism.

      The manuscript presents a variety of different analyses that include variable subsets of putative DSB factors. It would be much easier to follow if the analyses had been more systematically applied. It is perplexing that several factors known to be essential for DSB formation (e.g., cohesins, HORMA proteins) are excluded from this analysis, while it includes several others that probably do not directly contribute to DSB formation (XND-1, HIM-17, CEP-1, and PARG-1).

      We respectfully disagree with the reviewer’s statement regarding the selection of factors included in our analysis. In this work, our focus was specifically on SPO-11 accessory factors: proteins that directly interact with or regulate SPO-11 activity during double-strand break formation. Cohesins and chromosome axis proteins (such as the HORMA domain proteins) are essential for establishing the correct chromosome architecture that supports DSB formation, but there is no evidence that they are direct accessory factors of SPO-11. Therefore, they were intentionally excluded from this study to maintain a clear and focused scope on proteins that more directly modulate SPO-11 function.

      Conversely, XND-1, HIM-17, CEP-1, and PARG-1 have all been implicated in regulating aspects of SPO-11-mediated DSB formation or its immediate environment. Although their contributions may involve broader chromatin or DNA damage response regulation, prior literature supports their inclusion as relevant modulators of SPO-11 activity, justifying their analysis within the context of this work.

      The strongest claims seem to be that "HIM-5 is the determinant of X-chromosome-specific crossovers" and "HIM-5 coordinates the actions of the different accessory factors subgroups." Prior work had already shown that mutations in him-5 preferentially reduce meiotic DSBs on the X chromosome. While it is possible that HIM-5 plays a direct role in DSB induction on the X chromosome, the evidence presented here does not strongly support this conclusion. It is also difficult to reconcile this idea with evidence from prior studies that him-5 mutations predominantly prevent DSB formation on the sex chromosomes, while the protein localizes to autosomes.

      HIM-5 is not the only protein that is autosomally enriched but preferentially affects the X chromosome: MES-4 and MRG-1 are both autosomally-enriched but influence silencing of the X chromosome. While HIM-5 appears autosomally-enriched, it does not appear to be autosomal-exclusive. While we would ideally perform ChIP to determine its localization on chromatin, this method for assaying DSB sites is likely insufficient to identify DSB sites which differ in each nucleus and for which there are no known hotspots in the worm.

      him-5 mutants confer an ~50% reduction in the total number of breaks and a very profound change in break dynamics (seen by RAD-51 foci; Meneely et al., 2012). Since the autosomes receive sufficient breaks in this context to attain a crossover in >98% of nuclei, the autosomes are much less profoundly impacted by loss of DSB functions than is the X chromosome. Indeed, prior data from co-author Monica Colaiacovo showed that fewer breaks occur on the X (Gao, 2015), likely resulting from differences in the chromatin composition of the X and autosomes that arise from X chromosome silencing.

      The conclusion that HIM-5 must be required for breaks on the X comes from the examination of DSB levels and their localization in different mutants that impair but do not completely abrogate breaks. In any situation where HIM-5 protein expression is affected (xnd-1, him-17, and him-5 null alleles), breaks on the X are reduced or eliminated. By contrast, in dsb-2 mutants, where HIM-5 expression is unaffected, both X and autosomal breaks are impacted equally. As discussed above, in the absence of HIM-5 function, there are ~15 breaks/nucleus. The Ppie-1::him-5 transgene is expressed to lower levels than Phim-5::him-5, but in the best case, the ectopic expression of this protein should give a maximum of ~15 breaks (the total number of breaks is thought to be ~30/nucleus). By these estimates, Ppie-1::him-5; him-17 and him-5 null mutants have the same number of breaks. Yet, in the former case, breaks occur on the X, whereas in the latter they do not. The best explanation for this discrepancy is that HIM-5 is sufficient to recruit the DSB machinery to the X chromosome.

      The one experiment that seems to elicit the conclusion that HIM-5 expression is sufficient for breaks on the X chromosome is flawed (see below). The conclusion that HIM-5 "coordinates the activities of the different accessory sub-groups" is not supported by data presented here or elsewhere.

      We have reorganized the discussion to more directly address the reviewers’ concerns. We raise the possibility that HIM-5 has an important role in bringing together SPO-11 and its interacting components (DSB-1/2/3) with the other DSB-inducing factors, including those factors that regulate DSB timing (XND-1), coordination with the cell cycle (REC-1), association with the chromosome axis (PARG-1, MRE-11), and coupling to downstream resection and repair (MRE-11, CEP-1).

      This raises a natural question: if HIM-5 has such a central role, why are the phenotypes of him-5 mutants so mild? We propose that while the loss of DSBs on the X appears mild, more profound effects are seen in the total number, timing, and placement of the DSBs across the genome, all of which are diminished or altered in the absence of HIM-5. The phenotypes of him-5 loss are reminiscent of those observed in Prdm9-/- mice, where breaks are relocated to transcriptional start sites and show a significant delay in formation. Just as the comparatively subtle phenotypes of PRDM9 loss do not diminish its critical role in promoting proper DSB formation in most mammals, the mild phenotypes of him-5 mutants do not diminish the importance of HIM-5.

      Like most other studies that have examined DSB formation in C. elegans, this work relies on indirect assays, here limited to the cytological appearance of RAD-51 foci and bivalent chromosomes, as evidence of break formation or lack thereof. Unfortunately, neither of these assays has the power to reveal the genome-wide distribution or number of breaks. These assays have additional caveats, due to the fact that RAD-51 association with recombination intermediates and successful crossover formation both require multiple steps downstream of DSB induction, some of which are likely impaired in some of the mutants analyzed here. This severely limits the conclusions that can be drawn. Given that the goal of the work is to understand the effects of individual factors on DSB induction, direct physical assays for DSBs should be applied; many such assays have been developed and used successfully in other organisms.

      We appreciate the reviewer’s thoughtful comments. We agree that RAD-51 foci are an indirect readout of DSB formation and that their dynamics can be influenced by defects in downstream repair processes. However, in C. elegans, the available methods for directly detecting DSBs are limited. Unlike other organisms, C. elegans lacks γH2AX, eliminating the possibility of using γH2AX as a DSB marker. TUNEL assays, while conceptually appealing, have proven unreliable and poorly reproducible in the germline context. Similarly, RPA foci do not consistently correlate with the number of DSBs and are influenced by additional processing steps.

      Given these limitations, RAD-51 foci remain the most widely accepted surrogate for monitoring DSB formation in C. elegans. While we fully acknowledge the caveats associated with this approach — particularly the potential effects of downstream repair defects — RAD-51 analysis continues to provide valuable insight into DSB dynamics and regulation, especially when interpreted in combination with other phenotypic assessments.

      Throughout the manuscript, the writing conflates the roles played by different factors that affect DSB formation in very different ways. XND-1 and HIM-17 have previously been shown to be transcription factors that promote the expression of many germline genes, including genes encoding proteins that directly promote DSBs. Mutations in either xnd-1 or him-17 result in dysregulation of germline gene expression and pleiotropic defects in meiosis and fertility, including changes in chromatin structure, dysregulation of meiotic progression, and (for xnd-1) progressive loss of germline immortality. It is thus misleading to refer to HIM-17 and XND-1 as DSB "accessory factors" or to lump their activities with those of other proteins that are likely to play more direct roles in DSB induction.

      It is clear that we will not reach agreement about the direct vs indirect roles here of chromatin remodelers/transcription factors in break formation. In yeast, there is a precedent for SPP1, and in mouse for Prdm9, both of which could be described as transcription factors as well, as having roles in break formation by creating an open chromatin environment for the break machinery. We envision that these proteins function in the same fashion. The changes in histone acetylation in the xnd-1 mutants support such a claim.

      We do not know what the reviewer is referring to in the statement that “XND-1 and HIM-17 have previously been shown to be transcription factors that promote the expression of many germline genes.” While the Carelli et al. paper indeed shows a role for HIM-17 in expression of many germline genes, there is only one reference to XND-1 in this manuscript (Figure S3A), which shows that half of XND-1 binding sites overlap with the co-opted germline promoters. There are no transcriptional data at all on xnd-1 mutants, save our studies (referenced herein) showing that XND-1 regulates him-5 expression.

      For example, statements such as the following sentence in the Introduction should be omitted or explained more clearly: "xnd-1 is also unique among the accessory factors in influencing the timing of DSBs; in the absence of xnd-1, there is precocious and rapid accumulation of DSBs as monitored by the accumulation of the HR strand-exchange protein RAD-51."

      We are not sure what is confusing here. The distribution of RAD-51 foci is significantly altered in xnd-1 mutants and peak levels of breaks are achieved as nuclei leave the transition zone (Wagner et al., 2010; McClendon et al., 2016). There is no other mutation that causes this type of change in RAD-51 distribution.

      The evidence that HIM-17 promotes the expression of him-5 presented here corroborates data from other publications, notably the recent work of Carelli et al. (2022), but this conclusion should not be presented as novel here.

      We have clarified this in the text. We note that this paper showed alterations in him-5 levels by RNA-Seq but they did not validate these results with quantitative RT-PCR. Thus, our studies do provide an important validation of their prior results.

      The other factors also fall into several different functional classes, some of which are relatively well understood, based largely on studies in other organisms. The roles of RAD50 and MRE-11 in DSB induction have been investigated in yeast and other organisms as well as in several prior studies in C. elegans. DSB-1, DSB-2, and DSB-3 are homologs of relatively well-studied meiotic proteins in other organisms (Rec114 and Mei4) that directly promote the activity of Spo11, although the mechanism by which they do so is still unclear.

      Whilst we agree that we understand some of the functions of the homologs, there are clearly examples in other processes of conserved proteins adopting unique regulatory functions. We should not presume evolutionary conservation until it is proven. Indeed, the comparison between the Mer2 proteins is particularly relevant here. For example, the RMM complex in plants does not contain PRD3, although this protein is thought to function in DSB formation and repair (Lambing et al., 2022; Vrielynck et al., 2021; Thangavel et al., 2023). In Sordaria, as well, the Mer2 homolog has distinct functions (Tesse et al., 2017).

      Mutations in PARG-1 (a Poly-ADP ribose glycohydrolase) likely affect the regulation of polyADP-ribose addition and removal at sites of DSBs, which in turn are thought to regulate chromatin structure and recruitment of repair factors; however, there is no convincing evidence that PARG-1 directly affects break formation.

      Our prior collaborative studies on PARG-1 showed that it has a non-catalytic function that promotes DSBs and is independent of the accumulation of PAR (Janisiw et al., 2020; Trivedi et al., 2022).

      CEP-1 is a homolog of p53 and is involved in the DNA damage response in the germline, but again is unlikely to directly contribute to DSB induction.

      We respectfully disagree with the reviewer’s statement. While CEP-1 is indeed a homolog of p53 and plays a major role in the DNA damage response, prior work from Brent Derry’s lab and from our group (Mateo et al., 2016) demonstrated that specific cep-1 separation-of-function alleles affect DSB induction and/or repair pathway choice independently of canonical DNA damage checkpoint activation. In particular, defects in DSB formation observed in certain cep-1 mutants can be rescued by exogenous irradiation, supporting a direct or closely linked role in promoting DSB formation rather than merely responding to damage. Thus, based on these functional data, we considered CEP-1 a relevant factor to include in our analysis. We have now clarified this rationale in the revised manuscript.

      HIM-5 and REC-1 do not have apparent homologs in other organisms and play poorly understood roles in promoting DSB induction. A mechanistic understanding of their functions would be of value to the field, but the current work does not shed light on this. A previous paper (Chung et al. G&D 2015) concluded that HIM-5 and REC-1 are paralogs arising from a recent gene duplication, based on genetic evidence for a partially overlapping role in DSB induction, as well as an argument based on the genomic location of these genes in different species; however, these proteins lack any detectable sequence homology and their predicted structures are also dissimilar (both are largely unstructured but REC-1 contains a predicted helical bundle lacking in HIM-5). Moreover, the data presented here do not reveal overlapping sets of genetic or physical interactions for the two genes/proteins. Thus, this earlier conclusion was likely incorrect, and this idea should not be restated uncritically here or used as a basis to interpret phenotypes.

      Actually, there is quite good bioinformatic analysis showing that the rec-1 and him-5 loci evolved from a gene duplication and that each shares features of the ancestral protein (Chung et al., 2015). We are sorry if the reviewer casts aspersions on the prior literature and analyses. The homology between these genes and the ancestral protein is nearly the same as that of dsb-1, dsb-2, or dsb-3 to their ancestral homologs (<17%).

      DSB-1 was previously reported to be strictly required for all DSB and CO formation in C. elegans. Here the authors test whether the expression of HIM-5 from the pie-1 promoter can rescue DSB formation in dsb-1 mutants, and claim to see some rescue, based on an increase in the number of nuclei with one apparent bivalent (Figure 2C). This result seems to be the basis for the claim that HIM-5 coordinates the activities of other DSB proteins. However, this assay is not informative, and the conclusion is almost certainly incorrect. Notably, a substantial number of nuclei in the dsb-1 mutant (without Ppie-1::him-5) are reported as displaying a single bivalent (11 DAPI staining bodies) despite prior evidence that DSBs are absent in dsb-1 mutants; this suggests that the way the assay was performed resulted in false positives (bivalents that are not actually bivalents), likely due to inclusion of nuclei in which univalents could not be unambiguously resolved in the microscope. A slightly higher level of nuclei with a single unresolved pair of chromosomes in the dsb-1; Ppie-1::him-5 strain is thus not convincing evidence for rescue of DSBs/CO formation, and no evidence is presented that these putative COs are X-specific. The authors should provide additional experimental evidence - e.g., detection of RAD-51 and/or COSA-1 foci or genetic evidence of recombination - or remove this claim. The evidence that expression of Ppie-1::him-5 may partially rescue DSB abundance in dsb-2 mutants is hard to interpret since it is currently unknown why C. elegans expresses 2 paralogs of Rec114 (DSB-1 and DSB-2), and the age-dependent reduction of DSBs in dsb-2 mutants is not understood.

      We have removed this claim, in part because we have been unable to create the triple mutant strains needed to analyze COSA-1 foci.

      To the point about 11 vs 12 DAPI bodies: the literature is actually replete with examples of 11 DAPI bodies vs 12 in mutants with no breaks:

      Hinman et al., 2021: a null allele of dsb-3 has an average of 11.6 +/- 0.6 DAPI bodies;

      Stamper et al., 2013, show just over 60% of dsb-1 nuclei with 12 DAPI bodies and 5-10% with 10 DAPI bodies (Figure 1);

      In addition, we also previously showed (Machovina et al., 2016) that a subset of meiotic nuclei have a single RAD-51 focus and can achieve a crossover. RAD-51 foci in spo-11 were also reported in Colaiacovo et al., 2003.

      Several of the factors analyzed here, including XND-1, HIM-17, HIM-5, DSB-1, DSB-2, and DSB-3, have been shown to localize broadly to chromatin in meiotic cells. Coimmunoprecipitation of pairs of these factors, even following benzonase digestion, is not strong evidence to support a direct physical interaction between proteins.

      Similarly, the super-resolution analysis of XND-1 and HIM-17 (Figure 1EF) does not reveal whether these proteins physically interact with each other, and does not add to our understanding of these proteins functions, since they are already known to bind to many of the same promoters. Promoters are also likely to be located in chromatin loops away from the chromosome axis, so in this respect, the localization data are also confirmatory rather than novel.

      While the binding to promoters would be expected to be on DNA loops, that has not been definitively shown in the worm germ line. The supplemental data of the Carelli paper suggest that there are ~250 binding sites for each protein at these co-opted promoters. This could not account for the crossover map seen in C. elegans.

      The reviewer is correct that we do not show that these proteins interact directly, but we have shown that the two proteins co-IP and have a Y2H interaction. This interaction is supported by a recent publication (Blazickova et al., 2025) that corroborates this conclusion and identifies XND-1 in HIM-17 co-IPs, also in the presence of benzonase. We do now show by immunolocalization, however, that the two proteins appear to be adjacent but nonoverlapping. As now described in the text, AlphaFold 3 modeling and structural analysis suggest that the two proteins do interact directly and that the tagged 5’ end of HIM-17 used in our studies is likely to be at least 200 nm from the putative XND-1 binding interface, a distance that is consistent with our confocal images showing frequent juxtaposition of the two proteins.

      The phenotypic analysis of double mutant combinations does not seem informative. A major problem is that these different strains were only assayed for bivalent formation, which (as mentioned above) requires several steps downstream of DSB induction. Additionally, the basis for many of the single mutant phenotypes is not well understood, making it particularly challenging to interpret the effects of double mutants. Further, some of the interactions described as "synergistic" appear to be additive, not synergistic. While additive effects can be used as evidence that two genes work in different pathways, this can also be very misleading, especially when the function of individual proteins is unknown. I find that the classification of genes into "epistasis groups" based on this analysis does not shed light on their functions and indeed seems in some cases to contradict what is known about their functions.

      As described above, each of the proteins analyzed is thought to have a direct role in regulating meiotic DSB formation, and single mutant phenotypes are consistent with this interpretation. In almost all, if not all, of these cases, IR-induced breaks suppress univalent phenotypes (or uncover a downstream repair defect, e.g. in mre-11), supporting this conclusion. We have changed the terminology from “epistasis groups”, since this is not strict epistasis, to “functional groups”.

      The yeast two-hybrid (Y2H) data are only presented as a single colony. While it is understandable to use a 'representative' colony, it is ideal to include a dilution series for the various interactions, which is how Y2H data are typically shown.

      The Y2H data are presented as spots on a plate and are from three to four individual transformants per interaction tested; they are not individual colonies. The experiment was repeated in triplicate from different transformations. We have now made this clearer in the Materials and Methods section. This approach has been successfully used to examine protein interactions in our prior manuscripts on yeast and human proteins [Gaines et al (2015) Nat. Comms 6:7834; Kondrashova et al (2017) Cancer Discovery 7:984; Garcin et al (2019) PLoS Genetics 15:e1008355; Bonilla et al (2021) eLife 1: e68080; Prakash et al (2022) PNAS 119: e2202727119, etc.]

      Additional (relatively minor) concerns about these data:

      (1) Several interactions reported here seem to be detected in only one direction - e.g., MRE-11-AD/HIM-5-BD, REC-1-AD/XND-1-BD, and XND-1-AD/HIM-17-BD - while no interactions are seen with the reciprocal pairs of fusion proteins. I'm not sure if some of this is due to pasting "positive" colony images into the wrong position in the grid, but this should be addressed.

      The asymmetry in the interactions observed is due to the well-known phenomenon in yeast two-hybrid (Y2H) assays where certain plasmids exhibit self-activation when fused in one orientation, making interpretation of reciprocal interactions challenging. In our experiment, some of the plasmids indeed showed self-activation in one direction, which likely accounts for the lack of interaction seen with the reciprocal pairs of fusion proteins. We have clarified this point in the Methods.

      (2) DSB-3 was only assayed in pairwise combinations with a subset of other proteins; this should be explained; it is also unclear why the interaction grids are not symmetrical about the diagonal.

      We have now completed the analysis by adding the interactions of DSB-3 with the remaining proteins that were missing from the initial set.

      (3) I don't understand why the graphic summaries of Y2H data are split among 3 different figures (1, 2, and 3).

      We chose to split the graphic summaries of the Y2H data across Figures 1, 2, and 3 because we felt this organization better aligns with the flow of the results presented in each figure. Each set of interactions is shown in the context of the specific experiments and findings discussed in those sections, which we believe helps provide a clearer and more logical presentation of the data.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Figure 1: B) The IP is difficult to interpret - there is a band of the corresponding size to XND-1 in the control lane calling into question the specificity of the IP/Western.

      We added a supplemental figure with the specificity of the antibody showing that there is a background non-specific band.

      C) More information about the mass spectrometry should be included. No indication of the number of times a peptide was identified, or the overall coverage of the identified proteins.

      Done

      This is important as in the results section (line 114) the authors indicate that there was "strong" interaction yet there is no way to assess this.

      D) Why wasn't hatching measured in the him-5p::him-5; him-17(ok424) strain?

      Great question. I guess we need to do this while back out for review. If anyone has suggestions of what to say here. Clearly we overlooked this point but do have the strain.

      E) Quantification of the cytology should be included.

      We have now quantified overlap between XND-1 and HIM-17

      Figure 2: C) Statistics should be included.

      Done

      E) Quantification should be included for the cytology. I recommend changing the eaIs15 to HIM-5.

      We included better images showing whole gonads instead of one or two nuclei. We were not sure what the reviewers want us to quantify here since the relocalization of the protein to the cytoplasm is very clear.

      I have a general issue with the use of the term epistasis - this is used to order gene function based on different mutant phenotypes, usually with null alleles. While I think the authors have valid points with how they group the different SPO-11 accessory proteins, I do not think they should use the word epistasis, but rather genetic interactions.

      We appreciate the reviewer's thoughts on this matter and have removed the term epistasis; we now use "functional groups" or "genetic interactions" throughout the text.

      Figure 4 and the nature of the X chromosome: First, I think it would help the non-C. elegans reader to include a little more information on the X chromosome with respect to its differences compared to the autosomes. I also think that, if possible, it would be beneficial to include a model of the X in Figure 4.

      We have added more about X/autosome differences in the Introduction and in the discussion of HIM-5 function, and have added a figure showing differences in the behavior of the X and autosomes during DSB/crossover formation.

      Minor points:

      Abstract: Given the findings of Silva and Smolikove on SPO-11 breaks, I recommend removing "early" from line 28 in the Abstract.

      Done

      Introduction (line 93): I think "biochemical studies" is a stretch here - I recommend "interaction studies".

      Done

      Results: (lines 160-161): mutations are not required for breaks. Line 172, there is a problem with the sentence.

      Corrected

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      (1) Figure 1B - The signal for XND-1 seems to appear both in the control and him-17::HA IP. Have the authors tested the specificity of the XND-1 antibody?

      We included a supplementary figure demonstrating the specificity of the XND-1 antibody by Western blot. This was also previously published (Wagner et al., 2010)

      (2) Figure 1D - can the authors provide an explanation why the him-5p::him-5 transgene that drives a higher expression than pie-1p::him-5 fails to suppress the Him phenotype seen in him-17? What are the HIM-5 levels like in these two strains compared to N2 and him-17 null mutants? Can this information provide explanation for the differential effect of the him-5 transgenes?

      We previously reported that him-5p::him-5 drives higher expression than pie-1p::him-5 (McClendon et al, 2016).

      The reason that him-5p::him-5 does not rescue, despite higher wild type expression is that HIM-17 directly regulates expression of him-5. Since HIM-17 does not regulate the pie-1 promoter, the pie-1p::him-5 construct can at least partially suppress the him-17 mutation.

      We have (hopefully) explained this better in the text.  

      (3) Line 102 - the subheading "HIM-5 is the essential factor for meiotic breaks in the X chromosome" may not be appropriate for this section. This is what has previously been known. However, the results in Figure 1 demonstrate that a him-5 transgene can partially rescue the him-17 and xnd-1 phenotype, but not that it is essential for meiotic DSB formation on X chromosomes.

      We think some of the concern here is semantic and have changed the phrasing to say that HIM-5 is SUFFICIENT for DSBs on the X, which had not previously been shown.

      Vis-à-vis the X chromosome, in all genetic backgrounds examined, the absence of HIM-5 consistently results in a complete lack of DSBs on the X. For instance, in dsb-2 mutants, where HIM-5 is still expressed, DSBs are reduced genome-wide, but the X chromosome occasionally retains breaks. In contrast, even a weak allele of him-17 results specifically in the loss of X chromosome breaks, underscoring a unique requirement for HIM-5 in promoting DSBs on the X. While Figure 1 shows that a him-5 transgene can partially rescue him-17 and xnd-1 phenotypes, the consistent observation that X breaks are absent without HIM-5 supports its classification as sufficient for DSB formation on the X chromosome.

      (4) Figure 1E - please consider enlarging the images and showing multiple examples.

      Done.

      I also suggest that the authors perform a more rigorous analysis to support the conclusion that XND-1 and HIM-17 localize away from the axis by quantifying multiple images and doing line-scan analysis.

      Provided. New images are provided in both the main and supplemental figures, and quantification is included. There is no detectable overlap of the two proteins with one another or with the DNA axes (see quantification of overlap in Fig. 1).

      (5) Line 162 - This is the first mention of DSB-1, DSB-2, and DSB-3 in the paper. DSB-1 and DSB-2 are Rec114 homologs in C. elegans (Tesse et al., 2017), while DSB-3 is a homolog of Mei4 (Hinman et al., 2021). These proteins should be properly introduced in the introduction with appropriate citations.

      Done. We appreciate the reviewer pointing out that this was the first reference to these genes.

      (6) Line 169 - the rationale for this experiment is unclear. Why did the Y2H interaction between HIM-5 and DSB-1 prompt the authors to test the rescue of dsb-1 or dsb-2 phenotypes by the ectopic expression of him-5? Do the authors have evidence that HIM-5 level is reduced in dsb-1 or dsb-2 mutants?

      We have reorganized this section to better explain the motivation for looking at these interactions. We did see a difference in the localization of HIM-5 in the dsb-1 mutant animals, and we did have a sense that HIM-5 was critical for breaks on the X. We reasoned that it could have independent functions in promoting breaks that were not yet appreciated, so we wanted to do this experiment.

      (7) Line 172 - "very slightly reduced". This claim requires statistical analysis.

      We added statistical analysis, but we also removed this claim.

      (8) Figures 2C and 2D - Can the authors provide an explanation why the pie-1p::him-5 transgene fails to suppress the phenotypes in dsb-1, while the him-5p::him-5 trasgene can? Again, the rationale for these experiments is unclear. Because of this, the interpretation is also unclear.

      The difference in rescue between the pie-1p::him-5 and him-5p::him-5 transgenes likely reflects differences in expression levels. As previously shown (McClendon et al., 2016), the him-5p::him-5 construct results in significantly higher expression of HIM-5 protein compared to pie-1p::him-5. This elevated expression likely explains its ability to partially rescue the dsb-1 phenotype. In contrast, the lower expression driven by the pie-1 promoter is insufficient to compensate for the absence of dsb-1 function. We have clarified the rationale and interpretation of these experiments in the revised manuscript to better reflect this point.

      (9) Lines 184-185 - the data for endogenously tagged HIM-5::3xHA are not shown anywhere in the paper. This must be shown.

      We have added this in the supplemental figures.

      (10) Figure 2D and 2E - what does the localization of pie-1p::him-5::GFP (eaIs15) and him5p::him-5::GFP (eaIs4) look like in wild-type and dsb-1 mutants? Are the cytoplasmic aggregates caused by increased levels of HIM-5 expression? Can the differential behavior of him-5 transgenes provide explanation for differential rescues?

      We now show both live and fixed images of Phim-5::him-5::gfp transgenes, as well as the localization of the endogenously HA-tagged HIM-5 locus (Figure 2 and S3). In all cases, the protein is initially nuclear and then absent from meiotic nuclei with similar timing. The Ppie-1::him-5 transgene was very difficult to image due to low expression (even in wild type), so it is not shown here. We presume that the slightly elevated expression of Phim-5::him-5::gfp explains the differential rescue.

      (11) Lines 221-222, where are the results shown? Please refer to Figure S3.

      Done

      (12) Figure S3 - these need statistical analyses.

      Done

      (13) Lines 230-231 - what about the rec-1; parg-1; cep-1 triple mutant?

      This is an excellent suggestion, and one we have not yet pursued. Given the lack of strong phenotypes in all combinations of double mutants, we prioritized other experiments. However, we agree that examining the rec-1; parg-1; cep-1 triple mutant would provide a valuable test of whether these factors act in the same pathway, and we appreciate the reviewer highlighting this potential future direction.

      (14) Line 298 - I suggest the authors take a look at the Alphafold prediction of DSB-1/DSB-2/DSB-3 and the comparison to human and budding yeast Rec114/Mei4 complex in Guo et al., 2022 eLife, which could provide insights into the Y2H results.

      We thank the reviewer for these comments and have indeed used these interactions and predicted homologies to zero in on a region of interaction between these proteins that resembles what is seen in humans and yeast, where a dimer of REC114-like proteins wraps around and stabilizes a central Mei4 helix. This is now shown in Figure 3H, I. Satisfyingly, this modeling predicts that a trimer composed of two DSB-1 proteins and DSB-3 is more stable than a DSB-1/DSB-2/DSB-3 trimer. This might explain why DSB-2 is not required in young adults and only becomes essential as DSB-1 levels drop in older animals (Rosu et al., 2013).

      (15) Can the authors introduce mutations within the DSB-1 interfaces that disrupt the interaction to either SPO-11 or DSB-2?

      We have begun to address this question by introducing targeted mutations within DSB-1. As shown in Figure 3E and 3F, mutations in the C-terminal region of DSB-1, which includes a core of four α-helices, disrupt its interaction with DSB-2 and DSB-3, but not with SPO-11. These findings suggest that the C-terminus mediates interactions specifically with DSB-2 and DSB-3.

      (16) Line 323 - The him-5 phenotypes are too weak to support the idea that it serves as the linchpin for the whole DSB complex. Do the authors have an explanation for why him-5 mutants exhibit X-chromosome-specific DSB defects?

      In our response above and in the revised text, we have included a more detailed explanation of why we think HIM-5 has a key role in coordinating meiotic break formation. Although HIM-5 was identified for its role on the X, the DSB-formation phenotypes of the mutant are in fact quite pleiotropic and severe.

      (17) Line 436 - C. elegans lacks DSB hotspots.

      Removed

      Minor comments:

      (1) Figure 1A - please show the raw data for the yeast two-hybrid.

      We show representative yeast colonies in Figure S3.

      (2) It looks like the labeling for Figure 1B and 1C are switched.

      Fixed.

      (3) Figure 1B - what does the red box indicate? Please explain it in the legend.

      It indicates the XND-1 band. We added that information in the legend.

      (4) Figure 1C - in the legend, it was noted that the results are from GFP pulldowns of HIM17::GFP. However, the method for Figure 1B and the method section noted that HIM-17 was tagged with 3xHA, and the pull-down was performed using anti-HA affinity matrix. Please reconcile this discrepancy.

      That’s because they were done in two different sets of experiments. For the IPs we used a HIM-17::HA strain and for the MS, a HIM-17::GFP strain.

      (5) Also in Figure 1C - please call Table S2 in the main text when discussing the mass spec results. Also, it is not clear what HIM-17 and GFP indicate in the table. What makes CKU80 different from the other proteins listed under GFP? Please explain more clearly in the legend.

      We have moved the table to the supplemental data, where we have included all of the peptide counts and gene coverage. The revised Methods now state the criteria for inclusion in this table, which explains why CKU-80 differs.

      (6) Line 527 - it is unclear what experiment was done for HIM-17. Please revise it to indicate that this is for "HIM-17 immunoprecipitation". Also please indicate the strain used for HIM-17 pull-down (AV280?).

      (7) Line 113- please be specific about how the HIM-17 IP was performed. Which epitope and strains are used for pull-downs?

      This indeed was AV280. This has been added to the text and methods.

      (8) Figure 1D- What does ND mean? In the text, it was stated that there was only a minor suppression of hatching rates. The hatching rate for him-5p::him-5; him-17 must have been measured, and the data must be presented.

      ND means not determined. We have removed the statement about “minor suppression”. For the Phim-5::him-5; him-17(ok424) strain, we only tested the overall population dynamics and the DAPI body counts. The failure to suppress the latter suggests there would be no effect on hatching rates, although we did not test this directly. Because we had measured hatching for the Ppie-1::him-5; him-17 strain, we provided this information to further support the claims of genetic rescue by ectopic expression.

      (9) Line 151 - please specify that STED was used.

      We have removed the STED images, and just show the confocal images with Lightning Processing.

      (10) Figure 1E- the authors suggested that HIM-17 and XND-1 mainly localize to autosomes but not the X chromosome. However, there is not enough evidence that the chromosome excluded from HIM-17 staining is indeed an X chromosome.

      (11) Figure 1E (Line 154) - what are the active chromatin markers examined? Where are the data?

      We have previously shown that the chromosome lacking XND-1 staining is the X (Wagner et al., 2010). The X is heterochromatic, and chromatin marks associated with active transcription, including H3K4me3 and HTZ-1 (a variant H2A), preferentially localize to autosomes, effectively anti-marking the X chromosome. As shown in the new Figure 1E, a single chromosome has very little associated XND-1 and HIM-17. This is the X chromosome.

      (12) Line 172 - It should be a comma instead of the period after "In dsb-1 mutants".

      Fixed

      (13) Figure S3H-K - I suggest the authors indicate the alleles of mre-11 (null vs. iow1) on the graph, similarly to him-5(e1490) to avoid confusion.

      Done

      (14) Lines 294 and 600 - Guo et al. 2022 is now published in eLife. The authors must cite the published paper, not the preprint.

      Fixed

      (15) Line 407 - the reference Carelli et al., 2022 is missing.

      Added

      (16) Line 766 - please remove "is" before nuclear.

      Done

      Reviewer #3 (Recommendations For The Authors):

      Major issues:

      In my view, the most interesting mechanistic finding in the paper is the evidence that HIM-5 may not bind to chromatin in the absence of DSB-1. If validated, this would suggest that HIM-5 is likely to be directly involved in a process that promotes break formation, in contrast to factors such as HIM-17 and XND-1. It does not, however, support the idea that HIM-5 is at the top of a hierarchy of DSB factors, as it is interpreted here. More importantly, the data supporting this claim are unconvincing; only a single image of an unfixed gonad from an animal expressing HIM-5::GFP is shown. Immunofluorescence should be performed and the results must be quantified.

      We have provided additional images of the HIM-5 relocalization to show that we observed this in both fixed and live worms with two different tagged strains. The exclusion from the nucleus is seen in all scenarios. Whether the protein now accumulates exclusively in the cytoplasm or is destabilized is challenging to address with the fixed images, due to the arbitrariness of defining “background” staining.

      More generally, this type of analysis, looking at the interdependence of different factors for their association with chromosomes, is much more informative than the genetic interaction data presented in the paper, which does not seem to provide any mechanistic insights into the functions of the factors analyzed. The paper could potentially be greatly improved through a more extensive, systematic analysis of the interdependence of DSB-promoting factors for their localization to chromosomes.

      We have at least added this for XND-1 and HIM-17 and show that they are not interdependent for chromosome association. We also provide, for the first time, data on the localization of HIM-5 in the dsb-1 mutant. Many of the other interactions have already been shown in the literature and/or were not warranted based on the lack of genetic interaction we present here.

      Minor issues:

      The title is vague and inconclusive. A more concrete title summarizing the major findings would help readers to assess whether the work is of interest.

      We have discussed the title extensively with all authors and all would like to keep the current title.

      The authors claim that the expression of HIM-5 from a different promoter (Ppie-1::him-5) but not its endogenous promoter (Phim-5::him-5) can partially rescue the DSB defect in him-17 mutants. To support this claim, they should really quantify the germline expression of HIM-5 in wild-type, him-17, him-17; Ppie-1::him-5, and Phim-5::him-5; him-17.

      We had previously reported the expression of both transgenes in the N2 background (McClendon et al., 2016).

      Panel O appears to be missing from Figure S3.

      Fixed

      The evidence for chromosome fusions in cep-1; mre-11 mutants shown in S4D is not convincing and the claim should be removed unless stronger evidence can be obtained.

      A clearer image has been added

      The basis of the following statement is unclear: "Furthermore, rec-1;him-5 double mutants give an age-dependent severe loss of DSBs (like dsb-2 mutants) suggesting that the ancestral function of the protein may have a more profound effect on break formation." The manuscript does not seem to include data regarding age-dependent loss of DSBs and no other publication is cited to support this claim. The interpretation is also perplexing; I think that it may be predicated on the idea that REC-1 and HIM-5 are paralogs, but as stated above, this claim is not well supported and is likely specious.

      We have added the reference. This was shown in Chung et al., 2013 – the paper that presented the cloning of the rec-1 locus.

  2. Sep 2025
    1. eLife Assessment

      The study provides valuable insights into the role of thalamic nuclei in associative threat and extinction learning, supported by a large dataset and multipronged analyses. However, aspects of the evidence remain incomplete, particularly regarding the statistical methods, the claims of plasticity, and the network modeling framework. If these issues are addressed, this manuscript will be of interest to researchers studying learning and memory, fear, thalamic circuitry, and related mental health conditions.

    2. Reviewer #1 (Public review):

      Summary:

      Badarnee and colleagues analyse fMRI data collected during an associative threat-learning task. They find evidence for parallel processes mediated by the mediodorsal, LGN, and pulvinar nuclei of the thalamus. The evidence for these conclusions is promising, but limited by a lack of clarity regarding the preprocessing and statistical methods.

      Strengths:

      The approach is inventive and novel, providing information about thalamocortical interactions, which is scarce in the current literature.

      Weaknesses:

      (1) There are not sufficient details present to allow for the direct interrogation of the methods used in the study.

      (2) The figures do not contain sufficiently granular details, making it challenging to determine whether the observed effects were robust to individual differences.

    3. Reviewer #2 (Public review):

      Summary:

      The authors quantify human fMRI BOLD responses in pulvinar and mediodorsal thalamic nuclei during a fear conditioning and extinction task across two days, in a large sample (hundreds of participants). They show that the BOLD responses in these areas differentiate the conditioned (CS+) and safety (CS-) stimuli. Additionally, this changes with repeated trials, which could be a neural correlate of fear learning. They show that the anterior pulvinar is most correlated with the MD, and that this is not due to anatomical proximity. They perform graph analysis on the pulvinar subnuclei, which suggests that the medial pulvinar is a hub between the sensory (lateral/inferior) and associative (anterior) pulvinar. They show different patterns of thalamic activity across conditioning, extinction, recall, and renewal.

      Strengths:

      The dataset has a large sample size (n=293 in some measures, n=412 in others). This is a validated human fear conditioning/extinction task that Dr Milad's group has been working with for several years. Few labs have investigated thalamic activity during fear conditioning and extinction, particularly with a large sample. There is an independent replication of the pulvinar network structure (Figure 3), which suggests that processing in the more sensory-related inferior and lateral pulvinar is relayed to the anterior pulvinar (and possibly thereby to more action-related prefrontal areas) via an intermediate step in the medial pulvinar. This is potentially a novel discovery, but it needs further validation.

      Weaknesses:

      (1) The authors cannot make causal claims about their results based on correlational neuroimaging evidence. Causal claims should be pared back. E.g., sentence 1 in the Results section: "The anterior pulvinar and MD contribute to early associative threat learning, as evidenced by increased functional activation in response to CS+ compared to CS- at the block level (Fig. 1b-c)." needs to be reworded to something like "The anterior pulvinar and MD have increased functional activation... This suggests that these areas may contribute to early associative threat learning."

      (2) Figure 1: The fact that the difference in BOLD activity between CS+ and CS- goes away on the third trial is not addressed. This is a very large effect in the data.

      (3) Figure 3: Could the observed network structure be due to anatomical proximity? Perhaps the authors should perform an analysis analogous to the one in Figure 2 for this intra-pulvinar analysis. This analysis also does not take into account indirect connections via corticothalamic and thalamocortical loops between the visual cortex and the pulvinar. There is an implicit assumption of direct interconnections between the pulvinar subnuclei, but to my knowledge there are few strong excitatory projections between these subnuclei. Including visual areas in the graph would make things more complex, but would probably change the story dramatically. As it stands, the message is somewhat constructed or arbitrary.

      (4) In the results section describing Figures 4-7, there are no statistics supporting the claims made. There needs to be a set of graphs comparing the results across the study sessions and days, with statistical comparisons between the different experiments to confirm the differences.

      (5) Figure 7 does not include the major corticothalamic and thalamocortical projections from early, mid-level, and higher visual cortex to the different pulvinar nuclei. I doubt that there are strong direct projections between the pulvinar nuclei; rather, the functional connections are probably mediated through interconnections with cortical visual areas.

      (6) Stylistic: There are a lot of hypotheses and interpretations presented in this primary literature paper, which may be better suited for a review or perspective piece.

      (7) In the discussion, there is an assumption that the fMRI BOLD responses to CS+ and CS- need to be different to indicate that an area is processing these distinctly, but the BOLD signal can only detect large-scale changes in overall activity. It is easy to imagine that an area could be involved in processing these two stimuli distinctly without showing an overall difference in the gross amount of activity.

      (8) There is strong evidence that the BOLD responses to the threat-related and safety-related stimuli are different, modest evidence for their claims of learning/plasticity in these pathways, and circumstantial evidence supporting their hypothesized graph network models. Overall, most of the claims made in the discussion are better considered possible interpretations rather than proven findings. This is not a criticism, as these experiments and subject matter are extremely complex.

      This study continues to validate the power and utility of this human fear conditioning/extinction paradigm, and extends this paradigm to investigating fear learning beyond the traditional limbic system pathways. It is possible that their models for the pulvinar nuclei interconnections could guide future neuromodulation or DBS studies that could provide more causal evidence for their hypotheses.

    4. Reviewer #3 (Public review):

      Summary:

      The present work investigated the specific contributions of thalamic nuclei to associative threat learning and extinction. Using fMRI, the authors examined activation patterns across pulvinar divisions, the lateral geniculate nucleus (LGN), and the mediodorsal thalamus (MD) during threat acquisition, extinction, and recall. Their goal was to determine whether distinct thalamic systems support different modes of learning (automatic survival mechanisms versus more deliberate processes) and to propose a hierarchical pulvinar model of fear conditioning. They also aim to refine current neuroanatomical models of threat learning and memory by highlighting the role of thalamic nuclei.

      Strengths:

      (1) Valuable theoretical elaboration and modeling regarding the differential role of pulvinar subdivisions on feedforward (inferior, lateral) and higher-order integration (anterior), and their functional interplay with other relevant subcortical and cortical structures in associative threat and extinction learning.

      (2) Large sample sizes and multipronged analytical approaches were used for hypothesis testing.

      (3) Exhaustive literature review in the field of associative threat, as well as regarding the role of thalamic nuclei and other brain structures in it.

      Weaknesses:

      (1) Several weaknesses should be pointed out regarding how fMRI data were collected, as well as decisions regarding how the fMRI data were preprocessed and analyzed:

      a) The fMRI data have low resolution (3 mm isotropic voxels), which limits the examination of small nuclei such as those investigated here, especially the LGN and inferior pulvinar.

      b) The fMRI data were normalized to standard space. Analyzing the data in individual-subject space would have avoided warping every participant's brain and would have allowed the use of a probabilistic thalamic atlas that better adapts to each subject's brain and thalamic nuclei (see, for instance, Iglesias et al., 2018). This would have been ideal and would have given the authors more precision, especially considering the low resolution of the fMRI data and the size of the thalamic nuclei of interest.

      c) On top of the two previous points, the authors smoothed the data with a 6 mm kernel, which means that every voxel within these small nuclei was blurred with the two immediately contiguous voxels (if they followed the standard SPM12 normalization default, which resamples, or in this case upsamples, the data to 2 x 2 x 2 mm). Given the sharp changes in structural connectivity and function that can occur over distances of this size, especially in the thalamus, this and the previous two decisions do not favor anatomical precision.

      d) Motion during scanning was poorly controlled in the preprocessing. Including the motion parameters as covariates of no interest in the GLM does not fully guarantee that motion is not influencing the results, and that motion is not differentially influencing some experimental conditions more than others.
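      To make point (c) concrete, here is a short worked calculation. It assumes a Gaussian smoothing kernel specified by its full width at half maximum (FWHM), as in SPM, and compares the 6 mm kernel with both the native 3 mm grid and the 2 mm grid produced by the SPM12 default resampling; which grid the study actually used is not stated.

      $$\sigma = \frac{\mathrm{FWHM}}{2\sqrt{2\ln 2}} \approx \frac{6\ \mathrm{mm}}{2.355} \approx 2.55\ \mathrm{mm}$$

      $$\frac{\mathrm{FWHM}}{2\ \mathrm{mm}} = 3\ \text{voxels}, \qquad \frac{\mathrm{FWHM}}{3\ \mathrm{mm}} = 2\ \text{voxels}$$

      On a 2 mm grid, the half-maximum width of the kernel therefore covers a voxel plus its immediate neighbor on each side, which is the blurring described in (c).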

      (2) It is not clearly indicated in the manuscript how many subjects and how many trials went into each of the analyses. It would be important to indicate this in the text and/or the figures.

      (3) It is also unclear why, given the large sample size, some of the analyses were not performed using reproducibility strategies, such as dividing the sample into two or three subsamples or using further cross-validation approaches.

      (4) Limited testing of alternative hypotheses. The results appear to be a selection of the findings that support the hypotheses the authors sought to confirm. For example, in the analysis reported in Figures 1-2, are there other correlations between activation of the anterior pulvinar or MD and other pulvinar nuclei? Only the MD-anterior pulvinar relationship is reported.

      (5) The manuscript does not contain a limitations subsection. Practically every study has limitations, and this one is no exception. It is better to state the limitations upfront so that readers can factor them into their evaluation of the relevance of the manuscript and the reported evidence.

      (6) Data and code should be made available to the scientific community. Even if the authors used only standard fMRI toolboxes, any code used to run the analyses will be helpful to the community, or to anyone who tries to replicate the findings.

      Despite these weaknesses and what can be derived from them, this manuscript constitutes a valuable contribution to the field to start characterizing and conceptualizing the involvement of thalamic nuclei and their interactions with other brain regions in the associative threat learning circuitries. It also paves the way for further testing of the functional dynamics among these regions and circuitries, and for testing the proposed models.

    5. Author response:

      We thank the reviewers and editors for their thoughtful and constructive feedback. We have carefully considered the comments and plan to revise the manuscript as follows:

      · Methods: We will expand the Methods section to provide additional details regarding the Pavlovian fear conditioning procedure, including instructions, experimental parameters, and the randomization process.

      · Figures and Statistical Reporting: We will break down some figures where appropriate and clearly display the distributions of key variables. We will also include additional statistical details in the main text and elaborate on the analyses where needed.

      · Language and Interpretation: We will revise the text to consistently use correlational rather than causal terminology, ensuring that our conclusions accurately reflect the findings from the fMRI data.

      · Computational Model of the Pulvinar: We will further elaborate on the assumptions and limitations of the intra-pulvinar model, discuss potential neural pathways and candidate regions (e.g., visual cortex), and highlight directions for future work, including studies in nonhuman primates to investigate anatomical connectivity.

      · Alternative Hypotheses of the mediodorsal thalamus-anterior pulvinar relationships: Other pulvinar subregions were already included as covariates in our hierarchical regression analyses, allowing us to account for anatomical proximity and shared variance. We will make this analysis more explicit and clarify the thinking process behind this analysis to allow readers to assess the specificity of the anterior pulvinar-mediodorsal thalamus relationship.

      · Limitations: We will add a dedicated subsection outlining key limitations, including considerations specific to fMRI studies.

      · Data Availability: All data and materials used in this study will be made available upon request from the corresponding author, subject to obtaining the necessary institutional authorization for the data-sharing agreement.

      We are confident that these revisions will enhance the clarity, transparency, and interpretability of the work, and we are grateful to the reviewers for their valuable suggestions. We will provide a detailed, point-by-point response along with the revised submission as soon as possible.

    1. eLife Assessment

      In this manuscript, the authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the endogenous membrane potential oscillations of olfactory receptor neurons (ORNs). The authors combine extracellular electrophysiology of the hawkmoth antennae with computational modeling to predict that Orco activity in ORNs is required for circadian, not ultradian, firing patterns. The work provides valuable support for the hypothesis that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. Nevertheless, the evidence reported provides only incomplete support for the conclusions, especially with regard to the biological implications of the assumption-heavy models.

    2. Joint Public Review:

      This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used in vivo long-term tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the endogenous membrane potential oscillations of olfactory receptor neurons (ORNs). This firing pattern was disrupted by pharmacological blockade of the Orco receptor. They then use these recordings together with computational modeling to predict that Orco activity in ORNs is required for circadian, not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork, associated with the ORN plasma membrane, allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript has limited experimental evidence to support its central hypothesis and is undermined by several questionable assumptions underlying the data analysis and model construction, as well as by insufficient biological data, including critical controls needed to validate and/or fully justify the proposed model.

      Strengths:

      The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.

      Major weaknesses:

      At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.

      Detailed comments are provided below:

      (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of a single dose of the Orco antagonist OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series, as well as the administration of blockers for the other channels that are embedded in the model but were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would ideally include the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.

      (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, using the timeless mRNA control from another study is not acceptable. Furthermore, qPCR measures transcript abundance, not transcription, although the authors repeatedly describe it as transcription. Transcriptional studies would require nuclear run-on assays or, more recently, snRNA-seq analysis. Taken together, these concerns undermine the authors' attempt to rule out TTFL-based control, which directly led them to propose a PTFL-based model.

      (3) The modelling presented treats Orco as a ZT-dependent conductance tied to the cAMP oscillations that this group reported in the cockroach, and relies on the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While such complexes have been generated in cell culture and other heterologous expression systems, and presumably exist in vivo in the Drosophila empty neuron and other tuning-OR mutants, there is no evidence that they exist in wild-type Manduca ORNs. While this does not necessarily undermine every aspect of the models, the authors should acknowledge that wild-type ORNs contain Orco/OR complexes rather than Orco homomeric complexes.

      (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.

      (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.

      (6) The authors do not provide any data in support of cAMP/cGMP-based Orco gating, and the proposed PTFL model is somewhat disappointing. The model seems to be influenced by the authors' long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc., a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells. Moreover, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.

      (7) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, and the averages lack clear daily modulation, the methods are not yet reliable enough to reveal novel mechanisms underlying circadian rhythm generation, and the reported results are not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity such as RAIN, JTK_CYCLE, MetaCycle, or ECHO. The use of FFT and wavelet analysis is applauded, but these methods do not provide tests of significance for rhythms and can be biased when the data contain only 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet refined enough to provide strong evidence for circadian regulation of spontaneous ORN firing. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes with similar periods and phases.
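      To make the request for per-recording significance testing concrete, the following is a minimal sketch, assuming hourly binned firing rates and a fixed 24 h period, of a single-component cosinor fit. Cosinor regression is used here as a simpler parametric stand-in; it is not one of the methods named above (RAIN, JTK_CYCLE, MetaCycle, ECHO), and the function name `cosinor_fit` and the synthetic data are hypothetical. The F statistic compares the 24 h model against a flat (mesor-only) model and could be converted to a p-value with an F(2, n - 3) distribution (for example via `scipy.stats.f.sf`), keeping in mind that serial correlation in spike-rate series can make such p-values optimistic.

```python
import numpy as np


def cosinor_fit(t_hours, rate, period=24.0):
    """Single-component cosinor: rate ~ M + b*cos(w t) + g*sin(w t)."""
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(rate, dtype=float)
    w = 2.0 * np.pi / period
    # Design matrix: intercept (mesor), cosine and sine terms
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    mesor, b, g = coef
    amplitude = float(np.hypot(b, g))
    # b*cos + g*sin = A*cos(w t - phi) with phi = atan2(g, b); phi in hours
    acrophase = float((np.arctan2(g, b) / w) % period)
    fitted = X @ coef
    ss_res = float(np.sum((y - fitted) ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    n = y.size
    # F test of the two rhythmic terms against the mesor-only model
    f_stat = ((ss_tot - ss_res) / 2.0) / (ss_res / (n - 3)) if ss_res > 0 else float("inf")
    return {"mesor": float(mesor), "amplitude": amplitude,
            "acrophase_h": acrophase, "r2": 1.0 - ss_res / ss_tot,
            "F": f_stat, "df": (2, n - 3)}


# Hypothetical example: 72 h of hourly bins, peak at ZT 6, Gaussian noise
rng = np.random.default_rng(0)
t = np.arange(72.0)
rate = 10.0 + 3.0 * np.cos(2 * np.pi * (t - 6.0) / 24.0) + rng.normal(0, 0.5, t.size)
result = cosinor_fit(t, rate)
print(result)
```

      Applied to each recording separately, a fit like this yields an amplitude and phase per neuron, which is what would allow the fraction of significantly rhythmic recordings, and the spread of their phases, to be reported.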

      (8) The statement that circadian patterns of ORN firing are lost with the Orco antagonist OLC15 is not strongly supported. The manuscript should be revised to quantify how OLC15 changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing or vague statements like Line 394 "low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)". The conclusion that Orco blockade abolishes circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet analysis. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the "effectiveness [of OLC15] increased over time." They conclude that the drug "obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12)." The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 µM OLC15).
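      One way to act on the request to quantify how OLC15 changed circadian amplitude is sketched below, under stated assumptions: a linear trend and a 24 h cosine are fitted jointly, so that the gradual drug-related decline in firing (the slope already reported) is neither mistaken for nor allowed to mask a change in rhythm amplitude, and the amplitude is then compared between a baseline window and a post-application window. The function name `trend_cosinor`, the 48 h windows and all values in the synthetic recording are hypothetical, and a per-recording significance test would still be needed.

```python
import numpy as np


def trend_cosinor(t_hours, rate, period=24.0):
    """Fit rate ~ a + s*t + b*cos(w t) + g*sin(w t) by least squares.

    Returns the linear slope s (rate units per hour) and the 24 h amplitude.
    """
    t = np.asarray(t_hours, dtype=float)
    y = np.asarray(rate, dtype=float)
    w = 2.0 * np.pi / period
    X = np.column_stack([np.ones_like(t), t, np.cos(w * t), np.sin(w * t)])
    (_, slope, b, g), *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(slope), float(np.hypot(b, g))


# Hypothetical recording: 48 h baseline, then 48 h after drug application
rng = np.random.default_rng(1)
w = 2.0 * np.pi / 24.0
t_pre = np.arange(0.0, 48.0)
t_post = np.arange(48.0, 96.0)
pre = 20.0 + 3.0 * np.cos(w * (t_pre - 18.0)) + rng.normal(0, 0.5, t_pre.size)
post = (20.0 - 0.1 * (t_post - 48.0) + 0.5 * np.cos(w * (t_post - 18.0))
        + rng.normal(0, 0.5, t_post.size))

slope_pre, amp_pre = trend_cosinor(t_pre, pre)
slope_post, amp_post = trend_cosinor(t_post, post)
print(f"baseline: slope={slope_pre:.3f}, amplitude={amp_pre:.2f}")
print(f"drug:     slope={slope_post:.3f}, amplitude={amp_post:.2f}")
print(f"relative amplitude after drug: {amp_post / amp_pre:.2f}")
```

      Reporting the relative amplitude for each of the 12 neurons would replace qualitative descriptions such as bands that "appeared to merge" with a number per recording.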

      (9) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.

      (10) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases). The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.

      (11) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:

      (a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).

      (b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).

      (c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.
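      The point that an oscillator needs delayed negative feedback can be illustrated with a generic delay-differential model in which a quantity x represses its own production after a lag tau. This is a textbook-style sketch, not a model of Orco, cyclic nucleotides or the proposed PTFL; the function names and all parameter values are hypothetical and dimensionless. With these parameters, the same feedback settles to a steady state when the delay is removed but oscillates persistently when the delay is long enough, which is the property a PTFL model would need to demonstrate explicitly.

```python
def delayed_feedback(beta=2.0, n=10, tau=2.0, gamma=1.0, dt=0.01, t_end=60.0, x0=0.5):
    """Euler integration of dx/dt = beta / (1 + x(t - tau)**n) - gamma * x(t).

    A generic delayed negative-feedback loop: x represses its own production
    with a lag tau. Parameters are arbitrary and dimensionless.
    """
    lag = int(round(tau / dt))
    steps = int(round(t_end / dt))
    x = [x0] * (lag + 1)  # constant history on [-tau, 0]
    for _ in range(steps):
        x_delayed = x[-1 - lag]  # value of x one delay ago
        dxdt = beta / (1.0 + x_delayed ** n) - gamma * x[-1]
        x.append(x[-1] + dt * dxdt)
    return x[lag:]  # samples at t = 0, dt, ..., t_end


def late_swing(trace, dt=0.01, t_from=30.0):
    """Peak-to-trough range after the initial transient."""
    tail = trace[int(round(t_from / dt)):]
    return max(tail) - min(tail)


with_delay = late_swing(delayed_feedback(tau=2.0))
no_delay = late_swing(delayed_feedback(tau=0.0))
print(f"swing with delay: {with_delay:.3f}; without delay: {no_delay:.2e}")
```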

      Minor weaknesses:

      (1) The authors should compare the firing patterns of ORNs to the bursts, clusters, and packets of retinal efferent spikes reported by Liu JS and Passaglia CL (2011; JBR). By comparing measures in moths to measures in Limulus, the authors might be able to address the question: Is the daily firing pattern of ORNs likely a conserved feature of circadian control of sensory sensitivity?

      (2) The methods need further details. For example, it is unclear whether or how single-neuron activity was discriminated and whether the results were compromised by the relatively large environmental fluctuations in temperature (21-27 °C), humidity (35-60%), or other cues known to modulate spontaneous firing.

    3. Author response:

      Joint Public Review

      This manuscript puts forward the provocative idea that a posttranslational feedback loop regulates daily and ultradian rhythms in neuronal excitability. The authors used long-term in vivo tip recordings of the long trichoid sensilla of male hawkmoths to analyze spontaneous spiking activity indicative of the endogenous membrane potential oscillations of the olfactory receptor neurons (ORNs). This firing pattern was disrupted by pharmacological blockade of Orco. The authors then used these recordings together with computational modeling to predict that Orco activity in ORNs is required for circadian, but not ultradian, firing patterns. Orco did not show a circadian expression pattern in a qPCR experiment, and its conductance was proposed to be regulated by cyclic nucleotide levels. This evidence led the authors to conclude that a post-translational feedback loop (PTFL) clockwork associated with the ORN plasma membrane allows for temporal control of pheromone detection via the generation of multi-scale endogenous membrane potential oscillations. The findings will interest researchers in neurophysiology, circadian rhythms, and sensory biology. However, the manuscript provides limited experimental evidence for its central hypothesis and is undermined by several questionable assumptions underlying the data analysis and model building, as well as by insufficient biological data, including critical controls to validate and fully justify the proposed model.

      We thank the reviewers for their thorough and thoughtful comments and believe that the manuscript will be much stronger once we incorporate the requested changes.

      Please note that we used ORN as the acronym for “olfactory receptor neuron” throughout the manuscript. ORNs contain odorant receptors (ORs), and in insects these ORs have to associate with the odorant receptor co-receptor (Orco) in the cilium of the neuron to form functional OR-Orco complexes for odorant detection. Besides this chaperone function, Orco can form homomers with the potential to act as ionic pacemaker channels, a role that we explore in this study.

      Strengths:

      The study is notable for its combination of long-term in vivo tip recordings with computational modeling, which is technically challenging and adds weight to the authors' claims. The link between Orco, cyclic nucleotides, and circadian regulation is potentially important for sensory neuroscience, and the modeling framework itself - a stochastic Hodgkin-Huxley formulation that explicitly incorporates channel noise - is a solid and forward-looking contribution. Together, these elements make the study conceptually bold and of clear interest to circadian and olfactory biologists.

      Major weaknesses:

      At the same time, several limitations temper the conclusions. The pharmacological evidence relies on a single antagonist and concentration, without key controls. The circadian analysis is based on relatively small numbers of neurons, with rhythms detected only in subsets, and the alignment procedure used in constant darkness raises concerns of bias. The molecular evidence is sparse, with only three qPCR timepoints, and the model, while creative, rests on assumptions that are not yet fully supported by in vivo data.

      Please see our responses to the detailed comments.

      Detailed comments are provided below:

      (1) The role for Orco proposed in the authors' model largely stems from the effects seen following the administration of a single dose of the Orco antagonist OLC15. However, this hypothesis is undercut by the lack of adequate pharmacological controls, including a basic multipoint OLC15 dose-response series and the administration of blockers for the other channels that are embedded in the model but were ruled out as being involved in the modulation of biological rhythms. In addition, these studies would ideally also benefit from the inclusion of the same concentration (series) of an inactive OLC15 analog to better control for off-target effects.

      The Orco agonist VUAA1 (Jones et al., 2011) binds directly to Orco and increases the channel's open probability. In M. sexta hawkmoths, we have already shown that VUAA1 increases the low spontaneous activity of ORNs in a dose-dependent fashion (Nolte et al., 2016). Chen and Luetje (2012) systematically varied the chemical structure of VUAA1 to identify new Orco ligands and discovered 22 Orco Ligand Candidates (OLCs) that either activated or inhibited Orco. In their heterologous expression system, Orco was most sensitive to inhibition by OLC15. Based on these results, we published a dose-response curve of OLC15 inhibition (1-100 µM) using in vivo tip recordings of pheromone-sensitive long trichoid sensilla of M. sexta (Nolte et al., 2016). In that study, we also demonstrated that OLC15 antagonizes the VUAA1 activation of Orco.

      Furthermore, we tested other published Orco antagonists in in vivo assays in intact hawkmoths, focusing on amiloride-derived antagonists, because we had previously identified an amiloride-sensitive cation channel in hawkmoth ORNs. We found that, in contrast to OLC15, the amilorides HMA and MIA were not Orco-specific but instead affected different targets depending on time of day (Nolte et al., 2016). Based on those experiments and the dose-response curves, we determined that the Orco agonist VUAA1 (Jones et al., 2011) and the Orco antagonist OLC15 (Chen and Luetje, 2012) worked best for targeting Orco pharmacologically in hawkmoth ORNs. Based on comparative tests with other published Orco antagonists, we have since used a dose of 50 µM OLC15 in all further experiments.

      We will clarify the Methods section accordingly.

      (2) The expression pattern of Orco was assessed using qPCR at only three timepoints. Rhythmic transcripts can easily be missed with such sparse sampling (Hughes et al., 2017). A minimum of six evenly spaced timepoints across a 24-hour cycle would be required to confidently rule out circadian transcriptional regulation. In addition, the use of a timeless mRNA control from another study is not acceptable. Furthermore, qPCR measures transcript abundance, not transcription, although the manuscript repeatedly refers to transcription. Transcriptional studies would require nuclear run-on assays or, more recently, snRNA-seq analysis. Taken together, these concerns undermine the authors' attempt to rule out TTFL-based control, which directly led them to propose a PTFL-based model.

      We agree with the referees that more timepoints and a direct comparison between timeless and Orco mRNA levels should be included in this manuscript. We will include these additional qPCR experiments and edit the manuscript to make clear that we measure transcript abundance, but we will not perform snRNA-seq analysis due to time and financial constraints. We are currently working on the transcriptional control of Orco, both during ontogeny and throughout the day, but this work in progress is beyond the scope of this manuscript.

      (3) The modelling treats Orco as a ZT-dependent conductance, based on the cAMP oscillations that this group reported in the cockroach and on the presence and functionality in Manduca of homomeric Orco complexes that are devoid of tuning ORs. While such complexes have been generated in cell culture and other heterologous expression systems, and presumably exist in vivo in the Drosophila empty neuron and other tuning OR mutants, there is no evidence that they exist in wild-type Manduca ORNs. While this does not necessarily undermine every aspect of the models, the authors should acknowledge the presence of Orco/OR complexes rather than Orco homomeric complexes.

      Our ELISAs found circadian oscillations in cAMP levels not only in antennae of the Madeira cockroach (Schendzielorz et al., 2014, 2012) but also in hawkmoth antennae (Schendzielorz et al., 2015). We will add the 2015 citation to the Modeling subsection of the Methods to clarify this.

      We agree with the referees that we cannot distinguish between Orco homo- and heteromers in the different compartments of our hawkmoth ORNs. Thus, as the referee suggests, we will add text regarding the presence and localization of OR-Orco heteromers. However, we have indications that Orco homomers could indeed be present in hawkmoth ORNs. In a heterologous expression system, MsexOrco expression alone was sufficient to increase intracellular Ca²⁺ levels in response to VUAA1 application (Nolte et al., 2013). In differentiating primary cell cultures of hawkmoth antennae, Orco expression started during a developmental time window in which ORNs did not yet express pheromone receptors, and Orco affected spontaneous activity (Nolte et al., 2016). Thus, Orco homomers are present in developing hawkmoth ORNs during a time window in which ORNs already express spontaneous activity but cannot heteromerize with pheromone receptors. However, we do not know whether, and in what ratio, homo- and heteromers of Orco and ORs are present in the respective sensillum compartments of adult hawkmoths (Nolte et al., 2013; Stengl, 1994; Stengl and Hildebrand, 1990).

      We will clarify our manuscript accordingly.

      (4) Some aspects of the authors' models, most notably the decision to phase align/optimize their DD and OLC15 recordings, are likely to bias their interpretations.

      There is consensus that insects display daily and circadian rhythms in pheromone-dependent mating, odor-gated feeding, and egg-laying behavior that phase-lock to environmental rhythms and correspond with daily/circadian rhythms of sensory neuron physiology (e.g., Merlin et al., 2007; Rymer et al., 2007; Schendzielorz et al., 2015, 2012). However, circadian rhythms can easily be masked by stress, such as the disturbances during a very challenging long-term recording experiment over several days. In addition, we observed in our animal-rearing facility that under LD 17:7 light-dark cycles the originally nocturnal hawkmoth M. sexta distributes its activity over the course of the day, so that we find nocturnal as well as diurnal hawkmoths. Thus, light-dark cycles were not sufficient to ensure phase-synchronized behavioral rhythms, and it is very likely that the nocturnal hawkmoths rely heavily on pheromone/odor-dependent synchronization, as also found in other moth species (Ghosh et al., 2024). Here, we used isolated males that were never exposed to female pheromones, so that their circadian activity patterns readily disperse. Therefore, under free-running conditions it was necessary to first determine the rhythm of each animal and then to phase-align their activity patterns to allow for statistical analysis. Otherwise, circadian differences would average out in a free-running population. As requested by the referees in point (6), we will apply additional tests for rhythmicity to each of our recordings and revise the manuscript accordingly.
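      The averaging argument above, and the concern about alignment bias raised in point (4), can be illustrated with a toy example. The sketch assumes idealized noise-free traces with known acrophases at whole hours and a recording length of whole 24 h cycles (so that `np.roll` is an exact phase shift); these assumptions, the helper names and the numbers are hypothetical. Averaging four animals whose rhythms are spread evenly across the day yields a flat mean, whereas aligning each trace to its own acrophase first recovers the full amplitude. In real recordings the acrophase has to be estimated from the same noisy data, and aligning traces on their own estimated phases can produce an apparent rhythm in the mean even when the individual traces are arrhythmic, which is why alignment is best paired with per-recording rhythmicity tests.

```python
import numpy as np

PERIOD = 24.0
t = np.arange(0.0, 72.0, 1.0)  # hourly bins over three whole cycles (hypothetical)


def rhythm(t, acrophase, amplitude=2.0, mesor=10.0):
    """Idealized noise-free 24 h firing-rate trace peaking at `acrophase`."""
    return mesor + amplitude * np.cos(2 * np.pi * (t - acrophase) / PERIOD)


def amplitude_of(y, t):
    """Amplitude of the 24 h component by least-squares cosinor."""
    w = 2 * np.pi / PERIOD
    X = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    (_, b, g), *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.hypot(b, g))


# Free-running animals whose phases have dispersed (hypothetical acrophases, h)
phases = [0.0, 6.0, 12.0, 18.0]
traces = np.array([rhythm(t, p) for p in phases])

# Naive population mean: the rhythms cancel
naive_amp = amplitude_of(traces.mean(axis=0), t)

# Align: shift each trace so that its acrophase lands at t = 0, then average
aligned = np.array([np.roll(tr, -int(p)) for tr, p in zip(traces, phases)])
aligned_amp = amplitude_of(aligned.mean(axis=0), t)
print(f"naive mean amplitude: {naive_amp:.2e}; aligned mean amplitude: {aligned_amp:.2f}")
```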

      Assuming that hawkmoths need pheromone presence as an additional Zeitgeber, we are currently running a new set of experiments in which we attempt to improve synchronization by exposing the animals to LD cycles and pheromone before DD and OLC15 recordings. We will add these experiments to the manuscript.

      (5) The tip recordings from long trichoid sensilla are critical aspects of this study. These recordings were carried out on upper sensillar tips located on the distal-most second annulus. Since there are approximately 80 annuli on the Manduca antennae, it is unclear whether the recordings are representative of the antennal response.

      We think the reviewers may have misinterpreted our description of the recording site. In the Methods, we state that we clip off the 20 most distal annuli (leaving a stump of about 60 annuli) and insert the reference electrode into the flagellum up to the second annulus from the cut end; i.e., the recording site is located at 2/3 to 3/4 of the antenna length as seen from the head of the animal. We will make this clearer in the Methods section.

      In addition, our lab showed with antibody staining against Orco that apparently all ORNs that innervate long and short trichoid sensilla along the whole flagellum express the same staining pattern (Nolte et al., 2016). Furthermore, our patch clamp recordings of primary cell cultures of whole male antennae found largely overlapping ion channel populations across ORNs. This suggests that all ORNs, whether they express pheromone receptors or general odorant receptors, could potentially share the same Orco-dependent spontaneous activity rhythms. In our lab, different experimenters who recorded from long trichoid sensilla on different annuli in different years did not detect obvious differences in either the spontaneous activity or the pheromone responses (cf. Dolzer et al., 2003; Gawalek and Stengl, 2018; Schneider et al., 2025). Thus, it is very likely that we are reporting a general encoding mechanism that is not locally restricted along the antennal flagellum.

      (5.1) The authors do not provide any data in support of their cAMP/cGMP-based Orco gating…

      There are publications supporting cyclic nucleotide gating of Orco in Drosophila, but only after prior phosphorylation by protein kinase C (PKC; review: Wicher and Miazzi, 2021). Since Orco is highly conserved among insect species, it is likely that these PKC- and cGMP/cAMP-dependent regulatory mechanisms are present in other insect species as well. We are currently running thorough tip-recording experiments on the regulation of Orco gating, which are beyond the scope of this manuscript. However, we will add to this manuscript a set of experiments that demonstrates cAMP gating of Orco.

      (5.2)… and the proposed PTFL model is somewhat disappointing.

      For a detailed introduction of our PTFL membrane clock hypothesis please see our opinion paper (Stengl and Schneider, 2024).

      (5.3) The model seems to be influenced by their long-held proposal that insect olfactory signaling has a critical metabotropic component involving cyclic nucleotides, PKC, etc, a view that may be influenced by the use of Orco homomeric complexes generated in HEK cells.

      Indeed, we propose a metabotropic pheromone-transduction cascade, which in moths and cockroaches is based on G-protein-mediated activation of phospholipase C but not on adenylyl cyclase activation. Our hypothesis is not influenced by heterologous expression studies of Orco in HEK cells but is supported by our own work comparing in vivo tip recordings of intact hawkmoths with patch clamp experiments on primary cell cultures of hawkmoth olfactory receptor neurons, which are able to respond to their species-specific pheromones in vitro (Schneider et al., 2025; Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In addition, a multitude of publications by other laboratories, with in vivo and in vitro studies using physiological, genetic, and immunocytochemical assays, all support a metabotropic signal transduction cascade in insect olfaction (reviews: Stengl, 2010; Stengl and Funk, 2013; Wicher and Miazzi, 2021). In contrast, the hypothesis of a solely ionotropic pheromone- and general odor-dependent transduction cascade in all insect species rests on very sparse experimental evidence, obtained primarily from heterologous expression systems such as HEK cells, which lack the insect's wild-type molecular surroundings and thus cannot predict OR-Orco function in vivo. Furthermore, the ionotropic hypothesis relies heavily on the argument that a 7TM receptor with inverted membrane topology cannot couple to G-proteins, an argument that lacks careful backing from biochemical and structural studies. In addition, the ionotropic hypothesis lacks support from carefully performed physiological in vivo studies in different insect species that analyze the distinct kinetic components of ORNs' odor/pheromone responses and that employ physiological concentrations and durations of odor/pheromone stimuli (please see our most recent publication, Schneider et al. (2025)).

      (5.4) Nevertheless, structural studies on Orco do not support a cyclic nucleotide binding site, although PKC-based phosphorylation has been implicated in the fine-tuning/adaptation of olfactory signaling.

      While structural studies did not find evidence for known conserved cyclic nucleotide binding sites on Orco, this does not exclude the presence of as yet unknown binding sites, or of sites that become accessible only after a specific sequence of phosphorylations at the many phosphorylation sites on Orco. Indeed, physiological studies in Drosophila presented evidence for cyclic nucleotide dependence of Orco after prior PKC-dependent phosphorylation (Getahun et al., 2013). Our ongoing in vivo experiments in hawkmoths further corroborate a PKC- and cAMP-dependent modulation of Orco. These studies will be published in a follow-up publication.

      (6) Because only 5/11 LD and 7/10 DD animals showed daily rhythms, and the averages lack clear daily modulation, the methods are not yet reliable enough to reveal novel mechanisms underlying circadian rhythm generation, and the reported results are not yet reliable or quantifiable. To quantify their results, the authors should apply tests for circadian rhythmicity such as RAIN, JTK_CYCLE, MetaCycle, or ECHO. The use of FFT and wavelet analysis is applauded, but these methods do not provide tests of significance for rhythms and can be biased when the data contain only 1-3 circadian cycles. Because the conclusions appear to be based on 11-12 neurons that were recorded for 2-4 days, the reader is concerned that the methods are not yet refined enough to provide strong evidence for circadian regulation of spontaneous ORN firing. The average data (e.g., Figure 3Bii and 3Cii) highlight the apparent lack of daily rhythms. In summary, the results would be more compelling if more than 50% of the recordings had significant circadian amplitudes with similar periods and phases.

      The long-term tip recordings of intact hawkmoths are very challenging and take a very long time to accomplish; thus, we are very happy that we succeeded in obtaining so many of them (N = 34). Since 5/11 LD recordings and 7/10 DD recordings revealed daily/circadian rhythmicity, and since many other physiological recordings at different ZTs by different members of our laboratory all revealed ZT-dependent pheromone transduction, we can be certain that the physiology of hawkmoth antennae is under strict circadian control. Please see also our response to (4) above, commenting on the phase dispersal of activity rhythms observed in our experiments as well as in the behavior of hawkmoth males in the mating cage.

      Nevertheless, we will follow the advice of the referees to apply additional tests for the significance of rhythms in spontaneous activity, and we are thankful for the suggested tests, which we were not aware of.

      (7) The statement that circadian patterns of ORN firing are lost with the Orco antagonist OLC15 is not strongly supported. The manuscript should be revised to quantify how OLC15 changed circadian amplitude in the 12 recorded neurons. Measures of circadian amplitude can avoid confusing or vague statements like Line 394 “low and high frequency bands appeared to merge during the activity phase around ZT 0 in the animals that showed clear circadian rhythms (N = 5 of 11 in LD)”. The conclusion that Orco blockade abolishes circadian firing appears to be contradicted by Figure 6, which indicates that ~6 of these neurons had circadian periods detected by wavelet analysis. The manuscript would be strengthened with details about the specificity and reproducibility of the Orco antagonist. The authors quantify the gradual decrease in firing with the slope of a linear fit to estimate how the “effectiveness [of OLC15] increased over time.” They conclude that the drug “obliterated circadian rhythms and attenuated the spontaneous activity in several, but not all experiments (N = 8 of 12).” The report would be greatly strengthened with corroborating data from additional Orco antagonists and additional doses of OLC15 (the authors use only 50 µM OLC15).

      We will revise our data analysis, according to the valuable suggestions of the referees.

      However, based upon our previous studies with other Orco antagonists and different doses of OLC15 (Nolte et al., 2016) we found that 50 µM OLC15 is the best Orco antagonist dose in M. sexta to target Orco-dependent modulation of spontaneous action potential activity of hawkmoth olfactory receptor neurons. Please see also our response to (1).

      (8) The manuscript includes several statements that are more speculation than conclusion. For example, there is no evidence for tuning or plasticity in this report. Statements like the following should be removed or addressed with experiments that show changes in odor response specificity or sensitivity: "ORN signalosomes are highly plastic endogenous PTFL clocks comprising receptors for circadian and ultradian Zeitgebers that allow to tune into internal physiological and external environmental rhythms as basis for active sensing." (Discussion Line 622). The paper concludes that (line 380) "mean frequency of spontaneous spiking and the frequency of bursting expressed daily modulation, and are both most likely controlled via a circadian clock that targets the leak channel Orco." This is too bold given the available results.

      We will revise the discussion accordingly and clarify which statements are supported via published evidence and which are predictions based upon our novel hypothesis published in our opinion paper (Stengl and Schneider, 2024).

      (9.1) Because Orco conductance is modulated by cyclic nucleotides, it remains highly plausible that circadian regulation occurs upstream at the level of signaling pathways (e.g., calcium, calcium-binding proteins, GPCRs, cyclases, phosphodiesterases).

      We agree with the referees that it is very likely that multiple layers of interconnected feedback cycles control Orco localization and activity. Our novel hypothesis suggests interlocked TTFL and PTFL control of physiological circadian rhythms, rather than strictly hierarchical TTFL control, which would require a daily turnover of membrane proteins and transcriptional control via the established TTFL clock in insect ORNs. We are currently searching for TTFL control at all levels of odor/pheromone transduction using ZT-dependent transcriptomics in combination with qPCR and single-nucleus transcriptomics, including all the molecules suggested by the referees. These studies are ongoing, are very time- and money-consuming, and are beyond the scope of this manuscript.

      (9.2) The possibility that circadian oscillations of cyclic nucleotides are generated by the canonical TTFL mechanism has not been excluded. In fact, extensive work in Drosophila has demonstrated that the TTFL-based molecular clock proteins are required for circadian rhythms in olfaction.

      Our experiments testing circadian TTFL control at different levels of the cAMP transduction cascade in hawkmoth antennae are under way and will be part of another publication. We will revise our discussion accordingly.

      The published experiments on TTFL-dependent control of Drosophila olfaction that we are aware of (Krishnan et al., 1999; Tanoue et al., 2004) do not exclude interlinked PTFL and TTFL clocks. Krishnan et al. (1999) demonstrated that the TTFL clock in antennal olfactory receptor neurons correlates with circadian rhythms in odor responses measured in electroantennogram (EAG) recordings, not in single sensillum recordings as in our experiments. EAG recordings comprise not only voltage responses of the olfactory sensory neurons but also voltage changes generated in non-neuronal antennal cells such as trichogen and tormogen cells, which build the transepithelial potential via V-ATPases that generate the high K⁺ concentration in the sensillum lymph (Jain et al., 2024; Klein, 1992; Thurm and Küppers, 1980). In addition, EAG recordings most likely contain responses of efferent neurons with somata in the brain that mediate central control of the antennae. Thus, EAG recordings are difficult to interpret.

      (10) A defining feature of circadian oscillators is the feedback mechanism that generates a time delay (e.g., PERIOD/TIMELESS repressing their own transcription). While the authors describe how cyclic nucleotides can regulate Orco conductance, they do not provide a convincing explanation of how Orco activity could, in turn, feed back into the proposed PTFL to sustain oscillations. For these reasons, the authors should consider:

      a) Providing a broader discussion of non-TTFL models of circadian rhythms (e.g., redox cycles, post-translational modifications).

      We will revise the discussion accordingly.

      b) Reassessing Orco expression using a higher-resolution temporal sampling (≥6 timepoints per 24 h).

      We will add those experiments to the revised version of the manuscript (see our response to (2)).

      c) Clarifying or revising the PTFL model to explicitly address how feedback would be achieved. Alternatively, the data may be more consistent with Orco conductance rhythms being regulated by post-translational mechanisms downstream of the canonical TTFL oscillator, as suggested by the Drosophila olfactory system literature.

      We will revise the manuscript accordingly.

      Minor weaknesses:

      (1) The authors should compare the firing patterns of ORNs to the bursts, clusters, and packets of retinal efferent spikes reported by Liu and Passaglia (2011, J Biol Rhythms). Comparing measures in moths with those in Limulus might allow the authors to address the question: is the daily firing pattern of ORNs likely a conserved feature of circadian control of sensory sensitivity?

      We will revise the discussion accordingly.

      (2) The Methods need further detail. For example, it is unclear whether or how single-neuron activity was discriminated. It is also unclear whether the results were compromised by the relatively large environmental fluctuations in temperature (21 to 27 °C), humidity (35 to 60%), or other cues known to modulate spontaneous firing.

      We will clarify the Methods section.

      References

      Chen S, Luetje CW. 2012. Identification of New Agonists and Antagonists of the Insect Odorant Receptor Co-Receptor Subunit. PLOS ONE 7:e36784. doi:10.1371/journal.pone.0036784

      Dolzer J, Fischer K, Stengl M. 2003. Adaptation in pheromone-sensitive trichoid sensilla of the hawkmoth Manduca sexta. J Exp Biol 206:1575–1588. doi:10.1242/jeb.00302

      Gawalek P, Stengl M. 2018. The Diacylglycerol Analogs OAG and DOG Differentially Affect Primary Events of Pheromone Transduction in the Hawkmoth Manduca sexta in a Zeitgebertime-Dependent Manner Apparently Targeting TRP Channels. Front Cell Neurosci 12:218. doi:10.3389/fncel.2018.00218

      Getahun MN, Olsson SB, Lavista-Llanos S, Hansson BS, Wicher D. 2013. Insect Odorant Response Sensitivity Is Tuned by Metabotropically Autoregulated Olfactory Receptors. PLOS ONE 8:e58889. doi:10.1371/journal.pone.0058889

      Ghosh S, Suray C, Bozzolan F, Palazzo A, Monsempès C, Lecouvreur F, Chatterjee A. 2024. Pheromone-mediated command from the female to male clock induces and synchronizes circadian rhythms of the moth Spodoptera littoralis. Curr Biol 34:1414-1425.e5. doi:10.1016/j.cub.2024.02.042

      Jain K, Prelic S, Hansson BS, Wicher D. 2024. Expression of Drosophila melanogaster V-ATPases in Olfactory Sensillum Support Cells. Insects 15:1016. doi:10.3390/insects15121016

      Jones PL, Pask GM, Rinker DC, Zwiebel LJ. 2011. Functional agonism of insect odorant receptor ion channels. Proc Natl Acad Sci 108:8821–8825. doi:10.1073/pnas.1102425108

      Klein U. 1992. The insect V-ATPase, a plasma membrane proton pump energizing secondary active transport: immunological evidence for the occurrence of a V-ATPase in insect ion-transporting epithelia. J Exp Biol 172:345–354. doi:10.1242/jeb.172.1.345

      Krishnan B, Dryer SE, Hardin PE. 1999. Circadian rhythms in olfactory responses of Drosophila melanogaster. Nature 400:375–378. doi:10.1038/22566

      Merlin C, Lucas P, Rochat D, François M-C, Maïbèche-Coisne M, Jacquin-Joly E. 2007. An Antennal Circadian Clock and Circadian Rhythms in Peripheral Pheromone Reception in the Moth Spodoptera littoralis. J Biol Rhythms 22:502–514. doi:10.1177/0748730407307737

      Nolte A, Funk NW, Mukunda L, Gawalek P, Werckenthin A, Hansson BS, Wicher D, Stengl M. 2013. In situ Tip-Recordings Found No Evidence for an Orco-Based Ionotropic Mechanism of Pheromone-Transduction in Manduca sexta. PLOS ONE 8:e62648. doi:10.1371/journal.pone.0062648

      Nolte A, Gawalek P, Koerte S, Wei H, Schumann R, Werckenthin A, Krieger J, Stengl M. 2016. No Evidence for Ionotropic Pheromone Transduction in the Hawkmoth Manduca sexta. PLOS ONE 11:e0166060. doi:10.1371/journal.pone.0166060

      Rymer J, Bauernfeind AL, Brown S, Page TL. 2007. Circadian rhythms in the mating behavior of the cockroach, Leucophaea maderae. J Biol Rhythms 22:43–57. doi:10.1177/0748730406295462

      Schendzielorz J, Schendzielorz T, Arendt A, Stengl M. 2014. Bimodal Oscillations of Cyclic Nucleotide Concentrations in the Circadian System of the Madeira Cockroach Rhyparobia maderae. J Biol Rhythms 29:318–331. doi:10.1177/0748730414546133

      Schendzielorz T, Peters W, Boekhoff I, Stengl M. 2012. Time of Day Changes in Cyclic Nucleotides Are Modified via Octopamine and Pheromone in Antennae of the Madeira Cockroach. J Biol Rhythms 27:388–397. doi:10.1177/0748730412456265

      Schendzielorz T, Schirmer K, Stolte P, Stengl M. 2015. Octopamine Regulates Antennal Sensory Neurons via Daytime-Dependent Changes in cAMP and IP3 Levels in the Hawkmoth Manduca sexta. PLOS ONE 10:e0121230. doi:10.1371/journal.pone.0121230

      Schneider AC, Schröder K, Chang Y, Nolte A, Gawalek P, Stengl M. 2025. Hawkmoth Pheromone Transduction Involves G-Protein–Dependent Phospholipase Cβ Signaling. eNeuro 12:ENEURO.0376-24.2024. doi:10.1523/ENEURO.0376-24.2024

      Stengl M. 2010. Pheromone Transduction in Moths. Front Cell Neurosci 4:133. doi:10.3389/fncel.2010.00133

      Stengl M. 1994. Inositol-trisphosphate-dependent calcium currents precede cation currents in insect olfactory receptor neurons in vitro. J Comp Physiol A 174:187–194. doi:10.1007/BF00193785

      Stengl M, Funk NW. 2013. The role of the coreceptor Orco in insect olfactory transduction. J Comp Physiol A 199:897–909. doi:10.1007/s00359-013-0837-3

      Stengl M, Hildebrand JG. 1990. Insect olfactory neurons in vitro: morphological and immunocytochemical characterization of male-specific antennal receptor cells from developing antennae of male Manduca sexta. J Neurosci 10:837–847. doi:10.1523/JNEUROSCI.10-03-00837.1990

      Stengl M, Schneider AC. 2024. Contribution of membrane-associated oscillators to biological timing at different timescales. Front Physiol 14:1243455. doi:10.3389/fphys.2023.1243455

      Tanoue S, Krishnan P, Krishnan B, Dryer SE, Hardin PE. 2004. Circadian Clocks in Antennal Neurons Are Necessary and Sufficient for Olfaction Rhythms in Drosophila. Curr Biol 14:638–649. doi:10.1016/j.cub.2004.04.009

      Thurm U, Küppers J. 1980. Epithelial physiology of insect sensilla. In: Locke M, Smith DS, editors. Insect Biology in the Future. Academic Press. pp. 735–763. doi:10.1016/B978-0-12-454340-9.50039-2

      Wicher D, Miazzi F. 2021. Functional properties of insect olfactory receptors: ionotropic receptors and odorant receptors. Cell Tissue Res 383:7–19. doi:10.1007/s00441-020-03363-x

    1. eLife Assessment

      This study presents valuable new insights into the patterns of organelle inheritance in the protozoan parasite Toxoplasma gondii. An innovative dual-labeling approach used in this study to track maternal-derived and de novo synthesized organelles provides a technical advance with potential to be more broadly applied. Solid evidence is provided that different organelles show distinct inheritance fates during cell replication; however, the data describing the residual body component in this process is incomplete.

    2. Reviewer #1 (Public review):

      Summary:

      This work asks how different organelles and structures in the apicomplexan parasite Toxoplasma gondii are recycled and/or segregated to the daughter cells during cell replication. In particular, the authors consider an unusual cell structure called the residual body, which links replicating cells during the intracellular infection stage of this parasite. The residual body has historically been considered a 'dumping ground' for unnecessary relics of the mother cell during division, but this notion is increasingly being revised. Indeed, cell replication in Toxoplasma is often misinterpreted as cell division (cytokinesis). In fact, the cell replicates its organelles and structures to multiple tens of copies in seemingly distinct daughter cells, but cytokinesis is delayed for many such cycles and typically occurs only at parasite egress from its host cell. The residual body is the connection between these pre-cytokinetic replicated daughters, so effectively this is still a single cell at this stage. The authors have previously shown that an actin network extends through the residual body between these daughter cells, and that the ER and mitochondria common to all cells are also linked through this structure. This study of the fates of organelles during cell replication is timely for continuing our understanding of how this fascinating component of the cell participates in these processes. The authors use Halo tags as their principal tool to track discrete populations of proteins and label their organelle locations, and this provides beautiful insight into these processes.

      Strengths:

      Using dyes conjugated to Halo tags, this work elegantly tracks the fates of proteins synthesised by an original 'mother' cell over several replication cycles of pre-cytokinetic 'daughters'. Using this tool, they show that some organelles are made intact just once and that some of these can be subsequently sorted to the daughters (micronemes and rhoptries) while others are dismantled (IMC) and the daughters must make their own. A third set of organelles (largely synthesis, sorting, and metabolic compartments) is divided and inherited, and new daughter-synthesised proteins are added to the preexisting maternal proteins in these structures. A role for actin and myosin is clearly demonstrated for micronemes and rhoptries, and this correlates with their relatively late inheritance into the developing daughters. Overall, this work gives clarity to the behaviours of several cell structures during replication and paves the way to a better understanding of the mechanisms that drive the differences between structures and the universality of these processes in other apicomplexan parasites.

      Weaknesses:

      In addressing the question of residual body participation in organelle sorting, it would be useful to clearly define this structure, and to define when and where it is delineated from the posterior of a mother cell during the formation of daughter structures. This might seem like a moot point, but it would give clarity to notions of recycling and 'reservoirs'. Mother cells retain their active invasion apparatus until very late in daughter formation. The need for micronemes and rhoptries to be released from this service late in the process might explain why they are only then trafficked to the cell posterior and then into the daughters. So, is this a distinct residual body function/reservoir or just a spatial constraint of this sequence of daughter formation? In subsequent cell replications (4, 8, 16... stages), is there a separation between the residual body that links them all and the posterior of each new 'mother cell', and if so, when is this distinction lost? This is important because, without a definition, we might be confusing different processes. Can rhoptries/micronemes that originate in one 'mother' be sorted to the 'daughters' of a distinct mother in this syncytium? If so, this would make the residual body a sorting centre; otherwise, we could just be capturing the activities at the posterior of any given cell during replication. The authors' further thoughts on this would be very interesting.

      The Group 2 structures are described as those that are divided between daughters and receive newly synthesised proteins that add to the maternal protein of these compartments. This is a logical conclusion for several of the structures mentioned, in which the maternal protein signal is seen to be depleted with replication (including the apicoplast, ER, glideosome, and Golgi). However, data directly supporting the addition of new proteins to these existing structures are presented only for the Golgi.

    3. Reviewer #2 (Public review):

      Summary:

      Toxoplasma gondii is an obligate intracellular parasite and the causative agent of toxoplasmosis. Central to pathogenicity are parasite invasion into host cells, intracellular replication, and egress, which destroys the infected cell. This manuscript focuses on understanding how maternal resources (in this case, cellular organelles) are shared between daughter parasites during cell division. Many organelles are single copy, meaning that division and inheritance by the daughters is crucial for successful replication. The major strength of this study was the use of a Halo-based pulse-chase assay to characterize patterns of organelle inheritance. The results show that both micronemes and rhoptries (secretory vesicles), previously thought to be synthesized de novo, are inherited by daughter parasites. Thus, this paper adds new insight to our understanding of cell division in this important parasite.

      Strengths:

      This study demonstrated that pulse labeling of proteins can be used to monitor protein synthesis, turnover, and movement. This approach will be of great interest to the field. Using this method, the authors demonstrate three main modes of organelle inheritance.

      (1) Organelles present in multiple copies (such as secretory vesicles, micronemes, and rhoptries) are divided between the daughter parasites, with an additional contribution of newly formed vesicles. New and old material remain as separate entities in the cell.

      (2) Single-copy organelles, which are expanded to include newly synthesized material prior to division, such as the Golgi and apicoplast.

      (3) Cytoskeletal structures that are synthesized anew during each round of division. These studies provide more refined insight into patterns of organelle inheritance and demonstrate that secretory organelles are not made de novo during each round of division, as previously thought. The paper has a logical flow, and overall, the data are presented in a clear and organized fashion.

      Weaknesses:

      (1) Descriptions of methodology and statistical analysis were incomplete.

      (2) There are inconsistencies between the data in Figures 1 and 5. In Figure 1, a small amount of maternal IMC is visible in stage 2 parasites. Although this represents a ~90% reduction, these parasites should be counted as parasites with maternal IMC. However, the graph in Figure 5C indicates that no parasites retain maternal GAPM1a. Because graph 5C is a binary measure (present vs. absent), one would expect a non-zero percentage of parasites to have maternal material.

      (3) The conclusion drawn from Figure 6 is not justified by the data. I agree with the authors' conclusion that the accumulation of micronemes and rhoptries in the residual body was time-dependent. In Figure 6A, the signal observed in the residual body at 6:30, 13, and 14 hours is not observed at subsequent time points. However, the fate of these micronemes and rhoptries is unclear. It cannot be concluded that these vesicles are recycled back to the mother; they could also have been degraded. In fact, the graphs of microneme inheritance in Figure 2B show a decrease in maternal signal from 100% to 80% between stages 1 and 2, indicating that some microneme degradation is taking place.

      (4) To convincingly demonstrate that the redistribution of micronemes and rhoptries was due to recovery of MyoF protein levels after auxin washout, a Western blot should be performed to show MyoF protein levels over time. In addition, the decrease in mMIC2 protein levels in the residual body in Figure 8F should be measured and normalized for photobleaching. Both apical and basal signals appear to be reduced over the time course of imaging.

    4. Reviewer #3 (Public review):

      Summary:

      Knoerzer-Suckow et al. explore the mechanisms of organelle inheritance during endodyogeny in Toxoplasma gondii using an innovative dual-labeling approach to track the distribution of maternal organelles into daughter parasites. They can clearly distinguish between maternal and daughter-derived organelles using their dual-labeling Halo Tag approach. They reveal that different organelles are trafficked to daughter parasites in three broad patterns, which they have binned into groups. Their findings reveal a role for MyoF in the inheritance of micronemes and rhoptries, and notably, they observe that the inner membrane complex (IMC) is not recycled. Instead, the IMC undergoes a pronounced relocalization to the posterior of the maternal cell, where it is likely targeted for degradation.

      Strengths:

      The data surrounding their MyoF knockdown experiments, IMC degradation, and trafficking of MIC2 after auxin washout are compelling. These data add to the knowledge of how organelle inheritance occurs in T. gondii, increasing the field's understanding of endodyogeny.

      Weaknesses:

      (1) The evidence provided does not sufficiently substantiate the claim that microneme and rhoptry inheritance specifically traffics through the residual body. The temporal resolution of the imaging is inadequate to precisely trace the path of microneme and rhoptry inheritance. The data shown support only the conclusion that at least some micronemes and rhoptries might be recycled through the residual body; it is unclear whether many or most of these organelles do so.

      (2) The absence of specific markers for the residual body brings into question whether microneme inheritance occurs through a discrete residual body or simply via the basal end of the maternal parasite. The authors need a robust way to visualize and define the residual body to claim that micronemes and rhoptries are specifically transported through this structure.

    1. eLife Assessment

      This is a solid paper on intermittent fasting that will be of interest to readers. The data presented are certainly valuable as a resource. The findings of both shared and tissue-specific signatures, both at the proteomic and transcriptomic levels, align well with what has been established and bring new insight into metabolic adaptation and its consequences in muscle, cortex, and liver.

    2. Reviewer #1 (Public review):

      Summary:

      In this study, the authors employed comprehensive proteomic and transcriptomic analyses to investigate the systemic and organ-specific adaptations to IF in males. They identified shared biological signaling processes across tissues, suggesting unifying mechanisms that link metabolic changes to cellular communication. The analyses revealed both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular resilience.

      Strengths:

      This study examined multiple organs, including the liver, brain, and muscle, and revealed both conserved and tissue-specific responses to IF.

      Weaknesses:

      (1) Why did the authors choose the liver, brain, and muscle, but not other organs such as the heart and kidney? The latter are known to be the largest consumers of ketones, whose levels are also altered by the IF treatment in this study.

      (2) The proteomic and transcriptomic analyses were performed only at 4 months. However, the relationship between IF and the resulting molecular adaptations is likely to depend on the time point.

      (3) The manuscript lacks a "Discussion" section, which would detail the significance and weaknesses of the study.

      (4) There is no independent validation of the proteomic and transcriptomic profiling. For example, important changes in the proteomic data could be confirmed by Western blot.

    3. Reviewer #2 (Public review):

      Summary:

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations.

      They find shared signaling pathways, certain metabolic changes, and organ-specific responses suggesting that IF might affect energy utilization and metabolic flexibility while promoting resilience at the cellular level.

      Strengths:

      The inclusion of 3 organs and 2 omics approaches is a strength of this study.

      Weaknesses:

      The analytical approach of the data generated by the present study is not well posed, because it doesn't help to answer key questions implicit in the experimental design. Consequently, the paper, as it is for now, reads as a mere description of results and not a response to specific questions.

      The presentation of the figures, the knowledge of the literature, and the inclusion of only one sex (male) are all weaknesses.

    4. Reviewer #3 (Public review):

      Summary:

      Fan et al. use large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle, and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. The main conclusion is that responses to IF vary across tissues: some common pathways were observed, but there were notable distinctions between tissues.

      Strengths:

      (1) The IF study was well conducted and ran for 4 months, which was a nice long-term design.

      (2) The multiomics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues.

      (3) The authors did not overstate their conclusions or imply an overreaching mechanism.

      Weaknesses:

      The weaknesses, which are minor, include the use of only male mice and the early start (at 6 weeks of age) of the IF treatment. See specifics in the recommendations section.

    5. Author response:

      Reviewer #1 (Public review):

      Summary: 

      In this study, the authors employed comprehensive proteomic and transcriptomic analyses to investigate the systemic and organ-specific adaptations to IF in males. They identified shared biological signaling processes across tissues, suggesting unifying mechanisms that link metabolic changes to cellular communication. The analyses revealed both conserved and tissue-specific responses by which IF may optimize energy utilization, enhance metabolic flexibility, and promote cellular resilience.

      Strengths: 

      This study examined multiple organs, including the liver, brain, and muscle, and revealed both conserved and tissue-specific responses to IF.

      We appreciate the recognition of the study’s strengths and the opportunity to clarify the points raised.

      Weaknesses: 

      (1) Why did the authors choose the liver, brain, and muscle, but not other organs such as the heart and kidney? The latter are known to be the largest consumers of ketones, whose levels are also altered by the IF treatment in this study.

      We agree that the heart and kidney are critical organs in ketone metabolism. Our selection of the liver, brain, and muscle was guided by their distinct metabolic functions and relevance to systemic energy balance, neuroplasticity, and locomotor activity, key domains influenced by intermittent fasting (IF). These tissues also offer complementary perspectives on central and peripheral adaptations to IF. Notably, we have previously examined the effects of IF on the heart (eLife 12:RP89214), and we fully acknowledge the importance of the kidney. We intend to include it in future studies to broaden the scope and deepen our understanding of IF-induced systemic responses.

      (2) The proteomic and transcriptomic analyses were performed only at 4 months. However, the relationship between IF and the resulting molecular adaptations is likely to depend on the time point.

      We appreciate this insightful comment. The 4-month time point was selected to capture long-term adaptations to IF, beyond acute or transitional effects. While we acknowledge that molecular responses to IF are time-dependent, our goal in this study was to establish a foundational understanding of sustained systemic and tissue-specific changes. We fully agree that a longitudinal approach would provide deeper insights into the temporal dynamics of IF-induced adaptations. To address this, we are currently undertaking a comprehensive 2-year study that is specifically designed to explore these time-dependent effects in greater detail.

      (3) The manuscript lacks a "Discussion" section, which would detail the significance and weaknesses of the study.

      We appreciate this observation. The manuscript was originally structured to emphasize results and interpretation within each section, but we recognize that a dedicated discussion section would enhance clarity and contextual depth. In the revised version, we will add a comprehensive discussion section addressing broader implications, limitations, and future directions of the study.

      (4) There is no independent validation of the proteomic and transcriptomic profiling. For example, important changes in the proteomic data could be confirmed by Western blot.

      We acknowledge the importance of orthogonal validation to support high-throughput findings. While our study primarily focused on uncovering systemic patterns through proteomic and transcriptomic profiling, we agree that targeted confirmation would strengthen the conclusions. To this end, we have included immunohistochemical validation of a key protein common to all three organs: Serpin A1C. Additionally, we are planning a dedicated follow-up study to expand functional validation of several key proteins identified in this manuscript, which will be pursued as a separate project.

      Reviewer #2 (Public review): 

      Summary: 

      Fan and colleagues measure proteomics and transcriptomics in 3 organs (liver, skeletal muscle, cerebral cortex) from male C57BL/6 mice to investigate whether intermittent fasting (IF; 16h daily fasting over 4 months) produces systemic and organ-specific adaptations. 

      They find shared signaling pathways, certain metabolic changes, and organ-specific responses suggesting that IF might affect energy utilization and metabolic flexibility while promoting resilience at the cellular level.

      Strengths: 

      The inclusion of 3 organs and 2 omics approaches is a strength of this study.

      We appreciate the reviewer’s recognition of the breadth of our study design. By integrating proteomics and transcriptomics across three metabolically distinct organs, we aimed to provide a comprehensive view of systemic and tissue-specific adaptations to IF. This multi-organ, multi-omics approach was central to uncovering both conserved and divergent biological responses.

      Weaknesses: 

      (1) The analytical approach of the data generated by the present study is not well posed, because it doesn't help to answer key questions implicit in the experimental design. Consequently, the paper, as it is for now, reads as a mere description of results and not a response to specific questions.

      We thank the reviewer for this important observation. Our initial aim was to establish a foundational atlas of molecular changes induced by IF across key organs. However, we recognize that clearer framing of the biological questions would enhance interpretability. In the revised manuscript, we will restructure the introduction, results, and discussion to align more explicitly with specific hypotheses, particularly those related to energy metabolism, cellular resilience, and inter-organ signaling. We will also add targeted analyses and clarify how each dataset contributes to answering these questions.

      (2) The presentation of the figures, the knowledge of the literature, and the inclusion of only one sex (male) are all weaknesses.

      We appreciate this feedback and agree that these are important considerations. Regarding figure presentation, we will revise several figures for improved clarity, add more descriptive legends, and reorganize supplemental materials to better support the main findings. On the literature front, we will expand the discussion to include recent and relevant studies on IF, metabolic adaptation, and sex-specific responses. As for the use of only male mice, this was a deliberate choice to reduce hormonal variability and focus on establishing baseline molecular responses. We fully acknowledge the importance of sex as a biological variable and will soon be conducting studies in female mice to address this gap.

      Reviewer #3 (Public review):

      Summary: 

      Fan et al. use large omics data sets to give an overview of proteomic and gene expression changes after 4 months of intermittent fasting (IF) in liver, muscle, and brain tissue. They describe common and distinct pathways altered under IF across tissues using different analysis approaches. The main conclusion is that responses to IF vary across tissues: some common pathways were observed, but there were notable distinctions between tissues.

      Strengths: 

      (1) The IF study was well conducted and ran for 4 months, which was a nice long-term design.

      (2) The multiomics approach was solid, and additional integrative analysis was complementary to illustrate the differential pathways and interactions across tissues. 

      (3) The authors did not overstate their conclusions or imply an overreaching mechanism.

      We sincerely thank the reviewer for acknowledging the strengths of our study design and analytical approach. We aimed to strike a careful balance between comprehensive data generation and cautious interpretation, and we appreciate the recognition that our conclusions were appropriately framed within the scope of the data.

      Weaknesses: 

      The weaknesses, which are minor, include the use of only male mice and the early start (at 6 weeks of age) of the IF treatment. See specifics in the recommendations section.

      We appreciate the reviewer’s thoughtful comments. The decision to use male mice and initiate IF at 6 weeks was based on minimizing hormonal variability and capturing early adult metabolic programming. We acknowledge that sex and developmental timing are important biological variables. To address this, we are conducting parallel studies in female mice and evaluating IF initiated at later life stages. These follow-up investigations will help determine the extent to which sex and timing influence the molecular and physiological outcomes of IF.

    1. eLife Assessment

      This important study advances our understanding of HIV transmission dynamics by age and sex in Zambia during the PopART trial. By combining phylogenetics and individual-based mathematical modelling (IBM), it adds depth to the epidemiological literature and may inform more strategic allocation of HIV prevention resources in sub-Saharan Africa. The authors employ two complementary and well-established methodologies (phylogenetics and IBM), and this dual approach is a notable strength. However, the evidence supporting key conclusions is incomplete, with several claims insufficiently substantiated by the data presented. Improvements in data presentation (e.g., quantification of qualitative statements, statistical estimates, and clearer description of results) would substantially strengthen the paper.

    2. Reviewer #1 (Public review):

      Summary:

      This manuscript describes the results of phylogenetic and epidemiological modeling of the PopART community cohorts in Zambia. The current manuscript draft is methodologically strong, but needs revision to strengthen the take-home messages. As written, there are many possible take-away conclusions. For example, the agreement between IBM and phylogenetic analysis is noteworthy and provides a methodological focus. The revealed age patterns of transmission could be a focus. The effects of the PopART intervention and the consequences of a 1-year disruption could be a focus. It is important, though, that any main messages summarized by the authors are substantiated by the evidence provided and do not extrapolate beyond the data that have been generated. I recommend that the authors think deeply about what the most important, well-supported messages are and reframe the discussion and abstract accordingly.

      Strengths/weaknesses by section:

      (1) ABSTRACT

      The Abstract summarizes qualitative findings nicely, but the authors should support each qualitative finding with the corresponding quantitative result.

      The ending claim is not substantiated by the modeling scenarios that have been run: "targeted interventions for demographic groups such as under-35 men may be the key to finally ending HIV." It is straightforward to run this specific scenario in the model to determine whether or not this is true.

      The authors should add confidence intervals to the quantitative metrics, such as the 93.8% and 62.1% incidence reduction.

      (2) RESULTS

      The authors should check the Results section for any qualitative claims not substantiated by the analyses performed, and ensure the corresponding analyses are presented to support the claims.

      The Results and Methods sections describe the model's implementation of the PopART intervention differently. The Methods section describes it as including VMMC, TB, and STI services, while the Results section mentions only intensified HIV testing and linkage.

      A limitation of the model is that HIV disease progression is based on the ATHENA cohort in the Netherlands, which involves a different HIV subtype (B) from the one in the study setting (C). The model should be configured using published subtype C progression data, or at least a sensitivity analysis should be conducted with respect to disease progression assumptions.

      In Table 2, the authors should consider adding a p-value to establish whether or not IBM and phylogenetics estimates are different.

      (3) DISCUSSION

      The literature review and comparison of study results to previously published phylogenetic studies is very nice. The authors could strengthen this by providing quantitative estimates with CIs for a more scientific comparison of the study results vs. prior studies, perhaps as a table or figure.

      The authors state that due to "the narrow geographical catchment area... The results should not be automatically extrapolated to apply to other SSA settings." The authors should exercise this caution when comparing the results to studies in South Africa and elsewhere.

      There are many other limitations to the analysis, including some mentioned above, that are not acknowledged. The authors should think carefully about what the most important limitations are and acknowledge them honestly at the end of the Discussion section.

    3. Reviewer #2 (Public review):

      Summary:

      The authors analyzed PopART data to better characterize the age- and sex-specific heterosexual HIV transmission dynamics in Zambia, with the goal of informing resource allocation.

      Strengths:

      This is an important analysis that homes in on the key drivers of HIV transmission in Zambia, which hopefully can be used to tune prevention efforts to maximize effect while limiting required resources. Two analytic approaches were used, and while the phylogenetic data were markedly more limited, they mirrored the simulated epidemic. The authors did a nice job of reviewing the limitations of the data and the analyses, and of providing analyses to support their goals and hypothesis. This work may have more impact now that resources for HIV prevention and treatment in SSA may become scarcer.

      Weaknesses:

      To increase the impact and utility of this work, it would be helpful to parse the analysis a bit further to estimate the roles of undiagnosed, diagnosed but untreated, and treated subpopulations in this transmission. PopART is a multifaceted intervention, and the cost, effort, and approach required for re-engagement in care can differ considerably from those for testing and treatment.